[Serverless][DynamoDB] Time Series Statistics Manager

Full code:

https://github.com/gboysking/dynamodb-time-series-manager

 

This blog post will guide you through the process of implementing a Time Series Statistics Manager using AWS DynamoDB in Node.js. We will discuss the benefits of this approach and how you can utilize it for your projects.

Table of Contents

  1. Introduction
  2. Why Use DynamoDB for Time Series Data?
  3. TimeSeriesStatisticsManager: A High-level Overview
  4. Implementing TimeSeriesStatisticsManager
  5. Components of TimeSeriesStatisticsManager
  6. Usage Examples
  7. Conclusion

Introduction

Time series data is a sequence of data points collected or recorded over time, usually at regular intervals. This type of data is prevalent in various domains, including finance, health, IoT, and many others. Efficient management of time series data is crucial for processing, analyzing, and interpreting the data for various purposes.

In this blog post, we will discuss a custom TypeScript class called 'TimeSeriesStatisticsManager' that simplifies the management of time series data. It lets you store, update, and retrieve time series statistics through a small API.

Why Use DynamoDB for Time Series Data?

DynamoDB is a highly scalable, fully managed NoSQL database service provided by AWS. It offers low latency, high throughput, and seamless integration with other AWS services. These features make it an excellent choice for managing time series data, which often requires fast and efficient read and write operations.

Some benefits of using DynamoDB for time series data include:

  • Scalability: DynamoDB can handle large amounts of data and can be easily scaled as your application grows.
  • Performance: The service provides low latency and high throughput, making it ideal for time series data that requires quick read and write operations.
  • Durability: DynamoDB replicates data across multiple Availability Zones within a region, supports on-demand backups and point-in-time recovery, and offers multi-region replication through global tables.
  • Flexibility: With DynamoDB's flexible schema design, you can easily store and manage time series data with various attributes and time partitions.
  • Integration: As part of the AWS ecosystem, DynamoDB integrates seamlessly with other AWS services, making it simple to manage and analyze your time series data.

TimeSeriesStatisticsManager: A High-level Overview

The 'TimeSeriesStatisticsManager' is a flexible and extensible module designed to handle time series statistics efficiently. It provides various features to store, update, and retrieve time series data across different time partitions, such as minutes, hours, days, months, and years.

The module is implemented in TypeScript on top of the AWS SDK for JavaScript and stores its data in AWS DynamoDB, so it fits into existing AWS infrastructure. It provides a simple API for interacting with the time series data, making it easy to add new data points or retrieve existing data for analysis and visualization.

Some key features of the 'TimeSeriesStatisticsManager' include:

  • Easy management of time series data across multiple time partitions
  • Seamless integration with AWS DynamoDB
  • Straightforward API for storing, updating, and retrieving time series data
  • Flexible and extensible design for adapting to various use cases

Implementing TimeSeriesStatisticsManager

The 'TimeSeriesStatisticsManager' is a TypeScript class that simplifies the management of time series data in DynamoDB. It creates the backing table when it does not exist yet and provides methods for adding and retrieving statistics.

The most important methods are walked through below.

1. Constructor: The constructor initializes the 'TimeSeriesStatisticsManager' instance, setting up the DynamoDB client, table name, and time partitions. It also starts creating the DynamoDB table in the background if it does not already exist; the other methods wait for that work to finish through onReady.

constructor(options: TimeSeriesStatisticsManagerOptions) {
    if (options.client) {
        this.client = DynamoDBDocument.from(options.client);
    } else {
        this.client = DynamoDBDocument.from(new DynamoDBClient({}));
    }

    if (options.timePartitions) {
        this.timePartitions = [...options.timePartitions];
    } else {
        this.timePartitions = [...timePartitions];
    }

    if (options.table) {
        this.table = options.table;
    } else {
        this.table = "statistics";
    }

    this.state = 'INITIALIZING';
    this.onReadyPromises = [];

    Promise.resolve()
        .then(() => {
            return this.createTableIfNotExists();
        })
        .then(() => {
            this.state = 'INITIALIZED';
            this.resolveReadyPromises();
        })
        .catch((error) => {
            this.state = "FAIL";
            this.rejectReadyPromises(error);
        });
}
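
The constructor refers to a 'TimeSeriesStatisticsManagerOptions' type and a default timePartitions list that are not shown in this post. The following is a minimal sketch of what they might look like, not the repository's actual definitions. It assumes that intervals are expressed in seconds (to match the second-based timestamps in the usage examples) and that a partition only needs a name and an interval. The resolveOptions helper is hypothetical and only mirrors the fallback logic of the constructor.

```typescript
// Hypothetical shapes, inferred from how the constructor and methods use them.
interface TimePartition {
    name: string;     // e.g. "minute", "hour", "day"
    interval: number; // bucket width, assumed here to be in seconds
}

interface TimeSeriesStatisticsManagerOptions {
    client?: unknown;                 // a DynamoDBClient in the real class
    table?: string;                   // the constructor falls back to "statistics"
    timePartitions?: TimePartition[]; // the constructor falls back to the default list
}

// Assumed defaults; only fixed-width buckets are listed.
const timePartitions: TimePartition[] = [
    { name: "minute", interval: 60 },
    { name: "hour", interval: 60 * 60 },
    { name: "day", interval: 24 * 60 * 60 },
];

// Hypothetical helper that mirrors the constructor's fallbacks.
function resolveOptions(options: TimeSeriesStatisticsManagerOptions = {}) {
    return {
        table: options.table || "statistics",
        // Copy the array so later changes by the caller do not affect the manager.
        timePartitions: [...(options.timePartitions || timePartitions)],
    };
}
```

Months and years do not have a fixed length, so with a fixed interval they can only be approximated (for example as 30 and 365 days).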


2. onReady: This method returns a promise that resolves when the table is initialized and ready for use.

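onReady(): Promise<void> {
    return new Promise((resolve, reject) => {
        if (this.state === 'INITIALIZED') {
            resolve();
        } else if (this.state === 'FAIL') {
            // Reject with an Error so callers get a message and a stack trace.
            reject(new Error('TimeSeriesStatisticsManager failed to initialize'));
        } else {
            // Still initializing: queue the resolver until the constructor finishes.
            // Only resolve is queued here; how a failure reaches these waiters
            // depends on rejectReadyPromises (not shown).
            this.onReadyPromises.push(resolve);
        }
    });
}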
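
One way to make sure that callers who are already waiting also observe a failure is to queue both callbacks of each promise. The following is a self-contained sketch of that pattern, not the module's code; ReadyGate, Waiter, and init are hypothetical names used only for illustration.

```typescript
type State = "INITIALIZING" | "INITIALIZED" | "FAIL";

interface Waiter {
    resolve: () => void;
    reject: (reason: unknown) => void;
}

// Hypothetical stand-alone version of the ready-state logic.
class ReadyGate {
    private state: State = "INITIALIZING";
    private failure: unknown;
    private waiters: Waiter[] = [];

    constructor(init: () => Promise<void>) {
        init().then(
            () => {
                this.state = "INITIALIZED";
                this.waiters.forEach((waiter) => waiter.resolve());
                this.waiters = [];
            },
            (error) => {
                // Remember the error so later callers are rejected with it too.
                this.state = "FAIL";
                this.failure = error;
                this.waiters.forEach((waiter) => waiter.reject(error));
                this.waiters = [];
            }
        );
    }

    onReady(): Promise<void> {
        return new Promise((resolve, reject) => {
            if (this.state === "INITIALIZED") resolve();
            else if (this.state === "FAIL") reject(this.failure);
            else this.waiters.push({ resolve, reject });
        });
    }
}
```

Storing the original error means a caller that arrives after the failure sees the same reason as one that was waiting when it happened.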


3. createTableIfNotExists: This method checks if the table exists, and if not, creates it with the appropriate schema.

async createTableIfNotExists(): Promise<void> {
    try {
        await this.client.send(new DescribeTableCommand({ TableName: this.table }));
    } catch (error: any) {
        if (error.name === "ResourceNotFoundException") {
            const params = {
                AttributeDefinitions: [
                    { AttributeName: "topic_period", AttributeType: "S" },
                    { AttributeName: "time_partition", AttributeType: "N" }
                ],
                KeySchema: [
                    { AttributeName: "topic_period", KeyType: "HASH" },
                    { AttributeName: "time_partition", KeyType: "RANGE" }
                ],
                ProvisionedThroughput: {
                    ReadCapacityUnits: 5,
                    WriteCapacityUnits: 5
                },
                TableName: this.table
            };

            await this.client.send(new CreateTableCommand(params));

            // Wait until table is active
            await this.waitUntilTableExists();
        } else {
            console.error(
                "Error checking for the existence of the DynamoDB table:",
                error
            );
            throw error;
        }
    }
}
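
The waitUntilTableExists method called above is not shown in this post. CreateTable returns while the table is still being created, so the code has to wait until the table status is ACTIVE before writing to it. The following is a minimal polling sketch under that assumption. waitUntilActive and its parameters are hypothetical, and the status lookup is injected so the snippet runs without AWS access. The AWS SDK v3 also exports a ready-made waitUntilTableExists waiter from @aws-sdk/client-dynamodb that could be used instead.

```typescript
// Hypothetical helper: polls a status lookup until it reports "ACTIVE".
async function waitUntilActive(
    getStatus: () => Promise<string | undefined>,
    maxAttempts = 20,
    delayMs = 1000
): Promise<void> {
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        if ((await getStatus()) === "ACTIVE") {
            return;
        }
        // Not ready yet: pause before asking again.
        await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
    throw new Error(`Table not ACTIVE after ${maxAttempts} attempts`);
}

// Inside the class, the lookup could wrap DescribeTableCommand, for example:
//   () => this.client.send(new DescribeTableCommand({ TableName: this.table }))
//             .then((output) => output.Table?.TableStatus)
```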


4. getTimePartition: A helper method to get the time partition object for a given period.

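private getTimePartition(period: string): TimePartition {
    const partition = this.timePartitions.find((tp) => tp.name === period);
    // find() returns undefined for an unknown period, so fail with a clear message.
    if (!partition) throw new Error(`Unknown period: ${period}`);
    return partition;
}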


5. addStatistic: This method adds a new statistic to the specified topic and timestamp. It updates the count for each time partition.

public async addStatistic(topic: string, timestamp: number): Promise<void> {
    await this.onReady();

    for (const partition of this.timePartitions) {
        const timePartitionValue = Math.floor(timestamp / partition.interval) * partition.interval;

        const updateParams = {
            TableName: this.table,
            Key: {
                topic_period: this.createTopicPeriod(topic, partition.name),
                time_partition: timePartitionValue
            },
            UpdateExpression: "ADD #count :incr",
            ExpressionAttributeNames: {
                "#count": "count"
            },
            ExpressionAttributeValues: {
                ":incr": 1
            },
            ReturnValues: "UPDATED_NEW"
        };

        try {
            const updateCommand = new UpdateCommand(updateParams);
            await this.client.send(updateCommand);
        } catch (error) {
            // Log the error object itself; JSON.stringify would drop its message and stack.
            console.error(`Error updating ${partition.name} statistic:`, error);
            throw error;
        }
    }
}
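
The bucket key computed in addStatistic is the timestamp rounded down to a multiple of the partition interval. The following worked example shows that arithmetic in isolation. bucketStart is a hypothetical helper name, and the intervals assume second-based timestamps as in the usage examples.

```typescript
// Rounds a timestamp down to the start of its bucket, as addStatistic does.
function bucketStart(timestamp: number, interval: number): number {
    return Math.floor(timestamp / interval) * interval;
}

const sampleTimestamp = 1700000123; // seconds since the Unix epoch

console.log(bucketStart(sampleTimestamp, 60));    // 1700000100 (minute bucket)
console.log(bucketStart(sampleTimestamp, 3600));  // 1699999200 (hour bucket)
console.log(bucketStart(sampleTimestamp, 86400)); // 1699920000 (day bucket)
```

Every event inside the same bucket maps to the same sort key, so the ADD update increments a single counter item per topic, period, and bucket. Because the rounding is relative to the Unix epoch, day buckets start at midnight UTC.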


6. getStatisticsPeriod: This method retrieves statistics for a given topic and period between specified start and end times.

public async getStatisticsPeriod(topic: string, period: string, startTime: number, endTime: number): Promise<Statistic[]> {
    await this.onReady();

    const periodTopic = this.createTopicPeriod(topic, period);
    const partition = this.getTimePartition(period);
    startTime = Math.floor(startTime / partition.interval) * partition.interval;
    endTime = Math.floor(endTime / partition.interval) * partition.interval;

    const queryParams = {
        TableName: this.table,
        KeyConditionExpression: "topic_period = :topic_period AND time_partition BETWEEN :startTime AND :endTime",
        ExpressionAttributeValues: {
            ":topic_period": periodTopic,
            ":startTime": startTime,
            ":endTime": endTime
        }
    };

    try {
        const response = await this.client.send(new QueryCommand(queryParams));
        if (response.Items) {
            return response.Items.map((item) => {
                const [topic, period] = item.topic_period.split("#");
                return {
                    topic,
                    period,
                    count: item.count,
                    time_partition: item.time_partition,
                };
            }) as Statistic[];
        } else {
            return [];
        }
    } catch (error) {
        // Log the error object itself; JSON.stringify would drop its message and stack.
        console.error("Error getting statistic:", error);
        throw error;
    }
}
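
The createTopicPeriod helper is used by both addStatistic and getStatisticsPeriod but is not shown in this post. Since the query results are parsed with split("#"), it presumably joins the topic and the period with a "#". The following sketch is written under that assumption; parseTopicPeriod is a hypothetical inverse added here for illustration.

```typescript
// Assumed implementation: partition key of the form "<topic>#<period>".
function createTopicPeriod(topic: string, period: string): string {
    return `${topic}#${period}`;
}

// Hypothetical inverse. It splits on the last "#" so that a topic which
// itself contains "#" still round-trips.
function parseTopicPeriod(topicPeriod: string): { topic: string; period: string } {
    const index = topicPeriod.lastIndexOf("#");
    if (index < 0) {
        throw new Error(`Not a topic_period key: ${topicPeriod}`);
    }
    return {
        topic: topicPeriod.slice(0, index),
        period: topicPeriod.slice(index + 1),
    };
}

console.log(createTopicPeriod("page_views", "day")); // page_views#day
```

With the destructuring used in getStatisticsPeriod, a topic such as "shop#eu" would come back as topic "shop" and period "eu", so either avoid "#" in topic names or split on the last separator as above.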


7. getStatistics: This method retrieves statistics for a given topic between specified start and end times, across all time partitions.

public async getStatistics(topic: string, startTime: number, endTime: number): Promise<Statistic[]> {
    await this.onReady();

    // Query every configured partition in parallel.
    const results = await Promise.all(
        this.timePartitions.map((partition) =>
            this.getStatisticsPeriod(topic, partition.name, startTime, endTime)
        )
    );

    return results.flat();
}
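
getStatistics returns one flat array that mixes all periods, so callers usually need to separate the entries again. The following is a small sketch of one way to do that. The Statistic shape is inferred from the objects built in getStatisticsPeriod, and groupByPeriod is a hypothetical helper, not part of the module.

```typescript
// Shape inferred from the objects that getStatisticsPeriod returns.
interface Statistic {
    topic: string;
    period: string;
    count: number;
    time_partition: number;
}

// Groups a flat result list by period and orders each group by bucket start.
function groupByPeriod(statistics: Statistic[]): Record<string, Statistic[]> {
    const groups: Record<string, Statistic[]> = {};
    for (const stat of statistics) {
        if (!groups[stat.period]) {
            groups[stat.period] = [];
        }
        groups[stat.period].push(stat);
    }
    Object.keys(groups).forEach((period) => {
        groups[period].sort((a, b) => a.time_partition - b.time_partition);
    });
    return groups;
}
```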


To use the 'TimeSeriesStatisticsManager', simply import the class, create an instance with the desired configuration, and use the provided methods to manage your time series data in DynamoDB.

Components of TimeSeriesStatisticsManager

The 'TimeSeriesStatisticsManager' class consists of several key components that work together to manage time series data in DynamoDB. Here's a breakdown of these components:

  1. DynamoDBClient: The DynamoDB client is responsible for handling all communication with the DynamoDB service.
  2. TimePartitions: Time partitions define the different time intervals for aggregating statistics, such as minutes, hours, days, months, and years.
  3. Table: The table is the DynamoDB table that stores the time series data. The table schema includes a composite primary key consisting of the topic_period and time_partition attributes.
  4. State Management: The 'TimeSeriesStatisticsManager' class manages the state of the DynamoDB table, ensuring that it is created and initialized before any operations are performed.
  5. Methods: The 'TimeSeriesStatisticsManager' class provides several methods for managing time series data in DynamoDB, including adding statistics, retrieving statistics for a specific period, and retrieving statistics across all periods.

By leveraging these components, the 'TimeSeriesStatisticsManager' class simplifies the process of working with time series data in DynamoDB, allowing you to focus on your application's core functionality.

Usage Examples

Here are some examples of how to use the 'TimeSeriesStatisticsManager' class in your application:

Initializing the TimeSeriesStatisticsManager

import { TimeSeriesStatisticsManager } from "dynamodb-time-series-statistics";

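// The constructor reads options.client, so an options object is required;
// an empty object falls back to the default client, table name, and partitions.
const manager = new TimeSeriesStatisticsManager({});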

Adding a Statistic

const topic = "page_views";
const timestamp = Math.floor(Date.now() / 1000);

await manager.addStatistic(topic, timestamp);

Retrieving Statistics for a Specific Period

const topic = "page_views";
const period = "day";
const startTime = Math.floor(Date.now() / 1000) - 7 * 24 * 60 * 60;
const endTime = Math.floor(Date.now() / 1000);

const statistics = await manager.getStatisticsPeriod(topic, period, startTime, endTime);
console.log(statistics);
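
An item only exists for a bucket in which addStatistic was called at least once, so the result can have gaps. For charting it is often convenient to fill those gaps with zero counts. The following is a minimal sketch of that step; fillMissingBuckets and the Bucket type are hypothetical, and the interval must match the one configured for the queried period.

```typescript
interface Bucket {
    time_partition: number;
    count: number;
}

// Returns one entry per bucket between startTime and endTime (inclusive),
// using a count of 0 for buckets that have no stored item.
function fillMissingBuckets(
    rows: Bucket[],
    startTime: number,
    endTime: number,
    interval: number
): Bucket[] {
    const counts = new Map<number, number>();
    rows.forEach((row) => counts.set(row.time_partition, row.count));

    // Align the range to bucket boundaries, as getStatisticsPeriod does.
    const first = Math.floor(startTime / interval) * interval;
    const last = Math.floor(endTime / interval) * interval;

    const filled: Bucket[] = [];
    for (let t = first; t <= last; t += interval) {
        filled.push({ time_partition: t, count: counts.get(t) ?? 0 });
    }
    return filled;
}
```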

Retrieving Statistics Across All Periods

const topic = "page_views";
const startTime = Math.floor(Date.now() / 1000) - 7 * 24 * 60 * 60;
const endTime = Math.floor(Date.now() / 1000);

const statistics = await manager.getStatistics(topic, startTime, endTime);
console.log(statistics);


By using the 'TimeSeriesStatisticsManager' class, you can easily manage and retrieve time series data in your DynamoDB table.

 

Conclusion

In this article, we introduced the 'TimeSeriesStatisticsManager' class, a powerful and flexible solution for managing time series data in DynamoDB. By utilizing this class, you can easily store and retrieve time series data with various time granularities in a single DynamoDB table. We also covered the components, implementation, and usage examples of the 'TimeSeriesStatisticsManager'.

The 'TimeSeriesStatisticsManager' simplifies the process of working with time series data in DynamoDB, making it a valuable tool for developers who need to manage and analyze time-based information in their applications.

This article was written with the help of ChatGPT.
