DynamoDB: The Basics

This post introduces concepts and principles used to design DynamoDB tables as I have been learning how to leverage this technology in a new project.

This introduction will compare and contrast the core concepts and ideas with those found in relational database management systems (RDBMS).

Key takeaways

After using DynamoDB for the last couple of months, one of my big takeaways is that data modeling is vital, and that the modeling process itself is structurally similar to relational database design. The high-level process still looks like this:

Define how entities relate to each other (e.g. ERDs).
Determine your data access patterns based on business requirements. In the RDBMS world this helps you decide how normalized your relational schema should be; in the DynamoDB world you have different levers, but the access patterns drive the table design significantly.
Design your primary key and secondary indices based on your data access patterns. Specifics are documented via examples below.

However, some relational database concepts or ideas will hinder your thought process, specifically:

one table for one entity and/or relationship
normalization
JOIN across tables (we implicitly "join" within the same DynamoDB table across partitions)

Terminology

DynamoDB Concept	RDBMS Analog	Description
Table	set of related tables	DynamoDB tables are less rigidly defined
Partition	table	Multiple entities can be modeled inside the same DDB table
Item	record	Key-value pairs (attributes) describing a single entity instance
Primary Key	primary key	Uniquely identify each item in a DynamoDB table
Attribute	column	Attributes are more flexible and differ across items

DynamoDB API

The API consists of operations on:

items (requires you to identify the full primary key), includes batch operations
queries
scans (like a full table scan, which you want to avoid)
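As a rough illustration of the three operation families, the sketch below builds the request parameters as plain dictionaries in the shape the low-level boto3 client accepts. The table name `app-table` and the `PK`/`SK` attribute names are assumptions made for this example, not names from our project.

```python
# Request shapes for the three operation families, written as plain dicts so
# the sketch runs without AWS credentials. Table and attribute names are
# hypothetical.
TABLE = "app-table"


def get_item_params(pk: str, sk: str) -> dict:
    """Item operation: the full primary key must be supplied."""
    return {
        "TableName": TABLE,
        "Key": {"PK": {"S": pk}, "SK": {"S": sk}},
    }


def query_params(pk: str) -> dict:
    """Query: reads items from the one partition named by the key condition."""
    return {
        "TableName": TABLE,
        "KeyConditionExpression": "PK = :pk",
        "ExpressionAttributeValues": {":pk": {"S": pk}},
    }


def scan_params() -> dict:
    """Scan: no key condition, so every item in the table is read."""
    return {"TableName": TABLE}


# With boto3 installed and credentials configured, these would be passed as:
#   client = boto3.client("dynamodb")
#   client.get_item(**get_item_params("USER#42", "PROFILE"))
#   client.query(**query_params("USER#42"))
#   client.scan(**scan_params())
```

The scan request carries no key information at all, which is why its cost grows with the size of the table rather than with the size of the result.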

DynamoDB Primary Keys

You can have two kinds of primary keys:

Simple primary key (partition key)
Composite primary key (partition key + sort key)

The partition key is used to distribute data across partitions. Items with the same partition key reside in the same partition (some developers may use the analogy of a "shard" for their initial intuition).

Sort keys are used to create ranges of items within a partition.

So far we have been modeling complex enough data domains such that composite primary keys are essential for our query access model.
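To make the roles of the two key parts concrete, here is a toy in-memory model (not how DynamoDB is implemented): the partition key selects a group of items, and the sort key keeps that group ordered so a range can be sliced out. The class name `ToyTable` and the key values are made up for the example.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with a composite primary key."""

    def __init__(self):
        # partition key -> list of (sort key, item), kept ordered by sort key
        self._partitions = defaultdict(list)

    def put(self, pk, sk, item):
        rows = self._partitions[pk]
        keys = [k for k, _ in rows]
        i = bisect_left(keys, sk)
        if i < len(rows) and rows[i][0] == sk:
            rows[i] = (sk, item)  # same full primary key: overwrite the item
        else:
            rows.insert(i, (sk, item))

    def query_between(self, pk, low, high):
        """Return items in one partition whose sort key is in [low, high]."""
        rows = self._partitions.get(pk, [])
        keys = [k for k, _ in rows]
        return [item for _, item in rows[bisect_left(keys, low):bisect_right(keys, high)]]


table = ToyTable()
table.put("forum#python", "2024-01-05", {"title": "A"})
table.put("forum#python", "2024-03-15", {"title": "C"})
table.put("forum#python", "2024-02-10", {"title": "B"})
table.put("forum#rust", "2024-02-01", {"title": "D"})

january_to_february = table.query_between("forum#python", "2024-01-01", "2024-02-28")
```

The range query never looks at the `forum#rust` partition, which is the intuition behind why queries on a composite key stay cheap.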

Secondary Indices

Retrieving items by a DynamoDB table's primary key is the most efficient access path, but at times we still need to support a query pattern that the primary key cannot serve. Secondary indices cover these cases without resorting to inefficient scan operations.

There are two types of secondary indices:

Global Secondary Index (GSI): can be used to provide different partition and sort keys.
Local Secondary Index (LSI): used with a composite primary key where the partition key is the same but the sort key is different.

Our DynamoDB tables typically use LSIs and a couple use GSIs.
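The difference between the two index types shows up in their key schemas. The sketch below builds `create_table` parameters in the shape the boto3 client accepts; the table name, index names, and the `CreatedAt` and `Email` attributes are hypothetical, and on-demand billing is assumed so no throughput settings are needed.

```python
def create_table_params(table_name: str) -> dict:
    """Build create_table parameters with one LSI and one GSI (names are hypothetical)."""
    return {
        "TableName": table_name,
        "BillingMode": "PAY_PER_REQUEST",
        # Only attributes used in a key schema are declared up front.
        "AttributeDefinitions": [
            {"AttributeName": "PK", "AttributeType": "S"},
            {"AttributeName": "SK", "AttributeType": "S"},
            {"AttributeName": "CreatedAt", "AttributeType": "S"},
            {"AttributeName": "Email", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "PK", "KeyType": "HASH"},   # partition key
            {"AttributeName": "SK", "KeyType": "RANGE"},  # sort key
        ],
        # LSI: same partition key as the table, different sort key.
        "LocalSecondaryIndexes": [
            {
                "IndexName": "ByCreatedAt",
                "KeySchema": [
                    {"AttributeName": "PK", "KeyType": "HASH"},
                    {"AttributeName": "CreatedAt", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        # GSI: its own partition key (a sort key would be optional).
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "ByEmail",
                "KeySchema": [{"AttributeName": "Email", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
    }


params = create_table_params("app-table")
# With boto3 available: boto3.client("dynamodb").create_table(**params)
```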

Data Modeling for DynamoDB

There are a few explicit steps that we borrow from our prior experience of data modeling in the relational world:

identify entities
identify attributes
identify access patterns
identify self-describing mechanism for primary keys
identify secondary indices based on access patterns as input

Defining naming conventions

It doesn't matter whether your fields are Capitalized, snake_cased, kebab-cased, camelCased, or PascalCased; just pick one and apply it consistently. The naming convention should be defined for:

Table names
Primary Key names
Attribute names

Just keep it simple and make sure it is applied consistently across the services that one team works on.

It seems silly to do first, but it will save a lot of irritation or rework later.

Entities

Think about the nouns in your domain and then the relationships between them. This provides a good starting point for identifying your entities, and the process is basically identical to identifying entities for relational database design. The one difference is that we typically model many entities together in one DynamoDB table. A relational table serves a fundamentally different aim, because a fundamental primitive of relational databases is the JOIN, which is absent in DynamoDB.

Attributes

Attributes are usually easily identified after finding your entities. This is identical to identifying the columns for relational schema design before normalizing data.

Access patterns

You identify access patterns for a DynamoDB table design (for your current needs) much as you would in the RDBMS world: by understanding the business requirements of the software. Second or third systems have the benefit of clear access and usage patterns, but the risk is that migrating data from one datastore to another is rarely simple, not to mention cutting over 24/7 systems without downtime or data loss.

When reading user stories or technical stories it is possible to infer how data would need to be queried.

Self-describing keys

Defining partition and sort keys that are self-describing, following a scheme derived from our entity model, lets us read and write items consistently and build a higher-level, domain-oriented API on top of the lower-level DynamoDB API.
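One possible shape for such a scheme is sketched below, assuming a `TYPE#value` convention with `#` as the delimiter. The entity names, helper functions, and attribute names are all hypothetical; the point is only that key construction lives in one place and the keys can be read back without extra metadata.

```python
DELIMITER = "#"  # assumed convention; any character absent from the ids works


def user_pk(user_id: str) -> str:
    """Partition key for everything that belongs to one user."""
    return f"USER{DELIMITER}{user_id}"


def order_sk(order_date: str, order_id: str) -> str:
    """Sort key for an order; the date comes first so orders sort by time."""
    return f"ORDER{DELIMITER}{order_date}{DELIMITER}{order_id}"


def parse_key(key: str):
    """Split a key into its entity type and remaining parts."""
    entity, *parts = key.split(DELIMITER)
    return entity, parts


def order_item(user_id: str, order_date: str, order_id: str, total_cents: int) -> dict:
    """Domain-level constructor that hides the key scheme from callers."""
    return {
        "PK": user_pk(user_id),
        "SK": order_sk(order_date, order_id),
        "TotalCents": total_cents,
    }


item = order_item("42", "2024-03-01", "o-9", 1999)
```

Callers work with users and orders, while only these helpers know how the keys are laid out, which keeps a later change to the scheme contained.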

Data Access Patterns in DynamoDB

A key design decision is choosing effective partition and sort keys for your DynamoDB tables based on your access patterns.

Common Use Cases

Query by a single attribute: Use that attribute as the partition key. For example, if querying products by ProductID, make ProductID the partition key.
Query across multiple attributes: Use a composite key with one attribute as the partition key and another as the sort key. For example, for querying ForumPosts by ForumName and PostDate, make ForumName the partition key and PostDate the sort key.
Query by date range: Use a timestamp as the sort key to allow querying date ranges efficiently. For example, with ForumPosts, PostDate (timestamp) could be the sort key to query posts within a date range.
Query by hierarchy: Model hierarchical data by using concatenated attributes as partition or sort keys. For example, categorize Products by Category/SubCategory as the partition key: "Electronics/Laptops".
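Two of the use cases above can be sketched as query parameters for the boto3 client. The `ForumPosts` attributes come from the examples in the list; the `Products` layout, with `Category` as the partition key and a hypothetical `SubCategoryPath` sort key, is an assumption showing the sort-key variant of the hierarchy pattern. Placeholders such as `#forum` are used so the expressions do not depend on whether an attribute name is a reserved word.

```python
def posts_in_date_range(forum_name: str, start: str, end: str) -> dict:
    """Date-range query: equality on the partition key, BETWEEN on the sort key."""
    return {
        "TableName": "ForumPosts",
        "KeyConditionExpression": "#forum = :forum AND #posted BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#forum": "ForumName", "#posted": "PostDate"},
        "ExpressionAttributeValues": {
            ":forum": {"S": forum_name},
            ":start": {"S": start},
            ":end": {"S": end},
        },
    }


def products_under(category: str, path_prefix: str) -> dict:
    """Hierarchy query: begins_with on a concatenated sort key (assumed layout)."""
    return {
        "TableName": "Products",
        "KeyConditionExpression": "#cat = :cat AND begins_with(#path, :prefix)",
        "ExpressionAttributeNames": {"#cat": "Category", "#path": "SubCategoryPath"},
        "ExpressionAttributeValues": {
            ":cat": {"S": category},
            ":prefix": {"S": path_prefix},
        },
    }


range_query = posts_in_date_range("python", "2024-01-01T00:00:00Z", "2024-01-31T23:59:59Z")
laptops_query = products_under("Electronics", "Laptops/")
# With boto3 available: boto3.client("dynamodb").query(**range_query)
```

Storing timestamps as ISO 8601 strings in a single timezone keeps their lexicographic order equal to their chronological order, which is what makes the `BETWEEN` condition behave as a date range.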

Considerations for Partition and Sort Key Schemes

Ensure writes are distributed evenly by using high-cardinality attributes as the partition key. For example, prefer a UUID or other randomized GUID over a user-chosen username.
Sort keys can model one-to-many relationships within a partition. For example, use a User partition key with one sort key value per Post.
You can add global secondary indexes later if needed, but local secondary indexes and the table's partition and sort key attributes are fixed at table creation. The scheme for the key values can still evolve to match new requirements, although existing items may need to be migrated to the new scheme, so versioning your schemes can help.
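The effect of key cardinality on write distribution can be simulated with a small sketch. DynamoDB's real hashing and partition placement are internal, so the SHA-256 bucketing here is only a stand-in, and the low-cardinality `status` key is my own illustrative contrast to a UUID key.

```python
import hashlib
import uuid
from collections import Counter


def bucket_for(partition_key: str, buckets: int) -> int:
    """Map a key to a bucket; a stand-in for DynamoDB's internal hashing."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def spread(keys, buckets: int = 8) -> Counter:
    """Count how many writes land in each bucket."""
    return Counter(bucket_for(k, buckets) for k in keys)


writes = 8000
# Deterministic UUIDs so the simulation is repeatable.
uuid_keys = [str(uuid.uuid5(uuid.NAMESPACE_DNS, str(i))) for i in range(writes)]
# Only two distinct values: a low-cardinality partition key.
status_keys = ["ACTIVE" if i % 10 else "INACTIVE" for i in range(writes)]

high_cardinality = spread(uuid_keys)
low_cardinality = spread(status_keys)
```

With the UUID keys every bucket receives a similar share of the writes, while the two status values can occupy at most two buckets no matter how many exist.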

Choosing effective partition and sort keys based on access patterns is crucial for high performance. The keys themselves can encode relationships and hierarchy when designed thoughtfully.

Summary

There is a lot I didn't cover especially with respect to cost minimization. I will attempt to revisit that topic and recommended practices based on our experiences as we learn more.

Frequently Asked Questions

What is DynamoDB?

DynamoDB is a fully managed NoSQL database service provided by Amazon Web Services (AWS). It provides fast performance and scalability for applications that need a flexible, non-relational database.

How does DynamoDB differ from relational databases?

DynamoDB differs from relational databases in a few key ways:

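It is a key-value and document store: data still lives in tables, but items in the same table do not need to share a fixed set of columns.
It has no SQL joins; related data is retrieved through primary keys and secondary indexes.
It scales storage automatically, and throughput capacity can scale on demand.
It provides built-in security and backup, with in-memory caching available through DynamoDB Accelerator (DAX).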

What are the core components of DynamoDB?

The core components of DynamoDB include:

Tables - A collection of data records
Items - A data record in a table
Attributes - Columns or data fields within an item
Primary Key - Unique identifier for an item in a table
Secondary Indexes - Alternative query options to the primary key

How to query by a single attribute in DynamoDB?

To query by a single attribute in DynamoDB, use that attribute as the table’s partition key. With a simple primary key, the partition key uniquely identifies each item, so querying on it fetches items efficiently.

How to query across multiple attributes in DynamoDB?

To query across multiple attributes in DynamoDB, use a composite primary key with one attribute as the partition key and another as the sort key. The partition key groups items together and the sort key creates an ordered range within each partition.

How to query by date range in DynamoDB?

To query by date range efficiently in DynamoDB, use a timestamp attribute as the sort key. This allows querying a date range by specifying ‘between’ conditions on the sort key value.

How to query by hierarchy in DynamoDB?

To model hierarchical data in DynamoDB, concatenate attributes together to construct the partition key or sort key values. For example, ‘Electronics|Laptops|Dell’ to represent nested product categories.
