This post introduces the concepts and principles used to design DynamoDB tables, based on what I have learned while adopting the technology in a new project.
It compares and contrasts the core concepts and ideas with those found in relational database management systems (RDBMS).
Key takeaways
After using DynamoDB for the last couple of months, one of my big takeaways is that data modeling is a vital part of working with it, and that the modeling process itself is structurally similar to relational database design. The high-level process still looks like this:
- Define how entities relate to each other (e.g. ERDs).
- Determine your data access patterns based on business requirements. In the RDBMS world this helps you decide how normalized your relational schema should be; in the DynamoDB world you have different levers, but the access patterns drive the table design significantly.
- Design your primary key and secondary indices based on your data access patterns. Specifics are documented via examples below.
However, some relational database concepts or ideas will hinder your thought process, specifically:
- one table for one entity and/or relationship
- normalization
- JOIN across tables (we implicitly "join" within the same DynamoDB table across partitions)
Terminology
DynamoDB Concept | RDBMS Analog | Description |
---|---|---|
Table | set of related tables | DynamoDB tables are less rigidly defined |
Partition | table | Multiple entities can be modeled inside the same DynamoDB table |
Item | record | Key-value pairs describing a data value |
Primary Key | primary key | Uniquely identifies each item in a DynamoDB table |
Attribute | column | Attributes are more flexible and can differ across items |
DynamoDB API
The API consists of operations on:
- items (you must supply the full primary key), including batch operations
- queries
- scans (like a full table scan, which you want to avoid)
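To make the difference between the three kinds of operation concrete, here is a minimal in-memory sketch. It does not call the DynamoDB API; the `TABLE` list and the `get_item`, `query`, and `scan` functions are hypothetical stand-ins, and the table is assumed to be keyed by `ForumName` (partition key) and `PostDate` (sort key).

```python
from typing import Callable, Optional

# Hypothetical in-memory stand-in for a table whose composite primary key is
# (ForumName, PostDate). Real tables are accessed through the DynamoDB API.
TABLE = [
    {"ForumName": "aws", "PostDate": "2024-01-05", "Title": "Hello"},
    {"ForumName": "aws", "PostDate": "2024-02-11", "Title": "Indexes"},
    {"ForumName": "python", "PostDate": "2024-01-20", "Title": "Typing"},
]


def get_item(forum_name: str, post_date: str) -> Optional[dict]:
    """Item operation: needs the full primary key and returns at most one item."""
    for item in TABLE:
        if item["ForumName"] == forum_name and item["PostDate"] == post_date:
            return item
    return None


def query(forum_name: str,
          sort_key_condition: Callable[[str], bool] = lambda sk: True) -> list:
    """Query: needs one partition key value, optionally narrowed by the sort key."""
    matches = [
        item for item in TABLE
        if item["ForumName"] == forum_name and sort_key_condition(item["PostDate"])
    ]
    return sorted(matches, key=lambda item: item["PostDate"])


def scan(predicate: Callable[[dict], bool]) -> list:
    """Scan: visits every item in the table and filters afterwards."""
    return [item for item in TABLE if predicate(item)]
```

The point of the sketch is the shape of the inputs: an item operation needs both key parts, a query needs at least the partition key, and a scan needs neither, which is why it has to touch everything.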
DynamoDB Primary Keys
You can have two kinds of primary keys:
- Simple primary key (partition key)
- Composite primary key (partition key + sort key)
The partition key is used to distribute data across partitions. Items with the same partition key reside in the same partition (some developers may find the analogy of a "shard" useful for their initial intuition).
Sort keys are used to create ranges of items within a partition.
So far, the data domains we have modeled have been complex enough that composite primary keys are essential to our query access model.
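The roles of the two key parts can be sketched in plain Python. This is only an illustration under stated assumptions: DynamoDB's real hash function and partition count are internal to the service, so `NUM_PARTITIONS`, `partition_for`, `put`, and `items_for` below are all hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; DynamoDB decides the real partition count


def partition_for(partition_key: str) -> int:
    """Map a partition key to a partition number with a stable hash."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS


# partition number -> partition key -> {sort key: item}
storage = defaultdict(lambda: defaultdict(dict))


def put(partition_key: str, sort_key: str, item: dict) -> None:
    """Writing the same (partition key, sort key) pair again replaces the item."""
    storage[partition_for(partition_key)][partition_key][sort_key] = item


def items_for(partition_key: str) -> list:
    """Return every item sharing a partition key, ordered by sort key."""
    collection = storage[partition_for(partition_key)][partition_key]
    return [collection[sort_key] for sort_key in sorted(collection)]


put("aws", "2024-02-11", {"Title": "Indexes"})
put("aws", "2024-01-05", {"Title": "Hello"})
put("python", "2024-01-20", {"Title": "Typing"})
```

Calling `items_for("aws")` returns the two posts ordered by their sort key regardless of insertion order, which is the property that makes range reads within a partition cheap.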
Secondary Indices
Retrieving items by a DynamoDB table's primary key is the most efficient access path, but at times we still need to support a query pattern that the primary key cannot serve. To avoid inefficient scan operations in those cases, we can use secondary indices.
There are two types of secondary indices:
- Global Secondary Index (GSI)
  - can provide a partition key and sort key that differ from the table's primary key.
- Local Secondary Index (LSI)
  - used with a composite primary key; it keeps the table's partition key but uses a different sort key.
Our DynamoDB tables typically use LSIs and a couple use GSIs.
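One way to picture both index types is as a re-keyed copy of the base table's items. The sketch below is an in-memory illustration, not how DynamoDB maintains indexes; `POSTS`, `build_index`, and the attribute names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical base table items keyed by (ForumName, PostDate).
POSTS = [
    {"ForumName": "aws", "PostDate": "2024-01-05", "Author": "alice", "Replies": 7},
    {"ForumName": "aws", "PostDate": "2024-02-11", "Author": "bob", "Replies": 2},
    {"ForumName": "python", "PostDate": "2024-01-20", "Author": "alice", "Replies": 4},
]


def build_index(items, partition_attr, sort_attr):
    """Group items by one attribute and order each group by another."""
    index = defaultdict(list)
    for item in items:
        # Items missing either index key attribute are left out of the index.
        if partition_attr in item and sort_attr in item:
            index[item[partition_attr]].append(item)
    for group in index.values():
        group.sort(key=lambda item: item[sort_attr])
    return index


# LSI-like: same partition key as the table, different sort key.
lsi_by_replies = build_index(POSTS, "ForumName", "Replies")

# GSI-like: a different partition key (and sort key) from the table.
gsi_by_author = build_index(POSTS, "Author", "PostDate")
```

With these two views, "posts in a forum ordered by reply count" and "posts by an author ordered by date" both become lookups by key instead of scans.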
Data Modeling for DynamoDB
There are a few explicit steps that we borrow from our prior experience of data modeling in the relational world:
- identify entities
- identify attributes
- identify access patterns
- identify a self-describing scheme for primary keys
- identify secondary indices, using the access patterns as input
Defining naming conventions
It doesn't matter if your fields are Capitalized, snake_cased, kebab-cased, camelCased, or PascalCased; just pick something and make it consistent. The naming convention should be defined for:
- Table names
- Primary Key names
- Attribute names
Just keep it simple and make sure it is applied consistently across the services that one team works on.
It seems silly to do first, but it will save a lot of irritation or rework later.
Entities
Think about the nouns in your domain and then the relationships between them. This provides a good starting point for identifying your entities, and it is essentially the same process as identifying entities for relational database design. The one difference is that we typically model many entities together in one DynamoDB table. A table in the relational way of thinking serves a fundamentally different aim, because a fundamental primitive of relational databases is the JOIN, which is absent in DynamoDB.
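As a rough illustration of several entities sharing one table, the sketch below stores a forum and its posts under the same partition key, so one lookup by partition key returns what a relational design would assemble with a JOIN. The `pk`, `sk`, and `type` attribute names and the `load_forum_with_posts` helper are hypothetical.

```python
# Hypothetical single-table layout: a forum and its posts share a partition key.
ITEMS = [
    {"pk": "FORUM#aws", "sk": "FORUM", "type": "Forum", "description": "AWS talk"},
    {"pk": "FORUM#aws", "sk": "POST#2024-01-05", "type": "Post", "title": "Hello"},
    {"pk": "FORUM#aws", "sk": "POST#2024-02-11", "type": "Post", "title": "Indexes"},
    {"pk": "FORUM#python", "sk": "FORUM", "type": "Forum", "description": "Python talk"},
]


def load_forum_with_posts(forum_name: str):
    """Fetch one partition's items, then split them by entity type."""
    collection = [item for item in ITEMS if item["pk"] == f"FORUM#{forum_name}"]
    forum = next(item for item in collection if item["type"] == "Forum")
    posts = sorted(
        (item for item in collection if item["type"] == "Post"),
        key=lambda item: item["sk"],
    )
    return forum, posts
```

The two entity types have different attributes, which is fine because attributes can differ from item to item.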
Attributes
Attributes are usually easily identified after finding your entities. This is identical to identifying the columns for relational schema design before normalizing data.
Access patterns
The way you identify access patterns to design a DynamoDB table well (for your current needs) is much like the way you would identify them in the RDBMS world: by understanding the business requirements of the software. Second and third systems benefit from already having clear access and usage patterns. The risk is that migrating data from one datastore to another is rarely simple, especially when a 24/7 system has to be cut over without downtime or data loss.
When reading user stories or technical stories, it is often possible to infer how the data will need to be queried.
Self-describing keys
Defining partition and sort keys that are self-describing, following a scheme derived from our entity model, lets us query and write items consistently. It also lets us provide a higher-level, domain-oriented API on top of the lower-level DynamoDB API.
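A small sketch of what such a scheme might look like in code follows. The `ENTITY#value` format with `#` as the separator is an assumption chosen for illustration, not something DynamoDB prescribes, and `Key`, `forum_key`, `post_key`, and `entity_type` are hypothetical helpers.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """A composite primary key: partition key (pk) and sort key (sk)."""
    pk: str
    sk: str


def forum_key(forum_name: str) -> Key:
    """Key for the forum entity itself."""
    return Key(pk=f"FORUM#{forum_name}", sk="FORUM")


def post_key(forum_name: str, post_date: str) -> Key:
    """Key for a post, stored in the same partition as its forum."""
    return Key(pk=f"FORUM#{forum_name}", sk=f"POST#{post_date}")


def entity_type(key: Key) -> str:
    """Recover the entity type from the sort key prefix, e.g. 'Post'."""
    return key.sk.split("#", 1)[0].title()
```

Because every read and write goes through these builders, the rest of the code can talk about forums and posts rather than key strings, and the key format lives in one place.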
Data Access Patterns in DynamoDB
A key design decision is choosing effective partition and sort keys for your DynamoDB tables based on your access patterns.
Common Use Cases
- Query by a single attribute
  - Use that attribute as the partition key. For example, if querying products by ProductID, make ProductID the partition key.
- Query across multiple attributes
  - Use a composite key with one attribute as the partition key and another as the sort key. For example, for querying ForumPosts by ForumName and PostDate, make ForumName the partition key and PostDate the sort key.
- Query by date range
  - Use a timestamp as the sort key to allow querying date ranges efficiently. For example, with ForumPosts, PostDate (timestamp) could be the sort key to query posts within a date range.
- Query by hierarchy
  - Model hierarchical data by using concatenated attributes as partition or sort keys. For example, categorize Products by Category/SubCategory as the partition key: "Electronics/Laptops".
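The date-range and hierarchy cases both come down to conditions on the sort key. DynamoDB key conditions support a range comparison (`BETWEEN`) and a prefix match (`begins_with`) on the sort key; the sketch below imitates those two conditions in memory. The helper functions, the sample data, and the choice to put the hierarchy in the sort key rather than the partition key are assumptions for illustration.

```python
def between(items, partition_key, low, high):
    """Items in one partition whose sort key falls within [low, high]."""
    matches = [
        item for item in items
        if item["pk"] == partition_key and low <= item["sk"] <= high
    ]
    return sorted(matches, key=lambda item: item["sk"])


def begins_with(items, partition_key, prefix):
    """Items in one partition whose sort key starts with the given prefix."""
    matches = [
        item for item in items
        if item["pk"] == partition_key and item["sk"].startswith(prefix)
    ]
    return sorted(matches, key=lambda item: item["sk"])


# ISO 8601 timestamps in one format and time zone sort lexicographically.
FORUM_POSTS = [
    {"pk": "aws", "sk": "2024-01-05T09:00:00Z", "title": "Hello"},
    {"pk": "aws", "sk": "2024-02-11T17:30:00Z", "title": "Indexes"},
    {"pk": "aws", "sk": "2024-03-02T08:15:00Z", "title": "Streams"},
]

# Hierarchy concatenated into the sort key: SubCategory/Family/SKU.
PRODUCTS = [
    {"pk": "Electronics", "sk": "Laptops/Ultrabooks/sku-1"},
    {"pk": "Electronics", "sk": "Laptops/Gaming/sku-2"},
    {"pk": "Electronics", "sk": "Phones/Android/sku-3"},
]
```

For example, `between(FORUM_POSTS, "aws", "2024-02-01T00:00:00Z", "2024-02-29T23:59:59Z")` selects the February post, and `begins_with(PRODUCTS, "Electronics", "Laptops/")` selects every laptop regardless of family.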
Considerations for Partition and Sort Key Schemes
- Ensure writes are distributed evenly by using high-cardinality attributes as the partition key, for example a UUID or other randomized GUID rather than a user-chosen username.
- Sort keys can model one-to-many relationships within a partition. For example, a User partition key with Posts as the sort key.
- You can add global secondary indexes later if needed (local secondary indexes must be defined when the table is created), but the table's partition and sort key attributes are set at table creation. The scheme for the values stored in those keys can still evolve to match new requirements, although existing items may need to be migrated to the new scheme, so versioning your schemes can help.
Choosing effective partition and sort keys based on access patterns is crucial for high performance. The keys themselves can encode relationships and hierarchy when designed thoughtfully.
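Versioning a key scheme could look something like the following sketch. The `scheme_version` attribute, the two key formats, and the `migrate` helper are hypothetical. Note also that key attributes of an existing item cannot be updated in place in DynamoDB, so a real migration would write the item under its new key and delete the old one; the function below only computes the new item.

```python
CURRENT_SCHEME_VERSION = 2


def key_v1(user_id: str, post_date: str) -> dict:
    """Original scheme: raw values used directly as keys."""
    return {"pk": user_id, "sk": post_date}


def key_v2(user_id: str, post_date: str) -> dict:
    """Newer scheme: self-describing, prefixed keys."""
    return {"pk": f"USER#{user_id}", "sk": f"POST#{post_date}"}


def migrate(item: dict) -> dict:
    """Return the item re-keyed under the current scheme; no-op if up to date."""
    # Items written before versioning existed are treated as version 1.
    if item.get("scheme_version", 1) >= CURRENT_SCHEME_VERSION:
        return item
    migrated = dict(item)  # leave the original item untouched
    migrated.update(key_v2(user_id=item["pk"], post_date=item["sk"]))
    migrated["scheme_version"] = CURRENT_SCHEME_VERSION
    return migrated
```

Storing the version on each item lets readers handle both formats during a gradual migration instead of requiring a single cutover.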
Summary
There is a lot I didn't cover, especially with respect to cost minimization. I will attempt to revisit that topic, along with recommended practices based on our experiences, as we learn more.