DynamoDB: Basics

October 30, 2019

Introduction

This post introduces the concepts and principles used to design DynamoDB tables, which I have been learning as I apply this technology in a new project.

This introduction will compare and contrast the core concepts and ideas with those found in relational database management systems (RDBMS).

Key takeaways

While using DynamoDB over the last couple of months, one of my big takeaways is that data modeling is a vital part of the design process, and that the modeling process itself looks structurally similar to relational database design. The high-level process still looks like this:

  • Define how entities relate to each other (e.g. ERDs).

  • Determine your data access patterns based on business requirements. In the RDBMS world this helps you decide how normalized your relational schema should be; in the DynamoDB world you have different levers, but the access patterns still drive the table design significantly.

  • Design your primary key and secondary indices based on your data access pattern needs. Specifics are documented via examples below.

However, some relational database concepts or ideas will hinder your thought process, specifically:

  • one table per entity and/or relationship

  • normalization

  • JOIN across tables (we implicitly "join" within the same DynamoDB table across partitions)

Terminology

DynamoDB Concept  RDBMS Analog           Description
Table             set of related tables  DynamoDB tables are less rigidly defined
Partition         table                  Multiple entities can be modeled inside the same DynamoDB table
Item              record                 Key-value pairs describing a data value
Primary Key       primary key            Uniquely identifies each item in a DynamoDB table
Attribute         column                 Attributes are more flexible and can differ across items

DynamoDB API

The API consists of operations on:

  • items (requires you to identify the full primary key), includes batch operations

  • queries

  • scans (like a full table scan, which you want to avoid)
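
To make the distinction concrete, here is a minimal sketch using Python and boto3; the table name, key attribute names, and values are hypothetical placeholders rather than part of any real schema.

    import boto3
    from boto3.dynamodb.conditions import Key

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("OrderService")  # hypothetical table name

    # Item operation: you must identify the item by its full primary key
    # (partition key plus sort key when the key is composite).
    table.get_item(Key={"PK": "CUSTOMER#123", "SK": "ORDER#2019-10-30#789"})

    # Query: targets a single partition; the sort key condition is optional.
    table.query(
        KeyConditionExpression=Key("PK").eq("CUSTOMER#123")
        & Key("SK").begins_with("ORDER#")
    )

    # Scan: reads the entire table and is best avoided outside of one-off jobs.
    table.scan()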

DynamoDB Primary Keys

You can have two kinds of primary keys:

  • Simple primary key (partition key)

  • Composite primary key (partition key + sort key)

The partition key is used to distribute data across partitions. Items with the same partition key reside in the same partition (some developers may find the analogy of a "shard" useful for building initial intuition).

Sort keys are used to create ranges of items within a partition.

So far, the data domains we have modeled have been complex enough that composite primary keys are essential to our query access model.
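
As a rough sketch of what this looks like in practice, here is a composite primary key declared at table creation with boto3; the table name, key names, and billing mode are assumptions for illustration.

    import boto3

    client = boto3.client("dynamodb")

    client.create_table(
        TableName="OrderService",  # hypothetical table name
        KeySchema=[
            {"AttributeName": "PK", "KeyType": "HASH"},   # partition key
            {"AttributeName": "SK", "KeyType": "RANGE"},  # sort key
        ],
        AttributeDefinitions=[
            {"AttributeName": "PK", "AttributeType": "S"},
            {"AttributeName": "SK", "AttributeType": "S"},
        ],
        BillingMode="PAY_PER_REQUEST",  # assumes on-demand capacity
    )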

Secondary Indices

Retrieving items by a DynamoDB table's primary key is the most efficient access path, but at times we still need to support a query pattern that the primary key does not cover. This is where secondary indices let us avoid inefficient scan operations.

There are two types of secondary indices:

Global Secondary Index (GSI)

Can be used to provide different partition and sort keys than the table's.

Local Secondary Index (LSI)

Used with a composite primary key, where the partition key is the same as the table's but the sort key is different.

Our DynamoDB tables typically use LSIs and a couple use GSIs.
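
As a rough sketch, here is how a GSI can be added to an existing table and queried with boto3; the index name, attribute names, and values are hypothetical. LSIs, by contrast, can only be declared when the table is created.

    import boto3
    from boto3.dynamodb.conditions import Key

    client = boto3.client("dynamodb")

    # GSIs can be added after the table exists (this assumes on-demand billing,
    # so no provisioned throughput is supplied for the index).
    client.update_table(
        TableName="OrderService",  # hypothetical table name
        AttributeDefinitions=[
            {"AttributeName": "orderStatus", "AttributeType": "S"},
            {"AttributeName": "SK", "AttributeType": "S"},
        ],
        GlobalSecondaryIndexUpdates=[
            {
                "Create": {
                    "IndexName": "StatusByDate",  # hypothetical index name
                    "KeySchema": [
                        {"AttributeName": "orderStatus", "KeyType": "HASH"},
                        {"AttributeName": "SK", "KeyType": "RANGE"},
                    ],
                    "Projection": {"ProjectionType": "ALL"},
                }
            }
        ],
    )

    # Queries against any secondary index pass IndexName explicitly.
    table = boto3.resource("dynamodb").Table("OrderService")
    table.query(
        IndexName="StatusByDate",
        KeyConditionExpression=Key("orderStatus").eq("SHIPPED"),
    )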

Data Modeling for DynamoDB

There are a few explicit steps that we borrow from our prior experience of data modeling in the relational world:

  • identify entities

  • identify attributes

  • identify access patterns

  • identify a self-describing scheme for primary keys

  • identify secondary indices, using the access patterns as input

Defining naming conventions

It doesn't matter whether your fields are Capitalized, snake_cased, kebab-cased, camelCased, or PascalCased; just pick something and apply it consistently. The naming convention should be defined for:

  • Table names

  • Primary Key names

  • Attribute names

Keep it simple and make sure it is applied consistently across your services (or at least across the services that one team works on).

It may seem silly to do this first, but it will save a lot of irritation and rework later.
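
As an illustration only, one hypothetical convention could look like the following; the specific choices matter far less than applying them everywhere.

    # Hypothetical convention, applied consistently across a team's services:
    #   Table names:      PascalCase          -> "OrderService"
    #   Key attributes:   short, upper-case   -> "PK", "SK"
    #   Key values:       ENTITY#identifier   -> "CUSTOMER#123"
    #   Other attributes: camelCase           -> "createdAt", "orderTotal"
    TABLE_NAME = "OrderService"
    PARTITION_KEY = "PK"
    SORT_KEY = "SK"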

Entities

Think about the nouns in your domain and then the relationships between them; this provides a good starting point for identifying your entities. It is essentially the same process as identifying entities for relational database design. The one difference is that we will typically model many entities together in one DynamoDB table, whereas a table in the relational way of thinking serves a fundamentally different aim: a core primitive of relational databases is the JOIN, which is absent in DynamoDB.
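
A minimal sketch of what "many entities in one table" can look like, assuming a hypothetical table with generic PK/SK key attributes:

    import boto3

    table = boto3.resource("dynamodb").Table("OrderService")  # hypothetical

    # A customer and one of its orders live in the same table and share a
    # partition, so related items can be read together with a single Query.
    table.put_item(Item={
        "PK": "CUSTOMER#123",
        "SK": "CUSTOMER#123",
        "entityType": "Customer",
        "name": "Ada Lovelace",
    })
    table.put_item(Item={
        "PK": "CUSTOMER#123",
        "SK": "ORDER#2019-10-30#789",
        "entityType": "Order",
        "orderTotal": 42,
    })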

Attributes

Attributes are usually easy to identify once you have found your entities. This is identical to identifying the columns for a relational schema before normalizing the data.
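
Unlike columns in a relational schema, attributes do not have to be declared up front and can vary item by item; a small sketch with hypothetical attribute names:

    import boto3

    table = boto3.resource("dynamodb").Table("OrderService")  # hypothetical

    # Two orders with different attribute sets: only the primary key
    # attributes are required by the table definition.
    table.put_item(Item={
        "PK": "CUSTOMER#123",
        "SK": "ORDER#2019-10-30#789",
        "orderTotal": 42,
        "giftWrap": True,            # present on this order only
    })
    table.put_item(Item={
        "PK": "CUSTOMER#456",
        "SK": "ORDER#2019-11-02#790",
        "orderTotal": 17,
        "couponCode": "FALL19",      # a different optional attribute
    })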

Access patterns

The way you identify access patterns to design a DynamoDB table well (for your current needs) is much like in the RDBMS world: by understanding the business requirements of the software. Second or third systems have the benefit of clearer access and usage patterns, but the risk in moving from one datastore to another is that the data migrations are not particularly simple, not to mention cutting 24/7 systems over without downtime or data loss.

When reading user stories or technical stories it is possible to infer how data would need to be queried.
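
For example, assuming the hypothetical customer/order model sketched above, a user story translates fairly directly into a key condition:

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("OrderService")  # hypothetical

    # User story: "As a customer, I want to see my orders from October 2019."
    # Access pattern: fetch all orders for one customer within a month.
    # Implied key design: PK = CUSTOMER#<id>, SK = ORDER#<ISO date>#<order id>.
    table.query(
        KeyConditionExpression=Key("PK").eq("CUSTOMER#123")
        & Key("SK").begins_with("ORDER#2019-10")
    )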

Self-describing keys

Defining partition and sort keys that are self-describing, based on a scheme derived from our entity model, has allowed us to query and write items consistently and to provide a higher-level, domain-oriented API on top of the lower-level DynamoDB API.
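
A small sketch of what such a scheme can look like; the prefixes and helper functions here are hypothetical rather than our exact scheme.

    def customer_pk(customer_id: str) -> str:
        # The partition key encodes the entity type and its identifier.
        return f"CUSTOMER#{customer_id}"

    def order_sk(order_date: str, order_id: str) -> str:
        # The sort key encodes the type plus a date so orders sort chronologically.
        return f"ORDER#{order_date}#{order_id}"

    # Writers and readers both go through the helpers, so the key scheme stays
    # consistent and can sit behind a higher-level, domain-oriented API.
    item_key = {"PK": customer_pk("123"), "SK": order_sk("2019-10-30", "789")}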

Summary

There is a lot I didn't cover, especially with respect to cost minimization. I will attempt to revisit that topic and the recommended practices that come out of our experience as we learn more.