DynamoDB DR Setup
Solution Overview:
CDC Data
- This option can be used when global tables are not available.
- DynamoDB has a native CDC feature called DynamoDB Streams.
- A stream retains change records for up to 24 hours.
- We will use a Lambda function to read the change records from the stream.
- The stream triggers the Lambda function automatically through an event source mapping.
- The Lambda function then replicates each event to the DynamoDB table in the DR region.
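The replication step above can be sketched as a small Lambda handler. This is a minimal sketch, not a definitive implementation: it assumes the stream uses the NEW_IMAGE or NEW_AND_OLD_IMAGES view type, and the environment variable names `DR_REGION` and `DR_TABLE_NAME` are hypothetical.

```python
import os

_client = None  # cached across warm Lambda invocations


def _get_client():
    """Create the DR-region DynamoDB client lazily (boto3 ships with the Lambda runtime)."""
    global _client
    if _client is None:
        import boto3
        # DR_REGION is a hypothetical environment variable name
        _client = boto3.client("dynamodb", region_name=os.environ["DR_REGION"])
    return _client


def apply_record(client, record, table_name):
    """Apply one DynamoDB Streams record to the DR table."""
    event_name = record["eventName"]
    change = record["dynamodb"]
    if event_name in ("INSERT", "MODIFY"):
        # NewImage is only present for NEW_IMAGE / NEW_AND_OLD_IMAGES view types
        client.put_item(TableName=table_name, Item=change["NewImage"])
    elif event_name == "REMOVE":
        client.delete_item(TableName=table_name, Key=change["Keys"])
    else:
        raise ValueError(f"unexpected eventName: {event_name}")


def replicate(client, event, table_name):
    """Replay every record in the batch, in order; return the number processed."""
    records = event.get("Records", [])
    for record in records:
        apply_record(client, record, table_name)
    return len(records)


def handler(event, context):
    # DR_TABLE_NAME is a hypothetical environment variable name
    count = replicate(_get_client(), event, os.environ["DR_TABLE_NAME"])
    return {"processed": count}
```

Writing the full new image with `put_item` keeps each replay idempotent, so a retried batch does not corrupt the DR table. Binary attributes arrive base64-encoded in the Lambda event and would likely need decoding before being written; that step is left out of this sketch.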
Historical Data Export
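- DynamoDB supports direct export to an S3 bucket. This requires point-in-time recovery (PITR) to be enabled on the table.
- We will export the data to S3 in DynamoDB JSON format.
- Then, in the DR region, we'll run an import job, which loads the data into a new table.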
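The export and import steps could look roughly like the following with boto3. This is a sketch under assumptions: the bucket, prefix, table, and key names are placeholders, the table is assumed to have a single string partition key, and on-demand billing is an arbitrary choice for the new table.

```python
def build_export_request(table_arn, bucket, prefix):
    """Arguments for the DynamoDB export_table_to_point_in_time call."""
    return {
        "TableArn": table_arn,
        "S3Bucket": bucket,
        "S3Prefix": prefix,
        "ExportFormat": "DYNAMODB_JSON",
    }


def build_import_request(bucket, data_prefix, table_name, key_name):
    """Arguments for the DynamoDB import_table call, which creates a new table."""
    return {
        "S3BucketSource": {"S3Bucket": bucket, "S3KeyPrefix": data_prefix},
        "InputFormat": "DYNAMODB_JSON",
        "InputCompressionType": "GZIP",  # exported data files are gzip-compressed
        "TableCreationParameters": {
            "TableName": table_name,
            # Assumption: a single string partition key
            "AttributeDefinitions": [{"AttributeName": key_name, "AttributeType": "S"}],
            "KeySchema": [{"AttributeName": key_name, "KeyType": "HASH"}],
            "BillingMode": "PAY_PER_REQUEST",
        },
    }


# Usage (requires boto3 and AWS credentials; regions are placeholders):
#   import boto3
#   primary = boto3.client("dynamodb", region_name="us-east-1")
#   primary.export_table_to_point_in_time(**build_export_request(arn, "my-bucket", "exports/"))
#   dr = boto3.client("dynamodb", region_name="us-west-2")
#   dr.import_table(**build_import_request("my-bucket", data_prefix, "my-table", "pk"))
```

Here `data_prefix` is meant to point at the folder of data files that the export job wrote, which includes an export ID generated at export time.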
Monitoring
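- CloudWatch will be used to monitor the replication.
- For the Lambda function, CloudWatch exposes a metric called IteratorAge, which reports the age (in milliseconds) of the last record in each batch at the time Lambda reads it from the stream. A steadily growing value means replication is falling behind.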
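One way to act on IteratorAge is a CloudWatch alarm. The sketch below builds the arguments for `put_metric_alarm`; the alarm name, the 60-second threshold, and the five evaluation periods are illustrative assumptions rather than recommended values.

```python
def build_iterator_age_alarm(function_name, threshold_seconds=60):
    """Arguments for CloudWatch put_metric_alarm on the replication Lambda."""
    return {
        "AlarmName": f"{function_name}-iterator-age",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "IteratorAge",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 5,
        # IteratorAge is reported in milliseconds
        "Threshold": threshold_seconds * 1000,
        "ComparisonOperator": "GreaterThanThreshold",
        # No data points simply means no stream traffic
        "TreatMissingData": "notBreaching",
    }


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**build_iterator_age_alarm("ddb-dr-replicator"))
```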
Further Optimization:
- By default, DynamoDB Streams invokes the Lambda function as soon as records are available, so events are processed in near real time.
- The trade-off is a higher number of Lambda invocations.
- We can control this behavior by setting a batch size or a batching window on the event source mapping, so that records are processed in larger batches.
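The batch size and batching window are set on the event source mapping. The sketch below builds the arguments for Lambda's `create_event_source_mapping`; the default values of 500 records and 10 seconds are illustrative, and starting from `TRIM_HORIZON` is an assumption made so that no retained change records are skipped.

```python
def build_stream_mapping(stream_arn, function_name, batch_size=500, window_seconds=10):
    """Arguments for Lambda create_event_source_mapping on a DynamoDB stream."""
    # Limits for DynamoDB Streams sources: up to 10,000 records, up to 300 seconds
    if not 1 <= batch_size <= 10000:
        raise ValueError("batch_size must be between 1 and 10000")
    if not 0 <= window_seconds <= 300:
        raise ValueError("window_seconds must be between 0 and 300")
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "StartingPosition": "TRIM_HORIZON",  # assumption: start from the oldest retained record
        "BatchSize": batch_size,
        "MaximumBatchingWindowInSeconds": window_seconds,
    }


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("lambda").create_event_source_mapping(
#       **build_stream_mapping(stream_arn, "ddb-dr-replicator"))
```

Lambda invokes the function when either the batch fills up or the window elapses, so a larger window trades replication latency for fewer invocations.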
Aurora DR Setup
Solution Overview:
- Aurora Global Database extends a cluster across multiple regions and manages replication and failover out of the box.
- We can create a complete cluster in the DR region that replicates data from the primary region.
- A more cost-optimized option when setting up the global cluster is to make the DR region cluster a headless cluster.
- A headless cluster has only the Aurora storage, without any writer or reader nodes.
- A headless cluster is provisioned by creating the secondary cluster without any DB instances, for example via the AWS CLI or API.
- During a disaster, we can add nodes to the headless cluster and start using the same cluster endpoint for application connectivity.
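The headless approach can be sketched with boto3 as two steps: create the secondary cluster with no instances, then add an instance at failover time. All identifiers and the instance class below are placeholders, and network or encryption settings (subnet group, security groups, KMS key) that a real setup would need are omitted.

```python
def build_headless_cluster(global_cluster_id, cluster_id, engine, engine_version):
    """Arguments for RDS create_db_cluster in the DR region (no instances are created)."""
    return {
        "DBClusterIdentifier": cluster_id,
        "GlobalClusterIdentifier": global_cluster_id,
        "Engine": engine,                 # e.g. "aurora-mysql" or "aurora-postgresql"
        "EngineVersion": engine_version,  # assumed to match the primary cluster
    }


def build_failover_instance(cluster_id, instance_id, engine, instance_class="db.r6g.large"):
    """Arguments for RDS create_db_instance, used to add a node during a disaster."""
    return {
        "DBInstanceIdentifier": instance_id,
        "DBClusterIdentifier": cluster_id,
        "Engine": engine,
        "DBInstanceClass": instance_class,  # placeholder instance class
    }


# Usage (requires boto3 and AWS credentials; the region is a placeholder):
#   import boto3
#   rds = boto3.client("rds", region_name="us-west-2")
#   rds.create_db_cluster(**build_headless_cluster("my-global", "my-dr", "aurora-mysql", version))
#   # later, during failover:
#   rds.create_db_instance(**build_failover_instance("my-dr", "my-dr-1", "aurora-mysql"))
```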
Monitoring
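- RDS CloudWatch metrics include AuroraGlobalDBReplicationLag, which reports the replication lag of the secondary cluster.
- The minimum managed RPO is 20 seconds, although Aurora typically replicates in under a second. Managed RPO is available for Aurora PostgreSQL-based global databases.
- When managed RPO is enabled and the lag exceeds the configured RPO, Aurora blocks transaction commits on the primary until at least one secondary cluster has caught up.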
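Managed RPO is controlled by the `rds.global_db_rpo` cluster parameter on Aurora PostgreSQL. The sketch below builds the arguments for `modify_db_cluster_parameter_group`; the parameter group name is a placeholder, and applying the change immediately is an assumption.

```python
MIN_MANAGED_RPO_SECONDS = 20  # lowest value accepted for managed RPO


def build_rpo_parameter_update(parameter_group, rpo_seconds):
    """Arguments for RDS modify_db_cluster_parameter_group to set the managed RPO."""
    if rpo_seconds < MIN_MANAGED_RPO_SECONDS:
        raise ValueError(f"managed RPO must be at least {MIN_MANAGED_RPO_SECONDS} seconds")
    return {
        "DBClusterParameterGroupName": parameter_group,
        "Parameters": [
            {
                "ParameterName": "rds.global_db_rpo",
                "ParameterValue": str(rpo_seconds),
                "ApplyMethod": "immediate",  # assumption: applied without a reboot
            }
        ],
    }


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("rds").modify_db_cluster_parameter_group(
#       **build_rpo_parameter_update("my-global-pg-params", 60))
```

Because commits on the primary are blocked while the RPO is violated, a low value favors data protection over write availability.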