AWS

To set up cross-region OpenSearch native snapshots, we need the following prerequisites.

  1. S3 bucket - The snapshot repository
  2. IAM role - Assumed by the OpenSearch service to manage the backups
  3. Bastion server - To register the repo and run the commands

Security Group #

Make sure the OpenSearch domain’s security group allows port 443 from the bastion server (or whichever server we use to run the commands).
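If you manage the rule from the CLI, a minimal sketch (both security group IDs here are placeholders):

# Allow HTTPS from the bastion's security group into the OpenSearch domain's security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-0aaa11112222bbbb3 \
  --protocol tcp \
  --port 443 \
  --source-group sg-0bbb44445555cccc6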

S3 Bucket: #

Create an S3 bucket named bhuvi-dr-es-backup.
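If you prefer the CLI, a minimal sketch (the region is an assumption; use the same region as the source domain):

# Create the snapshot bucket
aws s3api create-bucket \
  --bucket bhuvi-dr-es-backup \
  --region ap-south-1 \
  --create-bucket-configuration LocationConstraint=ap-south-1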

IAM Role: #

Create an IAM role named bhuvi-dr-OS-S3-role with the following trust relationship and inline policy. This role will be used by OpenSearch to interact with S3.

Trust Relationship:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "Service": "es.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

Inline Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::bhuvi-dr-es-backup"
            ]
        },
        {
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::bhuvi-dr-es-backup/*"
            ]
        }
    ]
}
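A CLI sketch for creating this role, assuming the two JSON documents above are saved as trust.json and s3-policy.json (the inline policy name s3-snapshot-access is arbitrary):

# Create the role with the trust relationship, then attach the inline policy
aws iam create-role \
  --role-name bhuvi-dr-OS-S3-role \
  --assume-role-policy-document file://trust.json
aws iam put-role-policy \
  --role-name bhuvi-dr-OS-S3-role \
  --policy-name s3-snapshot-access \
  --policy-document file://s3-policy.json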

Bastion Server IAM Role: #

We need to map the IAM role that we created in the above step, and for that we need an EC2 instance to run the commands. It’s a one-time task.

If we run a bastion server to manage all the commands, attach the following permissions to its EC2 IAM role.

This grants access to the OpenSearch domains and permission to pass the IAM role we created for backups to S3. Under the resources, you can add all the OpenSearch domain ARNs.

IAM ROLE NAME: bhuvi-dr-os-admin-ec2

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::ACCOUNT_ID:role/bhuvi-dr-OS-S3-role"
        },
        {
            "Effect": "Allow",
            "Action": [
                "es:ESHttpPut",
                "es:ESHttpPost",
                "es:ESHttpDelete"
            ],
            "Resource": [
                "arn:aws:es:ap-south-1:ACCOUNT_ID:domain/*",
                "arn:aws:es:ap-south-2:ACCOUNT_ID:domain/*"
            ]
        }
    ]
}
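If the bastion doesn’t have an instance profile yet, a hedged sketch of wiring it up (the instance ID is a placeholder; ec2-trust.json is a standard trust policy for ec2.amazonaws.com and bastion-policy.json is the policy above):

# Create the role, wrap it in an instance profile, and attach it to the bastion
aws iam create-role --role-name bhuvi-dr-os-admin-ec2 --assume-role-policy-document file://ec2-trust.json
aws iam put-role-policy --role-name bhuvi-dr-os-admin-ec2 --policy-name os-admin --policy-document file://bastion-policy.json
aws iam create-instance-profile --instance-profile-name bhuvi-dr-os-admin-ec2
aws iam add-role-to-instance-profile --instance-profile-name bhuvi-dr-os-admin-ec2 --role-name bhuvi-dr-os-admin-ec2
aws ec2 associate-iam-instance-profile --instance-id i-0123456789abcdef0 --iam-instance-profile Name=bhuvi-dr-os-admin-ec2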

Add Bastion EC2 IAM role as an Admin User: #

We’ll be running commands like role mapping from the EC2 instance, so its instance profile has to be added as the admin user if we use fine-grained access control.

  • Select the ES cluster → Security Control → Edit
  • Make sure Set IAM ARN as master user is selected (if the 2nd option is already selected, change it to the 1st one)
  • IAM ARN → Paste the ARN of the EC2 IAM role and save.

Add EC2 role as Admin
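The same setting can be applied from the CLI; a sketch, assuming an OpenSearch domain named domain-1:

# Set the bastion role as the fine-grained access control master user
aws opensearch update-domain-config \
  --domain-name domain-1 \
  --advanced-security-options '{"MasterUserOptions":{"MasterUserARN":"arn:aws:iam::ACCOUNT_ID:role/bhuvi-dr-os-admin-ec2"}}'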

Map the snapshot manager role: #

The snapshots are going to be managed by a Lambda function, for which we created an IAM role (bhuvi-snapshot-es-lambda). If we are using fine-grained access control, we need to map the user/role that runs the snapshot commands to OpenSearch’s managed manage_snapshots role.

We communicate with OpenSearch using the awscurl Python package, because we need to pass the access and secret keys for AWS SigV4 authentication.

pip3 install awscurl

Now, run the following command to map the IAM role that we created to manage S3 backups.

Replace the URL below with your ES domain endpoint.

awscurl --region ap-south-1 --service es \
     -H "Content-Type: application/json" \
     -X PUT https://domain-1.ap-south-1.es.amazonaws.com/_opendistro/_security/api/rolesmapping/manage_snapshots \
     -d '{
          "backend_roles" : [ "arn:aws:iam::ACCOUNT_ID:role/bhuvi-snapshot-es-lambda" ]
         }'
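You can verify the mapping with a GET on the same endpoint:

awscurl --region ap-south-1 --service es \
     -X GET https://domain-1.ap-south-1.es.amazonaws.com/_opendistro/_security/api/rolesmapping/manage_snapshots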

Add Access policy if Fine-grained access control is disabled: #

  • Select the ES cluster → Security Control → Edit
  • Under Access Policy, add the ARN of the EC2 IAM role.

Note: We’ll also add the Lambda function’s IAM role to this Principal, so you can follow the same steps once that IAM role has been created.

OpenSearch Add Access Control

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::ACCOUNT_ID:role/bhuvi-dr-os-admin-ec2",
          "YOU'LL ADD THE LAMBDA IAM ROLE ONCE ITS CREATED"
          ]
      },
      "Action": [
        "es:ESHttpPut",
        "es:ESHttpPost",
        "es:ESHttpPatch",
        "es:ESHttpHead",
        "es:ESHttpGet"
      ],
      "Resource": [
        "arn:aws:es:ap-south-1:ACCOUNT_ID:domain/domain-1/*",
        "arn:aws:es:ap-south-1:ACCOUNT_ID:domain/domain-1"
      ]
    }
  ]
}
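The same policy can be applied from the CLI; a sketch, assuming the JSON above is saved as access-policy.json:

# Apply the domain access policy without going through the console
aws opensearch update-domain-config \
  --domain-name domain-1 \
  --access-policies file://access-policy.json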

Create the snapshot repo: #

We need to register the S3 bucket as the snapshot repository on each domain. Run the following command to create the repo; the repo name follows the bhuvi-dr-<domain>-s3-snap convention that the snapshot Lambda (created later) expects, so repeat it for every domain with its own name and base_path.

Warning: Make sure the base_path does not end with a trailing /.

awscurl --region ap-south-1 --service es -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "type": "s3",
    "settings": {
    "bucket": "bhuvi-dr-es-backup",
    "endpoint": "s3.amazonaws.com",
    "base_path": "bhuvi-search-os-snapshot",
    "max_snapshot_bytes_per_sec": "250mb",
    "role_arn": "arn:aws:iam::ACCOUNT_ID:role/bhuvi-dr-OS-S3-role"
    }
  }' https://domain-1.ap-south-1.es.amazonaws.com/_snapshot/bhuvi-dr-domain-1-s3-snap

  • base_path - Folder name inside the S3 bucket (use a separate path per domain)
  • max_snapshot_bytes_per_sec - Data transfer rate to S3
  • bhuvi-dr-domain-1-s3-snap - Name of the snapshot repo.
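To confirm the repository was registered, you can read it back:

awscurl --region ap-south-1 --service es -X GET \
  https://domain-1.ap-south-1.es.amazonaws.com/_snapshot/bhuvi-dr-domain-1-s3-snap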

Create Lambda function to take Snapshot: #

IAM ROLE for the lambda - bhuvi-snapshot-es-lambda (the role we mapped to the manage_snapshots role earlier)

Permissions: #

  • AWSLambdaBasicExecutionRole
  • AWSLambdaENIManagementAccess
  • Inline Policy:

You can add all the domain ARNs under the resources.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::ACCOUNT_ID:role/bhuvi-dr-OS-S3-role"
        },
        {
            "Effect": "Allow",
            "Action": [
                "es:ESHttpPut",
                "es:ESHttpPost",
                "es:ESHttpDelete",
                "es:ESHttpGet"
            ],
            "Resource": [
                "arn:aws:es:ap-south-1:ACCOUNT_ID:domain/*",
                "arn:aws:es:ap-south-2:ACCOUNT_ID:domain/*"
            ]
        }
    ]
}

We are going to run this Lambda inside a VPC, so select the VPC and subnets.

  • VPC
    • vpc (10.10.0.0/16) | bhuvi-vpc
  • Subnets
    • subnet-1 (10.11.72.0/25) | ap-south-1a, bhuvi-db-subnet-0
    • subnet-2 (10.11.72.128/25) | ap-south-1b, bhuvi-db-subnet-1
  • Security Group
    • sg-for lambda (bhuvi-dr-opensearch-snapshot-lambda)
    • Inbound Rule - All traffic to the same Security Group ID.

Once the Lambda function is created, it’ll create one ENI in each subnet. Go to EC2 → Network Interfaces and filter by the Lambda function’s security group ID to get the ENI details. Then whitelist these two IP addresses in the ES domain’s security group on port 443.

ENI and IP address for Lambda
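A quick way to pull those IPs from the CLI (the security group ID is a placeholder):

# List the private IPs of the ENIs attached to the Lambda's security group
aws ec2 describe-network-interfaces \
  --filters Name=group-id,Values=sg-0bbb44445555cccc6 \
  --query 'NetworkInterfaces[].PrivateIpAddress'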

Lambda function code: #

We need third-party libraries like requests and requests_aws4auth, but they are not available on AWS Lambda by default, so we need to install them locally, then zip and upload them to Lambda.

mkdir /tmp/lambda_libs
pip3 install requests requests_aws4auth --target /tmp/lambda_libs

Now, create the file lambda_function.py inside /tmp/lambda_libs.

lambda_function.py code:

import boto3
import requests
from requests_aws4auth import AWS4Auth
from datetime import datetime

def lambda_handler(event, context):
    
    # Snapshot names are suffixed with today's date (YYYYMMDD)
    CURRENT_DATE = datetime.now().strftime("%Y%m%d")
    # Sign requests with the Lambda role's credentials (SigV4)
    CREDENTIALS = boto3.Session().get_credentials()
    AWSAUTH = AWS4Auth(CREDENTIALS.access_key, CREDENTIALS.secret_key, 'ap-south-1', 'es', session_token=CREDENTIALS.token)
    HEADERS = {"Content-Type": "application/json"}
    
    
    ES_URLS = [
        'https://domain-1.ap-south-1.es.amazonaws.com',
        'https://domain-2.ap-south-1.es.amazonaws.com', 
        'https://domain-3.ap-south-1.es.amazonaws.com'
        ]
    for url in ES_URLS:
        # Domain name = first label of the endpoint's hostname (e.g. domain-1)
        domain = url.split('//')[1].split('.')[0]
        snap_repo = 'bhuvi-dr-' + domain + '-s3-snap'
        curl_url = url + '/_snapshot/' + snap_repo + '/bhuvi-dr-' + domain + '-snapshot-' + CURRENT_DATE
        print('ES Domain Name: ' + domain)
        print('ES URL: ' + url)
        print('ES Snapshot Repo: ' + snap_repo)
        print('ES Snapshot Name: bhuvi-dr-' + domain + '-snapshot-' + CURRENT_DATE)
        print('CURL URL: ' + curl_url)
        print('\n\n Taking the snapshot')
        r = requests.put(curl_url, auth=AWSAUTH, headers=HEADERS, timeout=600)
        if r.status_code == 200:
            
            print('Status Code: ' + str(r.status_code))
            print('Response: ' + r.text)
            print('\n\n Snapshot triggered')
            print('====================')
        else:
            return {
                        'statusCode': 500,
                        'body': f"Snapshot failed: {r.text}"
                    } 
            

Zip the file:

cd /tmp/lambda_libs
zip -r lambda_libs.zip .

Upload the ZIP file:

Upload zip file to Lambda
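Alternatively, a CLI sketch (the function name bhuvi-dr-os-snapshot is a placeholder):

# Push the packaged code to the existing Lambda function
aws lambda update-function-code \
  --function-name bhuvi-dr-os-snapshot \
  --zip-file fileb://lambda_libs.zip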

Once this is uploaded, you can see the library folders and lambda_function.py in the code source.

Lambda code

Schedule the Lambda to run daily: #

Create a CloudWatch Events rule to take snapshots daily at 3 AM IST (21:30 UTC) - cron(30 21 * * ? *)
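A CLI sketch of the schedule, assuming the function is named bhuvi-dr-os-snapshot (the rule name and statement ID are arbitrary):

# Create the daily rule, allow EventBridge to invoke the function, then attach the target
aws events put-rule \
  --name bhuvi-dr-os-snapshot-daily \
  --schedule-expression 'cron(30 21 * * ? *)'
aws lambda add-permission \
  --function-name bhuvi-dr-os-snapshot \
  --statement-id eventbridge-daily-snapshot \
  --action lambda:InvokeFunction \
  --principal events.amazonaws.com \
  --source-arn arn:aws:events:ap-south-1:ACCOUNT_ID:rule/bhuvi-dr-os-snapshot-daily
aws events put-targets \
  --rule bhuvi-dr-os-snapshot-daily \
  --targets 'Id=1,Arn=arn:aws:lambda:ap-south-1:ACCOUNT_ID:function:bhuvi-dr-os-snapshot'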

You can run the following command to check the list of snapshots available.

awscurl --region ap-south-1 \
  --service es -X GET \
  'https://domain-1.ap-south-1.es.amazonaws.com/_cat/snapshots/bhuvi-dr-domain-1-s3-snap'
bhuvi-dr-domain-1-snapshot-20240119 SUCCESS 1705699837 21:30:37 1705699942 21:32:22 1.7m 211 1039 0 1039
bhuvi-dr-domain-1-snapshot-20240120 SUCCESS 1705786236 21:30:36 1705786332 21:32:12 1.5m 212 1044 0 1044
bhuvi-dr-domain-1-snapshot-20240121 SUCCESS 1705872636 21:30:36 1705872738 21:32:18 1.6m 213 1049 0 1049
bhuvi-dr-domain-1-snapshot-20240122 SUCCESS 1705959036 21:30:36 1705959138 21:32:18 1.6m 214 1054 0 1054

Lambda function to delete snapshots older than 3 days #

We can follow the same steps as above, but replace the Lambda code with the following.

Info: The delete API call can run for more than 5 minutes, but OpenSearch has a limitation that no API call should run longer than 5 minutes. So we just trigger the call with a 10-second client timeout and ignore the timeout error; the delete keeps running in the background.

import boto3
import requests
from requests_aws4auth import AWS4Auth
from datetime import datetime,timedelta

def lambda_handler(event, context):
    # Retention period is 3 days; increase if needed
    RETENTION = (datetime.now() - timedelta(days=3)).strftime("%Y%m%d")
    # Sign requests with the Lambda role's credentials (SigV4)
    CREDENTIALS = boto3.Session().get_credentials()
    AWSAUTH = AWS4Auth(CREDENTIALS.access_key, CREDENTIALS.secret_key, 'ap-south-1', 'es', session_token=CREDENTIALS.token)
    HEADERS = {"Content-Type": "application/json"}

    
    ES_URLS = [
        'https://domain-1.ap-south-1.es.amazonaws.com',
        'https://domain-2.ap-south-1.es.amazonaws.com',
        'https://domain-3.ap-south-1.es.amazonaws.com'
        ]
    for url in ES_URLS:
        # Domain name = first label of the endpoint's hostname (e.g. domain-1)
        domain = url.split('//')[1].split('.')[0]
        snap_repo = 'bhuvi-dr-' + domain + '-s3-snap'
        curl_url = url + '/_snapshot/' + snap_repo + '/bhuvi-dr-' + domain + '-snapshot-' + RETENTION
        print('ES Domain Name: ' + domain)
        print('ES URL: ' + url)
        print('ES Snapshot Repo: ' + snap_repo)
        print('ES Snapshot Name: bhuvi-dr-' + domain + '-snapshot-' + RETENTION)
        print('CURL URL: ' + curl_url)
        print('\n\n Deleting the snapshot')
        # Fire the DELETE and move on; the 10-second timeout is expected to trip
        try:
            requests.delete(curl_url, auth=AWSAUTH, headers=HEADERS, timeout=10)
        except requests.exceptions.Timeout:
            # Expected: the deletion keeps running on the server side
            pass
        print('Snapshot deletion triggered')