by Bhuvanesh


When you install the Debezium MySQL connector, it starts by reading your historical data and pushing all of it into Kafka topics. This behavior can be changed via the snapshot.mode parameter in the connector config. Whenever you start a new sync, Debezium first loads the existing data; this is called a Snapshot. Unfortunately, if you have a busy transactional MySQL database, this may lead to performance issues, and your DBA will never agree to read the data from the Master node. [Disclaimer: I’m a DBA :) ] So I was trying to figure out how to take the snapshot from a Read Replica and, once the snapshot is done, start reading the real-time data from the Master. I found this useful information in a StackOverflow answer.

> If your binlog uses GTID, you should be able to make a CDC tool like Debezium read the snapshot from the replica, then when that’s done, switch to the master to read the binlog. But if you don’t use GTID, that’s a little more tricky. The tool would have to know the binlog position on the master corresponding to the snapshot on the replica.

I then implemented this in a real-time scenario and verified that the statement is true. Yes, we made this work in our system. Here are the step-by-step details from our PoC.

## Requirements:

• Master and Slave should be enabled with GTID.
• The Debezium Connector node can talk to both the master and the slave.
• log-slave-updates must be enabled on the slave (GTID requires it anyway).
• A user account for Debezium with the respective permissions.
• The Debezium connector installed.

## Sample data:

Create a new database to test this sync and insert some values.
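The original post’s sample data isn’t reproduced here; a minimal sketch looks like the following. The database and table names (`debezium_poc`, `customers`) and the host `master-node` are placeholders, not the names from the original PoC.

```shell
# Create a small test database on the master; it replicates to the slave.
# Host, database, and table names below are placeholders.
mysql -h master-node -u root -p <<'SQL'
CREATE DATABASE IF NOT EXISTS debezium_poc;
USE debezium_poc;
CREATE TABLE IF NOT EXISTS customers (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100)
);
INSERT INTO customers (name) VALUES ('a'), ('b'), ('c');
SQL
```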

### Create the MySQL Connector Config:

File Name: mysql.json
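The original mysql.json isn’t shown here, so the payload below is a sketch of a typical Debezium MySQL registration. Hostnames, credentials, the server id, and topic names are placeholders, and some property names vary across Debezium versions (e.g. older releases use database.whitelist instead of database.include.list). The key point: database.hostname points at the read replica for the snapshot phase.

```json
{
  "name": "mysql-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "replica-node",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "*****",
    "database.server.id": "184054",
    "database.server.name": "mysql-poc",
    "database.include.list": "debezium_poc",
    "snapshot.mode": "initial",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes.mysql-poc"
  }
}
```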

### Watch the status of the connector:

Open three terminal windows and start listening to the following topics.

NOTE: change the bootstrap-server as per your cluster’s IP.

1. connect-configs
2. connect-status
3. connect-offsets
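One consumer per terminal, along these lines. The broker address is a placeholder, and the script is kafka-console-consumer.sh on a plain Apache Kafka install:

```shell
# One terminal per internal Connect topic; adjust the broker address.
kafka-console-consumer --bootstrap-server 192.168.1.10:9092 \
  --topic connect-configs --from-beginning

kafka-console-consumer --bootstrap-server 192.168.1.10:9092 \
  --topic connect-status --from-beginning

kafka-console-consumer --bootstrap-server 192.168.1.10:9092 \
  --topic connect-offsets --from-beginning
```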

### Install the Connector:
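Registration goes through the Kafka Connect REST API. Assuming the Connect worker is on localhost:8083 (an assumption; use your worker’s address), something like:

```shell
# Register the connector using the mysql.json file from above.
# The Connect REST endpoint address is an assumption.
curl -X POST -H "Content-Type: application/json" \
  --data @mysql.json \
  http://localhost:8083/connectors
```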

Once it is installed, your connect-configs topic will show the following output.

Then, from your connect-status topic, you’ll get the status of your MySQL connector.

### Snapshot Status from the log file:

By default, the Kafka Connect logs go to syslog. You can customize this log location. Wherever your log file is, you can watch the snapshot progress there.
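For example, a quick way to follow it (the log path here is an assumption; use wherever your Connect worker actually logs):

```shell
# Follow snapshot progress in the Connect worker's log.
grep -i 'snapshot' /var/log/syslog | tail -20
```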

## Snapshot Complete:

Once your snapshot process is done, the connect-offsets topic will have the binlog information up to the point it has consumed.
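For reference, an offset record for the MySQL connector looks roughly like this (illustrative values only; the gtids field is the part we care about for the switch):

```json
{"ts_sec": 1590913832, "file": "mysql-bin.000003", "pos": 154, "gtids": "<replica-gtid-set>", "snapshot": true}
```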

Then it’ll start applying the ongoing replication changes as well.

Now we have verified that the database’s snapshot has been completed. It’s time to swap the nodes and start consuming from the Master.

If you enable monitoring for the Debezium connector, you can see the lag from the JMX or Prometheus metrics.

Sometimes the metrics take a few more minutes to update. So once you can see the last binlog information in connect-offsets and the lag is < 10, the snapshot is done.

## Switch to Master:

The most important thing is to STOP the slave thread on your Read Replica. This prevents any new GTIDs from being applied on the replica and recorded in your connect-offsets topic.
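On the replica (host name is a placeholder; on MySQL 8.0.22+ the statement is STOP REPLICA):

```shell
# Stop replication on the read replica so its GTID set stops advancing.
mysql -h replica-node -u root -p -e 'STOP SLAVE;'
```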

To simulate the sync, we add one new row to our MySQL table on the Master. Since the slave thread is stopped, this row will never replicate to the slave, but once you switch the node, Debezium should start reading from this row.
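On the master, reusing the placeholder table from the sample-data sketch:

```shell
# This row lands only in the master's binlog, since the replica's
# slave thread is stopped. Host/table names are placeholders.
mysql -h master-node -u root -p -e \
  "INSERT INTO debezium_poc.customers (name) VALUES ('after-switch');"
```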

We need to update the existing MySQL connector’s config and just change the "database.hostname" parameter.

Note: this JSON file format is different from the one we used to register the connector, so make sure the syntax is correct.

File Name: mysql-update.json
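Unlike the registration payload, the body for an update is the bare config object (no name/config wrapper). A sketch mirroring the placeholder config from earlier, with only database.hostname changed to the master:

```json
{
  "connector.class": "io.debezium.connector.mysql.MySqlConnector",
  "database.hostname": "master-node",
  "database.port": "3306",
  "database.user": "debezium",
  "database.password": "*****",
  "database.server.id": "184054",
  "database.server.name": "mysql-poc",
  "database.include.list": "debezium_poc",
  "snapshot.mode": "initial",
  "database.history.kafka.bootstrap.servers": "kafka:9092",
  "database.history.kafka.topic": "schema-changes.mysql-poc"
}
```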

Run the below command to push the updated config.
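Assuming the placeholder connector name and Connect address from earlier:

```shell
# PUT the bare config object to the connector's /config endpoint.
curl -X PUT -H "Content-Type: application/json" \
  --data @mysql-update.json \
  http://localhost:8083/connectors/mysql-connector/config
```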

Once it’s updated, from the connect-offsets topic, you can see that Debezium starts reading the data from the next GTID.

Also, from your table’s topic, you can see that the last row has been pushed.

This method helped us sync the historical data from the Read Replica to the Kafka topic without affecting transactions on the Master node. We are still exploring more scenarios, and I’ll keep posting new articles about this.