The snapshot in Debezium will do a historical data load from the source database to the Kafka topics. But generally its not a good practice to this if you have a huge data in your tables. Recently I have published many blog posts to perform this snapshot from Read Replica(with/without GTID, AWS Aurora). One guy commented that, in GCP the MySQL managed service is called CloudSQL. There we don’t have much control to stop replication, perform the modifications that we want. So how can we avoid snapshots in CloudSQL and take debezium snapshots from CloudSQL Read Replica? I have spent some time today and figured out a way to do this.
We can’t enable binlogs on read replica. So we have to setup an external read replica for this. If the external replica is a VM, then we can enable the log-slave-updates with GTID. Then we can follow this blog post to solve this problem. But I want to solve this by using CloudSQL read replica.
For this create a new read replica for your cloudsql.
If you already have a read replica, then create one more (because we’ll break the replication, so don’t use the active one).
Disable the replication on the new Replica and make a note of the master’s binlog information.
Then Promote the replica. So it’ll automatically enable the binlog.
Now create a connector to read data from the replica node.
Once the snapshot is done, manually update the connect-offset with Master’s binlog info.
Update the connector with Master’s IP address.
Then it’ll read from Master.
Proof Of Concept:
Read my previous blog posts to install and configure Confluent Kafka connect and etc.
CloudSQL Master IP - 172.24.0.13
CloudSQL Replica IP - 172.24.0.19
Create a new database to test this sync and insert some values.
Once your replica is in sync with Master, disable the replication from the CloudSQL Console.
Then login to the Replica and get the master’s binlog file name and position from the show slave status
Binlog File Name - Master_Log_File
Binlog Position - 10259755
Maser’s UUID - Master_UUID
Now promote the replica, it’ll enable the binlog on this replica server.
To simulate the complexity, add one more row on the master node(this row will not be replicated, since the replication is disables). So once the snapshot done, we’ll switch the MySQL IP. Then it should read this new row.
Create the MySQL Connector:
Use the below MySQL connector JSON config. (replace the MySQL details, kafka details, and Transformation things)