Neo4j’s multi-datacenter deployments are well suited to geo-distributed workloads and also provide a better disaster recovery solution. But to be frank, it’s not an actual distributed database like Google Spanner or CockroachDB. Here it’s just a matter of grouping/labeling your Neo4j nodes with different data center names. Even so, it has a lot of benefits, like load balancing to a particular group and letting a read replica replicate from an existing read replica instead of from the master. Like my previous blog, this one is just a guide to setting up a multi-datacenter cluster in AWS and GCP.
This blog gives you simple steps to create a fresh Neo4j multi-datacenter cluster. On AWS and GCP you just need to whitelist the nodes’ IP addresses in the security group (AWS) and the firewall rules (GCP). Otherwise, all the steps are common to both deployments.
Install Neo4j on all the nodes. The cluster uses the following ports:
5000 - discovery_listen_address
6000 - transaction_advertised_address
7000 - raft_advertised_address
7473 - HTTPS interface to access the Neo4j cluster in browser
7474 - HTTP interface to access the Neo4j cluster in browser
7687 - Used by Cypher Shell and by Neo4j Browser
6362 - Backup port to seed the data from the Leader node.
Please allow the above ports between all the nodes.
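As a rough sketch of how these ports could be opened, the script below only prints the AWS CLI and gcloud commands so you can review them first. The peer CIDR, security group ID and rule name are hypothetical placeholders, not values from this setup.

```shell
#!/usr/bin/env sh
# Ports from the list above that must be reachable between all cluster nodes.
NEO4J_PORTS="5000 6000 7000 7473 7474 7687 6362"

# Hypothetical placeholders: replace with your own values.
PEER_CIDR="203.0.113.10/32"       # public IP of a node in the other cloud
AWS_SG_ID="sg-0123456789abcdef0"  # security group attached to the AWS nodes
GCP_RULE="neo4j-cluster"          # name for the GCP firewall rule

# Print one AWS CLI command per port (nothing is executed).
print_aws_rules() {
  for port in $NEO4J_PORTS; do
    echo "aws ec2 authorize-security-group-ingress --group-id $AWS_SG_ID --protocol tcp --port $port --cidr $PEER_CIDR"
  done
}

# Print a single gcloud command that allows all the ports at once.
print_gcp_rule() {
  allow=""
  for port in $NEO4J_PORTS; do
    allow="${allow:+$allow,}tcp:$port"
  done
  echo "gcloud compute firewall-rules create $GCP_RULE --allow $allow --source-ranges $PEER_CIDR"
}

print_aws_rules
print_gcp_rule
```

Repeat with one CIDR per remote node, or a wider range if all your nodes sit behind known address blocks.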
Configure Multi Datacenter Cluster:
In our setup, 2 is the minimum number of nodes needed to form the cluster, and 2 nodes must also stay up at runtime to keep the cluster running. First stop the neo4j service and delete the store data (/var/lib/neo4j/data/databases/graph.db/), then update the following values in the /etc/neo4j/neo4j.conf file.
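As a rough sketch of what those values could look like for a 2-node core cluster, the snippet below prints a set of settings for one node so you can review and merge them into neo4j.conf by hand. It assumes the Neo4j 3.x Causal Clustering setting names (Enterprise Edition); the IP addresses and the group name `core` are hypothetical placeholders.

```shell
#!/usr/bin/env sh
# Hypothetical addresses of the two core nodes.
CORE_1="10.0.0.1"
CORE_2="10.0.0.2"

# core_conf <this-node-ip> <server-group>
# Prints core settings using the ports listed earlier (5000/6000/7000/6362).
core_conf() {
  my_ip="$1"
  group="$2"
  cat <<EOF
dbms.mode=CORE
dbms.connectors.default_listen_address=0.0.0.0
dbms.connectors.default_advertised_address=${my_ip}
causal_clustering.minimum_core_cluster_size_at_formation=2
causal_clustering.minimum_core_cluster_size_at_runtime=2
causal_clustering.initial_discovery_members=${CORE_1}:5000,${CORE_2}:5000
causal_clustering.discovery_advertised_address=${my_ip}:5000
causal_clustering.transaction_advertised_address=${my_ip}:6000
causal_clustering.raft_advertised_address=${my_ip}:7000
causal_clustering.multi_dc_license=true
causal_clustering.server_groups=${group}
dbms.backup.enabled=true
dbms.backup.address=0.0.0.0:6362
EOF
}

# Settings for the first core node; run again with CORE_2 for the second.
core_conf "$CORE_1" "core"
```

The two `minimum_core_cluster_size_*` lines are what encode the "2 nodes to form, 2 nodes at runtime" rule described above.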
Start Neo4j
Now start the neo4j service on both nodes; they will form a 2-node cluster.
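The stop, clean and start sequence for a fresh core node can be sketched as below. It assumes the service is managed by systemd under the name `neo4j`, which may differ on your machines. Because deleting the store is destructive, the sketch only prints the commands unless you set `DRY_RUN=0`.

```shell
#!/usr/bin/env sh
# Store path as given in this post.
STORE_DIR="/var/lib/neo4j/data/databases/graph.db"

# With DRY_RUN=1 (the default) commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Stop the service, remove the old store, then start again.
# Edit /etc/neo4j/neo4j.conf between the stop and the start.
reset_and_start() {
  run sudo systemctl stop neo4j
  run sudo rm -rf "$STORE_DIR"
  run sudo systemctl start neo4j
}

reset_and_start
```

Run it on each of the two core nodes. The cluster only forms once both are up, so the first node will wait for the second.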
Adding the Replica (us-central)
Edit the neo4j.conf file on nodes 3 and 4.
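A minimal sketch of the replica settings, again assuming Neo4j 3.x Causal Clustering setting names: the snippet prints the lines for one read replica tagged with the `us-central` group. All IP addresses are hypothetical placeholders.

```shell
#!/usr/bin/env sh
# Hypothetical addresses of the two core nodes the replica discovers.
CORE_1="10.0.0.1"
CORE_2="10.0.0.2"

# replica_conf <this-node-ip> <server-group>
replica_conf() {
  my_ip="$1"
  group="$2"
  cat <<EOF
dbms.mode=READ_REPLICA
dbms.connectors.default_listen_address=0.0.0.0
dbms.connectors.default_advertised_address=${my_ip}
causal_clustering.initial_discovery_members=${CORE_1}:5000,${CORE_2}:5000
causal_clustering.transaction_advertised_address=${my_ip}:6000
causal_clustering.multi_dc_license=true
causal_clustering.server_groups=${group}
EOF
}

# Node 3 (hypothetical IP); run again with node 4's IP.
replica_conf "10.1.0.3" "us-central"
```

The `server_groups` line is the label that makes this node part of the us-central data center from the cluster's point of view.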
Adding the Replica (us-east-1)
Edit the neo4j.conf file on nodes 5 and 6.
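Besides the usual read replica settings (`dbms.mode=READ_REPLICA` and the discovery members), the group-specific lines for these nodes could look like the sketch below. It assumes Neo4j 3.x setting names and also shows the optional upstream selection strategy, which controls which server group a replica pulls transactions from. The upstream group name `core` is a hypothetical example; pick a group that contains servers with current data.

```shell
#!/usr/bin/env sh
# group_conf <own-group> <upstream-group>
# Prints the multi-datacenter lines for a read replica.
group_conf() {
  own_group="$1"
  upstream_group="$2"
  cat <<EOF
causal_clustering.multi_dc_license=true
causal_clustering.server_groups=${own_group}
causal_clustering.upstream_selection_strategy=connect-randomly-to-server-group
causal_clustering.connect-randomly-to-server-group=${upstream_group}
EOF
}

# Nodes 5 and 6 belong to us-east-1 and pull from the (hypothetical) core group.
group_conf "us-east-1" "core"
```

If you leave out the two strategy lines, the replica falls back to Neo4j's default upstream selection.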
Start the Replica nodes:
Now start the neo4j service on all the read replica nodes, and then check the cluster status.
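One way to check the status is the `dbms.cluster.overview()` procedure through cypher-shell, sketched below. The bolt address and user are hypothetical placeholders, and cypher-shell will prompt for the password.

```shell
#!/usr/bin/env sh
# Hypothetical connection details: point this at any core node.
CORE_ADDR="bolt://10.0.0.1:7687"
NEO4J_USER="neo4j"
QUERY="CALL dbms.cluster.overview();"

# Run the overview procedure, or print the command if cypher-shell is missing.
cluster_overview() {
  if command -v cypher-shell >/dev/null 2>&1; then
    cypher-shell -a "$CORE_ADDR" -u "$NEO4J_USER" "$QUERY"
  else
    echo "cypher-shell not found; would run: cypher-shell -a $CORE_ADDR -u $NEO4J_USER \"$QUERY\""
  fi
}

cluster_overview
```

In the result, the role column should show one LEADER, one FOLLOWER and four READ_REPLICA entries, and the groups column should show the data center label of each node.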
Our multi-datacenter cluster is ready. This post is just a simple configuration guide for a multi-datacenter design; I didn’t cover best practices here. Maybe I’ll write a new blog for that.