Migration from AWS ElastiCache Part 2
In the previous article, we explained how to migrate from AWS ElastiCache with cluster mode disabled to Montplex Cache with cluster mode disabled.
This article will focus on how to migrate from AWS ElastiCache with cluster mode enabled to Montplex Cache with cluster mode enabled.
Preparation
Prepare the Target Instance
In the VPC where AWS ElastiCache is located, create a cluster mode Montplex Cache by referring to the Montplex Cache creation process. It is recommended to choose the same availability zone as the ElastiCache instance to reduce cross-availability-zone traffic during migration.
Prepare the Migration Scheduling Machine
Prepare an EC2 machine in the VPC where AWS ElastiCache is located, and make sure it can be accessed remotely. A configuration of 4 vCPUs and 8 GB of memory (4C8G) or higher is recommended. It is also recommended to place it in the same availability zone as the ElastiCache instance and the target Montplex Cache instance.
Install the Migration Tool
This article uses the Redis-shake tool for migration: https://github.com/tair-opensource/RedisShake
Download and extract the migration tool:
wget https://github.com/tair-opensource/RedisShake/releases/download/v4.1.0/redis-shake-linux-amd64.tar.gz
tar -zxf redis-shake-linux-amd64.tar.gz
Determine Migration Type
Migration Types
Migrating from instance to instance can be divided into online full migration and online full + incremental migration.
Online Full Migration: Migrates data that already exists on the source side; new data will not be migrated.
Online Full + Incremental Migration: Migrates both existing data and new data.
Migration Differences
Because ElastiCache restricts some commands (e.g., config, sync, psync), data cannot be migrated directly through master-slave replication, so scan mode must be used. To migrate incremental data in scan mode, the notify-keyspace-events parameter must be enabled on the source instance.
In summary:
Online Full Migration: Does not require modifications to the source instance configuration file.
Online Full + Incremental Migration: Requires modifications to the source instance configuration file.
To achieve online full + incremental data migration, the notify-keyspace-events parameter of the source Redis must be modified.
To modify ElastiCache parameters, refer to: How to Change the AWS ElastiCache Parameter
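As an alternative to the console, the parameter can also be changed from the command line. The sketch below only prints the AWS CLI command so it can be reviewed before running. The parameter group name `my-cluster-params` is hypothetical, and the value `AE` (all key-event notifications) is an assumption that should be checked against the Redis-shake documentation for your version. Note that the parameter has to be changed in a custom parameter group attached to the cluster, since default parameter groups cannot be modified.

```shell
#!/usr/bin/env bash
# Dry run: print the AWS CLI command that would set notify-keyspace-events.
# Arguments:
#   $1  name of the custom parameter group (hypothetical in this example)
#   $2  notification flags, defaults to "AE" (an assumption, verify first)
ksn_enable_command() {
  local param_group="$1"
  local value="${2:-AE}"
  printf 'aws elasticache modify-cache-parameter-group --cache-parameter-group-name %s --parameter-name-values "ParameterName=notify-keyspace-events,ParameterValue=%s"\n' \
    "$param_group" "$value"
}

# Print the command for review; copy and run it manually if it looks right.
ksn_enable_command "my-cluster-params"
```

Printing instead of executing keeps the change to the source instance an explicit, reviewed step.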
Configure the Migration Tool
Migration Modes
Redis-shake provides three modes: restore, sync, and scan.
- Restore Mode: Mainly used for offline recovery of RDB files.
- Sync Mode: Uses sync, psync, etc. commands, mainly used for master-slave replication scenarios.
- Scan Mode: Uses the scan method, mainly for cloud Redis instances that cannot use config, sync, psync, etc. commands.
Because ElastiCache restricts commands such as config, sync, and psync, only scan mode can be used.
Configuration File
The Redis-shake tool provides a default configuration file, shake.toml, which includes settings for the three migration modes.
Generating Migration Configuration File (Scan Mode)
Full Data Migration Configuration File:
When migrating in cluster mode, set cluster = true on the source side. On the target side, we use a proxy with custom sharding logic, so set cluster = false.
function = ""

[scan_reader]
cluster = true # set to true if source is a redis cluster
address = "cluster.1o85i0.clustercfg.apse1.cache.amazonaws.com:6379" # when cluster is true, set address to one of the cluster nodes
username = "" # keep empty if not using ACL
password = "" # keep empty if no authentication is required
tls = false
dbs = [] # dbs to scan, e.g. [1,5,7]; leave empty to scan all
scan = true # set to false if you don't want to scan keys
ksn = false # set to true to enable Redis keyspace notifications (KSN) subscription
count = 50 # number of keys to scan per iteration

[redis_writer]
cluster = false # set to true if target is a redis cluster
sentinel = false # set to true if target is a redis sentinel
master = "" # set to master name if target is a redis sentinel
address = "k8s-proxy-engulapr-995a2c0196-ea82fd9391768799.elb.us-east-1.amazonaws.com:8125" # when cluster is true, set address to one of the cluster nodes
username = "" # keep empty if not using ACL
password = "40125694011253209205287240801039" # keep empty if no authentication is required
tls = false
off_reply = false # turn off the server reply

[advanced]
dir = "data"
ncpu = 0 # runtime.GOMAXPROCS, 0 means use runtime.NumCPU() cpu cores
pprof_port = 0 # pprof port, 0 means disable
status_port = 0 # status port, 0 means disable

# log
log_file = "shake.log"
log_level = "info" # debug, info or warn
log_interval = 5 # in seconds

rdb_restore_command_behavior = "rewrite" # panic, rewrite or skip

pipeline_count_limit = 1024

target_redis_client_max_querybuf_len = 1024_000_000

target_redis_proto_max_bulk_len = 512_000_000

aws_psync = ""
empty_db_before_sync = false
Full + Incremental Data Migration Configuration File:
function = ""

[scan_reader]
cluster = true # set to true if source is a redis cluster
address = "cluster.1o85i0.clustercfg.apse1.cache.amazonaws.com:6379" # when cluster is true, set address to one of the cluster nodes
username = "" # keep empty if not using ACL
password = "" # keep empty if no authentication is required
tls = false
dbs = [] # dbs to scan, e.g. [1,5,7]; leave empty to scan all
scan = true # set to false if you don't want to scan keys
ksn = true # set to true to enable Redis keyspace notifications (KSN) subscription
count = 50 # number of keys to scan per iteration

[redis_writer]
cluster = false # set to true if target is a redis cluster
sentinel = false # set to true if target is a redis sentinel
master = "" # set to master name if target is a redis sentinel
address = "k8s-proxy-engulapr-995a2c0196-ea82fd9391768799.elb.us-east-1.amazonaws.com:8125" # when cluster is true, set address to one of the cluster nodes
username = "" # keep empty if not using ACL
password = "40125694011253209205287240801039" # keep empty if no authentication is required
tls = false
off_reply = false # turn off the server reply

[advanced]
dir = "data"
ncpu = 0 # runtime.GOMAXPROCS, 0 means use runtime.NumCPU() cpu cores
pprof_port = 0 # pprof port, 0 means disable
status_port = 0 # status port, 0 means disable

# log
log_file = "shake.log"
log_level = "info" # debug, info or warn
log_interval = 5 # in seconds

rdb_restore_command_behavior = "rewrite" # panic, rewrite or skip

pipeline_count_limit = 1024

target_redis_client_max_querybuf_len = 1024_000_000

target_redis_proto_max_bulk_len = 512_000_000

aws_psync = ""
empty_db_before_sync = false
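The two configuration files differ only in the ksn setting of the [scan_reader] section, so the incremental variant can be derived from the full one instead of being maintained by hand. The following is a minimal sketch; the function name and the file names in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Derive the full + incremental config from the full-migration config.
# Only the "ksn = false" line is rewritten; every other line is copied as-is.
#   $1  path to the full-migration config
#   $2  path of the config to generate
make_incremental_config() {
  local src="$1"
  local dst="$2"
  sed 's/^ksn = false/ksn = true/' "$src" > "$dst"
}
```

For example, `make_incremental_config scan-full.toml scan-incr.toml` writes a copy of scan-full.toml with keyspace notification subscription turned on.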
Execute Migration
Migrate Full Data
Note: When migrating full data, do not enable Redis keyspace notifications (as it affects business performance). Set ksn = false.
./redis-shake scan.toml >shake.log 2>&1 &
ubuntu@ip-10-0-1-211:~$
ubuntu@ip-10-0-1-211:~$ tail -f shake.log
2024-06-18 09:23:35 INF GOMAXPROCS defaults to the value of runtime.NumCPU [2]
2024-06-18 09:23:35 INF not set pprof port
2024-06-18 09:23:35 INF address=cluster.0d6hxg.clustercfg.use1.cache.amazonaws.com:6379, reply=a8ac90080a0842e4e410f6d2c4a46905371f053f 10.0.1.143:6379@1122 master - 0 1718702615661 2 connected 0-5461
ee0dcd4a6c7494a94a7209bdd3b363a07148c250 10.0.1.29:6379@1122 master - 0 1718702614000 0 connected 10923-16383
cb1a26882039429b3411b1498b2874a3543c63dc 10.0.1.156:6379@1122 myself,master - 0 1718702615000 1 connected 5462-10922
2024-06-18 09:23:35 INF create ScanClusterReader: cluster.0d6hxg.clustercfg.use1.cache.amazonaws.com:6379
2024-06-18 09:23:35 INF create RedisStandaloneWriter: k8s-proxy-engulapr-995a2c0196-ea82fd9391768799.elb.us-east-1.amazonaws.com:8125
2024-06-18 09:23:35 INF not set status port
2024-06-18 09:23:35 INF start syncing...
2024-06-18 09:23:40 INF read_count=[364807], read_ops=[73057.19], write_count=[364807], write_ops=[73057.19], src-1, scan_dbid=[0], scan_percent=[74.44%], need_update_count=[94936]
2024-06-18 09:23:45 INF [reader_10.0.1.143_6379] scanStandaloneReader dump finished.
2024-06-18 09:23:45 INF [reader_10.0.1.143_6379] scanStandaloneReader restore finished.
2024-06-18 09:23:45 INF read_count=[803637], read_ops=[95852.42], write_count=[803636], write_ops=[95852.42], src-2, need_update_count=[17408]
2024-06-18 09:23:46 INF [reader_10.0.1.29_6379] scanStandaloneReader dump finished.
2024-06-18 09:23:46 INF [reader_10.0.1.156_6379] scanStandaloneReader dump finished.
2024-06-18 09:23:46 INF [reader_10.0.1.29_6379] scanStandaloneReader restore finished.
2024-06-18 09:23:46 INF [reader_10.0.1.156_6379] scanStandaloneReader restore finished.
2024-06-18 09:23:46 INF all done
After the full data migration is complete, the task will exit automatically.
For a source cluster with 3 shards, you will see 3 dump and 3 restore processes for the shards.
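Rather than reading the log by eye, the completion condition described above can be checked with a small script. This is a rough sketch that relies only on the message texts visible in the log excerpt ("dump finished", "restore finished", "all done"); the function name is hypothetical, and the messages may differ in other Redis-shake versions.

```shell
#!/usr/bin/env bash
# Check whether a redis-shake log shows a completed full migration.
#   $1  path to the log file
#   $2  number of shards in the source cluster
# Prints "complete" and returns 0 when every shard has finished its dump and
# restore and the final "all done" line is present; returns 1 otherwise.
full_migration_done() {
  local log="$1"
  local shards="$2"
  local dumps restores
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
  dumps=$(grep -c 'dump finished' "$log" || true)
  restores=$(grep -c 'restore finished' "$log" || true)
  if [ "$dumps" -eq "$shards" ] && [ "$restores" -eq "$shards" ] \
      && grep -q 'all done' "$log"; then
    echo "complete"
  else
    echo "incomplete: dumps=$dumps restores=$restores"
    return 1
  fi
}
```

For the 3-shard cluster in this article, `full_migration_done shake.log 3` would be the matching call.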
Migrate Full + Incremental Data
Note: When migrating incremental data, you need to enable Redis keyspace notifications (be aware that this affects business performance). Set ksn = true.
./redis-shake scan.toml >shake.log 2>&1 &
[1] 2157
ubuntu@ip-10-0-1-211:~$
ubuntu@ip-10-0-1-211:~$ tail -f shake.log
2024-06-18 09:38:17 INF GOMAXPROCS defaults to the value of runtime.NumCPU [2]
2024-06-18 09:38:17 INF not set pprof port
2024-06-18 09:38:17 INF address=cluster.0d6hxg.clustercfg.use1.cache.amazonaws.com:6379, reply=a8ac90080a0842e4e410f6d2c4a46905371f053f 10.0.1.143:6379@1122 master - 0 1718703497249 2 connected 0-5461
ee0dcd4a6c7494a94a7209bdd3b363a07148c250 10.0.1.29:6379@1122 myself,master - 0 1718703495000 0 connected 10923-16383
cb1a26882039429b3411b1498b2874a3543c63dc 10.0.1.156:6379@1122 master - 0 1718703496244 1 connected 5462-10922
2024-06-18 09:38:17 INF create ScanClusterReader: cluster.0d6hxg.clustercfg.use1.cache.amazonaws.com:6379
2024-06-18 09:38:17 INF create RedisStandaloneWriter: k8s-proxy-engulapr-995a2c0196-ea82fd9391768799.elb.us-east-1.amazonaws.com:8125
2024-06-18 09:38:17 INF not set status port
2024-06-18 09:38:17 INF start syncing...
2024-06-18 09:38:22 INF read_count=[415289], read_ops=[86999.06], write_count=[415288], write_ops=[86999.06], src-1, scan_dbid=[0], scan_percent=[58.20%], need_update_count=[31592]
2024-06-18 09:38:27 INF read_count=[822866], read_ops=[81327.50], write_count=[822865], write_ops=[81328.50], src-2, need_update_count=[5215]
2024-06-18 09:38:32 INF read_count=[849902], read_ops=[0.00], write_count=[849902], write_ops=[0.00], src-0, need_update_count=[0]
2024-06-18 09:38:37 INF read_count=[849902], read_ops=[0.00], write_count=[849902], write_ops=[0.00], src-1, need_update_count=[0]
2024-06-18 09:38:42 INF read_count=[849902], read_ops=[0.00], write_count=[849902], write_ops=[0.00], src-2, need_update_count=[0]
2024-06-18 09:38:47 INF read_count=[849902], read_ops=[0.00], write_count=[849902], write_ops=[0.00], src-0, need_update_count=[0]
2024-06-18 09:39:52 INF read_count=[849914], read_ops=[0.00], write_count=[849914], write_ops=[0.00], src-1, need_update_count=[0]
2024-06-18 09:39:57 INF read_count=[880006], read_ops=[23183.41], write_count=[880005], write_ops=[23183.41], src-2, need_update_count=[27842]
2024-06-18 09:40:02 INF read_count=[995157], read_ops=[24200.85], write_count=[995156], write_ops=[24200.85], src-0, need_update_count=[99997]
2024-06-18 09:40:07 INF read_count=[1147592], read_ops=[29175.81], write_count=[1147591], write_ops=[29175.81], src-1, need_update_count=[99979]
2024-06-18 09:40:12 INF read_count=[1502537], read_ops=[69791.74], write_count=[1502536], write_ops=[69791.74], src-2, need_update_count=[82408]
2024-06-18 09:40:17 INF read_count=[1771719], read_ops=[22353.05], write_count=[1771719], write_ops=[22354.05], src-0, need_update_count=[0]
2024-06-18 09:40:22 INF read_count=[1771719], read_ops=[0.00], write_count=[1771719], write_ops=[0.00], src-1, need_update_count=[0]
After the full data has been migrated, a full + incremental migration task does not exit automatically; it keeps running and waits for event notifications.
Note: In the log above, the entries from 09:38:32 to 09:38:47 show that no incremental data is being generated. Starting from the 09:39:52 entry, we simulate new incremental data.
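Because the incremental task never exits on its own, it helps to have a way to tell when the target has caught up with the source, for example before switching business traffic. The sketch below treats the latest status line as "caught up" when read_count equals write_count and need_update_count is 0. That criterion is an assumption based on the idle entries in the log above, not a rule documented by Redis-shake, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Extract the integer inside NAME=[...] from one redis-shake status line.
#   $1  the log line
#   $2  the field name, e.g. read_count
status_field() {
  printf '%s\n' "$1" | sed -n "s/.*$2=\[\([0-9]*\)\].*/\1/p"
}

# Return 0 if the most recent status line in the log looks caught up:
# read_count == write_count and need_update_count == 0.
#   $1  path to the log file
sync_caught_up() {
  local line r w n
  line=$(grep 'read_count=' "$1" | tail -n 1)
  r=$(status_field "$line" read_count)
  w=$(status_field "$line" write_count)
  n=$(status_field "$line" need_update_count)
  [ -n "$r" ] && [ "$r" -eq "$w" ] && [ "$n" -eq 0 ]
}
```

A single idle status line is a weak signal while the source is still receiving writes, so in practice the check would be repeated over several log intervals.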
Simulate New Data from the Source
We write nearly 1 million data entries to ElastiCache in cluster mode:
redis-benchmark -h cluster.0d6hxg.clustercfg.use1.cache.amazonaws.com -p 6379 -t set -n 1000000 -r 1000000 -d 150 --cluster
The redis-shake log output will show that new incremental data has been generated:
2024-06-18 09:39:52 INF read_count=[849914], read_ops=[0.00], write_count=[849914], write_ops=[0.00], src-1, need_update_count=[0]
2024-06-18 09:39:57 INF read_count=[880006], read_ops=[23183.41], write_count=[880005], write_ops=[23183.41], src-2, need_update_count=[27842]
2024-06-18 09:40:02 INF read_count=[995157], read_ops=[24200.85], write_count=[995156], write_ops=[24200.85], src-0, need_update_count=[99997]
2024-06-18 09:40:07 INF read_count=[1147592], read_ops=[29175.81], write_count=[1147591], write_ops=[29175.81], src-1, need_update_count=[99979]
2024-06-18 09:40:12 INF read_count=[1502537], read_ops=[69791.74], write_count=[1502536], write_ops=[69791.74], src-2, need_update_count=[82408]
2024-06-18 09:40:17 INF read_count=[1771719], read_ops=[22353.05], write_count=[1771719], write_ops=[22354.05], src-0, need_update_count=[0]
2024-06-18 09:40:22 INF read_count=[1771719], read_ops=[0.00], write_count=[1771719], write_ops=[0.00], src-1, need_update_count=[0]
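To relate the log to the benchmark run, the growth in read_count can be computed directly from the status lines. In the excerpt above it rises from 849914 to 1771719, a difference of 921805 entries, which matches the "nearly 1 million" writes. The following sketch does this subtraction for any log file; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print how many entries redis-shake read between the first and the last
# status line of a log (last read_count minus first read_count).
#   $1  path to the log file
read_count_delta() {
  sed -n 's/.*read_count=\[\([0-9]*\)\].*/\1/p' "$1" |
    awk 'NR == 1 { first = $1 } { last = $1 } END { if (NR) print last - first }'
}
```

Running it on a log that contains only the excerpt above prints 921805.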