MySQL Master–Slave Replication: Setup, Troubleshooting & Best Practices
Easy-to-follow, production-ready guide with commands, checks and pro tips for healthy replication.
Why use Master–Slave Replication?
- Read scaling: Offload read traffic to one or more slaves.
- Backups & failover: Use slaves for safe backups or emergency promotion.
- Analytics & testing: Run heavy reporting queries on slaves.
- High availability foundation: Base for HA and multi-region setups.
Prepare your environment
Example IPs used in commands: MASTER=192.168.0.101, SLAVE=192.168.0.102. Both servers should have network connectivity and matching timezone settings. Use the same or compatible MySQL major versions (e.g., both MySQL 8.x).
Configure the Master
1. Edit MySQL config (my.cnf / mysqld.cnf)
# On master
[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = ROW
binlog_do_db = mydatabase
expire_logs_days = 7
2. Restart MySQL
sudo systemctl restart mysql
3. Create replication user
-- run on master
CREATE USER 'replica'@'%' IDENTIFIED BY 'StrongPassword!';
GRANT REPLICATION SLAVE ON *.* TO 'replica'@'%';
FLUSH PRIVILEGES;
4. Lock tables and capture coordinates
Keep the session open until the dump completes.
USE mydatabase;
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
-- Note the File and Position values (e.g. mysql-bin.000001 and 12345)
5. Export the database
mysqldump -u root -p --single-transaction --databases mydatabase > mydatabase.sql
-- After dump completes, in the locked session:
UNLOCK TABLES;
Configure the Slave
1. Edit slave config
# On slave
[mysqld]
server-id = 2
relay-log = /var/log/mysql/mysql-relay-bin.log
read_only = 1
2. Restart MySQL and import dump
sudo systemctl restart mysql
mysql -u root -p < mydatabase.sql
3. Point slave to master coordinates
-- run on slave, replace MASTER_LOG_FILE and MASTER_LOG_POS with values from master
CHANGE MASTER TO
MASTER_HOST='192.168.0.101',
MASTER_USER='replica',
MASTER_PASSWORD='StrongPassword!',
MASTER_LOG_FILE='mysql-bin.000001',
MASTER_LOG_POS=12345,
GET_MASTER_PUBLIC_KEY = 1;
START SLAVE;
4. Verify
SHOW SLAVE STATUS\G
-- Look for:
-- Slave_IO_Running: Yes
-- Slave_SQL_Running: Yes
-- Seconds_Behind_Master: 0 (or small)
Verify & Monitor
Regularly check replication health:
SHOW SLAVE STATUS\G— the primary single-check.- Monitor
Seconds_Behind_Masterto detect lag. - Use tools: pt-heartbeat (Percona Toolkit), Prometheus + mysqld_exporter, or managed DB monitoring.
Key Troubleshooting
1. Slave_IO_Running = No
Cause: network, auth or master info mismatch.
-- On slave:
SHOW SLAVE STATUS\G
-- Check Last_IO_Error for details
-- Test connectivity:
telnet 192.168.0.101 3306
mysql -h 192.168.0.101 -u replica -p
2. Slave_SQL_Running = No
Cause: SQL error applying relay event (schema mismatch or invalid statement).
SHOW SLAVE STATUS\G
-- Inspect Last_SQL_Error.
-- If safe to skip one offending statement:
STOP SLAVE;
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
START SLAVE;
Use skipping cautiously — skipping may cause data divergence. Investigate root cause first.
3. Replication lag
Common fixes: increase slave resources (CPU, I/O), tune queries, avoid long write transactions on master, and consider multi-threaded slave (MySQL 5.6+ supports slave_parallel_workers).
4. Binary log retention & re-sync
Ensure master retains binlogs long enough for slow/slipped slaves. Set expire_logs_days or manage with retention policy.
Pro Tips for Production
- Set
binlog_format=ROWfor safer replication of complex statements. - Use
--single-transactionwithmysqldumpfor InnoDB consistent snapshot. - Monitor and alert on
Slave_IO_Running,Slave_SQL_Running, andSeconds_Behind_Master. - Test failover & promotions in a staging environment before relying on automatic failovers.
- Consider GTID-based replication for simpler failover and reconfiguration (GTID_MODE=ON).
- Document master coordinates and keep secure access for the replication user.
FAQ
- Q: Can I use this for multi-master?
- A: This guide covers single-master, multi-slave replication. Multi-master needs additional conflict-handling solutions (e.g., Galera Cluster).
- Q: What if slave lags frequently?
- A: Profile slow queries on slave, increase I/O throughput, scale reads across more slaves, or use parallel replication (slave_parallel_workers).
- Q: Is GTID better?
- A: GTIDs simplify failover and reconfiguration but require understanding GTID semantics. Many production setups benefit from GTID once tested.