Replication stop – Relay log file corrupt
Okay, this is the second time this problem has come up, so I'm writing it down here in case it happens again in the near future. Replication on the slave server simply stopped working with this error:
Could not parse relay log event entry. The possible reasons are: the master’s binary log is corrupted (you can check this by running ‘mysqlbinlog’ on the binary log), the slave’s relay log is corrupted (you can check this by running ‘mysqlbinlog’ on the relay log), a network problem, or a bug in the master’s or slave’s MySQL code. If you want to check the master’s binary log or slave’s relay log, you will be able to know their names by issuing ‘SHOW SLAVE STATUS’ on this slave.
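The error message itself suggests the first check: run mysqlbinlog over the master's binary log and over the slave's relay log and see which one cannot be read. A minimal sketch of that check follows. It assumes mysqlbinlog exits with a non-zero status when it hits an event it cannot parse. The function name check_log and the MYSQLBINLOG override variable are made up for this example and are not MySQL settings.

```shell
#!/bin/sh
# Sketch: report whether mysqlbinlog can read a log file from start to end.
# MYSQLBINLOG is a hypothetical override for the client binary; by default
# the real mysqlbinlog found on PATH is used.
check_log() {
    log_file=$1
    # The decoded events are discarded; only the exit status is of interest.
    if "${MYSQLBINLOG:-mysqlbinlog}" "$log_file" > /dev/null 2>&1; then
        echo "OK: $log_file"
    else
        echo "UNREADABLE: $log_file"
        return 1
    fi
}

# Example calls, using the file names from this post:
#   on the master:  check_log /var/lib/mysql/replication.011
#   on the slave:   check_log /var/lib/mysql/dragon-relay-bin.183
```

If the master's file reads cleanly and the slave's relay log does not, only the slave side needs fixing.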
After some googling, I checked several parameters:
on master server:
mysql> show master status\G;
*************************** 1. row ***************************
File: replication.011
Position: 130848689
Binlog_do_db:
Binlog_ignore_db:
1 row in set (0.00 sec)
on slave server:
mysql> show slave status\G;
*************************** 1. row ***************************
Master_Host: 10.10.105.11
Master_User: repl
Master_Port: 3306
Connect_retry: 60
Master_Log_File: replication.011
Read_Master_Log_Pos: 91831752
Relay_Log_File: dragon-relay-bin.183
Relay_Log_Pos: 3551175
Relay_Master_Log_File: replication.011
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_do_db:
Replicate_ignore_db:
Last_errno: 0
Last_error: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Skip_counter: 0
Exec_master_log_pos: 74359528
Relay_log_space: 21023399

[root@dragon mysql]# tail -f dragon.rpxholding.com.err
081129 0:54:24 Error in Log_event::read_log_event(): 'read error', data_len: 148, event_type: 2
081129 0:54:24 Error reading relay log event: slave SQL thread aborted because of I/O error
081129 0:54:24 Slave: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 0
Both the slave I/O and SQL threads are stopped. What caused the problem? The binlog file on the master is not corrupted, which is very good news. The relay log file on the slave, however, is corrupted, and that is why replication can't read it.
Fixing the problem:
on slave server:
mysql> change master to master_log_file='replication.011',master_log_pos=74359676;
mysql> slave start;
mysql> show slave status\G;
*************************** 1. row ***************************
Master_Host: 10.10.105.11
Master_User: repl
Master_Port: 3306
Connect_retry: 60
Master_Log_File: replication.011
Read_Master_Log_Pos: 135111081
Relay_Log_File: dragon-relay-bin.001
Relay_Log_Pos: 3204826
Relay_Master_Log_File: replication.011
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_do_db:
Replicate_ignore_db:
Last_errno: 0
Last_error:
Skip_counter: 0
Exec_master_log_pos: 77564456
Relay_log_space: 60751451
1 row in set (0.00 sec)
ERROR: No query specified
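The file name and position given to "change master to" are based on fields of the "show slave status" output. As a rough sketch, those fields could be pulled out of saved output with a small helper like the one below. The function name slave_status_field and the file status.txt are invented for this example. Field names are matched case-insensitively because their capitalisation differs between MySQL versions.

```shell
#!/bin/sh
# Sketch: print the value of one field from "show slave status\G" output
# read on stdin. $1 is the field name as the mysql client prints it.
slave_status_field() {
    awk -v name="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # drop the leading alignment spaces
            prefix = tolower(name) ":"
            # The field name must start the line, so Master_Log_File
            # does not match the Relay_Master_Log_File line.
            if (index(tolower(line), prefix) == 1) {
                value = substr(line, length(prefix) + 1)
                sub(/^[ \t]+/, "", value)
                print value
                exit
            }
        }'
}

# Example (status.txt is a hypothetical file holding the saved output):
#   mysql -e 'show slave status\G' > status.txt
#   slave_status_field Relay_Master_Log_File < status.txt
#   slave_status_field Exec_master_log_pos   < status.txt
```

With the failed status shown earlier in this post, the two calls would give replication.011 and 74359528.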
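Where do the parameters master_log_file='replication.011' and master_log_pos=74359676 come from? Both are derived from the "show slave status" output:
- master_log_file='replication.011' is the master log file the slave was last working from (Master_Log_File and Relay_Master_Log_File both show it). Fortunately the master was still logging to that same file.
- master_log_pos=74359676 is the position in the master log from which the slave should resume executing. It is derived from Exec_master_log_pos=74359528, by taking the position of the event that follows it. The difference between the two, 148 bytes, matches the data_len: 148 reported in the error log. (A word of caution: the MySQL manual describes Exec_Master_Log_Pos as the position up to which the SQL thread has already executed, so resuming from the following position skips the event that starts at 74359528. Check that event before deciding which of the two positions to use.) So how do I know that the position after 74359528 is 74359676? Simple! I did this: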
# cd /var/lib/mysql
# mysqlbinlog replication.011 > replication.011.txt
# vi replication.011.txt
-- search for 74359528 and the next pos is 74359676
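The manual search in vi could also be scripted. The sketch below assumes the mysqlbinlog text output marks the start of every event with a line of the form "# at N", and prints the offset that follows a given one. The function name next_binlog_pos is made up for this example. Whether to resume from the searched offset or from the one after it depends on whether the event at the searched offset was already applied on the slave.

```shell
#!/bin/sh
# Sketch: read mysqlbinlog text output on stdin and print the "# at N"
# offset that comes right after the offset given as $1.
# Prints nothing if the offset is not found or is the last event.
next_binlog_pos() {
    awk -v target="$1" '
        $1 == "#" && $2 == "at" && NF == 3 {
            if (found) { print $3; exit }    # first event marker after the target
            if ($3 == target) found = 1      # remember that the target was seen
        }'
}

# Example, reusing the text dump created above:
#   next_binlog_pos 74359528 < replication.011.txt
```

This avoids paging through a large dump by hand, but it is still worth looking at the event itself before skipping anything.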
Next question: what caused the relay log file to become corrupted? This is only my assumption, but at the time replication stopped, the slave server was also doing a full backup and transferring a 3.5GB backup file to another server. That may have pushed the NIC to full throttle and corrupted the relay log.
PS: I still don't know what to do if my master binlog files are corrupted. Perhaps I would do a full database mirroring (copying all database files from the master to the slave while MySQL is shut down, so that both servers hold the same data) to ensure data integrity.
References:
- http://sql.dzone.com/news/troubleshooting-relay-log-corr
- http://www.pythian.com/blogs/876/when-show-slave-status-and-the-error-log-disagree
- http://www.mysqlperformanceblog.com/2008/07/07/how-show-slave-status-relates-to-change-master-to/
- http://www.brandonchecketts.com/archives/fixing-a-corrupt-mysql-relay-log
Edit: Feb 14, 2009
Replication stopped again on the slave server. This time, after a thorough investigation, I found out that the server was nearly at full capacity (especially the /var partition). I cleaned up some stuff and got 16GB of free space back. Free space is indeed an important thing that I should check first.
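Since a nearly full /var turned out to be the culprit this time, a free-space check is a cheap thing to run before digging into the logs. The sketch below assumes a df that supports the POSIX -P output format. The function names and the 90 percent threshold are arbitrary choices for this example.

```shell
#!/bin/sh
# Sketch: read "df -P <path>" output on stdin and print the capacity
# column of the first filesystem line as a bare number (no % sign).
df_used_percent() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Warn when the filesystem holding $1 is at least $2 percent full.
# Returns 0 if below the limit, 1 if at or above it, 2 if df gave no usable value.
check_disk() {
    path=$1
    limit=${2:-90}                            # example threshold, adjust to taste
    used=$(df -P "$path" 2>/dev/null | df_used_percent)
    case $used in
        ''|*[!0-9]*) echo "UNKNOWN: could not read usage for $path"; return 2 ;;
    esac
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $path is ${used}% full"
        return 1
    fi
    echo "OK: $path is ${used}% full"
}

# Example:
#   check_disk /var 90
```

Run from cron on the slave, something like this could flag the problem before the relay log runs out of room.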
March 7th, 2009 at 11:38 am
IF (in big, bold font) the time comes, I'll need your help to teach me the replication method
March 7th, 2009 at 4:07 pm
I am sorry man, you'll have to hire me to set up a full clustered MySQL database system with backup.
I can't be bought but I can be hired
j/k
April 15th, 2009 at 1:41 pm
and the time has come for centeng, hue..he.he…
April 22nd, 2009 at 1:44 pm
good luck centeng..!!!