Replication stop – Relay log file corrupt

Okay, this is the second time this problem has come up, so I'll write it down here just in case it happens again in the near future. Replication on the slave server simply stopped working with this error:

Could not parse relay log event entry. The possible reasons are: the master’s binary log is corrupted (you can check this by running ‘mysqlbinlog’ on the binary log), the slave’s relay log is corrupted (you can check this by running ‘mysqlbinlog’ on the relay log), a network problem, or a bug in the master’s or slave’s MySQL code. If you want to check the master’s binary log or slave’s relay log, you will be able to know their names by issuing ‘SHOW SLAVE STATUS’ on this slave.

After some googling, I checked several parameters:

on master server:

mysql> show master status\G;
*************************** 1. row ***************************
File: replication.011
Position: 130848689
Binlog_do_db:
Binlog_ignore_db:
1 row in set (0.00 sec)

on slave server:

mysql> show slave status\G;
*************************** 1. row ***************************
Master_Host: 10.10.105.11
Master_User: repl
Master_Port: 3306
Connect_retry: 60
Master_Log_File: replication.011
Read_Master_Log_Pos: 91831752
Relay_Log_File: dragon-relay-bin.183
Relay_Log_Pos: 3551175
Relay_Master_Log_File: replication.011
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_do_db:
Replicate_ignore_db:
Last_errno: 0
Last_error: Could not parse relay log event entry. The possible
reasons are: the master's binary log is corrupted (you can check
this by running 'mysqlbinlog' on the binary log), the slave's
relay log is corrupted (you can check this by running 'mysqlbinlog'
 on the relay log), a network problem, or a bug in the master's or
slave's MySQL code. If you want to check the master's binary log or
slave's relay log, you will be able to know their names by issuing
'SHOW SLAVE STATUS' on this slave.
Skip_counter: 0
Exec_master_log_pos: 74359528
Relay_log_space: 21023399
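
Rather than eyeballing the whole status dump, the few fields that matter here (Slave_IO_Running, Slave_SQL_Running, Exec_master_log_pos) can be pulled out with a small shell filter. This is only a sketch: the function names slave_field and replication_ok are made up for illustration, and it assumes the status text uses the "Name: value" layout shown above.

```shell
#!/bin/sh
# slave_field NAME: read `SHOW SLAVE STATUS\G` text on stdin and print the
# value of the field called NAME. The match is case-insensitive because the
# capitalisation of field names differs between MySQL versions.
# For a value that wraps over several lines, only the first line is printed.
slave_field() {
    awk -v name="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)              # drop leading indentation
            prefix = tolower(name) ":"
            if (index(tolower(line), prefix) == 1) {
                value = substr(line, length(prefix) + 1)
                sub(/^[ \t]+/, "", value)         # drop the space after ":"
                print value
                exit
            }
        }'
}

# replication_ok: read the same status text on stdin and succeed only if
# both replication threads report "Yes".
replication_ok() {
    status=$(cat)
    [ "$(printf '%s\n' "$status" | slave_field Slave_IO_Running)" = "Yes" ] &&
    [ "$(printf '%s\n' "$status" | slave_field Slave_SQL_Running)" = "Yes" ]
}

# Example use (assumes the mysql client can log in without prompting):
#   mysql -e 'SHOW SLAVE STATUS\G' | slave_field Exec_master_log_pos
#   mysql -e 'SHOW SLAVE STATUS\G' | replication_ok || echo "replication is down"
```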

[root@dragon mysql]# tail -f dragon.rpxholding.com.err
081129  0:54:24 Error in Log_event::read_log_event(): 'read error',
 data_len: 148, event_type: 2 081129  0:54:24 Error reading relay
log event: slave SQL thread aborted because of I/O error 081129 0:54:24
Slave: Could not parse relay log event entry. The possible reasons are:
the master's binary log is corrupted (you can check this by running
'mysqlbinlog' on the binary log),  the slave's relay log is corrupted
(you can check this by running 'mysqlbinlog' on the relay log), a network
problem, or a bug in the master's or slave's MySQL code. If you want to
check the master's binary log or slave's relay log, you will be able to
know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 0
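
The error message itself suggests running mysqlbinlog on both the master's binary log and the slave's relay log. A minimal sketch of that check follows. It assumes mysqlbinlog exits with a non-zero status when it hits an event it cannot parse; log_is_readable and the MYSQLBINLOG override are names I made up for illustration, and the file locations in the example are assumptions too.

```shell
#!/bin/sh
# log_is_readable FILE: succeed if mysqlbinlog can decode FILE from start
# to end. The decoded output is thrown away; only the exit status is used.
# MYSQLBINLOG is a hypothetical override for pointing at a specific binary;
# by default the mysqlbinlog found in PATH is used.
log_is_readable() {
    "${MYSQLBINLOG:-mysqlbinlog}" "$1" > /dev/null 2>&1
}

# Example use (paths are assumptions, adjust to the real data directory):
#   on the master: log_is_readable /var/lib/mysql/replication.011 || echo "binlog is corrupt"
#   on the slave:  log_is_readable /var/lib/mysql/dragon-relay-bin.183 || echo "relay log is corrupt"
```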

Neither the slave I/O thread nor the SQL thread is running. What caused the problem? The binlog file on the master is not corrupted, which is very good news. The relay log file on the slave is corrupted, though, and that is why replication can't read it. Fixing the problem:

on slave server:

mysql> change master to master_log_file='replication.011',master_log_pos=74359676;
mysql> slave start;
mysql> show slave status\G;
*************************** 1. row ***************************
Master_Host: 10.10.105.11
Master_User: repl
Master_Port: 3306
Connect_retry: 60
Master_Log_File: replication.011
Read_Master_Log_Pos: 135111081
Relay_Log_File: dragon-relay-bin.001
Relay_Log_Pos: 3204826
Relay_Master_Log_File: replication.011
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_do_db:
Replicate_ignore_db:
Last_errno: 0
Last_error:
Skip_counter: 0
Exec_master_log_pos: 77564456
Relay_log_space: 60751451
1 row in set (0.00 sec)

ERROR:
No query specified

Where did I get the parameters master_log_file='replication.011' and master_log_pos=74359676? They come from the “show slave status” command:

  • master_log_file='replication.011' is the master log file the slave was last reading, and fortunately the master is still logging to that file too :D (it’s from s)
  • master_log_pos=74359676 is the position of the next query from the master that the slave should execute. It is derived from Exec_master_log_pos=74359528. So how do I know that the position after 74359528 is 74359676? Simple! I do this:
    # cd /var/lib/mysql
    # mysqlbinlog replication.011 > replication.011.txt
    # vi replication.011.txt -- search for 74359528 and the next pos is 74359676
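
The search in vi can also be scripted. The sketch below assumes the mysqlbinlog text dump marks each event with a line of the form "# at N", where N is the event's offset; next_pos and change_master_sql are hypothetical helper names, not MySQL tools. The statement is only printed, so it can be reviewed before being pasted into the mysql prompt on the slave.

```shell
#!/bin/sh
# next_pos POS: read mysqlbinlog text output on stdin and print the first
# event offset greater than POS, taken from the "# at N" marker lines.
# Fails (non-zero exit status) if no such offset is found.
next_pos() {
    awk -v pos="$1" '
        $1 == "#" && $2 == "at" && $3 ~ /^[0-9]+$/ && ($3 + 0) > (pos + 0) {
            print $3
            found = 1
            exit
        }
        END { exit !found }'
}

# change_master_sql FILE POS: print the CHANGE MASTER statement for review.
change_master_sql() {
    printf "CHANGE MASTER TO MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" "$1" "$2"
}

# Example use with the values from this post:
#   pos=$(mysqlbinlog replication.011 | next_pos 74359528) &&
#       change_master_sql replication.011 "$pos"
```

As a sanity check on the numbers above: 74359676 - 74359528 = 148, which matches the data_len: 148 reported in the slave's error log.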

Next question: what caused the relay log file to become corrupted? This is only my assumption: at the time replication stopped, the slave server was also doing a full backup and transferring a 3.5GB backup file to another server. This may have pushed the NIC to full throttle and corrupted the relay log.

PS: I still don't know what to do if my master binlog files are corrupt. Perhaps I would do a full database mirroring (copying all database files from master to slave while MySQL is down, so that both servers have the same data) to ensure data integrity.

Edit: Feb 14, 2009

Replication stopped again on the slave server. This time, after a thorough investigation, I found that the server was nearly at full capacity (especially the /var partition). I cleaned up some stuff and got 16GB of free space. Free space is indeed an important thing I should check first :D
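
Since free space turned out to be the first thing worth checking, here is a minimal sketch of that check using the portable output of df. The function names free_kb and enough_space and the 1GB threshold in the example are assumptions for illustration only.

```shell
#!/bin/sh
# free_kb DIR: print the free space, in 1024-byte blocks, of the filesystem
# that holds DIR. `df -P -k` prints one header line and then one line per
# filesystem, with the available space in the fourth column.
free_kb() {
    df -P -k "$1" | awk 'NR == 2 { print $4 }'
}

# enough_space DIR MIN_KB: succeed if at least MIN_KB kilobytes are free.
enough_space() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

# Example use (threshold is arbitrary):
#   enough_space /var 1048576 || echo "less than 1GB free on /var"
```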


5 Responses to “Replication stop – Relay log file corrupt”

  1. Script to Monitor Replication | am3n portfolio Says:

    [...] About « Replication stop – Relay log file corrupt [...]

  2. centeng Says:

    IF (with the bold and big size fonts ) the time is come,i need your help to teach me with replication method :)

  3. am3n Says:

    I am sorry man, you have to hire me up to setup a full cluster mysql databases system with backup.

    I can’t be buyed but I can be hired :D

    j/k

  4. kumay Says:

    and the time is come for centeng, hue..he.he…

  5. am3n Says:

    good luck centeng..!!!
