Monday, April 11, 2011

Replication Event Checksum

MySQL replication is fast, easy to use, and reliable, but once it breaks, it can be very hard to figure out what the problem is. One of the concerns often raised is that events are corrupted, either through failing hardware, network failure, or software bugs. Even though it is possible to handle errors during transfer over the network using an SSL connection, errors here is rarely the problem. A more common problem (relatively) is that the events are corrupted either due to a software bug, or hardware error.

To be able to better handle corrupted events, the replication team has added replication event checksums to MySQL 5.6 Milestone Development Release. The replication event checksums are added to each event as it is written to the binary log and are used to check that nothing happened with the event on the way to the slave. Since the checksums are added to all events in the binary log on the master and transfered both over the network and written to the relay log on the slave, it is possible to track events corrupted events both because of hardware problems, network failures, and software bugs.

Figure 1. Master and Slave with Threads
The checksum used is a CRC-32 checksum, more precisely ISO-3309, which is the one supplied with zlib. This is an efficient checksum algorithm, but there is of course a penalty since the checksum needs to be generated. At this time, we don't have any measurements on the performance impact.

If you look at Figure 1 you can see an illustration of how events propagate through the replication system. In the figure, the points where a checksum could be generated or checked are marked with numbers. In the diagram, you can see the threads that handle the processing of events, and an outgoing arrow from a thread can generate a checksum while an arrow going into a thread can validate a checksum. Note, however, that for pragmatic reasons not all validations or generations can be done.

To enable validation or generation three new options were introduced:

This option is used to control checksum generation. Currently, it can accept two different values: NONE and CRC32, with NONE being default (for backward compatibility).

Setting binlog_checksum to NONE means that no checksum is generated, while setting it to CRC32 means that an ISO-3309 CRC-32 checksum is added to each binary log event.

This means that a checksum will be generated by the session thread and written to the binary log, that is, at point 1 in Figure 1.

This option can be set to either 0 or 1 (with default being 0) and indicates that the master should verify any events read from the binary log on the master, corresponding to point 2 in Figure 1. In addition to being read from the binary log by the dump thread events are also read when a SHOW BINLOG EVENTS is issued at the master and a check is done at this time as well.

Setting this flag can be useful to verify that the event really written to the binary log is uncorrupted, but it is typically not needed in a replication setting since the slave should verify the event on reception.

Similar to master_verify_checksum, this option can be set to either 0 or 1 (but defaults to 1) and indicates that the SQL thread should verify the checksum when reading it from the relay log on the slave. Note that this means that the I/O thread writes a checksum to the event written to the relay log, regardless of whether it received an event with a checksum or not.

This means that this option will enable verification at point 5 in Figure 1 and also enable generation of a checksum at point 4 in the figure.

If you payed attention, you probably noticed that there is no checking for point 3 in the figure. This is not necessary since the checksum is verified when the event is written to the relay log at point 4, and the I/O thread just does a straight copy of the event (potentially adding a checksum, as noted above).

So, how does it look when we encounter a checksum error? Let's try it out and see what happens. We start by generating a simple binary log with checksums turned on and see what we get.

Query OK, 0 rows affected (0.04 sec)

master> INSERT INTO t1(name) VALUES ('Mats'),('Luis');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

| Log_name          | Pos | Event_type | Server_id | End_log_pos | Info                                                      |
| master-bin.000001 | 261 | Query      |         1 |         333 | BEGIN                                                     |
| master-bin.000001 | 333 | Intvar     |         1 |         365 | INSERT_ID=1                                               |
| master-bin.000001 | 365 | Query      |         1 |         477 | use `test`; INSERT INTO t1(name) VALUES ('Mats'),('Luis') |
| master-bin.000001 | 477 | Query      |         1 |         550 | COMMIT                                                    |
4 rows in set (0.00 sec)
Here, everything looks as before, so no sign of a checksum here, but let's edit the binlog file directly and change the 's' in 'Mats' to a 'z' and see what happens. First with MASTER_VERIFY_CHECKSUM set to 0, and then with it set to 1.
| Log_name          | Pos | Event_type | Server_id | End_log_pos | Info                                                      |
| master-bin.000001 | 261 | Query      |         1 |         333 | BEGIN                                                     |
| master-bin.000001 | 333 | Intvar     |         1 |         365 | INSERT_ID=1                                               |
| master-bin.000001 | 365 | Query      |         1 |         477 | use `test`; INSERT INTO t1(name) VALUES ('Matz'),('Luis') |
| master-bin.000001 | 477 | Query      |         1 |         550 | COMMIT                                                    |
4 rows in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

ERROR 1220 (HY000): Error when executing command SHOW BINLOG EVENTS: Wrong offset or I/O error
Now, the error message generated is not the crystal clear, but there were an I/O error when reading the binary log: the checksum verification failed. You can see this because I could show the content of the binary log with MASTER_VERIFY_CHECKSUM set to 0, but not when set to 1. Since the checksum is checked when reading events from the binary log, we get a checksum failure when using SHOW BINLOG EVENTS.

So, if we restore the error and verify that it is correct by issuing a SHOW BINLOG EVENTS again, we can try to send it over to the slave and see what happens. The steps to do this (in case you want to try yourself) is:

  1. Start the I/O thread and let it create the relay log using START SLAVE IO_THREAD.
  2. Stop the slave using STOP SLAVE (this is necessary since the slave buffers part of the relay log).
  3. Manually edit the relay log to corrupt one event (I replaced the 's' with a 'z'.
  4. Start the slave using START SLAVE.
The result when doing this is an error, as you can see below. Removing the corruption and starting the slave again will apply the events as expected.
*************************** 1. row ***************************
              Master_Log_File: master-bin.000001
          Read_Master_Log_Pos: 550
               Relay_Log_File: slave-relay-bin.000002
                Relay_Log_Pos: 419
        Relay_Master_Log_File: master-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
                Last_IO_Errno: 0
               Last_SQL_Errno: 1594
               Last_SQL_Error: Relay log read failure: Could not parse
                               relay log event entry. The possible
                               reasons are: the master's binary log is
     Last_SQL_Error_Timestamp: 110406 09:41:40
1 row in set (0.00 sec)
Now, this is all very nice, but if you have a corruption, you also want to find out where the corruption is—and that preferably without having to start the server. To handle this, the mysqlbinlog program was extended to print the CRC checksum (if there is one) and also to verify it if you give the verify-binlog-checksum option to it.
$ client/mysqlbinlog --verify-binlog-checksum master-bin.000001
# at 261
#110406  8:35:28 server id 1  end_log_pos 333 CRC32 0xed927ef2  Query   thread_id=1...
SET TIMESTAMP=1302071728/*!*/;
# at 333
#110406  8:35:28 server id 1  end_log_pos 365 CRC32 0x01ed254d  Intvar
ERROR: Error in Log_event::read_log_event(): 'Event crc check failed! Most likely...
# End of log file
ROLLBACK /* added by mysqlbinlog */;
As you can see, an error is emitted for the offending event, and you can also see the CRC checksum value (which is 32 bits) in the output above, and it corresponds to the position where the slave stopped for my corrupted binary log.

This is just the beginning: there are many things that can be done using checksums, and many new things that are now possible to implement. If you think that this is a useful feature, please let us know, and if you think that it needs to be enhanced, changed, or extended, we would also like to hear from you.

Closing remarks

We would be very interested in hearing any comments you have on this feature and how it is implemented. If you want to try this out for yourselves then you can download the MySQL 5.6 Milestone Development Release where all this is implemented from the MySQL Developer Zone (

If you want to find out the details, the reference documentation for the replication checksum can be found together with the options mentioned above:

This is one of the features that are presented by Lars Thalmann today (April 11, 2011) at 2:30pm, at the "MySQL Replication" talk at Collaborate 11 and tomorrow (April 12, 2011) 10:50am "MySQL Replication Update" at the O'Reilly MySQL Conference & Expo.


erkules said...

Thats great! Having checksums with InnoDB already, now checksums on replication closes the gap! thx \o/

repls said...

what the editor do you use to edit the binlog directly? i use the vim to edit the binlog, but without issue the cmd 'SET GLOBAL MASTER_VERIFY_CHECKSUM=1' before run cmd with 'SHOW BINLOG EVENTS', then error occurs:
ERROR 1220 (HY000): Error when executing command SHOW BINLOG EVENTS: Wrong offset or I/O error
that why?

Mats Kindahl said...

The editor I use for editing anything is basically Emacs. I seldom use anything else. In this case, I edited the file in hexl-mode and changed the 's' to a 'z'.