Saturday, July 17, 2010

InnoDB crash recovery after power failure

On the Friday, 16th July 2010. There was a power outage at our data center in LA. After power is restored, MySQL service refused to start. I checked the error log and realized InnoDB has been crashed.
 
  1. The first thing I do is copy the raw database file as backup before doing anything:
    sudo cp -R /var/lib/mysql /var/lib/mysql-crash
     
  2. Edit the MySQL configuration file:
    sudo vi /etc/mysql/my.cnf
  3. Insert the following line in [mysqld] section to force InnoDB recovery mode:
    innodb_force_recovery = 1

    From MySQL manual:
If you are able to dump your tables with an option value of at most 4, then you are relatively safe that only some data on corrupt individual pages is lost.
I tried from mode 1 to 4, but MySQL service refused to start. That's mean I have to accept some losses. I tried mode 5 (SRV_FORCE_NO_UNDO_LOG_SCAN), still no luck. This make me really worry as mode 6 is the last option. If MySQL service doesn't start my last option would be restored yesterday's backup.

Thanks God! Mode 6 (SRV_FORCE_NO_LOG_REDO) works! MySQL service started! Nonetheless, I couldn't query any InnoDb table. Every time  I do, I got this error:
Lost connection to MySQL server during query (XXXX)
When I tried to use mysqldump, I got the following error:
Couldn't execute 'show table status like '__session'': Lost connection to MySQL server during query (XXXX)
I tried to ignore the __session table, but the next table came up with the same problem and so on.

It took me hours to look for solutions on the Internet. Again, the best solution was restore from backup. Another solutions is restore InnoDB from only an .ibd file. Solution from www.chriscalender.com would work well if I enable innodb_file_per_table option. This allowed each InnoDB table to have its own tablespace like MyISAM storage engine. There would be an .ibd file created for every InnoDB table. But it's too late now. I can still extract the data from the .ibd raw file, but it would cost me a lot of time and effort.

I almost gave up!

Until now, I still don't know what made me change the recovery mode to 1 and restart MySQL service. Eureka, MySQL server restarted without any error.

Now, I backup all databases using the following command:
mysqldump --routines --triggers --single-transaction --quick -uroot -p --all-databases > all-dbs-100716.sql

In order to restore, I shutdown MySQL service and renamed the crashed database datadir. I installed the new database structure and restored everything back to just before the crashed using the following commands:
sudo mysql_install_db
mysql -uroot mysql < all-dbs-100716.sql

Thanks God! After the investigation, the corrupted table was __session. Our PHP session data are stored in the database to enable session between sub-domain. Some data from this table is lost, but it doesn't matter at all.

Here are some lessons learned:
  1. Enable innodb_file_per_table. In the worst scenario, we can restore InnoDB from only an .ibd file.
  2. It's good that we backup our database daily. Here is a simple command which can be executed from cron job:
    mysqldump --routines --triggers --single-transaction --quick -uroot -pXXX database-name | gzip -9 > database-name-$(date +%F).sql.gz
  3. Change the InnoDB recovery mode from 1 to 6. If you cannot dump the databases, change back to mode 1. It might work! At least in our case. We already replicate the test with another server with the same crashed data.
  4. When doing the recovery, change MySQL port to any other port instead of 3306. This make sure that all other services cannot access MySQL. No DELETE/UPDATE/INSERT statements can be performed in recovery mode. But it doesn't hurt to take extra precautions.
  5. innodb_log_file_size is recommended to be about 25% of system RAM. Personally, I think this only apply to smaller/older server with less than 4GB RAM. In my case, I have innodb_log_file_size = 1GB even though my system have 16GB RAM. I don't think log file bigger than 1GB will do any benefit. However, setting the right figure would greatly reduce the insert and recovery time by at least 10 folds. Our database is about 2GB with more than 10 millions records. The default innodb_log_file_size is 5MB, which would takes forever to restore. I will do a test on this for the exact figures.
I am glad that I went through all these troubles. My old man told me over and over again that no pain, no gain!

No comments:

Post a Comment