The site crashed yesterday morning at around 9am, although it’s hardly noticable until a couple hours later. I didn’t notice it until I started picking stories to feature on the homepage. Whenever i tried to feature an article, it would stop the site from loading up.
From that, I knew that the MySQL database must be checking for corrupt tables. And if it’s doing that, a crash must have occurred. Checking out stats at my ISP confirmed that the server was restarted at 9am.
When a table’s being checked, it still could be read, so people can still access the site. It’s just when you I tried to update it, it would hold up everything else from running.
Eventually, one of the tables was found to be corrupted and needed to be repaired. When a table’s being repaired, you can’t read or write to it, so it’s down time for the site until it’s fully repaired. That took about three hours.
One of the worst things you can do while a table’s being repaired is to cancel it or restart the server. The entire corrupted table could be deleted, and there’s no way to undo that. I found that out the hard way with a previous site outage. You just have to wait it out, unless you have a backup that you can readily deploy.
Still, three hours is a long time to wait. When I checked a list of running processes, I spotted corrupted table with “repair with keycache.” Apparently, that’s a really slow way to repair a table.
The way to avoid that is to add or modify a couple settings in MySQL: “myisam_max_sort_file_size” and “myisam_max_extra_sort_file_size.”
Set those values to be higher than the size of the largest index in your database.
The index for the corrupt table that was being repaired was about 1GB, and that took three hours. My largest index is 15GB. That would take a couple days to complete if that table gets corrupted. That wouldn’t be fun.
I upgraded my storage space and set the values for the two config settings to 20GB. Should that 15GB table get corrupted, a repair job should only take hours, not days.