BizarroMaybeMySQLBug

Webs: Faemalia -:- Greatprawn -:- Playground -:- Technical -:- Tweak
Technical Web Sections: Register -:- Users -:- Changes -:- Index -:- Search -:- Statistics

The Problem

Here is something interesting. I guess it's not even a question so much as just a Very Strange Incident. There might be a MySQL bug behind it...?

I have a very simple database on 4.1.15 with a single InnoDB table for storing web sessions. We let them stack up over a long time, so now I'm deleting out the earliest rows starting from the minimum expiry value in the DB with queries like this:

select 'Working on 1136370000';delete from sessions where expiry <= 1136370000;
select 'Working on 1136373600';delete from sessions where expiry <= 1136373600;
select 'Working on 1136377200';delete from sessions where expiry <= 1136377200;

The table looks like this:

CREATE TABLE `sessions` (
  `sesskey` varchar(32) NOT NULL default '',
  `expiry` int(11) unsigned NOT NULL default '0',
  `value` text NOT NULL,
  PRIMARY KEY  (`sesskey`),
  KEY `expiry` (`expiry`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Here's what the graph looks like:

So I started the delete at about 17:00 yesterday. It caused "read next" to pick up, which is completely expected when using a range operator "<=" on an indexed column. Then at 01:00, we get spikes of "read rnd next," which indicate table scans, ie: a table for which no index is appropriate. This caused big DB problems. Then at about 15:00 today, I start running the DELETEs again, and all is well.

We are pretty sure this was caused by my DELETEing of old data, because here's the last two weeks (the missing data is an unrelated problem):

Notice that there were no problems until I began running my DELETEs, and then only starting during the LOWEST point of DB usage. There were no changes to the code, there are no other databases in this MySQL instance.

It's as if the "expiry" index became non-existant or unusable for a while. Seems bizarre, no?

Afterward

Changing the number of rows to delete at a time to a much smaller amount (by changing the interval of expiry to 600 rather than 3600) made the deletion of rows go far faster. This makes sense to me at a basic level, because the size of the transaction would be far smaller. I wonder if perhaps the large transaction size caused the table scans for some reason? Perhaps due to what percentage of rows were locked for DELETE?

Peter Zaitsev points out that rather than delete by a time interval, I might just delete a constant number of rows each time, like this:

delete from sessions where expiry <= unix_timestamp(now()) - 259200 limit 1000;
delete from sessions where expiry <= unix_timestamp(now()) - 259200 limit 1000;
delete from sessions where expiry <= unix_timestamp(now()) - 259200 limit 1000;

This seems the best method. You can pick the number of rows to delete each time, say 1,000 or 5,000. Then you're guaranteed to never end up deleting too many rows at one time (which may well have been what caused this issue).


Edit -:- Attach -:- Ref-By -:- Printable -:- More