MERGE table gotcha with PRIMARY KEY

I know that MERGE isn’t everyone’s favourite storage engine but we use them quite extensively and ran into a problem the other day that I thought I’d share.

A common usage pattern is to CREATE TEMPORARY TABLE LIKE an existing MyISAM table and then ALTER TABLE to turn it into a MERGE table and specify it’s UNION. Like this:

CREATE TABLE tmpMerge LIKE a;
ALTER TABLE tmpMerge ENGINE=MERGE, UNION=(a,b);

It’s a pattern we’ve been using for ages and seems to work well in most cases.

In one application we keep aggregated data in daily tables (inspired by this article). Within the application we work out which of the daily tables we need and build them into a temporary merge table before querying it. There’s more to it than that (there’s a pyramid of daily/monthly/yearly tables) but they’re not relevant right now.

You can replicate our usage pattern (and the bug we ran into) with this SQL:

CREATE TABLE clicks_20101004 (
   user_id INT,
   total_clicks INT,
   PRIMARY KEY (user_id)
) ENGINE=MyISAM;
INSERT INTO clicks_20101004 VALUES (1,10), (2,10);

CREATE TABLE clicks_20101005 (
   user_id INT,
   total_clicks INT,
   PRIMARY KEY (user_id)
) ENGINE=MyISAM;
INSERT INTO clicks_20101005 VALUES (1,10), (2,10), (3,10);

CREATE TABLE merge_demo LIKE clicks_20101005;
ALTER TABLE merge_demo ENGINE=MERGE, UNION=(clicks_20101004, clicks_20101005);

The table was created fine and querying the data it looks like everything’s happy:

mysql> SELECT * FROM merge_demo;
+---------+--------------+
| user_id | total_clicks |
+---------+--------------+
|       1 |           10 | 
|       2 |           10 | 
|       1 |           10 | 
|       2 |           10 | 
|       3 |           10 | 
+---------+--------------+
5 rows in set (0.00 sec)

mysql> SELECT user_id, SUM(total_clicks) FROM merge_demo GROUP BY user_id;
+---------+-------------------+
| user_id | SUM(total_clicks) |
+---------+-------------------+
|       1 |                20 | 
|       2 |                20 | 
|       3 |                10 | 
+---------+-------------------+
3 rows in set (0.00 sec)

All OK so far. Let’s try another query:

mysql> SELECT * FROM merge_demo WHERE user_id=1;
+---------+--------------+
| user_id | total_clicks |
+---------+--------------+
|       1 |           10 | 
+---------+--------------+
1 row in set (0.00 sec)

mysql> SELECT user_id, SUM(total_clicks) FROM merge_demo WHERE user_id=2 GROUP BY user_id;
+---------+-------------------+
| user_id | SUM(total_clicks) |
+---------+-------------------+
|       2 |                10 | 
+---------+-------------------+
1 row in set (0.00 sec)

Neither of these give the result we’d expect and it looks like it has something to do with the WHERE clause. The problem is the PRIMARY key that’s been defined in the parent MERGE. The documentation does talk about this and the inability for MERGE tables to enforce PRIMARY keys across their underlying tables.

If we want to still take advantage of the index in the user_id column we can fix it but replacing the PRIMARY key with a normal index like this:

mysql> ALTER TABLE merge_demo DROP PRIMARY KEY, ADD KEY (user_id);
Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

Trying our problem queries again they seem to work fine:

mysql> SELECT * FROM merge_demo WHERE user_id=1;
+---------+--------------+
| user_id | total_clicks |
+---------+--------------+
|       1 |           10 | 
|       1 |           10 | 
+---------+--------------+
2 rows in set (0.00 sec)

mysql> SELECT user_id, SUM(total_clicks) FROM merge_demo WHERE user_id=2 GROUP BY user_id;
+---------+-------------------+
| user_id | SUM(total_clicks) |
+---------+-------------------+
|       2 |                20 | 
+---------+-------------------+
1 row in set (0.00 sec)

In terms of a generic fix in your application for this type of usage pattern I’d suggest you read in the output of SHOW CREATE TABLE a; and string substitute out the PRIMARY KEY, you could even tie in the ENGINE and UNION too and reduce the requirement for any ALTER TABLE to be performed.

About these ads

About James Cohen
LAMP geek with interests in building scalable web applications

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: