MySQL Server’s built-in profiling support

MySQL’s SHOW PROFILES command and the profiling support behind it are something I can’t believe I hadn’t spotted before today.

It allows you to enable profiling for a session and record performance information about the queries executed in it. It shows the stages each query passes through during execution (the same thread states usually seen in the output of SHOW PROCESSLIST) and how long each of these stages took.

I’ll demonstrate using an example. First we need to enable profiling within our session. You should only do this in sessions you actually want to profile, as there’s some overhead in performing and recording the profiling:

mysql> SET profiling=1;
Query OK, 0 rows affected (0.00 sec)
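
Profiling only keeps the most recent 15 statements by default; the profiling_history_size variable can be raised, up to a maximum of 100, if you need a longer history:

mysql> SET profiling_history_size=100;
Query OK, 0 rows affected (0.00 sec)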

Now let’s run a couple of regular SELECT queries:

mysql> SELECT COUNT(*) FROM myTable WHERE extra LIKE '%zkddj%';
+----------+
| COUNT(*) |
+----------+
|        0 | 
+----------+
1 row in set (0.32 sec)

mysql> SELECT COUNT(id) FROM myTable;
+-----------+
| COUNT(id) |
+-----------+
|    513635 | 
+-----------+
1 row in set (0.00 sec)

Followed up with some stuff that we know is going to execute a bit slower:

mysql> CREATE TEMPORARY TABLE foo LIKE myTable;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO foo SELECT * FROM myTable;
Query OK, 513635 rows affected (33.53 sec)
Records: 513635  Duplicates: 0  Warnings: 0

mysql> DROP TEMPORARY TABLE foo;
Query OK, 0 rows affected (0.06 sec)

Now that we’ve run the queries, let’s look at their summary with SHOW PROFILES:

mysql> SHOW PROFILES;
+----------+-------------+-------------------------------------------------------------------+
| Query_ID | Duration    | Query                                                             |
+----------+-------------+-------------------------------------------------------------------+
|        1 |  0.33174700 | SELECT COUNT(*) FROM myTable WHERE extra LIKE '%zkddj%'           | 
|        2 |  0.00036600 | SELECT COUNT(id) FROM myTable                                     | 
|        3 |  0.00087700 | CREATE TEMPORARY TABLE foo LIKE myTable                           | 
|        4 | 33.52952000 | INSERT INTO foo SELECT * FROM myTable                             | 
|        5 |  0.06431200 | DROP TEMPORARY TABLE foo                                          | 
+----------+-------------+-------------------------------------------------------------------+
5 rows in set (0.00 sec)

None of those numbers are a surprise, since we saw them from the client, but it’s a handy record of the execution times and could easily be queried from within an application just before the connection to the db is closed, or at the end of a web request, for example.
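
The same data is also exposed through the INFORMATION_SCHEMA.PROFILING table in builds that support profiling, which is easier for an application to consume than parsing SHOW PROFILES output. A minimal sketch of pulling per-query totals:

SELECT QUERY_ID, SUM(DURATION) AS total_duration
FROM INFORMATION_SCHEMA.PROFILING
GROUP BY QUERY_ID;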

We can also dig deeper into each of the commands. Let’s look at the first query we ran:

mysql> SHOW PROFILE FOR QUERY 1;
+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000033 | 
| checking query cache for query | 0.000073 | 
| Opening tables                 | 0.000013 | 
| System lock                    | 0.000007 | 
| Table lock                     | 0.000035 | 
| init                           | 0.000032 | 
| optimizing                     | 0.000014 | 
| statistics                     | 0.000016 | 
| preparing                      | 0.000014 | 
| executing                      | 0.000009 | 
| Sending data                   | 0.331296 | 
| end                            | 0.000016 | 
| end                            | 0.000003 | 
| query end                      | 0.000005 | 
| storing result in query cache  | 0.000105 | 
| freeing items                  | 0.000012 | 
| closing tables                 | 0.000007 | 
| logging slow query             | 0.000003 | 
| logging slow query             | 0.000048 | 
| cleaning up                    | 0.000006 | 
+--------------------------------+----------+
20 rows in set (0.00 sec)

Looks like almost all of the time was spent in the Sending data stage, actually executing the query; definitely one worth investigating further with EXPLAIN.

Now let’s look at the slow INSERT sub-select we ran, to see where the time went. I’ve asked for the CPU columns here too.

mysql> SHOW PROFILE CPU FOR QUERY 4;
+----------------------+-----------+-----------+------------+
| Status               | Duration  | CPU_user  | CPU_system |
+----------------------+-----------+-----------+------------+
| starting             |  0.000069 |  0.000000 |   0.000000 | 
| checking permissions |  0.000010 |  0.000000 |   0.000000 | 
| Opening tables       |  0.000217 |  0.000000 |   0.000000 | 
| System lock          |  0.000006 |  0.000000 |   0.000000 | 
| Table lock           |  0.000014 |  0.000000 |   0.000000 | 
| init                 |  0.000041 |  0.000000 |   0.000000 | 
| optimizing           |  0.000007 |  0.000000 |   0.000000 | 
| statistics           |  0.000014 |  0.000000 |   0.000000 | 
| preparing            |  0.000013 |  0.000000 |   0.000000 | 
| executing            |  0.000006 |  0.000000 |   0.000000 | 
| Sending data         |  4.326303 |  3.544221 |   0.324020 | 
| Creating index       |  0.000029 |  0.000000 |   0.000000 | 
| Repair by sorting    | 29.202254 | 17.133071 |  11.616726 | 
| Saving state         |  0.000040 |  0.000000 |   0.000000 | 
| Creating index       |  0.000007 |  0.000000 |   0.000000 | 
| Sending data         |  0.000389 |  0.000000 |   0.000000 | 
| end                  |  0.000009 |  0.000000 |   0.000000 | 
| end                  |  0.000012 |  0.000000 |   0.000000 | 
| query end            |  0.000006 |  0.000000 |   0.000000 | 
| freeing items        |  0.000015 |  0.000000 |   0.000000 | 
| closing tables       |  0.000007 |  0.000000 |   0.000000 | 
| logging slow query   |  0.000005 |  0.000000 |   0.000000 | 
| logging slow query   |  0.000040 |  0.000000 |   0.000000 | 
| cleaning up          |  0.000007 |  0.000000 |   0.000000 | 
+----------------------+-----------+-----------+------------+
24 rows in set (0.00 sec)

It seems that building the indexes for the new table was what took the time. The General Thread States page of the MySQL documentation is a useful reference here. Interestingly, we can see the “logging slow query” state here too, something that sails by too quickly to ever catch when looking at SHOW PROCESSLIST output.

This profiling support doesn’t fulfil the same role as MySQL’s EXPLAIN command and is only useful in certain situations, but if you’re looking at implementing profiling or instrumentation for your app it could be really handy.

You can find the full documentation for MySQL’s profiling support at SHOW PROFILES Syntax. It appears to be supported from at least MySQL 5.0, and it’s worth noting that it’s only available in the MySQL Community (non-Enterprise) builds.

MERGE table gotcha with PRIMARY KEY

I know that MERGE isn’t everyone’s favourite storage engine, but we use MERGE tables quite extensively and ran into a problem the other day that I thought I’d share.

A common usage pattern is to CREATE TEMPORARY TABLE ... LIKE an existing MyISAM table and then ALTER TABLE it into a MERGE table, specifying its UNION. Like this:

CREATE TABLE tmpMerge LIKE a;
ALTER TABLE tmpMerge ENGINE=MERGE, UNION=(a,b);

It’s a pattern we’ve been using for ages and seems to work well in most cases.

In one application we keep aggregated data in daily tables (inspired by this article). Within the application we work out which daily tables we need and build them into a temporary merge table before querying it. There’s more to it than that (there’s a pyramid of daily/monthly/yearly tables) but the details aren’t relevant right now.

You can replicate our usage pattern (and the bug we ran into) with this SQL:

CREATE TABLE clicks_20101004 (
   user_id INT,
   total_clicks INT,
   PRIMARY KEY (user_id)
) ENGINE=MyISAM;
INSERT INTO clicks_20101004 VALUES (1,10), (2,10);

CREATE TABLE clicks_20101005 (
   user_id INT,
   total_clicks INT,
   PRIMARY KEY (user_id)
) ENGINE=MyISAM;
INSERT INTO clicks_20101005 VALUES (1,10), (2,10), (3,10);

CREATE TABLE merge_demo LIKE clicks_20101005;
ALTER TABLE merge_demo ENGINE=MERGE, UNION=(clicks_20101004, clicks_20101005);

The table is created fine, and querying the data it looks like everything’s happy:

mysql> SELECT * FROM merge_demo;
+---------+--------------+
| user_id | total_clicks |
+---------+--------------+
|       1 |           10 | 
|       2 |           10 | 
|       1 |           10 | 
|       2 |           10 | 
|       3 |           10 | 
+---------+--------------+
5 rows in set (0.00 sec)

mysql> SELECT user_id, SUM(total_clicks) FROM merge_demo GROUP BY user_id;
+---------+-------------------+
| user_id | SUM(total_clicks) |
+---------+-------------------+
|       1 |                20 | 
|       2 |                20 | 
|       3 |                10 | 
+---------+-------------------+
3 rows in set (0.00 sec)

All OK so far. Let’s try another query:

mysql> SELECT * FROM merge_demo WHERE user_id=1;
+---------+--------------+
| user_id | total_clicks |
+---------+--------------+
|       1 |           10 | 
+---------+--------------+
1 row in set (0.00 sec)

mysql> SELECT user_id, SUM(total_clicks) FROM merge_demo WHERE user_id=2 GROUP BY user_id;
+---------+-------------------+
| user_id | SUM(total_clicks) |
+---------+-------------------+
|       2 |                10 | 
+---------+-------------------+
1 row in set (0.00 sec)

Neither of these gives the result we’d expect, and it looks like it has something to do with the WHERE clause. The problem is the PRIMARY KEY that’s been defined on the parent MERGE table: a MERGE table can’t enforce uniqueness across its underlying tables, but the key still tells MySQL the values are unique, so it stops reading after the first matching row. The documentation does talk about this and the inability of MERGE tables to enforce PRIMARY keys across their underlying tables.

If we still want to take advantage of the index on the user_id column we can fix it by replacing the PRIMARY KEY with a normal index like this:

mysql> ALTER TABLE merge_demo DROP PRIMARY KEY, ADD KEY (user_id);
Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

Trying our problem queries again, they now work fine:

mysql> SELECT * FROM merge_demo WHERE user_id=1;
+---------+--------------+
| user_id | total_clicks |
+---------+--------------+
|       1 |           10 | 
|       1 |           10 | 
+---------+--------------+
2 rows in set (0.00 sec)

mysql> SELECT user_id, SUM(total_clicks) FROM merge_demo WHERE user_id=2 GROUP BY user_id;
+---------+-------------------+
| user_id | SUM(total_clicks) |
+---------+-------------------+
|       2 |                20 | 
+---------+-------------------+
1 row in set (0.00 sec)

In terms of a generic fix in your application for this type of usage pattern, I’d suggest you read in the output of SHOW CREATE TABLE a; and string-substitute out the PRIMARY KEY. You could even substitute in the ENGINE and UNION too, removing the need for any ALTER TABLE to be performed.
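
As a rough sketch of where you’d end up (the temporary table and column names here just mirror the demo above), the substituted statement could fold the ENGINE and UNION straight into the CREATE and downgrade the PRIMARY KEY to a plain KEY:

CREATE TEMPORARY TABLE tmpMerge (
   user_id INT,
   total_clicks INT,
   KEY (user_id)
) ENGINE=MERGE UNION=(clicks_20101004, clicks_20101005);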

What makes a good MySQL index? Part 2: Cardinality

edit: added “Low cardinality isn’t always bad” section after Morgan’s comment

As we’ve already seen, column size is important for indexes. Cardinality is really important too: it’s the uniqueness of the values included in the index.

Indexes are used by MySQL (as in any RDBMS) to help find rows quickly. We want to make it as easy as possible for MySQL to find the relevant rows: the more precise or specific we can be, the fewer rows MySQL has to fetch.

Example Table

For this post I’ve created a table with some dummy data in it. Here’s the SQL I used to create it (in case you want to follow along), which was generated with this hack script; a rough stand-in for that script follows the table definition.

The table is fairly simple. Note the three indexes: the PRIMARY KEY plus two composite indexes on (sex, age) and (age, sex).

CREATE TABLE `user` (
  `id` mediumint(9) NOT NULL auto_increment,
  `name` varchar(20) default NULL,
  `age` tinyint(3) unsigned default NULL,
  `sex` enum('MALE','FEMALE') default NULL,
  PRIMARY KEY  (`id`),
  KEY `age_sex` (`age`,`sex`),
  KEY `sex_age` (`sex`,`age`)
) ENGINE=MyISAM AUTO_INCREMENT=4917 DEFAULT CHARSET=utf8
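
The generation script itself isn’t reproduced here, but as a stand-in sketch (the procedure name is hypothetical, and you’ll need MySQL 5.0+ for stored procedures) something like this produces data with the same shape: ages spread over 100 values and sex split roughly 50:50.

DELIMITER //
CREATE PROCEDURE fill_user(IN n INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < n DO
    -- FLOOR(RAND() * 100) gives ages 0-99: the cardinality of 100 seen below
    INSERT INTO user (name, age, sex)
    VALUES (CONCAT('user', i),
            FLOOR(RAND() * 100),
            IF(RAND() < 0.5, 'MALE', 'FEMALE'));
    SET i = i + 1;
  END WHILE;
END //
DELIMITER ;

CALL fill_user(4916);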

If we run ANALYZE TABLE on the table and then SHOW INDEXES we can see the following:

mysql> SHOW INDEXES FROM user;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| user  |          0 | PRIMARY  |            1 | id          | A         |        4916 |     NULL | NULL   |      | BTREE      |         |
| user  |          1 | age_sex  |            1 | age         | A         |         100 |     NULL | NULL   | YES  | BTREE      |         |
| user  |          1 | age_sex  |            2 | sex         | A         |         196 |     NULL | NULL   | YES  | BTREE      |         |
| user  |          1 | sex_age  |            1 | sex         | A         |           2 |     NULL | NULL   | YES  | BTREE      |         |
| user  |          1 | sex_age  |            2 | age         | A         |         196 |     NULL | NULL   | YES  | BTREE      |         |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.00 sec)

Assuming the male:female ratio in our dataset is 50:50, a query with WHERE sex="MALE" is only ever going to narrow things down to 50% of the rows, and MySQL will then have to sift through that 50%. Let’s demonstrate that:

mysql> EXPLAIN SELECT id FROM user WHERE sex="MALE";
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | user  | ref  | sex_age       | sex_age | 2       | const | 3335 | Using where |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
1 row in set (0.00 sec)

If age has a fairly even distribution across the age range, a query with WHERE age=32 should let MySQL limit the relevant rows down to just ~1% of the dataset. This means that any further work, whether using the next part of a composite index or scanning the fetched records, will be much faster. Again EXPLAIN proves this:

mysql> EXPLAIN SELECT id FROM user WHERE age=32;
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | user  | ref  | age_sex       | age_sex | 2       | const |    5 | Using where |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
1 row in set (0.00 sec)

Cardinality and composite indexes

Keeping with our example dataset, suppose we commonly query the table knowing both the age and sex of the people we’re after. A typical query might look like:

SELECT id FROM user WHERE age=32 AND sex="MALE";

We’ve got the two composite indexes on the table, (age, sex) and (sex, age), but which one will work best for this query? Based on everything learnt so far in this post, (age, sex) should do. Let’s check that the MySQL Optimiser agrees:

mysql> EXPLAIN SELECT id FROM user WHERE age=32 AND sex="MALE";
+----+-------------+-------+------+-----------------+---------+---------+-------------+------+-------------+
| id | select_type | table | type | possible_keys   | key     | key_len | ref         | rows | Extra       |
+----+-------------+-------+------+-----------------+---------+---------+-------------+------+-------------+
|  1 | SIMPLE      | user  | ref  | age_sex,sex_age | age_sex | 4       | const,const |    2 | Using where |
+----+-------------+-------+------+-----------------+---------+---------+-------------+------+-------------+
1 row in set (0.00 sec)

How do I know what cardinality my data has?

If you’re looking at indexes that already exist, ANALYZE TABLE followed by SHOW INDEXES FROM user as above will do the trick. If the index isn’t there yet you can get the same figure with a simple COUNT(DISTINCT ...) like this:

mysql> SELECT COUNT(DISTINCT age) AS cardinality FROM user;
+-------------+
| cardinality |
+-------------+
|         100 |
+-------------+
1 row in set (0.00 sec)
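
Dividing that count by the total number of rows gives the selectivity of a would-be index, often a more useful figure than raw cardinality; the closer to 1, the more selective the index would be:

SELECT COUNT(DISTINCT age) / COUNT(*) AS selectivity FROM user;
-- roughly 0.02 here: 100 distinct ages over 4916 rows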

Why isn’t MySQL using my index?

Just because the index is there doesn’t mean that MySQL will decide to use it. Have a look at the following similar queries:

mysql> EXPLAIN SELECT id FROM user WHERE age>80;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | user  | range | age_sex       | age_sex | 2       | NULL |  764 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT id FROM user WHERE age<80;
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | user  | ALL  | age_sex       | NULL | NULL    | NULL | 4916 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

The first query uses the index but the second one chooses to do a table scan. That’s no coincidence: MySQL knows enough about the distribution of the data to decide it’d be faster to just munch through all of it rather than go via the index for the ~80% of rows the second query matches. When our query is restrictive, as with the >80 query, it knows it’ll be better to use the index. Most of the time the optimiser gets it right!

There’s something very important to learn from this issue of data distribution: unless you’re checking query optimisation with real-world queries against a real-world dataset, you’re not optimising properly! In some cases it can pay to create some sample data (if you can predict what it’ll look like) with a simple script like the one I spun up for the user table used as an example here.

Low cardinality isn’t always bad

It could be that although the cardinality is low, the selectivity is still high because of the distribution of the data. Let’s change the characteristics of the sex column in the table:

mysql> UPDATE user SET SEX="MALE";
Query OK, 2503 rows affected (0.58 sec)
Rows matched: 4916  Changed: 2503  Warnings: 0

mysql> UPDATE user SET SEX="FEMALE" WHERE id<10;
Query OK, 9 rows affected (0.00 sec)
Rows matched: 9  Changed: 9  Warnings: 0

mysql> SELECT sex, COUNT(*) FROM user GROUP BY sex;
+--------+----------+
| sex    | COUNT(*) |
+--------+----------+
| MALE   |     4907 | 
| FEMALE |        9 | 
+--------+----------+
2 rows in set (0.00 sec)

We’ve now got low cardinality (just two possible values), but if we’re looking for "FEMALE" rows they should be easy to find with this index. Again EXPLAIN agrees:

mysql> EXPLAIN SELECT id FROM user WHERE sex="MALE";
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | user  | ALL  | sex_age       | NULL | NULL    | NULL | 3687 | Using where | 
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.11 sec)

mysql> EXPLAIN SELECT id FROM user WHERE sex="FEMALE";
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | user  | ref  | sex_age       | sex_age | 2       | const |    9 | Using where | 
+----+-------------+-------+------+---------------+---------+---------+-------+------+-------------+
1 row in set (0.00 sec)

What makes a good MySQL index? Part 1: Column Size

We all know that it’s good practice for queries to use indexes, but it’s not that clear-cut: there are good indexes and poor indexes. By making your indexes more efficient you stand to increase query speeds and widen the range of places where MySQL can use them effectively.

Column Size

The size of the column or columns you’re indexing matters because the less data the database server has to search through or index, the faster it’ll be, and the less storage you’ll use on disk. In this first post we’ll concentrate on column size optimizations.

There’s a full list of all data types and how much storage they require in the MySQL documentation: Data Type Storage Requirements

Common places where the wrong data type could be specified include:

Numeric types

This is probably the biggest offender and the most misunderstood. Use the table on the Numeric Types page in the MySQL documentation to work out what the smallest suitable field type is. For example, if you were storing a person’s age, TINYINT UNSIGNED with a range of 0-255 would make sense instead of wasting 3 more bytes by using an INT field.
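
To put that in concrete terms, here’s a hypothetical column definition (the table name is just for illustration):

CREATE TABLE person (
  age TINYINT UNSIGNED  -- 1 byte per row and covers 0-255
  -- an INT here would spend 4 bytes per row on the same values
);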

Date and Time Types

If you’re storing a DATE (3 bytes) don’t store it in a DATETIME (8 bytes) column.

Many times the data you’re storing in a DATETIME (8 bytes) field could fit in a TIMESTAMP (4 bytes) field, though they don’t behave in exactly the same way. Check the documentation for details.
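
A quick sketch with hypothetical column names:

CREATE TABLE event (
  held_on    DATE,      -- 3 bytes: no time-of-day needed
  created_at TIMESTAMP  -- 4 bytes, but limited to the 1970-2038 range
  -- note: the first TIMESTAMP column auto-initialises and auto-updates
  -- unless you give it explicit defaults
);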

Enumerated values

You might be storing string values that have really low entropy. Let’s say you have an office field in your database that can only be “London” or “Paris”. It makes more sense to keep it in an ENUM(“London”, “Paris”) (1 byte) than a VARCHAR (the string length plus a length byte).
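
For example:

CREATE TABLE staff (
  office ENUM('London', 'Paris')  -- 1 byte per row
  -- office VARCHAR(6) would be up to 7 bytes per row instead
);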

IP addresses

An IPv4 address is just 4 bytes of data, but to make it more human readable we usually write it as four integers of 0-255. A good way to store this data is in an INT UNSIGNED column, using the handy INET_ATON() and INET_NTOA() functions to convert the INT data to/from a human-readable IP address.
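
A minimal sketch (table and column names hypothetical):

CREATE TABLE access_log (ip INT UNSIGNED);
INSERT INTO access_log VALUES (INET_ATON('10.0.5.9'));
SELECT INET_NTOA(ip) FROM access_log;  -- returns '10.0.5.9'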

MD5, SHA1 or other hashes

As with IP addresses, the hexadecimal representation of the hash we’re used to is just there to make it more human readable. To illustrate: an MD5 value can either be stored in a VARCHAR(32) or a much more efficient BINARY(16), and a SHA1 fits into a BINARY(20). You can use the HEX() and UNHEX() functions to convert between the hex and binary formats.
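
A sketch with a hypothetical table:

CREATE TABLE file_hash (md5 BINARY(16));
INSERT INTO file_hash VALUES (UNHEX(MD5('hello')));
SELECT HEX(md5) FROM file_hash;  -- 5D41402ABC4B2A76B9719D911017C592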

PROCEDURE ANALYSE

PROCEDURE ANALYSE is built into MySQL and can help you a lot with choosing the best data type when the table and its data already exist.

Let’s use wp_options, a table shipped as part of WordPress, as an example. Here’s the table structure:

CREATE TABLE `wp_options` (
`option_id` bigint(20) unsigned NOT NULL auto_increment,
`blog_id` int(11) NOT NULL default '0',
`option_name` varchar(64) NOT NULL default '',
`option_value` longtext NOT NULL,
`autoload` varchar(20) NOT NULL default 'yes',
PRIMARY KEY  (`option_id`,`blog_id`,`option_name`),
KEY `option_name` (`option_name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

and here’s how we’d analyse selected columns in there:

mysql> SELECT option_id, blog_id, autoload FROM wp_options PROCEDURE ANALYSE() \G
*************************** 1. row ***************************
Field_name: foo.wp_options.option_id
Min_value: 1
Max_value: 3860
Min_length: 1
Max_length: 4
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 1169.5359
Std: 1416.1368
Optimal_fieldtype: SMALLINT(4) UNSIGNED NOT NULL
*************************** 2. row ***************************
Field_name: foo.wp_options.blog_id
Min_value: 0
Max_value: 0
Min_length: 1
Max_length: 1
Empties_or_zeros: 181
Nulls: 0
Avg_value_or_avg_length: 0.0000
Std: 0.0000
Optimal_fieldtype: ENUM('0') NOT NULL
*************************** 3. row ***************************
Field_name: foo.wp_options.autoload
Min_value: no
Max_value: yes
Min_length: 2
Max_length: 3
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 2.7514
Std: NULL
Optimal_fieldtype: ENUM('no','yes') NOT NULL
3 rows in set (0.00 sec)

It looks like it made some good recommendations for the option_id and autoload columns, but got the blog_id field wrong. Presumably there’s just one blog, with id=0, in there at the moment, so the suggestion is understandable.