Query optimization in MySQL with Subqueries

Recently I had to fetch data from several tables using ORDER BY and LIMIT statements in MySQL. The problem with LIMIT statement is that it's applied in the last step after all the rows are fetched. I needed to fetch rows that fit in boundaries specified by x1,y1,x2,y2 as well as information about the writer of these entries. Tables used in the process were:

CREATE TABLE `location` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `lat` double NOT NULL,
  `lng` double NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_latLng` (`lat`,`lng`),
  KEY `idx_lat` (`lat`),
  KEY `idx_lng` (`lng`)
) ENGINE=InnoDB

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `fullName` varchar(255) DEFAULT NULL,
  `gender` varchar(1) DEFAULT NULL,
  `birthdate` date DEFAULT NULL,
  `email` varchar(255) DEFAULT NULL,
  `username` varchar(255) DEFAULT NULL,
  `password` varchar(255) DEFAULT NULL,
  `language` varchar(3) DEFAULT 'en',
  `createdate` datetime NOT NULL,
  `lastaccessdate` datetime NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `email_uk` (`email`),
  UNIQUE KEY `userName_uk` (`username`)
) ENGINE=InnoDB

CREATE TABLE `entry` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `content` text NOT NULL,
  `title` varchar(4000) DEFAULT NULL,
  `location` int(11) NOT NULL,
  `user` int(11) NOT NULL,
  `createdate` datetime NOT NULL,
  `lastmodifieddate` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `entry_user_fk` (`user`),
  KEY `entry_location_fk` (`location`),
  KEY `idx_createdate` (`createdate`),
  CONSTRAINT `entry_location_fk` FOREIGN KEY (`location`) REFERENCES `location` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT `entry_user_fk` FOREIGN KEY (`user`) REFERENCES `user` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB

Location table has ~250 records, entry table has ~13K records, and user table has ~50 records. This is auto-generated test data that we've used for development purposes.

The initial query was:

select e.*, u.*, l.*
from entry e, user u, location l
where e.user = u.id and e.location = l.id and ((l.lat between x1 and x2) and (l.lng between y1 and y2))
order by e.id desc
limit 0,10

It took ~1.5 second to complete. It is not very very bad but it must be improved. Then I decided to use location as a derived table:

select e.*, u.*, l.*
from entry e, user u, (select l2.* from location l2 where (l2.lat between x1 and x2) and (l2.lng between y1 and y2)) l 
where e.user = u.id and l.id = e.location
order by e.id desc
limit 0,10

This last one took ~1.5-2 seconds. The last change clearly didn't help very much!

I only need 10 rows from thousands so why to join thousands of rows then limit to 10? LIMIT statement can be moved into a derived table too:

select e.*, u.*, l.*
from (
    select * from e2 
    where e2.location in (select l2.id from location l2 where (l2.lat between x1 and x2) and (l2.lng between y1 and y2))
    order by e2.id desc
    limit 0,10
) e, user u, location l
where e.user = u.id and l.id = e.location
order by e.id desc

The last query took ~0.2 second to complete. Query times were obtained from MySQL profiler. Difference between queries can also be viewed using EXPLAIN statement:

Query 1:
+----+-------------+-------+--------+------------------------------------+--------------------+---------+--------------------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys                      | key                | key_len | ref                | rows | Extra
                             |
+----+-------------+-------+--------+------------------------------------+--------------------+---------+--------------------+------+----------------------------------------------+
|  1 | SIMPLE      | l     | range  | PRIMARY,idx_latLng,idx_lat,idx_lng | idx_latLng         | 16      | NULL               |   68 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | e     | ref    | entry_user_fk,entry_location_fk  | entry_location_fk | 4       | test.l.id |  149 |
                             |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY                            | PRIMARY            | 4       | test.e.user     |    1 |
                             |
+----+-------------+-------+--------+------------------------------------+--------------------+---------+--------------------+------+----------------------------------------------+

Query 2:
+----+-------------+------------+--------+-----------------------------------+--------------------+---------+----------------+------+---------------------------------+
| id | select_type | table      | type   | possible_keys                     | key                | key_len | ref            | rows | Extra
                |
+----+-------------+------------+--------+-----------------------------------+--------------------+---------+----------------+------+---------------------------------+
|  1 | PRIMARY     |  | ALL    | NULL                              | NULL               | NULL    | NULL           |   23 | Using temporary;
 Using filesort |
|  1 | PRIMARY     | e          | ref    | entry_user_fk,entry_location_fk | entry_location_fk | 4       | l.id  |  149 |
                |
|  1 | PRIMARY     | u          | eq_ref | PRIMARY                           | PRIMARY            | 4       | test.e.user |    1 |
                |
|  2 | DERIVED     | l2        | ALL    | idx_latLng,idx_lat,idx_lng        | NULL               | NULL    | NULL           |  268 | Using where
                |
+----+-------------+------------+--------+-----------------------------------+--------------------+---------+----------------+------+---------------------------------+

Query 3:
+----+--------------------+------------+-----------------+------------------------------------+---------+---------+---------------+------+----------------+
| id | select_type        | table      | type            | possible_keys                      | key     | key_len | ref           | rows | Extra
    |
+----+--------------------+------------+-----------------+------------------------------------+---------+---------+---------------+------+----------------+
|  1 | PRIMARY            |  | ALL             | NULL                               | NULL    | NULL    | NULL          |   10 | Using filesort |
|  1 | PRIMARY            | u          | eq_ref          | PRIMARY                            | PRIMARY | 4       | e.id     |    1 |
    |
|  1 | PRIMARY            | l2        | eq_ref          | PRIMARY                            | PRIMARY | 4       | e.id |    1 |
    |
|  2 | DERIVED            | e2        | index           | NULL                               | PRIMARY | 4       | NULL          |   10 | Using where
    |
|  3 | DEPENDENT SUBQUERY | l          | unique_subquery | PRIMARY,idx_latLng,idx_lat,idx_lng | PRIMARY | 4       | func          |    1 | Using where
    |
+----+--------------------+------------+-----------------+------------------------------------+---------+---------+---------------+------+----------------+
M. Serdar Biçer @msbicer

Comments

Loz said…
How's that different from

select
e.*,
u.*,
l.*
from
entry e,
user u,
location l
where
l.lat between x1 and x2 and
l.lng between y1 and y2 and
l.id = e.location and
e.user = u.id
order by
e.id desc
limit
0, 10
Harrison said…
Why not just write it as a join without any subqueries at all:

select e.*, u.*, l.* from entry e, user u, location l where e.location = l.id and l.lat between x1 and x2 and l.lng between y1 and y2 and e.user = u.id order by e.id desc limit 0,10
Ellis said…
I think you have some typo's in your queries.
BrianDemant said…
The reason for the slownes of the first query is that you for got to "connect" the location (l) to the rest so you would get sizeof(e joined u) * size(l) records.
(if you dont understand then try to limit your records to a few in each table and run the query without limit)

add "and e.location = l.id" and it will take about a few ms (on my machine) just like the suggested join.
BrianDemant said…
woops .. I forgot to put SQL_NO_CACHE when I benchmarked.

Also forgot to say that you are correct .. the final solution is the best/quickest.
Gerger said…
@Loz and @Harrison,

Good point! Actually, in early stages of our development, it was accomplished using two different queries and we needed to combine them to a single query. It was just a scratch and maybe it caused a bias which prevented me to see shorter way.

@Ellis,

Thanks for your warning. They should be fixed now.
Gerger said…
I also replaced the 1st query with suggested queries. Thanks for feedbacks.
Matt Roberts said…
Yeah it is. Wew, I thought that I was certain for error knowing that it was the right one. Anyway Gerger, could you work for the 2nd query too. It was just a suggestion and answers will be much appreciated. Thanks.

Popular posts from this blog

Powerful Free Webinar Network for Oracle Developers

PL/SQL Developers! Formspider 1.9 is Out