15 Ways to Optimize Your SQL Queries

Previous article was on 10 Ways To Destroy A SQL Database that sort of teaches you what mistakes many company might make on their database that will eventually lead to a database destroy. In this article, you will get to know 15 ways to optimize your SQL queries. Many ways are common to optimize a query while others are less obvious.

Indexes

Index your column is a common way to optimize your search result. Nonetheless, one must fully understand how does indexing work in each database in order to fully utilize indexes. On the other hand, useless and simply indexing without understanding how it work might just do the opposite.

Symbol Operator

Symbol operator such as >,<,=,!=, etc. are very helpful in our query. We can optimize some of our query with symbol operator provided the column is indexed. For example,

SELECT * FROM TABLE WHERE COLUMN > 16

Now, the above query is not optimized due to the fact that the DBMS will have to look for the value 16 THEN scan forward to value 16 and below. On the other hand, a optimized value will be

SELECT * FROM TABLE WHERE COLUMN >= 15

This way the DBMS might jump straight away to value 15 instead. It's pretty much the same way how we find a value 15 (we scan through and target ONLY 15) compare to a value smaller than 16 (we have to determine whether the value is smaller than 16; additional operation).

Wildcard

In SQL, wildcard is provided for us with '%' symbol. Using wildcard will definitely slow down your query especially for table that are really huge. We can optimize our query with wildcard by doing a postfix wildcard instead of pre or full wildcard.

#Full wildcard
SELECT * FROM TABLE WHERE COLUMN LIKE '%hello%';
#Postfix wildcard
SELECT * FROM TABLE WHERE COLUMN LIKE  'hello%';
#Prefix wildcard
SELECT * FROM TABLE WHERE COLUMN LIKE  '%hello';

That column must be indexed for such optimize to be applied.

P.S: Doing a full wildcard in a few million records table is equivalence to killing the database.

NOT Operator

Try to avoid NOT operator in SQL. It is much faster to search for an exact match (positive operator) such as using the LIKE, IN, EXIST or = symbol operator instead of a negative operator such as NOT LIKE, NOT IN, NOT EXIST or != symbol. Using a negative operator will cause the search to find every single row to identify that they are ALL not belong or exist within the table. On the other hand, using a positive operator just stop immediately once the result has been found. Imagine you have 1 million record in a table. That's bad.

COUNT VS EXIST

Some of us might use COUNT operator to determine whether a particular data exist

SELECT COLUMN FROM TABLE WHERE COUNT(COLUMN) > 0

Similarly, this is very bad query since count will search for all record exist on the table to determine the numeric value of field 'COLUMN'. The better alternative will be to use the EXIST operator where it will stop once it found the first record. Hence, it exist.

Wildcard VS Substr

Most developer practiced Indexing. Hence, if a particular COLUMN has been indexed, it is best to use wildcard instead of substr.

#BAD
SELECT * FROM TABLE WHERE  substr ( COLUMN, 1, 1 ) = 'value'.

The above will substr every single row in order to seek for the single character 'value'. On the other hand,

#BETTER
SELECT * FROM TABLE WHERE  COLUMN = 'value%'.

Wildcard query will run faster if the above query is searching for all rows that contain 'value' as the first character. Example,

#SEARCH FOR ALL ROWS WITH THE FIRST CHARACTER AS 'E'
SELECT * FROM TABLE WHERE  COLUMN = 'E%'.

Index Unique Column

Some database such as MySQL search better with column that are unique and indexed. Hence, it is best to remember to index those columns that are unique. And if the column is truly unique, declare them as one. However, if that particular column was never used for searching purposes, it gives no reason to index that particular column although it is given unique.

Max and Min Operators

Max and Min operators look for the maximum or minimum value in a column. We can further optimize this by placing a indexing on that particular columnMisleading We can use Max or Min on columns that already established such Indexes. But if that particular column is frequently use, having an index should help speed up such searching and at the same time speed max and min operators. This makes searching for maximum or minimum value faster. Deliberate having an index just to speed up Max and Min is always not advisable. Its like sacrifice the whole forest for a merely a tree.

Data Types

Use the most efficient (smallest) data types possible. It is unnecessary and sometimes dangerous to provide a huge data type when a smaller one will be more than sufficient to optimize your structure. Example, using the smaller integer types if possible to get smaller tables. MEDIUMINT is often a better choice than INT because a MEDIUMINT column uses 25% less space. On the other hand, VARCHAR will be better than longtext to store an email or small details.

Primary Index

The primary column that is used for indexing should be made as short as possible. This makes identification of each row easy and efficient by the DBMS.

String indexing

It is unnecessary to index the whole string when a prefix or postfix of the string can be indexed instead. Especially if the prefix or postfix of the string provides a unique identifier for the string, it is advisable to perform such indexing. Shorter indexes are faster, not only because they require less disk space, but because they also give you more hits in the index cache, and thus fewer disk seeks.

Limit The Result

Another common way of optimizing your query is to minimize the number of row return. If a table have a few billion records and a search query without limitation will just break the database with a simple SQL query such as this.

SELECT * FROM TABLE

Hence, don't be lazy and try to limit the result turn which is both efficient and can help minimize the damage of an SQL injection attack.

SELECT * FROM TABLE WHERE 1 LIMIT 10

Use Default Value

If you are using MySQL, take advantage of the fact that columns have default values. Insert values explicitly only when the value to be inserted differs from the default. This reduces the parsing that MySQL must do and improves the insert speed.

In Subquery

Some of us will use a subquery within the IN operator such as this.

SELECT * FROM TABLE WHERE COLUMN IN (SELECT COLUMN FROM TABLE)

Doing this is very expensive because SQL query will evaluate the outer query first before proceed with the inner query. Instead we can use this instead.

SELECT * FROM TABLE, (SELECT COLUMN FROM TABLE) as dummytable WHERE dummytable.COLUMN = TABLE.COLUMN;

Using dummy table is better than using an IN operator to do a subquery. Alternative, an exist operator is also better.

Utilize Union instead of OR

Indexes lose their speed advantage when using them in OR-situations in MySQL at least. Hence, this will not be useful although indexes is being applied

SELECT * FROM TABLE WHERE COLUMN_A = 'value' OR COLUMN_B = 'value'

On the other hand, using Union such as this will utilize Indexes.

SELECT * FROM TABLE WHERE COLUMN_A = 'value'
UNION
SELECT * FROM TABLE WHERE COLUMN_B = 'value'

Hence, run faster.

Summary

Definitely, these optimization tips doesn't guarantee that your queries won't become your system bottleneck. It will require much more benchmarking and profiling to further optimize your SQL queries. However, the above simple optimization can be utilize by anyone that might just help save some colleague rich bowl while you learn to write good queries. (its either you or your team leader/manager)

nice post! i had no idea about #2 and #4...

also, some discoveries i've made about queries:

- UNION is slow, try to use stored procedures: i've not tested this one on MySQL, but in PostGreSQL, it is much faster to use a SP than an union. Example:
SP:
SELECT getValue(column1) AS name, Operation(column2) AS number_of_products

is much much faster than:
(SELECT name, 0 AS number_of_products FROM customer WHERE a=b) UNION (SELECT 0 AS name, number_of_products FROM products WHERE y=z)

However, like i said, i've not tested this one on MySQL.

- PHP and MySQL: ok, strictly this isn't a part of query optimization, but it is always better to just retrieve the rows we are interested in, instead of all rows. Why? Memory (PHP and MySQL) and bandwidth consumption.
Example:
best:
SELECT id, name FROM customer
instead of:
SELECT * FROM customer
(Assuming "customer" haves an id, name, last_name, description, login, password, e-mail, etc.)

It also makes the code cleaner.

INNER/LEFT/RIGHT JOIN: ok, this is an untested one. A while ago, i was asked to optimize a MySQL query. Originally, i used joins, but turned out it was much faster to use them in the WHERE part.
Example:
Slower:
SELECT * FROM customer AS a LEFT JOIN products AS b ON a.id = b.id_customer
Faster:
SELECT * FROM customer AS a, products AS b WHERE a.id = b.id_customer

I don't know how or why, but it turned out that first case was much slower (10 seconds) than the second case (3,4 seconds).
It was however, a system already in production, so i hadn't the chance to play much with the query or with indexes 🙁

The client however, was very happy with the results xD

Greetings 😉

9 thoughts on “15 Ways to Optimize Your SQL Queries”

Veera

October 27 at 12:24 PM

I couldn't understand the 'Symbol Operator' point. Could you please explain it a little further?
Greg Jorgensen

October 27 at 1:25 PM

You are wrong about the NOT opertor. If you think about it you will realize that you can determine if there are NO black marbles in a bowl just as fast as you can determine if there is at least one black marble. There is no need to examine every marble; you can stop as soon as I find one black marble.

NOT EXISTS is exactly that: an EXISTS test that is logically negated. It's possible that a NOT EXISTS (or NOT LIKE or NOT IN) test will examine every row/character/list member if the searched item is not present, but that will happen for both EXISTS and NOT EXISTS.
Greg Jorgensen

October 27 at 1:30 PM

MAX and MIN do not "look for the maximum or minimum value in a column," and they aren't operators. The MIN and MAX functions are aggregate functions that operate on the selected rows (or groups of rows if GROUP BY is used). SELECT MAX(col) FROM table will find the maximum value of col, but the functions are more general than that. Indexes are expensive to maintain, and indexing columns just to speed up MIN and MAX is not great advice.
Clay

October 27 at 3:20 PM

@Greg : I agree with you that indexing columns just to speed up MIN and MAX is not a good advice. May be there is a misunderstanding on that point. I meant that MAX and MIN can be used on indexed column for better speed. Deliberating indexing a column because of a MIN or MAX is pure, NO NO. Thanks for the feedback 🙂

Well, regarding the NOT operator, if there are any algorithm available in the world that work like a human. May be your theory might just hit the right spot.
Clay

October 27 at 4:52 PM

@Veera : To make thing simple. If there are 15 available in that column it will directly point towards 15 (if there are no exact value, it will just be similar as <16) instead of going through the whole row finding which is the highest value that is smaller than 16 (might be 15,14,13,12,11, etc. the DBMS do not know until it look through them). If there is a equal to symbol, it means that there is a probability that it will jump directly to 15 and return the result directly to you.
unreal4u

October 28 at 1:08 AM

nice post! i had no idea about #2 and #4...

also, some discoveries i've made about queries:

- UNION is slow, try to use stored procedures: i've not tested this one on MySQL, but in PostGreSQL, it is much faster to use a SP than an union. Example:
SP:
SELECT getValue(column1) AS name, Operation(column2) AS number_of_products

is much much faster than:
(SELECT name, 0 AS number_of_products FROM customer WHERE a=b) UNION (SELECT 0 AS name, number_of_products FROM products WHERE y=z)

However, like i said, i've not tested this one on MySQL.

- PHP and MySQL: ok, strictly this isn't a part of query optimization, but it is always better to just retrieve the rows we are interested in, instead of all rows. Why? Memory (PHP and MySQL) and bandwidth consumption.
Example:
best:
SELECT id, name FROM customer
instead of:
SELECT * FROM customer
(Assuming "customer" haves an id, name, last_name, description, login, password, e-mail, etc.)

It also makes the code cleaner.

INNER/LEFT/RIGHT JOIN: ok, this is an untested one. A while ago, i was asked to optimize a MySQL query. Originally, i used joins, but turned out it was much faster to use them in the WHERE part.
Example:
Slower:
SELECT * FROM customer AS a LEFT JOIN products AS b ON a.id = b.id_customer
Faster:
SELECT * FROM customer AS a, products AS b WHERE a.id = b.id_customer

I don't know how or why, but it turned out that first case was much slower (10 seconds) than the second case (3,4 seconds).
It was however, a system already in production, so i hadn't the chance to play much with the query or with indexes 🙁

The client however, was very happy with the results xD

Greetings 😉
JW

October 28 at 1:49 AM

However, if that particular column was never used for searching purposes, it gives no reason to index that particular column although it is given unique

Um, I would say this is misleading too. What about unique columns that are frequently used in joins, but never searched?
Clay

October 28 at 6:40 AM

@JW : Yup, you don't need an index when no search is being done on a particular column. But if it is a unique and frequently use column, having an index will perform better.

For join, it depends on what DBMS you use. In MySQL, indexes perform more efficient when your join have the same data type and size. Although you don't need its result, columns that frequently require in joins need DBMS to search for the matching partners. Hence, having indexes will be better in joins too but criteria to make it efficient depends on each individual implementation of DBMS.
Clay

October 28 at 7:00 AM

@unreal4u : Thanks for sharing 😀

I'm not really familiar with stored procedure in MySQL at the moment since its quite new (MySQL v5.1 onwards). But in MySQL, stored procedure is used when a query is frequently used and have no data change. Hence, using a stored procedure will be more efficient. You can read Stored Procedure for more information. But it should work closely similar to PostGreSQL since the concept is the same.

Ya, its not good to always use *. For security reason too other than optimization point of view. (i was writing an example so i was lazy, sorry about that)

On the INNER/LEFT/RIGHT join, i also experience such situation with a few million of data table, when a join seems slower than a WHERE clauses. I read it somewhere before but i forgotten why is that so (should be at the documentation of MySQL).

@Mehedi : Welcome 🙂

Comments are closed.