Subqueries Aren't So Bad After All

I used to advise people not to use subqueries if they can help it.

You may recall that in my last tip, I posted the following query for finding duplicate Student Card numbers:

SELECT t1.* FROM Student AS t1
WHERE t1.StudentID>(SELECT MIN(StudentID)
FROM Student AS t2 
WHERE t2.StudentCard=t1.StudentCard)


The same results are returned by the following query, which does not use a subquery:

SELECT t1.StudentID, t1.StudentCard,
t1.Surname, t1.GivenName
FROM Student AS t1
LEFT JOIN Student AS t2
ON t1.StudentCard=t2.StudentCard
WHERE t1.StudentID>t2.StudentID
GROUP BY t1.StudentID, t1.StudentCard,
t1.Surname, t1.GivenName


So which is better?

If by "better" you mean easier to understand, I'll pick the subquery every time.

If by "better" you mean "faster", it's a tough call.

Without any indexes on this table other than the primary key, the first query gets bogged down with only a small number of records.

I tried both queries with 1000 students. Using a subquery, my dodgy old computer took more than 30 seconds to find the duplicates. The second query executed almost instantly.

After adding an index to the duplicated field, it was a different story, with both queries running
neck and neck, until I tried them with 50 000 students.

At that point, the subquery was faster:
1 minute versus 1 minute and 15 seconds.

There are two lessons to be drawn from this little exercise:

1. Test any received wisdom against your own data.

You might find that *your* database is the exception that proves the rule. Which query is faster will depend a lot on *your* data.

For instance, the test I described above, was done using a table where a high percentage of records were duplicates. Would I have got the same results if there were only a handful of duplicates out of the 50 000 records?

2. Subqueries aren't so bad after all...  as long as you have the right indexes.

Home About the Author Copyright © 2001 Adelle Hartley