Enumerating Duplicates

This week, I'm taking another look at duplicates, using the same table of Students that I used before Easter/Passover.

It is possible to enumerate each duplicate StudentCard, so that we get the following output:

StudentID StudentCard Surname GivenName Sequence
1 TS00001 Bloggs Jane 0
2 TS00002 Doe Joe 0
3 TS00003 Payne Thomas 0
4 TS00002 Doe Joe 1
5 TS00003 Pain Thomas 1
6 TS00002 Bloggs Melissa 2

(I've added an extra student, to make this example more interesting).

The first row with StudentCard=TS00002 has a sequence number of 0. The second row with StudentCard=TS00002 has a sequence number of 1, and so on.

The query that generated this output has an interesting join condition:

SELECT t1.StudentID, t1.StudentCard, t1.Surname,
t1.GivenName, COUNT(t2.StudentID) AS Sequence
FROM Student AS t1 LEFT JOIN Student AS t2
ON (t1.StudentID>t2.StudentID)
AND (t1.StudentCard=t2.StudentCard)
GROUP BY t1.StudentID, t1.StudentCard,
t1.Surname, t1.GivenName

"Woah, that's a bizarre piece of SQL"

OK, well compare that with the following query:

SELECT t1.StudentID, t1.StudentCard, t1.Surname,
t1.GivenName, COUNT(t2.StudentID) AS Sequence
FROM Student AS t1 LEFT JOIN Student AS t2
ON t1.StudentCard=t2.StudentCard
WHERE t1.StudentID>t2.StudentID
GROUP BY t1.StudentID, t1.StudentCard,
t1.Surname, t1.GivenName

This second query will return a subset of the first query:

StudentID StudentCard Surname GivenName Sequence
4 TS00002 Doe Joe 1
5 TS00003 Pain Thomas 1
6 TS00002 Bloggs Melissa 2

The neat thing about joins in SQL, is that you can put pretty much any condition in there that you can think of, not just equality.

If you don't see why this might be useful, have a look at the following:

SELECT t1.StudentID,
t1.StudentCard+"-"+COUNT(t2.StudentID) AS NewCard
t1.Surname, t1.GivenName
FROM Student AS t1 LEFT JOIN Student AS t2
ON (t1.StudentID>t2.StudentID)
AND (t1.StudentCard=t2.StudentCard)
GROUP BY t1.StudentID, t1.StudentCard,
t1.Surname, t1.GivenName

This variation produces a new, unique card number for each student (based on their existing, non-unique card numbers):

StudentID NewCard Surname GivenName
1 TS00001-0 Bloggs Jane
2 TS00002-0 Doe Joe
3 TS00003-0 Payne Thomas
4 TS00002-1 Doe Joe
5 TS00003-1 Pain Thomas
6 TS00002-2 Bloggs Melissa

Home About the Author Copyright © 2001 Adelle Hartley