|
This week, I'm taking another look at duplicates, using the same table of
Students that I used before Easter/Passover.
It is possible to enumerate each duplicate StudentCard, so that we get the following output:
| StudentID |
StudentCard |
Surname |
GivenName |
Sequence |
| 1 |
TS00001 |
Bloggs |
Jane |
0 |
| 2 |
TS00002 |
Doe |
Joe |
0 |
| 3 |
TS00003 |
Payne |
Thomas |
0 |
| 4 |
TS00002 |
Doe |
Joe |
1 |
| 5 |
TS00003 |
Pain |
Thomas |
1 |
| 6 |
TS00002 |
Bloggs |
Melissa |
2 |
(I've added an extra student, to make this example more interesting).
The first row with StudentCard=TS00002 has a sequence number of 0. The second row with
StudentCard=TS00002 has a sequence number of 1, and so on.
The query that generated this output has an interesting join condition:
SELECT t1.StudentID, t1.StudentCard, t1.Surname,
t1.GivenName, COUNT(t2.StudentID) AS Sequence
FROM Student AS t1 LEFT JOIN Student AS t2
ON (t1.StudentID>t2.StudentID)
AND (t1.StudentCard=t2.StudentCard)
GROUP BY t1.StudentID, t1.StudentCard,
t1.Surname, t1.GivenName
"Woah, that's a bizarre piece of SQL"
OK, well compare that with the following query:
SELECT t1.StudentID, t1.StudentCard, t1.Surname,
t1.GivenName, COUNT(t2.StudentID) AS Sequence
FROM Student AS t1 LEFT JOIN Student AS t2
ON t1.StudentCard=t2.StudentCard
WHERE t1.StudentID>t2.StudentID
GROUP BY t1.StudentID, t1.StudentCard,
t1.Surname, t1.GivenName
This second query will return a subset of the first query:
| StudentID |
StudentCard |
Surname |
GivenName |
Sequence |
| 4 |
TS00002 |
Doe |
Joe |
1 |
| 5 |
TS00003 |
Pain |
Thomas |
1 |
| 6 |
TS00002 |
Bloggs |
Melissa |
2 |
The neat thing about joins in SQL, is that you can put pretty much any condition in there that you can
think of, not just equality.
If you don't see why this might be useful, have a look at the following:
SELECT t1.StudentID,
t1.StudentCard+"-"+COUNT(t2.StudentID) AS NewCard
t1.Surname, t1.GivenName
FROM Student AS t1 LEFT JOIN Student AS t2
ON (t1.StudentID>t2.StudentID)
AND (t1.StudentCard=t2.StudentCard)
GROUP BY t1.StudentID, t1.StudentCard,
t1.Surname, t1.GivenName
This variation produces a new, unique card number for each student (based on their existing, non-unique card
numbers):
| StudentID |
NewCard |
Surname |
GivenName |
| 1 |
TS00001-0 |
Bloggs |
Jane |
| 2 |
TS00002-0 |
Doe |
Joe |
| 3 |
TS00003-0 |
Payne |
Thomas |
| 4 |
TS00002-1 |
Doe |
Joe |
| 5 |
TS00003-1 |
Pain |
Thomas |
| 6 |
TS00002-2 |
Bloggs |
Melissa |
|