|
I have the following table:
| Table:
Employee |
| Field
# |
Name |
Type |
| Field1: |
EmployeeID |
autonumber, primary key |
| Field2: |
Name |
Text |
As part of our new fire safety plan, each employee has a "Buddy". In the event of a fire, everyone
is supposed to check that their buddy has escaped the building.
I can think of a couple of ways of recording this information in my database:
(i) Add the following field to the Employee table:
Field4: BuddyID number (EmployeeID of another record in the same table).
If I do this, then I need to maintain the buddy relationship by ensuring that when each employee
is assigned a Buddy, that employee is their own buddy's buddy.
That is, if John is Mark's buddy, then Mark must also be John's buddy.
(ii) Create a new table called Buddy with two fields:
Field1: Buddy1 int
Field2: Buddy2 int
(The two fields combine to make a composite primary key).
If I use this method, then I need to ensure that no employee is listed in the table twice.
Whichever method I use, I need to be able to display a list of Employees and their buddies.
Starting with method (i)...
The following SQL is incorrect:
SELECT t1.Name, t2.Name
FROM Employee AS t1 INNER JOIN Employee AS t2
ON t1.BuddyID=t2.EmployeeID
Syntactically, this query is OK, but it will display each employee twice:
| t1.Name |
t2.Name |
| Jane |
Amos |
| Amos |
Jane |
| Wilma |
Mark |
| Mark |
Wilma |
Let's run that query again, but this time select the EmployeeID's as well as the names:
SELECT t1.EmployeeID, t1.Name, t2.EmployeeID, t2.Name
FROM Employee AS t1 INNER JOIN Employee AS t2
ON t1.BuddyID=t2.EmployeeID
The results of this query follow.
| t1.EmployeeID |
t1.Name |
t2.EmployeeID |
t2.Name |
| 2 |
Jane |
1 |
Amos |
| 1 |
Amos |
2 |
Jane |
| 4 |
Wilma |
3 |
Mark |
| 3 |
Mark |
4 |
Wilma |
Compare the first and second rows. Jane (ID=2) appears on the left side in the first row, and on
the right side in the second row.
What if we were to only select rows where the employee with the greater EmployeeID appears on
the left side:
SELECT t1.Name, t2.Name
FROM Employee AS t1 INNER JOIN Employee AS t2
ON t1.BuddyID=t2.EmployeeID
WHERE t1.EmployeeID>t2.EmployeeID
It turns out that this will do what I want, and each pair of buddies will be listed only once.
OK, so what about method (ii)...
Create a new table called Buddy with two fields:
Field1: Buddy1 int
Field2: Buddy2 int
I can set up the same "buddy" relationships by entering the following rows into the Buddy table:
The query to return a list of all the employees and their buddies is
SELECT t1.Name, t2.Name
FROM (Buddy LEFT JOIN Employee AS t1 ON t1.EmployeeID=Buddy.Buddy1)
LEFT JOIN Employee AS t2 ON t2.EmployeeID=Buddy.Buddy2
In some RDBMS's, it is possible to prevent multiple instances of the same pairing from being entered into
the Buddy table, by following these steps:
1. Make Buddy1 and Buddy2 a composite primary key.
2. Add a constraint to the table, so that Buddy1 > Buddy2
3. Add UNIQUE constraints to both columns.
However, these steps are not sufficient to prevent the following scenario:
| Buddy1 |
Buddy2 |
| 4 |
3 |
| 3 |
2 |
| 2 |
1 |
EmployeeID's 3 and 2 have been listed twice each.
One way to check for this scenario is to count the number of employees, and then compare that with
the number of records in the Buddy table.
If the number of employees is even, then the number of records in the Buddy table should be exactly
half that number.
If the number of employees is odd, then there will be one employee who does not have a buddy, and the
number of records in the Buddy table should be half the remaining number of employees.
"So which is preferable. Method (i) or (ii)?"
My preference is for method (ii). It is easier to keep the data consistent using this method and by
following the steps outlined above.
Also, having a table to represent the Buddy relationship means that I have somewhere to store
additional information about that relationship.
For example, I could modify the structure of the Buddy table to include a location where the buddies
should meet after a fire:
| Table:
Buddy |
| Field
# |
Name |
Type |
| Field1 |
Buddy1 |
int |
| Field2 |
Buddy2 |
int |
| Field3 |
MeetingLocation |
text |
MeetingLocation is a property of the relationship, and not of the employees themselves.
It should be noted, that this type of structure is a special case and it is rarely appropriate. Only use
it if the number of items in the relationship will always be exactly two.
This would not be a good structure to store a list of synonyms for a thesaurus.
It seems obvious that you could have a table called "Synonym" with fields "Word1" and "Word2" to represent
the relationship between the words "chair" and "seat" or between "big" and "large".
However, this theory breaks down when you add the word "huge".
A better model in this instance is to have a table for each word group, and another table to represent
the membership of each group:
| Table:
Word |
| Field
# |
Name |
Type |
| Field1 |
WordID |
autonumber, primary key |
| Field2 |
Wrd |
Text |
| Field3 |
Meaning |
Text |
| Table:
WordGroup |
| Field
# |
Name |
Type |
| Field1 |
GroupID |
autonumber, primary key |
| Field2 |
GroupHeading |
Text |
| Table:
GroupMembership |
| Field
# |
Name |
Type |
| Field1 |
GroupID |
number |
| Field2 |
WordID |
number |
It is not absolutely necessary to add an autonumber field to the GroupMembership table, as the GroupID
and WordID can be combined to make a composite primary key.
My rule of thumb, is that the GroupMembership table should have an autonumber primary key, if it is
going to store any additional information about the group membership.
For example, the GroupMembership table could be extended to include antonyms as well as synonyms:
| Table:
GroupMembership |
| Field
# |
Name |
Type |
| Field1 |
GroupID |
number |
| Field2 |
WordID |
number |
| Field3 |
Type |
number (1=synonym, 2=antonym) |
| Field4 |
GroupMembershipID |
autonumber, primary key |
Adelle.
|