Fixing A Badly Designed Database

We've all had to do it. When I've had to do it, it's usually been a database that I've created.

In one of my early attempts at database design, I made a table called ListOfLists:

ListID List Day
1 sausages, carrots, lettuce Monday
2 pineapple, lemon, apple Tuesday
3 apple, cinnamon Wednesday

Keeping several items within a single field seemed like a good idea at the time, but it is an example of really bad design.

For starters, what if I want to have a list of 20 or 30 items. My "List" field would need to be large enough to accommodate the longest list.

Another common mistake is to have one field for each item:

ListID Item1 Item2 Item3 Day
1 sausages carrots lettuce Monday
2 pineapple lemon apple Tuesday
3 apple cinnamon   Wednesday

This is really no improvement on my original design, because I will run into the same problem with lists that contain more than 3 items.

Also, try writing a query to find all the lists which include "apple".

OK, so how best to fix this? I have thousands of records that need to be fixed!

What I need to do, is put my item data into a separate table:

ItemID ListID Item
1 1 lettuce
2 1 carrots
3 1 sausages
4 2 apple
5 2 lemon
6 2 pineapple
7 3 cinnamon
8 3 apple

My "ListOfLists" table will contain the same data as before, but without the "List" field:

ListID Day
1 Monday
2 Tuesday
3 Wednesday

The SQL to generate my Item table, is as follows:

SELECT ListOfLists.ListID, SUBSTRING(List,t1.Id,t2.Id) AS Item
INTO Item
FROM ListOfLists, Numbers AS t1, Numbers AS t2
WHERE (t1.Id+t2.Id=LEN(List)+2 OR SUBSTRING(List,t1.Id+t2.Id,1)=",")
AND (t1.Id=1 OR SUBSTRING(List,t1.Id-1,1)=",") AND CHARINDEX(",",SUBSTRING(List,t1.Id+1,t2.Id-2))=0 AND t2.Id>=2

This query relies on the following table, called "Numbers":

Id
1
2
3
4
5
6
7
8
9
10
...

"Numbers" must contain at least as many records as there are characters in the longest list.

To understand how this query works, consider the following variation, in which two extra fields are selected, and the WHERE clause is dropped:

SELECT t1.Id, t2.Id, ListOfLists.ListID, SUBSTRING(List,t1.Id,t2.Id) AS Item
FROM ListOfLists, Numbers AS t1, Numbers AS t2

Using the sample data I have presented, this query will output all of the possible substrings of all the lists.

A sample follows:

t1.Id t2.Id ListID Expr1
1 1 1 s
1 2 1 sa
1 3 1 sau
1 4 1 saus
1 5 1 sausa
1 6 1 sausag
1 7 1 sausage
1 8 1 sausages
1 9 1 sausages,
1 10 1 sausages,c
2 1 1 a
2 2 1 au
2 3 1 aus
2 4 1 ausa
2 5 1 ausag
2 6 1 ausage
2 7 1 ausages
2 8 1 ausages,
2 9 1 ausages,c
2 10 1 ausages,ca

The WHERE clause in my original query picks out those substrings which
(1) Start at the beginning of a list, or start 1 character to the right of a comma.
(2) Finish at the end of a list, or finish 1 character to the left of a comma.

Both queries make use of an implicit CROSS JOIN between "Numbers" and itself. CROSS JOINS are discussed in more detail in a previous tip

Adelle.

Home About the Author Copyright © 2001 Adelle Hartley