How to normalize a table using Access - Part 1 of 4
This is part one in a 4 part series
Normalizing data. The mere mention causes database developers to sit up in their seats with ears perked and fingers poised over the keyboard, and sends the layman looking for a rock to crawl under. It really isn't rocket science. In its simplest form, the act of normalizing data simply means that you're going to take any items that are duplicated in a database table and move those items to their own table, and then create a relationship between the original table and that new table.
Now for you purists out there, I'll admit that getting to 3rd and 4th normal form is more complicated than that. I think most realistic database developers will agree that working with a totally normalized database in a production environment is just, well, a pain in the butt. That's not to say that you should ignore design standards, just that working with joins across multiple levels of many different tables can be more trouble that it's worth.
I have provided a sample comma-separated text file that you can use for this example if you want (you can download it here). You should be able it import this fairly easily into your version of access. Save the file to your hard drive, create a new Access database, then go to File -> Get external data -> Import, and select Text files from the Type drop-down list. Once there you should be able to walk through the wizard without too much trouble, and import the data into a table called song. Make sure you select the box 'First Row Contains Fields Names' in the second dialog. The file is a short list of song details, and has enough records to give you an idea of how to do this.
I initially tried to explain this as a single article, but after including some of the necessary overviews I found that it was just too long. So I've decided to make it a series, with this being the first article in that series. I will be giving an overview of what we're trying to do here, with details on how to actually achieve this in subsequent articles. To get started, here's a screen print of some of the data that we're going to be working with:
I know that's a bit of an eye test, hopefully you've downloaded the example or have your own data so you can follow along. What we're trying to do to get to 2nd normal form is to take any data that is repetitive and pull that data into its own table, then create a relationship between the two tables. As you look at this data you can easily see the repetitive information:
The first thing you need to do is to and have a clear picture defined of what data sets you are going to need, and then to define the relationships between the different data sets. I think the easiest way to start clarifying relationships is to just talk it through, and to assess which data set belongs to or is owned by the other data sets. I'm going to use the song as the focal point in which to start, and as we look at the data we see that each song can have one artist, and each song can also be included in one album. In addition, each album can belong to one genre. I would usually start by just sketching this out, defining the different tables I'll need and drawing lines to show the relationships between those tables. You can see how this would look in the sketch below.
A fundamental rule in database design is that each table shall have a field defined as a primary key (more about that here), the primary key in each of these tables is defined in red, and is the name of the table with '_id' added to it. In addition, when you define a relationship between two tables you do so by pulling that primary key value from the foreign table into the target table, which then becomes a 'foreign key' in that target table, shown here in blue. Every table can have only one primary key, but it can have many foreign keys:
To completely normalize this table we would split each one of these categories into its own table. For the purpose of this example I'm going to split the records for the albums into a separate table. This should give you enough information to pull the other records into their own tables, or to use it as an example for fixing your own data. To visualize the steps that will be necessary to achieve this, look at the table below to see the repetitive album column:
First, we need to pull the album information into a separate table:
We will then add the id number from the new table into a new field called 'album_id' that we're going to add to the existing data:
Once we've done that we no longer need the album information in the song table, so that column can be deleted:
We can then join these two tables by creating a parent-child relationship between them, using the album_id as the primary-foreign key in the tables:
As I mentioned earlier, this is a long exercise so I've broken it into a series with multiple posts. In the next part of the series I will start to show how you will actually begin the process of pulling the album data out of the song table. I will be posting the next article in the series in the next day or so, and I'll have references to all of them once the series is complete.
- There is a pretty good overview about database normalization at About.com