How to normalize a table using Access - Part 2
This is part 2 in a 4 part series
This is the second article in the series, in the first article I showed the data that we would be working with and gave an overview of the normalization process. In this article I will go through the process of actually pulling this duplicate data into its own table. In the next article I will look at how to add the new foreign key values to the original table, and then delete the original column that held the offending duplicate data. In the final article I will create a permanent parent-child relationship between the two tables and explain referential integrity.
I have provided a sample comma-separated text file that you can use for this example if you want (you can download it here). You should be able it import this fairly easily into your version of access. Save the file to your hard drive, create a new Access database, then go to File -> Get external data -> Import, and select Text files from the Type drop-down list. Once there you should be able to walk through the wizard without too much trouble, and import the data into a table called song. Make sure you select the box 'First Row Contains Fields Names' in the second dialog. The file is a short list of song details, and has enough records to give you an idea of how to do this.
The easiest way to separate these duplicated records into their own table in Access is to group them first, then create the new table of those records using a make-table query. We're going to step through this process one step at a time. The first thing we need to do is to get the duplicated data summarized, or grouped by, to use an SQL term. We can do all of this using the graphic query interface supplied with Access. To get started select Queries from the main database window, then click on New and select Design View from the popup menu:
Now add the song table to the query (your Show Table dialog box will look different than this, as I have other tables from other examples showing here):
Now, what we want to end up with here is to show each one of the albums just once, then to create another table containing just those single entries. To get the albums in the query just double-click on the name in the table or drag and drop the album field down into the query grid:
If we were to look at the results now we would still be showing the total number of records, with all of the albums duplicated. To show each album just once we want to group by the album field. When you select group by you are telling the database 'Just show me each unique record once'. Typically, the reason you would want to group the data is so that you can find a total for that grouping, like the total number of songs in each album, or the total number of albums for each artist. In this case all we really want is the name of the album, but we still need to select the 'Totals' option. To do this you can select View -> Totals from the main menu, or click on the Totals icon from the main toolbar:
When you do this you will get another row added to the design grid called 'Total', and you will be able to select the function you would like to occur because of that total. In this case all we want to do is to 'Group By' the album field, which is the default for the total row:
Now when we view the data we will see each album listed only once:
Perfect. Now, the next thing we need to do is to create a new table containing these values. We could create the table manually and then (heaven forbid) type the values in (this wouldn't be bad here as we only have 6 values, but when you have hundreds, or thousands or more records re-typing is just dumb), or we could use an action query to create the table. An action query does just what the name implies - it performs some sort of action, as apposed to just selecting data. In this case we are going to use the 'make-table' action query. Return the query to design mode, then either click on the 'Query type' icon from the main toolbar:
Or, what I think is easier, right-click anywhere on the Diagram pane in the query window and select Query type -> Make-table Query...:
As soon as you do this you will be prompted for a table name, in this case let's type in 'album', and we're going to place the table in the current database. Click on OK:
And to your surprise, nothing happens. This is because all we've done so far is to set up the parameters for the action query, we now need to tell Access explicitly that we actually want to run the query. To do this we're going to go back up to the main toolbar and click on the 'Run' icon:
You will get another dialog box, telling you how many records will be placed in the new table, and giving you a chance to change your mind:
Phew, so far so good, we're more than half way done (hang in there)! This created a new table called 'album', but it's not set up the way we want it to be. The table that was created contains a single field with the name 'album', which will be confusing. In addition, every table created in any database should always have a field that is designated as a 'primary key' (you can find out more about primary keys here). We're going to want to rename the 'album' field to 'album_title', and we're going to need to create a new field in this table that we can designate as the primary key(we'll use the autocounter field type for this). So, go to the Table tab in the database window and open the album table in design mode. Change the name of the album field to album_title, and add another field called album_id. Select 'Autonumber' for the data type, then click on the Primary Key icon in the main toolbar. I've assumed that you're familiar with those actions and skipped a couple of pictures here (you can view this post if you need help with any of those commands), at this point you should be looking at a screen that looks like this:
Let's make the id the first field, it's just easier to keep track of the primary key if it is in the first slot. You can drag and drop it there, click in the field selector box in the very left column and drag it to the top of slot, like this:
OK, now we have each record uniquely defined with the id field (or we will as soon as we save the changes to the table), and we're going to use this field to link this table back to the song table. Each album has a corresponding id number, we're going to place this number in the song table so that it will tell us which album the song belongs to. This is called a foreign key, it's how you link tables together in a database. You should click on the save icon in the main toolbar:
When you do so Access will automatically populate the album_id field with numbers, starting with 1 and counting up to the total number of records. I've mentioned this before in other posts, but it's important so I'm going to mention it here again. Right now, immediately after creating a new autonumber field, is the only time when that field will accurately reflect the total number of records in the table. There are a number of circumstances in which automatically generated numbers would be discarded leaving holes in the sequence, and also Access will not re-sequence these records anytime you delete a record. Indeed, you certainly would not want to have records that had been previously associated with one id to be re-associated to a different id. Once you have a unique id associated with a record it must become a permanent identifier for that record, changing it would be like someone in the US changing their social security number (this would be a very bad thing, by the way) There actually is a way to automatically perform a 'cascade update' in Access, where the database would update any dependent records if you were to change a primary key (we'll see that in the final stage of this tutorial). Doing so, however, is just generally a bad idea, too many things can go wrong in the process.
Anyway, now when you look at this table in datasheet view it should look like this:
So now we've got each of the albums listed just once in a new table. In the next step we'll learn how to match the new album ids from our new table into the correct records in the initial table.
This is part 2 in a 4 part series
- There is a pretty good overview about database normalization at About.com