How to normalize a table using Access - Part 3
This is part 3 in a 4 part series
This is the third article in the series, in the first article I showed the data that we would be working with and gave an overview of the normalization process. In the second article I went through the process of actually pulling this duplicate data into its own table. In this article I will look at how to add the new foreign key values to the original table, and then delete the original column that held the offending duplicate data. In the final article I will create a permanent parent-child relationship between the two tables and explain referential integrity.
I have provided a sample comma-separated text file that you can use for this example if you want (you can download it here). You should be able it import this fairly easily into your version of access. Save the file to your hard drive, create a new Access database, then go to File -> Get external data -> Import, and select Text files from the Type drop-down list. Once there you should be able to walk through the wizard without too much trouble, and import the data into a table called song. Make sure you select the box 'First Row Contains Fields Names' in the second dialog. The file is a short list of song details, and has enough records to give you an idea of how to do this.
OK, now we have each album associated with its own unique id, and we're going to use that id field to link the album table back to the song table. To do this we're going to need to place the album id number in the song table so that it will tell us which album the song belongs to. This is called a foreign key, it's how you link tables together in a database. The first thing we need to do is to create a field where we can store the album's id in the song table. Open the song table in design mode, and insert a field called 'album_id'. Click in the row selector where you want to place the field, then right click and select 'Insert row' from the popup menu:
Name the field album_id, change the field data type to number, and it will default to the long integer field size:
The data types must match in any primary-foreign key relationship. The data type for the autonumber field type is a long integer, which is why we needed to make the foreign key field in the song table the same data type.
Save and close the song table. Now comes the trickiest part, we have to get the correct album_id from the album table inserted into this new field in the song table. The easiest way to do this is to create an ad-hoc join between the song and album tables, connecting the original album field to the album_title in the new table we just created. Open a new query in design mode, and add the song and album table to the query:
OK, we've got an issue here - Access automatically created a link between the two tables. This is one of those cases where Access is trying to anticipate what you're going to do, and will try to set things up for you. You can see that it created a join between the two tables, using the album_id as the join field. The reason it did this is that it saw that these two tables have a field with a common name, and 9 times out of 10 this is going to mean that you're going to need to create a join between those fields. All that you're doing when you create a basic join between tables is saying 'There is a relationship between these fields, and I need to connect them so that I can match up the correct records'. If we leave this join in place we won't get any results, because there are no values for album_id in the song table as of yet. We do have records in these two tables that match, however. The album_title field in the album table we just created is going to match the album field in the song table, because that's were we got the values from. So, we first want to delete the join that Access created for us. To do this hold the mouse pointed over the join, click on it once to highlight it, then right click without moving the mouse. Select 'Delete' from the popup menu:
This isn't making any kind of permanent change to the database, and is not changing or modifying any records. All you're saying is that these two tables do not have any relationship based on this field. They do, however, need to have a relationship defined between the fields we mentioned above. In order to do this all you need to do is to grab the 'album' field in the song table and drag it over the 'album_title' field in the album table:
That's called an 'ad-hoc' relationship, and just means you are defining a temporary relationship between these tables using these fields. What you are actually doing here is telling the database that the values in these two tables match. I think that it is easier to understand what is happening here if you take a look at some sample data. To do this I want you to pull the 'album' and 'album_id' fields from the song table down to the query grid, and the 'album_id' and 'album_title' fields from the album table down to the query grid:
Let's take a quick look at the data as it is right now. Go ahead and click on the 'View' icon on the main toolbar to see the data. You will notice that the album field from the song table (in SQL format you would write this as song.album) is the same as the album_title field in the album table (album.album_title). This is shown below by the top red arrow, and is what we told the database to do when we created the join in the previous step. You will also see that song.album_id is empty, and that album.album_id is showing the id values for those albums. All we need to do is to enter the album_id value from the album table into the album_id field in the song table, and we'll have the relationship between song and album defined. This is shown here by the bottom blue arrow:
Now, if you only had a few records you could simply type the values in, copying them from album.album_id to song.album_id. However, typing manually is prone to mistakes, and you certainly wouldn't want to do that for thousands of records. We can tell Access what we want to do, and it will fill the values in for us. To do this we're going to use another type of action query, this time one that's called an 'update query'. An update query does just what it sounds like, it updates database records.
Click on the 'View' icon again from the main toolbar to return the query to design view. Once again right-click anywhere over the Diagram pane, then select Query type -> Update query from the popup menu:
As soon as you do that the 'Sort' and 'Show' rows will be removed from the query grid, and will be replaced by one called 'Update To'. The reason for that is that you can not sort or show data in an update query, but you do need to tell the database what to update a field to. In other words, using this type of query will allow you to make a bulk update to all of the records defined by the query. So how do we tell it to update the song table with the correct album_id from the album table? Well, we've already joined the tables using the album title, so it already knows which is the correct id for each record. All we have to do is get it to add that id to the song table. We do this by simply referring to the album_id from the album table in the query grid using the square brackets, which is how Access defines a field from a table:
All we need to do is to type that in to the query grid in the 'Update To' field in the song table:
So, what we are doing here is telling the database to link the tables using the album titles, and then to copy the album_id from the album table into the song table. Go ahead and click on the 'Run' icon in the main toolbar. Access will warn you that you are about to 66 rows (if you're using the data I supplied), just click OK. Simple as that.
Now, let's look at the results. Change the query back to a select query by right-clicking in the Diagram pane and selecting Query Type -> Select Query. We need to get rid of the ad-hoc query we created earlier between song.album and album.album_title, so right click on the line and select Delete from the popup menu:
Now, we need to re-create the join that Access had put in there for us when we first added the tables. Now it will work, because we placed the correct values in song.album_id:
We need to see some additional data, so drag song.name and song.artist down to the query design grid. Also, we don't need to see some of the fields we were using before, so let's remove them from the query. Highlight the fields song.album, song.album_id, and album.album_id by placing your mouse at the top of the query grid until it turns into a solid black downward facing arrow, click and drag over the 3 columns, then press the delete key on your keyboard. Again, this will have no effect on the actual data, it just removes those fields from being shown in the query results:
Now let's take a look at what we've got. Click on the View icon in the main toolbar to see the query in datasheet view:
What you're looking at is the name and artist from the song table, and the album_title from the album table. We're done with this now, you can close this query without saving.
Now, we no longer need the album title listed in the song table, so let's delete that field. Go back to the main database window and open the song table in design mode. Highlight the field 'album', and press the delete key. Access will ask if you really want to permanently delete these records, select Yes:
If you are now to look at the song and album tables you will see the following:
I've drawn in some circles and arrows to highlight the fields, you can see where we now have the albums listed in their own table, and the correct corresponding foreign key entered into the original song table. Only one think left to do, and that is to create a permanent relationship between the tables, we'll take care of that in the final article in the series.
This is part 3 in a 4 part series
- There is a pretty good overview about database normalization at About.com