How to normalize a table using Access - Part 4
This is part 4 in a 4 part series
This is the fourth (and final) article in this series, in the first article I showed the data that we would be working with and gave an overview of the normalization process. In the second article I went through the process of actually pulling this duplicate data into its own table. In the third article I looked at how to add the new foreign key values to the original table, and then delete the original column that held the offending duplicate data. In this article I will create a permanent parent-child relationship between the two tables and explain referential integrity.
I have provided a sample comma-separated text file that you can use for this example if you want (you can download it here). You should be able it import this fairly easily into your version of access. Save the file to your hard drive, create a new Access database, then go to File -> Get external data -> Import, and select Text files from the Type drop-down list. Once there you should be able to walk through the wizard without too much trouble, and import the data into a table called song. Make sure you select the box 'First Row Contains Fields Names' in the second dialog. The file is a short list of song details, and has enough records to give you an idea of how to do this.
First of all, a huge congratulations to you for making it this far! I said in the first article of this series that normalizing data was not rocket science. While that is true, getting the job done does take focus, concentration, and some time. If you've made it this far you're doing really well. Fortunately, this final part is the easiest and quickest part of the whole process.
What we're going to do here is to define a permanent relationship between the song table and the album table we've created which shows the album titles. In addition, we're going to enforce referential integrity in the relationship between the two tables. To explain this, let's take a look at one of the previous images that shows both of the tables.
You can see here that every value in the album_id field in the song table (song.album_id) has a corresponding record in the album_id field in the album table (album.album_id). This means that right now there is referential integrity between these two tables. However, as the database is set up right now you could place a value in song.album_id that did not have a matching record in album.album_id, say for example a 7. This would break referential integrity, and in the vast majority of cases this would be a bad thing. We can instruct the database to not allow this to happen, that way you don't need to worry about it. Every time you enter a value in song.album_id the database engine will look to see if there is matching value in album.album_id, and if there is not the database will give you a not-so-nice error message and the database will refuse to accept the changes:
So, this is how we are going to achieve this little miracle. The first thing you need to do is to close any tables or queries that you have open. Once you've done that click on the 'Relationships' icon from the main toolbar:
This will bring up the relationships window, with nothing in it. You will need to add the album and song tables, to do this click on the 'Show table' icon in the main toolbar:
You will get a dialog box where you can select the tables to add to the relationships window. You can add the song and album tables one at a time, or you can select album, then hold down the Ctrl key and click song to select both tables at once, then click on Add, then click Close:
Now that the tables are added, you define a relationship here the same way we defined the ad-hoc relationships working within the queries in the previous steps. Simply click on album.album_id, and drag and drop it over song.album_id. The difference here is that you will get a popup dialog that is going to ask you for some additional details:
There are a couple of things you need to take note of here. The first is that this is where you will select the option that will enforce referential integrity. When you select that box two other options will become available to you: Cascade Update Related Fields, and Cascade Delete Related Fields. Let's take a quick minute and look at what these options will do for you.
- Cascade Update Related Fields - We talked about this earlier, this is the option where if you were to change the value of one of the album.album_id records, Access would automatically change the values in any of the matching song.album_id records.
- Cascade Delete Related Fields - This option kicks in if you try to delete a record from album where there are related records in song. For an example let's refer back to the first image on this page. Let's say you wanted to delete the album 'Walking Wounded' from the album table. You can see that this album is referenced in the song table, if you have referential integrity enforced you will not be able to delete that record from album, as this would leave records in song without a matching record in album. You will have 3 options here:
- Delete the number 6 from those records in song, leaving nothing in that field (making it Null), in effect saying that those songs do not have an album
- Change every 6 from those records in song to a different value that does exist in the album table, in effect saying that these songs belong to a different album
- Delete all of those records from song where there is a 6 in that field
If you have this option selected Access will automatically delete these records from the song table for you.
I've seen this argued both ways, but I do not like automatic cascade updates or deletes. If you have other users that will working with this database you have the potential for them to create great harm to your data. Cascade deletes are particularly dangerous, I would never select this option. For this exercise we're going to leave both of these un-selected, I would strongly urge you to not select these options when creating relationships in your databases.
Select Create from the dialog form and the relationship will be created for you. Access will check for referential integrity when you do this, and if you have records in song that do not have matching records in album you will not be able to create the relationship. You should end up with something that looks like this:
There are just a couple of final points I want to make here. First, I've circled the icons on the join bar that indicates a one-to-many relationship. All this means is that the album_id values in album are unique, and that there can be as many records as you want in song that will refer to those unique values in album. Second, this does NOT mean that every record in song MUST be assigned to a record in album. You could have records in song that do not have an album, those records would not have anything entered in the album_id field. If you want to enforce that, and say that every song MUST have or belong to an album you will need to do that in the song table. To do this all you need to do is require a value in song.album_id:
If you select 'Yes' in 'Required' for song.album_id then anytime you enter a new record in the song table there MUST be a record in album to which it can be assigned. It is logical, therefore, as you're thinking about your workflow, that you will need to enter the albums in first, then the songs.
This should give you an overview of how to normalize the data in your database. We covered a tremendous amount of territory here, and the first couple of times you do this it will see like a considerable undertaking. However, after you get your head around the process you will find that this really is pretty simple, and that you can normalize data like this without too much trouble.
This is part 4 in a 4 part series
- There is a pretty good overview about database normalization at About.com