Microsoft Excel How-to: Vlookup tutorial
My sister-in-law is a certified Excel genius. Back in the early 90's, when I was just getting interested in IT, she was writing programs in Excel running on DOS(6, I think?) that were being used to simulate nuclear meltdowns at actual nuclear power plants here in the states. They would enter in the baseline parameters, programmatically create a failure in some internal system, then run the simulation and 'see what happened'. They actually somehow linked it with the internal systems at the facility, right down to the sensors, alarms, monitors, etc. Really wild stuff. Whenever she would change some of the formulas she would wait until the weekend and then have their brand-spanken-new 486 DX2/66 with 8 mb of ram (8 mb of ram cost north of $2000 back then) chew on the spreadsheet for 3 days to finish the re-calculations. I've probably just violated a whole bunch of national security guidelines by writing this, so if this site's gone next weekend and you never hear from me again [hey, did I just see a black helicopter?!!?]...
Anyhooo, I remember her saying to me "You want to be a programmer? Start by figuring out a lookup function in Excel. If you can get your head around that then you 'may' just be smart enough ..."
Funny how things change. What was once the lofty domain of the geek elite is now common ground for every weekend warrior with a beat up computer on their desk. These days I see formulas in Excel from computers on our plant floor, written by grandmothers, that would have knocked me off a chair 10 years ago. Lookup functions fall firmly into this category - it just ain't rocket science anymore.
The hardest part of writing this was figuring out what data to use as an example. I use lookup functions all the time at my real job, but if I showed you one of those it would be revealing corporate stuff, and then I would have to hunt you all down and kill you. That's just a hassle (not to mention the negative impact it would have on my readership). I finally managed to find a site where I could download some stock market history for free. This will work, and I won't have to start looking for a new day job.
If you would like to follow along you can download the example Excel file by clicking here.
First, let's look at an easy example to try to get the concept down. The first question that needs to get answered is: when would you go to the trouble of using a lookup function? The most common answer would be when you have information on a spreadsheet, and you want to show or summarize that information, or pieces of that information, on the same or another spreadsheet. So, as an example, let's say that you're a fruit merchant (is there such a thing?), and you have a detailed list showing how much fruit you've sold by month:
Now, you want to create a summary list that only shows the totals somewhere else on this worksheet, in the workbook, or in another spreadsheet file. And you want that list to stay updated whenever you update the detail. In the table below you would use the vlookup function in the Total column to pull those values from this detail:
We should start by taking a close look at the arguments you'll need to work with for the function:
- lookup_value - this is what you're looking for in the detail, or source data (in the example above it would be apples, oranges, etc). You can enter actual text into the formula, but you'll usually use a reference to another cell (this will become clearer in a minute)
- table_array - this is the detail, or source data. It can be a reference to a range of cells, but it's always easier to use a predefined range here (if you're not sure how to define a range look here)
- col_index_number - this is simply the column from the range that you're going to want to show. In the example above you would be showing the totals from column 9 of the detail (counting from left to right, with the first column being 1)
- range_lookup - this has always been a bit of a puzzle to me. Not only its purpose, but also the way MS decided to set it up. Range_lookup is an optional value (that's what the square brackets mean ). If you leave this blank, or enter in the default value of true, the function will still return a value, or an approximate match, if it can't find an exact match for lookup_value. According to Microsoft:
- "If an exact match is not found, the next largest value that is less than lookup_value is returned."
Huh? The next largest value that is less than? There's got to be a better way to phrase this. I actually figured out what this meant and how to use it once, but it doesn't come back to me right now, and it's not something I really ever use (if you know, and find it useful leave a comment). Which is the weird part - why does it default to a value that you don't ever use? I've always thought that this should default to false, but what do I know. Anyway, you're almost always going to be entering 'false' here.
Now, if you were only working with a small data set like we're showing here you probably wouldn't bother going to the trouble. But what if you've got a data set that has thousands (or tens of thousands) of records. Finding bits and pieces of that data set can be a real problem. This is where the lookup functions really shine.
To give this a bit of a workout I've downloaded some historical data from the NASDAQ stock exchange from a website called Zeebob.com. In this case it's 3 months worth of details for all of the companies that are listed in the exchange. What I want to do is show the information for just the couple of companies that I've invested in (actually, I'm not invested in any of these, but just play along). Here's how we're going to set this up.
The first thing we need to do is to name the ranges that we're going to be working with. This will save you a ton of time in the long run, and also makes your formulas considerably easier to read. Here's a couple of keyboard shortcuts for selecting this data. Start by selecting the A1 cell in the upper left corner of the sheet. Now, hold down the Control and the Shift keys. While holding down these keys press the right arrow once (this will select all of the columns in the top row), and then press the down arrow once (this will select all of the rows to the end of the sheet):
Now type april_06 into the Name box in the upper left corner of the sheet. Important: You have to hit enter after entering text into the Name box. If you enter text and just click on something else the range name will not be saved:
Now we'll do the same with the other two sheets, naming the ranges may_06 and june_06, respectively. Remember, some of the rules to follow when naming sheets are no spaces or special characters. When you're done you should see three named ranges when you click on the down arrow at the Name box, and they should highlight the appropriate data when selected:
Let's add another sheet into the beginning of the workbook for the summary. The easiest way is to mouse over the sheet tabs at the bottom left of the workbook, right click and select Insert from the popup menu:
Now select Worksheet from the dialog, and click on OK:
Rename the sheet from Sheet1 to Summary by double-clicking on the tab of the new sheet:
OK, hang on for a minute while I do a little anal-retentive formatting (sorry, can't be helped).
[Sound of soft whistling in background...]
OK, I'm good. Here's how my sheet's formatted, you may (or may not) want to do something similar. I've also added the stock symbols of the companies that I want to track:
All that's left is adding the formulas. First select cell B3 on the summary sheet. Just think it through, what we want to do is to find the row that contains the symbol AAPL from the range april_06, then we want to pull the value from the 2nd column and place it into this cell, and we want to make sure that it's an exact match. The formula is going to look like this:
The only thing we haven't talked about is the dollar sign in the reference to cell A3. This is called a mixed reference, which is part absolute and part relative. If you're not familiar with cell references you should read this article first. In short what this means is that when I drag this formula to the right it will not change, because I've placed the dollar sign in front of the reference to the column(A), making it absolute. However, when I drag the formula down it will automatically increment, because I've left the row reference as relative. When entered into the spreadsheet it's going to look like this:
All right, now let's drag the formula to the right, so that all of the columns in the first row are filled in. When you do this, you'll notice something:
The value that points to the column we want (col_index_number) does not increment. I've never been able to find a way to get Excel to do this automatically (if you know how I would love to hear about it), so you'll need to go into the formulas and manually change the value for the column to the appropriate one. Here I've selected the cell C3, and I've changed the col_index_number in the formula bar from 2 to 3:
Simply repeat this for the rest of the columns, incrementing the col_index_number to 4,5,6, and 7. Once you've done that highlight the cells B3:G3, then grab and drag the fill box to fill the next 2 rows:
Now we just need to copy those vales down, to get the results for the next two months. Highlight the cells we just finished, from B3:G5, press the keys Ctrl and C to copy those values. Now select the cell B8:
and press the keys Ctrl and V to paste those values. This will copy the exact formulas into this block of cells. The only problem is that these won't reference the correct ranges, as we need this block to refer to the range may_06, not april_06. This is actually very simple to do for the whole block at once using the replace feature. With the block selected, press the keys Ctrl and H, which will bring up the find and replace dialog (you can also find this in the main menu under Edit -> Replace). Enter april_06 into the find field, and enter may_06 into the replace field:
IMPORTANT: Make sure you've got the range of cells selected before you do the replace, or you'll replace every instance of april_06 with may_06 in the entire sheet. This will usually be a bad thing. If you're sure you've still got the block selected go ahead and click on the button Replace All. This will correctly modify all of the formulas in this block to correctly reference the range in the sheet for April 06.
Repeat those last steps to finish filling in the block for the final set, replacing may_06 with june_06 in the final replace step. When you're done your sheet should look like this:
Well, I've got to hand it to you. If you made it all the way through this you've got some real determination. Maybe you should look into a career as a computer programmer, but stay away from nuclear power plants and black helicopters ;)
Questions? Comments? Don't be afraid to ask...