Merge Excel spreadsheets with common columns

Today I had a problem that is a non-starter when dealing with relational databases: I needed to do an inner join on two sets of data.

Instead what I had was two Excel spreadsheets, and it ended up taking much more time than I would have liked.

I wanted to "join" the spreadsheets based on a common email column. I thought this would be easy: there must be a built-in Excel wizard to do just that, right? Seemingly not, and the solution I ended up finding was to treat Excel like some Frankenstein SQL database.

At first I thought the "Consolidate" feature would accomplish what I needed. I wasn't managing to make it do what I wanted after looking at a few "How to use Excel consolidate" guides though. The guides I was finding were very "point-here, click-that, there you go—magic", without explaining what pointing here and clicking that actually did. I gave up on it.

Then I tried searching for "join tables excel" and got a lot of relevant add-ons that did exactly that, but they all cost money.

Finally I stumbled across a Microsoft article explaining how to do exactly what I wanted with Microsoft Query.

The only problem(s) with the article are:

  • It is aimed at mail merges in Word.
  • I wasn't able to add more than one Excel file as a data source, as they are able to do in the article.
  • It's a ten year-old article. Also it seems the Microsoft Query interface hasn't been updated since Windows 3.1.

It took me some back-and-forth between the article, Excel and Microsoft Query to get the results that I needed so I'm documenting the steps for posterity.

Get the data in the same place

All the data has to be in the same spreadsheet; I didn't find a way to add more than one spreadsheet to Microsoft Query.

In my case, I was trying to end up with the rows from Spreadsheet 1 whose email was the same as in Spreadsheet 2. I just copied the email column from Spreadsheet 2 into Spreadsheet 1.

Define names to use as tables

Microsoft Query uses names defined in Excel to create its "tables". I needed two of these "tables": one for Spreadsheet 1's information (id and email) and one for Spreadsheet 2's information (email, which I'd pasted into Spreadsheet 1).

To create the first table I selected the two columns I wanted and then clicked "Define Name" in the "Formulas" menu, "Named cells" panel. I called it "emailsall".

The second table was created the same way, except selecting the "emails" column that I had pasted in from Spreadsheet 2, and naming it "emailsshort".

Query the "tables"

To get the results of the faux join I wanted I created a new Excel spreadsheet and then clicked on the "Data" menu, then on "From Other Sources" inside the "Get External Data" panel, and finally "From Microsoft Query".

At the "Choose Data Source" screen I selected "Excel Files" and opened Spreadsheet 1. In "Choose Columns", I added the two "tables" that had just been created. When clicking "Next" the Wizard complained about not being able to join the tables itself and they had to be manually joined. I clicked "OK" and in the Microsoft Query window clicked the little "SQL" button.

In the SQL window I edited the query to do an inner join and got the results I needed. The query went something like this:

SELECT emailall.id, emailall.email
FROM emailall emailall
INNER JOIN emailshort on emailall.email = emailshort.email

Finally go to the "File" menu -> "Return Data to Microsoft Excel"

Resources

About johnny

Computers have interested me since I can remember and conveniently I studied computer science. I also enjoy performing in a local amateur theatre group and cycling. This is where I post solutions to problems I've had in the office or any other project, hopefully clearly enough to refer back to in the future.
This entry was posted in Uncategorized and tagged , . Bookmark the permalink.

8 Responses to Merge Excel spreadsheets with common columns

  1. Mark Pajak says:

    Hi

    I am having the same issue, having read the same 10 year old article. I have two sets of data that I need to merge into one report. Both are comming from the same database, but being extracted from different view tables, or modules - even though the source data lives in the same place. I cant quite get it to work yet as the field names that would become the primary keys are different so excel wont treat them as one column of values, or at least it is protesting!

    I am sure that this way of working is not logically the right way to do it - but somehow this is what I need to do!

  2. netbook user says:

    Wow, thank you for the great tutorial, I managed to deal with two Excel tables in a flash.

  3. Rowena says:

    I had the same problem several years ago and its cropped up again today.

    The solution last time was to downloaded a free trial of one of the add-in tools. It worked a treat and was very quick and logical to use. Trouble is now its expired and I can't remember what it was called!

    Thanks for sharing your work around - I'll give this a try if I can't find the add-in

  4. diego says:

    Workes on excell 2000.

    Yes, I still use this version...

    Thx for thearticle.

  5. Ranjani says:

    Awesome...very nice article...Thank you!

  6. Rick says:

    If you have a lot of columns this won't work b/c of limits in Query.

    Open Access, Import the Excel sheets into Access tables and do the joins in an Access query. Then export the results of the query back to Excel.

  7. TerryWa says:

    Rick's suggestion worked for me: "Open Access, Import the Excel sheets into Access tables and do the joins in an Access query. Then export the results of the query back to Excel." I've been downloading warranty info from Dell for hundreds of machines into one file. The information was keyed on the Service Tag/Asset Tag/Serial Number, but it didn't have the computer name associated with the Service Tag. I created a WMI Query for SCCM 2012 to create a list of computers having a Service Tag. I used Access to create a relationship between the two imported files, then 'Query Design' to create an 'inner join' on the Service Tag.

  8. Leonard says:

    Here are some other options:

    "When clicking “Next” the Wizard complained about not being able to join the tables itself and they had to be manually joined." When you get this error, you can also click "Ok" and then click and drag the field that you want to equate (email) from one Defined Field list (emailall) to the other (emailshort) in the window above the preview.

    Or you can continue with the SQL button suggested by Johnny and use a where clause instead of the INNER JOIN; that is WHERE emailall.email = emailshort.email;

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>