A Painless Introduction to the MySQL JOIN
if you already understand the JOIN clause, you could be forgiven for thinking it needs no explanation. like so much of programming, once you understand it it’s almost trivial. but figuring it out is pretty hard. let’s begin at (ok, near) the beginning.
you have two tables with data that you want to cross reference. the classic example is table a, containing customers, and table b, containing orders. you could have just created one table, but if you expect your customers to place more than one order, you end up with needless duplication of data. that is, with only one table, you would have to enter all of the customer information as well as all of the order information with each order. with two separate tables, you enter the customer information just once, and then for each order the customer places, you put a reference to the customer id in the orders table.
if you want to follow along, copy and paste the following code into a query window to create the tables:
CREATE TABLE customers (
customer_id int auto_increment not null primary key,
customer_name varchar(100),
customer_email varchar(100)
);
CREATE TABLE orders (
order_id int auto_increment not null primary key,
customer_id int not null,
item_name varchar(100),
item_price float(5,2)
);
#ADD SOME SAMPLE DATA#insert into customers (customer_name, customer_email) values (’Alice’,'alice@example.com’), (’Bob’,'bob@example.com’), (’Carol’,'carol@example.com’), (’Dave’,'dave@example.com’);
insert into orders (customer_id,item_name,item_price) values (1,’A toaster’,12.99), (1,’Bread’,2.50), (1,’Butter’,1.79), (3,’Butter’,1.79), (3,’Cereal’,3.23), (4,’Paper’,4.99), (5,’Bread’,2.50);
our example only allows for one item per order. if we wanted to allow for more than one item per order, we would probably create a third table (”order_items”) that would contain a reference to the orders table. the third table would not need a reference to the “customers” table, because each item in “order_items” is necessarily part of an “order”, and each order belongs to a certain customer. in a real-world system, we would probably want to use innoDB tables and foreign keys, but for now we will focus on the JOIN.
the JOIN allows us to take the data that we have spread out for ease of storage and look at it as though it were all in one table, while maintaining the association between the two tables. without the JOIN, we could still get all the data, it just wouldn’t make any sense:
SELECT * FROM customers, orders;
<28 rows found>
this can’t be the data we wanted: the result of this query is a cartesian join, and if you run it on large tables the result set quickly grows very large. it pairs every row of the first table with every row of the second table. if you have more than two tables in the SELECT clause, it outputs every row of each table with every row of all the other tables. most importantly, it ignores the logical association between the tables.
why would it do that?
the database doesn’t know and doesn’t care that we have a column we’re using as a cross-reference. JOIN uses the keyword “ON” to specify the cross-reference column.
SELECT * FROM customers JOIN orders ON customers.customer_id=orders.customer_id
| customers. customer_id | customer_name | customer_email | order_id | orders. customer_id | item_name | item_price |
|---|---|---|---|---|---|---|
| 1 | Alice | alice@example.com | 1 | 1 | A toaster | 12.99 |
| 1 | Alice | alice@example.com | 2 | 1 | Bread | 2.5 |
| 1 | Alice | alice@example.com | 3 | 1 | Butter | 1.79 |
| 3 | Carol | carol@example.com | 4 | 3 | Butter | 1.79 |
| 3 | Carol | carol@example.com | 5 | 3 | Cereal | 3.23 |
| 4 | Dave | dave@example.com | 6 | 4 | Paper | 4.99 |
that’s more like it. what we wanted, if you remember your venn diagrams, is the intersection of the two tables. that is, we want “all of the customers and their respective orders”. without the JOIN statement, we just get “all the customers and orders”.
but what about bob?
too true. a useful (or irritating, depending on whether you understand it) feature of the JOIN is that it automatically excludes rows from one table that do not have a corresponding entry in the other table. so the above query has only given us customers that have placed orders, and orders whose customer_id exists in the “customers” table. notably, we don’t have the customer “Bob” anywhere in the result set, or the order with order_id 7. the default JOIN in mysql is a pure inner JOIN: it gives only the rows where the join row exists in both tables. since Bob (customer id 4) has no orders, he doesn’t show up in the result set. likewise since order #7’s customer_id (5), doesn’t exist in the customers table, it doesn’t show up in the result set. we can get at this other data using the OUTER JOIN.
first lets try a query that gets all of the orders, along with the customer information if it exists:
SELECT * FROM customers LEFT JOIN orders ON customers.customer_id=orders.customer_id;
| customers. customer_id | customer_name | customer_email | order_id | orders. customer_id | item_name | item_price |
|---|---|---|---|---|---|---|
| 1 | Alice | alice@example.com | 1 | 1 | A toaster | 12.99 |
| 1 | Alice | alice@example.com | 2 | 1 | Bread | 2.5 |
| 1 | Alice | alice@example.com | 3 | 1 | Butter | 1.79 |
| 2 | Bob | bob@example.com | ||||
| 3 | Carol | carol@example.com | 4 | 3 | Butter | 1.79 |
| 3 | Carol | carol@example.com | 5 | 3 | Cereal | 3.23 |
| 4 | Dave | dave@example.com | 6 | 4 | Paper | 4.99 |
ah, there’s Bob. the LEFT JOIN has given us all of the rows of the customers table, along with any rows from the orders table that have a customer_id that exists in the customers table. similarly, we can get all of the orders, along with any existing customer information, by using a RIGHT JOIN:
SELECT * FROM customers RIGHT JOIN orders ON customers.customer_id=orders.customer_id;
| customers. customer_id | customer_name | customer_email | order_id | orders. customer_id | item_name | item_price |
|---|---|---|---|---|---|---|
| 1 | Alice | alice@example.com | 1 | 1 | A toaster | 12.99 |
| 1 | Alice | alice@example.com | 2 | 1 | Bread | 2.5 |
| 1 | Alice | alice@example.com | 3 | 1 | Butter | 1.79 |
| 3 | Carol | carol@example.com | 4 | 3 | Butter | 1.79 |
| 3 | Carol | carol@example.com | 5 | 3 | Cereal | 3.23 |
| 4 | Dave | dave@example.com | 6 | 4 | Paper | 4.99 |
| 7 | 5 | Bread | 2.5 |
alright, we lost bob, but now we have that order whose customer_id doesn’t exist in the customers table.
the LEFT and RIGHT joins are often used, along with a WHERE clause, to find “orphaned” rows. for instance, we could add a WHERE clause to the above RIGHT JOIN to give us only those rows where the order has no valid customer id:
SELECT * FROM customers LEFT JOIN orders ON customers.customer_id=orders.customer_id WHERE customers.customer_id IS NULL;
| customers. customer_id | customer_name | customer_email | order_id | orders. customer_id | item_name | item_price |
|---|---|---|---|---|---|---|
| 7 | 5 | Bread | 2.5 |
what if we want both?
the full outer join is harder to accomplish in mysql. but from our little exercise, we can tell it would be useful. a query that would give you ALL of the customers and ALL of the orders, corresponding to each other where a correspondance exists. mysql doesn’t support the FULL OUTER JOIN statement, but we can accomplish a FULL OUTER JOIN fairly easily using the UNION statement. UNION is a way of merging the result sets of two separate SQL queries. we’ll try it out:
SELECT * FROM customers
left join orders ON customers.customer_id=orders.customer_id
union
SELECT * FROM customers
right join orders ON customers.customer_id=orders.customer_id;
| customers. customer_id | customer_name | customer_email | order_id | orders. customer_id | item_name | item_price |
|---|---|---|---|---|---|---|
| 1 | Alice | alice@example.com | 1 | 1 | A toaster | 12.99 |
| 1 | Alice | alice@example.com | 2 | 1 | Bread | 2.5 |
| 1 | Alice | alice@example.com | 3 | 1 | Butter | 1.79 |
| 2 | Bob | bob@example.com | ||||
| 3 | Carol | carol@example.com | 4 | 3 | Butter | 1.79 |
| 3 | Carol | carol@example.com | 5 | 3 | Cereal | 3.23 |
| 4 | Dave | dave@example.com | 6 | 4 | Paper | 4.99 |
| 0 | 7 | 5 | Bread | 2.5 |
well, that worked pretty well. we got all the data from both tables, with corresponding data if it exists. whew!
Santosh damre said,
Wrote on December 19, 2007 @ 10:52 am
A very good Explaination of joins.I got clear picture of join after reading this doc……..
Neil said,
Wrote on June 26, 2008 @ 10:18 pm
Hi,
Thanks for this - just wanted to mention that the ‘insert into orders’ section of the first code example has a syntax error in it (there’s a single quote round one of the prices that shouldn’t be there):
BEFORE
insert into orders (customer_id,item_name,item_price) values (1,’A toaster’,12.99), (1,’Bread’,2.50), (1,’Butter’,1.79), (3,’Butter’,1.79), (3,’Cereal’,’3.23), (4,’Paper’,4.99), (5,’Bread’,2.50);
AFTER
insert into orders (customer_id,item_name,item_price) values (1,’A toaster’,12.99), (1,’Bread’,2.50), (1,’Butter’,1.79), (3,’Butter’,1.79), (3,’Cereal’,3.23), (4,’Paper’,4.99), (5,’Bread’,2.50);
Thanks,
Neil
Will said,
Wrote on June 26, 2008 @ 10:33 pm
Thanks, Neil! I fixed it in the article.