Here’s the surprising part: a query using IN can return zero rows even though the data you’re checking against is completely correct — no typos, no missing joins, nothing wrong with your logic. The culprit is a single NULL sitting somewhere in the subquery’s result set, and most people writing SQL never learn this until it costs them a debugging session. EXISTS doesn’t have this problem at all, which is one of several reasons the choice between the two matters more than it looks like it should.
This tutorial walks through that difference, and several others, as a sequence of steps — building from “what each one actually checks” up to “which one you should reach for in a given situation.”
Step 1: Understand What Each Keyword Is Actually Checking
IN checks whether a value appears anywhere in a list or a set of values returned by a subquery. Conceptually, it’s asking: “is this specific value present in that collection?”
EXISTS checks something different: whether a subquery returns any rows at all. It never inspects the actual values inside those rows — it only cares whether the subquery produces at least one row or zero rows. That distinction sounds small on paper, but it explains almost every behavioral difference that follows.
A useful way to picture it: IN compares a value against a list, while EXISTS asks a yes-or-no question about whether a related row exists somewhere else. One returns a membership test; the other returns a boolean signal.
Step 2: See How Each One Reads in a Real Query
Suppose you want every customer who has placed at least one order. With IN, you’d write a subquery that selects customer IDs from the orders table, then check whether each customer’s ID falls inside that list:
SELECT customer_id, customer_name FROM customers WHERE customer_id IN (SELECT customer_id FROM orders);
With EXISTS, the subquery instead references the outer table directly, correlating row by row:
SELECT customer_id, customer_name FROM customers c WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);
Notice the shape of the second query. The subquery doesn’t select a specific column to compare — it selects a placeholder value (commonly 1) because EXISTS never looks at what’s selected, only whether a row shows up. That correlation, tying the subquery back to the current outer row via o.customer_id = c.customer_id, is the defining feature of how EXISTS is normally used.
Step 3: Run the NULL Trap for Yourself
This is the step worth pausing on, because it’s the single most common source of confusing bugs involving IN.
Imagine your orders table has a customer_id column, and one row in that table has a NULL in that column — maybe from a bad import, a cancelled order, or a data entry gap. Now run a query looking for customers who have never placed an order:
SELECT customer_id, customer_name FROM customers WHERE customer_id NOT IN (SELECT customer_id FROM orders);
If even one row in that subquery’s result contains NULL, this query returns zero rows. Not fewer rows — zero, regardless of how many customers genuinely have no orders. The reason is that SQL’s three-valued logic treats comparisons against NULL as unknown rather than false, and NOT IN requires every comparison in the list to definitively evaluate to false for a row to qualify. One unknown comparison poisons the entire list.
NOT EXISTS sidesteps this entirely, because it never compares against a list of values — it just checks whether a correlated row exists:
SELECT customer_id, customer_name FROM customers c WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);
This version returns the correct result whether or not NULL values are floating around in the orders table. If you take away one rule from this entire tutorial, make it this one: avoid NOT IN against a subquery unless you’re certain that subquery’s column can never contain NULL.
Step 4: Test Both on a Small Dataset Before Trusting Either
Before writing either version into production code, it’s worth running both against a small, known dataset where you can predict the correct answer by hand. Build a customers table with five or six rows, an orders table referencing some but not all of those customers, and deliberately insert one NULL customer ID into the orders table.
Run the IN version and the EXISTS version side by side. With plain IN, you’ll likely see matching results. With NOT IN versus NOT EXISTS, that’s where the difference surfaces — one returns the customers with no orders, the other returns nothing. Seeing this gap on a dataset small enough to verify manually tends to stick far longer than reading about it in the abstract.
Step 5: Consider How Each One Tends to Perform
Beyond correctness, there’s a performance dimension worth understanding, though it’s less absolute than the NULL issue and depends heavily on your specific database engine and indexing.
EXISTS stops scanning as soon as it finds one matching row, since it only needs to confirm existence, not enumerate every match. That can make it efficient on large correlated subqueries, particularly when an index supports the join condition inside the subquery.
IN historically needed to materialize the full list of values returned by the subquery before comparing against it, which could be costly on very large subquery results — though modern query optimizers in PostgreSQL, MySQL, and SQL Server have gotten considerably better at rewriting IN and EXISTS into equivalent execution plans when the two are logically interchangeable. In many everyday queries against reasonably sized tables, the performance difference is negligible, and readability should probably decide the outcome instead.
The safest approach: don’t assume one is always faster. Check the execution plan for your specific query, on your specific database, with your specific indexes, before treating either one as the default performance choice.
Step 6: Match the Choice to the Shape of Your Question
A pattern emerges once you’ve worked through enough of these:
IN tends to read more naturally when you’re checking a column against a small, fixed, or independently meaningful list of values — something like WHERE status IN ('shipped', 'delivered', 'returned'). There’s no subquery correlation involved, and the list stands on its own.
EXISTS tends to read more naturally when the question is fundamentally about a relationship between two tables — “does this customer have any orders,” “does this product appear in any active promotion” — where the subquery needs to reference the outer row to make sense.
If your subquery doesn’t reference the outer query at all, IN is usually the simpler and clearer choice. If your subquery references the outer row through a WHERE clause tying the two together, EXISTS is usually the better fit, and it comes with the added benefit of sidestepping the NULL trap entirely.
Step 7: Apply a Simple Decision Rule Going Forward
Boiled down to something you can actually apply mid-query:
Use IN when checking a column against a static list of literal values, or against a subquery you’re certain returns no NULL values. Use EXISTS when the check depends on a relationship between two tables, especially inside a NOT IN / NOT EXISTS situation where NULL could realistically appear. When in doubt about whether NULL might sneak into a subquery’s result, default to EXISTS — it costs you nothing and removes an entire category of silent bug.
| Situation | Better Choice |
|---|---|
| Checking against a fixed list of literals | IN |
| Checking a relationship between two tables | EXISTS |
| Any “not in this set” logic where NULL is possible | EXISTS (as NOT EXISTS) |
| Subquery result is large and uncorrelated | Test both; let the execution plan decide |
| Readability matters more than a marginal performance gain | Whichever reads more naturally for the question |
The habit worth building isn’t memorizing which keyword is faster — it’s asking whether your subquery needs to reference the outer row, and whether NULL could plausibly show up in the values you’re comparing against. Answer those two questions honestly, and the right keyword tends to fall out on its own.