A subquery is just a query written inside another query. That single sentence is the entire definition, and yet subqueries consistently rank among the concepts that intimidate intermediate SQL learners the most.
The intimidation usually comes from seeing a complex, deeply nested example before understanding the simpler building blocks that make subqueries useful in the first place. This tutorial deliberately builds up gradually, starting with the simplest possible use case before introducing more complex patterns, which is the same sequence I use with my own students.
The Core Idea: A Query That Produces a Value, Used Inside Another Query
Sometimes, to answer a business question, you first need to calculate some intermediate value, and then use that value as part of a larger query. Rather than running two completely separate queries and manually copying a result from the first into the second, a subquery lets you embed the first query directly inside the second, so the database calculates and uses that intermediate value automatically, in a single combined operation.
The Simplest Case: A Subquery That Returns a Single Value
Imagine you want to find all sales transactions that were above the average sale amount. You need two pieces of information conceptually: the average sale amount across all transactions, and then a comparison of each individual transaction against that average.
Without a subquery, you might calculate the average separately first, note down the resulting number, and then write a second query using that specific number in a WHERE condition. This works, but it requires a manual step, and if your underlying data changes, your manually noted average becomes outdated unless you recalculate and update it yourself.
With a subquery, you embed the average calculation directly inside your main query’s WHERE clause, in parentheses. The structure becomes: SELECT your desired columns, FROM your sales table, WHERE sale amount is greater than, followed by an opening parenthesis, then a complete separate SELECT statement calculating the average sale amount from that same table, followed by a closing parenthesis.
The database calculates that inner average first, then uses the resulting single number as the comparison value for the outer WHERE condition, all within one single combined query execution. If your underlying data changes, simply running this same query again automatically recalculates the current average and applies it, with no manual update step required.
This specific pattern — a subquery that returns exactly one single value, used within a WHERE comparison — is the simplest and most approachable subquery pattern, and a solid starting point for genuinely understanding the concept before moving to more complex variations.
A Subquery That Returns Multiple Values, Used With IN
The previous example worked because the inner subquery returned exactly one single number (an average). Sometimes you need a subquery that returns multiple individual values instead, which requires a different comparison approach than a simple greater-than or equals operator.
Imagine you want to find every customer who has placed an order containing a specific product. You might first need to identify which order IDs contain that specific product (potentially multiple order IDs), and then find every customer associated with any of those specific order IDs.
The structure becomes: SELECT your desired customer columns, FROM your customers table, WHERE customer ID IN, followed by an opening parenthesis, then a separate SELECT statement finding the relevant customer IDs from your orders table where the product matches your specific product of interest, followed by a closing parenthesis.
Here, the inner subquery can return several different customer ID values (since multiple customers might have ordered that specific product), and the IN operator checks whether each customer in your outer query matches any one of those returned values, rather than requiring an exact match against a single specific value the way equals or greater-than would.
Subqueries in the FROM Clause: Treating a Query’s Result as a Temporary Table
A different, slightly more advanced pattern places a subquery directly in the FROM clause, rather than within a WHERE condition. This effectively treats the result of that inner query as if it were a temporary table that your outer query can then select from, filter, or join against, just like it would with any genuine permanent table.
This pattern is particularly useful when you need to first calculate some aggregated or transformed version of your data, and then perform additional operations on top of that already-transformed result, rather than trying to express everything in one single, deeply complex query.
For example, imagine you first need to calculate total sales per salesperson (using GROUP BY, as covered in the previous tutorial), and then you specifically want to find which salespeople rank in the top half of that distribution. Calculating both the individual totals and the comparison against the overall distribution within one single ungrouped query without any subquery becomes genuinely awkward to express directly. Using a FROM subquery, you first run your GROUP BY query calculating per-salesperson totals as the inner subquery, and then your outer query selects from that already-calculated result, applying whatever additional filtering or comparison logic you need against those pre-calculated totals.
When using a subquery in the FROM clause, most database systems require you to give that subquery’s result a temporary name (often called an alias), placed directly after the closing parenthesis of the subquery itself, so your outer query has a way to refer to that temporary result set.
Correlated Subqueries: When the Inner Query Depends on the Outer Query
The subquery examples so far have been independent — the inner query could theoretically run completely on its own, with no reference to anything happening in the outer query around it. A correlated subquery is different: it specifically references a column from the outer query within its own logic, meaning it cannot run independently and instead gets re-evaluated separately for every single row the outer query is processing.
A common use case: finding, for each individual salesperson, their single highest-value sale specifically, compared against their own personal sales history, rather than compared against everyone else’s combined sales history.
This typically looks like an outer query selecting from your sales table, with a WHERE condition comparing the sale amount to a subquery that calculates the maximum sale amount specifically for that same salesperson, referencing the outer query’s current salesperson value within the inner subquery’s own WHERE clause to create that connection between inner and outer queries.
Correlated subqueries are conceptually more demanding than the independent subqueries covered earlier, since the inner query is effectively being run repeatedly, once for each row in the outer query, with a slightly different filtering condition each time based on that specific outer row’s values. They are also frequently slower in actual execution for this same reason, particularly on larger tables, since the work of the inner query is being repeated many times rather than calculated once. For situations where a correlated subquery’s logic could instead be expressed using a JOIN combined with GROUP BY, the JOIN-based alternative is often, though not universally, the more efficient choice, and worth considering as an alternative if you find yourself reaching for a correlated subquery and performance becomes a concern.
EXISTS: A Specific Pattern for Checking Presence Rather Than Retrieving Values
A specific, common subquery pattern uses the EXISTS keyword rather than IN or a direct value comparison, specifically to check whether any matching row exists at all in a related table, without actually needing to retrieve or compare specific values from that related table.
For example, finding every customer who has placed at least one order, without needing any specific details about those orders beyond confirming that at least one exists, is a natural fit for EXISTS. The structure places EXISTS before a parenthesized subquery, and that subquery typically uses a correlated reference back to the outer query (similar to the correlated subquery pattern above) to check specifically for that outer row’s related matches, rather than checking against the entire unrelated dataset.
EXISTS is often more efficient than IN for this specific kind of presence-checking use case, particularly because EXISTS can stop searching the moment it finds even just one matching row, without needing to retrieve or compare every single matching value the way IN conceptually needs to gather a complete list of values to check against first.
When to Use a Subquery vs When to Use a JOIN
This is a genuinely common point of confusion, since many business questions can technically be answered using either approach, and beginners often are not sure which one to reach for.
As a general guideline: if you need to actually display columns from both the “main” data and the related data side by side in your final result, a JOIN is typically the more natural and often more efficient choice, since it is specifically designed to combine and display data from multiple tables together. If you only need to use the related table to filter or check something about your main data, without needing to actually display any of that related table’s columns in your final output, a subquery (particularly using IN or EXISTS) is often a cleaner, more readable expression of that specific intent.
Neither approach is universally “correct” — many real-world queries could be written either way and produce identical results, and choosing between them often comes down to which version more clearly expresses your actual intent to someone else (or your future self) reading the query later, alongside performance considerations on very large datasets where the specific execution approach can matter more.
A Progression for Building This Skill
Given how many distinct subquery patterns exist, attempting to learn all of them simultaneously is exactly the kind of overwhelm that makes subqueries feel intimidating. I recommend this specific learning sequence, matching the order covered in this tutorial:
First, get comfortable with a subquery returning a single value, used in a simple WHERE comparison — this is the gentlest possible introduction to the core concept.
Second, practice subqueries returning multiple values, used with IN — this introduces the idea that a subquery’s result is not always just one number.
Third, experiment with a subquery in the FROM clause, treating its result as a temporary table — this introduces a different placement and use case beyond just WHERE filtering.
Fourth, and only once the above feel comfortable, explore correlated subqueries and EXISTS, which require holding both the inner and outer query’s logic in mind simultaneously, and represent a genuine step up in complexity from the earlier, independent subquery patterns.
Rushing to correlated subqueries before the simpler independent patterns feel automatic is the most common reason subqueries feel overwhelming rather than simply being another tool in your regular toolkit, the way they eventually become with consistent practice.
What business question are you trying to answer, and does it need to combine data from two tables for display, or just use one table to filter or check something about the other? Describe your situation and I can help you decide whether a subquery or a JOIN fits better, and which subquery pattern specifically applies.