A colleague once asked me why his UNION query was running noticeably slower than he expected, given that the query itself looked fairly simple. The cause was not complexity in his actual query logic — it was his use of UNION instead of UNION ALL, for a situation where he genuinely did not need UNION’s specific additional behavior, and was paying an unnecessary performance cost as a direct result.
This tutorial covers exactly what each keyword does differently, when that difference genuinely matters for your actual results, and when it only matters for performance with no meaningful difference in your final output either way.
What UNION and UNION ALL Both Do
Both UNION and UNION ALL combine the results of two or more separate SELECT statements into a single combined result set, stacking the rows from each individual SELECT statement on top of each other vertically, rather than combining columns side by side the way a JOIN does.
For this to work correctly, each individual SELECT statement being combined must return the same number of columns, in a compatible order and with reasonably compatible data types across each corresponding position, even if the actual underlying table each SELECT statement is querying from is completely different from the others. The actual column names used in your final combined result typically come from whichever SELECT statement is listed first in your overall UNION sequence.
A common use case: combining current year transactions stored in one table with historical archived transactions stored in a separate older table, where both tables share the same essential structure, into one single combined result set you can analyze together as if they were one unified, larger dataset.
The Specific Difference: Duplicate Handling
This is the core distinction between the two keywords, and it is genuinely simple once stated directly: UNION automatically removes any duplicate rows from your combined result, keeping only distinct, unique rows across the entire combined set. UNION ALL keeps every single row from every SELECT statement involved, including any duplicates that might exist, whether those duplicates existed within one single SELECT statement’s own individual results, or arose specifically because the same identical row happened to appear in more than one of your separate SELECT statements being combined.
If your underlying data is structured such that genuine duplicates across your combined SELECT statements are simply impossible (for example, combining current and historical transaction tables where transaction IDs are guaranteed unique across both tables by design, with no possibility of the same transaction ID legitimately appearing in both), UNION and UNION ALL will produce functionally identical results in that specific scenario, since there are no actual duplicates for UNION’s deduplication behavior to ever remove in the first place.
If genuine duplicates are possible across your specific combined data, the two keywords will produce meaningfully different results, with UNION ALL’s result containing more total rows than UNION’s result, by exactly however many duplicate rows existed across your particular combination.
The Performance Implication
Removing duplicates, which is what UNION specifically does that UNION ALL does not, requires the database to compare every single resulting row against every other resulting row, to identify and then eliminate any genuine duplicates found. This comparison work has a real, measurable computational cost, which becomes increasingly significant as the total number of rows involved in your combination grows larger.
UNION ALL skips this entire comparison and deduplication step, simply stacking all your individual results together directly, without performing any additional duplicate-checking work at all. This makes UNION ALL meaningfully faster than UNION whenever genuine duplicates are not actually a concern for your specific situation, precisely because it skips an entire category of computational work that UNION inherently performs.
This was exactly the issue my colleague encountered: his specific combined data genuinely could not contain real duplicates, given how his particular tables were structured and the nature of the data within them, which meant UNION’s deduplication work was providing zero actual benefit to his final result, while still costing real computational time to perform that unnecessary work regardless.
The Practical Decision Rule
Given this tradeoff, the practical guideline is genuinely straightforward: use UNION ALL by default, unless you specifically know that duplicates are both possible in your particular situation, and that removing them is actually the correct, intended behavior for your specific business question at hand.
Many beginners default to UNION simply because it happens to be the more commonly mentioned keyword in introductory tutorials and documentation, without considering whether their actual specific situation genuinely needs deduplication at all, or whether UNION ALL would produce an identical correct result while running measurably faster.
Before choosing UNION specifically, ask yourself directly: is it actually possible for the exact same row to legitimately appear in more than one of my separate SELECT statements being combined, and if it does, do I genuinely want only one single copy of that row to appear in my final combined result. If the honest answer to either part of that question is no, UNION ALL is very likely both the faster and the equally correct choice for your specific situation.
A Concrete Example Showing the Difference
Imagine combining a list of customers who made a purchase this month with a separate list of customers who contacted customer support this month, specifically to identify every customer who was active in at least one of these two ways during the month overall.
If a particular customer both made a purchase AND contacted support during that same month, their customer ID would genuinely appear in both of your individual underlying SELECT statements being combined. Using UNION ALL here would show that specific customer twice in your combined result — once representing their purchase activity, once representing their support contact activity. Using UNION instead would show that customer only once in your combined result, since UNION’s deduplication specifically treats that genuinely repeated customer ID as a duplicate worth removing down to a single appearance.
Which specific behavior is actually correct here depends entirely on your particular business question. If you genuinely want a simple count of “how many total customers were active in some way this month,” UNION’s deduplication behavior is exactly correct, ensuring that doubly-active customer is sensibly counted only once rather than artificially inflating your total active customer count. If you instead specifically want to separately analyze and distinguish “how many total activity events happened this month, across both purchase and support contact types,” UNION ALL’s behavior of preserving every individual occurrence, including that customer’s two separate distinct activities, would actually be the more genuinely correct choice for that specific, differently framed business question.
This example illustrates that the choice between UNION and UNION ALL is not merely a performance optimization decision in every situation — it can also be a genuine correctness decision, depending specifically on what your particular underlying business question actually requires from your combined result.
UNION and Sorting
ORDER BY can be applied to your overall combined UNION or UNION ALL result, but it must be placed at the very end of your entire combined query, sorting the final combined result as a complete whole, rather than being placed within each individual SELECT statement separately, where it would typically have little meaningful effect on your overall final combined and potentially deduplicated result’s actual final ordering.
INTERSECT and EXCEPT: Related But Distinct Set Operations
Beyond UNION and UNION ALL specifically, some database systems also support INTERSECT (returning only rows that genuinely appear in both of your combined SELECT statements’ results) and EXCEPT or MINUS depending on your specific database system (returning rows that appear in your first SELECT statement’s result but specifically not in your second one).
These are used less frequently in typical everyday business reporting than UNION or UNION ALL, but are worth knowing exist for the specific situations where you genuinely need to identify an overlap or a specific difference between two distinct result sets, rather than simply combining everything from both of them together the way UNION and UNION ALL do.
A Quick Reference
| Situation | Correct Choice |
|---|---|
| Combining data with no possibility of genuine duplicates | UNION ALL (faster, identical result either way) |
| Combining data where duplicates are possible, and should be removed | UNION |
| Combining data where duplicates are possible, but should be preserved and counted individually | UNION ALL |
| Unsure whether duplicates are genuinely possible in your specific data | Default to UNION ALL, then verify your actual assumption about duplicates directly against real data |
What I Told My Colleague
I asked him the same direct question outlined above: could the exact same row genuinely appear in both of his separate underlying SELECT statements being combined. After actually checking his specific data directly, the honest answer was no — his particular current and historical transaction tables were structured in a way that made genuine duplication across them structurally impossible by design. Switching his query from UNION to UNION ALL produced the identical correct final result, while running measurably faster, since the deduplication work UNION had been quietly performing the entire time had genuinely never been finding any actual duplicates to remove in the first place.
Could the exact same row genuinely appear in more than one of the SELECT statements you are trying to combine? Describe your specific tables and situation, and I can help you determine whether UNION or UNION ALL is the actually correct choice for your case.