SQL UNION vs UNION ALL: Key Differences That Affect Both Results and Performance

UNION combines the result sets of two or more SELECT statements into a single result set and removes any duplicate rows in the process. UNION ALL does the same combining but keeps every row, duplicates included. That single word — ALL — is the entire difference in what each keyword instructs the database to do, and it’s responsible for nearly every meaningful gap in behavior, performance, and correctness between the two.

Both keywords require the same basic setup: each SELECT statement being combined must return the same number of columns, in the same positional order, with compatible data types. Get that part right and the rest of the decision comes down to five practical points of comparison, ranked here from most consequential to least.

1. Duplicate Removal — The Difference That Changes Your Actual Results

This is the difference that matters most, because it can silently change what your query returns.

UNION examines the combined result set and eliminates any row that’s an exact duplicate of another row, whether the two copies came from different SELECT statements or from the same one. “Exact duplicate” means every column value matches across the entire row: not just one column, all of them. For this comparison, two NULLs in the same column position count as matching. If two SELECT statements each return a row for “John Smith, Sales, 2024,” UNION collapses those into a single row in the output.

UNION ALL skips that step entirely. Every row from every SELECT statement shows up in the final result, whether or not an identical row appears elsewhere in that combined output. If both source queries return that same “John Smith, Sales, 2024” row, UNION ALL happily returns it twice.
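The “John Smith” case can be reproduced in a few lines. This is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the table names staff_east and staff_west are hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical tables for illustration; the names are not from any real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff_east (name TEXT, dept TEXT, year INTEGER);
    CREATE TABLE staff_west (name TEXT, dept TEXT, year INTEGER);
    INSERT INTO staff_east VALUES ('John Smith', 'Sales', 2024);
    INSERT INTO staff_west VALUES ('John Smith', 'Sales', 2024);
    INSERT INTO staff_west VALUES ('John Smith', 'Sales', 2023);
""")

# UNION: the two identical 2024 rows collapse into one.
union_rows = conn.execute("""
    SELECT name, dept, year FROM staff_east
    UNION
    SELECT name, dept, year FROM staff_west
    ORDER BY year
""").fetchall()

# UNION ALL: every row from both tables is kept.
union_all_rows = conn.execute("""
    SELECT name, dept, year FROM staff_east
    UNION ALL
    SELECT name, dept, year FROM staff_west
    ORDER BY year
""").fetchall()

print(len(union_rows))      # 2
print(len(union_all_rows))  # 3
```

The 2023 row survives in both results because it differs in one column, which is enough to make it a distinct row.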

This distinction has a habit of hiding real bugs. Picture a report that unions sales figures from two regional tables, and a customer happens to place identical-looking orders in both regions on the same day. UNION would quietly merge those into one row, understating the true total. UNION ALL would preserve both, which is correct here since they represent two separate transactions that simply look alike on paper. Choosing the wrong keyword doesn’t throw an error — it just gives you a number that’s wrong in a way nobody notices until someone reconciles it against a different source.
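The understated total described above is easy to see with numbers. The following sketch assumes two hypothetical regional tables, orders_north and orders_south, and made-up amounts; it runs against SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders_north (customer TEXT, order_date TEXT, amount INTEGER);
    CREATE TABLE orders_south (customer TEXT, order_date TEXT, amount INTEGER);
    -- Two separate transactions that happen to look identical.
    INSERT INTO orders_north VALUES ('Acme', '2024-03-01', 500);
    INSERT INTO orders_south VALUES ('Acme', '2024-03-01', 500);
    INSERT INTO orders_south VALUES ('Birch', '2024-03-01', 200);
""")

def total(set_operator):
    """Sum the amounts over both regions, combined with the given operator."""
    sql = f"""
        SELECT SUM(amount) FROM (
            SELECT customer, order_date, amount FROM orders_north
            {set_operator}
            SELECT customer, order_date, amount FROM orders_south
        )
    """
    return conn.execute(sql).fetchone()[0]

total_union = total("UNION")          # one Acme order is merged away
total_union_all = total("UNION ALL")  # both Acme orders are counted

print(total_union)      # 700
print(total_union_all)  # 1200
```

Neither query raises an error or a warning; the only sign of trouble is that the two totals disagree by exactly one order.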

The rule of thumb: reach for UNION ALL as your default, and only switch to UNION when you have a specific, articulable reason to believe duplicate rows shouldn’t count twice in your particular result.

2. Performance — Why UNION ALL Is Consistently Faster

UNION ALL is faster than UNION, and the reason is mechanical rather than incidental.

To remove duplicates, UNION has to work out which rows in the combined result match one another. Engines typically do this by sorting the entire result set or by building a hash structure over it, rather than comparing rows pair by pair. That sorting or hashing step costs CPU time and, on larger result sets, memory, and that cost exists purely to support deduplication. It has nothing to do with retrieving or combining the data itself.

UNION ALL skips that step completely. It appends one result set to another and returns the combined rows immediately, with no comparison pass across the output at all.

On a query returning a few dozen rows, this difference won’t register. On a query combining several large tables, each holding millions of rows, the deduplication pass in UNION can add meaningful time, sometimes turning a query that runs in seconds into one that runs in minutes. If you already know the combined output can’t contain duplicate rows (say, because each source table has no internal duplicates and the tables are logically partitioned with no overlap), UNION ALL gets you the same set of rows as UNION, just without paying for a duplicate check that was never going to find anything.
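As a rough check of the no-overlap case, the sketch below builds two hypothetical tables, events_2023 and events_2024, whose primary keys cannot collide, and confirms that both keywords return the same rows. It does not measure timing, which depends heavily on the engine and the data volume.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events_2023 (event_id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE events_2024 (event_id INTEGER PRIMARY KEY, label TEXT);
""")
# The primary key rules out duplicates inside each table,
# and the id ranges are disjoint, so no row can appear twice.
conn.executemany("INSERT INTO events_2023 VALUES (?, ?)",
                 [(i, "e") for i in range(1, 1001)])
conn.executemany("INSERT INTO events_2024 VALUES (?, ?)",
                 [(i, "e") for i in range(1001, 2001)])

query = """
    SELECT event_id, label FROM events_2023
    {op}
    SELECT event_id, label FROM events_2024
    ORDER BY event_id
"""
with_dedup = conn.execute(query.format(op="UNION")).fetchall()
without_dedup = conn.execute(query.format(op="UNION ALL")).fetchall()

same_rows = with_dedup == without_dedup
print(len(with_dedup), same_rows)  # 2000 True
```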

3. Column Requirements — Identical for Both, No Exceptions

Neither keyword bends the rules on structure, and this is worth stating plainly because it trips up beginners regardless of which keyword they’ve chosen.

Every SELECT statement being combined must return the same number of columns. Three columns in the first SELECT statement means every other SELECT statement in that same UNION or UNION ALL needs exactly three columns too — not more, not fewer.
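A quick way to see the column-count rule enforced is to break it on purpose. This sketch uses SQLite through Python's sqlite3 module; the exact error text and exception type will differ in other engines and drivers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three columns on the left, two on the right: the statement is rejected
# before any rows are produced.
try:
    conn.execute("SELECT 1, 2, 3 UNION ALL SELECT 1, 2")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

print(error_message)
```

Swapping UNION ALL for UNION in the statement fails the same way, since the check happens before deduplication is ever considered.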

Column order matters more than column names. The database matches columns by their position in each SELECT statement, not by whatever alias you’ve given them. If your first SELECT statement returns customer_name, order_date, and amount, in that order, your second SELECT statement needs to return the equivalent values in that same order, even if it labels them differently or pulls them from a completely different table.
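Positional matching can be demonstrated with two tables whose column names differ entirely. In this sketch, web_orders and store_orders are hypothetical tables; in SQLite the output column names come from the first SELECT, and other engines generally behave the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE web_orders (customer_name TEXT, order_date TEXT, amount INTEGER);
    CREATE TABLE store_orders (buyer TEXT, sold_on TEXT, total INTEGER);
    INSERT INTO web_orders VALUES ('Ada', '2024-05-01', 40);
    INSERT INTO store_orders VALUES ('Grace', '2024-05-02', 75);
""")

# Different names, same positions: the rows line up correctly.
cursor = conn.execute("""
    SELECT customer_name, order_date, amount FROM web_orders
    UNION ALL
    SELECT buyer, sold_on, total FROM store_orders
    ORDER BY amount
""")
column_names = [d[0] for d in cursor.description]
rows = cursor.fetchall()

# First two columns swapped in the second SELECT: no error in SQLite,
# but a date now sits in the customer_name position.
swapped = conn.execute("""
    SELECT customer_name, order_date, amount FROM web_orders
    UNION ALL
    SELECT sold_on, buyer, total FROM store_orders
    ORDER BY amount
""").fetchall()

print(column_names)  # ['customer_name', 'order_date', 'amount']
print(rows)          # [('Ada', '2024-05-01', 40), ('Grace', '2024-05-02', 75)]
print(swapped)       # [('Ada', '2024-05-01', 40), ('2024-05-02', 'Grace', 75)]
```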

Data types need to be compatible across that same positional matching, though “compatible” leaves some room — most databases will implicitly convert an integer to a decimal, or a smaller text type to a larger one, without complaint. Attempting to combine a text column with a date column in the same position, on the other hand, will usually raise an error or produce a result that’s silently wrong, depending on the database engine and how permissive its type conversion rules are.
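SQLite sits at the permissive end of that spectrum, which makes it a convenient place to see the "silently wrong" outcome. The sketch below mixes an integer and a date-like string in the same column position; stricter engines may reject an equivalent query outright, so treat this as an illustration of one engine's behavior only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite is dynamically typed, so it accepts both values in one column
# and simply stores each with its own type.
mixed = conn.execute("""
    SELECT 42 AS value, typeof(42) AS kind
    UNION ALL
    SELECT '2024-01-15', typeof('2024-01-15')
    ORDER BY kind
""").fetchall()

print(mixed)  # [(42, 'integer'), ('2024-01-15', 'text')]
```

Any downstream code that assumes the first column is numeric would now misbehave on the second row, without the query itself ever failing.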

None of this changes based on whether you use UNION or UNION ALL. It’s a shared prerequisite for both, and it’s usually the first error message people encounter before they’ve even gotten to the point of thinking about duplicates.

4. Sorting Behavior — A Subtle Difference in How ORDER BY Applies

Both keywords support a single ORDER BY clause, but it has to come at the very end of the entire combined statement, applying to the combined result set as a whole rather than to any one individual SELECT statement within it.

The ORDER BY rule is identical for UNION and UNION ALL; the difference lies in what the output tends to look like when you leave ORDER BY out, and it is easy to miss. Because UNION typically performs an internal sort as part of its deduplication process, some database engines can, in certain cases, return UNION results in something resembling sorted order even without an explicit ORDER BY, purely as a side effect of how the engine chose to implement duplicate removal. UNION ALL has no such internal sorting step, so its output order is far more likely to simply reflect whatever order the underlying SELECT statements happened to produce.

Neither behavior should ever be relied on. Row order is only guaranteed when you write an explicit ORDER BY clause on the combined statement — treating any other ordering as intentional is a mistake that surfaces the moment you change database versions, tweak an index, or move the query to a different engine entirely.
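The only ordering worth depending on is the one written explicitly at the end of the combined statement. A minimal sketch, assuming hypothetical current_orders and archived_orders tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE current_orders (customer_name TEXT, amount INTEGER);
    CREATE TABLE archived_orders (customer_name TEXT, amount INTEGER);
    INSERT INTO current_orders VALUES ('Zoe', 30), ('Ben', 90);
    INSERT INTO archived_orders VALUES ('Mia', 60), ('Al', 10);
""")

# The single ORDER BY comes last and sorts the combined result,
# not just the rows from the second SELECT.
sorted_rows = conn.execute("""
    SELECT customer_name, amount FROM current_orders
    UNION ALL
    SELECT customer_name, amount FROM archived_orders
    ORDER BY amount DESC
""").fetchall()

print(sorted_rows)  # [('Ben', 90), ('Mia', 60), ('Zoe', 30), ('Al', 10)]
```

Rows from both tables are interleaved in the output, which shows the sort was applied after the two result sets were combined.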

5. Use Case Fit — Matching the Keyword to the Actual Question

The clearest way to choose between them is to ask what your combined data is supposed to represent.

UNION ALL fits naturally when you’re combining data that’s already known to be distinct by nature — say, appending this year’s transactions to last year’s archived transactions, where a row appearing in both source tables would itself indicate a data problem rather than something to quietly merge away. It also fits when performance matters more than deduplication, and when you’d rather see a duplicate in your output (as a visible signal something’s off upstream) than have it silently disappear.

UNION fits when duplicates are a real, expected possibility, and when the business question you’re answering only makes sense as a de-duplicated list. Combining “customers who placed an order this month” with “customers who submitted a support ticket this month” to get one list of customers who did either — that’s a natural fit for UNION, since the same customer plausibly appears in both source queries, and counting them twice in the merged list would misrepresent the actual number of customers involved.
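The customer-list case can be sketched as follows. The orders and tickets tables and their contents are hypothetical; the point is only to compare the number of rows each keyword produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, placed_on TEXT);
    CREATE TABLE tickets (customer_id INTEGER, opened_on TEXT);
    -- Customer 1 ordered twice; customer 2 both ordered and opened a ticket.
    INSERT INTO orders VALUES (1, '2024-06-03'), (2, '2024-06-10'), (1, '2024-06-20');
    INSERT INTO tickets VALUES (2, '2024-06-11'), (3, '2024-06-15');
""")

# UNION: one row per customer who did either thing.
active_customers = conn.execute("""
    SELECT customer_id FROM orders
    UNION
    SELECT customer_id FROM tickets
    ORDER BY customer_id
""").fetchall()

# UNION ALL: one row per order or ticket, so customers repeat.
all_activity = conn.execute("""
    SELECT customer_id FROM orders
    UNION ALL
    SELECT customer_id FROM tickets
    ORDER BY customer_id
""").fetchall()

print(active_customers)   # [(1,), (2,), (3,)]
print(len(all_activity))  # 5
```

Customer 1 appears only in orders, yet UNION still reduces that customer to a single row, because duplicates are removed from the whole combined result and not only across the two sources.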

Putting the Ranking Into Practice

Rank	Difference	UNION	UNION ALL
1	Duplicate rows	Removed	Kept
2	Performance	Slower (extra sort/hash step)	Faster (no dedup step)
3	Column structure	Same column count, compatible types	Same column count, compatible types
4	Row order	Not guaranteed without ORDER BY	Not guaranteed without ORDER BY
5	Best fit	Merging into a distinct list	Appending known-distinct data

Default to UNION ALL unless you can name the specific reason duplicates need to disappear. That single habit resolves most of the confusion between these two keywords, because it forces the real question to the surface before you write a single line of SQL: are duplicate rows in this particular result a problem to be fixed, or accurate information to be preserved?

Are you combining tables where duplicate rows would be a genuine data problem, or tables where a matching row across both sides is expected and meaningful? Tell me what the combined result is supposed to represent, and I can help you decide which keyword — and which column order — actually fits.