How to Use SQL Aggregate Functions Correctly: SUM, COUNT, AVG, MAX, MIN Explained

This tutorial brings together SUM, COUNT, AVG, MAX, and MIN — the five aggregate functions covered piece by piece across earlier tutorials in this series — into one single, complete reference, with particular attention to the specific behaviors that catch beginners off guard, especially around NULL handling and the interaction between these functions and GROUP BY.

If you have read the GROUP BY tutorial already, much of the foundational concept here will feel familiar, since aggregate functions and GROUP BY are deeply connected concepts. This tutorial focuses specifically on each individual function’s particular behavior and edge cases, rather than repeating the broader GROUP BY mental model already covered there in detail.

What Makes a Function “Aggregate”

An aggregate function takes many individual row values and condenses them down into one single summary value. This is fundamentally different from how most other SQL expressions work, which typically operate on and return one value per individual row, rather than one combined value representing many rows together.

Aggregate functions can be used in two distinct contexts: without any GROUP BY at all, producing one single overall summary value across your entire result set, or combined with GROUP BY, producing one separate summary value for each individual group your data has been sorted into, as covered in detail in the GROUP BY tutorial.

SUM: Adding Values Together

SUM adds together every value in a specified column, across whichever set of rows it is being applied to (either your entire result, or each individual group if combined with GROUP BY).

SUM specifically ignores NULL values rather than letting them disrupt the entire calculation. If a column contains a mix of actual numbers and some NULL values, SUM calculates the total of just the actual non-NULL numbers present, simply treating any NULL values as if those specific rows were not part of the calculation at all, rather than the entire SUM result itself becoming NULL just because some individual values happened to be NULL.

A specific edge case worth knowing: if every single value being summed happens to be NULL (or if there are genuinely zero rows to sum at all), SUM’s result becomes NULL, not zero. This specific distinction — NULL representing “nothing was summed because there was nothing to sum” versus zero representing “we summed actual values and the genuine total happened to be zero” — occasionally matters for how you interpret or display that particular result, depending on your specific business context.

COUNT: Counting Rows or Non-NULL Values

COUNT has two genuinely distinct behaviors depending on exactly how you use it, which is a common point of confusion already mentioned briefly in the NULL tutorial but worth restating clearly here as part of this complete aggregate function reference.

COUNT applied to an asterisk counts every single row, regardless of whether any specific column within those rows might contain NULL values. This version is purely counting the existence of rows themselves.

COUNT applied to one specific named column only counts rows where that particular column is NOT NULL, specifically excluding any rows where that column has no value recorded. This means COUNT of asterisk and COUNT of a specific named column can produce genuinely different results from the exact same set of rows, whenever that specific column contains any NULL values at all.

COUNT DISTINCT, adding the word DISTINCT directly before your column name within the COUNT function, counts only unique distinct values within that column, rather than counting every single occurrence including repeated duplicate values. This is genuinely useful for answering questions like “how many unique customers placed an order” from an orders table where a single customer might appear across many separate individual order rows, where a plain COUNT would count every individual order row, while COUNT DISTINCT on the customer ID column would instead count each unique customer exactly once, regardless of how many separate orders that same specific customer happened to place.

AVG: Calculating the Average

AVG calculates the average (specifically the arithmetic mean) of a specified column’s values, across whichever set of rows it is being applied to.

AVG also specifically ignores NULL values, similar to SUM’s behavior, but with one detail worth understanding clearly: AVG’s calculation is the sum of non-NULL values divided specifically by the count of non-NULL values, not divided by the total count of every row including NULL ones. This means if a column has five total rows, but two of them are NULL, AVG calculates the average across just the three actual non-NULL values present, dividing their sum by three, rather than dividing that same sum by five (the total row count including the NULL ones). This distinction can produce a meaningfully different average than you might expect if you were assuming NULL values would effectively count as zero within that average calculation, which is a different (and generally incorrect, for most genuine business interpretations of “average”) assumption than how AVG actually behaves.

If you specifically need NULL values to be treated as zero for averaging purposes, rather than being excluded entirely from both the sum and the count, you would need to explicitly convert those NULLs into zero first (using COALESCE, covered in the NULL tutorial) before applying AVG, rather than relying on AVG’s own default NULL-excluding behavior, which is a meaningfully different calculation that may or may not actually match your specific intended business question.

MAX and MIN: Finding Extremes

MAX finds the largest value within a specified column, across whichever set of rows it is being applied to. MIN finds the smallest value. Both specifically ignore NULL values when determining that maximum or minimum, the same general NULL-ignoring pattern as SUM, COUNT (of a specific column), and AVG.

MAX and MIN are not limited to purely numeric columns — they also work meaningfully on date columns (finding the earliest or latest date) and even on text columns (finding the alphabetically first or last value), though their most common, intuitive use in typical business reporting tends to involve numeric or date-based columns specifically.

Combining Multiple Aggregate Functions in One Query

You are not limited to using just one single aggregate function per query. Listing several aggregate function expressions together within the same SELECT statement, each calculating something different from the same underlying set of rows, is entirely valid and genuinely common in real business reporting.

For example, a single query might calculate SUM of sale amount, COUNT of asterisk, and AVG of sale amount, all together within the same SELECT statement, FROM the same sales table, optionally combined with the same GROUP BY clause if you want these calculations broken down per group rather than as one single overall combined result.

The GROUP BY Connection, Restated Briefly

As covered in full detail in the dedicated GROUP BY tutorial, any column appearing in your SELECT statement that is not itself wrapped within an aggregate function must also appear in your GROUP BY clause, since the database needs a single, consistent value to display for that column within each individual group, rather than potentially many different conflicting values existing within that same group with no clear single answer to actually display.

Without any GROUP BY clause at all, aggregate functions calculate across your entire result set as a single implicit group, producing exactly one single overall summary row, rather than the function failing or requiring some kind of explicit grouping syntax to be present at all times.

A Common Mistake: Using WHERE to Filter Based on an Aggregate Result

As covered in the GROUP BY tutorial’s discussion of HAVING, attempting to use a WHERE clause to filter based on the result of an aggregate function (for example, attempting WHERE on a SUM total directly) will produce an error, since that calculated aggregate value genuinely does not exist yet at the specific point WHERE is being evaluated in SQL’s logical execution sequence. HAVING exists specifically to handle this exact situation, filtering based on values that only come into existence after the grouping and aggregation calculation has already completed, rather than WHERE’s role of filtering individual raw rows before any aggregation has happened at all.

A Quick Reference Summary

Function	What It Calculates	NULL Behavior
SUM	Total of all values	Ignores NULLs; result is NULL only if every value is NULL
COUNT(*)	Number of rows	Counts every row regardless of NULL content
COUNT(column)	Number of non-NULL values in that column	Specifically excludes NULL values from the count
COUNT(DISTINCT column)	Number of unique distinct values	Excludes NULLs and counts each distinct value once
AVG	Arithmetic mean	Divides by count of non-NULL values, not total row count
MAX	Largest value	Ignores NULLs when determining the maximum
MIN	Smallest value	Ignores NULLs when determining the minimum

Why This Foundational Set of Functions Matters So Much

SUM, COUNT, AVG, MAX, and MIN are the five functions underlying the overwhelming majority of genuine business reporting questions: total revenue, number of customers, average order value, highest single sale, earliest signup date, and countless similar variations across virtually every industry and business context imaginable. Genuinely understanding their specific individual behaviors, particularly around NULL handling and how each interacts correctly with GROUP BY and HAVING, is foundational to writing SQL that produces correct, trustworthy results, rather than results that merely look plausible at first glance while actually containing a subtle calculation error rooted in one of these specific behaviors covered throughout this tutorial.

This concludes the foundational series covered across these fifteen tutorials, moving from understanding what a basic SELECT statement does, through JOINs, GROUP BY, subqueries, and finally these aggregate functions alongside window functions and performance optimization. Each concept builds directly on the ones covered before it, which is precisely why working through them roughly in this same sequence, rather than jumping directly to advanced topics first, tends to produce the most solid, genuinely durable understanding of SQL as a coherent whole, rather than a collection of memorized, disconnected syntax patterns.

Which specific aggregate function calculation is not producing the result you expected? Describe your query and the columns involved, and I can help identify whether NULL handling, GROUP BY structure, or something else is the actual cause.