SQL Distinct vs. Group By

Sonalee Bhattacharyya
1 min readSep 30, 2022
Photo by Crissy Jarvis on Unsplash

In SQL, if you write the following two queries:

  1. SELECT DISTINCT genre FROM films
  2. SELECT genre FROM films GROUP BY genre

you get the same results.

This is because the GROUP BY clause returns the full list of genres, grouped by genre. So we only see distinct genres listed.

Similarly, DISTINCT only returns distinct genres.

However, we use Group By when we want to perform some kind of aggregation on the data. For example, average (AVG), sum (SUM), count (COUNT), etc.

Here is a full list of aggregate functions in SQL.

It is also important to remember that when you use GROUP BY, you must include any field that you have referenced in the select statement which does not have an aggregation function applied to it. However you can use fields in group by which are not selected.

For example,

SELECT release_date

FROM films

GROUP BY genre

will return an error, because name and release_date are not aggregated.

However,

SELECT MIN(release_date)

FROM films

GROUP BY genre

will return the earliest date a film was released in a particular genre.

Happy SQL-ing!!!

--

--

Sonalee Bhattacharyya

Mathematics lecturer transitioning to a career in data analysis