SQL Distinct vs. Group By
In SQL, if you write the following two queries:
- SELECT DISTINCT genre FROM films
- SELECT genre FROM films GROUP BY genre
you get the same results.
This is because the GROUP BY clause returns the full list of genres, grouped by genre. So we only see distinct genres listed.
Similarly, DISTINCT only returns distinct genres.
However, we use Group By when we want to perform some kind of aggregation on the data. For example, average (AVG), sum (SUM), count (COUNT), etc.
Here is a full list of aggregate functions in SQL.
It is also important to remember that when you use GROUP BY, you must include any field that you have referenced in the select statement which does not have an aggregation function applied to it. However you can use fields in group by which are not selected.
For example,
SELECT release_date
FROM films
GROUP BY genre
will return an error, because name and release_date are not aggregated.
However,
SELECT MIN(release_date)
FROM films
GROUP BY genre
will return the earliest date a film was released in a particular genre.
Happy SQL-ing!!!