Practice problem in SQL tutorial - subquerry
Regarding the following practice problem:
Write a query that displays the average number of monthly incidents for each category. Hint: use
tutorial.sf_crime_incidents_cleandate to make your life a little easier.
The given answer is:
SELECT sub.category, AVG(sub.incidents) AS avg_incidents_per_month FROM ( SELECT EXTRACT('month' FROM cleaned_date) AS month, category, COUNT(1) AS incidents FROM tutorial.sf_crime_incidents_cleandate GROUP BY 1,2 ) sub GROUP BY 1
Which I don't think would give the correct answer if there are overlapping months in the data - say January 2013 and January 2014 as all incidents happened in January would be grouped together regardless of the year.
I think the DATE_TRUNC would be the right one here as it keeps the year. What do you think?
Thank you for reaching out! Since the subquery was counting the number of months, either "EXTRACT" or "DATE_TRUNC" would be OK. If you have a moment, you can try to run the query using "DATE_TRUNK" like so: you can compare the results, they should be the same.
AVG(sub.incidents) AS avg_incidents_per_month
SELECT date_trunc('month', cleaned_date) AS month,
COUNT(1) AS incidents
GROUP BY 1,2
GROUP BY 1
Please let me know if that helps! If you are still having issues, please feel free to open a ticket with our success team via in-product chat or you can send an email directly to firstname.lastname@example.org. This will allow us to look at your metadata and ask you additional questions to further assist this matter, thanks!
Thanks very much for the response.
However, I don't think EXTRACT is correct at all when the dataset is longer (timewise). Consider the case I mentioned, suppose there are January 2013 and January 2014 data points in the dataset. The given answer will combine all incidents of January 2013 and January 2014 together as if it is only one month.
So to use the EXTRACT correctly, one needs also EXTRACT year and group by month and year in order to count the monthly incidents correctly.
DATE_TRUNC, on the other hand, retains the year and therefore doesn't need GROUP BY year like EXTRACT.
The OUTPUTs are the same when you run EXTRACT and DATE_TRUNC with the given dataset is because it only runs from November 2013 to February 2014 so there is no overlapping months with that dataset. In general cases, it would not work properly.
What do you think?