Practice problem in SQL tutorial - subquerry

Hi all,

Regarding the following practice problem:

Practice Problem

Write a query that displays the average number of monthly incidents for each category. Hint: use tutorial.sf_crime_incidents_cleandate to make your life a little easier.

The given answer is:

SELECT sub.category,
       AVG(sub.incidents) AS avg_incidents_per_month
  FROM (
        SELECT EXTRACT('month' FROM cleaned_date) AS month,
               category,
               COUNT(1) AS incidents
          FROM tutorial.sf_crime_incidents_cleandate
         GROUP BY 1,2
       ) sub
 GROUP BY 1

Which I don't think would give the correct answer if there are overlapping months in the data - say January 2013 and January 2014 as all incidents happened in January would be grouped together regardless of the year.

I think the DATE_TRUNC would be the right one here as it keeps the year. What do you think?

2replies Oldest first
  • Oldest first
  • Newest first
  • Active threads
  • Popular
  • Hi Tung,

    Thank you for reaching out! Since the subquery was counting the number of months, either "EXTRACT" or "DATE_TRUNC" would be OK. If you have a moment, you can try to run the query using "DATE_TRUNK" like so: you can compare the results, they should be the same. 

     

    SELECT sub.category,
           AVG(sub.incidents) AS avg_incidents_per_month
      FROM (
            SELECT date_trunc('month', cleaned_date) AS month,
                   category,
                   COUNT(1) AS incidents
              FROM tutorial.sf_crime_incidents_cleandate
             GROUP BY 1,2
           ) sub
     GROUP BY 1

     

    Please let me know if that helps! If you are still having issues, please feel free to open a ticket with our success team via in-product chat or you can send an email directly to success@modeanalytics.com. This will allow us to look at your metadata and ask you additional questions to further assist this matter, thanks!

    Like
  • Hi Jinyan

     

    Thanks very much for the response.

     

    However, I don't think EXTRACT is correct at all when the dataset is longer (timewise). Consider the case I mentioned, suppose there are January 2013 and January 2014 data points in the dataset. The given answer will combine all incidents of January 2013 and January 2014 together as if it is only one month.

     

    So to use the EXTRACT correctly, one needs also EXTRACT year and group by month and year in order to count the monthly incidents correctly.

     

    DATE_TRUNC, on the other hand, retains the year and therefore doesn't need GROUP BY year like EXTRACT.

     

    The OUTPUTs are the same when you run EXTRACT and DATE_TRUNC with the given dataset is because it only runs from November 2013 to February 2014 so there is no overlapping months with that dataset. In general cases, it would not work properly.

     

    What do you think?

    Like
Like Follow
  • Status Answered
  • 5 mths agoLast active
  • 2Replies
  • 59Views
  • 2 Following