SQL Self Joins Tutorial problem and solution

The code example in the tutorial returns 16 distinct company names. However, an audit of the results reveals a problem with the funded_at dates.

SELECT DISTINCT japan_investments.company_name,
       japan_investments.funded_at as jpfund,
       gb_investments.funded_at as gbfund
  FROM tutorial.crunchbase_investments_part1 japan_investments
  JOIN tutorial.crunchbase_investments_part1 gb_investments
    ON japan_investments.company_name = gb_investments.company_name
   AND gb_investments.investor_country_code = 'GBR'
   AND gb_investments.funded_at > japan_investments.funded_at
 WHERE japan_investments.investor_country_code = 'JPN'
 ORDER BY 1

The results include companies with GB dates earlier than JPN dates. Changing to ORDER BY 2 in the query above reveals the dates are out of sequence: they are being seen as strings instead of dates.

Fortunately, your very next tutorial at the start of the advanced section concerns changing data types within queries. Thank you! So I applied what I learned there to the query above and got valid results.

SELECT DISTINCT japan_investments.company_name,
       japan_investments.funded_at as jpfund,
       gb_investments.funded_at as gbfund
  FROM tutorial.crunchbase_investments_part1 japan_investments
  JOIN tutorial.crunchbase_investments_part1 gb_investments
    ON japan_investments.company_name = gb_investments.company_name
   AND gb_investments.investor_country_code = 'GBR'
   AND cast(gb_investments.funded_at as date) > cast(japan_investments.funded_at as date)
 WHERE japan_investments.investor_country_code = 'JPN'
 ORDER BY 1

I then added this logic to the original tutorial example query and discovered there are 7 companies that meet the criteria, not 16.

SELECT DISTINCT japan_investments.company_name,
       japan_investments.company_permalink
  FROM tutorial.crunchbase_investments_part1 japan_investments
  JOIN tutorial.crunchbase_investments_part1 gb_investments
    ON japan_investments.company_name = gb_investments.company_name
   AND gb_investments.investor_country_code = 'GBR'
   AND cast(gb_investments.funded_at as date) > cast(japan_investments.funded_at as date)
 WHERE japan_investments.investor_country_code = 'JPN'
 ORDER BY 1

The best solution would be to correct the database so dates are dates, but at least I learned a workaround. Solving this problem turned out to be an excellent  lesson.

1reply Oldest first
  • Oldest first
  • Newest first
  • Active threads
  • Popular
  • Hey Philip Parodayco glad to hear you learned a workaround and that your getting value out of our SQL school! Please let us know if you have any further feedback at all!!

    Reply Like
Like Follow
  • 4 wk agoLast active
  • 1Replies
  • 241Views
  • 2 Following