Question on Joins with Comparison Operators

Hey guys, 

I have been working my way through the tutorials and came to the page on joins with operators in the ON statement. Can somebody please explain in more detail what the difference would be in the results between the 2 queries on this page ( https://community.modeanalytics.com/sql/tutorial/sql-join-comparison-operators/ )?

I see it says they would  be different, but I am having a hard time understanding conceptually why. Thanks!

4replies Oldest first
  • Oldest first
  • Newest first
  • Active threads
  • Popular
  • Hi Pete !  

    The first query on that page is using a LEFT JOIN that includes an AND statement:

    LEFT JOIN tutorial.crunchbase_investments_part1 investments
        ON companies.permalink = investments.company_permalink
       AND investments.funded_year > companies.founded_year + 5

    The  LEFT JOIN command tells the database to return all rows in the table in the FROM clause, regardless of whether or not they have matches in the table in the LEFT JOIN clause.

    The second query puts the check in the WHERE clause:

    WHERE investments.funded_year > companies.founded_year + 5

    By putting that statement in the WHERE clause, it will only return rows that meet that criteria.  This is why you get only 2,870 rows back in the WHERE clause query rather than the 27, 326 rows you get back when it's used in the LEFT JOIN clause.  Let me know if this helps explain the differences a bit better.  Thanks!

    Reply Like
  • Donna McGahan Thanks for the quick reply! I understand the WHERE clause, that is what I am used to. I guess I still don't understand what the AND clause in the ON clause of the first query actually does or when you would use it. 

    Reply Like 1
  • Probably the best way to explain the difference is to look at 

    COUNT(investments.investor_permalink) AS investors
    

    Since the first LEFT JOIN query returns all rows from the tutorial.crunchbase_companies table and only rows that match the 

    ON companies.permalink = investments.company_permalink AND investments.funded_year > companies.founded_year + 5

    from the investments table, you will get rows back that have 0 for 

    COUNT(investments.investor_permalink) AS investors

    In the other query that's using the WHERE clause, you won't get back rows with 0 for the investors column.

    Reply Like 1
    • Alex K
    • Alex_K.1
    • 4 mths ago
    • Reported - view

    the language on the tutorial page is misleading and difficult to understand.  the purpose of the join statement with the comparison operator is to limit the information in the 'investors' column to only those investments (presumably each by different investors) that occurred 5 years after the companies founding date.

    here, with the first example query you're including a ton of useless information:  all those companies with zero investments - by zero investors - that occurred 5 years later.  what is the point of including those additional ~24,500 rows? 

    further, the language indicates that you're limiting the data retrieved in the first query relative to the second: "Here’s an example using > to join only investments that occurred more than 5 years after each company’s founding year:"  >> you're including all companies that don't match that criteria.

    It's just like if you were to query for all of the companies in the table:  all 27,326 of them.  not a great page in the tutorial, but I'm glad there's a forum to see that at least 1 other person agrees.

    Reply Like
Like1 Follow
  • Status Answered
  • 1 Likes
  • 4 mths agoLast active
  • 4Replies
  • 565Views
  • 4 Following