Snowflake Query Optimiser Eliminating Redundant Joins

David Ruthven
3 min readDec 21, 2022

--

New Feature Dec-2022: Join Elimination

Recently Snowflake introduced an optimisation which leverages Primary and Foreign keys. In essence it is possible for the query optimiser to decide not to perform a join if none of the data from the joined table is required to satisfy the query.

You could argue that this mainly applies to sloppy coded SQL, but it can also apply to scenarios where a star schema is flattened via a database view which is often the case when supporting dashboard tools which like to reference denormalised datasets.

As an example I have used the Oracle Sales History (SH) sample schema, diagram below.

Image courtesy of Oracle Corporation.

As you can see this is a star schema with the SALES fact table enriched by several dimension tables with the CUSTOMERS dimension table being further enriched by the COUNTRIES table.

I created a database view to stitch (de-normalise) all these tables together. Notice this view is using another recent Snowflake feature to EXCLUDE columns when performing SELECT *. This view presents all 109 columns from all 7 tables combined.

CREATE OR REPLACE VIEW sh_all_vw
AS
SELECT
s.* exclude (channel_id, cust_id, prod_id, promo_id, time_id), ch.*, co.* exclude (country_id), cu.*, pd.*, pr.*, t.*
FROM
channels ch
, countries co
, customers cu
, products pd
, promotions pr
, sales s
, times t
WHERE
s.time_id = t.time_id
AND s.prod_id = pd.prod_id
AND s.promo_id = pr.promo_id
AND s.cust_id = cu.cust_id AND cu.country_id = co.country_id
AND s.channel_id = ch.channel_id
;

Query without Constraints

If I do not define Primary and Foreign keys and run a simple aggregation query which only requires the SALES and TIMES tables all the tables are joined regardless.

SELECT
/* Number of rows in Sales by Year */
calendar_year
, count(1)
FROM
sh_all_vw
GROUP BY
calendar_year
ORDER BY
calendar_year desc;

The execution plan looks like this. You can see that all tables referenced in the view above are joined.

Query with Constraints

Now we add all the Primary and Foreign key constraints as below.

// Add PRIMARY KEYS
ALTER TABLE times ADD CONSTRAINT "TIMES_PK" PRIMARY KEY ("TIME_ID") RELY;
ALTER TABLE products ADD CONSTRAINT "PRODUCTS_PK" PRIMARY KEY ("PROD_ID") RELY;
ALTER TABLE customers ADD CONSTRAINT "CUSTOMERS_PK" PRIMARY KEY ("CUST_ID") RELY;
ALTER TABLE countries ADD CONSTRAINT "COUNTRIES_PK" PRIMARY KEY ("COUNTRY_ID") RELY;

// Add UNIQUE KEYS
ALTER TABLE sales ADD CONSTRAINT "SALES_UK" UNIQUE ("PROD_ID", "CUST_ID", "PROMO_ID", "CHANNEL_ID", "TIME_ID") RELY;

// Add FOREIGN KEYS
ALTER TABLE sales ADD CONSTRAINT "SALES_CHANNEL_FK" FOREIGN KEY (channel_id) REFERENCES channels (channel_id) RELY;
ALTER TABLE sales ADD CONSTRAINT "SALES_CUSTOMERS_FK" FOREIGN KEY ("CUST_ID") REFERENCES "CUSTOMERS" ("CUST_ID") RELY;
ALTER TABLE sales ADD CONSTRAINT "SALES_PRODUCTS_FK" FOREIGN KEY ("PROD_ID") REFERENCES "PRODUCTS" ("PROD_ID") RELY;
ALTER TABLE sales ADD CONSTRAINT "SALES_PROMOTIONS_FK" FOREIGN KEY ("PROMO_ID") REFERENCES "PROMOTIONS" ("PROMO_ID") RELY;
ALTER TABLE sales ADD CONSTRAINT "SALES_TIME_FK" FOREIGN KEY ("TIME_ID") REFERENCES "TIMES" ("TIME_ID") RELY;

ALTER TABLE customers ADD CONSTRAINT "CUST_COUNTRIES_FK" FOREIGN KEY ("COUNTRY_ID") REFERENCES "COUNTRIES" ("COUNTRY_ID") RELY;

Note we are using the RELY option to indicate that the data in these tables complies with the constraints.

Running the same aggregation query as before we get the following much simpler plan which recognises we only need the SALES and TIMES tables to resolve this query.

In Summary

Above we have seen two recent Snowflake enhancements, the EXCLUDE clause and redundant join elimination leveraging UNIQUE, PRIMARY KEY and FOREIGN KEY constraints.

The data volumes in my example are too small to note any substantial performance difference but my recommendation is to review your schemas and apply and validate your Primary and Foreign key constraints as there is only upside.

Snowflake Documentation References

Understanding How Snowflake Can Eliminate Redundant Joins

SELECT with EXCLUDE

--

--

David Ruthven

Snowflake Sales Engineer — opinions expressed are solely my own and do not express the views or opinions of my employer.