Manage Cloud Costs with Snowflake

David Ruthven
6 min read · Dec 6, 2023


All cloud-based solutions are accused, at some point, of being expensive. This is because they make it easy to consume ever-increasing volumes of storage and compute resources, certainly by comparison with fixed-capacity, low-average-utilisation on-premises systems. It’s a bit like taking delivery of a fast car and complaining shortly afterwards about getting a speeding ticket.

There is no escaping the fact that running on flexible cloud infrastructure requires customers to implement cost management. Only customers can decide which workloads to invest in and how much they are prepared to spend. Snowflake has the finest-grained and most transparent billing mechanism of any data platform, together with easy-to-use methods for controlling and budgeting spend. The more use cases you deploy on Snowflake, the lower your cost per workload, thanks to Snowflake’s efficiency, and the more control you have over overall spend, because all your data and workloads run on your own data cloud.

Snowflake Operational Savings

One of Snowflake’s core design tenets is to abstract away complexity, making access to and analysis of data as easy as possible for as large an audience as possible, regardless of analysis method, data volume or data type. We do this by eliminating much of the operational burden typically required to manage environments, a burden that exists both on-premises and in the cloud: systems, storage and network administrators have simply become cloud engineers, and database administrators are still required for most cloud data warehouses.

Snowflake Compute Costs

Although Snowflake can scale up and scale out on demand near instantaneously, the ability to do so is controlled through role-based access control privileges. Scaling up/down and out/in near instantaneously is unique to Snowflake; it is only possible because we are a managed service operating a pool of compute resources that can be assigned to user demand. This is also fundamental to our ability to provide per-second billing, another unique cost differentiator.

Snowflake License Costs

Snowflake is a SaaS service; its compute credit costs include all of the following:

  • Compute
  • All the features for the given edition
  • Centralised security
  • Centralised cost management
  • High Availability (within deployment regions)
  • Egress Charges (for Snowflake client drivers)
  • Managed Service (trickle upgrades, auto provisioning)
  • Premier Support

There are no extra costs for existing or new features. Many new features are aimed at simplifying complexity and lowering costs. Dynamic Tables, for example, let you declare transformation pipelines with incremental refresh, using change tracking and a cascading refresh mechanism.
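As a sketch of what such a declaration looks like (the table, warehouse and column names here are illustrative, not from the article):

```sql
-- Hypothetical example: a dynamic table that incrementally maintains a
-- daily aggregate. Snowflake refreshes it via change tracking whenever
-- it lags the base table by more than TARGET_LAG.
CREATE OR REPLACE DYNAMIC TABLE daily_sales
  TARGET_LAG = '20 minutes'
  WAREHOUSE  = transform_wh
AS
  SELECT order_date, SUM(amount) AS total_amount
  FROM raw_orders
  GROUP BY order_date;
```

Downstream dynamic tables built on `daily_sales` would refresh in cascade, giving a declarative pipeline without hand-written orchestration.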

Some other platforms incur egress charges when connecting from remotely hosted BI tools; with Snowflake, these costs are included for any client tool that uses a Snowflake client driver.

No more massive, complex, risky and costly upgrades, an undertaking so demanding that many customers faced with one pay for extended support to put off the evil day. Talking of support, Snowflake credits include our award-winning Snowflake Premier Support.

The only other cost is storage, which, thanks to our highly efficient compression (which we periodically improve), costs less per TB than your cloud service provider (CSP) charges. If you have enormous volumes of data and have negotiated a particularly low price per TB with your CSP, you can bring your own storage and use Snowflake to manage or access that data in Iceberg format, with no need to ingest it.

Snowflake Cost Avoidance

Many Snowflake features reduce costs almost as a side effect of making the platform faster and easier to use.

  • Zero cost Ingest (data sharing)
  • Zero cost Compute (DDL, results cache)
  • Zero cost Storage (cloning)

Data sharing avoids ingress charges for all subscribers. Any data ingested anywhere in the Snowflake network can be shared with any other Snowflake account that has been granted access. Some of our large customers have hundreds or even thousands of shared data sets.
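A minimal sketch of publishing a share looks like this (the database, share and account names are hypothetical):

```sql
-- Create a share and grant it access to specific objects
CREATE SHARE sales_share;
GRANT USAGE  ON DATABASE sales_db               TO SHARE sales_share;
GRANT USAGE  ON SCHEMA   sales_db.public        TO SHARE sales_share;
GRANT SELECT ON TABLE    sales_db.public.orders TO SHARE sales_share;

-- Make the share visible to a consumer account; the consumer queries
-- the data in place, with no ingestion and no ingress charges
ALTER SHARE sales_share ADD ACCOUNTS = consumer_org.consumer_account;
```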

There is no compute cost attached to DDL or metadata queries (SELECT COUNT(*) …); these are handled by the global services layer. More spectacularly, the results of all queries are retained for 24 hours, and a subsequent identical query, where the underlying data has not changed and the user has the same role, is returned instantly from the results cache at no cost.
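Both behaviours can be seen with ordinary queries (object names hypothetical):

```sql
-- Served from metadata by the global services layer:
-- no warehouse needed, no compute cost
SELECT COUNT(*) FROM sales_db.public.orders;

-- Run this twice with the same role and unchanged data: the second
-- execution is answered from the 24-hour results cache at no cost
SELECT region, SUM(amount) AS total
FROM sales_db.public.orders
GROUP BY region;
```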

Snowflake cloning can be performed at table, schema or database level, with no compute cost to copy and no storage cost other than for data added or modified in the clone. Beyond the cost and time savings, this lowers the barrier to experimentation and testing.
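The syntax is the same at every level (object names here are illustrative):

```sql
-- Zero-copy clones: only data subsequently added or modified in the
-- clone consumes new storage
CREATE TABLE    orders_dev  CLONE sales_db.public.orders;
CREATE SCHEMA   public_test CLONE sales_db.public;
CREATE DATABASE sales_db_qa CLONE sales_db;
```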

Centralised Data Access and Cost Governance

All data managed by Snowflake, regardless of how it is spread across regions and cloud vendors, is accessible to any authorised Snowflake user through data sharing. This eliminates silos, and it also means that all data is tightly governed through centrally administered policies. Not only are data, users and roles centrally governed, but compute resources and the corresponding spend are as well.

Cost Management Priorities

In my experience, these are the tools and actions that deliver highly effective centralised cost management:

  • Resource Monitors
  • Job/Query Execution time limits
  • Warehouse Utilisation (concurrency, idle time)
  • Warehouse Sizing (scale out, query acceleration)
  • SQL tuning

Resource Monitors allow you to set a compute credit budget at the account level, for a single warehouse (compute cluster) or for a group of warehouses. This is a simple and effective way to guarantee you don’t spend more than anticipated; if there is more demand than the budget allows, the justification can be discussed with the relevant users. Apparently “it is easier to beg forgiveness than to seek permission”. Not when it comes to cost management!
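A sketch of a monitor with a monthly budget (the quota, thresholds and warehouse name are hypothetical):

```sql
-- Hypothetical monthly budget of 500 credits: notify at 75% consumed,
-- suspend the attached warehouse at 100%
CREATE RESOURCE MONITOR monthly_budget WITH
  CREDIT_QUOTA = 500
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 75  PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND;

-- Attach it to a warehouse (an account-level monitor is also possible)
ALTER WAREHOUSE analytics_wh SET RESOURCE_MONITOR = monthly_budget;
```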

You can limit the execution time of a SQL job at the account, warehouse, user or session level. The default is a rather generous two days, so it is worth trimming it to something more appropriate. Again, if a job needs more than the timeout period, that is a good opportunity to validate that the job is necessary, optimised and running on the correct compute cluster.
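The relevant parameter is `STATEMENT_TIMEOUT_IN_SECONDS`; the values below are illustrative, not a recommendation:

```sql
-- Default is 172800 seconds (2 days); trim it at the account level
ALTER ACCOUNT SET STATEMENT_TIMEOUT_IN_SECONDS = 14400;           -- 4 hours

-- ...and tighten it further on a specific warehouse
ALTER WAREHOUSE analytics_wh SET STATEMENT_TIMEOUT_IN_SECONDS = 3600;  -- 1 hour
```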

We have a new metric being released to show warehouse utilisation easily, although there are published queries that can show this today. Snowflake charges while a warehouse (compute cluster) is running, regardless of whether there are any active queries, although by default it will auto-suspend after a user-configured period with no active queries. Think of a compute cluster like a bus: it costs the same whether it is empty or full, so keep it busy with active queries.
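The suspend window is a warehouse property (the 60-second value here is just an example):

```sql
-- Suspend after 60 seconds of inactivity; resume automatically
-- when the next query arrives
ALTER WAREHOUSE analytics_wh SET
  AUTO_SUSPEND = 60
  AUTO_RESUME  = TRUE;
```

A short auto-suspend minimises idle billing; the trade-off is that a suspended warehouse restarts with a cold local cache.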

One of the most common errors I see is not enabling multi-cluster. Customers often increase the size of a single cluster rather than enabling multi-cluster. A multi-cluster warehouse can scale out to absorb increasing demand; it adds capacity only when the existing cluster is fully utilised and jobs would otherwise queue, so it never kicks in unless needed and is a zero-cost option. The extra clusters scale back down again when no longer required. Queries naturally have different resource requirements, so to stop heavy queries hogging a cluster you can enable query acceleration, which uses serverless compute to take the fetch-and-filter grunt work out of heavier queries. Again, this only kicks in when needed, and you can even check beforehand whether a given query would benefit, using SYSTEM$ESTIMATE_QUERY_ACCELERATION().
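Both settings are warehouse properties; the cluster counts, scale factor and `'<query_id>'` placeholder below are illustrative:

```sql
-- Multi-cluster: add clusters only when queries would otherwise queue,
-- and shed them again as demand drops (zero cost unless used)
ALTER WAREHOUSE analytics_wh SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4;

-- Query acceleration: offload heavy scan/filter work to serverless compute
ALTER WAREHOUSE analytics_wh SET
  ENABLE_QUERY_ACCELERATION = TRUE
  QUERY_ACCELERATION_MAX_SCALE_FACTOR = 8;

-- Estimate whether a previously run query would have benefited
SELECT SYSTEM$ESTIMATE_QUERY_ACCELERATION('<query_id>');
```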

The biggest culprit for excess compute usage is inefficient queries. The Snowflake optimiser does its best to mitigate these and produce an optimal plan, but this is not always possible. Typical offenders: scanning too much data, often because the WHERE clause is not aligned with a cluster key; using the wrong join order (the smaller table should be on the build, i.e. left, side of the join); spilling to local and remote disk because the compute cluster has run out of memory; and join explosion, producing more result rows than the larger side of the join. All of these can be viewed in Snowflake’s graphical Query Profile or queried programmatically using GET_QUERY_OPERATOR_STATS().
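For programmatic inspection, a query along these lines surfaces the same per-operator detail as the Query Profile:

```sql
-- Per-operator statistics for the most recent query in this session:
-- look for oversized table scans, disk spilling, and exploding join output
SELECT operator_id, operator_type, operator_statistics
FROM TABLE(GET_QUERY_OPERATOR_STATS(LAST_QUERY_ID()));
```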

Summary

As mentioned at the start of this article, Snowflake has the finest-grained and most transparent billing mechanism of any data platform, along with easy-to-use methods for controlling and budgeting spend. I hope the examples above illustrate these points.

And …

It doesn’t end there: all manner of new capabilities are being released to make cost management and optimisation even easier, all at no extra cost!


David Ruthven

Snowflake Sales Engineer — opinions expressed are solely my own and do not express the views or opinions of my employer.