The ABCD of Data Driven Insight

AI and BI need CD (Connected Data)

David Ruthven
Aug 15, 2024
Connecting Data with Snowflake

Introduction

Sun Microsystems used to say ‘The Network is the Computer’. We could now paraphrase that for Snowflake as ‘Snowflake’s connected data cloud is the platform for insight’. Okay, that is a bit more convoluted, but the implications are significant. All Snowflake accounts can be connected to one another through data and code sharing, which forms a private subnet of connected capability. Snowflake data and code sharing is the connective tissue customers use to build their own enterprise insight platform on connected data.

Apache Iceberg Tables Extend Snowflake’s Data Sharing

Snowflake’s data sharing has always enabled simple, secure, cross-cloud data sharing. Now that a number of leading query engines have adopted the Apache Iceberg open table format, and Snowflake has committed to feature parity between Iceberg and native Snowflake tables, it is possible to extend Snowflake data sharing to share data between different query engines.
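To illustrate why an open table format lets more than one engine work on the same data, here is a deliberately simplified, in-memory model. It is not Iceberg’s actual specification or any Iceberg library API. The class and function names (`Snapshot`, `TableMetadata`, `Catalog`, `append`, `scan`, `engine_a_total`, `engine_b_ids`) are hypothetical. What it tries to capture is the core idea: a table is a history of immutable snapshots that list data files, and a catalog holds a pointer to the current metadata, which writers swap atomically. Any engine that understands that layout reads the same committed snapshot.

```python
from dataclasses import dataclass
from threading import Lock

# Hypothetical, simplified model of an open table format (not the Iceberg spec).
@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: tuple  # keys of immutable "data files" in STORAGE

@dataclass(frozen=True)
class TableMetadata:
    snapshots: tuple  # snapshot history, oldest first

    @property
    def current(self) -> Snapshot:
        return self.snapshots[-1]

class Catalog:
    """Hypothetical catalog: table name -> current TableMetadata."""

    def __init__(self):
        self._tables = {}
        self._lock = Lock()

    def create(self, name):
        with self._lock:
            self._tables[name] = TableMetadata((Snapshot(0, ()),))

    def load(self, name) -> TableMetadata:
        return self._tables[name]

    def commit(self, name, expected, new) -> bool:
        # Atomic compare-and-swap: fails if another writer committed first.
        with self._lock:
            if self._tables.get(name) is not expected:
                return False
            self._tables[name] = new
            return True

STORAGE = {}  # shared "object storage": file key -> list of rows

def append(catalog, name, file_key, rows) -> bool:
    STORAGE[file_key] = rows  # write the data file first...
    base = catalog.load(name)
    snap = Snapshot(base.current.snapshot_id + 1,
                    base.current.data_files + (file_key,))
    # ...then publish it by committing new metadata.
    return catalog.commit(name, base, TableMetadata(base.snapshots + (snap,)))

def scan(catalog, name):
    # Any engine that understands the metadata reads the same files.
    snap = catalog.load(name).current
    return [row for key in snap.data_files for row in STORAGE[key]]

# Two independent "engines" reading one shared table.
def engine_a_total(catalog):
    return sum(r["amount"] for r in scan(catalog, "orders"))

def engine_b_ids(catalog):
    return sorted(r["id"] for r in scan(catalog, "orders"))

catalog = Catalog()
catalog.create("orders")
append(catalog, "orders", "data/file-1", [{"id": 1, "amount": 10}])
append(catalog, "orders", "data/file-2", [{"id": 2, "amount": 5}])
```

Both engines see the same committed snapshot: `engine_a_total(catalog)` returns 15 and `engine_b_ids(catalog)` returns `[1, 2]`. A writer holding stale metadata has its commit rejected rather than silently overwriting another engine’s work, which is what makes it safe for several engines to share one table.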

Analytics Queries Require a Single Query Engine

Analytics queries can only be executed by a single query engine. Even in a federated environment, where data may reside under the control of different query engines, the federating query engine performs the joins, aggregations, windowing and sorting; the other query engines typically handle only the fetch and filter operations. In essence, all the data a query needs must be accessible to a single query engine, whether it is ingested directly, read externally as files or open tables (Iceberg, Delta), or accessed in a decoupled way via third-party query engines.
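A minimal sketch of that division of labour, assuming two hypothetical remote sources (`CRM_ROWS`, `BILLING_ROWS`) whose engines can only fetch, filter and project rows. The join, aggregation and sort must then happen inside the one federating engine, because it is the only place where both inputs meet. All names and data here are invented for illustration, and real federated engines differ in how much work they can push down.

```python
from collections import defaultdict

# Two hypothetical remote engines, each owning one table.
CRM_ROWS = [
    {"customer_id": 1, "region": "EMEA"},
    {"customer_id": 2, "region": "AMER"},
    {"customer_id": 3, "region": "EMEA"},
]
BILLING_ROWS = [
    {"customer_id": 1, "amount": 100.0, "year": 2024},
    {"customer_id": 1, "amount": 50.0, "year": 2023},
    {"customer_id": 2, "amount": 70.0, "year": 2024},
    {"customer_id": 3, "amount": 30.0, "year": 2024},
]

def remote_scan(rows, predicate, columns):
    """What a remote engine does here: fetch, filter, project. Nothing else."""
    return [{c: r[c] for c in columns} for r in rows if predicate(r)]

def federated_revenue_by_region(year):
    # 1. Push fetch + filter down to each remote engine.
    customers = remote_scan(CRM_ROWS, lambda r: True, ["customer_id", "region"])
    invoices = remote_scan(BILLING_ROWS, lambda r: r["year"] == year,
                           ["customer_id", "amount"])
    # 2. Join locally: build a hash table on customer_id (inner join).
    region_of = {c["customer_id"]: c["region"] for c in customers}
    # 3. Aggregate and sort locally, in the single federating engine.
    totals = defaultdict(float)
    for inv in invoices:
        region = region_of.get(inv["customer_id"])
        if region is not None:
            totals[region] += inv["amount"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

For 2024 this yields `[("EMEA", 130.0), ("AMER", 70.0)]`. Neither remote source could produce that answer on its own, which is why every input to the query has to be reachable by one engine.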

Not all the data I need is in Snowflake

Very little of the data used in Snowflake is born in Snowflake. As with all data warehouses, operational data stores and data lakes, the source data is generated in transactional and event recording systems and then gathered in one place for cleansing, transformation and analysis. Large organisations usually have many warehouses and lakes operating on different data sets and different platforms. To connect all their shareable data to Snowflake, customers can simply share it with their Snowflake accounts via Iceberg tables or synchronise it with a replication tool.

Simplified Data Sharing

Snowflake Simplifies Existing Data Sharing

A major reason to use Snowflake’s data sharing is that you can then distribute any and all shareable data to whichever destination you choose, whether it is consumed by Snowflake or by non-Snowflake query engines. Compared with traditional methods, this can be a more consistent, secure, easy to use, easy to control and economical approach to data sharing.

A side effect of sharing data via Snowflake in this way is that you then have a curated copy of all shared datasets. This opens up opportunities for net new AI and BI insights and use cases without disrupting any existing data pipelines or analytics.

Snowflake Platform Evolution

Snowflake has evolved from a platform that supported data warehouse and data lake workloads almost exclusively through SQL-based tooling into one that supports virtually any tooling, with programming interfaces for Python, Java and Scala that can all leverage Snowflake’s elastic compute and unified data governance. Other languages and development environments can be supported with Snowpark Container Services.

Unleash Scalable Innovation

Applications developed on the Snowflake platform benefit immediately from access to local and shared data, on-demand elastic compute, unified governance and, crucially, the ability to package an application, make it shareable and let any other authorised Snowflake user find, deploy and run it in their own Snowflake account. There is no need to learn new programming languages; you just have to be prepared for extreme convenience. Snowflake’s integrated development and deployment platform, with virtually unlimited scale, allows users to experiment with, build, test, deploy and maintain AI and BI applications using familiar programming languages and connected data. This is how Snowflake supports Scalable Innovation.

Summary

Almost every customer and prospect we work with has data stored and maintained in many different systems. From the perspective of generating new insights, the problem is shareable data trapped in silos. Synchronising or sharing data with Snowflake, either directly or via Iceberg tables, opens up new AI and BI use cases that rely on connected data. These new use cases benefit not only from the data they require being instantly accessible but also from unified governance, virtually unlimited scalability, a choice of programming languages and the enormous ecosystem of partner solutions that work with Snowflake.

David Ruthven

Snowflake Sales Engineer. Opinions expressed are solely my own and do not express the views or opinions of my employer.