Oracle Data Warehouse to Snowflake: Incremental Migration with Gluent

David Ruthven
7 min readJan 13, 2022

--

Split transformation phases between Oracle and Snowflake whilst maintaining seamless client access

Migrating from an Oracle Data Warehouse is a long-term project. Whether the end goal is to fully migrate, reduce the strain on your Oracle Data Warehouse to avoid expanding your Oracle footprint or to leverage the best of Oracle and Snowflake combined with a Hybrid Transactional and Analytics Processing (HTAP) solution, Gluent Data Platform (GDP) can play a key role in any or all of these scenarios.

In this article I will focus on how to use Gluent Data Platform to fully migrate to Snowflake efficiently. The approach outlined below will make your migration much faster and more systematically progressive, allowing you to move workloads to Snowflake more quickly and integrate ingest and transformation steps between Oracle and Snowflake reducing risk and accelerating switch over.

Extract, Load, Transform, Query (ELTQ)

The full end-to-end process for supporting analytics is ELTQ.

  1. Extract process replicates (pulls) data from transactional systems
  2. Load process loads extracted or streamed data into your Data Warehouse
  3. Transform validates, deduplicates, formats, enriches and re-shapes the data to ensure efficient reporting and analysis in the …
  4. Query phase supports concurrent batch and interactive analysis

From a migration point of view, the Extract process should remain intact as that is agnostic to the target data warehouse, its mostly concerned with figuring out changes on the source databases. The Query phase may be intact depending on how database agnostic your query, dashboard and data science tools are. The two phases which will require most of the migration effort are Load and Transform.

I estimate that around 80% (isn’t it always 80%) of processing in a Data Warehouse is in loading and transforming data for analysis. That is to be expected since ultimately the value of our data warehouse is measured by the ability to deliver low latency queries across multiple high volume data sets for multiple concurrent analytics use cases.

One of the primary obstacles to migration is cost and risk. It tends to be an all or nothing approach. Migrating data is relatively trivial and there are many solutions that can copy and synchronise data between different storage engines. Migrating proprietary application Load, Transform and Query (LTQ) code is the biggest challenge and traditionally this must be fully migrated before you can switch over to your target data warehouse. With Gluent Data Platform you can leverage Snowflake right from the start of the migration process whilst working seamlessly with Oracle.

The high level steps would be

  1. Synchronise your presentation layer data to Snowflake using Gluent Offload (batch and incremental)
  2. Connect your (semi) agnostic dashboards (including OBIEE/OAS), reporting and DS tools to Snowflake
  3. Develop all enhanced and new reports, dashboards, data science models against Snowflake (these can instantly start leveraging Data Marketplace)
  4. Migrate ingest and transformation logic, piecemeal using Gluent Offload, Present and Transparent Query to integrate the phases

Using the above approach Oracle initially continues to handle load and transform steps producing a presentation layer but Snowflake supports all your end user analytics. Any query tools that cannot easily switch to Snowflake initially can query the presentation layer in Oracle. All others including perhaps surprisingly OBIEE and OAS can connect to Snowflake. Data Science programming languages which use data frames can be switched quickly by leveraging Snowpark, those that use SQL might require some SQL translation. Third party tools tend to be query engine agnostic and can be switched more easily.

The next phase is to migrate Load and Transform components, but not necessarily in that order.

Why are transformations so significant?

The transformation phase validates, reformats, enriches and re-shapes data assimilated from many transactional and streaming source systems to ensure efficient reporting and analysis. Transformation usually involves separate steps and invariably accounts for the majority of the workload.

Here is an example of a multi-stage data transformation flow from Oracle’s own Data Warehousing Guide

During these steps all manner of processing takes place, for example:

  1. Validate data types and formats
  2. Standardise data values, e.g., enforce referential integrity
  3. Correct spellings
  4. Semantic validation to confirm validity of data in context
  5. Normalise the data, for more efficient maintenance
  6. Obfuscate sensitive data
  7. Enrich — add columns to improve filtering performance
  8. Enrich — add relevant attributes such as latitude/longitude
  9. Compute frequently required aggregations
  10. De-normalise for presenting to dashboard and reporting tools

The final result is a presentation layer which can be efficiently queried by analytics client tools.

Incremental Data Transformation Migration Recipe

When it comes to migrating the data pipeline/transformation code, we can leverage Gluent Data Platform’s (GDP) Offload and Present capabilities. GDP Offload synchronises data from Oracle to Snowflake and GDP Present makes any table or view in Snowflake queryable in Oracle without the need for replication. GDP Offload is a built-in Extract and Load tool whilst the GDP Present capability is like a sophisticated version of an Oracle Transparent Gateway.

As mentioned above data warehouse transformations are performed in steps which typically materialise the results into temporary staging tables, these are frequently implemented as Create Table (CTAS) and Insert as Select (IAS) steps. With Gluent, data already hosted in Oracle that needs to be transformed can be offloaded/replicated to Snowflake, Snowflake can then perform the transformation and materialise the results into a local staging table or even a view and these results can be made accessible to Oracle using GDP Present with no physical replication.

Here is an example workflow.

In this example, Snowflake is used to manage the initial ingest and the first transformation. The transformed data hosted in Snowflake as either a table or view, can then be made visible to Oracle via GDP Present. The second transformation is performed in Oracle and shared with Snowflake using GDP offload. The third transformation shares results with Oracle using GDP Present again. For the last transformation, Oracle is taking care of generating the final Presentation layer which can be accessed by Oracle client tools directly and made accessible to client tools connected to Snowflake via a final Offload to Snowflake.

An actual deployment would likely retain transformations in Oracle that are difficult to migrate, but push as much workload to Snowflake as possible. Once all transformation steps are migrated to Snowflake, Oracle can be switched off, assuming the analysis tools have already been ported.

How difficult is to migrate Oracle SQL and PL/SQL to Snowflake?

Snowflake SQL is a very rich SQL dialect and has many similarities with Oracle. A large portion of code migrations are now being done with the assistance of sophisticated code conversion tools reducing the amount of manual recoding. Added to that Snowflake is also releasing SnowScript which is a SQL based procedural language that is more similar to Oracle PL/SQL than Javascript. The most difficult PL/SQL procedures to migrate are those described in the Oracle Data Warehousing Guide — Transforming Data Using PL/SQL:

PL/SQL provides a row-based approached and can accommodate very sophisticated transformation rules. For example, a PL/SQL procedure could open multiple cursors and read data from multiple source tables, combine this data using complex business rules, and finally insert the transformed data into one or more target tables. It would be difficult or impossible to express the same sequence of operations using standard SQL statements.

PL/SQL transformations of this nature may need to be refactored but could be implemented in SnowScript stored procedures, or Java or Python User Defined Functions.

Why use Gluent for Oracle to Snowflake Migration?

Gluent allows Oracle customers to leverage Snowflake without any changes to their Oracle client tools as they can remain connected to Oracle. Gluent transparent query provides a no code gateway to seamlessly split query workload between Oracle and Snowflake.

For full migrations, the first phase is to move all your client analytics tools to connect to Snowflake and stream the presentation layer to Snowflake using GDP incremental offload. This enables all the query workloads to be handled by Snowflake and analytics and data science users can benefit from Snowflake’s scalable service levels and access to the Snowflake Data Cloud.

For the most substantial phase of the migration Gluent’s Offload and Present capabilities allow the transformation logic to be migrated in stages, leveraging Snowflake further as more of the transformation steps are migrated over time until finally the Oracle Data Warehouse can be retired.

Incremental Migration Summary

Most migration processes require that you migrate all your code, run two systems end to end in parallel before finally switching over.

The incremental migration approach allows customers to leverage Snowflake from the outset. Firstly migrating query workloads, reducing workload on your EDW and retiring Data Marts early. Following that the bigger task of migrating the Load and Transform code can be gradually phased into the production data pipeline where each ingest and transformation phase can built, tested and adopted incrementally.

How do I find out more?

Gluent are a Snowflake technical and services partner and can be reached at gluent.com

References

  1. Oracle Data Warehousing Guide (12c)
  2. Connecting Oracle BI to the Snowflake Data Cloud — Medium blog article written by Snowflake colleague Dan Flippo

--

--

David Ruthven
David Ruthven

Written by David Ruthven

Snowflake Sales Engineer — opinions expressed are solely my own and do not express the views or opinions of my employer.

No responses yet