Shrink Oracle and quickly leverage Snowflake with Gluent
One of the many things Gluent has in common with Snowflake is that it seems too good to be true. In this article I will show just how easily you can get up and running with the Gluent Data Platform and Snowflake, and start realising the immense benefits of this combined solution almost immediately.
Just to Recap
Gluent allows you to migrate Oracle Data Warehouse workloads to Snowflake without changes to your Oracle SQL and PL/SQL. Gluent calls this process a Reverse Migration: you migrate the database workload first and then migrate your code gradually over time, allowing you to benefit from your new platform (Snowflake) early. This is unlike a traditional migration, which requires you to migrate all of your extract, load, transform and query processing before you can start using the new platform.
Steps to success
Assuming you have Oracle on Linux or Exadata (on premises or in the cloud), access to a Snowflake account, and a cloud provider account with read-write access to cloud object storage (typically on the same cloud deployment as Snowflake), you are ready to follow these steps:
- Install Gluent
- Configure a Snowflake user account
- Configure Gluent Hybrid Schema
- Configure Gluent to connect to Oracle, Snowflake and cloud storage for staging
- Offload Oracle data to Snowflake using Gluent
- Run Oracle SQL and PL/SQL queries intact whilst Gluent pushes workload down to Snowflake
Below are some more details about the key steps.
Step 1: Install Gluent
There are only two tar bundles to install: gluent_offload and gluent_transport_spark.
Once installed, the Gluent suite consists of three main programs:
- metadata daemon which collects information from Oracle and Snowflake to construct Snowflake SQL
- data daemon which is used for offloading (pushing data from Oracle to Snowflake) and pulling data (result sets) from Snowflake when executing queries
- smart connector constructs and executes queries against Snowflake
The data daemon and the spark ELT processing can be installed on a separate Gluent Node/Server whilst the metadata daemon and smart connector must be installed on the Oracle database server(s). This avoids paying Oracle core licenses for non-Oracle software.
Once installed and configured, a utility called 'connect' checks connectivity, permissions and resources for Oracle, Snowflake and your cloud storage.
Step 2: Configure a Hybrid Schema in Oracle
One of the key elements is to create a hybrid schema in Oracle, which is performed by a Gluent script. In my example, Sales History data is stored in schema SH. The Gluent script below creates a corresponding hybrid schema called SH_H. This allows us to run Oracle queries intact simply by switching the session to the SH_H hybrid schema to leverage the Gluent offload. When an Oracle table is offloaded (replicated) to Snowflake, a corresponding view is created in the SH_H schema to enable a split of the query workload.
SQL> @prepare_hybrid_schema.sql schema=SH
How easy was that?
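Once the hybrid schema exists, redirecting an existing session to it is a standard Oracle operation. A minimal sketch, assuming the SH_H schema created above:

```sql
-- Point the current session at the hybrid schema so that unqualified
-- references to SALES resolve to the Gluent hybrid view SH_H.SALES.
ALTER SESSION SET CURRENT_SCHEMA = SH_H;

-- Existing queries then run unchanged; eligible work is pushed to Snowflake.
SELECT COUNT(*) FROM SALES;
```

No application SQL needs to change; only the schema the session resolves names against.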
Step 3: Offload Oracle Table data to Snowflake using Gluent
Using the Oracle Sales History (SH) schema, I have the following SALES fact table in Oracle. It has 52 million rows in total, partitioned by calendar year up to 2020 and by quarter for the first two calendar quarters of 2021.
To copy this data to Snowflake we use the Gluent offload command, like the following:
% offload -t sh.sales --older-than-date=2021-01-01 --sort-columns=TIME_ID --offload-sort-enabled -x
This command copies all the data from the SALES table up to 1st January 2021 and ensures clustering is enabled on the TIME_ID column of the corresponding Snowflake table. The data is exported from Oracle as Parquet and transferred to a cloud storage staging area, and the Snowflake COPY command then populates the target table, which is created on this initial run.
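Gluent generates the Snowflake side of this pipeline for you, but conceptually the load resembles a standard Parquet COPY from an external stage. A hand-written sketch of the equivalent (the stage name and storage URL are hypothetical, not Gluent's actual generated SQL):

```sql
-- Hypothetical external stage over the cloud storage staging area.
CREATE STAGE IF NOT EXISTS gluent_stage
  URL = 's3://my-staging-bucket/gluent/'
  FILE_FORMAT = (TYPE = PARQUET);

-- Load the staged Parquet files into the target table, matching
-- Parquet column names to table column names.
COPY INTO "SH"."SALES"
  FROM @gluent_stage
  FILE_FORMAT = (TYPE = PARQUET)
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```

In practice Gluent creates the target table and runs the COPY automatically as part of the offload command.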
We now have a copy of 88% of all the SALES data up to the start of 2021 in Snowflake. The full dataset remains in Oracle so we can test relative performance. In production we could trim this offloaded data from Oracle.
The offload command creates a view in the hybrid schema SH_H for the SALES table. In essence, this view uses a UNION ALL to split the query workload between Oracle and Snowflake. Below is a simplified version of the view creation statement.
CREATE OR REPLACE VIEW "SH_H"."SALES" ("PROD_ID", . . .) AS
-- [ Query data in Oracle: everything since 01-JAN-2021 ]
SELECT "SALES"."PROD_ID", . . .
FROM "SH"."SALES"
WHERE "SALES"."TIME_ID" >= TIMESTAMP '2021-01-01 00:00:00.000000000'
UNION ALL
-- [ Query data in Snowflake: everything before 01-JAN-2021 ]
SELECT "SALES_EXT"."PROD_ID", . . .
FROM "SH_H"."SALES_EXT"
WHERE "SALES_EXT"."TIME_ID" < TIMESTAMP '2021-01-01 00:00:00.000000000';
Step 4: Run a performance comparison query or two
Time to compare performance. The query is unchanged between executions; the only difference in the second run is that we point the session at the hybrid schema (SH_H) created by Gluent, enabling the workload to be split between Oracle and Snowflake. The Oracle query took 26.43 seconds; the Oracle/Gluent/Snowflake query took 9.203 seconds.
Since the above query spanned the years 2016 to 2020, all the fetching and filtering was done entirely in Oracle or entirely in Snowflake, respectively. A second query also requests data for 2021; in this case Oracle must be involved in fetching data from the SALES fact table in both runs, since the 2021 data resides only in Oracle. With 2021 included, the Oracle query took 32.729 seconds and the Oracle/Gluent/Snowflake query took 10.18 seconds.
In these examples the performance improvement came from pushing as much work as possible down to Snowflake. Gluent was able to push down a lot of the work despite only having a copy of the SALES fact table and none of the dimension tables. The operations pushed down were:
- Range filter on TIME_ID
- List filter on CALENDAR_WEEK_NUMBER (via TIME_ID)
- List filter on CHANNEL_ID
- Aggregation on AMOUNT_SOLD
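For reference, a query of roughly this shape against the SH sample schema would drive the pushdown above. This is an illustrative reconstruction, not the exact benchmark query:

```sql
-- Weekly sales by customer and channel across several years.
-- The TIMES dimension is not offloaded, so Gluent translates the
-- CALENDAR_WEEK_NUMBER filter into TIME_ID values for Snowflake.
SELECT s.cust_id, s.time_id, s.channel_id,
       SUM(s.amount_sold) AS sum_amount_sold
FROM   sh.sales s
       JOIN sh.times t ON t.time_id = s.time_id
WHERE  t.calendar_week_number IN (26, 27)
AND    t.calendar_year BETWEEN 2016 AND 2020
AND    s.channel_id IN (2, 4)
GROUP  BY s.channel_id, s.cust_id, s.time_id;
```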
Aside from pushing more work down to Snowflake, this also reduces the result set and hence the amount of data that Oracle has to post-process. Below is a modified snippet of the query issued to Snowflake.
SELECT
  "CUST_ID", "TIME_ID", "CHANNEL_ID",
  SUM("AMOUNT_SOLD") AS "SUM_AMOUNT_SOLD"
FROM "SH"."SALES"
WHERE
  ("TIME_ID" >= TIMESTAMP '2016-06-26 00:00:00.000000000'
   AND "TIME_ID" <= TIMESTAMP '2020-07-05 00:00:00.000000000')
  AND "TIME_ID" IN (
    TIMESTAMP '2017-06-26 00:00:00.000000000',
    TIMESTAMP '2017-06-27 00:00:00.000000000',
    . . .
    TIMESTAMP '2020-07-04 00:00:00.000000000',
    TIMESTAMP '2020-07-05 00:00:00.000000000')
  AND "CHANNEL_ID" IN (2, 4)
GROUP BY
  "CHANNEL_ID", "CUST_ID", "TIME_ID";
When a table that needs to be joined is not present in Snowflake, Gluent can push down an IN clause instead; if the table to be joined is present, Gluent will add the JOIN to the SQL pushed to Snowflake. This is something Exadata storage cells cannot do.
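To illustrate the difference, here are two hypothetical fragments (not Gluent's literal output) for the two cases:

```sql
-- Dimension not offloaded: Gluent resolves the dimension filter in Oracle
-- and ships the matching keys to Snowflake as a literal IN list.
SELECT "CUST_ID", SUM("AMOUNT_SOLD")
FROM "SH"."SALES"
WHERE "CHANNEL_ID" IN (2, 4)
GROUP BY "CUST_ID";

-- Dimension also offloaded: the join itself is pushed down to Snowflake.
SELECT S."CUST_ID", SUM(S."AMOUNT_SOLD")
FROM "SH"."SALES" S
JOIN "SH"."CHANNELS" C ON C."CHANNEL_ID" = S."CHANNEL_ID"
WHERE C."CHANNEL_CLASS" = 'Direct'
GROUP BY S."CUST_ID";
```

Either way, filtering and aggregation happen in Snowflake, and only the reduced result set travels back to Oracle.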
Step 5: What’s the plan?
When you have run a hybrid query you can generate an Oracle Enterprise Manager / Gluent Hybrid Query report that provides details on the execution plan, execution times and latencies.
What did we learn?
We learned what a reverse migration is. We also learned that Gluent is easy to install, configure, monitor and test. It is designed to seamlessly integrate Oracle and Snowflake. You can synchronise data from Oracle to Snowflake with no additional tooling and start offloading workload to Snowflake without any coding. The time to value is almost immediate.
What’s Next?
Check out Gluent Advisor, which you can download for free. It provides a quick estimate of the amount of data and workload you could offload to Snowflake, along with the anticipated costs and savings. There are multiple benefits to deploying the Gluent Data Platform with Snowflake, including easing the pressure on your Oracle warehouse databases and starting your migration journey today, instead of after months of rewriting complex code.