Data Integration on Oracle Cloud Infrastructure

This blog introduces the Oracle Cloud Infrastructure (OCI) Data Integration service and reviews how to set up a Data Integration workspace in OCI. I love Data Integration in OCI and hope to get you excited about this great product as well. I will dive into the details and answer many common questions about data integration. As you may be aware, there are several Oracle data integration tools, such as ODI 11g, ODI 12c, and ODI on Marketplace; here I would like to focus on what Oracle Cloud Infrastructure Data Integration is and how it can benefit you.

Oracle Cloud Infrastructure Data Integration is a next-generation, fully managed, multi-tenant, serverless, cloud-native ETL service. It helps you with common extract, transform, and load (ETL) tasks such as ingesting data from different sources, cleansing, transforming, and reshaping that data, and then efficiently loading it to target data sources on Oracle Cloud Infrastructure.

The Key Features include:

  • Low-code approach: users visually design the programs that run in OCI.
  • Data immersive user experience to boost productivity
  • Hybrid execution powered by Spark and SQL push-down capabilities
  • Rules-based data integration pattern to support schema evolution
  • Serverless execution with a pay-as-you-go pricing model

Use Cases for OCI DI:

I will review a few use cases below and how they can benefit you and your organization.

Use Case 1: Data integration for big data, data lakes, and data science

  • Efficiently load and transform data at scale into Data Lakes for data science and analytics.
  • Load the data into Object Storage and create high-quality models more quickly using OCI Data Science.

Use Case 2: Data integration for data marts and data warehousing

  • Load and transform transactional data at scale into a data warehouse (e.g., Autonomous Data Warehouse) for analytics purposes.

The table below compares ODI on-premises, ODI on Marketplace, and OCI Data Integration.

| Key Factors | ODI | ODI Marketplace | OCI Data Integration |
|---|---|---|---|
| Deployment | On-premises, customer managed | OCI Compute Instance, Oracle Managed | OCI PaaS |
| Operating system | Linux or Windows | Only Linux | NA |
| DB Repository | DB (ADW not supported) | Oracle, ADW, MySQL, etc. | NA |
| Installation | Manual | Launch the compute instance (agent and studio are preinstalled) | Oracle Managed |
| Supported sources and targets | Oracle, non-Oracle DB, Hive, HDFS, object storage, etc. | Similar to ODI on-prem | Oracle, non-Oracle DB, Oracle Fusion Applications, HDFS, object storage, and many more |
| Pricing | License needed | OCPU hours (BYOL or free for now) | OCPU hours |

When to use what (ODI On-Premise vs. ODI Marketplace vs. OCI DI) depends on many factors. I have included some scenarios below:

  1. If your source and target databases are both on-premises, then ODI on-premises is a good choice.
  2. ODI Marketplace is preferred when you want to:
    • Avoid maintaining local infrastructure
    • Avoid ODI installation and configuration
    • Fully control scalability dynamically
  3. OCI DI is preferred:
    • For data lakehouse-based implementations
    • To leverage data for AI/ML, data science, or big data use cases
    • When at least one of the source or target databases is in the cloud
    • Especially when non-Oracle database data assets are involved

OCI Data Integration Vision:

[Figure: OCI Data Integration Vision]

Important steps to perform before setting up an OCI DI workspace:

To ensure that the OCI DI workspace is set up correctly and provides the most benefit, I recommend provisioning the Data Integration service as outlined below. Take care of the following items before beginning the implementation steps.

Your environment should consist of:

  • A Windows or Linux based system
  • A web browser installed on your system, preferably Mozilla Firefox or Google Chrome
  • The Oracle Cloud Infrastructure (OCI) account details assigned to you. Record the following:
    • Tenancy name or cloud account name
    • Username
    • Password
    • Compartment to be used

Implementation steps to set up Data Integration:

With your environment set up, you are ready for the actual implementation: creating the Data Integration workspace. Below are the two prerequisites for setting up the Data Integration workspace in OCI.

  • Create the OCI Data Integration policies in Oracle Cloud Infrastructure
  • Create a Virtual Cloud Network in Oracle Cloud Infrastructure

Let’s get started:

1. Create the OCI Data Integration policies in Oracle Cloud Infrastructure:

The policies required for OCI Data Integration are in addition to the regular policies used in Oracle Cloud Infrastructure for accessing other necessary resources.

In the policy statements below:

<group-name>: The group that your OCI user belongs to.
<compartment-name>: The OCI compartment you are using.

  • allow group <group-name> to manage dis-workspaces in compartment <compartment-name>
  • allow group <group-name> to manage dis-work-requests in compartment <compartment-name>
  • allow group <group-name> to use virtual-network-family in compartment <compartment-name>
  • allow group <group-name> to manage tag-namespaces in compartment <compartment-name>
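If you set up several groups or compartments, typing the four statements by hand invites typos. The following is a minimal Python sketch, not an official Oracle tool, that fills the two placeholders into the statements listed above so they can be pasted into the policy editor. The function name `build_di_policies` and the example names `di-developers` and `di-sandbox` are hypothetical.

```python
from typing import List

# Verb and resource-type pairs copied from the four policy statements above.
DI_POLICY_RULES = [
    ("manage", "dis-workspaces"),
    ("manage", "dis-work-requests"),
    ("use", "virtual-network-family"),
    ("manage", "tag-namespaces"),
]


def build_di_policies(group_name: str, compartment_name: str) -> List[str]:
    """Return the Data Integration policy statements for one group and compartment."""
    if not group_name.strip() or not compartment_name.strip():
        raise ValueError("group_name and compartment_name must be non-empty")
    return [
        f"allow group {group_name} to {verb} {resource} "
        f"in compartment {compartment_name}"
        for verb, resource in DI_POLICY_RULES
    ]


if __name__ == "__main__":
    # Hypothetical names; replace them with your own group and compartment.
    for statement in build_di_policies("di-developers", "di-sandbox"):
        print(statement)
```

Each printed line is one policy statement; paste them into the policy editor for your compartment.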

Policy Editor:

[Figure: Policy Editor]

[Figure: Policy Editor Two]

2. Create a Virtual Cloud Network in Oracle Cloud Infrastructure:

You will need a Virtual Cloud Network (VCN) to use OCI Data Integration. Oracle virtual cloud networks provide customizable and private cloud networks in Oracle Cloud Infrastructure.

To create a VCN, use the following steps:

  • 1. In the OCI console, open the navigation menu. Go to Networking and click Virtual Cloud Networks.

[Figure: Virtual Cloud Networks]

  • 2. Select the Compartment that has been allocated to you.
  • 3. On the VCN page, click Start VCN Wizard to create a new VCN.
  • 4. Select VCN with Internet Connectivity and click Start VCN Wizard.
  • 5. In the Create VCN with Internet Connectivity dialog, enter a VCN Name of your choice (in this case, vcn_dataintegration) and select the Compartment. Accept the default values in Configure VCN and Subnets and click Next.
  • 6. Verify the resource details and click Create to create the VCN.

[Figure: Create the VCN]
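If you decide not to accept the defaults in Configure VCN and Subnets, it helps to sanity-check your CIDR blocks before clicking Create. The sketch below uses only Python's standard `ipaddress` module to confirm that each subnet fits inside the VCN and that no two subnets overlap. The CIDR values are illustrative assumptions, not values taken from this walkthrough, and `check_subnets` is a hypothetical helper name.

```python
import ipaddress
from typing import Dict, List

# Assumed example values; use the CIDR blocks shown in your own wizard.
VCN_CIDR = "10.0.0.0/16"
SUBNET_CIDRS = {"public": "10.0.0.0/24", "private": "10.0.1.0/24"}


def check_subnets(vcn_cidr: str, subnet_cidrs: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the layout is consistent."""
    vcn = ipaddress.ip_network(vcn_cidr)
    subnets = {name: ipaddress.ip_network(cidr) for name, cidr in subnet_cidrs.items()}
    problems = []

    # Every subnet must lie entirely inside the VCN address range.
    for name, net in subnets.items():
        if not net.subnet_of(vcn):
            problems.append(f"{name} subnet {net} is outside VCN {vcn}")

    # No two subnets may share addresses.
    names = sorted(subnets)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if subnets[first].overlaps(subnets[second]):
                problems.append(f"{first} and {second} subnets overlap")

    return problems


if __name__ == "__main__":
    issues = check_subnets(VCN_CIDR, SUBNET_CIDRS)
    print("OK" if not issues else "\n".join(issues))
```

This is only a local check of the address plan; it does not call any OCI API or create any resources.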

Creating a Data Integration Workspace:

Before you can get started with Data Integration, you must create a workspace.

A workspace is an organizational construct that holds multiple data integration projects and their resources (data assets, data flows, etc.).

Use the following steps to create a DI workspace:

In the console, open the navigation menu. Under Analytics & AI, go to Data Integration and click Workspaces.

[Figure: Data Integration and Workspaces]

Next, enter the workspace name and description, choose the VCN that you created earlier, and select the private subnet in your compartment.

[Figure: Create Workspaces]

The workspace is ready now. Click on the workspace to see the homepage of DI.

[Figure: Homepage of DI]

Additional information regarding OCI Data Integration can be found at the URL below:

https://docs.oracle.com/en-us/iaas/data-integration/home.htm

Conclusion

In this blog, you have seen how to set up an OCI Data Integration workspace in Oracle Cloud Infrastructure. Using the steps that I outlined, you can now create data assets and data flows to transform and move data to the desired targets.

This capability helps data engineers and ETL developers with common extract, transform, and load (ETL) tasks such as ingesting data from a variety of data assets, cleansing, transforming, and reshaping that data and efficiently loading it to target data assets.

Please try out this process and give us your feedback.

Apps Associates’ Data and Analytics Practice is a team of 125 highly credentialed professionals dedicated to helping companies leverage their data assets to become data-driven organizations.

Stay tuned for more blogs in this series about OCI Data Integration. We will continue the series with the next topic, “Creating the Data Assets in an OCI Data Integration Workspace” (Object Storage, Autonomous Data Warehouse, Oracle Fusion Applications).