The Hard Problem in Database Migrations

Online App: https://huggingface.co/spaces/neuesql/sqlgptapp

The Problem

A simple Oracle into Postgresql data migration solution analysis

YES=Meet Need, X=Not at All, P=Partially

ProviderSchema
migrate
Function
migrate
Store Procedure
migrate
AWS DMSYXX
GOOGLE DMSYXX
Azure DMSYXX
Ora2pgYPP
JooqYPP
Database Migration Service

Migrate functions and stored procedures: You need to migrate the stored procedures from Oracle to PostgreSQL. It is the most complicated tasks, even google, AWS, Azure do not support for enterprise big requirement and demand. The million of migration and transition cost is always the main blocker to shift technology forward.

Objectives

  • Overing all the features from database like view, index, schema, stored procedures, functions, etc…
  • Know the limitation when the functions can’t find on target database.
  • Automation with Test pipeline.

Solutions

the solution is inspired by OpenAI GPT model. Converting the problem from Database Domain Special Programming Language Compiler into General NLP problem.

Features

  • No Coding for SQL compiler or Converter in DSL language.
  • Based on Large language models (LLMs), like OpenAI GPT or Google T5, Facebook llama
  • SQL GPT be adapted into different databases with different datasets.
  • Support any data objects: Tables, Views, Indexes, Packages, Partitions, Procedures, Functions, Triggers, Types, Sequences, Materialized View, Database Links, Scheduler, GIS, etc…
  • Reinforcement learning from Human Feedback(DBA) for language model. OpenAI Paper

Roadmap

Version 1: SQL GPT is verifying the possibility of this design by OpenAI GPT model and API.

Version 2: to extend model like open-source model like Google T5 model + clean dataset to build for enterprise demand.

System Architecture(in plan)

  • There are several components like SQL collector, Dummy-Data-Generator, SQLGPT service …
  • SQLCollector: Web Service to receive source SQL.
  • DataGenerator: Generate Dummy Data for Schema.
  • SQLGPT Service: Core Service to generate target SQL.
  • Models : V1 from OpenAI model; V2 from Google T5 Model.
  • SQLTrainer: training the model by new HumanFeedback Reinforcement learning.

Examples

Example 1: select top 100 customer from Oracle PL/SQL into Postgresql

### Oracle
SELECT id, client_id
FROM customer
WHERE rownum <= 100
ORDER BY create_time DESC;
### Postgresql
SELECT id, client_id
FROM customer
ORDER BY create_time DESC
LIMIT 100;

Example 2: A Transform SQL from Oracle PL/SQL into PostgreSQL PG/SQL

### Oracle
CREATE OR REPLACE PROCEDURE print_contact(
    in_customer_id NUMBER
)
    IS
    r_contact contacts%ROWTYPE;
BEGIN

    SELECT *
    INTO r_contact
    FROM contacts
    WHERE customer_id = p_customer_id;

    dbms_output.put_line(r_contact.first_name || ' ' ||
                         r_contact.last_name || '<' || r_contact.email || '>');

EXCEPTION
    WHEN OTHERS THEN
        dbms_output.put_line(SQLERRM);
END;
### Postgresql
-- A postgresql PG/SQL Procedure
create procedure print_contact(IN in_customer_id integer)
    language plpgsql
as
$$
DECLARE
    r_contact contacts%ROWTYPE;
BEGIN
    -- get contact based on customer id
    SELECT *
    INTO r_contact
    FROM contacts
    WHERE customer_id = in_customer_id;

    -- print out contact's information
    RAISE NOTICE '% %<%>', r_contact.first_name, r_contact.last_name, r_contact.email;
EXCEPTION
    WHEN OTHERS THEN
        RAISE EXCEPTION '%', SQLERRM;
END;
$$;

Limitation:

SQL context is limited to the LLM max training token size now, T5 model need different datasets to feed. Testing Facebook LLLM model.

Source Code

Github Introducation: https://github.com/neuesql/sqlgpt

Github T5 model: https://github.com/neuesql/sqltransformer

T5-Endpoint: https://huggingface.co/neuesql/sqltransformer

Posted in

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.