The Hard Problem in Database Migrations
Online App: https://huggingface.co/spaces/neuesql/sqlgptapp

The Problem
A simple Oracle into Postgresql data migration solution analysis
YES=Meet Need, X=Not at All, P=Partially
| Provider | Schema migrate | Function migrate | Store Procedure migrate |
|---|---|---|---|
| AWS DMS | Y | X | X |
| GOOGLE DMS | Y | X | X |
| Azure DMS | Y | X | X |
| Ora2pg | Y | P | P |
| Jooq | Y | P | P |
Migrate functions and stored procedures: You need to migrate the stored procedures from Oracle to PostgreSQL. It is the most complicated tasks, even google, AWS, Azure do not support for enterprise big requirement and demand. The million of migration and transition cost is always the main blocker to shift technology forward.
Objectives
- Overing all the features from database like view, index, schema, stored procedures, functions, etc…
- Know the limitation when the functions can’t find on target database.
- Automation with Test pipeline.
Solutions
the solution is inspired by OpenAI GPT model. Converting the problem from Database Domain Special Programming Language Compiler into General NLP problem.
Features
- No Coding for SQL compiler or Converter in DSL language.
- Based on Large language models (LLMs), like OpenAI GPT or Google T5, Facebook llama
- SQL GPT be adapted into different databases with different datasets.
- Support any data objects: Tables, Views, Indexes, Packages, Partitions, Procedures, Functions, Triggers, Types, Sequences, Materialized View, Database Links, Scheduler, GIS, etc…
- Reinforcement learning from Human Feedback(DBA) for language model. OpenAI Paper
Roadmap
Version 1: SQL GPT is verifying the possibility of this design by OpenAI GPT model and API.
Version 2: to extend model like open-source model like Google T5 model + clean dataset to build for enterprise demand.
System Architecture(in plan)
- There are several components like SQL collector, Dummy-Data-Generator, SQLGPT service …
- SQLCollector: Web Service to receive source SQL.
- DataGenerator: Generate Dummy Data for Schema.
- SQLGPT Service: Core Service to generate target SQL.
- Models : V1 from OpenAI model; V2 from Google T5 Model.
- SQLTrainer: training the model by new HumanFeedback Reinforcement learning.
Examples
Example 1: select top 100 customer from Oracle PL/SQL into Postgresql
### Oracle
SELECT id, client_id
FROM customer
WHERE rownum <= 100
ORDER BY create_time DESC;
### Postgresql
SELECT id, client_id
FROM customer
ORDER BY create_time DESC
LIMIT 100;
Example 2: A Transform SQL from Oracle PL/SQL into PostgreSQL PG/SQL
### Oracle
CREATE OR REPLACE PROCEDURE print_contact(
in_customer_id NUMBER
)
IS
r_contact contacts%ROWTYPE;
BEGIN
SELECT *
INTO r_contact
FROM contacts
WHERE customer_id = p_customer_id;
dbms_output.put_line(r_contact.first_name || ' ' ||
r_contact.last_name || '<' || r_contact.email || '>');
EXCEPTION
WHEN OTHERS THEN
dbms_output.put_line(SQLERRM);
END;
### Postgresql
-- A postgresql PG/SQL Procedure
create procedure print_contact(IN in_customer_id integer)
language plpgsql
as
$$
DECLARE
r_contact contacts%ROWTYPE;
BEGIN
-- get contact based on customer id
SELECT *
INTO r_contact
FROM contacts
WHERE customer_id = in_customer_id;
-- print out contact's information
RAISE NOTICE '% %<%>', r_contact.first_name, r_contact.last_name, r_contact.email;
EXCEPTION
WHEN OTHERS THEN
RAISE EXCEPTION '%', SQLERRM;
END;
$$;
Limitation:
SQL context is limited to the LLM max training token size now, T5 model need different datasets to feed. Testing Facebook LLLM model.
Source Code
Github Introducation: https://github.com/neuesql/sqlgpt
Github T5 model: https://github.com/neuesql/sqltransformer
T5-Endpoint: https://huggingface.co/neuesql/sqltransformer
Leave a comment