On 11/22/19 2:05 PM, Rémi Cura wrote:
Hello dear List,
I'm currently wondering how to streamline the normalization of a new table.

I often have to import messy CSV files into the database, and making clean, normalized versions of these takes me a lot of time (think dozens of columns and millions of rows).

To me, messy means the information needed to do the below is not available. Personally, I think your best bet is to get the data into tables and then use visualization tools to help you determine the items below. My guess is there will be a lot of data cleaning to do before you can get to a well-ordered table layout.


I wrote some code to automatically import a CSV file and infer the type of each column.
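For context, once everything sits in an all-text staging table, the type inference can be probed with plain SQL. A minimal, untested sketch, assuming a hypothetical staging table staging_csv and one text column col1 (all names here are illustrative, not my actual code):

-- Count non-null values that fail simple type patterns; a column with
-- zero failures for a pattern is a candidate for that type.
SELECT
  count(*) FILTER (WHERE col1 !~ '^-?[0-9]+$')                    AS not_integer,
  count(*) FILTER (WHERE col1 !~ '^-?[0-9]+(\.[0-9]+)?$')         AS not_numeric,
  count(*) FILTER (WHERE col1 !~ '^[0-9]{4}-[0-9]{2}-[0-9]{2}$')  AS not_iso_date
FROM staging_csv
WHERE col1 IS NOT NULL AND col1 <> '';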
Now I'd like to quickly get an idea of
  - what the most likely primary key would be
  - what functional dependencies exist between the columns
(both are probed in the SQL sketch below)
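Both checks reduce to counting queries over the staging table. A minimal sketch, again with the hypothetical staging_csv and illustrative column names:

-- Candidate-key test: col1 can serve as a single-column primary key
-- iff total_rows = non_null = distinct_vals.
SELECT count(*)             AS total_rows,
       count(col1)          AS non_null,
       count(DISTINCT col1) AS distinct_vals
FROM staging_csv;

-- Functional-dependency test: col_a -> col_b holds iff no value of
-- col_a is paired with more than one value of col_b.
-- Zero rows returned means the dependency holds.
SELECT col_a
FROM staging_csv
GROUP BY col_a
HAVING count(DISTINCT col_b) > 1;

Looping that pair test over every column combination (e.g. reading information_schema.columns and building the queries with EXECUTE in plpgsql, or from plpython) would give the full single-column dependency picture, at the cost of one scan per pair.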

The goal is **not** to automate the modelling process,
but rather to automate the tedious phase of information collection
that is necessary for the DBA to make a good model.

If this goes well, I'd like to automate further tedious steps, like splitting a table into several tables with appropriate foreign keys / constraints (see the sketch below).
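For the splitting step, a hedged sketch of what such automation might emit, assuming a hypothetical repeated text column country in staging_csv:

-- Move the repeated values into their own reference table and link
-- the wide table back to it with a foreign key.
CREATE TABLE country (
    country_id serial PRIMARY KEY,
    name       text NOT NULL UNIQUE
);

INSERT INTO country (name)
SELECT DISTINCT country FROM staging_csv WHERE country IS NOT NULL;

ALTER TABLE staging_csv ADD COLUMN country_id integer
    REFERENCES country (country_id);

UPDATE staging_csv s
SET    country_id = c.country_id
FROM   country c
WHERE  c.name = s.country;

ALTER TABLE staging_csv DROP COLUMN country;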

I'd be glad to have some feedback / pointers to tools in plpgsql or even plpython.

Thank you very much
Remi




--
Adrian Klaver
adrian.kla...@aklaver.com

