Re: Bulk inserts into two (related) tables

2019-05-22 Thread Rich Shepard
On Wed, 22 May 2019, Adrian Klaver wrote: A sample of the data you are cleaning up. Adrian, et al.: I have it working properly now. Both org_id and person_id numbers are prepended to each row in the appropriate table and they are unique because each series begins one greater than the max(*_id

Re: Bulk inserts into two (related) tables

2019-05-22 Thread Jeremy Finzel
> > A sample of the data you are cleaning up. > > I think what people are trying to wrap there head around is how 800 > lines in the file is being split into two subsets: the organization data > and the people data. In particular how that is being done to preserve > the relationship between organiz

Re: Bulk inserts into two (related) tables

2019-05-22 Thread Adrian Klaver
On 5/22/19 10:53 AM, Rich Shepard wrote: On Wed, 22 May 2019, Francisco Olarte wrote: Also, when I speak of "unique identifier" I'm not speaking of the one if your FINAL tables, I assume you would have at least the *_id field as PKEY, so nothing else needed, but the one in your SOURCE data set

Re: Bulk inserts into two (related) tables

2019-05-22 Thread Rich Shepard
On Wed, 22 May 2019, Jeremy Finzel wrote: Are you saying your database already has an organizations table, and this data file is appending to it with all-brand-new organizations and people? Jeremy, The database has both organizations and people tables (among others) which are already populate

Re: Bulk inserts into two (related) tables

2019-05-22 Thread Jeremy Finzel
On Wed, May 22, 2019 at 12:53 PM Rich Shepard wrote: > On Wed, 22 May 2019, Francisco Olarte wrote: > > > Also, when I speak of "unique identifier" I'm not speaking of the one if > > your FINAL tables, I assume you would have at least the *_id field as > > PKEY, so nothing else needed, but the on

Re: Bulk inserts into two (related) tables

2019-05-22 Thread Rich Shepard
On Wed, 22 May 2019, Francisco Olarte wrote: Also, when I speak of "unique identifier" I'm not speaking of the one if your FINAL tables, I assume you would have at least the *_id field as PKEY, so nothing else needed, but the one in your SOURCE data set (it can be anything, like the row number i

Re: Bulk inserts into two (related) tables

2019-05-22 Thread Jeremy Finzel
On Wed, May 22, 2019 at 11:07 AM Rich Shepard wrote: > On Wed, 22 May 2019, Francisco Olarte wrote: > > > You are not reading what we write to you. Note YOU AND ONLY YOU are the > > one speaking of PK. We are speaking of "unique identifier" ( that would > > be, IIRC, "candidate keys", you can pee

Re: Bulk inserts into two (related) tables

2019-05-22 Thread Francisco Olarte
Rich: On Wed, May 22, 2019 at 6:07 PM Rich Shepard wrote: > On Wed, 22 May 2019, Francisco Olarte wrote: > > You are not reading what we write to you. Note YOU AND ONLY YOU are the > > one speaking of PK. We are speaking of "unique identifier" ( that would > > be, IIRC, "candidate keys", you can

Re: Bulk inserts into two (related) tables

2019-05-22 Thread Rich Shepard
On Wed, 22 May 2019, Francisco Olarte wrote: You are not reading what we write to you. Note YOU AND ONLY YOU are the one speaking of PK. We are speaking of "unique identifier" ( that would be, IIRC, "candidate keys", you can peek any as your PK, or even introduce a new synthetic one with a seque

Re: Bulk inserts into two (related) tables

2019-05-22 Thread Francisco Olarte
Rich: On Wed, May 22, 2019 at 4:38 PM Rich Shepard wrote: > Not necessary; see below. Also, these data come from a regulator and > provided as an Excel spreadsheet. If they were extracted from a database > then that was very poorly designed because there's no consistency in how > fields/columns

Re: Bulk inserts into two (related) tables

2019-05-22 Thread Francisco Olarte
Rich: On Wed, May 22, 2019 at 2:40 PM Rich Shepard wrote: > > I'm curious, what org_id do you put (manually) to the people? you must > > have some way to match it ( like, say, "I have an organization line, > > followed by lines for people in this organization" > The first data entered was in sma

Re: Bulk inserts into two (related) tables

2019-05-22 Thread Rich Shepard
On Wed, 22 May 2019, Adrian Klaver wrote: So does the people data have an organization attribute? Adrian, Yes. It's the FK to the organization table. If so why not just assign the org_id while cleaning up the data? That's what I thought to do based on your suggestion yesterday. It would m

Re: Bulk inserts into two (related) tables

2019-05-22 Thread Adrian Klaver
On 5/22/19 7:38 AM, Rich Shepard wrote: On Wed, 22 May 2019, Jeremy Finzel wrote: There's absolutely no need to use anything beyond SQL here, though you could if you want to. Jeremy, This is a new experience for me so I didn't think of a SQL solution. I really wonder how much we are just ta

Re: Bulk inserts into two (related) tables

2019-05-22 Thread Rob Sargent
> > Each row in the source file (exported from the spreadsheet as .csv and > renamed to .txt for processing in emacs and awk) is a mixture of attributes Absolutely no need to rename the .csv for those tools. > that belong in either or both of the organization and people tables in my > database. An

Re: Bulk inserts into two (related) tables

2019-05-22 Thread Rich Shepard
On Wed, 22 May 2019, Jeremy Finzel wrote: There's absolutely no need to use anything beyond SQL here, though you could if you want to. Jeremy, This is a new experience for me so I didn't think of a SQL solution. I really wonder how much we are just talking past each other simply because we

Re: Bulk inserts into two (related) tables

2019-05-22 Thread Jeremy Finzel
On Wed, May 22, 2019 at 7:40 AM Rich Shepard wrote: > On Wed, 22 May 2019, Francisco Olarte wrote: > > > I'm curious, what org_id do you put (manually) to the people? you must > > have some way to match it ( like, say, "I have an organization line, > > followed by lines for people in this organiz

Re: Bulk inserts into two (related) tables

2019-05-22 Thread Rich Shepard
On Wed, 22 May 2019, Francisco Olarte wrote: I'm curious, what org_id do you put (manually) to the people? you must have some way to match it ( like, say, "I have an organization line, followed by lines for people in this organization" Francisco, The first data entered was in small chunks so

Re: Bulk inserts into two (related) tables

2019-05-22 Thread Francisco Olarte
Jeremy: On Tue, May 21, 2019 at 11:58 PM Jeremy Finzel wrote: > Then take Francisco's suggestion, only use an md5 of the organization fields > to create yourself a unique identifier. Then you can use ctid (unique > internal identifier for each row) to join back. You use SQL like this: Sadly

Re: Bulk inserts into two (related) tables

2019-05-22 Thread Francisco Olarte
On Tue, May 21, 2019 at 7:24 PM Rich Shepard wrote: > > From how you say it, I assume you have some data in your original > > dumps which can relate boths, lets assume it's org_name, but may be an > > org-code. If you do not have it it means you cannot match people to > > orgs in your data, all is

Re: Bulk inserts into two (related) tables

2019-05-21 Thread Jeremy Finzel
On Tue, May 21, 2019 at 12:24 PM Rich Shepard wrote: > On Tue, 21 May 2019, Francisco Olarte wrote: > > > From how you say it, I assume you have some data in your original > > dumps which can relate boths, lets assume it's org_name, but may be an > > org-code. If you do not have it it means you c

Re: Bulk inserts into two (related) tables

2019-05-21 Thread Rich Shepard
On Tue, 21 May 2019, Will Hartung wrote: You can either use a script for the entire process, or, convert the people table to INSERT statements that have a SELECT for the foreign key as part of the insert. INSERT INTO PEOPLE (id, name, org_id) VALUES (1, ‘Bob’, (SELECT org_id FROM org WHERE org_

Re: Bulk inserts into two (related) tables

2019-05-21 Thread Will Hartung
> On May 21, 2019, at 9:56 AM, Rich Shepard wrote: > > I could insert all new rows into the organizations table, then produce a > list of the org_id and org_name to manually insert the foreign key in the > related people table. Time and effort intense. You can either use a script for the entire

Re: Bulk inserts into two (related) tables

2019-05-21 Thread Ron
On 5/21/19 12:27 PM, Rich Shepard wrote: On Tue, 21 May 2019, Michael Lewis wrote: For each row- Insert into organizations table if the record does not exist, returning ID. Insert into people using that ID. Michael, The org_id will not exist until I run the insert script. Else, load all th

Re: Bulk inserts into two (related) tables

2019-05-21 Thread Rich Shepard
On Tue, 21 May 2019, Adrian Klaver wrote: The other way is to create the org_id for each organization ahead of time and put it into the data file. Either way you have create the org_id for the FK relationship, it is just a matter of where and when. Adrian, I had thought of that but overlooked

Re: Bulk inserts into two (related) tables

2019-05-21 Thread Rich Shepard
On Tue, 21 May 2019, Michael Lewis wrote: For each row- Insert into organizations table if the record does not exist, returning ID. Insert into people using that ID. Michael, The org_id will not exist until I run the insert script. Else, load all the data with empty ID column on person tabl

Re: Bulk inserts into two (related) tables

2019-05-21 Thread Adrian Klaver
On 5/21/19 10:22 AM, Rich Shepard wrote: On Tue, 21 May 2019, Adrian Klaver wrote: Well you are not going to know the org_id until the organization table is loaded, which means something like: 1) First run through file load the organizations table. 2) Build a mapping of org_id to organizatio

Re: Bulk inserts into two (related) tables

2019-05-21 Thread Rich Shepard
On Tue, 21 May 2019, Francisco Olarte wrote: From how you say it, I assume you have some data in your original dumps which can relate boths, lets assume it's org_name, but may be an org-code. If you do not have it it means you cannot match people to orgs in your data, all is lost. Francisco,

Re: Bulk inserts into two (related) tables

2019-05-21 Thread Michael Lewis
For each row- Insert into organizations table if the record does not exist, returning ID. Insert into people using that ID. Else, load all the data with empty ID column on person table,then just update the person table afterward and drop the org name column. Perhaps I am missing something.

Re: Bulk inserts into two (related) tables

2019-05-21 Thread Rich Shepard
On Tue, 21 May 2019, Adrian Klaver wrote: Well you are not going to know the org_id until the organization table is loaded, which means something like: 1) First run through file load the organizations table. 2) Build a mapping of org_id to organization. 3) Run through data file again and loa

Re: Bulk inserts into two (related) tables

2019-05-21 Thread Francisco Olarte
Rich: On Tue, May 21, 2019 at 6:56 PM Rich Shepard wrote: > I'm cleaning and formatting a 800-line data file to be inserted into a > database. Some of the input file fields will be inserted into an > 'organizations' table letting postgres assign sequential org_id numbers. > Other fields will be i

Re: Bulk inserts into two (related) tables

2019-05-21 Thread Adrian Klaver
On 5/21/19 9:56 AM, Rich Shepard wrote: I'm cleaning and formatting a 800-line data file to be inserted into a database. Some of the input file fields will be inserted into an 'organizations' table letting postgres assign sequential org_id numbers. Other fields will be inserted into a separate 'p

Bulk inserts into two (related) tables

2019-05-21 Thread Rich Shepard
I'm cleaning and formatting a 800-line data file to be inserted into a database. Some of the input file fields will be inserted into an 'organizations' table letting postgres assign sequential org_id numbers. Other fields will be inserted into a separate 'people' table associated with each organiz