> On Sep 26, 2019, at 7:44 AM, Krishnakant Mane <kkm...@riseup.net> wrote:
>
>
>> On 26/09/19 6:53 PM, Rob Sargent wrote:
>>
>>
>> On Sep 26, 2019, at 12:27 AM, Krishnakant Mane <kkm...@riseup.net> wrote:
>>
>>>
>>>> On 26/09/19 12:03 AM, Adrian Klaver wrote:
>>>>> On 9/25/19 8:04 AM, Rob Sargent wrote:
>>>>>
>>>>>
>>>>> On Sep 25, 2019, at 8:24 AM, Krishnakant Mane <kkm...@riseup.net> wrote:
>>>>>
>>>>>>
>>>>>>> On 25/09/19 7:50 PM, Adrian Klaver wrote:
>>>>>>>> On 9/25/19 12:15 AM, Krishnakant Mane wrote:
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> I have been using PostgreSQL for an enterprise-quality accounting
>>>>>>>> automation and inventory management software called GNUKhata
>>>>>>>> <https://gnukhata.in>
>>>>>>>>
>>>>>>>> Our team is planning to add backup and restore function in the
>>>>>>>> software.
>>>>>>>>
>>>>>>>> But we don't want to dump the entire database and then restore the
>>>>>>>> same.
>>>>>>>>
>>>>>>>> What we are trying to do is to copy data specific to an organization.
>>>>>>>>
>>>>>>>> The challenge here is that I might copy all the data (account heads,
>>>>>>>> bills, vouchers, etc.) for one organization from an instance on one
>>>>>>>> machine.
>>>>>>>>
>>>>>>>> I then take the archive, in whatever format, to another machine and
>>>>>>>> attempt to restore it.
>>>>>>>>
>>>>>>>> The risk here is that if, for example, the primary key value for
>>>>>>>> orgcode in the organization table is 5, it might conflict with
>>>>>>>> existing data on the machine where I am attempting the restore.
>>>>>>>>
>>>>>>>> Same holds true for bills, invoices etc.
>>>>>>>>
>>>>>>>> A certain account head with accountcode 1 might already be present on
>>>>>>>> the second machine.
>>>>>>>>
>>>>>>>> I am not expecting the users to empty all data from the destination
>>>>>>>> machine before restoring a backup.
>>>>>>>>
>>>>>>>> The reason is that an auditor may have many clients' data, and one
>>>>>>>> can't predict what primary key values are going to come from a backup.
>>>>>>>>
>>>>>>>> Basically, this is more of a copy-paste than a pure backup and
>>>>>>>> restore.
>>>>>>>>
>>>>>>>> Can anyone suggest how to handle such conflicts?
>>>>>>>
>>>>>>> Hard to say. If the data is held in common tables (bills, vouchers,
>>>>>>> etc.), then the only thing I see happening is changing the PK values to
>>>>>>> unused values. That could turn into a nightmare, though. Not only that,
>>>>>>> you lose the connection to the original data source. If the data can be
>>>>>>> broken out into separate tables, then I could see placing them in their
>>>>>>> own schema.
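>>>>>>> For instance (a rough sketch only; the schema, table, and file names
>>>>>>> are made up), each restored client could land in its own schema:
>>>>>>>
>>>>>>>   -- one schema per restored client, structure cloned from the live table
>>>>>>>   CREATE SCHEMA client_acme;
>>>>>>>   CREATE TABLE client_acme.vouchers
>>>>>>>       (LIKE public.vouchers INCLUDING ALL);
>>>>>>>   -- load the dumped rows into the per-client copy
>>>>>>>   COPY client_acme.vouchers FROM '/tmp/acme_vouchers.csv'
>>>>>>>       WITH (FORMAT csv, HEADER true);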
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Hi Adrian,
>>>>>>
>>>>>> I am also thinking of doing some kind of upsert for this situation.
>>>>
>>>> So, to be clear, the tables you are working with can have records from
>>>> multiple organizations in a single table?
>>>>
>>>>>>
>>>>>> And I would have to set the pkey to an unused value when there is a
>>>>>> conflict.
>>>>
>>>> I am seeing nextval() in your future:)
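>>>> For instance (untested sketch; the sequence name assumes the default
>>>> serial naming, and the staging table is made up):
>>>>
>>>>   -- take a fresh orgcode from the live sequence instead of the one
>>>>   -- that came in with the dump
>>>>   INSERT INTO organization (orgcode, orgname)
>>>>   SELECT nextval('organization_orgcode_seq'), orgname
>>>>   FROM staging_organization;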
>>>>
>>>>>>
>>>>>> I may also choose to replace the serial keys with timestamps, but I
>>>>>> don't know if the target customers would like it.
>>>>
>>>> I would avoid that. In my opinion timestamps are too volatile to serve
>>>> as a PK. If you are going to change, I would go with the previous
>>>> suggestion of UUIDs:
>>>> https://www.postgresql.org/docs/11/datatype-uuid.html
>>>>
>>>> Not sure your customers would like that either.
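>>>> A minimal sketch (on 11 you need an extension such as pgcrypto for a
>>>> default generator; the table and column names are only examples):
>>>>
>>>>   CREATE EXTENSION IF NOT EXISTS pgcrypto;  -- provides gen_random_uuid()
>>>>
>>>>   CREATE TABLE organization (
>>>>       orgid   uuid PRIMARY KEY DEFAULT gen_random_uuid(),
>>>>       orgname text NOT NULL
>>>>   );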
>>>>
>>> Hi Adrian,
>>>
>>> I think I would get them to like the UUID idea.
>>>
>>> So now what I am thinking is to revamp the database by first removing
>>> all the primary key constraints and then clearing out the existing key
>>> values.
>>>
>>> Then loop through the existing data and generate a UUID in that column
>>> for every row.
>>>
>>> I would also need to update all the references to this value where it is
>>> used as a foreign key in related tables.
>>>
>>> But I guess some kind of ON UPDATE CASCADE might do the job.
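>>> For example (sketch only; the names are made up), a foreign key declared
>>> with ON UPDATE CASCADE would propagate a changed orgcode to the
>>> referencing rows automatically:
>>>
>>>   ALTER TABLE voucher
>>>       ADD CONSTRAINT voucher_orgcode_fkey
>>>       FOREIGN KEY (orgcode) REFERENCES organization (orgcode)
>>>       ON UPDATE CASCADE;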
>>>
>>> I know this would slow down the system, but given that this will be a
>>> one-time process for an individual user (and only if they have existing
>>> data), I would take that trade-off.
>>>
>>> What do you say?
>>>
>>>>>>
>>>>> It would likely be easier to rethink your backup and restore plan.
>>>>> Putting each restore into its own space would be one tack.
>>>>
>>>>
>> You might think about adding the new UUID column and using the existing
>> primary key to inform the updates in dependent tables. Then remove the old
>> PK column and constraint, followed by promoting the UUID to primary key.
>> This could be safely scripted and applied to all instances of your data.
>> That said, this is only truly necessary if you have production databases to
>> worry about.
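>> Roughly along these lines (untested sketch; every table and column name
>> below is a placeholder, and gen_random_uuid() needs pgcrypto on 11):
>>
>>   -- 1. add the new key and backfill it
>>   ALTER TABLE organization ADD COLUMN orgid uuid;
>>   UPDATE organization SET orgid = gen_random_uuid();
>>
>>   -- 2. carry the new key into each dependent table via the old key
>>   ALTER TABLE voucher ADD COLUMN orgid uuid;
>>   UPDATE voucher v
>>   SET    orgid = o.orgid
>>   FROM   organization o
>>   WHERE  o.orgcode = v.orgcode;
>>
>>   -- 3. drop the old key and promote the UUID
>>   ALTER TABLE voucher DROP COLUMN orgcode;
>>   ALTER TABLE organization DROP COLUMN orgcode;
>>   ALTER TABLE organization ADD PRIMARY KEY (orgid);
>>   ALTER TABLE voucher
>>       ADD CONSTRAINT voucher_orgid_fkey
>>       FOREIGN KEY (orgid) REFERENCES organization (orgid);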
>
> Thanks a million, this is the most logical and safe way.
>
> Yes, I have a lot of production databases to worry about.
>
> I am only confused about what you mean by "use the existing primary key to
> inform the updates in dependent tables."
>
> Are you referring to a cascading effect?
>
> If yes, does it mean I should program my upgrade script to go through all
> the new UUID keys and update them in the dependent tables, matching on the
> old primary key that serves as the foreign key in those tables?
>
>
>
Yes. You will need the old keys to recreate referential integrity (foreign
keys, etc.).
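For instance (placeholder names, untested), the old key drives the join while
both keys still exist, and then the constraint is re-created against the new
key:

  UPDATE voucher v
  SET    orgid = o.orgid
  FROM   organization o
  WHERE  o.orgcode = v.orgcode;   -- old key informs the update

  ALTER TABLE voucher
      ADD CONSTRAINT voucher_orgid_fkey
      FOREIGN KEY (orgid) REFERENCES organization (orgid);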
> --
> Regards,
> Krishnakant Mane,
> Project Founder and Leader,
> GNUKhata
> (Opensource Accounting, Billing and Inventory Management Software)