On 2/7/25 05:49, Robert Leach wrote:
Ccing list
Alright, I am trying to reconcile this with the statement from below: 'The largest
studies take just under a minute'.
The context of the 'The largest studies take just under a minute' statement is
that it's not loading the hefty/time-consuming raw data. It's only validating
the metadata. That's fast (5-60s). And that metadata is a portion of the
transaction in the back-end load. There are errors that validation can miss
because it never touches the raw data, and in fact, those errors are
addressed by curators editing the Excel sheets. That's why it's all in the
load transaction instead of
As a scientist, that makes me start to twitch.
Is there an audit trail for that?
We have a well-defined curation process. Original files are checked into a
private data repo. There is a CHANGES.md file that details any/every change a
curator makes, and the curator coordinates these changes with the researcher.
Aah, time travel.
Sorry, "anticipate" wasn't the best word choice. What I was thinking was more along the
lines of "detect" when a block happens and at that point, either stop and queue up the
validation or provide a status/progress that shows it's waiting on validations in line before it.
Probably not possible. It's just what I'm imagining would be the most efficient strategy with the
least wait time.
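For what it's worth, PostgreSQL does offer building blocks for this kind of "detect a block and react" strategy. A sketch, assuming nothing about the application's own tables (the settings and catalog views below are standard PostgreSQL; the timeout value is just an example):

```
-- Fail fast instead of silently queueing behind a long-running lock:
SET lock_timeout = '2s';
-- A blocked statement now raises an error after 2s, which the caller
-- can catch and turn into a "waiting on load, queued" status.

-- Or, from a monitoring session, see which backends are blocked and by whom:
SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```

That doesn't anticipate a block before it happens, but it lets the validation process notice one immediately and report progress rather than hanging.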
Other documentation I read referred to the state of the DB (when a transaction starts) as
a "snapshot", and I thought: what if I could save such a snapshot
automatically just *before* a back-end load starts, and run validation against that
snapshot, so that my validation processes would never encounter
any locks? The validation will never commit, so there's no risk.
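Something close to this does exist in PostgreSQL: a transaction can export its snapshot and other sessions can adopt it, so several validators all see the exact pre-load state. A minimal sketch (the snapshot ID shown is illustrative; `pg_export_snapshot()` and `SET TRANSACTION SNAPSHOT` are standard PostgreSQL):

```
-- Session A: open a transaction just before the load and publish its snapshot.
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();   -- e.g. returns '00000004-0000003B-1'

-- Session B (a validator): adopt that snapshot and validate against it.
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000004-0000003B-1';
-- ... run read-only validation queries here ...
ROLLBACK;  -- validation never commits
```

Note that under MVCC, plain reads in PostgreSQL don't take row locks and aren't blocked by writers anyway; what a snapshot buys you is a *consistent* pre-load view. DDL taken by the load (e.g. ALTER TABLE) can still block readers.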
Hmm. I don't think so.
From a separate thread, which seems analogous...:
Seems to me this could be dealt with using a schema named validate that
contains 'shadow' tables of those in the live schema(s). Import into them and
see what fails.
What is a "shadow table"? Is that a technical thing? Could these shadow
tables be in the same database? (Trying to wrap my head around what this implies is
possible.)
No, it's a concept of my own making. Basically, create copies of the tables you
are loading into now in a separate schema and load into them instead.
Then do the validation against those tables; that would reduce the
contention with the existing data. It is something I have done to
validate data from another data source before loading into production
tables. A possible issue to deal with is how much, if at all, you
depend on other tables for the validation process.
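The shadow-table idea above might look something like this sketch (the schema name `validate`, the table `samples`, and the CSV file are hypothetical; `CREATE TABLE ... (LIKE ...)` and `\copy` are standard PostgreSQL/psql):

```
-- One-time setup: a schema to hold the shadow copies.
CREATE SCHEMA IF NOT EXISTS validate;

-- Clone the structure (not the data) of the live table, including
-- defaults, NOT NULL/CHECK constraints, and indexes:
CREATE TABLE validate.samples (LIKE public.samples INCLUDING ALL);

-- Load the candidate data; constraint violations surface here,
-- not in the production table:
\copy validate.samples FROM 'samples.csv' WITH (FORMAT csv, HEADER)

-- Run validation queries against validate.samples, then discard:
DROP TABLE validate.samples;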
--
Adrian Klaver
adrian.kla...@aklaver.com