This answer is probably way off topic for this group as it's not really
Cayenne related. But I want to give some closure to the solution so that
anyone else who ends up reading this can see the totality of the problem
and solution.

My original thought was: I don't have to test for uniqueness because the
database will tell me if I violate the unique key constraint. The problem
with doing the check on the Java side of persistence is that as the number
of inserted records grows, so does the time a query takes to test for
uniqueness. If you're inserting a million records, that starts to
accumulate. The database has to enforce this constraint anyway, so putting
it in your Java code is just a duplication of effort.

That process works fine as long as you're inserting one record at a time.
In that case the lack of identifying information isn't a problem, because
you know that the object you're inserting caused the error. The solution
is to log the offending row, and off you go to the next record. Simple and
direct.
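The per-record strategy above can be sketched roughly as follows. This is a minimal illustration, not Cayenne code: the `HashSet` stands in for the database's unique index, and `commitOne()` stands in for an `ObjectContext.commitChanges()` call that throws on a constraint violation. The class and method names are invented for the example.

```java
import java.util.*;

// Hypothetical sketch: insert one record at a time; on a constraint
// failure, log the offending row and continue with the next record.
public class OneByOneLoader {

    // Stand-in for the database's unique index.
    private final Set<String> uniqueIndex = new HashSet<>();

    // Pretend "commit": throws if the key violates the unique
    // constraint, otherwise persists the row.
    private void commitOne(String key) {
        if (!uniqueIndex.add(key)) {
            throw new IllegalStateException("duplicate key: " + key);
        }
    }

    // Because each commit holds exactly one new object, a failure
    // unambiguously identifies the offending row: log it and move on.
    public List<String> load(List<String> keys) {
        List<String> rejected = new ArrayList<>();
        for (String key : keys) {
            try {
                commitOne(key);
            } catch (IllegalStateException duplicate) {
                rejected.add(key); // "log the offending row"
            }
        }
        return rejected;
    }
}
```

The trade-off is exactly the one described: correctness and easy error attribution, at the cost of one round trip per record.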

Where that ability fails is when you start "batching" your objects and
commit multiple objects at once. At that point relying on the database
becomes problematic in a few ways. First, the transaction fails on the
first failure. Which is fine: the DB only cares about the transaction, and
it failed. What to do about it is, from the DB's perspective, someone
else's problem. From the Java app's perspective, it's possible there are
more problems in that batch, and solving that is going to be difficult
and/or time-consuming. The second issue is that, short of parsing the
error string, there's no easy way to figure out which insert caused the
problem. All these inserts are to the same table, so just finding the ID
that caused the problem would be enough. If I had to do this, I could. But
change the database and I'd need a new implementation. I hate that idea.
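For completeness, the optimistic fallback Bob suggests below (commit in batches, and only when a batch fails retry its records one by one to isolate the offenders) can be sketched like this. Again this is an illustration with invented names: the in-memory `Set` stands in for the table's unique index, and `commit()` simulates an all-or-nothing transaction.

```java
import java.util.*;

// Hypothetical sketch of the "optimistic batching" fallback strategy.
public class BatchLoader {

    // In-memory stand-in for the table's unique index; in the real
    // app the database enforces this.
    private final Set<String> db = new HashSet<>();

    // Simulates one transactional commit: all-or-nothing. Throws on
    // the first duplicate, leaving db untouched (the "rollback").
    private void commit(List<String> batch) {
        Set<String> staged = new HashSet<>(db);
        for (String key : batch) {
            if (!staged.add(key)) {
                throw new IllegalStateException("duplicate key: " + key);
            }
        }
        db.addAll(batch);
    }

    // Commit in batches; when a batch fails, roll back and retry its
    // records one by one so only the true offenders are skipped.
    public List<String> load(List<String> keys, int batchSize) {
        List<String> rejected = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += batchSize) {
            List<String> batch =
                    keys.subList(i, Math.min(i + batchSize, keys.size()));
            try {
                commit(batch);
            } catch (IllegalStateException batchFailure) {
                for (String key : batch) {
                    try {
                        commit(Collections.singletonList(key));
                    } catch (IllegalStateException duplicate) {
                        rejected.add(key); // log and move on
                    }
                }
            }
        }
        return rejected;
    }
}
```

When duplicates are rare, most batches commit on the first try and you keep most of the batching speedup; the one-by-one pass only pays for the bad batches.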

In the end this was a case of premature optimization. Running a sample job
of one million lines shows that without batching the app takes only about 4
minutes to insert those rows. Would I like that to be faster? I would. But
is it acceptable? Yes.

Tony Giaccone

On Fri, Sep 28, 2018 at 11:17 AM Bob Schellink <sab...@gmail.com> wrote:

> Would be nice if Cayenne could pinpoint the problematic record, but I think
> that will depend on what error info the database provides.
>
> Thinking of a strategy to get speed and correctness, how about an
> optimistic strategy:
>
> Process a batch of 500 records, and only if the commit fails process that
> batch's records one by one (or chunk the 500 into smaller batches)
> until you find the guilty object, then continue with batches of 500 again.
>
> You pay a price though with more complex code. ;-)
>
> Kind regards
>
> Bob
>
>
> On Fri, Sep 28, 2018 at 4:28 PM Tony Giaccone <t...@giaccone.org> wrote:
>
> > Yeah, that's pretty much what I ended up doing. Even reading the file
> > line by line and doing an insert after each object is created only made
> > the run time go to 4 minutes, and I can live with that. What I really
> > wanted to do was find a way to recover from a larger commit. It seems
> > that's not really possible. The one feature that would make that failure
> > easier to deal with would be some kind of data value in the commit error
> > that would identify the class and key value of the object that caused
> > the commit exception. I recognize that the value is there in the text,
> > but parsing through that text message to find the value is a serious
> > hack. It would be better if the framework included in the commit
> > exception the class type and the key value of the entity that caused
> > the problem.
> >
> > Now maybe in the larger scheme of things it doesn't make sense to
> > identify which item in the set of items being committed caused the
> > problem. It's clear it makes sense in my use case, but in the general
> > use case, maybe not.
> >
> >
> > Tony
> >
> > On Thu, Sep 27, 2018 at 5:10 PM John Huss <johnth...@gmail.com> wrote:
> >
> > > Commit the ObjectContext after each object/row and rollback the
> > > ObjectContext on failure.
> > >
> > > On Thu, Sep 27, 2018 at 3:57 PM Tony Giaccone <t...@giaccone.org> wrote:
> > >
> > > > So the question isn't so much about how to manage the transaction.
> > > > It's more about how to recover and eliminate the offending object so
> > > > that the commit can be made again.
> > > >
> > > > On Thu, Sep 27, 2018 at 3:52 PM John Huss <johnth...@gmail.com> wrote:
> > > >
> > > > > I'd just wrap the whole thing in a database transaction. Then
> > > > > commit your ObjectContexts as often as you want to, but the real
> > > > > DB commit won't happen until the end.
> > > > >
> > > > > TransactionManager transactionManager = CayenneRuntime
> > > > >         .getThreadInjector()
> > > > >         .getInstance(TransactionManager.class);
> > > > >
> > > > > transactionManager.performInTransaction(
> > > > >         new TransactionalOperation<Void>() {
> > > > >             @Override
> > > > >             public Void perform() {
> > > > >                 return null;
> > > > >             }
> > > > >         });
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Sep 27, 2018 at 2:36 PM Tony Giaccone <t...@giaccone.org> wrote:
> > > > >
> > > > > > I'm processing a large number of rows, over 600,000, and the key
> > > > > > value should be unique in this file, but I'd like to ensure that.
> > > > > > I also want this to happen with some rapidity. To speed this
> > > > > > process up I'm going to read lines from the file, create objects,
> > > > > > and commit the changes after 500 have been created.
> > > > > >
> > > > > > The problem with this is that if I have a duplicate value I won't
> > > > > > catch it till I do the commit.
> > > > > >
> > > > > > When I insert a second key value, the first exception is a
> > > > > > DB-level org.postgresql.util.PSQLException; eventually this gets
> > > > > > wrapped by a Cayenne commit error.
> > > > > >
> > > > > > So I'd like to get a sense of what folks think, given that I
> > > > > > want to balance these conflicting goals of speed and accuracy.
> > > > > >
> > > > > > Can I easily figure out what object or objects caused the error,
> > > > > > and can I exclude them from the context and redo the commit?
> > > > > >
> > > > > > Is this a reasonable path to follow?
> > > > > >
> > > > > >
> > > > > >
> > > > > > Tony Giaccone
> > > > > >
> > > > >
> > > >
> > >
> >
>
