Hi, summarizing: we had a table that had an OID column, referencing an object in pg_largeobject. This was mapped to a (Java) entity with a byte array field, annotated with @Lob. The problem was that we were fetching thousands of these entities in one go, and LOB fetching is not batched by Hibernate/JDBC (so each row is fetched separately). Because we were abusing LOBs (they were small, often less than 10 kB), we have chosen to move the binary blobs from the LO table to a simple bytea column. So the entity that had a byte array field mapped to an OID column now has a byte array field mapped to a bytea column, and we have manually moved data from the LO table to the bytea column. Now Hibernate/JDBC fetches all the content we need in batches. Random benchmark: fetching 20k rows used to take 7 seconds (250 msec query execution time, 6.7 sec for transfer) and now it takes 1.5 seconds (250 msec query + 1.3 sec transfer).
Regards, Mate On Thu, Sep 6, 2018 at 10:56 AM Dave Cramer <[email protected]> wrote: > Hi > > Can you be more explicit how you fixed the problem ? > > Thanks > Dave Cramer > > [email protected] > www.postgresintl.com > > > On Thu, 6 Sep 2018 at 03:46, Mate Varga <[email protected]> wrote: > >> After inlining the data, performance issues have been solved. Thanks for >> the help. >> >> On Mon, Sep 3, 2018 at 9:57 PM Mate Varga <[email protected]> wrote: >> >>> Thanks, >>> 1) we'll try to move stuff out from LOBs >>> 2) we might raise a PR for the JDBC driver >>> >>> Mate >>> >>> On Mon, 3 Sep 2018, 19:35 Dave Cramer, <[email protected]> wrote: >>> >>>> >>>> >>>> On Mon, 3 Sep 2018 at 13:00, Mate Varga <[email protected]> wrote: >>>> >>>>> More precisely: when fetching 10k rows, JDBC driver just does a large >>>>> bunch of socket reads. With lobs, it's ping-pong: one read, one write per >>>>> lob... >>>>> >>>>> >>>> Ok, this is making more sense. In theory we could fetch them all but >>>> since they are LOB's we could run out of memory. >>>> >>>> Not sure what to tell you at this point. I'd entertain a PR if you were >>>> motivated. >>>> >>>> Dave Cramer >>>> >>>> [email protected] >>>> www.postgresintl.com >>>> >>>> >>>> >>>>> >>>>> On Mon, Sep 3, 2018 at 6:54 PM Mate Varga <[email protected]> wrote: >>>>> >>>>>> So I have detailed profiling results now. Basically it takes very >>>>>> long that for each blob, the JDBC driver reads from the socket then it >>>>>> creates the byte array on the Java side. Then it reads the next blob, >>>>>> etc. >>>>>> I guess this takes many network roundtrips. >>>>>> >>>>>> On Mon, Sep 3, 2018 at 5:58 PM Dave Cramer <[email protected]> wrote: >>>>>> >>>>>>> >>>>>>> On Mon, 3 Sep 2018 at 10:48, Mate Varga <[email protected]> wrote: >>>>>>> >>>>>>>> That's 1690 msec (1.69 seconds, and that is how long it takes to >>>>>>>> fetch 20k (small-ish) rows without LOBs (LOBs are a few lines below on >>>>>>>> the >>>>>>>> screenshot) >>>>>>>> >>>>>>> >>>>>>> that sound high as well! >>>>>>> >>>>>>> Something isn't adding up.. >>>>>>> >>>>>>> >>>>>>> Dave Cramer >>>>>>> >>>>>>> [email protected] >>>>>>> www.postgresintl.com >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> On Mon, Sep 3, 2018 at 4:40 PM Dave Cramer <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> the one you have highlighted ~1.69ms >>>>>>>>> >>>>>>>>> Dave Cramer >>>>>>>>> >>>>>>>>> [email protected] >>>>>>>>> www.postgresintl.com >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, 3 Sep 2018 at 10:38, Mate Varga <[email protected]> wrote: >>>>>>>>> >>>>>>>>>> Which frame do you refer to? >>>>>>>>>> >>>>>>>>>> On Mon, Sep 3, 2018 at 3:57 PM Dave Cramer <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Not sure why reading from a socket is taking 1ms ? >>>>>>>>>>> >>>>>>>>>>> Dave Cramer >>>>>>>>>>> >>>>>>>>>>> [email protected] >>>>>>>>>>> www.postgresintl.com >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Mon, 3 Sep 2018 at 09:39, Mate Varga <[email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> https://imgur.com/a/ovsJPRv -- I've uploaded the profiling >>>>>>>>>>>> info (as an image, sorry). It seems this is a JDBC-level problem. I >>>>>>>>>>>> understand that the absolute timing is not meaningful at all >>>>>>>>>>>> because you >>>>>>>>>>>> don't know how large the resultset is, but I can tell that this is >>>>>>>>>>>> only a >>>>>>>>>>>> few thousands rows + few thousand largeobjects, each largeobject >>>>>>>>>>>> is around >>>>>>>>>>>> 1 kByte. (Yes I know this is not a proper use of LOBs -- it's a >>>>>>>>>>>> legacy db >>>>>>>>>>>> structure that's hard to change.) >>>>>>>>>>>> >>>>>>>>>>>> Thanks. >>>>>>>>>>>> Mate >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Sep 3, 2018 at 11:52 AM Mate Varga <[email protected]> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hey, >>>>>>>>>>>>> >>>>>>>>>>>>> we'll try to test this with pure JDBC versus hibernate. Thanks! >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Mon, Sep 3, 2018 at 11:48 AM Dave Cramer <[email protected]> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Mon, 3 Sep 2018 at 03:55, Mate Varga <[email protected]> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Basically there's a class with a byte[] field, the class is >>>>>>>>>>>>>>> mapped to table T and the byte field is annotated with @Lob so >>>>>>>>>>>>>>> it goes to >>>>>>>>>>>>>>> the pg_largeobject table. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Ah, so hibernate is in the mix. I wonder if that is causing >>>>>>>>>>>>>> some challenges ? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> The DB is on separate host but relatively close to the app, >>>>>>>>>>>>>>> and I can reproduce the problem locally as well. One >>>>>>>>>>>>>>> interesting bit is >>>>>>>>>>>>>>> that turning of SSL between the app and PSQL speeds up things >>>>>>>>>>>>>>> by at least >>>>>>>>>>>>>>> 50%. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Ah, one addition -- the binary objects are encrypted, so >>>>>>>>>>>>>>> their entropy is very high. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Any chance you could write a simple non-hibernate test code >>>>>>>>>>>>>> to time the code ? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Dave Cramer >>>>>>>>>>>>>> >>>>>>>>>>>>>> [email protected] >>>>>>>>>>>>>> www.crunchydata.ca >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Mate >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Sun, Sep 2, 2018 at 12:55 AM Dave Cramer < >>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, 31 Aug 2018 at 10:15, Mate Varga <[email protected]> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I see -- we could try that, though we're mostly using an >>>>>>>>>>>>>>>>> ORM (Hibernate) to do this. Thanks! >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, Aug 31, 2018 at 3:57 PM Dmitry Igrishin < >>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> пт, 31 авг. 2018 г. в 16:35, Mate Varga <[email protected] >>>>>>>>>>>>>>>>>> >: >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > Hi, >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > we're fetching binary data from pg_largeobject table. >>>>>>>>>>>>>>>>>> The data is not very large, but we ended up storing it >>>>>>>>>>>>>>>>>> there. If I'm >>>>>>>>>>>>>>>>>> copying the data to a file from the psql console, then it >>>>>>>>>>>>>>>>>> takes X time >>>>>>>>>>>>>>>>>> (e.g. a second), fetching it through the JDBC driver takes >>>>>>>>>>>>>>>>>> at least 10x >>>>>>>>>>>>>>>>>> more. We don't see this difference between JDBC and 'native' >>>>>>>>>>>>>>>>>> performance >>>>>>>>>>>>>>>>>> for anything except largeobjects (and bytea columns, for the >>>>>>>>>>>>>>>>>> record). >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > Does anyone have any advice about whether this can be >>>>>>>>>>>>>>>>>> tuned or what the cause is? >>>>>>>>>>>>>>>>>> I don't know what a reason of that, but I think it's >>>>>>>>>>>>>>>>>> reasonable and >>>>>>>>>>>>>>>>>> quite simple to call lo_import()/lo_export() via JNI. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Can't imagine that's any faster. The driver simply >>>>>>>>>>>>>>>> implements the protocol >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Do you have any code to share ? Any other information ? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Is the JDBC connection significantly further away network >>>>>>>>>>>>>>>> wise ? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Dave Cramer >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> [email protected] >>>>>>>>>>>>>>>> www.postgresintl.com >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>
