Re: paged query slow when fetching big lists

Andrus Adamchik Tue, 26 Jun 2007 00:39:58 -0700

Hi Marcin,

1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
2) SQLTemplate(Student.class, "SELECT #result('id' 'int') fromSTUDENT") without paging takes 100 sec.3) SQLTemplate(Student.class, "SELECT #result('id' 'int') fromSTUDENT") with page size 10 takes 5 sec.4) SQLTemplate(Student.class, "SELECT #result('id' 'int') fromSTUDENT") with page size 10 and fetching row data rows takes 4.5 sec.

I see you still didn't use profiler, but the data you provided seemsto finally confirm that at least on Derby not fetching all columnsdoes result in significant speedup (#1 vs. #3). So now it is aquestion of implementing the right algorithm for theIncrementalFaultList.

Andrus, you mentioned using addCustomDbAttribute to fetch only partof the data. I tried to use addCustomDbAttribute("id") on client,it resulted in returning the raw dataRows, is there something I cando to fetch faulted objects ?

We should encapsulate this logic inside IncrementalFaultList on theserver.

Our application was designed to use the SelectQuery. If we have tochange that and use the SQLTemplate instead, there is a lot of workfor us, including:


Same thing - the right thing to do is to fix it on the server.

Let me try to find a spare minute later tonight and implement id-onlyfetch. I have some large tables in a MySQL5 so I can test theperformance in a slightly different environment.


Andrus



On Jun 26, 2007, at 9:26 AM, Marcin Skladaniec wrote:

Hi

I have done some more profiling and testing.
executing queries on table with >100000 records, directly on server(not on client) gave results as listed below:
1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
2) SQLTemplate(Student.class, "SELECT #result('id' 'int') fromSTUDENT") without paging takes 100 sec.3) SQLTemplate(Student.class, "SELECT #result('id' 'int') fromSTUDENT") with page size 10 takes 5 sec.4) SQLTemplate(Student.class, "SELECT #result('id' 'int') fromSTUDENT") with page size 10 and fetching row data rows takes 4.5 sec.
what more, I found that executing the SQLTemplate does allow tofault the objects (I sometimes discover the simplest things last),so I did try to check how long it takes for the objects to be faulted:
1) first object on every page (except first) 30-200ms, rest = 0ms
2) objects is faulted in 20ms (average)
3) on first page first object faulted in 200ms, rest  ~20 ms,
on any following page first object faulted in 30-200ms, rest0ms (interesting that the first page does not seem to be faulted atall)
4) no point testing.
Also I did check if the resizing of the ArrayList which is keepingthe results does affect the speed, and it does not. (Tried to makethe ArrayList initial size = 150,000).
My conclusion is that SelectQuery with paging is usable only forfetching less than, say 10,000 records, otherwise the performanceis to low. With SQLTemplate the performance is much greater. Itapplies to both ROP and 'normal' cayenne, since I made those testson server.
Andrus, you mentioned using addCustomDbAttribute to fetch only partof the data. I tried to use addCustomDbAttribute("id") on client,it resulted in returning the raw dataRows, is there something I cando to fetch faulted objects ?
Our application was designed to use the SelectQuery. If we have tochange that and use the SQLTemplate instead, there is a lot of workfor us, including:- dealing with adding and concatenating Expressions to theSQLTemplate (is there an easy way ?)
- dealing with declared qualifier (the one set in modeller)
- possibly more...
i would really like to avoid all of that, so if you have any ideason how to improve the performance without too much hassle I wouldreally appreciate.
Marcin

On 25/06/2007, at 8:31 PM, Marcin Skladaniec wrote:
Hi Andrus
I had not much time to check, but with the fix the 100k recordsload in 30 instead of 50 seconds. It is some improvement, but notenough. I'll do some more profiling tomorrow and let you know.
By the way, we are using netbeans for profiling, the benefit : itis free. I will evaluate the yourkit as we are moving away fromnetbeans as a development platform.
Marcin

On 23/06/2007, at 5:38 PM, Andrus Adamchik wrote:
Ari, Marcin --
going through the code I noticed one inefficiency - the elementsarray access is synchronized in 'fillIn' method. Since 'fillIn'is called from constructor, such synchronization is unneeded andonly slows things down. I just checked a fixed version to trunk.Could you try it out?
Andrus



On Jun 23, 2007, at 12:33 AM, Aristedes Maniatis wrote:
On 22/06/2007, at 11:10 PM, Michael Gentry wrote:
Marcin, this thread might be of interest to you ...
http://mail-archives.apache.org/mod_mbox/cayenne-dev/200705.mbox/browser
Look at the "Paging and SQL queries" thread on May 25.
Yes, this is the same project we are working on. I started someperformance profiling and Marcin has been able now to take itmuch further. What is it about:
elements.add(it.nextObjectId(entity));
which is so slow? The code gets a little complex at that pointand we are having difficulty tracing it through to the exactperformance problem in the underlying code. Is it the speed ofadding the object id to the Collection or the speed of creatingan object id itself? 0.5ms doesn't sound slow, but it doesn'tscale well.
Andrus, I got the impression from the previous thread that yoususpected something slightly different. That the performanceproblem might be in the fat query itself, but from our teststhis isn't the case. If I've got this right, the way it works is:
1. perform regular query to get all columns but return result initerator
2. iterate through first page and build full objects
3. iterate through other pages and build just objectids (this isthe slow part for us)4. when another page is fetched perform another query and fetchthose objects from the DB
So, getting just primary keys from the DB may not be any fasterif the performance problem is simply in the construction ofobjectIds. If the performance problem is in the numerousresizings of the Collection (each time it runs out of space,then it is increased by 50% or 100% in size), then the solutioncould be as simple as figuring out the size of the iterator andsizing the collection appropriately from the beginning.
Any ideas on how to discover the exact cause of the performancehit?
Ari Maniatis

Re: paged query slow when fetching big lists

Reply via email to