Hello,
which version includes this corrections? 2.0.3 or 3.0?
Thanks,
Borut
On 26.6.2007 20:32, Andrus Adamchik wrote:
Hi Marcin,
I have good news (I think). Fetching just id columns inside the
IncrementalFaultList indeed speeds things up significantly. I just
committed the change to optimize SelectQueries to do just that. Please
let me know how does it work for you.
Now the profiling details...
* I profiled on Derby and MySQL. In both cases fetching a table with 25
columns and 100000 rows took between 3-4 seconds (not as long as in
your case, but what's important is relative times I guess)
* YourKit clearly showed the bottleneck: ~95% of the 'fillIn' method is
spent in the driver code, rewinding the result set (i.e. brining the
data from db to the client).
* After my change the query time went down to 0.2-0.5 second (0.8 if
you consider the second query needed to fault the first page). Not bad!
* ResultSet reading still remained a bottleneck, but it became faster
in absolute terms. And now finally Cayenne-related code (such as
DataRow creation) started to show up on the radar (e.g. DataRow
constructor taking 3% of the 'fillIn' method time).
Andrus
On Jun 26, 2007, at 10:55 AM, Marcin Skladaniec wrote:
Hi Andrus !
Many thanks for that !
Marcin
On 26/06/2007, at 5:39 PM, Andrus Adamchik wrote:
Hi Marcin,
1) SelectQuery(Student.class) with page
size 10 takes 30-50 seconds.
2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
STUDENT") without paging takes 100 sec.
3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
STUDENT") with page size 10 takes 5 sec.
4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
STUDENT") with page size 10 and fetching row data rows takes 4.5 sec.
I see you still didn't use profiler, but the data you provided seems to
finally confirm that at least on Derby not fetching all columns does
result in significant speedup (#1 vs. #3). So now it is a question of
implementing the right algorithm for the IncrementalFaultList.
Andrus, you mentioned using
addCustomDbAttribute to fetch only part of the data. I tried to use
addCustomDbAttribute("id") on client, it resulted in returning the raw
dataRows, is there something I can do to fetch faulted objects ?
We should encapsulate this logic inside IncrementalFaultList on the
server.
Our application was designed to use the
SelectQuery. If we have to change that and use the SQLTemplate instead,
there is a lot of work for us, including:
Same thing - the right thing to do is to fix it on the server.
Let me try to find a spare minute later tonight and implement id-only
fetch. I have some large tables in a MySQL5 so I can test the
performance in a slightly different environment.
Andrus
On Jun 26, 2007, at 9:26 AM, Marcin Skladaniec wrote:
Hi
I have done some more profiling and testing.
executing queries on table with >100000 records, directly on server
(not on client) gave results as listed below:
1) SelectQuery(Student.class) with page size 10 takes 30-50 seconds.
2) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
STUDENT") without paging takes 100 sec.
3) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
STUDENT") with page size 10 takes 5 sec.
4) SQLTemplate(Student.class, "SELECT #result('id' 'int') from
STUDENT") with page size 10 and fetching row data rows takes 4.5 sec.
what more, I found that executing the SQLTemplate does allow to fault
the objects (I sometimes discover the simplest things last), so I did
try to check how long it takes for the objects to be faulted:
1) first object on every page (except first) 30-200ms, rest = 0ms
2) objects is faulted in 20ms (average)
3) on first page first object faulted in 200ms, rest ~20 ms,
on any following page first object faulted in 30-200ms, rest 0ms
(interesting that the first page does not seem to be faulted at all)
4) no point testing.
Also I did check if the resizing of the ArrayList which is keeping the
results does affect the speed, and it does not. (Tried to make the
ArrayList initial size = 150,000).
My conclusion is that SelectQuery with paging is usable only for
fetching less than, say 10,000 records, otherwise the performance is to
low. With SQLTemplate the performance is much greater. It applies to
both ROP and 'normal' cayenne, since I made those tests on server.
Andrus, you mentioned using addCustomDbAttribute to fetch only part of
the data. I tried to use addCustomDbAttribute("id") on client, it
resulted in returning the raw dataRows, is there something I can do to
fetch faulted objects ?
Our application was designed to use the SelectQuery. If we have to
change that and use the SQLTemplate instead, there is a lot of work for
us, including:
- dealing with adding and concatenating Expressions to the SQLTemplate
(is there an easy way ?)
- dealing with declared qualifier (the one set in modeller)
- possibly more...
i would really like to avoid all of that, so if you have any ideas on
how to improve the performance without too much hassle I would really
appreciate.
Marcin
On 25/06/2007, at 8:31 PM, Marcin Skladaniec wrote:
Hi Andrus
I had not much time to check, but with the fix the 100k records load in
30 instead of 50 seconds. It is some improvement, but not enough. I'll
do some more profiling tomorrow and let you know.
By the way, we are using netbeans for profiling, the benefit : it is
free. I will evaluate the yourkit as we are moving away from netbeans
as a development platform.
Marcin
On 23/06/2007, at 5:38 PM, Andrus Adamchik wrote:
Ari, Marcin --
going through the code I noticed one inefficiency - the elements array
access is synchronized in 'fillIn' method. Since 'fillIn' is called
from constructor, such synchronization is unneeded and only slows
things down. I just checked a fixed version to trunk. Could you try it
out?
Andrus
On Jun 23, 2007, at 12:33 AM, Aristedes Maniatis wrote:
On 22/06/2007, at 11:10 PM, Michael
Gentry wrote:
Marcin, this thread might be of
interest to you ...
http://mail-archives.apache.org/mod_mbox/cayenne-dev/200705.mbox/browser
Look at the "Paging and SQL queries" thread on May 25.
Yes, this is the same project we are working on. I started some
performance profiling and Marcin has been able now to take it much
further. What is it about:
elements.add(it.nextObjectId(entity));
which is so slow? The code gets a little complex at that point and we
are having difficulty tracing it through to the exact performance
problem in the underlying code. Is it the speed of adding the object id
to the Collection or the speed of creating an object id itself? 0.5ms
doesn't sound slow, but it doesn't scale well.
Andrus, I got the impression from the previous thread that you
suspected something slightly different. That the performance problem
might be in the fat query itself, but from our tests this isn't the
case. If I've got this right, the way it works is:
1. perform regular query to get all columns but return result in
iterator
2. iterate through first page and build full objects
3. iterate through other pages and build just objectids (this is the
slow part for us)
4. when another page is fetched perform another query and fetch those
objects from the DB
So, getting just primary keys from the DB may not be any faster if the
performance problem is simply in the construction of objectIds. If the
performance problem is in the numerous resizings of the Collection
(each time it runs out of space, then it is increased by 50% or 100% in
size), then the solution could be as simple as figuring out the size of
the iterator and sizing the collection appropriately from the
beginning.
Any ideas on how to discover the exact cause of the performance hit?
Ari Maniatis
--

|