Hi Marcin,
* fetch only Pk columns and create all ObjectIds at once, get rid
of the iterating process if possible
* use already existing method resolveInterval() to fault the
required range of records
This strategy was discussed in the May thread with Ari (the one that
Michael Gentry mentioned). My vote is +0, meaning that before we make
this change, I want to confirm first that it has a visible impact on
performance. Could you possibly make such change locally and see if
it helps? (Look at SelectQuery.addCustomDbAttribute() to only include
PK; if you have problems making this change, ping me via the dev list
- I'll try my best to help).
If the creation of ObjectId and getting the results from ResultSet
cannot be speed up (because it simply has to happen, and it does
not depend on the way it is done), the only choice will be to
implement some more complex solution using sql LIMIT statement.
I'd love to avoid that, as the data you get may as well be different
the next time you resolve a page, so you may end up with duplicates
or skipped records. If we ever go this way, we'll probably need to
make it a user choice (use LIMIT vs. IncrementalFaultList).
Andrus
On Jun 22, 2007, at 2:35 AM, Marcin Skladaniec wrote:
Hi
Recently we have found that fetching a list of 100,000 records
using ROP with paging and no cache takes a long time, about 50
seconds in our case. We have profiled the cpu usage and the result
shows that 99% of time is spent in IncrementalFaultList, within the
fillIn() method.
The fillIn method works (in my opinion) in a bit strange fashion:
it does execute query at once, stores the query result in
java.sql.ResultSet, and than iterates through the result either
creating the whole DataRow or just ObjectId. If there is a need the
DataRows are faulted at the end of the method.
From our testing it came up that this bit of code :
while (it.hasNextRow()) {
elements.add(it.nextObjectId(entity));
}
is where all the time is spent. Each iteration in this loop takes
about 0.5ms, which multiplied by 100,000 takes almost 50 seconds.
nextObjectId method consists of two parts: fetching the next result
from ResultSet and creating a ObjectId, but I was unable to check
which one takes the most time, anyway I think that this approach is
somewhat wrong, since always 99% of the records will be fetched as
ObjectId and never faulted, so my ideas to enhance this are:
* fetch only Pk columns and create all ObjectIds at once, get rid
of the iterating process if possible
* use already existing method resolveInterval() to fault the
required range of records
If the creation of ObjectId and getting the results from ResultSet
cannot be speed up (because it simply has to happen, and it does
not depend on the way it is done), the only choice will be to
implement some more complex solution using sql LIMIT statement.
I would like to mention that we are using some DataContext
decorators and life-cycle callbacks, but I don't believe those are
important factors in this case.
Whatever is the solution, i think it is pretty crucial that it will
be implemented soon, since the usability of the ROP without fast
paging is rather low.
With regards
Marcin
-------------------------->
ish
http://www.ish.com.au
Level 1, 30 Wilson Street Newtown 2042 Australia
phone +61 2 9550 5001 fax +61 2 9550 4001