Right, so I agree with partitioning the database; that's a thing that can be done.
Andrus, I'm a bit less confident in the proposal you're suggesting. I want to be able to spin up new instances, potentially in new containers, and run them in different environments. If we're moving to a cloud-based infrastructure, then parallelizing in a single app doesn't match up with that kind of deployment. I recognize there are limits on my solution as well; you have to deal with how you split the rows into partitions. The problem, generally stated, is: if I have 10,000 records and I want to distribute them across N workers, how do I do that? How can I partition the result set at run time into an arbitrary number of workers? I also realize this is quickly expanding outside the scope of the cayenne users mailing list.

On Thu, Dec 15, 2016 at 3:18 AM, Andrus Adamchik <and...@objectstyle.org> wrote:

> Here is another idea:
>
> * read all data in one thread using iterated query and DataRows
> * append received rows to an in-memory queue (individually or in small
> batches)
> * run a thread pool of processors that read from the queue and do the work.
>
> As with all things performance, this needs to be measured and compared
> with a single-threaded base line. This will not help with IO bottleneck,
> but the processing part will happen in parallel. If you see any Cayenne
> bottlenecks during the last step, you can start multiple ServerRuntimes -
> one per thread.
>
> Andrus
>
>
> > On Dec 15, 2016, at 3:06 AM, John Huss <johnth...@gmail.com> wrote:
> >
> > Unless your DB disk is striped into at least four parts this won't be
> > faster.
> > On Wed, Dec 14, 2016 at 5:46 PM Tony Giaccone <tgiacc...@gmail.com>
> wrote:
> >
> >> I want to speed things up by running multiple instances of a job that
> >> fetches data from a table, so that, for example, if I need to process
> >> 10,000 rows, the query runs on each instance and returns 4 sets of
> >> 2,500 rows, one for each instance, with no duplication.
> >>
> >> My first thought in SQL was to add something like this to the where
> >> clause:
> >>
> >> and MOD(ID, INSTANCE_COUNT) = INSTANCE_ID;
> >>
> >> so that if the instance count was 4, then the instance IDs would run
> >> 0, 1, 2, 3.
> >>
> >> I'm not quite sure how you would structure that using the query API. Any
> >> suggestions about that?
> >>
> >> And there are some problems with this idea, as you have to be certain
> >> your IDs increase in a manner that aligns with your math so that the
> >> partitioning is equal in size.
> >> For example, if your sequence increments by 20, then you would have to
> >> futz around with the math to get the right partitioning, and that is the
> >> problem with this technique.
> >> It's brittle; it depends on getting a bunch of things in "sync".
> >>
> >> Does anyone have another idea of how to segment out rows that would
> >> yield a solution that's not quite so brittle?
> >>
> >>
> >>
> >> Tony Giaccone
> >>
> >
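On the brittleness point: one less fragile variant of the MOD trick is to hash the ID before taking the modulo, so a sequence that increments by a fixed step doesn't skew the split. A rough sketch in plain Java (no Cayenne involved; the class name and the mixing constant are just illustrative, not anything from the list):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class HashPartition {

    // Mix the bits so the low bits of the result depend on all bits of the id.
    static long mix(long id) {
        long h = id * 0x9E3779B97F4A7C15L; // arbitrary odd 64-bit constant
        return h ^ (h >>> 32);
    }

    // Stable mapping from a primary key to a worker index in [0, workerCount).
    static int workerFor(long id, int workerCount) {
        return Math.floorMod((int) mix(id), workerCount);
    }

    // The subset of rows one worker should claim.
    static List<Long> partition(List<Long> ids, int workerCount, int workerId) {
        return ids.stream()
                  .filter(id -> workerFor(id, workerCount) == workerId)
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A sequence that increments by 20: plain MOD(id, 4) would dump every
        // row into partition 0, since 20 * i is always divisible by 4.
        List<Long> ids = LongStream.range(0, 10_000).map(i -> i * 20)
                                   .boxed().collect(Collectors.toList());
        List<Long> claimed = new ArrayList<>();
        for (int w = 0; w < 4; w++) {
            claimed.addAll(partition(ids, 4, w));
        }
        // Every row is claimed exactly once across the 4 workers.
        System.out.println(claimed.size()); // 10000
    }
}
```

Since every ID maps to exactly one worker index, the partitions are disjoint and cover the whole set regardless of gaps in the sequence; in SQL you'd push the same idea down with whatever hash function your database offers.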
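And for anyone following along, here is roughly how I read Andrus's single-reader/queue outline as a standalone sketch: one thread feeds an in-memory queue, a pool of workers drains it. Integers stand in for DataRows, the class and names are mine, and in real code the feeding loop would walk an iterated query:

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class QueuedProcessing {

    static final Integer POISON = Integer.MIN_VALUE; // end-of-stream marker

    static int process(List<Integer> rows, int workers) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(256);
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                try {
                    while (true) {
                        Integer row = queue.take();
                        if (row.equals(POISON)) {
                            queue.put(POISON); // re-post so sibling workers also stop
                            return;
                        }
                        processed.incrementAndGet(); // real per-row work goes here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // Single reader: in real code this loop walks the iterated query.
        for (Integer row : rows) {
            queue.put(row);
        }
        queue.put(POISON);
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> rows = new java.util.ArrayList<>();
        for (int i = 0; i < 10_000; i++) rows.add(i);
        System.out.println(process(rows, 4)); // 10000
    }
}
```

The bounded queue gives you the back-pressure Andrus implied: the reader blocks when the workers fall behind, so memory stays flat, but it's all still one JVM, which is exactly my reservation about the approach.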