Re: Partitioning a query result..

2016-12-19 Thread Andrus Adamchik
Ah ok, so sending to pubsub is the bottleneck here. An ideal solution would have been replacing the entire event DB with Kafka. There'll be fewer moving parts, you can consume the events with any number of parallel consumers, and Kafka will take care of spreading consumption. But as you said, this is
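For context on how Kafka spreads consumption across parallel consumers: a minimal consumer-group sketch, assuming a recent kafka-clients jar; the broker address, topic name and group id are placeholders, not details from this thread. Every instance runs the same code, and because they share a group id the broker assigns each one a disjoint subset of the topic's partitions.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class EventConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092"); // placeholder broker address
        // All instances share one group id, so the work is spread across them
        // without any manual partitioning of the data.
        props.put("group.id", "event-processors");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("app-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record.value());
                }
            }
        }
    }

    private static void process(String event) {
        // application-specific handling of one event
    }
}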

Re: Partitioning a query result..

2016-12-16 Thread Giaccone, Tony
So the essential bit of this that perhaps hasn't been exposed is that we're publishing this data to Google's Pub/Sub. The intent is to generate the events as a result of actions taken in the main application and store the event data in a new event database. Then a periodic job reads the data from t
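For the "periodic job publishes to Pub/Sub" step, here is a minimal sketch assuming the google-cloud-pubsub Java client; the project id, topic name and message payload are placeholders, not details from this thread. In the real job, each row read from the event table would become one message.

import com.google.api.core.ApiFuture;
import com.google.cloud.pubsub.v1.Publisher;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;

public class PeriodicPublisher {

    public static void main(String[] args) throws Exception {
        // Placeholder GCP project and topic ids.
        Publisher publisher = Publisher.newBuilder(TopicName.of("my-project", "app-events")).build();
        try {
            // Hypothetical JSON payload standing in for one event row.
            PubsubMessage message = PubsubMessage.newBuilder()
                    .setData(ByteString.copyFromUtf8("{\"eventId\": 42}"))
                    .build();
            ApiFuture<String> messageId = publisher.publish(message);
            System.out.println("published: " + messageId.get());
        } finally {
            publisher.shutdown();
        }
    }
}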

Re: Partitioning a query result..

2016-12-16 Thread Andrus Adamchik
Actually this is an interesting architectural discussion. Speaking for myself, I certainly like having it here. The 2 main approaches have already been mentioned: 1. Single dispatcher -> message queue -> multiple workers. 2. Multiple workers that somehow guess their part of the workload. Both c
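A minimal sketch of approach 2, assuming the workload is sliced by a modulo on a numeric primary key (my assumption, not something spelled out in the thread): every worker applies the same deterministic rule, so each row is claimed by exactly one worker and none is skipped.

public class WorkloadSlice {

    // Every instance evaluates the same rule, so each row id is claimed by
    // exactly one of the workerCount instances.
    static boolean ownedBy(long rowId, int workerIndex, int workerCount) {
        return Math.floorMod(rowId, (long) workerCount) == workerIndex;
    }

    public static void main(String[] args) {
        // ids 1..10 split across 4 workers: 4 disjoint sets, no duplication
        for (int worker = 0; worker < 4; worker++) {
            StringBuilder mine = new StringBuilder();
            for (long id = 1; id <= 10; id++) {
                if (ownedBy(id, worker, 4)) {
                    mine.append(id).append(' ');
                }
            }
            System.out.println("worker " + worker + ": " + mine);
        }
    }
}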

Re: Partitioning a query result..

2016-12-16 Thread Giaccone, Tony
Right, so I agree with partitioning the database; that's a thing that can be done. Andrus, I'm a bit less confident about the proposal you're suggesting. I want to be able to spin up new instances, potentially in new containers, and run them in different environments. If we're moving to a cloud b

Re: Partitioning a query result..

2016-12-15 Thread Andrus Adamchik
Here is another idea: * read all data in one thread using iterated query and DataRows * append received rows to an in-memory queue (individually or in small batches) * run a thread pool of processors that read from the queue and do the work. As with all things performance, this needs to be measur
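A sketch of that pipeline shape, kept generic: the single reader would be Cayenne's iterated query producing DataRows (each DataRow is just a Map), which is left behind a plain Iterator here rather than guessing at the exact Cayenne call; the queue capacity and worker count are arbitrary choices, not values from the thread.

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class QueueFanOut {

    // Poison pill telling each worker there is nothing more to read.
    private static final Map<String, Object> END = new HashMap<>();

    public static void fanOut(Iterator<Map<String, Object>> rows, int workerCount) throws InterruptedException {
        // Bounded queue so the single reader cannot run far ahead of the workers.
        BlockingQueue<Map<String, Object>> queue = new ArrayBlockingQueue<>(1000);
        ExecutorService pool = Executors.newFixedThreadPool(workerCount);

        Runnable worker = () -> {
            try {
                while (true) {
                    Map<String, Object> row = queue.take();
                    if (row == END) {
                        return;
                    }
                    process(row);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int i = 0; i < workerCount; i++) {
            pool.submit(worker);
        }

        // Single reader feeds the queue from the iterated result.
        while (rows.hasNext()) {
            queue.put(rows.next());
        }
        for (int i = 0; i < workerCount; i++) {
            queue.put(END); // one pill per worker
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    private static void process(Map<String, Object> row) {
        // application-specific work, e.g. publishing the row as an event
    }
}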

Re: Partitioning a query result..

2016-12-14 Thread John Huss
Unless your DB disk is striped into at least four parts this won't be faster. On Wed, Dec 14, 2016 at 5:46 PM Tony Giaccone wrote: > I want to speed things up by running multiple instances of a job that > fetches data from a table. So that for example if I need to process 10,000 > rows > the qu

Partitioning a query result..

2016-12-14 Thread Tony Giaccone
I want to speed things up by running multiple instances of a job that fetches data from a table, so that, for example, if I need to process 10,000 rows, the query runs on each instance and returns 4 sets of 2,500 rows, one for each instance, with no duplication. My first thought in SQL was to add somet
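The message is cut off above, so the following is only a guess at the direction it was heading: one common way to get N disjoint result sets is to push a MOD predicate on the primary key into the query each instance runs. The table, column names and connection details below are hypothetical.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PartitionedFetch {

    // args: <workerIndex 0..3> <workerCount, e.g. 4>
    public static void main(String[] args) throws Exception {
        int workerIndex = Integer.parseInt(args[0]);
        int workerCount = Integer.parseInt(args[1]);

        // Hypothetical table/column names and JDBC URL, for illustration only.
        String sql = "SELECT id, payload FROM event WHERE MOD(id, ?) = ? ORDER BY id";

        try (Connection con = DriverManager.getConnection("jdbc:postgresql://dbhost/app");
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, workerCount);
            ps.setInt(2, workerIndex);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // each instance sees a disjoint slice of the rows
                    handle(rs.getLong("id"), rs.getString("payload"));
                }
            }
        }
    }

    private static void handle(long id, String payload) {
        // application-specific processing of one row
    }
}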