Thanks Bert, The first link is exactly what I was looking for.
Ravneet

On Jun 16, 4:52 am, Bert <[email protected]> wrote:
> Hi Ravneet,
>
> Have you taken a look at fork join queues?
> http://www.google.com/events/io/2010/sessions/high-throughput-data-pi...
> or High concurrency counters without sharding?
> http://blog.notdot.net/2010/04/High-concurrency-counters-without-shar...
>
> I think they may do what you need and are proven solutions.
>
> Thanks
> Rob
>
> On Jun 15, 9:38 pm, thecheatah <[email protected]> wrote:
> > This is actually a pretty good implementation. The only issue is the
> > size of the processed task list. Instead of having two tasks, I am
> > thinking that the one task will clean up the processed task list
> > before it begins its work: basically, check that the processed
> > inserts have indeed been deleted.
> >
> > So the processed list records all the inserts processed in the
> > previous run. The task first deletes those inserts if needed, then
> > goes on to process new tasks.
> >
> > Thanks,
> > Ravneet
> >
> > On Jun 15, 11:48 am, Ravi Sharma <[email protected]> wrote:
> > > In that scenario you can go ahead and do something extra.
> > >
> > > Keep a list of keys in your highly updated object, and whenever you
> > > process one insert and apply it to the main object, make sure you
> > > put the key in this list property. That way the main object knows
> > > whether it has already consumed the content of a given insert.
> > >
> > > Later, when you are deleting or updating the insert object: if you
> > > get the same insert again (because marking it as processed failed
> > > last time), check whether its key exists in the list. If yes, mark
> > > the insert object processed and also remove the key from the list
> > > property.
> > >
> > > You then need another job to clean up the list property on the
> > > highly updated object: read the list, fetch the insert object for
> > > each key, and if it is marked as processed, remove it from the list.
> > > This will slightly increase your datastore puts, but you will not
> > > have to worry about inconsistency.
> > >
> > > So your code will look like this. The highly updated object will
> > > have a property like:
> > >
> > > List<Key> processedInserts; (in Java JDO)
> > >
> > > TASK-1
> > > 1) Get the next Insert object, say i1; assume its key is k1.
> > >    (At this stage, say, processedInserts is empty.)
> > > 2) Check if k1 exists in processedInserts. If no, go to step 3;
> > >    else go to step 4.
> > > 3) Update the highly updated object with the content of insert i1,
> > >    and also add k1 to processedInserts.
> > >    (At this stage processedInserts contains k1.)
> > > 4) Mark i1 as processed.
> > >
> > > After this, processedInserts keeps growing and has no upper bound.
> > > To keep it small you need another job running once in a while, or
> > > you can submit a task from step 2 whenever processedInserts.size >
> > > some number, say 500.
> > >
> > > TASK-2
> > > In this task:
> > > 1) Get the highly updated object.
> > > 2) Loop through processedInserts.
> > > 3) Get each Insert object; if it is marked processed, delete its
> > >    key from processedInserts.
> > >
> > > Just make sure only one of TASK-1 and TASK-2 runs at a time. You
> > > can even run TASK-2 as part of TASK-1 after step 4; it's up to you
> > > where you see it as safe and with fewer if/then/elses :)
> > >
> > > On Wed, Jun 15, 2011 at 4:20 PM, thecheatah <[email protected]> wrote:
> > > > Ravi,
> > > >
> > > > Thanks for the feedback. I was thinking exactly along the lines
> > > > of what you have said. The only problem I see is that I plan on
> > > > processing multiple inserts in one batch job. The inserts and the
> > > > highly updated object will not be updatable in a single
> > > > transaction. Thus, there might be situations where an insert was
> > > > processed but the flag was not set or the row was not deleted.
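[For anyone skimming the archive: Ravi's TASK-1/TASK-2 scheme above can be sketched in plain Java. The maps and fields below are illustrative in-memory stand-ins for the datastore entities, not App Engine APIs; a real version would use JDO entities and datastore transactions.]

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory simulation of the dedup scheme: apply an insert once,
// remember its key in processedInserts, and survive a failure between
// applying the insert and marking it processed.
public class InsertDedup {

    // The "insert" rows: key -> processed flag.
    static Map<String, Boolean> inserts = new HashMap<>();
    // The highly updated object: a running total plus the
    // processedInserts key list from Ravi's sketch.
    static long total = 0;
    static List<String> processedInserts = new ArrayList<>();

    // TASK-1: apply one insert, deduplicating via processedInserts.
    // 'crashBeforeMark' simulates the failure mode discussed in the
    // thread: the update succeeded but marking the insert failed.
    static void task1(String key, long delta, boolean crashBeforeMark) {
        if (!processedInserts.contains(key)) {
            // Step 3: apply the insert and remember its key.
            total += delta;
            processedInserts.add(key);
        }
        if (crashBeforeMark) {
            return; // simulated failure before step 4
        }
        // Step 4: mark the insert row as processed.
        inserts.put(key, true);
    }

    // TASK-2: trim processedInserts of keys whose rows are marked done.
    static void task2() {
        processedInserts.removeIf(k -> Boolean.TRUE.equals(inserts.get(k)));
    }

    public static void main(String[] args) {
        inserts.put("k1", false);
        task1("k1", 5, true);   // first attempt: update applied, mark lost
        task1("k1", 5, false);  // retry: dedup prevents double counting
        task2();
        System.out.println(total);                    // 5, not 10
        System.out.println(processedInserts.size());  // 0 after cleanup
    }
}
```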
> > > > To overcome this issue, I am going to either make sure that
> > > > processing inserts multiple times does not affect the output, or
> > > > accept a small percentage of failures.
> > > >
> > > > Ravneet
> > > >
> > > > On Jun 15, 5:18 am, Ravi Sharma <[email protected]> wrote:
> > > > > If A, B and C are not dependent on each other and ordering
> > > > > doesn't matter for you (e.g. processing C, A, B is also fine),
> > > > > then you can put another column in this insert table, say
> > > > > "processed". When inserting, set it to N (if a string) or false
> > > > > (if a boolean), and query the entity on this column. Whenever
> > > > > you process one row, set the value to Y or true, and carry on
> > > > > with the next insert.
> > > > >
> > > > > Or you can delete these rows once you have processed them; then
> > > > > you will not need the extra column.
> > > > >
> > > > > Note: I am assuming that for one update you will be processing
> > > > > all its inserts in one task or job, with no multiprocessing.
> > > > >
> > > > > On Wed, Jun 15, 2011 at 4:20 AM, thecheatah <[email protected]> wrote:
> > > > > > I am trying to implement a system for an object that will be
> > > > > > updated a lot. The idea is to turn the updates into inserts,
> > > > > > then have a batch job that executes the inserts in batches to
> > > > > > update the highly writable object. The inserts can be sorted
> > > > > > either by time or by some sort of incremented identifier.
> > > > > > This identifier or timestamp can be stored on the highly
> > > > > > writable object, so the next time the job runs it knows where
> > > > > > to start the next batch.
> > > > > >
> > > > > > Using timestamps, I am running into a problem with eventual
> > > > > > consistency.
> > > > > > When I query for inserts to execute, some inserts might not
> > > > > > make it into the results because they have not been applied
> > > > > > to the index yet. Suppose we have inserts A, B and C. If only
> > > > > > A and C make it into the batch job, it will mark all work up
> > > > > > to C as completed and B will never be executed.
> > > > > >
> > > > > > Using incremented identifiers seems like it would solve the
> > > > > > problem, but implementing such an identifier itself is not
> > > > > > clear. To explain why it would solve the original problem: we
> > > > > > would be able to detect that we jumped from A to C, because
> > > > > > the difference between the identifiers would be greater than
> > > > > > 1. The sharded counter is great for counting, but is not good
> > > > > > to use as a unique identifier given eventual consistency.
> > > > > >
> > > > > > I could use the memcache increment function, but the counter
> > > > > > might be flushed out of memory at any time. I believe the
> > > > > > memcache update speed would be enough for what I want to do.
> > > > > >
> > > > > > If I had an upper-bound time limit on the eventual
> > > > > > consistency, I could make my system only process inserts
> > > > > > older than that time limit.
> > > > > >
> > > > > > Anyway, those are my thoughts, and any feedback is
> > > > > > appreciated.
> > > > > >
> > > > > > BTW: the inserts processed in batches are assumed to be
> > > > > > independent of each other.
> > > > > >
> > > > > > --
> > > > > > You received this message because you are subscribed to the
> > > > > > Google Groups "Google App Engine" group.
> > > > > > To post to this group, send email to
> > > > > > [email protected].
> > > > > > To unsubscribe from this group, send email to
> > > > > > [email protected].
> > > > > > For more options, visit this group at
> > > > > > http://groups.google.com/group/google-appengine?hl=en.
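[The gap check the original poster describes can be sketched as follows. If inserts carry a strictly incrementing identifier, a batch that sees A (id 1) and C (id 3) but not B (id 2) can detect the hole and stop before it. `lastApplied` plays the role of the cursor stored on the highly writable object; all names here are illustrative, and generating such an identifier on App Engine is exactly the open problem the poster raises.]

```java
import java.util.List;

// Sketch: find the highest insert id that can safely be marked as
// completed, i.e. the end of the contiguous run of ids starting at
// lastApplied + 1. Anything after a gap is left for the next run,
// when the missing (not-yet-indexed) insert may have become visible.
public class GapCheck {

    // batchIds must be sorted ascending, as returned by the query.
    static long safeWatermark(long lastApplied, List<Long> batchIds) {
        long expected = lastApplied + 1;
        for (long id : batchIds) {
            if (id != expected) {
                break;            // gap: a not-yet-indexed insert
            }
            expected++;
        }
        return expected - 1;
    }

    public static void main(String[] args) {
        // A=1 and C=3 arrived, B=2 is missing from the index.
        System.out.println(safeWatermark(0, List.of(1L, 3L))); // 1
        // Next run, B has become visible.
        System.out.println(safeWatermark(1, List.of(2L, 3L))); // 3
    }
}
```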
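[As a footnote for readers of the archive: Ravi's simpler "processed" column suggestion upthread can also be sketched in plain Java. The map below stands in for the insert table; in a real app this would be a datastore query filtered on the flag property, and the names are illustrative.]

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: each insert row carries a 'processed' flag. The batch job
// queries only unprocessed rows and flips the flag as each row is
// applied. Because the inserts are independent and order does not
// matter, a plain iteration is enough.
public class ProcessedFlag {

    static class Row {
        long value;
        boolean processed;
        Row(long value) { this.value = value; }
    }

    // Applies all unprocessed rows to the running total and marks them.
    static long runBatch(Map<String, Row> table, long total) {
        for (Row r : table.values()) {
            if (!r.processed) {
                total += r.value;
                r.processed = true;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Row> table = new LinkedHashMap<>();
        table.put("a", new Row(1));
        table.put("b", new Row(2));
        long total = runBatch(table, 0);    // applies a and b -> 3
        table.put("c", new Row(4));
        total = runBatch(table, total);     // only c is new -> 7
        System.out.println(total);          // 7
    }
}
```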
