Thanks Bert, The first link is exactly what I was looking for.
Ravneet

On Jun 16, 4:52 am, Bert <[email protected]> wrote:
> Hi Ravneet,
>
> Have you taken a look at fork join queues?
> http://www.google.com/events/io/2010/sessions/high-throughput-data-pi...
> or High concurrency counters without sharding?
> http://blog.notdot.net/2010/04/High-concurrency-counters-without-shar...
>
> I think they may do what you need and are proven solutions.
>
> Thanks
> Rob
>
> On Jun 15, 9:38 pm, thecheatah <[email protected]> wrote:
> > This is actually a pretty good implementation. The only issue is the
> > size of the processed task list. Instead of having two tasks, I am
> > thinking that the one task will clean up the processed task list
> > before it begins its work: basically, check that the processed
> > inserts have indeed been deleted.
> >
> > So the processed list records all the inserts processed in the
> > previous run. The task first deletes those inserts if needed, then
> > goes on to process new tasks.
> >
> > Thanks,
> > Ravneet
> >
> > On Jun 15, 11:48 am, Ravi Sharma <[email protected]> wrote:
> > > In that scenario you can go ahead and do something extra.
> > >
> > > Keep a list of keys in your highly updated object, and whenever you
> > > process one insert and apply it to the main object, make sure you
> > > put the key in this list property. That way the main object knows
> > > whether it has already consumed the content of a given insert.
> > >
> > > Later, when you are deleting or updating the insert object: if you
> > > get the same insert again (because marking it as processed failed
> > > last time), check whether its key exists in the list. If yes, mark
> > > the insert object processed and also remove the key from the list
> > > property.
> > >
> > > You then need another job to clean up the list property on the
> > > highly updated object: read the list, fetch the insert object for
> > > each key, and if it is marked as processed, remove it from the list.
> > > This will slightly increase your datastore puts, but you will not
> > > have to worry about inconsistency.
> > >
> > > So your code will look like this. The highly updated object will
> > > have a property like:
> > >
> > > List<Key> processedInserts; (in Java JDO)
> > >
> > > TASK-1
> > > 1) Get the next Insert object, say i1; assume its key is k1.
> > >    (At this stage, say, processedInserts is empty.)
> > > 2) Check if k1 exists in processedInserts. If no, go to step 3;
> > >    else go to step 4.
> > > 3) Update the highly updated object with the content of insert i1,
> > >    and also add k1 to processedInserts.
> > >    (At this stage processedInserts contains k1.)
> > > 4) Mark i1 as processed.
> > >
> > > After this, processedInserts keeps growing and has no upper bound.
> > > To keep it small you need another job running once in a while, or
> > > you can submit a task from step 2 whenever processedInserts.size >
> > > some number, say 500.
> > >
> > > TASK-2
> > > In this task:
> > > 1) Get the highly updated object.
> > > 2) Loop through processedInserts.
> > > 3) Get each Insert object; if it is marked processed, delete its
> > >    key from processedInserts.
> > >
> > > Just make sure only one of TASK-1 and TASK-2 runs at a time. You
> > > can even run TASK-2 as part of TASK-1 after step 4; it's up to you
> > > where you see it as safe and with fewer if/then/elses :)
> > >
> > > On Wed, Jun 15, 2011 at 4:20 PM, thecheatah <[email protected]> wrote:
> > > > Ravi,
> > > >
> > > > Thanks for the feedback. I was thinking exactly along the lines
> > > > of what you have said. The only problem I see is that I plan on
> > > > processing multiple inserts in one batch job. The inserts and the
> > > > highly updated object will not be updatable in a single
> > > > transaction. Thus, there might be situations where an insert was
> > > > processed but the flag was not set or the row was not deleted.
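[For anyone skimming the archive: Ravi's TASK-1/TASK-2 scheme above can be sketched in plain Java. The maps and fields below are illustrative in-memory stand-ins for the datastore entities, not App Engine APIs; a real version would use JDO entities and datastore transactions.]

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory simulation of the dedup scheme: apply an insert once,
// remember its key in processedInserts, and survive a failure between
// applying the insert and marking it processed.
public class InsertDedup {

    // The "insert" rows: key -> processed flag.
    static Map<String, Boolean> inserts = new HashMap<>();
    // The highly updated object: a running total plus the
    // processedInserts key list from Ravi's sketch.
    static long total = 0;
    static List<String> processedInserts = new ArrayList<>();

    // TASK-1: apply one insert, deduplicating via processedInserts.
    // 'crashBeforeMark' simulates the failure mode discussed in the
    // thread: the update succeeded but marking the insert failed.
    static void task1(String key, long delta, boolean crashBeforeMark) {
        if (!processedInserts.contains(key)) {
            // Step 3: apply the insert and remember its key.
            total += delta;
            processedInserts.add(key);
        }
        if (crashBeforeMark) {
            return; // simulated failure before step 4
        }
        // Step 4: mark the insert row as processed.
        inserts.put(key, true);
    }

    // TASK-2: trim processedInserts of keys whose rows are marked done.
    static void task2() {
        processedInserts.removeIf(k -> Boolean.TRUE.equals(inserts.get(k)));
    }

    public static void main(String[] args) {
        inserts.put("k1", false);
        task1("k1", 5, true);   // first attempt: update applied, mark lost
        task1("k1", 5, false);  // retry: dedup prevents double counting
        task2();
        System.out.println(total);                    // 5, not 10
        System.out.println(processedInserts.size());  // 0 after cleanup
    }
}
```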
> > > > To overcome this issue, I am going to either make sure that
> > > > processing inserts multiple times does not affect the output, or
> > > > accept a small percentage of failures.
> > > >
> > > > Ravneet
> > > >
> > > > On Jun 15, 5:18 am, Ravi Sharma <[email protected]> wrote:
> > > > > If A, B and C are not dependent on each other and ordering
> > > > > doesn't matter for you (e.g. processing C, A, B is also fine),
> > > > > then you can put another column in this insert table, say
> > > > > "processed". When inserting, set it to N (if a string) or false
> > > > > (if a boolean), and query the entity on this column. Whenever
> > > > > you process one row, set the value to Y or true, and carry on
> > > > > with the next insert.
> > > > >
> > > > > Or you can delete these rows once you have processed them; then
> > > > > you will not need the extra column.
> > > > >
> > > > > Note: I am assuming that for one update you will be processing
> > > > > all its inserts in one task or job, with no multiprocessing.
> > > > >
> > > > > On Wed, Jun 15, 2011 at 4:20 AM, thecheatah <[email protected]> wrote:
> > > > > > I am trying to implement a system for an object that will be
> > > > > > updated a lot. The idea is to turn the updates into inserts,
> > > > > > then have a batch job that executes the inserts in batches to
> > > > > > update the highly writable object. The inserts can be sorted
> > > > > > either by time or by some sort of incremented identifier.
> > > > > > This identifier or timestamp can be stored on the highly
> > > > > > writable object, so the next time the job runs it knows where
> > > > > > to start the next batch.
> > > > > >
> > > > > > Using timestamps, I am running into a problem with eventual
> > > > > > consistency.
> > > > > > When I query for inserts to execute, some inserts might not
> > > > > > make it into the results because they have not been applied
> > > > > > to the index yet. Suppose we have inserts A, B and C. If only
> > > > > > A and C make it into the batch job, it will mark all work up
> > > > > > to C as completed and B will never be executed.
> > > > > >
> > > > > > Using incremented identifiers seems like it would solve the
> > > > > > problem, but implementing such an identifier itself is not
> > > > > > clear. To explain why it would solve the original problem: we
> > > > > > would be able to detect that we jumped from A to C, because
> > > > > > the difference between the identifiers would be greater than
> > > > > > 1. The sharded counter is great for counting, but is not good
> > > > > > to use as a unique identifier given eventual consistency.
> > > > > >
> > > > > > I could use the memcache increment function, but the counter
> > > > > > might be flushed out of memory at any time. I believe the
> > > > > > memcache update speed would be enough for what I want to do.
> > > > > >
> > > > > > If I had an upper-bound time limit on the eventual
> > > > > > consistency, I could make my system only process inserts
> > > > > > older than that time limit.
> > > > > >
> > > > > > Anyway, those are my thoughts, and any feedback is
> > > > > > appreciated.
> > > > > >
> > > > > > BTW: the inserts processed in batches are assumed to be
> > > > > > independent of each other.
> > > > > >
> > > > > > --
> > > > > > You received this message because you are subscribed to the
> > > > > > Google Groups "Google App Engine" group.
> > > > > > To post to this group, send email to
> > > > > > [email protected].
> > > > > > To unsubscribe from this group, send email to
> > > > > > [email protected].
> > > > > > For more options, visit this group at
> > > > > > http://groups.google.com/group/google-appengine?hl=en.
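[The gap check the original poster describes can be sketched as follows. If inserts carry a strictly incrementing identifier, a batch that sees A (id 1) and C (id 3) but not B (id 2) can detect the hole and stop before it. `lastApplied` plays the role of the cursor stored on the highly writable object; all names here are illustrative, and generating such an identifier on App Engine is exactly the open problem the poster raises.]

```java
import java.util.List;

// Sketch: find the highest insert id that can safely be marked as
// completed, i.e. the end of the contiguous run of ids starting at
// lastApplied + 1. Anything after a gap is left for the next run,
// when the missing (not-yet-indexed) insert may have become visible.
public class GapCheck {

    // batchIds must be sorted ascending, as returned by the query.
    static long safeWatermark(long lastApplied, List<Long> batchIds) {
        long expected = lastApplied + 1;
        for (long id : batchIds) {
            if (id != expected) {
                break;            // gap: a not-yet-indexed insert
            }
            expected++;
        }
        return expected - 1;
    }

    public static void main(String[] args) {
        // A=1 and C=3 arrived, B=2 is missing from the index.
        System.out.println(safeWatermark(0, List.of(1L, 3L))); // 1
        // Next run, B has become visible.
        System.out.println(safeWatermark(1, List.of(2L, 3L))); // 3
    }
}
```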
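[As a footnote for readers of the archive: Ravi's simpler "processed" column suggestion upthread can also be sketched in plain Java. The map below stands in for the insert table; in a real app this would be a datastore query filtered on the flag property, and the names are illustrative.]

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: each insert row carries a 'processed' flag. The batch job
// queries only unprocessed rows and flips the flag as each row is
// applied. Because the inserts are independent and order does not
// matter, a plain iteration is enough.
public class ProcessedFlag {

    static class Row {
        long value;
        boolean processed;
        Row(long value) { this.value = value; }
    }

    // Applies all unprocessed rows to the running total and marks them.
    static long runBatch(Map<String, Row> table, long total) {
        for (Row r : table.values()) {
            if (!r.processed) {
                total += r.value;
                r.processed = true;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Row> table = new LinkedHashMap<>();
        table.put("a", new Row(1));
        table.put("b", new Row(2));
        long total = runBatch(table, 0);    // applies a and b -> 3
        table.put("c", new Row(4));
        total = runBatch(table, total);     // only c is new -> 7
        System.out.println(total);          // 7
    }
}
```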
