Ravi,

Thanks for the feedback. I was thinking along exactly those lines. The
only problem I see is that I plan on processing multiple inserts in one
batch job, and the inserts and the highly updated object will not be
updatable in a single transaction. Thus, there might be situations where
an insert was processed but the flag was not set or the row was not
deleted. To overcome this, I am going to either make sure that
processing an insert multiple times does not affect the output (i.e.,
make the processing idempotent), or accept a small percentage of
failures.
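The idempotency option could look something like the following. This is a minimal sketch with hypothetical names (`Target`, `apply_insert`); on App Engine the set of applied insert ids would live on the datastore entity itself rather than in memory:

```python
# Sketch of idempotent insert processing. Each pending insert carries a
# unique id; the target object records which ids have already been applied,
# so replaying an insert after a partial batch failure is a no-op.

class Target:
    def __init__(self):
        self.value = 0
        self.applied_ids = set()  # ids of inserts already folded in

def apply_insert(target, insert_id, delta):
    """Apply an insert exactly once; repeated calls have no effect."""
    if insert_id in target.applied_ids:
        return  # already processed in an earlier (possibly failed) batch
    target.value += delta
    target.applied_ids.add(insert_id)

t = Target()
apply_insert(t, "a", 5)
apply_insert(t, "a", 5)  # replayed after a failed batch: no effect
apply_insert(t, "b", 3)
# t.value is now 8, not 13
```

With this in place, a batch that dies halfway can simply be re-run from the start.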

Ravneet

On Jun 15, 5:18 am, Ravi Sharma <[email protected]> wrote:
> If A, B, and C are not dependent on each other and ordering doesn't matter
> for you (e.g., processing them as C, A, B is also fine), then you can put
> another column in this insert table, say "processed".
> When inserting, set it to "N" (if a string) or false (if a boolean),
> and query the entity based on this column.
> Whenever you process one row, set the value to "Y" or true, and carry on
> with the next insert.
>
> Or you can even delete these rows once you have processed them; then you
> will not need the extra column.
>
> Note: I am assuming that for one update you will be processing all its
> inserts in one task or job (no multiprocessing).
>
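The "processed" flag pattern suggested above can be sketched as follows, using a plain in-memory list in place of the datastore (the row layout and names are illustrative only; on App Engine the query would be something like a filter on the `processed` property):

```python
# Sketch of the "processed" flag pattern: query rows not yet processed,
# apply each one, and flip its flag so the next batch run skips it.

rows = [
    {"id": 1, "delta": 5, "processed": False},
    {"id": 2, "delta": 3, "processed": False},
]
total = 0  # stands in for the highly updated object

def run_batch():
    global total
    # Fetch only unprocessed rows (the datastore equivalent of
    # filtering on processed == False).
    for row in [r for r in rows if not r["processed"]]:
        total += row["delta"]
        row["processed"] = True  # mark done before moving on

run_batch()
run_batch()  # second run finds nothing left to do
```

The delete-the-rows variant is the same loop, removing each row instead of flipping the flag.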
> On Wed, Jun 15, 2011 at 4:20 AM, thecheatah <[email protected]> wrote:
> > I am trying to implement a system for an object that will be updated a
> > lot. My idea is to turn the updates into inserts, then have a batch job
> > that executes the inserts in batches to update the highly writable
> > object. The inserts can be sorted either by time or by some sort of
> > incremented identifier. This identifier or timestamp can be stored on
> > the highly writable object, so that the next time the job runs it knows
> > where to start the next batch.
>
> > Using a timestamp, I am running into a problem with eventual
> > consistency. When I query for inserts to execute, some inserts might
> > not make it into the results because they have not been added to the
> > index yet. So suppose we have inserts A, B, and C. If only A and C make
> > it into the batch job, it will mark all work up to C as completed, and
> > B will never be executed.
>
> > Using incremented identifiers seems like it would solve the problem,
> > but how to implement such an identifier is itself not clear. To explain
> > why it would solve the original problem: we would be able to detect the
> > jump from A to C, since the difference between the identifiers would be
> > greater than 1. The sharded counter is great for counting, but given
> > eventual consistency it is not good to use as a unique identifier.
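The gap-detection idea can be sketched like this: process inserts in id order and stop at the first missing id, so an insert that has not reached the index yet is picked up on a later run (function name and shapes are hypothetical):

```python
# Sketch of gap detection with an incrementing identifier. The batch job
# processes only the contiguous run of ids after the last completed one;
# a missing id (an insert not yet visible in the index) halts the batch.

def next_contiguous(visible_ids, last_done):
    """Return the ids that are safe to process in this run."""
    ready = []
    expected = last_done + 1
    for i in sorted(visible_ids):
        if i != expected:
            break  # gap: an earlier insert is still missing from the index
        ready.append(i)
        expected += 1
    return ready

# A (id 1) and C (id 3) are visible; B (id 2) is not indexed yet.
print(next_contiguous({1, 3}, 0))  # [1] - stops before the gap at 2
print(next_contiguous({2, 3}, 1))  # [2, 3] - next run, B is now visible
```

The job would then persist the last processed id on the target object, exactly as described above for the timestamp.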
>
> > I could use the memcache increment function, but the counter might be
> > flushed out of memory at any time. I believe memcache's update speed
> > would be enough for what I want to do.
>
> > If I had an upper bound on the eventual-consistency delay, I could make
> > my system process only inserts older than that time limit.
>
> > Anyway, those are my thoughts; any feedback is appreciated.
>
> > BTW: The inserts processed in batches are assumed to be independent of
> > each other.
>
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Google App Engine" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to
> > [email protected].
> > For more options, visit this group at
> > http://groups.google.com/group/google-appengine?hl=en.
