I just looked at the Javadoc for com.google.appengine.api.taskqueue.Queue <https://developers.google.com/appengine/docs/java/javadoc/com/google/appengine/api/taskqueue/Queue.html#leaseTasksByTag%28long,%20java.util.concurrent.TimeUnit,%20long,%20java.lang.String%29> while trying to implement a version of this, and it looks like the App Engine team was one step ahead on the task-cleanup problem: if you call leaseTasksByTag with a null tag, you get the tasks whose tag matches that of the oldest task (the one with the earliest ETA). This means you can a) process any missed work in aggregate batches, and b) if you happen to pull an active batch, still aggregate the work and minimize the impact on throughput. Pretty cool!
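To make the null-tag behavior concrete, here is a small self-contained Java model of it. `PullQueueModel`, `Task`, and their methods are hypothetical stand-ins written for illustration, not the App Engine API; they only mimic what the Javadoc describes for leaseTasksByTag (a non-null tag leases tasks with that tag, a null tag leases tasks sharing the tag of the task with the earliest ETA). Modeling a lease as pushing the task's ETA forward by the lease duration is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical in-memory model of a pull queue, used only to illustrate the
 * null-tag behavior of leaseTasksByTag. This is NOT the App Engine API.
 */
class PullQueueModel {

    /** A queued task; etaMillis also serves as the lease expiry once leased (model assumption). */
    static final class Task {
        final String name;
        final String tag;
        long etaMillis;

        Task(String name, String tag, long etaMillis) {
            this.name = name;
            this.tag = tag;
            this.etaMillis = etaMillis;
        }
    }

    private final List<Task> tasks = new ArrayList<>();

    void add(String name, String tag, long etaMillis) {
        tasks.add(new Task(name, tag, etaMillis));
    }

    /** Removes a task after its work has been aggregated. */
    void delete(Task task) {
        tasks.remove(task);
    }

    /**
     * Leases up to countLimit available tasks carrying the given tag. If tag is
     * null, the tag of the available task with the earliest ETA is used instead.
     */
    List<Task> leaseTasksByTag(long nowMillis, long leaseMillis, int countLimit, String tag) {
        // Only tasks whose ETA has passed (never leased, or lease expired) are available.
        List<Task> available = new ArrayList<>();
        for (Task t : tasks) {
            if (t.etaMillis <= nowMillis) {
                available.add(t);
            }
        }
        if (available.isEmpty()) {
            return new ArrayList<>();
        }
        // Oldest first; the earliest-ETA task decides the tag when none is given.
        available.sort(Comparator.comparingLong(t -> t.etaMillis));
        String effectiveTag = (tag != null) ? tag : available.get(0).tag;

        List<Task> leased = new ArrayList<>();
        for (Task t : available) {
            if (leased.size() >= countLimit) {
                break;
            }
            if (Objects.equals(effectiveTag, t.tag)) {
                // Hide the task from other workers until the lease expires.
                t.etaMillis = nowMillis + leaseMillis;
                leased.add(t);
            }
        }
        return leased;
    }
}
```

Calling it repeatedly with a null tag drains the queue one tag group at a time, oldest group first, which is the "process missed work in aggregate batches" property; a real worker would delete each task once its batch has been aggregated.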
I think you would still want to check the time of 'cleaned up' tasks to control how deep you go, but I also see that the API provides access to the task ETA timestamp, so you don't need to add your own.

On Thursday, May 3, 2012 10:56:33 AM UTC-4, Michael Hermus wrote:
>
> Definitely, assuming the queue maintains FIFO ordering (which the documentation seems to indicate).
>
> However, I was concerned about determining how deep to go into the pull queue during cleanup. In other words, you don't want to lease work tasks that are part of active batches, because you won't be able to aggregate them effectively. You can guarantee the work will get done by adding it to a push queue (even if it's not aggregated), but if you process too many like that, it will defeat the primary purpose of the fan-in task. You could end up with write contention as the 'cleaned up' tasks come in at rates greater than a few per second.
>
> I suppose you could simply timestamp the work tasks, and stop pulling from the queue once the timestamps pass a certain threshold. For example, only clean up tasks that are older than 10 minutes. I was a bit wary of using timestamps for anything after you mentioned the potential lack of time synchronization, but in this case it wouldn't have to be perfect, just good enough.
>
> Feature idea: it would be pretty slick if you could assign a timeout value to a pull queue task, and set a URL value such that once the timeout passes, the task would be PUSHED to the specified URL for handling (assuming it has not already been successfully processed). This would make the process clean, simple, and efficient, and I imagine there are a number of other cool uses for such a feature as well.
>
> On May 2, 5:46 pm, Brett Slatkin wrote:
> > On Wed, May 2, 2012 at 1:34 PM, Michael Hermus wrote:
> > >
> > > Excellent, thanks!
> > > One question though: isn't there an issue similar to the HRD 'Eventual Consistency' with the Task Queue? In other words, there is a variable latency between queue insert and lease availability that could potentially spike high enough so that the fan-in task misses some work.
> > >
> > > If this is true, we still need some sort of cleanup mechanism for a robust implementation. I have several ideas for this, but wanted to make sure I wasn't missing something.
> >
> > I think having a cron once a minute or so to fetch all tasks on the pull queue (regardless of tag) and re-insert corresponding push tasks is a good idea to make it robust.

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/HWbxp0BbmK8J.
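Brett's cron suggestion combined with Michael's 10-minute threshold might look roughly like the sketch below. It assumes FIFO ordering and that each work task carries its own creation timestamp (as Michael suggests); `FanInCleanup`, `WorkTask`, `PushQueue`, and `sweep` are hypothetical names, and the real pull-queue lease, push-task enqueue, and delete calls are deliberately left abstract. The sweep hands off tasks older than the threshold and stops at the first younger one, on the theory that everything behind it is newer still and probably belongs to an active batch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a periodic (e.g. once-a-minute cron) cleanup pass. */
class FanInCleanup {

    /** A work task as seen by the cleanup job; createdMillis is our own timestamp. */
    static final class WorkTask {
        final String name;
        final long createdMillis;

        WorkTask(String name, long createdMillis) {
            this.name = name;
            this.createdMillis = createdMillis;
        }
    }

    /** Stand-in for enqueuing a push task that processes one unit of work on its own. */
    interface PushQueue {
        void add(WorkTask task);
    }

    /**
     * Hands off to the push queue every task at least maxAgeMillis old, walking
     * the list oldest first and stopping at the first task that is too young.
     * Returns the handed-off tasks so the caller can delete them from the pull queue.
     */
    static List<WorkTask> sweep(List<WorkTask> oldestFirst, long nowMillis,
                                long maxAgeMillis, PushQueue push) {
        List<WorkTask> handedOff = new ArrayList<>();
        for (WorkTask task : oldestFirst) {
            long ageMillis = nowMillis - task.createdMillis;
            if (ageMillis < maxAgeMillis) {
                // Under FIFO ordering, every later task is younger still and likely
                // belongs to a batch that an active fan-in task will aggregate.
                break;
            }
            push.add(task);
            handedOff.add(task);
        }
        return handedOff;
    }
}
```

Stopping early keeps the number of unaggregated push tasks small, which addresses Michael's write-contention concern, and a threshold measured in minutes only needs timestamps that are roughly right, so modest clock skew is tolerable.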
