Just to be clear, I don't think you could build a running percentile from a 
deque.

A potential solution is to build an indexed skip list from the MCAS 
primitive in memcache-collections.  An indexed skip list has O(log n) 
inserts and indexed lookups.  For the median you'd look up the N/2-th entry.
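
As a local sketch of what that buys you (Python's bisect on a plain list 
stands in for the indexed skip list here -- same indexed lookup, but O(n) 
single-process inserts, so this is an illustration of the lookup logic only):

```python
import bisect

class RunningMedian:
    """Keeps values sorted; the median is a plain indexed lookup.

    A list is a stand-in for an indexed skip list: index lookups
    behave the same, but insort is O(n) rather than O(log n).
    """

    def __init__(self):
        self._values = []

    def add(self, value):
        bisect.insort(self._values, value)

    def median(self):
        n = len(self._values)
        mid = n // 2
        if n % 2:
            return self._values[mid]
        return (self._values[mid - 1] + self._values[mid]) / 2.0

rm = RunningMedian()
for v in [5, 1, 9, 3, 7]:
    rm.add(v)
print(rm.median())  # 5
```

The same median() logic would sit on top of the skip list's indexed get; 
only add() changes.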

I'm not saying this is trivial work-- but I've been meaning to try this out 
anyway and may be able to help.

The main thing I'm trying to find out from you is whether you're in Python 
or not.  This library is Python only.
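
For what it's worth, the create-vs-bind race raised below can usually be 
collapsed into a single get-or-create, relying on memcache add() being 
atomic (it stores only if the key is absent).  A sketch with a dict-based 
stub client -- the names here are made up for illustration, not the 
library's actual API:

```python
class StubMemcache:
    """Dict-backed stand-in for a memcache client."""

    def __init__(self):
        self._store = {}

    def add(self, key, value):
        # Atomic in real memcache: stores only if the key is absent.
        if key in self._store:
            return False
        self._store[key] = value
        return True

    def get(self, key):
        return self._store.get(key)

def get_or_create(mc, key, make_initial_state):
    """Race-free: whichever caller wins add(), everyone binds to it."""
    mc.add(key, make_initial_state())
    return mc.get(key)

mc = StubMemcache()
a = get_or_create(mc, 'deque_head', lambda: {'seq': 0})
b = get_or_create(mc, 'deque_head', lambda: {'seq': 99})
print(a == b)  # True: both callers see the first-created state
```

Multiple GAE instances can call get_or_create concurrently; at most one 
add() succeeds, so they all end up bound to the same structure.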


On Thursday, November 21, 2013 7:38:54 AM UTC-8, Mathieu Simard wrote:
>
> John,
>
> Your solution has a lot of potential.
> Especially since you distribute the load across the keyspace.
>
> My only concern is the deque create/bind method.
> Shouldn't it be a single method since multiple GAE instances could try to 
> create the deque concurrently?
>
> On Thursday, November 21, 2013 9:18:08 AM UTC-5, Mathieu Simard wrote:
>>
>> I'm having a look at your project right now John.
>> This is quite interesting.
>>
>> On Wednesday, November 20, 2013 12:42:20 PM UTC-5, John Belmonte wrote:
>>>
>>> You may have missed my message behind Jeff's-- consistent data 
>>> structures can be done on memcache, directly in GAE.  If you're using 
>>> Python I might be able to put together an example which computes a running 
>>> median.
>>>
>>>
>>> On Tuesday, November 19, 2013 12:44:49 PM UTC-8, Mathieu Simard wrote:
>>>>
>>>> Thanks Jeff for your balanced insight.
>>>>
>>>> I'm not treating GAE as an all or nothing. Not at all.
>>>> The options you describe were all discussed; I was just wondering if 
>>>> someone had come up with a clever trick to resolve this problem directly 
>>>> in GAE.
>>>>
>>>> Please next time take a few minutes to read the whole thread.
>>>>
>>>> On Tuesday, November 19, 2013 3:33:19 PM UTC-5, Jeff Schnitzer wrote:
>>>>>
>>>>> Yikes, most of these suggestions are crazy complicated. Even redis 
>>>>> seems excessive.
>>>>>
>>>>> Write a little server that accepts values, stores them in memory, 
>>>>> rolls off old values, and answers requests for the median. Run it on 
>>>>> GCE or Linode or whatever. This should be a few hours of programming, 
>>>>> and even a simplistic implementation in Java should be able to handle 
>>>>> hundreds of QPS. Even if you hit capacity, you can always start 
>>>>> sampling. Or you can federate.
>>>>>
>>>>> Don't treat GAE as "all or nothing".
>>>>>
>>>>> Jeff
>>>>>
>>>>>
>>>>> On Tue, Nov 19, 2013 at 6:50 PM, John Belmonte <[email protected]> wrote:
>>>>>
>>>>>> Hi Mathieu,
>>>>>>
>>>>>> I've been toying with concurrent data structures on memcache.  I 
>>>>>> haven't gotten as far as an (indexed) skiplist yet, but I believe it's 
>>>>>> possible-- and that structure is efficient for maintaining a running 
>>>>>> percentile.  Is your app in Python by any chance?
>>>>>>
>>>>>> http://github.com/google/memcache-collections
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Friday, November 15, 2013 11:29:11 AM UTC-8, Mathieu Simard wrote:
>>>>>>>
>>>>>>> I decided to settle with a solution that will do the trick for now.
>>>>>>>
>>>>>>> I'm building heaps of data per metric per system in memcache and 
>>>>>>> just accepting the fact that sometimes I will not be able to compute 
>>>>>>> my results.
>>>>>>> I have no guarantee of consistency, since any of my memcache writes 
>>>>>>> can now overwrite the others.
>>>>>>>
>>>>>>> It's far from perfect, but it does the trick quite well at a 
>>>>>>> fraction of the cost.
>>>>>>> If flush rates get too intense I'll simply upgrade to a dedicated 
>>>>>>> memcache.
>>>>>>>
>>>>>>> I will post an update if I hit a wall with that approach.
>>>>>>>
>>>>>>> On Friday, November 15, 2013 2:10:39 PM UTC-5, Kaan Soral wrote:
>>>>>>>>
>>>>>>>> This seems like a bad idea; instances can die pretty easily, so you 
>>>>>>>> could just add it as a pull-queue task instead
>>>>>>>>
>>>>>>>> You also assumed that the data is small enough to process/store 
>>>>>>>> easily; however, this doesn't seem to be the case
>>>>>>>>
>>>>>>>> On Friday, November 15, 2013 9:04:58 PM UTC+2, Stephen wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Nov 12, 2013 at 8:07 PM, Mathieu Simard <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Since there is no App Engine solution available such as the Redis 
>>>>>>>>>> atomic list, I'm left wondering how to implement a cost-effective 
>>>>>>>>>> rolling median.
>>>>>>>>>> Has anyone come up with a solution that would be more convenient 
>>>>>>>>>> than running a Redis instance on Google Compute Engine?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> - batch data in front-end instance memory
>>>>>>>>> - flush data to a pull queue every time window
>>>>>>>>> - run a task every time window to gather all data batches from 
>>>>>>>>> pull queue
>>>>>>>>> - merge data, compute moving median, write result to data store
>>>>>>>>>
>>>>>>>>> Instances will be started as more data is submitted. Tune the 
>>>>>>>>> frequency of the calculation so that the size of the pending data 
>>>>>>>>> batches does not overwhelm a single task.
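
[A minimal in-process sketch of Stephen's merge-and-compute step above; the 
pull-queue plumbing is omitted, and the batch shape (a plain list of 
samples per instance flush) is an assumption:]

```python
# Each instance flushes a batch (a list of samples) to the pull queue;
# the periodic task leases all pending batches, merges them, and
# computes the median over the merged window.

def moving_median(batches):
    merged = sorted(v for batch in batches for v in batch)
    n = len(merged)
    if n == 0:
        return None  # nothing flushed this window
    mid = n // 2
    if n % 2:
        return merged[mid]
    return (merged[mid - 1] + merged[mid]) / 2.0

# Batches as three instances might have flushed them:
batches = [[12, 7], [3, 9, 15], [8]]
print(moving_median(batches))  # merged: [3, 7, 8, 9, 12, 15] -> 8.5
```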

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/google-appengine.
For more options, visit https://groups.google.com/groups/opt_out.
