On Sun, Jul 12, 2009 at 1:39 AM, Anthony Mills<[email protected]> wrote:
>
> Nick, thanks for responding. I'll add a lot more detail and test data
> here.
>
> I'm doing three fetches: one to get the user data, one to get his
> starred items, one to get his own items. I've broken it down to
> examine each one individually. The user fetch is fast:
>
> userEntity = datastore.get(KeyFactory.createKey("User", makeUserKeyName(getCurrentUserId())));
>
> /service?a=g&t=u 200 56ms 27cpu_ms 12api_cpu_ms 0kb
> /service?a=g&t=u 200 35ms 21cpu_ms 12api_cpu_ms 0kb
> /service?a=g&t=u 200 26ms 15cpu_ms 12api_cpu_ms 0kb
> /service?a=g&t=u 200 49ms 15cpu_ms 12api_cpu_ms 0kb
> /service?a=g&t=u 200 37ms 15cpu_ms 12api_cpu_ms 0kb
> /service?a=g&t=u 200 27ms 15cpu_ms 12api_cpu_ms 0kb
>
> The saved query is not so fast. Each item has a list property "s"
> which is a list of all user IDs that have saved that item. It's served
> by a dedicated index, s asc + i asc.
>
> Query query = new Query("Item");
> query.addFilter("s", Query.FilterOperator.EQUAL, getCurrentUserId());
> query.addSort("i");
> List<Entity> items = datastore.prepare(query).asList(withLimit
> (REQUEST_SIZE));

How big is REQUEST_SIZE? That's going to be an important factor in how
expensive the query is.
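To make that concrete: query cost behaves roughly like a fixed overhead plus a per-entity charge, so the same 678 api_cpu_ms can imply very different per-entity costs depending on the limit. A rough sketch of that model (the constants here are illustrative guesses, not measured values):

```python
# Hypothetical linear cost model for a datastore query:
#   api_cpu_ms ~= fixed_overhead + per_entity_cost * results_returned
# The default constants are illustrative, NOT measured values.
def estimated_query_api_cpu_ms(results, fixed_overhead=100, per_entity_cost=10):
    return fixed_overhead + per_entity_cost * results

# If the fixed overhead really were ~100, the observed 678 api_cpu_ms
# would imply very different per-entity costs for different limits:
for n in (10, 50, 100):
    per_entity = (678 - 100) / float(n)
    print(n, round(per_entity, 1))
```

Which is exactly why knowing REQUEST_SIZE matters before drawing conclusions from those numbers.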

>
> /service?a=g&t=ts 200 97ms 710cpu_ms 678api_cpu_ms 0kb
> /service?a=g&t=ts 200 79ms 711cpu_ms 678api_cpu_ms 0kb
> /service?a=g&t=ts 200 84ms 705cpu_ms 678api_cpu_ms 0kb
> /service?a=g&t=ts 200 87ms 716cpu_ms 678api_cpu_ms 0kb
> /service?a=g&t=ts 200 84ms 696cpu_ms 678api_cpu_ms 0kb
> /service?a=g&t=ts 200 72ms 697cpu_ms 678api_cpu_ms 0kb
>
> The query to get the user's items is similarly slow. It is a query on
> a string property "u", ordered by "i". It's served by a dedicated
> index, u asc + i asc.
>
> Query query = new Query("Item");
> query.addFilter("u", Query.FilterOperator.EQUAL, getCurrentUserId());
> query.addSort("i");
> List<Entity> items = datastore.prepare(query).asList(withLimit(REQUEST_SIZE));
>
> /service?a=g&t=tm 200 71ms 699cpu_ms 678api_cpu_ms 0kb
> /service?a=g&t=tm 200 90ms 710cpu_ms 678api_cpu_ms 0kb
> /service?a=g&t=tm 200 73ms 699cpu_ms 678api_cpu_ms 0kb
> /service?a=g&t=tm 200 67ms 692cpu_ms 678api_cpu_ms 0kb
> /service?a=g&t=tm 200 83ms 698cpu_ms 678api_cpu_ms 0kb
> /service?a=g&t=tm 200 80ms 722cpu_ms 678api_cpu_ms 0kb
>
> Adding the sort does not seem to add any extra time, nor should it,
> given the index type.
>
> Anyway, put the three together and this is what you get:
>
> /service?a=g 200 929ms 1731cpu_ms 1368api_cpu_ms 24kb
> /service?a=g 200 159ms 1423cpu_ms 1368api_cpu_ms 24kb
> /service?a=g 200 171ms 1420cpu_ms 1368api_cpu_ms 24kb
> /service?a=g 200 178ms 1414cpu_ms 1368api_cpu_ms 24kb
> /service?a=g 200 146ms 1405cpu_ms 1368api_cpu_ms 24kb
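One consistency check worth noting: the combined request's api_cpu_ms is exactly the sum of the three operations measured individually above (12 for the user get, 678 for each of the two queries):

```python
# api_cpu_ms figures taken from the individual request logs above.
user_get = 12        # /service?a=g&t=u
starred_query = 678  # /service?a=g&t=ts
own_query = 678      # /service?a=g&t=tm

# The combined /service?a=g request logs 1368 api_cpu_ms,
# which is just the sum of the three calls.
combined = user_get + starred_query + own_query
print(combined)  # 1368
```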
>
> See, what really annoys me is this. Those requests are coming back in
> 170 ms. But I'm charged for 1400 ms of CPU time. How does that work?
> It should be doing blocking work on the server, right? Are there
> really nine servers all churning for that whole time?
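A note on the 170ms-vs-1400ms gap: as I understand the billing model, cpu_ms is a normalized billing measure, and api_cpu_ms is CPU spent by the datastore backends on your behalf, not time your app instance spends blocked. So the ratio below reflects work done elsewhere in parallel, not nine app servers churning:

```python
# Representative figures from the request logs above.
wall_clock_ms = 170
billed_cpu_ms = 1400

# Billed CPU can exceed elapsed time because the datastore work
# (api_cpu_ms) is performed on backend machines concurrently with
# the request, yet still counts toward the bill.
ratio = billed_cpu_ms / float(wall_clock_ms)
print(round(ratio, 1))  # roughly 8.2
```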
>
> Thanks for your time,
>
> Anthony Mills
>
> On Jul 10, 11:07 am, Alex Popescu
> <[email protected]> wrote:
>> Here's some sample code (not exactly the one I've previously mentioned
>> as I haven't got the time to prepare it):
>>
>> [code]
>> query = ContentEvent.gql('ORDER BY created_at DESC')
>> event_list = query.fetch(max_results, offset)
>> # now that we've fetched the initial list we need to navigate all
>> # the references to make sure that there will be no further DB hits
>> keys = []
>> for event in event_list:
>>   keys.append(event.get_refkey('entry'))
>> entries = db.get(keys)
>> for index, event in enumerate(event_list):
>>   event.entry = entries[index]
>> [/code]
>>
>> This code gets called with a max_results=20 and various offsets
>> (pagination).
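The batch-dereference pattern above (collect the reference keys, resolve them in one db.get(), stitch the results back on) can be sketched in plain Python; the datastore here is a stand-in dict, not the real db API:

```python
# Stand-in for the datastore: maps key -> entity. This sketches the
# batch-dereference pattern only; it is not the App Engine db API.
DATASTORE = {
    "entry:%d" % i: {"title": "entry %d" % i} for i in range(1, 4)
}

class Event(object):
    def __init__(self, entry_key):
        self.entry_key = entry_key
        self.entry = None

def batch_get(keys):
    # One round trip for all keys, instead of one get per event.
    return [DATASTORE[k] for k in keys]

events = [Event("entry:1"), Event("entry:2"), Event("entry:3")]

# 1. Collect the reference keys without dereferencing them one by one.
keys = [e.entry_key for e in events]
# 2. Fetch all referenced entities in a single batch call.
entries = batch_get(keys)
# 3. Stitch the results back onto the events, preserving order.
for index, event in enumerate(events):
    event.entry = entries[index]
```

The point is that step 2 costs one datastore hit regardless of how many events are on the page, instead of one hit per ReferenceProperty access.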
>> Now the results:
>>
>> 07-10 09:51AM 01.720 /_m/cs?p=6 200 173ms 690cpu_ms 522api_cpu_ms 4kb
>> 07-10 09:50AM 57.408 /_m/cs?p=5 200 203ms 690cpu_ms 522api_cpu_ms 4kb
>> 07-10 09:50AM 52.963 /_m/cs?p=4 200 261ms 717cpu_ms 522api_cpu_ms 5kb
>> 07-10 09:50AM 48.888 /_m/cs?p=3 200 164ms 745cpu_ms 522api_cpu_ms 5kb
>> 07-10 09:50AM 42.068 /_m/cs?p=2 200 1563ms 2034cpu_ms 522api_cpu_ms 6kb
>> 07-10 09:46AM 40.239 /_m/cs?p=1 200 168ms 745cpu_ms 522api_cpu_ms 5kb
>> 07-10 09:46AM 30.222 /_m/cs?p=2 200 1422ms 2089cpu_ms 522api_cpu_ms 6kb
>> 07-10 09:45AM 20.582 /_m/cs 200 2718ms 2145cpu_ms 522api_cpu_ms 5kb
>>
>> As you can see, the initial requests are above 2000 cpu_ms. Then it
>> looks like there is some pre-fetching kicking in that pushes this time
>> down to 690-750 cpu_ms.
>>
>> To add a bit more detail to the above code: there are 2 entities, one
>> of them keeping a ReferenceProperty to the other. Still, I am making
>> sure there are only 2 hits to the datastore by fetching all child
>> entities in a single get (basically there are 2 hits to the datastore,
>> each returning 20 results).
>>
>> The cpu_ms range is quite huge though: 690-750cpu_ms to 2000+cpu_ms.
>>
>> I hope to get some sample code for the initial scenario and report
>> back.
>>
>> ./alex
>> --
>> .w( the_mindstorm )p.
>>   Alexandru Popescu
>>
>> On Jul 10, 7:17 pm, "Nick Johnson (Google)" <[email protected]>
>> wrote:
>>
>> > Here are the relevant ops that one of these pages
>> > (e.g. http://www.nmaps.net/172547) does:
>>
>> > self.user = model.User.get(db.Key(self.session['user']))
>>
>> > map = model.Map.get_by_map_id(int(map_id))
>>
>> > q = model.Comment.all().ancestor(map).order("-lastupdated")
>> > comments = q.fetch(COMMENTS_PER_PAGE + 1)
>>
>> > q = model.Map.all()
>> > q.filter("user =", map.user)
>> > q.filter("unlisted =", False)
>> > q.filter("float_num <", map.float_num)
>> > q.order("-float_num")
>> > other_maps = q.fetch(6)
>>
>> > As you can see, typically two gets for a single entity, along with a
>> > couple of queries - one for up to 40 entities (COMMENTS_PER_PAGE=40),
>> > one for 6.
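The float_num filter above is a value-based (keyset) pagination idiom: instead of an offset, the query bounds the sort value by the current item's value. A minimal sketch of the idea in plain Python (not the datastore API):

```python
# Keyset pagination sketch: page through items ordered by a descending
# sort value by filtering on "value < last_seen" instead of an offset.
items = [{"id": i, "float_num": float(i)} for i in range(10)]

def fetch_page(items, before_value, limit):
    # Equivalent in spirit to: filter("float_num <", before_value),
    # order("-float_num"), fetch(limit).
    matching = [x for x in items if x["float_num"] < before_value]
    matching.sort(key=lambda x: x["float_num"], reverse=True)
    return matching[:limit]

page = fetch_page(items, before_value=7.0, limit=6)
print([x["id"] for x in page])  # [6, 5, 4, 3, 2, 1]
```

Unlike an offset, the cost of fetching a page this way doesn't grow with how deep into the result set you are.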
>>
>> > -Nick Johnson
>>
>> > On Fri, Jul 10, 2009 at 5:09 PM, Alex
>>
>> > Popescu<[email protected]> wrote:
>>
>> > > Hi Nick,
>>
>> > > I'll deploy some tests to further investigate this issue and I'll
>> > > provide the code and results.
>>
>> > > Meanwhile, I'm wondering if you could share the relevant pieces of
>> > > code in your app that are typically below 600ms. I must confess
>> > > that I haven't seen many such cases in my app so far (except some
>> > > on which I'm doing extremely aggressive caching).
>>
>> > > ./alex
>> > > --
>> > > .w( the_mindstorm )p.
>> > >  Alexandru Popescu
>>
>> > > On Jul 10, 2:46 pm, "Nick Johnson (Google)" <[email protected]>
>> > > wrote:
>> > >> Hi Alex,
>>
>> > >> It's impossible to give a useful comment without first seeing the code
>> > >> that you're using. Speaking from my own experience, I have a personal
>> > >> site that serves reasonably datastore-intensive pages, and typically
>> > >> total CPU milliseconds doesn't exceed about 600 - that's to do several
>> > >> get requests and at least one query. So it's certainly possible to do
>> > >> useful work with the datastore and come in well below the 1300 to
>> > >> 1400ms you quote.
>>
>> > >> -Nick Johnson
>>
>> > >> On Fri, Jul 10, 2009 at 1:27 AM, Alex
>>
>> > >> Popescu<[email protected]> wrote:
>>
>> > >> > Hi guys,
>>
>> > >> > I have a cron op that is fetching some records for an entity and does
>> > >> > some processing on them.
>>
>> > >> > While checking my app logs I have noticed that for the trivial case
>> > >> > when no result is returned (and so there is no additional processing
>> > >> > done) this operation is billed constantly with something between
>> > >> > 1300ms and 1400ms.
>>
>> > >> > This basically tells me that this is the initial cost of a roundtrip
>> > >> > to the datastore (plus basic request dispatching; note: the response
>> > >> > doesn't have any data).
>>
>> > >> > Based on this I can further deduce that on a daily basis the free CPU
>> > >> > quota will allow me to run around 18k datastore roundtrips with 0
>> > >> > results.
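For context, the "around 18k" figure is consistent with a free CPU quota of roughly 6.5 CPU-hours per day; that quota figure is an assumption from memory, not something stated in this thread:

```python
# Rough quota arithmetic. The 6.5 CPU-hours/day free quota is an
# ASSUMPTION, not a figure from this thread; the ~1350ms cost per
# roundtrip is the midpoint of the 1300-1400ms range quoted above.
free_cpu_ms_per_day = 6.5 * 3600 * 1000  # assumed free daily quota
cost_per_roundtrip_ms = 1350

roundtrips = free_cpu_ms_per_day / cost_per_roundtrip_ms
print(int(roundtrips))  # on the order of 17-18 thousand
```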
>>
>> > >> > How does this data compare with the public announcements you've
>> > >> > made in the past about the number of requests an app can serve
>> > >> > based only on the free quota?
>>
>> > >> > ./alex
>> > >> > --
>> > >> > .w( the_mindstorm )p.
>> > >> >  Alexandru Popescu
>>
>> > >> > PS: I have argued (and I continue to believe) that billing for CPU
>> > >> > time is the wrong model for App Engine customers and that Google
>> > >> > should look into some alternative options:
>> > >> >http://themindstorms.blogspot.com/2009/06/open-letter-to-google-app-e...
>>
>> > >> --
>> > >> Nick Johnson, App Engine Developer Programs Engineer
>> > >> Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration
>> > >> Number: 368047
>>
>> > --
>> > Nick Johnson, App Engine Developer Programs Engineer
>> > Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration
>> > Number: 368047
> >
>



-- 
Nick Johnson, App Engine Developer Programs Engineer
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration
Number: 368047
