Has anyone actually run (Python) benchmarks of either on App Engine? I think
the Hessian lib might work, or probably would with minor changes, though I
haven't actually tried it. I'm not sure the pure-Python version of MessagePack
buys you much; when I glanced over the code, it looked quite similar to
protobuf.
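A rough local sketch of the kind of benchmark being asked about (not run on App Engine, and msgpack/Hessian may not be installed, so it compares only what the stdlib offers): the C-accelerated pickler, the pure-Python pickler that CPython also ships (`pickle._Pickler`, roughly the code path an environment without cPickle is stuck with), and json as a stand-in for another pure-Python codec.

```python
import io
import json
import pickle
import timeit

# A payload loosely like the dicts discussed in this thread:
# many short string keys, small values.
data = {"key-%d" % i: i for i in range(10000)}

def c_pickle_dumps(obj):
    """C-accelerated pickler (the 'cPickle' path on modern CPython)."""
    return pickle.dumps(obj, protocol=2)

def pure_pickle_dumps(obj):
    """Force the pure-Python pickler, approximating a runtime
    where cPickle is unavailable."""
    buf = io.BytesIO()
    pickle._Pickler(buf, protocol=2).dump(obj)
    return buf.getvalue()

if __name__ == "__main__":
    for name, fn in [("C pickle", c_pickle_dumps),
                     ("pure pickle", pure_pickle_dumps),
                     ("json", lambda o: json.dumps(o).encode())]:
        t = timeit.timeit(lambda: fn(data), number=20)
        print("%-12s %6.1f ms per dump" % (name, 1000 * t / 20))
```

The absolute numbers depend entirely on the machine; the point is the relative gap between the C pickler and any pure-Python serializer.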
Robert

On Thu, Jul 14, 2011 at 17:44, Jeff Schnitzer <[email protected]> wrote:
> Also, there are Python implementations of Hessian and MessagePack.
>
> Jeff
>
> On Thu, Jul 14, 2011 at 11:33 AM, johnP <[email protected]> wrote:
>>
>> Very relevant analysis to what I'm working on right now. Now a
>> question: is the slowness of pickle in this case in packing and
>> unpacking the dictionary, or is it in serializing and unserializing
>> the db.Key() objects?
>>
>> In other words: would pickle work faster if it were dumping and
>> loading a dictionary containing str(db.Key()) rather than db.Key()?
>> And is there a relevant performance gain in first converting all the
>> keys to str, then pickling, and vice versa?
>>
>> I'm thinking back to a series of blog posts by Nick Johnson a while
>> back where he said to serialize objects to protobufs before sending
>> them to memcache to get a significant performance boost.
>>
>> johnP
>>
>> On Jul 13, 9:37 pm, Feng <[email protected]> wrote:
>> > OK, I have done some quick tests:
>> >
>> > We are working with 10 Python dictionaries here, ranging from 200KB to
>> > 600KB in size. The total size is about 4.2MB, and they contain about 4
>> > million keys in total.
>> >
>> > We have these 10 dicts serialized and stored in a Datastore Kind. We
>> > fetch them using key_names, then we have to call pickle.loads() on
>> > them to recreate the dictionaries, and we get/set them in the
>> > memcache, which calls pickle.loads() and pickle.dumps() internally.
>> >
>> > Now, to isolate the impact of pickle, we do two experiments: one just
>> > getting and setting the strings (serialized form), and the other
>> > actually using pickle to convert the strings to the dictionaries we
>> > want.
>> >
>> > ----------------------------------------------------------------------
>> >
>> > 1.
>> > Just the strings:
>> >
>> > (1) Load 10 big strings from datastore using key_name:
>> >
>> > 375ms 270cpu_ms 83api_cpu_ms
>> >
>> > Profile data:
>> > 8390 function calls (8359 primitive calls) in 0.331 CPU seconds
>> >
>> > In which datastore fetches cost:
>> >
>> > 10  0.000  0.000  0.323  0.032  /base/python_runtime/python_lib/versions/1/google/appengine/ext/db/__init__.py:1192(get_by_key_name)
>> >
>> > So 32ms per string on average.
>> >
>> > (2) Load 10 big strings from memcache:
>> >
>> > 278ms 93cpu_ms
>> >
>> > Profile data:
>> > 1123 function calls in 0.102 CPU seconds
>> >
>> > In which memcache fetches cost:
>> >
>> > 10  0.000  0.000  0.101  0.010  /base/python_runtime/python_lib/versions/1/google/appengine/api/memcache/__init__.py:462(get)
>> >
>> > So 10ms per string on average, or 1/3 the time of a datastore fetch.
>> > I wouldn't call this impressive, but read on.
>> >
>> > ----------------------------------------------------------------------
>> >
>> > 2. Fetch strings and convert to Python dictionaries using pickle:
>> >
>> > (1) Load them from datastore and explicitly call pickle.loads():
>> >
>> > 4370ms 9626cpu_ms 83api_cpu_ms
>> >
>> > Profile data:
>> > 5417368 function calls (5417337 primitive calls) in 4.339 CPU seconds
>> >
>> > In which pickle.loads() cost:
>> >
>> > 10  0.015  0.002  3.983  0.398  /base/python_runtime/python_dist/lib/python2.5/pickle.py:1365(loads)
>> >
>> > So on average 398ms to recreate an object.
>> >
>> > (2) Load them from memcache, which calls pickle.loads() implicitly:
>> >
>> > 4535ms 10266cpu_ms
>> >
>> > Profile data:
>> > 5565212 function calls in 4.512 CPU seconds
>> >
>> > In which memcache fetches cost:
>> >
>> > 10  0.000  0.000  4.477  0.448  /base/python_runtime/python_lib/versions/1/google/appengine/api/memcache/__init__.py:462(get)
>> >
>> > Almost half a second to fetch a 400KB object from memcache.
>> > Note that in this specific case fetching from memcache actually took
>> > longer (and more CPU time) than fetching from the datastore. That's
>> > because the difference between memcache and datastore is insignificant
>> > compared to the parsing cost, and variance in the parsing performance
>> > dominates.
>> >
>> > ----------------------------------------------------------------------
>> >
>> > As a bonus, memcache.set() costs on the strings and objects,
>> > respectively:
>> >
>> > 10  0.000  0.000  0.098  0.010  /base/python_runtime/python_lib/versions/1/google/appengine/api/memcache/__init__.py:695(_set_with_policy)
>> >
>> > 10  0.000  0.000  9.232  0.923  /base/python_runtime/python_lib/versions/1/google/appengine/api/memcache/__init__.py:695(_set_with_policy)
>> >
>> > ----------------------------------------------------------------------
>> >
>> > In comparison, here are the results with the same objects on a VPS
>> > with a single 2.66GHz core using Redis:
>> >
>> > Store String: 2.4ms - 4x against GAE
>> > Fetch String: 0.8ms - 12x against GAE
>> > Store Object (cPickle): 65ms - 15x against GAE
>> > Fetch Object (cPickle): 50ms - 9x against GAE
>> > Store Object (pickle): 320ms - 3x against GAE
>> > Fetch Object (pickle): 250ms - 1.8x against GAE
>> >
>> > Note that the "CPU seconds" listed above for the various GAE operations
>> > are extracted from the profile data, which appears to be measured before
>> > the conversion to a 1.2GHz standard CPU. The displayed cpu_ms is over
>> > twice the corresponding "CPU seconds" in the profile data:
>> >
>> > 9626/4339 ~= 10266/4512 ~= 2.2
>> > 1.2GHz * 2.22 ~= 2.66GHz
>> >
>> > Makes perfect sense :) So Google is most likely using 2.66GHz cores to
>> > serve these requests, and the above comparisons are quite
>> > apples-to-apples.
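Feng's clock-speed inference above is easy to sanity-check. This small sketch just redoes the arithmetic with the numbers quoted in the thread:

```python
# Ratio of displayed cpu_ms to the "CPU seconds" (in ms) from the
# profile data, for the two pickle.loads() experiments above.
datastore_ratio = 9626 / 4339.0   # explicit pickle.loads() run
memcache_ratio = 10266 / 4512.0   # implicit loads via memcache.get()

print(round(datastore_ratio, 2))  # ~2.22
print(round(memcache_ratio, 2))   # ~2.28

# Scaling the 1.2GHz "standard CPU" by that factor lands near a
# 2.66GHz physical core, matching the Redis VPS in the comparison.
print(round(1.2 * datastore_ratio, 2))  # ~2.66
```

Both experiments give nearly the same factor, which is what makes the inference plausible: the ratio looks like a fixed billing conversion, not noise.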
>> > ----------------------------------------------------------------------
>> >
>> > <rants>
>> >
>> > IMHO this is the problem with App Engine. It's not only much more
>> > expensive than other cloud hosting offerings under the new pricing
>> > model, it's also (and has always been) much slower.
>> >
>> > That is relatively OK under the current "power delivered" model, as
>> > Tim termed it, but it's nonsense under the new "fuel consumed" model.
>> > Hey, you force me to use an inefficient library and you implement it
>> > badly, so that I'm 3 times slower than I would be elsewhere for the
>> > same resource usage with the same slow library, or 10 times slower if
>> > I use a fast library, and you ask me to pay for your (bad) decisions
>> > and (bad) implementations that are out of my control?
>> >
>> > OK, OK, this is the premium you have to pay for the automatic scaling
>> > feature, although I don't see why using cPickle would hamper scaling.
>> > But GAE is a closed platform, and it has set a precedent of raising
>> > prices severalfold overnight. I don't see why anyone who is not
>> > already locked in would want to start investing in this platform now.
>> > Even if you find the new prices acceptable today, what happens if
>> > they raise prices by another several times two years down the road,
>> > when you are already too heavily invested to move away?
>> >
>> > I have learned my lesson (lucky me, I didn't invest too much), and I
>> > will never invest heavily in any closed platform from now on unless
>> > there is an easy migration path.
>> >
>> > </rants>
>> >
>> > On Jul 14, 1:55 am, Jeff Schnitzer <[email protected]> wrote:
>> >
>> > > On Wed, Jul 13, 2011 at 8:26 AM, Feng <[email protected]> wrote:
>> > > > And they don't support cPickle, and parsing a 1MB object for each
>> > > > request with pickle is not funny.
>> >
>> > > > And BTW, you still have to parse it every time even when using
>> > > > memcache. It's no different from the datastore in this regard.
>> >
>> > > > Due to the huge parsing cost of big objects, caching them in
>> > > > memcache doesn't provide much benefit.
>> >
>> > > Do you have some quantitative figures for this? I'd love to see
>> > > cpu_ms for parsing 1000 entities vs api_cpu_ms for fetching the
>> > > same entities.
>> >
>> > > Jeff

--
You received this message because you are subscribed to the Google Groups
"Google App Engine" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/google-appengine?hl=en.
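For anyone wanting the kind of quantitative figures Jeff asked for without an App Engine instance handy, here is a local stand-in sketch: build a dict whose pickled form is a few hundred KB (comparable to the payloads in Feng's test), then time `loads()` through both the C unpickler and the pure-Python one (`pickle._Unpickler`), which approximates a runtime without cPickle.

```python
import io
import pickle
import time

# Build a dict whose pickled blob is roughly in the ~400KB range
# discussed in the thread (unique keys and values, so nothing memoizes).
big = {"k%06d" % i: "val-%d" % i for i in range(20000)}
blob = pickle.dumps(big, protocol=2)
print("pickled size: %d bytes" % len(blob))

def pure_loads(data):
    """Pure-Python unpickler, approximating a no-cPickle runtime."""
    return pickle._Unpickler(io.BytesIO(data)).load()

def time_loads(loads, label, reps=5):
    start = time.perf_counter()
    for _ in range(reps):
        loads(blob)
    per = (time.perf_counter() - start) / reps
    print("%-16s %.1f ms per loads()" % (label, per * 1000))
    return per

c_time = time_loads(pickle.loads, "C unpickler")
py_time = time_loads(pure_loads, "pure unpickler")
print("pure/C slowdown: %.0fx" % (py_time / c_time))
```

The absolute milliseconds will differ from Feng's GAE numbers, but the pure-vs-C slowdown factor is the part that transfers: it shows how much of the memcache "fetch" cost in the thread is really deserialization.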
