Also, there are Python implementations of Hessian and MessagePack.

Jeff

On Thu, Jul 14, 2011 at 11:33 AM, johnP <[email protected]> wrote:

>
> Very relevant analysis to what I'm working on right now.  Now a
> question: is the slowness of Pickle in this case in packing and
> unpacking the dictionary, or is it in serializing and unserializing
> the db.Key() objects?
>
> In other words:  would pickle work faster if it were dumping and
> loading a dictionary containing str(db.Key()) rather than db.Key()?
> And is there a real performance gain from converting all the
> keys to str before pickling, and back again after unpickling?
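[Editor's note: a standalone sketch of this question, not using the App Engine SDK. A hypothetical FakeKey class stands in for db.Key so the comparison can run anywhere; absolute numbers will differ from GAE, but the shape of the question is the same.]

```python
# Compare pickle.loads() on a dict of class instances (stand-in for
# db.Key) versus the same dict with str() forms of those values.
# FakeKey is a toy stand-in, NOT the real App Engine db.Key.
import pickle
import timeit

class FakeKey(object):
    """Toy stand-in for db.Key: carries a kind and a numeric id."""
    def __init__(self, kind, ident):
        self.kind = kind
        self.ident = ident
    def __str__(self):
        return "%s:%d" % (self.kind, self.ident)

obj_dict = {i: FakeKey("Item", i) for i in range(10000)}
str_dict = {i: str(k) for i, k in obj_dict.items()}

obj_blob = pickle.dumps(obj_dict, pickle.HIGHEST_PROTOCOL)
str_blob = pickle.dumps(str_dict, pickle.HIGHEST_PROTOCOL)

t_obj = timeit.timeit(lambda: pickle.loads(obj_blob), number=10)
t_str = timeit.timeit(lambda: pickle.loads(str_blob), number=10)
print("loads() objects: %.3fs  strings: %.3fs" % (t_obj, t_str))

# Caveat: converting the strings back to key objects after loading
# costs extra, so the win (if any) only matters when the str() forms
# are usable directly.
```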
>
> I'm thinking back to a series of blog posts by Nick Johnson a while
> back where he said to serialize objects to protobufs before sending
> them to memcache to get a significant performance boost.
>
> johnP
>
>
>
> On Jul 13, 9:37 pm, Feng <[email protected]> wrote:
> > OK, I have done some quick tests:
> >
> > We are working with 10 Python dictionaries here, ranging from 200KB to
> > 600KB in size. The total size is about 4.2MB, and they contain about 4
> > million keys in total.
> >
> > We have these 10 dicts serialized and stored in a Datastore Kind. We
> > fetch them using key_names, then we have to call pickle.loads() on
> > them to recreate the dictionaries, and we get/set them in the
> > memcache, which calls pickle.loads() and pickle.dumps() internally.
> >
> > Now to isolate the impact of pickle, we do two experiments. One just
> > getting and setting the strings (serialized form), and the other
> > actually using pickle to convert the strings to the dictionaries we
> > want.
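[Editor's note: the isolation step described above can be sketched in plain Python, timing the string handling alone against string handling plus pickle.loads(), on a synthetic dict. Sizes and timings are illustrative, not the original GAE data.]

```python
# Isolate the cost of pickle.loads() from the cost of moving the
# serialized string around, on a synthetic dict a few hundred KB
# in serialized form.
import pickle
import time

big_dict = {"key-%d" % i: i * 3.14 for i in range(20000)}
blob = pickle.dumps(big_dict, pickle.HIGHEST_PROTOCOL)
print("serialized size: %d bytes" % len(blob))

# Experiment 1: touch only the string (what a raw get/set costs).
t0 = time.time()
_ = blob[:]                      # copy, no parsing
t_string = time.time() - t0

# Experiment 2: string plus deserialization.
t0 = time.time()
restored = pickle.loads(blob)
t_parse = time.time() - t0

print("string only: %.4fs   with loads(): %.4fs" % (t_string, t_parse))
```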
> >
> >
> > ----------------------------------------------------------------------
> >
> > 1. Just the strings:
> >
> > (1) Load 10 big strings from datastore using key_name:
> >
> > 375ms 270cpu_ms 83api_cpu_ms
> >
> > Profile data:
> >          8390 function calls (8359 primitive calls) in 0.331 CPU seconds
> >
> > In which datastore fetches cost:
> >
> >  10    0.000    0.000    0.323    0.032 /base/python_runtime/python_lib/versions/1/google/appengine/ext/db/__init__.py:1192(get_by_key_name)
> >
> > So 32ms per string on average.
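[Editor's note: the per-call lines quoted in this thread are standard cProfile/pstats output. A minimal sketch producing the same kind of table for pickle.loads(), runnable outside GAE:]

```python
# Produce a cProfile/pstats table like the ones quoted above,
# restricted to entries matching "loads".
import cProfile
import io
import pickle
import pstats

blob = pickle.dumps({"k%d" % i: i for i in range(5000)})

prof = cProfile.Profile()
prof.enable()
for _ in range(10):
    pickle.loads(blob)
prof.disable()

out = io.StringIO()
stats = pstats.Stats(prof, stream=out)
stats.sort_stats("cumulative").print_stats("loads")
report = out.getvalue()
print(report)
```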
> >
> > (2) Load 10 big strings from memcache:
> >
> > 278ms 93cpu_ms
> >
> > Profile data:
> >          1123 function calls in 0.102 CPU seconds
> >
> > In which memcache fetches cost:
> >
> >  10    0.000    0.000    0.101    0.010 /base/python_runtime/python_lib/versions/1/google/appengine/api/memcache/__init__.py:462(get)
> >
> > So 10ms per string on average, or 1/3 the time of a datastore fetch. I
> > wouldn't call this impressive, but read on.
> >
> >
> > ----------------------------------------------------------------------
> >
> > 2. Fetch strings and convert to Python dictionaries using pickle:
> >
> > (1) Load them from datastore and explicitly call pickle.loads():
> >
> > 4370ms 9626cpu_ms 83api_cpu_ms
> >
> > Profile data:
> >          5417368 function calls (5417337 primitive calls) in 4.339 CPU seconds
> >
> > In which pickle.loads() cost:
> >
> > 10    0.015    0.002    3.983    0.398 /base/python_runtime/python_dist/lib/python2.5/pickle.py:1365(loads)
> >
> > So on average 398ms to recreate an object.
> >
> > (2) Load them from memcache, which calls pickle.loads() implicitly:
> >
> > 4535ms 10266cpu_ms
> >
> > Profile data:
> >          5565212 function calls in 4.512 CPU seconds
> >
> > In which memcache fetches cost:
> >
> > 10    0.000    0.000    4.477    0.448 /base/python_runtime/python_lib/versions/1/google/appengine/api/memcache/__init__.py:462(get)
> >
> > Almost half a second to fetch a 400KB object from memcache. Note that
> > in this specific case fetching from memcache actually took longer (and
> > more CPU time) than fetching from datastore, and that's because the
> > difference between memcache and datastore is insignificant compared to
> > the parsing cost, and variance in the parsing performance dominates.
> >
> >
> > ----------------------------------------------------------------------
> >
> > As a bonus, memcache.set() costs on the strings and objects,
> > respectively:
> >
> > 10    0.000    0.000    0.098    0.010 /base/python_runtime/python_lib/versions/1/google/appengine/api/memcache/__init__.py:695(_set_with_policy)
> >
> > 10    0.000    0.000    9.232    0.923 /base/python_runtime/python_lib/versions/1/google/appengine/api/memcache/__init__.py:695(_set_with_policy)
> >
> >
> > ----------------------------------------------------------------------
> >
> > In comparison, here are the results with the same objects on a VPS
> > with a single 2.66GHz core using Redis:
> >
> > Store String: 2.4ms  -  4x against GAE
> >
> > Fetch String: 0.8ms  -  12x against GAE
> >
> > Store Object (cPickle): 65ms  -  15x against GAE
> >
> > Fetch Object (cPickle): 50ms  -  9x against GAE
> >
> > Store Object (pickle): 320ms  -  3x against GAE
> >
> > Fetch Object (pickle): 250ms  -  1.8x against GAE
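[Editor's note: the cPickle-vs-pickle gap above is why code of that era used the standard fallback import shown below. On Python 3 the C accelerator is built into pickle itself and the cPickle module no longer exists, which the fallback handles gracefully.]

```python
# Classic idiom from the cPickle era: prefer the C implementation
# when the platform allows it, fall back to pure-Python pickle when
# it doesn't (as on App Engine at the time). On Python 3 the except
# branch runs and you still get the built-in C accelerator.
try:
    import cPickle as fast_pickle
except ImportError:
    import pickle as fast_pickle

data = {"a": 1, "b": [1, 2, 3]}
blob = fast_pickle.dumps(data, fast_pickle.HIGHEST_PROTOCOL)
assert fast_pickle.loads(blob) == data
```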
> >
> > Note that the above listed "CPU seconds" for various GAE operations
> > are extracted from the profile data, which seems to be before the
> > conversion to a 1.2GHz standard CPU. The displayed cpu_ms is over
> > twice that of the corresponding "CPU seconds" in the profile data:
> >
> > 9626/4339 ~= 2.22 and 10266/4512 ~= 2.28, i.e. roughly 2.2 in both runs.
> > 1.2GHz * 2.2 ~= 2.66GHz
> >
> > Makes perfect sense:) So Google is most likely using 2.66GHz cores to
> > serve these requests and the above comparisons are quite apples-to-
> > apples.
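[Editor's note: the back-of-the-envelope clock-speed inference can be checked directly from the numbers quoted above.]

```python
# Displayed cpu_ms divided by the profiler's CPU seconds gives the
# scaling factor from the 1.2GHz billing unit to the real core.
ratio1 = 9626 / 4339.0    # datastore + explicit pickle.loads() run
ratio2 = 10266 / 4512.0   # memcache.get() run
print("ratios: %.3f, %.3f" % (ratio1, ratio2))
print("implied clock: %.2f GHz" % (1.2 * ratio1))
```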
> >
> >
> > ----------------------------------------------------------------------
> >
> > <rants>
> >
> > IMHO this is the problem with AppEngine. It's not only much more
> > expensive than other cloud hosting offerings under the new pricing
> > model, it's also (and has always been) much slower.
> >
> > It is relatively OK with the current "power delivered" model as Tim
> > termed it, but it's nonsense with the new "fuel consumed" model. Hey,
> > you force me to use an inefficient library and you implement it badly,
> > so that it's 3 times slower than I would get elsewhere for the same
> > resource usage using the same slow library, or 10 times slower if I
> > use a fast library, and you ask me to pay for your (bad) decisions and
> > (bad) implementations that are out of my control?
> >
> > OK, OK, this is the premium you have to pay for the automatic scaling
> > feature, although I don't see how allowing cPickle would hamper
> > scaling. But GAE is a closed platform, and it has set a precedent of
> > raising prices severalfold overnight. I don't see why anyone who is
> > not already locked in would want to start investing in this platform
> > now. Even if you find the new prices acceptable today, what happens
> > if they raise them severalfold again two years down the road, when
> > you are too heavily invested to move away?
> >
> > I have learned my lesson (lucky me, I didn't invest too much), and I
> > will never invest too much into any closed platform from now on unless
> > there is an easy migration plan.
> >
> > </rants>
> >
> > On Jul 14, 1:55 am, Jeff Schnitzer <[email protected]> wrote:
> >
> > > On Wed, Jul 13, 2011 at 8:26 AM, Feng <[email protected]> wrote:
> > > > And they don't support cPickle, and parsing a 1MB object for each
> > > > request with pickle is not funny.
> >
> > > > And BTW, you still have to parse it every time even when using
> > > > memcache. It's no different than the datastore in this regard.
> >
> > > > Due to the huge parsing cost of big objects, caching them in memcache
> > > > doesn't provide much benefit.
> >
> > > Do you have some quantitative figures for this?  I'd love to see cpu_ms
> > > for parsing 1000 entities vs api_cpu_ms for fetching the same entities.
> >
> > > Jeff
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
>
>
