Also, there are Python implementations of Hessian and MessagePack.

Jeff
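(Editor's note: Hessian and MessagePack both require third-party packages, so as a rough, stdlib-only sketch of the thread's underlying point — that serializer choice and implementation dominate the cost of caching large dicts — here is a comparison of `pickle` against `json`, standing in for a simple cross-language format. The dict size is illustrative, not the thread's actual 200-600KB objects.)

```python
import json
import pickle
import time

# A large, flat dict of the general shape discussed in the thread
# (size chosen for illustration only).
data = {"key%d" % i: i for i in range(100000)}

serializers = [
    ("pickle", lambda o: pickle.dumps(o, pickle.HIGHEST_PROTOCOL), pickle.loads),
    ("json", json.dumps, json.loads),
]

for name, dumps, loads in serializers:
    blob = dumps(data)
    start = time.time()
    loads(blob)  # the deserialization cost is what the thread measures
    print("%-6s blob=%d bytes, loads in %.1f ms"
          % (name, len(blob), (time.time() - start) * 1e3))
```

Absolute numbers will differ wildly from the App Engine figures below; the point is only that deserialization time, not the cache or datastore round-trip, can dominate for objects this large.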
On Thu, Jul 14, 2011 at 11:33 AM, johnP <[email protected]> wrote:
>
> Very relevant analysis to what I'm working on right now. Now a question:
> is the slowness of pickle in this case in packing and unpacking the
> dictionary, or is it in serializing and unserializing the db.Key()
> objects?
>
> In other words: would pickle work faster if it was dumping and loading a
> dictionary containing str(db.Key()) rather than db.Key()? And is there a
> relevant performance gain from first converting all the keys to str,
> then pickling, and vice versa?
>
> I'm thinking back to a series of blog posts by Nick Johnson a while back
> where he said to serialize objects to protobufs before sending them to
> memcache to get a significant performance boost.
>
> johnP
>
> On Jul 13, 9:37 pm, Feng <[email protected]> wrote:
> > OK, I have done some quick tests:
> >
> > We are working with 10 Python dictionaries here, ranging from 200KB to
> > 600KB in size. The total size is about 4.2MB, and they contain about 4
> > million keys in total.
> >
> > We have these 10 dicts serialized and stored in a Datastore kind. We
> > fetch them using key_names, then we have to call pickle.loads() on
> > them to recreate the dictionaries, and we get/set them in memcache,
> > which calls pickle.loads() and pickle.dumps() internally.
> >
> > Now to isolate the impact of pickle, we do two experiments: one just
> > getting and setting the strings (serialized form), and the other
> > actually using pickle to convert the strings to the dictionaries we
> > want.
> >
> > ----------------------------------------------------------------------
> >
> > 1. Just the strings:
> >
> > (1) Load 10 big strings from datastore using key_name:
> >
> > 375ms 270cpu_ms 83api_cpu_ms
> >
> > Profile data:
> >     8390 function calls (8359 primitive calls) in 0.331 CPU seconds
> >
> > In which datastore fetches cost:
> >
> >     10  0.000  0.000  0.323  0.032  /base/python_runtime/python_lib/versions/1/google/appengine/ext/db/__init__.py:1192(get_by_key_name)
> >
> > So 32ms per string on average.
> >
> > (2) Load 10 big strings from memcache:
> >
> > 278ms 93cpu_ms
> >
> > Profile data:
> >     1123 function calls in 0.102 CPU seconds
> >
> > In which memcache fetches cost:
> >
> >     10  0.000  0.000  0.101  0.010  /base/python_runtime/python_lib/versions/1/google/appengine/api/memcache/__init__.py:462(get)
> >
> > So 10ms per string on average, or 1/3 the time of a datastore fetch. I
> > wouldn't call this impressive, but read on.
> >
> > ----------------------------------------------------------------------
> >
> > 2. Fetch strings and convert to Python dictionaries using pickle:
> >
> > (1) Load them from datastore and explicitly call pickle.loads():
> >
> > 4370ms 9626cpu_ms 83api_cpu_ms
> >
> > Profile data:
> >     5417368 function calls (5417337 primitive calls) in 4.339 CPU seconds
> >
> > In which pickle.loads() cost:
> >
> >     10  0.015  0.002  3.983  0.398  /base/python_runtime/python_dist/lib/python2.5/pickle.py:1365(loads)
> >
> > So on average 398ms to recreate an object.
> >
> > (2) Load them from memcache, which calls pickle.loads() implicitly:
> >
> > 4535ms 10266cpu_ms
> >
> > Profile data:
> >     5565212 function calls in 4.512 CPU seconds
> >
> > In which memcache fetches cost:
> >
> >     10  0.000  0.000  4.477  0.448  /base/python_runtime/python_lib/versions/1/google/appengine/api/memcache/__init__.py:462(get)
> >
> > Almost half a second to fetch a 400KB object from memcache.
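(Editor's note: the per-object `pickle.loads()` cost Feng profiles above can be reproduced in spirit with a small local benchmark; dict contents and hardware are illustrative, so expect different absolute numbers. Note also that on Python 3, `pickle` is C-accelerated, so locally you will see cPickle-like speeds rather than the pure-Python figures from the GAE 2.5 runtime.)

```python
import pickle
import time

# A large dictionary; size is illustrative, not the thread's
# actual 200-600KB objects.
big_dict = {"key%d" % i: i for i in range(100000)}
blob = pickle.dumps(big_dict, pickle.HIGHEST_PROTOCOL)

# Average the deserialization cost over 10 runs, mirroring the
# 10-object fetch profiled in the thread.
start = time.time()
for _ in range(10):
    pickle.loads(blob)
elapsed = (time.time() - start) / 10

print("avg pickle.loads(): %.1f ms for a %d-byte blob"
      % (elapsed * 1e3, len(blob)))
```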
> > Note that in this specific case fetching from memcache actually took
> > longer (and more CPU time) than fetching from datastore, and that's
> > because the difference between memcache and datastore is insignificant
> > compared to the parsing cost, and variance in the parsing performance
> > dominates.
> >
> > ----------------------------------------------------------------------
> >
> > As a bonus, memcache.set() costs on the strings and objects,
> > respectively:
> >
> >     10  0.000  0.000  0.098  0.010  /base/python_runtime/python_lib/versions/1/google/appengine/api/memcache/__init__.py:695(_set_with_policy)
> >
> >     10  0.000  0.000  9.232  0.923  /base/python_runtime/python_lib/versions/1/google/appengine/api/memcache/__init__.py:695(_set_with_policy)
> >
> > ----------------------------------------------------------------------
> >
> > In comparison, here are the results with the same objects on a VPS
> > with a single 2.66GHz core using Redis:
> >
> > Store String: 2.4ms - 4x against GAE
> > Fetch String: 0.8ms - 12x against GAE
> > Store Object (cPickle): 65ms - 15x against GAE
> > Fetch Object (cPickle): 50ms - 9x against GAE
> > Store Object (pickle): 320ms - 3x against GAE
> > Fetch Object (pickle): 250ms - 1.8x against GAE
> >
> > Note that the "CPU seconds" listed above for the various GAE
> > operations are extracted from the profile data, which seem to be
> > measured before the conversion to a 1.2GHz standard CPU. The displayed
> > cpu_ms is over twice the corresponding "CPU seconds" in the profile
> > data:
> >
> > 9626/4339 ~= 10266/4512 ~= 2.2
> > 1.2GHz * 2.2 ~= 2.66GHz
> >
> > Makes perfect sense :) So Google is most likely using 2.66GHz cores to
> > serve these requests, and the above comparisons are quite
> > apples-to-apples.
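(Editor's note: johnP's question above — whether a dict of plain strings unpickles faster than a dict of db.Key() objects — can be explored with a sketch like the following. `FakeKey` is a hypothetical stand-in, since `db.Key` only exists inside App Engine; a real `db.Key` carries more state, so this only shows the direction of the effect, not its magnitude.)

```python
import pickle
import time

class FakeKey(object):
    """Hypothetical stand-in for db.Key: a small object with some state."""
    def __init__(self, path):
        self.path = path
    def __eq__(self, other):
        return isinstance(other, FakeKey) and self.path == other.path

def time_loads(obj, repeat=5):
    """Average seconds to unpickle obj, over `repeat` runs."""
    blob = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
    start = time.time()
    for _ in range(repeat):
        pickle.loads(blob)
    return (time.time() - start) / repeat

n = 50000
with_objects = {i: FakeKey("Kind/%d" % i) for i in range(n)}  # instance values
with_strings = {i: "Kind/%d" % i for i in range(n)}           # plain str values

print("FakeKey values: %.1f ms" % (time_loads(with_objects) * 1e3))
print("str values:     %.1f ms" % (time_loads(with_strings) * 1e3))
```

Plain strings should unpickle noticeably faster, since pickle must reconstruct each instance (class lookup plus attribute restore) for object values; whether the str round-trip cost outweighs that saving is exactly the trade-off johnP asks about.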
> > ----------------------------------------------------------------------
> >
> > <rants>
> >
> > IMHO this is the problem with App Engine. It's not only much more
> > expensive than other cloud hosting offerings under the new pricing
> > model, it's also (and has always been) much slower.
> >
> > That is relatively OK under the current "power delivered" model, as
> > Tim termed it, but it's nonsense under the new "fuel consumed" model.
> > Hey, you force me to use an inefficient library and you implement it
> > badly, so that it's 3 times slower than I would get elsewhere for the
> > same resource usage using the same slow library, or 10 times slower if
> > I use a fast library, and you ask me to pay for your (bad) decisions
> > and (bad) implementations that are out of my control?
> >
> > OK, OK, this is the premium you have to pay for the automatic scaling
> > feature, although I don't see why using cPickle would hamper scaling.
> > But GAE is a closed platform, and it has set a precedent of raising
> > prices severalfold overnight; I don't see why anyone who is not
> > already locked in would want to start investing in this platform now.
> > Even if you find the new prices acceptable today, what happens if they
> > raise prices severalfold again two years down the road, when you are
> > already too heavily invested to move away?
> >
> > I have learned my lesson (lucky me, I didn't invest too much), and
> > from now on I will never invest heavily in any closed platform unless
> > there is an easy migration plan.
> >
> > </rants>
> >
> > On Jul 14, 1:55 am, Jeff Schnitzer <[email protected]> wrote:
> > > On Wed, Jul 13, 2011 at 8:26 AM, Feng <[email protected]> wrote:
> > > > And they don't support cPickle, and parsing a 1MB object for each
> > > > request with pickle is not funny.
> > > >
> > > > And BTW, you still have to parse it every time even when using
> > > > memcache. It's no different from the datastore in this regard.
> > > >
> > > > Due to the huge parsing cost of big objects, caching them in
> > > > memcache doesn't provide much benefit.
> > >
> > > Do you have some quantitative figures for this? I'd love to see
> > > cpu_ms for parsing 1000 entities vs api_cpu_ms for fetching the
> > > same entities.
> > >
> > > Jeff

--
You received this message because you are subscribed to the Google Groups
"Google App Engine" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/google-appengine?hl=en.
