I like how in this medium people can talk behind your back and in your face at the same time! :P
I actually invested about two weeks (both at work AND at home) experimenting with MANY different options for storing and retrieving data in Redis, using all the structure types, with both generic (procedurally-generated) data and our own real-world data. It started out as a pet project, but it mushroomed into a very detailed and flexible "py-redis benchmarking tool", which I have every intention of sharing on GitHub - I think it's over 1k LOC already... You basically tell it which benchmark combination(s) you wish to run, and it prints the results in a nicely organized table. If you choose to use the procedurally-generated data (for synthetic benchmarking), you can define each of its 3 dimensions (keys, records, fields), to see how each affects each Redis storage option (lists, sets, hashes, etc.). So you can get a feel for how "scale" influences the benefits/trade-offs of each storage option. I think I will add graph plotting for IPython, just for the fun of it...

In conclusion: a major performance factor is the number of round-trips to Redis, so I made heavy use of "pipeline". But it turns out that the next major performance factor after that is the manipulation the data needs in Python, pre-store and post-retrieval, in order to fit it into the Redis structures. It turns out that, at least for bulk store/retrieval (pipeline usage), the overhead of fitting a data structure into Redis outweighs the benefits, sometimes by orders of magnitude. Perhaps if an application were written to use Redis as a database it would be worth it, as injecting into a specific value "nested" inside a Redis structure "may" be faster than having to pull an entire "key" of serialized data - but that's not the use-case we're talking about for "caching" in web2py.
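To illustrate that pre-store/post-retrieval overhead, here's a minimal sketch (pure stdlib, not taken from the actual benchmark tool - the record and its field types are made up). Storing a record as a Redis hash means flattening every field to a string on the way in and re-parsing it field-by-field on the way out, whereas a flat key just holds one serialized blob:

```python
# Sketch of the "fitting data into Redis structures" overhead described
# above. Pure stdlib; Redis itself is not involved - we only show the
# Python-side work each storage option would require.
import pickle

record = {"id": 42, "name": "widget", "price": 9.99}  # illustrative record

# Option A: flat key -> one serialized blob; one encode + one decode.
blob = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)
restored_flat = pickle.loads(blob)

# Option B: Redis hash -> every field coerced to a string on the way in...
hash_fields = {k: str(v) for k, v in record.items()}
# ...and re-parsed field-by-field on the way out (types must be known).
restored_hash = {
    "id": int(hash_fields["id"]),
    "name": hash_fields["name"],
    "price": float(hash_fields["price"]),
}

assert restored_flat == record == restored_hash
```

With hundreds of records per request, that per-field coercion in option B is exactly the kind of Python-side work that dominated the timings.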
So, the *tl;dr* version of it is: "A flat key-value store of serialized data is fastest for bulk store/retrieval" - especially when using "hiredis" (a Python wrapper around a C-compiled Redis client - that's orders of magnitude faster...).

Then I went on to testing many serialization formats/libraries:
- JSON (pure Python)
- simplejson (with C-compiled optimizations)
- cjson (a C-compiled library with a Python wrapper)
- ujson (a C-compiled library with a Python wrapper)
- pickle (pure Python)
- cPickle (a C-compiled library with a Python wrapper)
- msgpack (with C-compiled optimizations)
- u-msgpack (pure Python)
- marshal

Results:
- All pure-Python options are slowest (unsurprising).
- simplejson is almost as fast as cjson when used with its C-compiled optimizations, and is better maintained, so no use for cjson.
- cPickle is almost as fast as marshal, and is platform/version agnostic, so no use for marshal.
- ujson is only faster than simplejson for very long (and flat) lists, and is less maintained/popular/mature.

So that leaves us with simplejson, cPickle and msgpack:
- cPickle is actually the "slowest" of the three, AND is Python-only.
- With either simplejson or msgpack, you can read the data from Redis with non-Python clients, AND they both (surprisingly) handle unicode really well.
- msgpack is roughly 2x faster than simplejson, but is less readable in a Redis GUI.

However: with simplejson or msgpack, once you introduce "datetime" values, you need to post-process the results in Python by injecting hooks into the parsers... Once you do that, all the performance gain is nullified... So cPickle becomes fastest, as it generates the Python "datetime" objects at the C level... So I ended up where I started, coming full-circle back to flat keys with cPickle...

The only benefit I ended up gaining is from refactoring our high-level cache data-structure on top of redis_cache.py, to do bulk retrieval and smart refreshes - but I'm not sure I can share that code...
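To make the datetime point concrete, here's a minimal sketch of the hook dance the JSON parsers force on you, versus pickle which round-trips datetimes natively (shown with the stdlib json and pickle modules; simplejson exposes the same dumps/loads hook parameters, and in Python 2 the same holds for cPickle - the tag name "__dt__" is just an illustrative convention):

```python
import json
import pickle
from datetime import datetime

record = {"name": "order-1", "created": datetime(2014, 2, 18, 21, 27)}

# JSON cannot serialize datetime natively - you must inject a "default"
# hook on the way in...
def encode_default(obj):
    if isinstance(obj, datetime):
        return {"__dt__": obj.strftime("%Y-%m-%dT%H:%M:%S")}
    raise TypeError(repr(obj))

# ...and an "object_hook" on the way out, which gets called in Python for
# EVERY decoded dict - this is where the performance gain evaporates.
def decode_hook(d):
    if "__dt__" in d:
        return datetime.strptime(d["__dt__"], "%Y-%m-%dT%H:%M:%S")
    return d

blob = json.dumps(record, default=encode_default)
restored = json.loads(blob, object_hook=decode_hook)
assert restored == record

# pickle round-trips the datetime with no hooks at all, entirely at the
# C level in cPickle / protocol-based pickle.
assert pickle.loads(pickle.dumps(record)) == record
```

Note that `object_hook` fires once per dict in the payload, so with hundreds of nested records the hook overhead scales with the data, not with the number of datetimes.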
We are now doing a bulk-get of our entire Redis cache on every request. It has over 100 keys, some very small and some with hundreds of nested records. We narrowed it down to 16ms per request (best case), which is good enough for me.

We basically have a class in a module which instantiates a non-thread-local singleton, once per process. It has an ordered dictionary of "keys" mapped to "lambdas". We call it the "cache catalog". The results are stored in a regular dictionary (which is thread-local), mapping each key to its resultant value. On each request, a bulk-get is issued with a list of all the keys (which we already have - it's the keys of the catalog plus the "w2py:<app>:" prefix, so we don't even need them stored in Redis in a separate set... and we still don't have to use the infamous "KEYS" Redis command...), and since the catalog is an ordered dictionary, we know which value in the result maps to which key. So we know the "None" values represent the keys that are currently "missing" in Redis, due to a deletion triggered by a cache update on another request/thread/process. So we get a list of "missing keys" that we just run through in a regular for-loop, generating new values using the regular cache mechanism (which triggers the lambdas) - so we only update what's missing. This turns out to be extremely efficient, fast and resilient. I suggest this approach be factored into the redis_cache.py file itself somehow... Not sure I can share that code though... (legally...)

Anyways, hope this sums up the topic, and hope some people learned something from this summary of my experience. If not, hey, what do I know, I'm just an "idea guy" after all, right? :P

I'll be posting a link to the git repo of the benchmark code in a few days, after I clean it up a bit...

On Tue, Feb 18, 2014 at 9:27 PM, Derek <sp1d...@gmail.com> wrote:
> >endless arguments just to "win"?
>
> I don't think it's that, I think that people who consider themselves "idea men" are people who are generally lazy, who don't want to do any of the work but want to take credit for it. They discount the amount of time that developers put into a project and state that they could do it better (if they could just be bothered to implement their idea, which happens to be too simple for them to bother with). I was merely suggesting that the best way to handle such people is to say 'it is a wonderful idea! people might steal it! better be the first to implement it yourself and then patent it!' What I've seen is that they usually shut up about their great 'new idea', and maybe they learn that programming isn't as easy or 'simple' as they thought it was.
>
> On Thursday, February 6, 2014 3:44:37 PM UTC-7, Jufsa Lagom wrote:
>>
>> Hello Arnon.
>>
>> I just made a quick search of your posts on the other groups on groups.google.com..
>>
>> On many (almost all) groups where you have made posts, you run into arguments with longtime members/contributors that have put huge amounts of time into the projects.
>>
>> You say yourself in many posts that you are inexperienced in the subject being discussed?
>> Then perhaps it's good to take a more humble approach when addressing your questions/statements?
>> I can only speak for myself: I would at least pick that approach if I had a question to the community..
>>
>> Don't misunderstand me, it's always good with new ideas and fresh insights..
>> But when meeting massive resistance in a community about an idea that doesn't seem to get any traction, then perhaps that idea shouldn't be forced with endless arguments just to "win"?
>>
>> Sorry for the OT, and this is just a friendly hint from an old news user :)
>>
>> --
>> Kind Regards
>> Jufsa Lagom
>>
>> On Thursday, January 16, 2014 11:57:05 PM UTC+1, Arnon Marcus wrote:
>>>
>>> Derek: Are you being sarcastic and mean?
>>>
>>>> cache doesn't cache only resultsets, hence pickle is the only possible choice.
>>>
>>> Well, not if you only need flat and basic objects - there the benefit of pickle is moot and its overhead is obvious - take a look at this project: https://redis-collections.readthedocs.org/en/latest/
>>>
>>>> It's cool. Actually, I started developing something like that using DAL callbacks, but as soon as multiple tables are involved, with FKs and such, it starts to lose "speed". Also, your whole app needs to be coded a-la "ActiveRecord", i.e. fetch only by PK.
>>>
>>> Hmmm... Haven't thought of that... Well, you can't search/query for specific records by their hashed values, but that's not the use-case I was thinking about - I am not suggesting "replacing" the DAL... Plus, that restriction would also exist when using pickles for such a use-case... What I had in mind is simpler than that - just have a bunch of simple queries that you would do in your cache.ram anyway, and instead have their "raw" result-sets (before being parsed into "rows" objects) cached as-is (almost...) - that would be faster to load into the cache than into cache.ram, and also faster for retrieval.
>>>
>>>> BTW, I'm not entirely sure that fetching 100 records with 100 calls to redis is faster than pulling a pickle of 1000 records in a single call and discarding what you don't need.
>>>
>>> Hmmm... I don't know - redis is famous for crunching somewhere in the order of 500K requests per second - have you tested it?
>>>
>>>> BTW2: ORMs are already there: redisco and redis-lympid
>>>
>>> 10x, I'll take a look - though I think an ORM would defeat the purpose (in terms of speed) and would be overkill...
>>
> --
> Resources:
> - http://web2py.com
> - http://web2py.com/book (Documentation)
> - http://github.com/web2py/web2py (Source code)
> - https://code.google.com/p/web2py/issues/list (Report Issues)
> ---
> You received this message because you are subscribed to a topic in the Google Groups "web2py-users" group.
> To unsubscribe from this topic, visit https://groups.google.com/d/topic/web2py/im3pZuKWkWI/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to web2py+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
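Coming back to the bulk-get "cache catalog" approach described at the top of this message: for anyone wanting to try it, here's a rough pure-Python sketch of the pattern. A plain dict stands in for Redis and a local `mget` for the single MGET/pipelined round-trip; all names (the "myapp" prefix, the catalog entries) are illustrative, not the actual code the author can't share:

```python
import pickle
from collections import OrderedDict

# A plain dict stands in for Redis; in the real thing this would be a
# redis client and mget() a single MGET round-trip per request.
store = {}

PREFIX = "w2py:myapp:"  # hypothetical "w2py:<app>:" prefix

# The "cache catalog": an ordered mapping of cache keys to the lambdas
# that know how to (re)generate each value. In the described design this
# lives in a per-process singleton.
catalog = OrderedDict([
    ("settings", lambda: {"theme": "dark"}),
    ("countries", lambda: ["IL", "US"]),
])

def mget(keys):
    # Stand-in for Redis MGET: one "round-trip", None for missing keys.
    return [store.get(k) for k in keys]

def load_cache():
    keys = [PREFIX + name for name in catalog]
    raw = mget(keys)  # one bulk-get per request
    results = {}
    # The catalog is ordered, so position i in the reply maps to key i.
    for (name, generate), blob in zip(catalog.items(), raw):
        if blob is None:
            # Key missing (e.g. invalidated by another request/thread/
            # process): regenerate only this value and store it back.
            value = generate()
            store[PREFIX + name] = pickle.dumps(value)
            results[name] = value
        else:
            results[name] = pickle.loads(blob)
    return results
```

On a warm cache every key comes back from the single bulk-get; after an invalidation only the `None` entries trigger their lambdas, which is what makes the pattern both fast and resilient.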