The point about joins being slow is true (we experience that ourselves), but I still wonder why you use Cassandra for the indices. Couldn't you just store them in MySQL instead?
On 13.07.2010 at 08:26, Sandeep Kalidindi at PaGaLGuY.com wrote:
> @paul - Cassandra is really good for storing indices. But I like Redis
> because it provides some really good data structures, like sorted sets.
> So we use both to their strengths. For example, in a forum, all the
> posts and replies in a thread, which user is following which threads,
> and so on are all stored in Cassandra.
>
> But for something like which threads make up the top page (which we
> should be able to sort both by latest posted thread first and by most
> active thread first), sorted sets make a lot of sense, because
> recomputing the whole first page every time a post is made is too
> expensive. Hence in such cases I use Redis, but for storage of posts,
> follower lists, and the like I use Cassandra.
>
> @michael - Well, with MySQL writes are fast but reads are slow because
> of joins. My main criterion is that reads should be really, really fast,
> which means I have to precompute indices so that my front-end code just
> reads the index and outputs it, with no more complex joins. So you need
> a store that is really good at holding lots of indices, and in my
> opinion Cassandra is one of the best stores for that. Hence the use of
> Cassandra.
>
> Also, complete denormalization isn't so easy. There will be many cases
> where it is a pain in the ass. I could handle most such cases with the
> help of Varnish's ESI module. For example: you have a timeline of posts
> stored in Cassandra, but we cannot store the author information in every
> post, as it is subject to change at any time. This is the classic case
> of a join. If you want to use a NoSQL store for that, you end up making
> multiple calls to the database (first to fetch the posts and then to
> fetch the user information).
> But if you just use ESI, there is no need to make so many calls, because
> you can cache that small user-info module for as long as the information
> doesn't change. This seems a much more elegant solution than a MySQL
> join. Hope I explained the point.
>
> Cheers,
> Deepu.
>
> On Tue, Jul 13, 2010 at 11:38 AM, Michael Dürgner <mich...@duergner.de> wrote:
> Are your PVs mostly reads or writes? If they are mostly reads, I'd think
> you wouldn't need Cassandra-like storage, which is tuned towards writes.
>
> On 12.07.2010 at 23:40, Sandeep Kalidindi at PaGaLGuY.com wrote:
>
> > Well, we were going down constantly with vB running on 3-4 dedicated
> > servers due to huge traffic (a couple of tens of millions of page
> > views). We are also planning some major new features, hence the shift
> > to Cassandra with the future in mind.
> >
> > Roughly, the architecture is like this (in the order a request
> > proceeds):
> >
> > 1) Varnish - PHP reads from Cassandra and the performance isn't always
> > good (I have yet to master it, though, so it is probably my lack of
> > expertise here). So we make heavy use of Varnish to cache as much as
> > possible. VCL means we can cache the same page differently for
> > different logged-in users. ESI means no need to worry about joins.
> > Really, Varnish is quite a good companion for NoSQL.
> >
> > 2) Front-end PHP servers - contain most of the template code and read
> > directly from Cassandra and Redis.
> >
> > 3) Middleware (written in Scala + Python; we plan to move the
> > middleware to Scala completely to reduce the number of languages in
> > production) - all writes from PHP go directly to the middleware.
> > Cassandra is in fact mostly a store of indices, which means you need
> > to change your strategy from the MySQL approach (post-computation) to
> > precomputing all the needed indices and storing them in Cassandra. So
> > the middleware takes care of computing the indices and storing them in
> > Cassandra and Redis accordingly.
> > This way PHP just submits the write to the middleware and the request
> > can complete, while the middleware might take a couple of seconds at
> > most to compute the indices and finish the request completely.
> >
> > 4) Cassandra + Redis clusters.
> >
> >
> > So writes are taken care of by the middleware and hence complete uber
> > fast, and reads are also quite fast thanks to using Varnish wherever
> > it helps.
> >
> > Still not in production, though. Hope it helped. I would welcome
> > anybody's suggestions on the way I am using Cassandra and whether I
> > can do anything better.
> >
> > Cheers,
> > Deepu.
> >
> > On Tue, Jul 13, 2010 at 2:48 AM, S Ahmed <sahmed1...@gmail.com> wrote:
> > What sort of traffic levels made you port the application to
> > Cassandra?
> >
> > Very interested in seeing this go live.
> >
> > What sort of server setup are you looking at using?
> >
> >
> > On Mon, Jul 12, 2010 at 4:39 PM, Sandeep Kalidindi at PaGaLGuY.com
> > <sandeep.kalidi...@pagalguy.com> wrote:
> > No, we re-coded from scratch with most of the needed functionality.
> >
> > Cheers,
> > Deepu.
> >
> >
> > On Mon, Jul 12, 2010 at 7:49 PM, S Ahmed <sahmed1...@gmail.com> wrote:
> > Very interesting!
> >
> > What kind of integration do you have between vB and Cassandra? It's
> > not a port, then?
> >
> >
> > On Mon, Jul 12, 2010 at 3:34 AM, Sandeep Kalidindi at PaGaLGuY.com
> > <sandeep.kalidi...@pagalguy.com> wrote:
> > We were one of the vBulletin customers and our forums have been facing
> > some bad scaling issues.
> >
> > We coded our forum software to work with Cassandra. We are still
> > testing for bugs and might go live in a couple of weeks. You can ask
> > any specific questions about vBulletin and Cassandra and I will answer
> > to the best of my knowledge.
> >
> > In our case a combination of Cassandra and Redis took care of most of
> > the functionality that vBulletin offers and much more.
> >
> > Cheers,
> > Deepu.
> >
> >
> > On Mon, Jul 12, 2010 at 9:58 AM, Paul Prescod <pres...@gmail.com> wrote:
> > On Sun, Jul 11, 2010 at 8:39 AM, S Ahmed <sahmed1...@gmail.com> wrote:
> > > I want to build a vBulletin-type application (forums, threads,
> > > posts, user management, etc.).
> > > Support multi-tenancy for a SaaS-type environment.
> > > Would Cassandra be suitable for this type of application?
> > >
> > >
> > > Thanks in advance.
> >
> > Most likely, it is technically a fine fit. But Cassandra is very
> > early-stage software, so you should expect that the documentation will
> > not always be clear and things will change from version to version. If
> > you are not extremely self-reliant, you may find it a frustrating
> > experience. Unless you are confident you will have trouble scaling
> > traditional technologies, it might not make business sense.
> >
> > Paul Prescod
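
For readers following the thread, the precompute-on-write idea discussed above (the write path updates every index, and the front end only reads an already-ordered index for the top page) might be sketched roughly as below. This is a minimal illustration only, assuming a plain in-memory stand-in for the stores. The names `ForumIndex`, `record_post`, and `top_threads` are hypothetical; they are not from the thread and are not part of any Cassandra or Redis API.

```python
from collections import defaultdict


class ForumIndex:
    """In-memory stand-in for precomputed indices (hypothetical, not Cassandra/Redis)."""

    def __init__(self):
        # thread_id -> list of post ids, appended in write order
        self.posts_by_thread = defaultdict(list)
        # Stand-ins for sorted sets: thread_id -> score
        self.latest_score = {}                  # score = time of the last post
        self.activity_score = defaultdict(int)  # score = number of posts

    def record_post(self, thread_id, post_id, timestamp):
        """Write path: update every index once, so reads need no joins."""
        self.posts_by_thread[thread_id].append(post_id)
        self.latest_score[thread_id] = timestamp
        self.activity_score[thread_id] += 1

    def top_threads(self, by="latest", limit=10):
        """Read path: thread ids ordered by precomputed score, highest first."""
        scores = self.latest_score if by == "latest" else self.activity_score
        ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        return [thread_id for thread_id, _ in ranked[:limit]]


index = ForumIndex()
index.record_post("t1", "p1", timestamp=100)
index.record_post("t2", "p2", timestamp=200)
index.record_post("t1", "p3", timestamp=150)
front_page_latest = index.top_threads(by="latest")
front_page_active = index.top_threads(by="most_active")
```

A real sorted set keeps its members ordered as scores are updated, so the read is a simple range query; the `sorted` call here only stands in for that behaviour.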