Hi Riyad,

No problem. Because it is a new library, I cannot provide a large list of production deployments. However, there are various reasons you should have confidence in the library:
1/ Firstly, the background of Pelops is that it is being used as the basis of a serious commercial project that makes very heavy use of Cassandra. The project itself is best described as a social network/games venture aimed at kids aged 6-13. I cannot go into commercial details because the information is sensitive. All I can say is that scalability is very important to this venture; that it has sufficient funds to ensure that, whether or not it is ultimately successful, it will have to support complex and extensive data processing for large numbers of users; and that the library has been created, and will continue to be developed, on the basis that we will suffer substantial commercial pain if it has bugs or deficiencies.

I personally wrote most of the library, and have 18 years of solid programming experience. Every day, large amounts of Cassandra code are being written here using the library; if and where problems appear, they are immediately reported to me and fixed with urgency. Once the venture is in production (hopefully fewer than ten weeks away now), that will provide the best affirmation, but until then the above will have to suffice. (If anyone else is using Pelops successfully, it would be great to hear.)

2/ Before going into some more technical detail, I just want to reiterate that fundamentally Pelops is a wrapper around the Thrift API. Therefore, it does not have any particular bearing on the scalability of Cassandra systems per se. However, we do try to add value through our connection pooling and load balancing strategy, and that is something I will explore a little more below.

3/ Connection pooling and load balancing: As you know, one of the features of Pelops is that it separates data processing from lower-level details like connection pooling.
One benefit of this approach is that code becomes much more readable and less bug prone, but a really big benefit is that Pelops is able to "lend" connections to data processing code only for the moments that calls to Thrift are in progress. This makes it possible to perform client-side load balancing by counting how many "outstanding" Thrift API calls exist to each node, and always choosing to perform operations against the node that has the smallest number of Thrift calls running. This is the best strategy available without actually knowing the CPU/memory etc. load on the Cassandra nodes, which anyway has various pitfalls and would probably offer only an enhancement, not an alternative system.

Using this strategy adds a little to the complexity of the connection pooling system, which of course increases the surface area for mistakes. It has been working for us, but I do invite people to code review it and will be very happy to answer questions and address any issues found.

In terms of how the existing connection pooling system can be improved, I think in general it is pretty much the best option available now, but there is one area where I plan an improvement. At the moment, Pelops maintains a "context" for each node it knows about in the Cassandra cluster. Each context has a refiller thread, which creates and caches new connections to the Cassandra node in question, with the aim of ensuring that enough free connections are available for spikes in usage. You can configure a target number of connections, a minimum number of free connections, and a maximum number of connections through the Policy. The area I see for improvement at the moment is that each context has only a single "pool refiller" thread responsible for creating new free connections when the number falls below a low water mark.
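
For anyone who wants to see the idea in code, the "fewest outstanding calls" selection described above can be sketched roughly as follows. This is only a minimal illustration under my own assumptions, not the Pelops source: the names LeastOutstandingSketch, NodeContext, borrow and release are hypothetical, and a real pool would also have to handle connection reuse, node failures and races between threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of least-outstanding-calls balancing; not the Pelops implementation.
class LeastOutstandingSketch {

    /** Per-node bookkeeping: how many Thrift calls are currently in flight. */
    public static final class NodeContext {
        public final String host;
        public final AtomicInteger outstanding = new AtomicInteger();

        NodeContext(String host) {
            this.host = host;
        }
    }

    private final List<NodeContext> nodes = new ArrayList<>();

    public LeastOutstandingSketch(List<String> hosts) {
        for (String host : hosts) {
            nodes.add(new NodeContext(host));
        }
    }

    /**
     * Pick the node with the fewest in-flight calls and count the new call
     * against it. Ties go to the node listed first. The scan and the increment
     * are not one atomic step, so under contention the choice is approximate.
     */
    public NodeContext borrow() {
        NodeContext best = nodes.get(0);
        for (NodeContext node : nodes) {
            if (node.outstanding.get() < best.outstanding.get()) {
                best = node;
            }
        }
        best.outstanding.incrementAndGet();
        return best;
    }

    /** Hand the node back as soon as the Thrift call has completed. */
    public void release(NodeContext node) {
        node.outstanding.decrementAndGet();
    }
}
```

In this sketch the caller would wrap each Thrift call in borrow() and release(), with release() in a finally block, so that the counters only reflect calls that are really in progress.
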
It would be better if the refiller were multi-threaded, since in extreme situations where the buffer is depleted rapidly, it could be restored more quickly (in the synchronous model presented by Thrift, creating new connections is a blocking operation). This is quite a minor improvement, but I plan on addressing it shortly.

Hope this helps
Best
Dominic

On 11 June 2010 16:11, Riyad Kalla <rka...@gmail.com> wrote:

> Dominic,
>
> I like the API; reads clearly and fairly intuitive.
>
> I think Ian was asking about what large-scale production deployments Pelops
> has been deployed in that you could speak to -- he's trying to get a
> confidence index and I am interested as well ;)
>
> Best,
> Riyad
>
>
> On Fri, Jun 11, 2010 at 7:04 AM, Dominic Williams <
> thedwilli...@googlemail.com> wrote:
>
>> Hi good question.
>>
>> The scalability of Pelops is dependent on Cassandra, not the library
>> itself. The library aims to provide a more effective access layer on top of
>> the Thrift API.
>>
>> The library does perform connection pooling, and you can control the size
>> of the pool and other parameters using a policy object. But connection
>> pooling itself does not increase scalability, only efficiency.
>>
>> Hope this helps.
>> Best, Dominic
>>
>> On 11 June 2010 14:47, Ian Soboroff <isobor...@gmail.com> wrote:
>>
>>> Sounds nice. Can you say something about the scales at which you've used
>>> this library? Both write and read load? Size of clusters and size of data?
>>>
>>> Ian
>>>
>>>
>>> On Fri, Jun 11, 2010 at 9:41 AM, Dominic Williams <
>>> thedwilli...@googlemail.com> wrote:
>>>
>>>> Pelops is a new high quality Java client library for Cassandra.
>>>>
>>>> It has a design that:
>>>> * reveals the full power of Cassandra through an elegant "Mutator and
>>>> Selector" paradigm
>>>> * generates better, cleaner, less bug prone code
>>>> * reduces the learning curve for new users
>>>> * drives rapid application development
>>>> * encapsulates advanced pooling algorithms
>>>>
>>>> An article introducing Pelops can be found at
>>>>
>>>> http://ria101.wordpress.com/2010/06/11/pelops-the-beautiful-cassandra-database-client-for-java/
>>>>
>>>> Thanks for reading.
>>>> Best, Dominic
>>>>
>>>
>>>
>>
>