I suppose he should buy http://shop.oreilly.com/product/0636920010852.do to get an idea of what Cassandra can and can't do. That's just my personal opinion.
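
In the meantime, here is a rough sketch of the static and dynamic column-family patterns Chris describes in the quoted message below, applied to the login-log case from the original question. It is plain Python with no Cassandra client: dicts stand in for column families, and every name in it (DynamicRow, record_login, logins_between, the sample data) is made up for illustration. The one property it relies on is that the columns within a row are kept sorted by column name, which is what makes a time-range slice over one row cheap.

```python
import json
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict
from datetime import datetime

# Static usage: one row per object, one column per non-null property.
# (Hypothetical data; a dict of dicts stands in for the column family.)
static_users = {
    "alice": {"name": "Alice", "email": "alice@example.com"},
}


class DynamicRow:
    """One wide row: column names are kept sorted, values are opaque bytes."""

    def __init__(self):
        self._names = []    # sorted column names (here: login timestamps)
        self._values = {}   # column name -> serialized object

    def insert(self, name, value):
        if name not in self._values:
            insort(self._names, name)  # keep column names ordered
        self._values[name] = value

    def slice(self, start, end):
        """Return (name, value) pairs with start <= name <= end, in order."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# Dynamic usage: row key = username (primary key),
# column name = login time (secondary key), column value = serialized object.
login_cf = defaultdict(DynamicRow)


def record_login(username, when, ip, protocol):
    blob = json.dumps({"ip": ip, "protocol": protocol}).encode("utf-8")
    login_cf[username].insert(when, blob)


def logins_between(username, start, end):
    """Search by user and time range: a slice over a single row."""
    row = login_cf.get(username)
    if row is None:
        return []
    return [(when, json.loads(blob)) for when, blob in row.slice(start, end)]


record_login("alice", datetime(2012, 2, 17, 9, 35), "10.0.0.1", "imap")
record_login("alice", datetime(2012, 2, 17, 10, 7), "10.0.0.2", "pop3")
record_login("alice", datetime(2012, 3, 1, 8, 0), "10.0.0.1", "imap")

february = logins_between(
    "alice", datetime(2012, 2, 1), datetime(2012, 2, 29, 23, 59, 59)
)
```

Note that a slice like this touches a single row (one user). It says nothing about splitting a search across the cluster, which is what the original question asks about.
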
--
francesco.tangari....@gmail.com
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Friday, 17 February 2012, at 17:59, Chris Gerken wrote:

> In response to an offline question…
>
> There are two usage patterns for Cassandra column families, static and
> dynamic. With both approaches you store objects of a given type into a column
> family.
>
> With static usage the object type you're persisting has a single key and each
> row in the column family maps to a single object. The value of an object's
> key is stored in the row key and each of the object's properties is stored in
> a column whose name is the name of the property and whose value is the
> property value. There are the same number of columns in a row as there are
> non-null property values. This usage is very much like traditional relational
> database usage.
>
> With dynamic usage the object type to be persisted has two keys (I'll get to
> composite keys in a bit). With this approach the value of an object's primary
> key is stored as a row key and the entire object is stored in a single column
> whose name is the value of the object's secondary key and whose value is the
> entire object (serialized into a ByteBuffer). This results in persisting
> potentially many objects in a single row. All of those objects have the same
> primary key and there are as many columns as there are objects with the same
> primary key. An example of this approach is a time series column family in
> which each row holds weather readings for a different city and each column in
> a row holds all of the weather observations for that city at a certain time.
> The timestamp is used as a column name and an object holding all the
> observations is serialized and stored in the corresponding column value.
>
> Cassandra is a really powerful database, but it excels performance-wise with
> reading and writing time series data stored using a dynamic column family.
>
> There are variations of the above patterns.
> You can use composite types to
> define a row key or column name that are made up of values of multiple keys,
> for example.
>
> I gave a presentation on the topic of Cassandra patterns recently to the
> Austin Cassandra Meetup. You can find my charts there in the archives or
> posted to my box at the linkedin site below…. or contact me offline.
>
> To bring this back to the original question. Asking for the ability to apply
> a Java method to selected rows makes sense for static column families, but I
> think the more general need is to be able to apply a Java method to selected
> persisted objects in a column family regardless of static or dynamic usage.
> While I'm on my soapbox, I think this requirement applies to Pig support as
> well.
>
> thx
>
> Chris Gerken
>
> chrisger...@mindspring.com (mailto:chrisger...@mindspring.com)
> 512.587.5261
> http://www.linkedin.com/in/chgerken
>
> On Feb 17, 2012, at 10:07 AM, Chris Gerken wrote:
>
> > Don,
> >
> > That's a good idea, but you have to be careful not to preclude the use of
> > dynamic column families (e.g. CF's with time series-like schemas) which is
> > what Cassandra's best at. The right approach is to build your own
> > "ORM"/persistence layer (or generate one with some tools) that can hide the
> > API differences between static and dynamic CF's. Once you're there, hadoop
> > and Pig both come very close to what you're asking for.
> >
> > In other words, you should be asking for a means to apply a Java method to
> > selected objects (not rows) that are persisted in a Cassandra column family.
> >
> > thx
> >
> > - Chris
> >
> > Chris Gerken
> >
> > chrisger...@mindspring.com (mailto:chrisger...@mindspring.com)
> > 512.587.5261
> > http://www.linkedin.com/in/chgerken
> >
> > On Feb 17, 2012, at 9:35 AM, Don Smith wrote:
> >
> > > Are there plans to build-in some sort of map-reduce framework into
> > > Cassandra and CQL?
> > > It seems that users should be able to apply a Java
> > > method to selected rows in parallel on the distributed Cassandra JVMs. I
> > > believe Solandra uses such an integration.
> > >
> > > Don
> > > ________________________________________
> > > From: Alessio Cecchi [ales...@skye.it (mailto:ales...@skye.it)]
> > > Sent: Friday, February 17, 2012 4:42 AM
> > > To: user@cassandra.apache.org (mailto:user@cassandra.apache.org)
> > > Subject: General questions about Cassandra
> > >
> > > Hi,
> > >
> > > we have developed software that stores logs from mail servers in MySQL,
> > > but for huge environments we are developing a version that stores this
> > > data in HBase. Raw logs are first normalized once a day, so the output
> > > looks like this:
> > >
> > > username,date of login, IP Address, protocol
> > > username,date of login, IP Address, protocol
> > > username,date of login, IP Address, protocol
> > > [...]
> > >
> > > and are then inserted into the database.
> > >
> > > As I was saying, for huge installations (from 1 to 10 million logins
> > > per day, kept for 12 months) we are working with HBase, but I would also
> > > consider Cassandra.
> > >
> > > The advantage of HBase is MapReduce, which makes searching the logs very
> > > fast by splitting the "query" concurrently across multiple hosts.
> > >
> > > Queries will be launched from a web interface (there will be few requests
> > > per day) and the search keys are user and time range.
> > >
> > > But Cassandra seems less complex to manage and simpler to run, so I want
> > > to evaluate it instead of HBase.
> > >
> > > My question is: can Cassandra also split a "query" over the cluster like
> > > MapReduce? From what I read online, Cassandra seems fast at inserting data
> > > but slower than HBase at queries. Is that really so?
> > >
> > > We do not want to install Hadoop over Cassandra.
> > >
> > > Any suggestion is welcome :-)
> > >
> > > --
> > > Alessio Cecchi is:
> > > @ ILS -> http://www.linux.it/~alessice/
> > > on LinkedIn -> http://www.linkedin.com/in/alessice
> > > GNU/Linux Systems Support -> http://www.cecchi.biz/
> > > @ PLUG -> former President, now senator for life, http://www.prato.linux.it
> > > @ LOLUG -> Member http://www.lolug.net