Yes, #16 (which is almost done for 0.7) will make this possible.

On Wed, May 26, 2010 at 7:52 PM, Richard West <r...@clearchaos.com> wrote:
> Hi all,
>
> I'm currently looking at new database options for a URL shortener in order
> to scale well with increased traffic as we add new features. Cassandra seems
> to be a good fit for many of our requirements, but I'm struggling a bit to
> find ways of designing certain indexes in Cassandra due to its 2GB row
> limit.
>
> The easiest example of this is that I'd like to create an index by the
> domain that shortened URLs are linking to, mostly for spam control so it's
> easy to grab all the links to any given domain. As far as I can tell the
> typical way to do this in Cassandra is something like: -
>
> DOMAIN = { //columnfamily
>     thing.com { //row key
>         timestamp: "shorturl567", //column name: value
>         timestamp: "shorturl144",
>         timestamp: "shorturl112",
>         ...
>     }
>     somethingelse.com {
>         timestamp: "shorturl817",
>         ...
>     }
> }
>
> The values here are keys for another columnfamily containing various data on
> shortened URLs.
>
> The problem with this approach is that a popular domain (e.g. blogspot.com)
> could be used in many millions of shortened URLs, so would have that many
> columns and hit the row size limit mentioned at
> http://wiki.apache.org/cassandra/CassandraLimitations.
>
> Does anyone know an effective way to design this type of one-to-many index
> around this limitation (could be something obvious I'm missing)? If not, are
> the changes proposed for https://issues.apache.org/jira/browse/CASSANDRA-16
> likely to make this type of design workable?
>
> Thanks in advance for any advice,
>
> Richard
>
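
For illustration only, and not something proposed in this thread: one possible workaround until large-row support lands is to split each domain's index across several rows, so that no single row holds every column for a popular domain. The sketch below models the DOMAIN column family as a plain Python dict rather than talking to Cassandra. The bucket count, the "domain:bucket" row key format, and the function names are all assumptions made for the example.

```python
import zlib
from collections import defaultdict

# Hypothetical bucket count; a real value would depend on expected row sizes.
NUM_BUCKETS = 16

# In-memory stand-in for the DOMAIN column family:
# row key -> {column name (timestamp): value (short URL key)}
domain_index = defaultdict(dict)


def bucket_row_key(domain, short_url, num_buckets=NUM_BUCKETS):
    """Build a row key of the form 'domain:bucket'.

    crc32 is used because it gives the same result on every run,
    so a given short URL always maps to the same bucket row.
    """
    bucket = zlib.crc32(short_url.encode("utf-8")) % num_buckets
    return "%s:%d" % (domain, bucket)


def add_link(domain, timestamp, short_url):
    """Record a short URL under one of the domain's bucket rows."""
    row = bucket_row_key(domain, short_url)
    domain_index[row][timestamp] = short_url


def links_for_domain(domain, num_buckets=NUM_BUCKETS):
    """Read every bucket row for a domain and merge the columns by timestamp."""
    merged = []
    for bucket in range(num_buckets):
        row = domain_index.get("%s:%d" % (domain, bucket), {})
        merged.extend(row.items())
    return [short_url for _, short_url in sorted(merged)]


add_link("thing.com", 1, "shorturl567")
add_link("thing.com", 2, "shorturl144")
add_link("thing.com", 3, "shorturl112")
print(links_for_domain("thing.com"))
```

The trade-off in this sketch is that reading all links for a domain costs one lookup per bucket instead of one lookup in total, and the bucket count has to be fixed in advance or stored alongside the domain.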
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com