Re: Cassandra's 2GB row limit and indexing

Jonathan Ellis Thu, 27 May 2010 06:53:09 -0700

Yes, CASSANDRA-16 (which is almost done for 0.7) will make this possible.

On Wed, May 26, 2010 at 7:52 PM, Richard West <r...@clearchaos.com> wrote:
> Hi all,
>
> I'm currently looking at new database options for a URL shortener in order
> to scale well with increased traffic as we add new features. Cassandra seems
> to be a good fit for many of our requirements, but I'm struggling a bit to
> find ways of designing certain indexes in Cassandra due to its 2GB row
> limit.
>
> The easiest example of this is that I'd like to create an index by the
> domain that shortened URLs are linking to, mostly for spam control so it's
> easy to grab all the links to any given domain. As far as I can tell the
> typical way to do this in Cassandra is something like:
>
> DOMAIN = { //columnfamily
>     thing.com { //row key
>         timestamp: "shorturl567", //column name: value
>         timestamp: "shorturl144",
>         timestamp: "shorturl112",
>         ...
>     }
>     somethingelse.com {
>         timestamp: "shorturl817",
>         ...
>     }
> }
>
> The values here are keys for another columnfamily containing various data on
> shortened URLs.
>
> The problem with this approach is that a popular domain (e.g. blogspot.com)
> could be used in many millions of shortened URLs, so would have that many
> columns and hit the row size limit mentioned at
> http://wiki.apache.org/cassandra/CassandraLimitations.
>
> Does anyone know an effective way to design this type of one-to-many index
> around this limitation (could be something obvious I'm missing)? If not, are
> the changes proposed for https://issues.apache.org/jira/browse/CASSANDRA-16
> likely to make this type of design workable?
>
> Thanks in advance for any advice,
>
> Richard
>
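A common workaround for wide rows like this is to split each domain's row into
buckets by putting a time bucket into the row key, so that no single row grows
without bound. The sketch below only models the idea in memory; it is not a
Cassandra client. The names bucket_row_key, DomainIndex and BUCKET_SECONDS, and
the one-day bucket width, are assumptions made for illustration and do not come
from the thread.

```python
from collections import defaultdict

# Hypothetical bucket width: one day of links per row. Tune it so that the
# busiest domain's bucket stays far below the row size limit.
BUCKET_SECONDS = 86_400


def bucket_row_key(domain, timestamp, bucket_seconds=BUCKET_SECONDS):
    """Build a row key of the form '<domain>:<bucket number>'."""
    return f"{domain}:{int(timestamp) // bucket_seconds}"


class DomainIndex:
    """In-memory stand-in for the DOMAIN column family (illustration only)."""

    def __init__(self, bucket_seconds=BUCKET_SECONDS):
        self.bucket_seconds = bucket_seconds
        self.rows = defaultdict(dict)  # row key -> {column name: value}

    def add(self, domain, timestamp, short_url):
        key = bucket_row_key(domain, timestamp, self.bucket_seconds)
        # The column name pairs the timestamp with the short URL so that two
        # links shortened at the same instant do not overwrite each other.
        self.rows[key][(timestamp, short_url)] = short_url

    def links(self, domain, start, end):
        """Yield short URLs with start <= timestamp <= end, oldest first."""
        first = int(start) // self.bucket_seconds
        last = int(end) // self.bucket_seconds
        for bucket in range(first, last + 1):
            # One row read per bucket; .get avoids creating empty rows.
            row = self.rows.get(f"{domain}:{bucket}", {})
            for (ts, _), short_url in sorted(row.items()):
                if start <= ts <= end:
                    yield short_url
```

The trade-off is that grabbing every link to a domain means reading one row per
bucket in the time range instead of a single row, so the bucket width has to
balance row size against the number of reads.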




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
