Interesting articles... (changing the subject line to broaden the scope)

http://codemonkeyism.com/dark-side-nosql/
http://www.reportsanywhere.com/pebble/2010/04/16/1271437740000.html

These articulate the exact challenge we're trying to overcome.

-brian

On Fri, Jan 20, 2012 at 12:57 PM, Brian O'Neill <b...@alumni.brown.edu> wrote:

> Not terribly large....
> ~50 million rows, each row has ~100-300 columns.
>
> But big enough that a map/reduce job takes longer than users would like.
>
> Actually maybe that is another question...
> Does anyone have any benchmarks running map/reduce against Cassandra?
> (even a simple count or copy-CF benchmark would be helpful)
>
> -brian
>
> On Fri, Jan 20, 2012 at 12:41 PM, Zach Richardson <
> j.zach.richard...@gmail.com> wrote:
>
>> How much data do you think you will need ad hoc query ability for?
>>
>> On Fri, Jan 20, 2012 at 11:28 AM, Brian O'Neill <b...@alumni.brown.edu> wrote:
>>
>>> I can't remember if I asked this question before, but....
>>>
>>> We're using Cassandra as our transactional system, and building up quite
>>> a library of map/reduce jobs that perform data quality analysis,
>>> statistics, etc.
>>> (> 100 jobs now)
>>>
>>> But... we are still struggling to provide an "ad-hoc" query mechanism
>>> for our users.
>>>
>>> To fill that gap, I believe we still need to materialize our data in an
>>> RDBMS.
>>>
>>> Anyone have any ideas? Better ways to support ad-hoc queries?
>>>
>>> Effectively, our users want to be able to select count(distinct Y) from
>>> X group by Z,
>>> where Y and Z are arbitrary columns of rows in X.
>>>
>>> We believe we can create column families with different key structures
>>> (using Y and Z as row keys), but some column names we don't know / can't
>>> predict ahead of time.
>>>
>>> Are people doing bulk exports?
>>> Anyone trying to keep an RDBMS in sync in real-time?
>>>
>>> -brian

--
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/
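
For anyone following the thread, the query shape quoted above (select count(distinct Y) from X group by Z, with Y and Z chosen at query time) can be sketched in a few lines. This is only an in-memory illustration, assuming each row is a sparse dict of column name to value; the function name and the sample rows are hypothetical, and it is not how Cassandra or a map/reduce job would execute the query.

```python
from collections import defaultdict


def count_distinct_group_by(rows, y, z):
    """Return {z_value: number of distinct y values} for the given rows.

    rows: iterable of dicts (sparse rows; columns may be missing).
    y, z: column names chosen at query time.
    """
    seen = defaultdict(set)
    for row in rows:
        # Rows are sparse, so skip any row that lacks either column.
        if y in row and z in row:
            seen[row[z]].add(row[y])
    return {key: len(values) for key, values in seen.items()}


# Hypothetical sample data, for illustration only.
rows = [
    {"state": "PA", "specialty": "cardiology", "name": "a"},
    {"state": "PA", "specialty": "oncology"},
    {"state": "PA", "specialty": "cardiology"},
    {"state": "NJ", "specialty": "oncology"},
    {"state": "NJ"},  # no "specialty" column, so it is ignored
]

result = count_distinct_group_by(rows, y="specialty", z="state")
print(result)
```

The per-group set of seen values is the part that gets expensive at ~50 million rows, which is one reason a full scan for every ad-hoc query is slow.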