Not terribly large....
~50 million rows, each row has ~100-300 columns.

But big enough that a map/reduce job takes longer than users would like.

Actually maybe that is another question...
Does anyone have any benchmarks running map/reduce against Cassandra?
(even a simple count / or copy CF benchmark would be helpful)
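For context, the count-distinct/group-by shape my users keep asking for (quoted below) boils down to a trivial map/reduce pattern. Here's a minimal in-memory sketch of that shape in Python — names like X/Y/Z are placeholders from the discussion, not a real schema, and this says nothing about Cassandra I/O cost, which is the part we'd want benchmarked:

```python
from collections import defaultdict

def map_phase(rows, y, z):
    # Emit one (Z-value, Y-value) pair per row.
    for row in rows:
        yield (row[z], row[y])

def reduce_phase(pairs):
    # Collect the distinct Y values seen under each Z key, then count them:
    # equivalent to "select count(distinct Y) from X group by Z".
    groups = defaultdict(set)
    for z_value, y_value in pairs:
        groups[z_value].add(y_value)
    return {z: len(ys) for z, ys in groups.items()}

rows = [
    {"y": "a", "z": 1},
    {"y": "a", "z": 1},
    {"y": "b", "z": 1},
    {"y": "a", "z": 2},
]
result = reduce_phase(map_phase(rows, "y", "z"))
# result == {1: 2, 2: 1}
```

The compute is cheap; the open question is how fast a full column-family scan feeding the map phase can go.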

-brian

On Fri, Jan 20, 2012 at 12:41 PM, Zach Richardson <
j.zach.richard...@gmail.com> wrote:

> How much data do you think you will need ad hoc query ability for?
>
>
> On Fri, Jan 20, 2012 at 11:28 AM, Brian O'Neill <b...@alumni.brown.edu> wrote:
>
>>
>> I can't remember if I asked this question before, but....
>>
>> We're using Cassandra as our transactional system, and building up quite
>> a library of map/reduce jobs that perform data quality analysis,
>> statistics, etc.
>> (> 100 jobs now)
>>
>> But... we are still struggling to provide an "ad-hoc" query mechanism for
>> our users.
>>
>> To fill that gap, I believe we still need to materialize our data in an
>> RDBMS.
>>
>> Anyone have any ideas?  Better ways to support ad-hoc queries?
>>
>> Effectively, our users want to be able to run: select count(distinct Y)
>> from X group by Z, where Y and Z are arbitrary columns of rows in X.
>>
>> We believe we can create column families with different key structures
>> (using Y and Z as row keys), but some column names we don't know / can't
>> predict ahead of time.
>>
>> Are people doing bulk exports?
>> Anyone trying to keep an RDBMS in sync in real-time?
>>
>> -brian
>>
>> --
>> Brian ONeill
>> Lead Architect, Health Market Science (http://healthmarketscience.com)
>> mobile:215.588.6024
>> blog: http://weblogs.java.net/blog/boneill42/
>> blog: http://brianoneill.blogspot.com/
>>
>>
>


-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/
