I certainly agree with "difficult to predict". There is a Danish
proverb, which goes "it's difficult to make predictions, especially
about the future".
My point was that it's equally difficult with NoSQL and RDBMS.
The latter requires indexing to operate well, and that's a potential
performance problem.
On 1/20/2012 7:55 PM, Mohit Anchlia wrote:
I think the problem arises when you need to run ad hoc queries on a column
whose data is not denormalized. In most cases it's
difficult to predict the type of query that would be required.
Another way of solving this could be to index the fields in a search engine.
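
To make the search-engine idea concrete, here is a minimal sketch of the kind of structure such an engine maintains: an inverted index from (field, value) pairs to row keys. It is an in-memory illustration only, not the API of any particular search product; the class name FieldIndex and the sample fields "state" and "specialty" are hypothetical.

```python
from collections import defaultdict


class FieldIndex:
    """Toy inverted index: (field, value) -> set of row keys (hypothetical helper)."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, row_key, row):
        # Index every field of the row, so no query pattern has to be known up front.
        for field, value in row.items():
            self._postings[(field, value)].add(row_key)

    def search(self, **criteria):
        # Intersect the posting sets of all requested field=value pairs.
        postings = [self._postings.get(item, set()) for item in criteria.items()]
        return set.intersection(*postings) if postings else set()


index = FieldIndex()
index.add("r1", {"state": "PA", "specialty": "cardiology"})
index.add("r2", {"state": "PA", "specialty": "oncology"})
index.add("r3", {"state": "NJ", "specialty": "oncology"})

matches = index.search(state="PA", specialty="oncology")
```

Because every field is indexed at write time, an unanticipated field=value lookup needs no schema change, at the cost of extra storage and write work.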
On Fri, Jan 20, 2012 at 7:37 PM, Maxim Potekhin <potek...@bnl.gov> wrote:
What makes you think that RDBMS will give you acceptable performance?
I guess you will try to index it to death (because otherwise the "ad hoc"
queries won't work well if at all), and at this point you may be hit with a
performance penalty.
It may be a good idea to interview users and build denormalized views in
Cassandra, maybe on a separate "look-up" cluster. A few percent of users
will be unhappy, but you'll find it hard to do better. I'm speaking from my
experience with an industrial-strength RDBMS which doesn't scale very well
for what you call "ad-hoc" queries.
Regards,
Maxim
On 1/20/2012 9:28 AM, Brian O'Neill wrote:
I can't remember if I asked this question before, but....
We're using Cassandra as our transactional system, and building up quite a
library of map/reduce jobs that perform data quality analysis, statistics,
etc.
(> 100 jobs now)
But... we are still struggling to provide an "ad-hoc" query mechanism for
our users.
To fill that gap, I believe we still need to materialize our data in an
RDBMS.
Anyone have any ideas? Better ways to support ad-hoc queries?
Effectively, our users want to be able to select count(distinct Y) from X
group by Z.
Where Y and Z are arbitrary columns of rows in X.
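
For reference, the semantics of that query can be sketched in a few lines of Python over rows represented as dicts. This only pins down what the result should look like; it is not how a map/reduce job or an RDBMS would execute it. The function name and the sample columns "state" and "specialty" are hypothetical.

```python
from collections import defaultdict


def count_distinct_by_group(rows, y, z):
    """Emulate: SELECT z, COUNT(DISTINCT y) FROM rows GROUP BY z."""
    distinct = defaultdict(set)
    for row in rows:
        # Assumption: rows missing either column are skipped. (SQL would
        # instead put rows with a NULL z into a group of their own.)
        if y in row and z in row:
            distinct[row[z]].add(row[y])
    return {group: len(values) for group, values in distinct.items()}


rows = [
    {"state": "PA", "specialty": "cardiology"},
    {"state": "PA", "specialty": "cardiology"},
    {"state": "PA", "specialty": "oncology"},
    {"state": "NJ", "specialty": "oncology"},
    {"state": "NJ"},  # sparse row: no "specialty" column
]
result = count_distinct_by_group(rows, y="specialty", z="state")
```

Here result maps each value of Z to the number of distinct Y values seen with it.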
We believe we can create column families with different key structures
(using Y and Z as row keys), but some column names we don't know / can't
predict ahead of time.
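
One way to picture the alternate-key idea when column names are unknown is to maintain a view for every ordered pair of columns that actually appears in a row at write time. The sketch below uses plain dicts as a stand-in for column families and makes no Cassandra calls; the class PairwiseViews and the sample columns are hypothetical, and it assumes hashable column values.

```python
from collections import defaultdict
from itertools import permutations


class PairwiseViews:
    """In-memory stand-in for denormalized views keyed by (Z column, Y column)."""

    def __init__(self):
        # (z_column, y_column) -> z_value -> set of distinct y_values
        self._views = defaultdict(lambda: defaultdict(set))

    def write(self, row):
        # Update one view per ordered pair of columns present in this row,
        # so column names never need to be declared ahead of time.
        for (z_col, z_val), (y_col, y_val) in permutations(row.items(), 2):
            self._views[(z_col, y_col)][z_val].add(y_val)

    def count_distinct(self, y_col, z_col):
        # Equivalent of: SELECT z_col, COUNT(DISTINCT y_col) ... GROUP BY z_col
        view = self._views.get((z_col, y_col), {})
        return {z_val: len(y_vals) for z_val, y_vals in view.items()}


views = PairwiseViews()
views.write({"state": "PA", "specialty": "cardiology", "year": 2011})
views.write({"state": "PA", "specialty": "oncology", "year": 2011})
views.write({"state": "NJ", "specialty": "oncology", "year": 2012})

by_state = views.count_distinct(y_col="specialty", z_col="state")
```

The trade-off is write amplification: a row with n columns touches n * (n - 1) views, so this only looks reasonable for fairly narrow rows.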
Are people doing bulk exports?
Anyone trying to keep an RDBMS in sync in real time?
-brian
--
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/