RE: Support for ad-hoc query

SEAN_R_DURITY Fri, 12 Jun 2015 07:08:37 -0700

I will note here that the limitations on ad-hoc querying (and aggregates) make 
it much more difficult to deal with data quality problems, QA testing, and 
similar efforts, especially where people are used to a more relational, ad-hoc 
model. We have often had to extract data from Cassandra to Hadoop for querying 
by hive.

Example: “We found a few records with incorrect data. How many more records 
like that are out there?”

Sean Durity

From: Peter Lin [mailto:wool...@gmail.com]
Sent: Wednesday, June 10, 2015 8:17 AM
To: user@cassandra.apache.org
Subject: Re: Support for ad-hoc query

I'll second Jack's detailed response and add that you really should do some 
discovery to figure out what kinds of queries you may need to support.
It might not be possible and often that is the case, but it's worth while to 
ask the end users what kind of reports they need to run. Allowing arbitrary 
ad-hoc queries is a known anti-pattern for cassandra. If the system needs to 
query multiple cf to derive/calculate some result, using Cassandra alone isn't 
going to do it. You'll need some other system to give you better query 
capabilities like Hive.
If you need data warehouse like features, look at http://www.kylin.io/ . They 
are doing some interesting things.
peter

On Wed, Jun 10, 2015 at 7:58 AM, Jack Krupansky 
<jack.krupan...@gmail.com<mailto:jack.krupan...@gmail.com>> wrote:
Knowing your queries in advance is a hard-core requirement for effective 
deployment of Cassandra. Ad hoc queries are a very clear anti-pattern for 
Cassandra. DSE Search does provide support for advanced, complex, and ad hoc 
queries. Stratio and TupleJump Stargate can also be used.

Back to the question of what you mean by ad hoc queries:

1. Do you expect real-time results, like sub-second, or are these long-running 
queries that might take seconds, 10 seconds or more, or even minutes to run?
2. Will they be very rare or quite frequent - how much load do you expect them 
to place on the cluster?
3. How complex do you expect them to be - how many clauses and operators?
4. What is their net cardinality - are they selecting just a few rows or many 
rows?
5. Do they have individual query clauses that select many rows even if the net 
combination of all select clauses is not so many rows?

The requirement to perform advanced, complex, and ad hoc queries using DSE 
Search or the other techniques will almost certainly require that you use 
moderately more capable hardware, especially more RAM, for each node, and 
probably more nodes as well to reduce the row count per node since ad hoc 
queries will tend to be compute-intensive based on number of rows on the node.

Yes, it can be done. No, it is not free or cheap. And, no, it does not come out 
of the box for a non-DSE Cassandra release. And, yes, you must address this 
requirement before deployment, not after deployment.

-- Jack Krupansky

On Wed, Jun 10, 2015 at 1:18 AM, Srinivasa T N 
<seen...@gmail.com<mailto:seen...@gmail.com>> wrote:
Thanks guys for the inputs.
By ad-hoc queries I mean that I don't know the queries during cf design time.  
The data may be from single cf or multiple cf.  (This feature maybe required if 
I want to do analysis on the data stored in cassandra, do you have any better 
ideas)?
Regards,
Seenu.

On Tue, Jun 9, 2015 at 5:57 PM, Peter Lin 
<wool...@gmail.com<mailto:wool...@gmail.com>> wrote:

what do you mean by ad-hoc queries?
Do you mean simple queries against a single column family aka table?
Or do you mean MDX style queries that looks at multiple tables?
if it's MDX style queries, many people extract data from Cassandra into a data 
warehouse that support multi-dimensional cubes. This works well when the 
extracted data is a small subset and fits neatly in a data warehouse.
As others have stated, Cassandra isn't great at ad-hoc. For MDX style queries, 
Cassandra wasn't designed for it. One thing we've done for our own project is 
to combine solr with our own fuzzy index to make ad-hoc queries against a 
single table more friendly.

On Tue, Jun 9, 2015 at 2:38 AM, Srinivasa T N 
<seen...@gmail.com<mailto:seen...@gmail.com>> wrote:
Hi All,
   I have an web application running with my backend data stored in cassandra.  
Now I want to do some analysis on the data stored which requires some ad-hoc 
queries fired on cassandra.  How can I do the same?
Regards,
Seenu.

________________________________

The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 
and conditions expressed in any applicable governing The Home Depot terms of 
business or client engagement letter. The Home Depot disclaims all 
responsibility and liability for the accuracy and content of this attachment 
and for any damages or losses arising from any inaccuracies, errors, viruses, 
e.g., worms, trojan horses, etc., or other items of a destructive nature, which 
may be contained in this attachment and shall not be liable for direct, 
indirect, consequential or special damages in connection with this e-mail 
message or its attachment.

RE: Support for ad-hoc query

Reply via email to