Re: Support for ad-hoc query

Jack Krupansky Wed, 10 Jun 2015 04:58:29 -0700

Knowing your queries in advance is a hard-core requirement for effective
deployment of Cassandra. Ad hoc queries are a very clear anti-pattern for
Cassandra. DSE Search does provide support for advanced, complex, and ad
hoc queries. Stratio and TupleJump Stargate can also be used.

Back to the question of what you mean by ad hoc queries:

1. Do you expect real-time results, like sub-second, or are these
long-running queries that might take seconds, 10 seconds or more, or even
minutes to run?
2. Will they be very rare or quite frequent - how much load do you expect
them to place on the cluster?
3. How complex do you expect them to be - how many clauses and operators?
4. What is their net cardinality - are they selecting just a few rows or
many rows?
5. Do they have individual query clauses that select many rows even if the
net combination of all select clauses is not so many rows?
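
To make question 5 concrete, here is a rough Python sketch (not Cassandra or
DSE Search code) that counts how many rows each clause matches on its own
versus how many rows the full combination matches. The toy rows, the `score`
column, and the `clause_cardinalities` helper are all made up for illustration.

```python
# Toy data set: 10,000 rows with one numeric column (hypothetical schema).
rows = [{"id": i, "score": i} for i in range(10_000)]

# Two query clauses, each of which is broad when evaluated alone.
clauses = {
    "score >= 5000": lambda r: r["score"] >= 5000,
    "score < 5010": lambda r: r["score"] < 5010,
}


def clause_cardinalities(rows, clauses):
    """Return (rows matched by each clause alone, rows matched by all clauses)."""
    per_clause = {
        name: sum(1 for r in rows if pred(r)) for name, pred in clauses.items()
    }
    net = sum(1 for r in rows if all(pred(r) for pred in clauses.values()))
    return per_clause, net


per_clause, net = clause_cardinalities(rows, clauses)
print(per_clause, net)
```

Each clause alone matches roughly half of the rows, while the combination
matches only 10. A query engine may still have to touch the larger
per-clause sets to arrive at that small net result.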

Supporting advanced, complex, and ad hoc queries with DSE Search or the other
tools will almost certainly require moderately more capable hardware for each
node, especially more RAM. You will probably need more nodes as well, to
reduce the row count per node, since ad hoc queries tend to be
compute-intensive in proportion to the number of rows on the node.
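
As a back-of-the-envelope illustration of how adding nodes lowers the row
count per node, here is a small Python sketch. It assumes rows are spread
evenly across the cluster and that every row is stored on
`replication_factor` nodes; the `rows_per_node` helper and the numbers are
hypothetical, not measurements.

```python
def rows_per_node(total_rows: int, replication_factor: int, node_count: int) -> float:
    """Estimate rows stored on each node, assuming perfectly even distribution."""
    if node_count < replication_factor:
        raise ValueError("node_count must be at least replication_factor")
    # Every row is stored replication_factor times, spread over all nodes.
    return total_rows * replication_factor / node_count


# Hypothetical cluster: one billion rows, replication factor 3.
for nodes in (6, 12, 24):
    print(nodes, rows_per_node(1_000_000_000, 3, nodes))
```

Under these assumptions, doubling the node count halves the number of rows
each node has to search.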

Yes, it can be done. No, it is not free or cheap. And, no, it does not come
out of the box for a non-DSE Cassandra release. And, yes, you must address
this requirement before deployment, not after deployment.

-- Jack Krupansky

On Wed, Jun 10, 2015 at 1:18 AM, Srinivasa T N <[email protected]> wrote:

> Thanks guys for the inputs.
>
> By ad-hoc queries I mean that I don't know the queries at cf design
> time.  The data may come from a single cf or multiple cfs.  (This feature may
> be required if I want to analyze the data stored in Cassandra. Do you
> have any better ideas?)
>
> Regards,
> Seenu.
>
> On Tue, Jun 9, 2015 at 5:57 PM, Peter Lin <[email protected]> wrote:
>
>>
>> what do you mean by ad-hoc queries?
>>
>> Do you mean simple queries against a single column family aka table?
>>
>> Or do you mean MDX style queries that look at multiple tables?
>>
>> If it's MDX style queries, many people extract data from Cassandra into a
>> data warehouse that supports multi-dimensional cubes. This works well when
>> the extracted data is a small subset and fits neatly in a data warehouse.
>>
>> As others have stated, Cassandra isn't great at ad-hoc queries, and it
>> wasn't designed for MDX style queries. One thing we've done for our own
>> project is to combine Solr with our own fuzzy index to make ad-hoc queries
>> against a single table more friendly.
>>
>>
>>
>> On Tue, Jun 9, 2015 at 2:38 AM, Srinivasa T N <[email protected]> wrote:
>>
>>> Hi All,
>>>    I have a web application running with my backend data stored in
>>> Cassandra.  Now I want to do some analysis on the stored data, which
>>> requires some ad-hoc queries fired on Cassandra.  How can I do that?
>>>
>>> Regards,
>>> Seenu.
>>>
>>
>>
>
