Re: C 2.1

Jack Krupansky Wed, 17 Sep 2014 21:51:04 -0700

Stratio and Stargate work at the Lucene level, while DSE/Solr works at the Solr 
level. DSE/Solr supports both inserts and queries from either Cassandra or 
Solr: a Solr server runs on each Cassandra node, and it indexes and queries the 
data on that node.


DSE/Solr does have CQL SELECT integration as well, but it accepts Solr query 
syntax rather than requiring a structured JSON format.

SELECT * FROM persons WHERE solr_query='name:jo* age:[20 TO 40]';

And your app can use SolrJ or raw HTTP requests to talk to Solr within DSE as 
well.

-- Jack Krupansky

From: Ram N 
Sent: Wednesday, September 17, 2014 5:25 PM
To: user 
Subject: Re: C 2.1


Thanks, Rob, for pointing me to that link. I haven't gone through all the 
JIRAs, but I gather they cover the advantages and disadvantages of secondary 
indexes in Cassandra, which I understand by now. They don't really explain why 
the default secondary index implementation didn't take the DSE/Solr approach.

Hi Jack,

That's good to know, but are there any pointers on how this differs from 
https://github.com/Stratio/stratio-cassandra or 
http://stargate-core.readthedocs.org/en/latest/intro.html ? 

--Ram


On Tue, Sep 16, 2014 at 10:32 PM, Jack Krupansky <j...@basetechnology.com> 
wrote:

  DSE/Solr is tightly integrated, so there is no “external” system to manage – 
insert data in CQL and within a few seconds it is available for query from Solr 
running in the same JVM as Cassandra. DSE/Solr indexes the data on each 
Cassandra node, and uses Cassandra’s cluster management for distributing 
queries across the cluster. In addition, Lucene (underneath Solr) is optimal 
for queries that span multiple fields. DSE/Solr supports CQL3 wide rows 
(clustering columns).

  -- Jack Krupansky

  From: Ram N 
  Sent: Monday, September 15, 2014 4:34 PM
  To: user 
  Subject: Re: C 2.1


  Jack, 

  Using Solr or an external search/indexing service is an option, but it 
increases the complexity of managing different systems. I am curious about the 
impact of using wide rows in a separate CF as an inverted index. If I 
understand Rob's response correctly, maintaining a separate CF for the index is 
better than using the default secondary index option. 

  It would be great to understand the design decision to go with the present 
secondary index implementation when the alternative is better. Even after 
looking at the JIRAs, the reasoning is still unclear to me :) 

  --R 





  On Mon, Sep 15, 2014 at 11:17 AM, Jack Krupansky <j...@basetechnology.com> 
wrote:

    If you’re indexing and querying on that many columns (dozens, or more than 
a handful), consider DSE/Solr, especially if you need to query on multiple 
columns in the same query.

    -- Jack Krupansky

    From: Robert Coli 
    Sent: Monday, September 15, 2014 11:07 AM
    To: user@cassandra.apache.org 
    Subject: Re: C 2.1

    On Sat, Sep 13, 2014 at 3:49 PM, Ram N <yrami...@gmail.com> wrote:

      Is 2.1 a production-ready release? 

    https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/


           DataStax Java driver - I get too confused with CQL and the 
underlying storage model. I am also not clear on the indexing structure of 
columns. Do CQL indexes create a separate CF for the index table? How is that 
different from maintaining an inverted index? Are both the same internally? 
Does the CQL statement that creates an index create a separate CF, with an 
atomic way of updating/managing it? Which one scales better (something like 
stargate-core or the ones done by usergrid, or the CQL approach)?

    New projects should use CQL. Access to underlying storage via Thrift is 
likely to eventually be removed from Cassandra.

      On a separate note, just curious: if I have thousands of columns in a 
given row and a fixed set of indexed columns (say 30 - 50 columns), which 
approach should I be taking? Will Cassandra scale with this many indexed 
columns? Are there any limits? How much of an impact do CQL indexes have on the 
system? I am also not sure if these use cases are the right choice for 
Cassandra, but would really appreciate any response on these. Thanks.

    Use of the "Secondary Indexes" feature is generally an anti-pattern in 
Cassandra. 30-50 indexed columns in a row sounds insane to me. However, 30-50 
column families into which one manually denormalizes does not sound too insane 
to me...
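
    As a rough illustration of the manual denormalization mentioned above, 
here is a toy in-memory sketch (not Cassandra code). The class, table, and 
column names are hypothetical, and two Python dicts stand in for a base column 
family and a hand-maintained index column family. In a real cluster the two 
writes would go to two separate tables, and keeping them consistent would be 
the application's job.

```python
from collections import defaultdict


class UsersWithManualIndex:
    """Toy model: a base table plus one hand-maintained lookup table.

    All names here are hypothetical and exist only for illustration.
    """

    def __init__(self):
        # Base "column family": user_id -> row.
        self.users = {}
        # Index "column family": city -> set of user_ids (an inverted index).
        self.users_by_city = defaultdict(set)

    def upsert(self, user_id, name, city):
        """Write the row and keep the index table in step with it."""
        old = self.users.get(user_id)
        if old is not None and old["city"] != city:
            # The indexed value changed, so remove the stale index entry.
            self.users_by_city[old["city"]].discard(user_id)
        self.users[user_id] = {"name": name, "city": city}
        self.users_by_city[city].add(user_id)

    def find_by_city(self, city):
        """Answer the query from the index table, then read the base rows."""
        ids = self.users_by_city.get(city, ())
        return [self.users[uid] for uid in sorted(ids)]
```

    Each additional indexed column would need its own lookup table maintained 
the same way, which is the write-side cost of this approach.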

    =Rob
    http://twitter.com/rcolidba
