Re: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

Jonathan Haddad Wed, 27 Feb 2019 11:04:49 -0800

If the goal is arbitrary queries, I'd avoid Cassandra altogether.  Don't
use DSE Search or Ellesandra, they're two solutions designed to solve
problems that are Cassandra first, search second.


I'd go straight to elastic search for workloads that are primarily search
driven, like you listed above.  The idea of having one DB doing both things
sounds great until it's an operational nightmare.

On Wed, Feb 27, 2019 at 10:57 AM Rahul Singh <rahul.xavier.si...@gmail.com>
wrote:

> +1 on Datastax and could consider looking at Elassandra.
>
> On Thu, Feb 7, 2019 at 9:14 AM Durity, Sean R <sean_r_dur...@homedepot.com>
> wrote:
>
>> Kenneth is right. Trying to port/support a relational model to a CQL
>> model the way you are doing it is not going to go well. You won’t be able
>> to scale or get the search flexibility that you want. It will make
>> Cassandra seem like a bad fit. You want to play to Cassandra’s strengths –
>> availability, low latency, scalability, etc. so you need to store the data
>> the way you want to retrieve it (query first modeling!). You could look at
>> defining the “right” partition and clustering keys, so that the searches
>> are within a single, reasonably sized partition. And you could have lookup
>> tables for other common search patterns (item_by_model_name, etc.)
>>
>>
>>
>> If that kind of modeling gets you to a situation where you have too many
>> lookup tables to keep consistent, you could consider something like
>> DataStax Enterprise Search (embedded SOLR) to create SOLR indexes on
>> searchable fields. A SOLR query will typically be an order of magnitude
>> slower than a partition key lookup, though.
>>
>>
>>
>> It really boils down to the purpose of the data store. If you are looking
>> for primarily an “anything goes” search engine, Cassandra may not be a good
>> choice. If you need Cassandra-level availability, extremely low latency
>> queries (on known access patterns), high volume/low latency writes, easy
>> scalability, etc. then you are going to have to rethink how you model the
>> data.
>>
>>
>>
>>
>>
>> Sean Durity
>>
>>
>>
>> *From:* Kenneth Brotman <kenbrot...@yahoo.com.INVALID>
>> *Sent:* Thursday, February 07, 2019 7:01 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* [EXTERNAL] RE: SASI queries- cqlsh vs java driver
>>
>>
>>
>> Peter,
>>
>>
>>
>> Sounds like you may need to use a different architecture.  Perhaps you
>> need something like Presto or Kafka as a part of the solution.  If the data
>> from the legacy system is wrong for Cassandra it’s an ETL problem?  You’d
>> have to transform the data you want to use with Cassandra so that a proper
>> data model for Cassandra can be used.
>>
>>
>>
>> *From:* Peter Heitman [mailto:pe...@heitman.us <pe...@heitman.us>]
>> *Sent:* Wednesday, February 06, 2019 10:05 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>
>>
>>
>> Yes, I have read the material. The problem is that the application has a
>> query facility available to the user where they can type in "(A = foo AND B
>> = bar) OR C = chex" where A, B, and C are from a defined list of terms,
>> many of which are columns in the mytable below while others are from other
>> tables. This query facility was implemented and shipped years before we
>> decided to move to Cassandra
>>
>> On Thu, Feb 7, 2019, 8:21 AM Kenneth Brotman <
>> kenbrot...@yahoo.com.invalid> wrote:
>>
>> The problem is you’re not using a query first design.  I would recommend
>> first reading chapter 5 of Cassandra: The Definitive Guide by Jeff
>> Carpenter and Eben Hewitt.  It’s available free online at this link
>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__books.google.com_books-3Fid-3DuW-2DPDAAAQBAJ-26pg-3DPA79-26lpg-3DPA79-26dq-3Djeff-2Bcarpenter-2Bchapter-2B5-26source-3Dbl-26ots-3D58bUYyNM-2DJ-26sig-3DACfU3U22U58-2DQPlz6kzo0zziNF-2DbP30l4Q-26hl-3Den-26sa-3DX-26ved-3D2ahUKEwi0n-2DnWzajgAhXnHzQIHf6jBJIQ6AEwAXoECAgQAQ-23v-3Donepage-26q-3Djeff-2520carpenter-2520chapter-25205-26f-3Dfalse&d=DwMFaQ&c=MtgQEAMQGqekjTjiAhkudQ&r=aC_gxC6z_4f9GLlbWiKzHm1vucZTtVYWDDvyLkh8IaQ&m=dsY_P-wGUZe0KuIuE01HDz4w9EI5AH4457c9uWyQx5g&s=C6imJ8BRMoV5A9NzORjdrEq6B77ZSAEO9dP__FAXUz8&e=>
>> .
>>
>>
>>
>> Kenneth Brotman
>>
>>
>>
>> *From:* Peter Heitman [mailto:pe...@heitman.us]
>> *Sent:* Wednesday, February 06, 2019 6:33 PM
>>
>>
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>
>>
>>
>> Yes, I "know" that allow filtering is a sign of a (possibly fatal)
>> inefficient data model. I haven't figured out how to do it correctly yet
>>
>> On Thu, Feb 7, 2019, 7:59 AM Kenneth Brotman <
>> kenbrot...@yahoo.com.invalid> wrote:
>>
>> Exactly.  When you design your data model correctly you shouldn’t have to
>> use ALLOW FILTERING in the queries.  That is not recommended.
>>
>>
>>
>> Kenneth Brotman
>>
>>
>>
>> *From:* Peter Heitman [mailto:pe...@heitman.us]
>> *Sent:* Wednesday, February 06, 2019 6:09 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>
>>
>>
>> You are completely right! My problem is that I am trying to port code for
>> SQL to CQL for an application that provides the user with a relatively
>> general search facility. The original implementation didn't worry about
>> secondary indexes - it just took advantage of the ability to create
>> arbitrarily complex queries with inner joins, left joins, etc. I am
>> reimplimenting it to create a parse tree of CQL queries and doing the ANDs
>> and ORs in the application. Of course once I get enough of this implemented
>> I will have to load up the table with a large data set and see if it gives
>> acceptable performance for our use case.
>>
>> On Wed, Feb 6, 2019, 8:52 PM Kenneth Brotman <kenbrotman@yahoo.cominvalid>
>> wrote:
>>
>> Isn’t that a lot of SASI indexes for one table.  Could you denormalize
>> more to reduce both columns per table and SASI indexes per table?  Eight
>> SASI indexes on one table seems like a lot.
>>
>>
>>
>> Kenneth Brotman
>>
>>
>>
>> *From:* Peter Heitman [mailto:pe...@heitman.us]
>> *Sent:* Tuesday, February 05, 2019 6:59 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: SASI queries- cqlsh vs java driver
>>
>>
>>
>> The table and secondary indexes look generally like this Note that I have
>> changed the names of many of the columns to be generic since they aren't
>> important to the question as far as I know. I left the actual names for
>> those columns that I've created SASI indexes for. The query I use to try to
>> create a PreparedStatement is:
>>
>>
>>
>> SELECT sql_id, type, cpe_id, serial, product_class, manufacturer,
>> sw_version FROM mytable WHERE serial IN :v0 LIMIT :limit0 ALLOW FILTERING
>>
>>
>>
>> the schema cql statements are:
>>
>>
>>
>> CREATE TABLE IF NOT EXISTS mykeyspace.mytable (
>>
>>   id text,
>>
>>   sql_id bigint,
>>
>>   cpe_id text,
>>
>>   sw_version text,
>>
>>   hw_version text,
>>
>>   manufacturer text,
>>
>>   product_class text,
>>
>>   manufacturer_oui text,
>>
>>   description text,
>>
>>   periodic_inform_interval text,
>>
>>   restricted_mode_enabled text,
>>
>>   restricted_mode_reason text,
>>
>>   type text,
>>
>>   model_name text,
>>
>>   serial text,
>>
>>   mac text,
>>
>>    text,
>>
>>   generic0 timestamp,
>>
>>   household_id text,
>>
>>   generic1 int,
>>
>>   generic2 text,
>>
>>   generic3 text,
>>
>>   generic4 int,
>>
>>   generic5 int,
>>
>>   generic6 text,
>>
>>   generic7 text,
>>
>>   generic8 text,
>>
>>   generic9 text,
>>
>>   generic10 text,
>>
>>   generic11 timestamp,
>>
>>   generic12 text,
>>
>>   generic13 text,
>>
>>   generic14 timestamp,
>>
>>   generic15 text,
>>
>>   generic16 text,
>>
>>   generic17 text,
>>
>>   generic18 text,
>>
>>   generic19 text,
>>
>>   generic20 text,
>>
>>   generic21 text,
>>
>>   generic22 text,
>>
>>   generic23 text,
>>
>>   generic24 text,
>>
>>   generic25 text,
>>
>>   generic26 text,
>>
>>   generic27 text,
>>
>>   generic28 int,
>>
>>   generic29 int,
>>
>>   generic30 text,
>>
>>   generic31 text,
>>
>>   generic32 text,
>>
>>   generic33 text,
>>
>>   generic34 text,
>>
>>   generic35 int,
>>
>>   generic36 int,
>>
>>   generic37 int,
>>
>>   generic38 int,
>>
>>   generic39 text,
>>
>>   generic40 text,
>>
>>   generic41 text,
>>
>>   generic42 text,
>>
>>   generic43 text,
>>
>>   generic44 text,
>>
>>   generic45 text,
>>
>>   PRIMARY KEY (id)
>>
>> );
>>
>>
>>
>> CREATE INDEX IF NOT EXISTS bv_sql_id_idx ON mykeyspace.mytable (sql_id);
>>
>>
>>
>> CREATE CUSTOM INDEX IF NOT EXISTS bv_serial_idx ON mykeyspace.mytable
>> (serial)
>>
>>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>
>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>> 'case_sensitive': 'false'};
>>
>>
>>
>> CREATE CUSTOM INDEX IF NOT EXISTS bv_cpe_id_idx ON mykeyspace.mytable
>> (cpe_id)
>>
>>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>
>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>> 'case_sensitive': 'false'};
>>
>>
>>
>> CREATE CUSTOM INDEX IF NOT EXISTS bv_mac_idx ON mykeyspace.mytable (mac)
>>
>>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>
>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>> 'orgapache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>> 'case_sensitive': 'false'};
>>
>>
>>
>> CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_idx ON
>> mykeyspace.mytable (manufacturer)
>>
>>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>
>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>> 'case_sensitive': 'false'};
>>
>>
>>
>> CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_oui_idx ON
>> mykeyspace.mytable (manufacturer_oui)
>>
>>        USING 'org.apache.cassandra.index.sasiSASIIndex'
>>
>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>> 'case_sensitive': 'false'};
>>
>>
>>
>> CREATE CUSTOM INDEX IF NOT EXISTS bv_hw_version_idx ON mykeyspace.mytable
>> (hw_version)
>>
>>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>
>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>> 'case_sensitive': 'false'};
>>
>>
>>
>> CREATE CUSTOM INDEX IF NOT EXISTS bv_sw_version_idx ON mykeyspace.mytable
>> (sw_version)
>>
>>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>
>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>> 'case_sensitive': 'false'};
>>
>>
>>
>> CREATE CUSTOM INDEX IF NOT EXISTS bv_household_id_idx ON
>> mykeyspace.mytable (household_id)
>>
>>        USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>
>>        WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
>> 'org.apache.cassandra.indexsasi.analyzer.NonTokenizingAnalyzer',
>> 'case_sensitive': 'false'};
>>
>>
>>
>>
>>
>> On Tue, Feb 5, 2019 at 3:33 PM Oleksandr Petrov <
>> oleksandr.pet...@gmail.com> wrote:
>>
>> Could you post full table schema (names obfuscated, if required) with
>> index creation statements and queries?
>>
>>
>>
>> On Mon, Feb 4, 2019 at 10:04 AM Jacques-Henri Berthemet <
>> jacques-henri.berthe...@genesys.com> wrote:
>>
>> I’m not sure why it`s not allowed by the Datastax driver, but maybe you
>> could try to use OR instead of IN?
>>
>> SELECT blah FROM foo WHERE <indexed column> = :val1 OR <indexed column> =
>> :val2 ALLOW FILTERING
>>
>>
>>
>> It should be the same as IN query, but I don’t if it makes a difference
>> for performance.
>>
>>
>>
>> *From: *Peter Heitman <pe...@heitman.us>
>> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>> *Date: *Monday 4 February 2019 at 07:17
>> *To: *"user@cassandra.apache.org <u...@cassandraapache.org>" <
>> user@cassandra.apache.org>
>> *Subject: *SASI queries- cqlsh vs java driver
>>
>>
>>
>> When I create a SASI index on a secondary column, from cqlsh I can
>> execute a query
>>
>>
>>
>> SELECT blah FROM foo WHERE <indexed column> IN ('mytext') ALLOW FILTERING;
>>
>>
>>
>> but not from the java driver:
>>
>>
>>
>> SELECT blah FROM foo WHERE <indexed column> IN :val ALLOW FILTERING
>>
>>
>>
>> Here I get an exception
>>
>>
>>
>> com.datastax.driver.coreexceptions.InvalidQueryException: IN predicates
>> on non-primary-key columns (<indexed column>) is not yet supported
>>
>> at
>> com.datastax.driver.coreexceptions.InvalidQueryException.copy(InvalidQueryException.java:49)
>> ~[cassandra-driver-core-3.6.0.jar:na]
>>
>>
>>
>> Why are they different? Is there anything I can do with the java driver
>> to get past this exception?
>>
>>
>>
>> Peter
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> alex p
>>
>>
>> ------------------------------
>>
>> The information in this Internet Email is confidential and may be legally
>> privileged. It is intended solely for the addressee. Access to this Email
>> by anyone else is unauthorized. If you are not the intended recipient, any
>> disclosure, copying, distribution or any action taken or omitted to be
>> taken in reliance on it, is prohibited and may be unlawful. When addressed
>> to our clients any opinions or advice contained in this Email are subject
>> to the terms and conditions expressed in any applicable governing The Home
>> Depot terms of business or client engagement letter. The Home Depot
>> disclaims all responsibility and liability for the accuracy and content of
>> this attachment and for any damages or losses arising from any
>> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
>> items of a destructive nature, which may be contained in this attachment
>> and shall not be liable for direct, indirect, consequential or special
>> damages in connection with this e-mail message or its attachment.
>>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

Re: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

Reply via email to