Re: question when using SASI indexing

DuyHai Doan Fri, 05 Aug 2016 05:39:07 -0700

For the record, we've found the issue: it is not related to SASI. The
inconsistencies are due to inconsistent data across replicas, which needs a
good repair to put them back in sync.


Using QUORUM CL gives consistent results when querying.
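The resolution above can be illustrated with a toy Python model. This is a simplification and an assumption, not Cassandra's actual code path. Each replica is a dict from row key to (write timestamp, value), with None standing in for a tombstone. Reads reconcile the contacted replicas by keeping the newest version of each row. The function repair() is a crude stand-in for anti-entropy repair, which in reality compares Merkle trees and streams only the differing ranges. The model ignores SASI, token ranges, hinted handoff and read repair, and every name in it (Replica, quorum, reconcile, live_rows, possible_results, repair) is hypothetical. With RF = 3, QUORUM is floor(3/2) + 1 = 2 replicas. So if a write reached a quorum, every QUORUM read overlaps at least one replica holding it. A read at ONE, by contrast, may land on the replica that missed it.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# Toy model (an assumption, not Cassandra internals): row key -> (write timestamp, value).
# A value of None stands in for a tombstone (a deleted row).
Replica = Dict[str, Tuple[int, Optional[str]]]


def quorum(rf: int) -> int:
    """Replicas a QUORUM request must reach: floor(RF / 2) + 1."""
    return rf // 2 + 1


def reconcile(responses: List[Replica]) -> Replica:
    """Merge replica responses, keeping the newest version of each row (last write wins)."""
    merged: Replica = {}
    for replica in responses:
        for key, (ts, value) in replica.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return merged


def live_rows(data: Replica) -> frozenset:
    """Row keys that are not shadowed by a tombstone."""
    return frozenset(key for key, (_, value) in data.items() if value is not None)


def possible_results(replicas: List[Replica], consistency: int) -> set:
    """Every distinct answer a read could return, over all sets of contacted replicas."""
    return {
        live_rows(reconcile(list(contacted)))
        for contacted in combinations(replicas, consistency)
    }


def repair(replicas: List[Replica]) -> None:
    """Crude stand-in for repair: every replica ends up holding the merged data."""
    merged = reconcile(replicas)  # built before any replica is modified
    for replica in replicas:
        replica.clear()
        replica.update(merged)


if __name__ == "__main__":
    rf = 3
    rows = {"elf1": (1, "ELF"), "elf2": (1, "ELF"), "elf3": (1, "ELF")}
    # Hypothetical state: the writes reached replicas 0 and 1, replica 2 missed them.
    replicas = [dict(rows), dict(rows), {}]

    print("QUORUM size:", quorum(rf))
    print("distinct answers at ONE:", len(possible_results(replicas, 1)))
    print("distinct answers at QUORUM:", len(possible_results(replicas, quorum(rf))))
    repair(replicas)
    print("distinct answers at ONE after repair:", len(possible_results(replicas, 1)))
```

In this model the QUORUM guarantee only holds because the write itself reached a quorum. If a tombstone had landed on just one replica, QUORUM reads could still disagree. That is why the repair, not the read consistency level alone, is what actually puts the replicas back in sync.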

On Fri, Aug 5, 2016 at 1:18 PM, George Webster <webste...@gmail.com> wrote:

> Thanks DuyHai,
>
> I would agree, but we have not performed any delete operations in over a
> month. To me this looks like a potential bug or misconfiguration (on my
> end) with SASI.
>
> I say this for a few reasons:
> 1) We have not performed a delete operation since the indexes were created.
> 2) When I perform a query against the same table for the sha256 of an ELF
> file, I do receive a result:
> SELECT * FROM testing.objects WHERE sha256 =
> '1bffff218c991960d48f3a6d7a7139ae8789886365606be9213c5b371e57115f';
>
>  sha256                                                           | mime
> ------------------------------------------------------------------+---------------------------------------------------------------------
>  1bffff218c991960d48f3a6d7a7139ae8789886365606be9213c5b371e57115f | ELF 32-bit MSB  executable, PowerPC or cisco 4500, version 1 (SYSV)
>
> 3) If I don't use the SASI index and instead loop through the entries
> manually, I get 187 results.
> 4) When I attempted the same SASI query again today, I again received
> inconsistent results, between 0 and 7. After a few attempts, it again
> began to return 0.
>
> Do you see any errors in my index command?
>
> CREATE CUSTOM INDEX objects_mime_idx ON testing.objects (mime)
> USING 'org.apache.cassandra.index.sasi.SASIIndex'
> WITH OPTIONS = {
>   'analyzed' : 'true',
>   'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>   'tokenization_enable_stemming' : 'false',
>   'tokenization_locale' : 'en',
>   'tokenization_normalize_lowercase' : 'true',
>   'tokenization_skip_stop_words' : 'true'
> };
>
>
> Some of our SASI indexes are fairly large, as we were testing SASI as an
> alternative to Elasticsearch or to basic processing through Spark. I will
> run some more tests today and see if I can uncover anything.
>
>
> On Fri, Aug 5, 2016 at 10:36 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> OK, the fact that you see some rows and, after a while, 0 rows means
>> that those rows were deleted.
>>
>> Since SASI only indexes INSERT and UPDATE, not DELETE, the management of
>> tombstones is left to Cassandra.
>>
>> It means that when you do an INSERT, you get an entry in the SASI index
>> file, but when you do a DELETE, SASI does not remove the entry from its
>> index file.
>>
>> When reading, SASI gives the partition offset to Cassandra, and Cassandra
>> fetches the data from the SSTables, realises that there is a tombstone,
>> and thus returns 0 rows.
>>
>> The only time those entries are removed from the SASI index file is when
>> your SSTables get compacted and the data is purged.
>>
>> The fact that you can see some rows and then 0 rows means that some of
>> your replicas have missed the tombstones.
>>
>> "However, after about 20 attempts, all servers started to only return 0
>> results." --> Read repair kicks in, so the tombstones are propagated, and
>> then you see 0 rows.
>>
>>
>>
>> On Tue, Aug 2, 2016 at 10:52 PM, George Webster <webste...@gmail.com>
>> wrote:
>>
>>> The indexes were written about 1-2 months ago. No data has been added to
>>> the servers since the indexes were created. Additionally, the indexes
>>> appeared to be stable until I noticed the issue today, after I made a
>>> large query without setting a LIMIT.
>>>
>>> I set the consistency level and ran the SELECT statement on different
>>> nodes. The results remained inconsistent, returning a random number
>>> between 0 and 8. Neither the node nor the consistency level appeared to
>>> make much difference. However, after about 20 attempts, all servers
>>> started to only return 0 results.
>>>
>>>
>>> Lastly, this appeared in the logs during that time:
>>>
>>> INFO  [IndexSummaryManager:1] 2016-08-02 22:11:43,245 IndexSummaryRedistribution.java:74 - Redistributing index summaries
>>>
>>> INFO  [OptionalTasks:1] 2016-08-02 22:25:06,508 NoSpamLogger.java:91 - Maximum memory usage reached (536870912 bytes), cannot allocate chunk of 1048576 bytes
>>>
>>> On Tue, Aug 2, 2016 at 6:58 PM, DuyHai Doan <doanduy...@gmail.com>
>>> wrote:
>>>
>>>> One possible explanation is that you're querying data while the index
>>>> files are being built, so the results differ.
>>>> The second possible explanation is the consistency level.
>>>>
>>>> Try the query again using CL = QUORUM, and try it on several nodes to see
>>>> whether the results differ.
>>>>
>>>> On Tue, Aug 2, 2016 at 6:32 PM, George Webster <webste...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey DuyHai,
>>>>> Thank you for your help.
>>>>>
>>>>> 1) Cassandra version
>>>>> [cqlsh 5.0.1 | Cassandra 3.5 | CQL spec 3.4.0 | Native protocol v4]
>>>>>
>>>>>
>>>>> 2) CREATE CUSTOM INDEX statement for your index
>>>>>
>>>>> CREATE CUSTOM INDEX objects_mime_idx ON test.objects (mime)
>>>>> USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>>>> WITH OPTIONS = {
>>>>>   'analyzed' : 'true',
>>>>>   'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>>>>>   'tokenization_enable_stemming' : 'false',
>>>>>   'tokenization_locale' : 'en',
>>>>>   'tokenization_normalize_lowercase' : 'true',
>>>>>   'tokenization_skip_stop_words' : 'true'
>>>>> };
>>>>>
>>>>>
>>>>> 3) Consistency level used for your SELECT
>>>>> I am using the default consistency level:
>>>>> cassandra@cqlsh> CONSISTENCY
>>>>> Current consistency level is ONE.
>>>>>
>>>>>
>>>>> 4) Replication factor
>>>>>
>>>>> CREATE KEYSPACE system_distributed WITH REPLICATION = {
>>>>>   'class' : 'org.apache.cassandra.locator.SimpleStrategy',
>>>>>   'replication_factor': '3' }
>>>>> AND DURABLE_WRITES = true;
>>>>>
>>>>>
>>>>> 5) Are you creating the index when the table is EMPTY or have you
>>>>> created the index when the table already contains some data ?
>>>>> I created the indexes after the tables contained data.
>>>>>
>>>>>
>>>>> On Tue, Aug 2, 2016 at 5:22 PM, DuyHai Doan <doanduy...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello George
>>>>>>
>>>>>> Can you provide more details ?
>>>>>>
>>>>>> 1) Cassandra version
>>>>>> 2) CREATE CUSTOM INDEX statement for your index
>>>>>> 3) Consistency level used for your SELECT
>>>>>> 4) Replication factor
>>>>>> 5) Are you creating the index when the table is EMPTY or have you
>>>>>> created the index when the table already contains some data ?
>>>>>>
>>>>>> On Tue, Aug 2, 2016 at 4:05 PM, George Webster <webste...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hey guys and gals,
>>>>>>>
>>>>>>> I am having a strange issue with Cassandra SASI and I was hoping you
>>>>>>> could help solve the mystery. My issues are inconsistent query results
>>>>>>> and strange log errors.
>>>>>>>
>>>>>>> The biggest issue is that when I perform a query, I am getting back
>>>>>>> inconsistent results. The first few times, I received between 3 and 7
>>>>>>> results, and then I finally received 187 results. At no point did I
>>>>>>> change the query statement. However, after I received the 187 results,
>>>>>>> any further queries returned zero results.
>>>>>>>
>>>>>>> my query:
>>>>>>> SELECT *
>>>>>>>     FROM test.objects
>>>>>>>     WHERE mime LIKE 'ELF%';
>>>>>>>
>>>>>>> When I look in the system.log file I see the following:
>>>>>>> WARN  [SharedPool-Worker-1] 2016-08-02 15:58:53,256 SelectStatement.java:351 - Aggregation query used without partition key
>>>>>>> WARN  [SharedPool-Worker-1] 2016-08-02 15:59:02,978 SelectStatement.java:351 - Aggregation query used without partition key
>>>>>>>
>>>>>>>
>>>>>>> When I look in the debug.log file I see the following when zero
>>>>>>> results are returned:
>>>>>>> WARN  [SharedPool-Worker-1] 2016-08-02 15:58:53,256 SelectStatement.java:351 - Aggregation query used without partition key
>>>>>>> WARN  [SharedPool-Worker-1] 2016-08-02 15:59:02,978 SelectStatement.java:351 - Aggregation query used without partition key
>>>>>>>
>>>>>>> Additionally, I see a lot of errors in the log that state:
>>>>>>> INFO  [OptionalTasks:1] 2016-08-02 15:40:04,310 NoSpamLogger.java:91 - Maximum memory usage reached (536870912 bytes), cannot allocate chunk of 1048576 bytes
>>>>>>> INFO  [OptionalTasks:1] 2016-08-02 15:55:04,387 NoSpamLogger.java:91 - Maximum memory usage reached (536870912 bytes), cannot allocate chunk of 1048576 bytes
>>>>>>>
>>>>>>>
>>>>>>> Any ideas?
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
