For the record, we've found the issue. It is not related to SASI: the inconsistent results are due to inconsistent data, which needs a good repair to be put back in sync.
Using QUORUM CL gives consistent results when querying.

On Fri, Aug 5, 2016 at 1:18 PM, George Webster <webste...@gmail.com> wrote:

> Thanks DuyHai,
>
> I would agree but we have not performed any delete operations in over a
> month. To me this looks like a potential bug or misconfiguration (on my
> end) with SASI.
>
> I say this for a few reasons:
> 1) We have not performed a delete operation since the indexes were created.
> 2) When I perform a query, against the same table, for the sha256 of an
>    ELF file I do receive a result.
>
> SELECT * FROM testing.objects WHERE sha256 =
> '1bffff218c991960d48f3a6d7a7139ae8789886365606be9213c5b371e57115f';
>
>  sha256                                                           | mime
> ------------------------------------------------------------------+---------------------------------------------------------------------
>  1bffff218c991960d48f3a6d7a7139ae8789886365606be9213c5b371e57115f | ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV)
>
> 3) If I don't use the SASI index and instead loop through the entries
>    manually, I get 187 results.
> 4) When I attempted the same SASI query again today, I again received
>    inconsistent results that were between 0-7. After a few attempts it
>    again began to return 0.
>
> Do you see any errors in my index command?
>
> CREATE CUSTOM INDEX objects_mime_idx ON testing.objects (mime)
> USING 'org.apache.cassandra.index.sasi.SASIIndex'
> WITH OPTIONS = {
>   'analyzed' : 'true',
>   'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>   'tokenization_enable_stemming' : 'false',
>   'tokenization_locale' : 'en',
>   'tokenization_normalize_lowercase' : 'true',
>   'tokenization_skip_stop_words' : 'true'
> };
>
> Some of our SASI indexes are fairly large as we were testing the ability
> to use SASI over Elasticsearch or basic processing through Spark. I will
> run some more tests today and see if I can uncover anything.
>
> On Fri, Aug 5, 2016 at 10:36 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> OK, the fact that you see some rows and after a while you see 0 rows
>> means that those rows are deleted.
>>
>> Since SASI only indexes INSERT & UPDATE but not DELETE, management of
>> tombstones is left to Cassandra to handle.
>>
>> It means that if you do an INSERT, you'll have an entry in the SASI index
>> file, but when you do a DELETE, SASI does not remove the entry from its
>> index file.
>>
>> When reading, SASI will give the partition offset to Cassandra and
>> Cassandra will fetch the data from SSTables, then realise that there is a
>> tombstone, and thus return 0 rows.
>>
>> The only moment those entries will be removed from the SASI index file is
>> when your SSTable gets compacted and the data is purged.
>>
>> The fact that you can see some rows and then 0 rows means that some of
>> your replicas have missed the tombstones.
>>
>> "However, after about 20 attempts, all servers started to only return 0
>> results. " --> Read-repair kicks in so the tombstones are propagated and
>> then you see 0 rows.
>>
>> On Tue, Aug 2, 2016 at 10:52 PM, George Webster <webste...@gmail.com>
>> wrote:
>>
>>> The indexes were written about 1-2 months ago. No data has been added to
>>> the servers since the indexes were created. Additionally, the indexes
>>> appeared to be stable until I noticed the issue today, which occurred
>>> after I made a large query without setting a LIMIT.
>>>
>>> I set the consistency level and moved the select statement between
>>> different nodes. The results remained inconsistent, returning a random
>>> number between 0 and 8. It did not appear to make much difference between
>>> the different nodes or consistency levels. However, after about 20
>>> attempts, all servers started to only return 0 results.
>>>
>>> Lastly, this appeared in the logs during that time:
>>>
>>> INFO [IndexSummaryManager:1] 2016-08-02 22:11:43,245 IndexSummaryRedistribution.java:74 - Redistributing index summaries
>>>
>>> INFO [OptionalTasks:1] 2016-08-02 22:25:06,508 NoSpamLogger.java:91 - Maximum memory usage reached (536870912 bytes), cannot allocate chunk of 1048576 bytes
>>>
>>> On Tue, Aug 2, 2016 at 6:58 PM, DuyHai Doan <doanduy...@gmail.com>
>>> wrote:
>>>
>>>> One possible explanation is that you're querying data while the index
>>>> files are being built, so the results are different.
>>>> The second possible explanation is the consistency level.
>>>>
>>>> Try the query again using CL = QUORUM, and try on several nodes to see
>>>> if the results are different.
>>>>
>>>> On Tue, Aug 2, 2016 at 6:32 PM, George Webster <webste...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey DuyHai,
>>>>> Thank you for your help.
>>>>>
>>>>> 1) Cassandra version
>>>>> [cqlsh 5.0.1 | Cassandra 3.5 | CQL spec 3.4.0 | Native protocol v4]
>>>>>
>>>>> 2) CREATE CUSTOM INDEX statement for your index
>>>>>
>>>>> CREATE CUSTOM INDEX objects_mime_idx ON test.objects (mime)
>>>>> USING 'org.apache.cassandra.index.sasi.SASIIndex'
>>>>> WITH OPTIONS = {
>>>>>   'analyzed' : 'true',
>>>>>   'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>>>>>   'tokenization_enable_stemming' : 'false',
>>>>>   'tokenization_locale' : 'en',
>>>>>   'tokenization_normalize_lowercase' : 'true',
>>>>>   'tokenization_skip_stop_words' : 'true'
>>>>> };
>>>>>
>>>>> 3) Consistency level used for your SELECT
>>>>> I am using the default consistency
>>>>> cassandra@cqlsh> CONSISTENCY
>>>>> Current consistency level is ONE.
>>>>>
>>>>> 4) Replication factor
>>>>>
>>>>> CREATE KEYSPACE system_distributed WITH REPLICATION = {
>>>>>   'class' : 'org.apache.cassandra.locator.SimpleStrategy',
>>>>>   'replication_factor': '3' }
>>>>> AND DURABLE_WRITES = true;
>>>>>
>>>>> 5) Are you creating the index when the table is EMPTY or have you
>>>>> created the index when the table already contains some data ?
>>>>> I created the indexes after the tables contained data.
>>>>>
>>>>> On Tue, Aug 2, 2016 at 5:22 PM, DuyHai Doan <doanduy...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello George
>>>>>>
>>>>>> Can you provide more details ?
>>>>>>
>>>>>> 1) Cassandra version
>>>>>> 2) CREATE CUSTOM INDEX statement for your index
>>>>>> 3) Consistency level used for your SELECT
>>>>>> 4) Replication factor
>>>>>> 5) Are you creating the index when the table is EMPTY or have you
>>>>>> created the index when the table already contains some data ?
>>>>>>
>>>>>> On Tue, Aug 2, 2016 at 4:05 PM, George Webster <webste...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hey guys and gals,
>>>>>>>
>>>>>>> I am having a strange issue with Cassandra SASI and I was hoping you
>>>>>>> could help solve the mystery. My issue is inconsistency between
>>>>>>> returned results and strange log errors.
>>>>>>>
>>>>>>> The biggest issue is that when I perform a query I am getting back
>>>>>>> inconsistent results. The first few times I received between 3 and 7
>>>>>>> results and then I finally received 187 results. At no point in time
>>>>>>> did I change the query statement. However, after I received the 187
>>>>>>> results, any follow-on queries returned zero results.
>>>>>>>
>>>>>>> my query:
>>>>>>> SELECT *
>>>>>>> FROM test.objects
>>>>>>> WHERE mime LIKE 'ELF%';
>>>>>>>
>>>>>>> When I look in the system.log file I see the following:
>>>>>>> WARN [SharedPool-Worker-1] 2016-08-02 15:58:53,256 SelectStatement.java:351 - Aggregation query used without partition key
>>>>>>> WARN [SharedPool-Worker-1] 2016-08-02 15:59:02,978 SelectStatement.java:351 - Aggregation query used without partition key
>>>>>>>
>>>>>>> When I look in the debug.log file I see the following when zero
>>>>>>> results are returned:
>>>>>>> WARN [SharedPool-Worker-1] 2016-08-02 15:58:53,256 SelectStatement.java:351 - Aggregation query used without partition key
>>>>>>> WARN [SharedPool-Worker-1] 2016-08-02 15:59:02,978 SelectStatement.java:351 - Aggregation query used without partition key
>>>>>>>
>>>>>>> Additionally, I see a lot of errors in the log that state:
>>>>>>> INFO [OptionalTasks:1] 2016-08-02 15:40:04,310 NoSpamLogger.java:91 - Maximum memory usage reached (536870912 bytes), cannot allocate chunk of 1048576 bytes
>>>>>>> INFO [OptionalTasks:1] 2016-08-02 15:55:04,387 NoSpamLogger.java:91 - Maximum memory usage reached (536870912 bytes), cannot allocate chunk of 1048576 bytes
>>>>>>>
>>>>>>> Any ideas?
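
A footnote on the QUORUM observation at the top of the thread: the difference between CL ONE and CL QUORUM comes down to replica-overlap arithmetic. The sketch below is only an illustration of that arithmetic, not anything taken from Cassandra's code. It assumes a replication factor of 3, which is the value shown in the quoted keyspace definition; that definition is for system_distributed, so treating it as the factor of the data keyspace is an assumption. The function names are made up for this example.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must answer at QUORUM: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def reads_overlap_writes(read_replicas: int, write_replicas: int,
                         replication_factor: int) -> bool:
    """True when any read set must share a replica with any write set."""
    return read_replicas + write_replicas > replication_factor


RF = 3  # assumed: taken from the keyspace definition quoted in the thread
ONE = 1
QUORUM = quorum(RF)

print(f"QUORUM for RF={RF}: {QUORUM}")
print("read ONE, write ONE overlap:",
      reads_overlap_writes(ONE, ONE, RF))
print("read QUORUM, write QUORUM overlap:",
      reads_overlap_writes(QUORUM, QUORUM, RF))
print("read QUORUM, write ONE overlap:",
      reads_overlap_writes(QUORUM, ONE, RF))
```

With RF = 3, a read at ONE can land on whichever single replica is out of sync, which fits the varying row counts reported in the thread. A QUORUM read contacts 2 of 3 replicas and is guaranteed to include an up-to-date one only if the write was also acknowledged by a quorum. The thread does not say which consistency level was used for writes, so a repair is still needed to bring the replicas back in sync.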