Thanks DuyHai. I would agree, but we have not performed any delete operations in over a month. To me this looks like a potential bug in SASI, or a misconfiguration on my end.

I say this for a few reasons:

1) We have not performed a delete operation since the indexes were created.

2) When I query the same table for the sha256 of an ELF file, I do receive a result:

SELECT * FROM testing.objects WHERE sha256 = '1bffff218c991960d48f3a6d7a7139ae8789886365606be9213c5b371e57115f';

 sha256                                                           | mime
------------------------------------------------------------------+---------------------------------------------------------------------
 1bffff218c991960d48f3a6d7a7139ae8789886365606be9213c5b371e57115f | ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV)

3) If I don't use the SASI index and instead loop through the entries manually, I get 187 results.

4) When I ran the same SASI query again today, I again received inconsistent results, between 0 and 7. After a few attempts it again began to return 0.

Do you see any errors in my index command?

CREATE CUSTOM INDEX objects_mime_idx ON testing.objects (mime)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
    'analyzed' : 'true',
    'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
    'tokenization_enable_stemming' : 'false',
    'tokenization_locale' : 'en',
    'tokenization_normalize_lowercase' : 'true',
    'tokenization_skip_stop_words' : 'true'
};

Some of our SASI indexes are fairly large, as we were testing SASI as an alternative to Elasticsearch or to basic processing through Spark. I will run some more tests today and see if I can uncover anything.

On Fri, Aug 5, 2016 at 10:36 AM, DuyHai Doan <doanduy...@gmail.com> wrote:

> Ok, the fact that you see some rows and after a while you see 0 rows means
> that those rows are deleted.
>
> Since SASI only indexes INSERT & UPDATE but not DELETE, management of
> tombstones is left to Cassandra.
>
> It means that if you do an INSERT, you'll have an entry in the SASI index
> file, but when you do a DELETE, SASI does not remove the entry from its
> index file.
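A toy model may make the mechanism described here easier to follow. Index entries are added on INSERT/UPDATE but are not removed on DELETE. Reads filter out tombstoned rows, and stale entries only disappear when compaction purges the data. This is a hypothetical, single-replica Python sketch, not SASI's implementation. The class `ToyTable`, its methods, and the whitespace-plus-lowercase tokenization (loosely mirroring the `tokenization_normalize_lowercase` option in the index command above) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToyTable:
    """Hypothetical single-replica model of an index that ignores deletes."""

    rows: Dict[str, str] = field(default_factory=dict)        # key -> mime value
    tombstones: Set[str] = field(default_factory=set)         # deleted keys
    index: Dict[str, Set[str]] = field(default_factory=dict)  # token -> keys

    def insert(self, key: str, mime: str) -> None:
        # INSERT/UPDATE: write the row and add an index entry per token.
        self.rows[key] = mime
        self.tombstones.discard(key)
        for token in mime.lower().split():
            self.index.setdefault(token, set()).add(key)

    def delete(self, key: str) -> None:
        # DELETE: only a tombstone is written; the index is left untouched.
        self.tombstones.add(key)

    def query_prefix(self, prefix: str) -> List[str]:
        # The index supplies candidate keys; rows shadowed by a tombstone
        # are dropped when the data is read, so they never reach the client.
        prefix = prefix.lower()
        candidates = {key
                      for token, keys in self.index.items()
                      if token.startswith(prefix)
                      for key in keys}
        return sorted(key for key in candidates if key not in self.tombstones)

    def compact(self) -> None:
        # Compaction purges tombstoned rows and their stale index entries.
        for key in self.tombstones:
            self.rows.pop(key, None)
        for keys in self.index.values():
            keys.difference_update(self.tombstones)
        self.tombstones.clear()


if __name__ == "__main__":
    table = ToyTable()
    table.insert("a", "ELF 32-bit MSB executable")
    table.insert("b", "PE32 executable")
    print(table.query_prefix("ELF"))    # ['a']
    table.delete("a")
    print(table.query_prefix("ELF"))    # []
    print("a" in table.index["elf"])    # True: stale entry still indexed
    table.compact()
    print("a" in table.index["elf"])    # False: purged by compaction
```

The model deliberately leaves out replication, so it cannot reproduce the varying row counts. It only shows why a stale index entry does not, by itself, bring a deleted row back.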
>
> When reading, SASI gives the partition offset to Cassandra, and Cassandra
> fetches the data from the SSTables, realises that there is a tombstone, and
> thus returns 0 rows.
>
> The only time those entries are removed from the SASI index file is when
> your SSTables get compacted and the data are purged.
>
> The fact that you can see some rows and then 0 rows means that some of
> your replicas have missed the tombstones.
>
> "However, after about 20 attempts, all servers started to only return 0
> results." --> Read repair kicks in, so the tombstones are propagated and
> then you see 0 rows.
>
>
> On Tue, Aug 2, 2016 at 10:52 PM, George Webster <webste...@gmail.com>
> wrote:
>
>> The indexes were written about 1-2 months ago. No data has been added to
>> the servers since the indexes were created. Additionally, the indexes
>> appeared to be stable until I noticed the issue today, which occurred
>> after I made a large query without setting a LIMIT.
>>
>> I set the consistency level and moved the SELECT statement between
>> different nodes. The results remained inconsistent, returning a random
>> number between 0 and 8. Neither the node nor the consistency level
>> seemed to make much difference. However, after about 20 attempts, all
>> servers started to only return 0 results.
>>
>> Lastly, this appeared in the logs during that time:
>>
>> INFO [IndexSummaryManager:1] 2016-08-02 22:11:43,245
>> IndexSummaryRedistribution.java:74 - Redistributing index summaries
>>
>> INFO [OptionalTasks:1] 2016-08-02 22:25:06,508 NoSpamLogger.java:91 -
>> Maximum memory usage reached (536870912 bytes), cannot allocate chunk of
>> 1048576 bytes
>>
>> On Tue, Aug 2, 2016 at 6:58 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>>
>>> One possible explanation is that you're querying data while the index
>>> files are being built, so the results are different.
>>> The second possible explanation is the consistency level.
>>>
>>> Try the query again using CL = QUORUM, and try it on several nodes to see
>>> if the results are different.
>>>
>>> On Tue, Aug 2, 2016 at 6:32 PM, George Webster <webste...@gmail.com>
>>> wrote:
>>>
>>>> Hey DuyHai,
>>>> Thank you for your help.
>>>>
>>>> 1) Cassandra version
>>>> [cqlsh 5.0.1 | Cassandra 3.5 | CQL spec 3.4.0 | Native protocol v4]
>>>>
>>>> 2) CREATE CUSTOM INDEX statement for your index
>>>>
>>>> CREATE CUSTOM INDEX objects_mime_idx ON test.objects (mime) USING
>>>> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'analyzed' :
>>>> 'true', 'analyzer_class' :
>>>> 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>>>> 'tokenization_enable_stemming' : 'false', 'tokenization_locale' : 'en',
>>>> 'tokenization_normalize_lowercase' : 'true',
>>>> 'tokenization_skip_stop_words' : 'true'};
>>>>
>>>> 3) Consistency level used for your SELECT
>>>> I am using the default consistency:
>>>> cassandra@cqlsh> CONSISTENCY
>>>> Current consistency level is ONE.
>>>>
>>>> 4) Replication factor
>>>>
>>>> CREATE KEYSPACE system_distributed WITH REPLICATION = {
>>>> 'class' : 'org.apache.cassandra.locator.SimpleStrategy',
>>>> 'replication_factor': '3' }
>>>> AND DURABLE_WRITES = true;
>>>>
>>>> 5) Are you creating the index when the table is EMPTY or have you
>>>> created the index when the table already contains some data?
>>>> I created the indexes after the tables contained data.
>>>>
>>>> On Tue, Aug 2, 2016 at 5:22 PM, DuyHai Doan <doanduy...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello George
>>>>>
>>>>> Can you provide more details?
>>>>>
>>>>> 1) Cassandra version
>>>>> 2) CREATE CUSTOM INDEX statement for your index
>>>>> 3) Consistency level used for your SELECT
>>>>> 4) Replication factor
>>>>> 5) Are you creating the index when the table is EMPTY or have you
>>>>> created the index when the table already contains some data?
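The consistency-level suggestion comes down to replica overlap. Under the usual rule, a read is guaranteed to see the latest write only when the replicas read plus the replicas written exceed the replication factor (R + W > RF). A DELETE's tombstone is itself a write. For a single data center, QUORUM is floor(RF / 2) + 1. The Python sketch below assumes a single data center, models only ONE, QUORUM and ALL, and uses hypothetical function names. It takes RF = 3 from the thread, although the keyspace quoted there is `system_distributed`, so the data keyspace's actual replication factor may differ.

```python
# Replica-overlap arithmetic for a single data center (a sketch; only the
# ONE, QUORUM and ALL consistency levels are modelled).

def replicas_for(level: str, rf: int) -> int:
    """Number of replicas that must respond at the given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def read_sees_latest_write(write_level: str, read_level: str, rf: int) -> bool:
    """True when every read replica set must overlap every write replica set."""
    return replicas_for(write_level, rf) + replicas_for(read_level, rf) > rf


if __name__ == "__main__":
    rf = 3  # replication factor reported in the thread (see caveat above)
    for w, r in [("ONE", "ONE"), ("QUORUM", "ONE"),
                 ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(f"write={w:<6} read={r:<6} "
              f"overlap guaranteed: {read_sees_latest_write(w, r, rf)}")
```

With RF = 3, a read at ONE is guaranteed to reach a replica holding a tombstone only if the delete was written to all three replicas. That fits the explanation that different replicas can return different row counts until read repair propagates the tombstones.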
>>>>>
>>>>> On Tue, Aug 2, 2016 at 4:05 PM, George Webster <webste...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hey guys and gals,
>>>>>>
>>>>>> I am having a strange issue with Cassandra SASI and I was hoping you
>>>>>> could help solve the mystery. My issues are inconsistent query results
>>>>>> and strange log errors.
>>>>>>
>>>>>> The biggest issue is that when I perform a query I am getting back
>>>>>> inconsistent results. The first few times I received between 3 and 7
>>>>>> results, and then I finally received 187 results. At no point did I
>>>>>> change the query statement. However, after I received the 187 results,
>>>>>> all further queries returned zero results.
>>>>>>
>>>>>> My query:
>>>>>> SELECT *
>>>>>> FROM test.objects
>>>>>> WHERE mime LIKE 'ELF%';
>>>>>>
>>>>>> When I look in the system.log file I see the following:
>>>>>> WARN [SharedPool-Worker-1] 2016-08-02 15:58:53,256
>>>>>> SelectStatement.java:351 - Aggregation query used without partition key
>>>>>> WARN [SharedPool-Worker-1] 2016-08-02 15:59:02,978
>>>>>> SelectStatement.java:351 - Aggregation query used without partition key
>>>>>>
>>>>>> When I look in the debug.log file I see the following when zero
>>>>>> results are returned:
>>>>>> WARN [SharedPool-Worker-1] 2016-08-02 15:58:53,256
>>>>>> SelectStatement.java:351 - Aggregation query used without partition key
>>>>>> WARN [SharedPool-Worker-1] 2016-08-02 15:59:02,978
>>>>>> SelectStatement.java:351 - Aggregation query used without partition key
>>>>>>
>>>>>> Additionally, I see a lot of errors in the log that state:
>>>>>> INFO [OptionalTasks:1] 2016-08-02 15:40:04,310 NoSpamLogger.java:91
>>>>>> - Maximum memory usage reached (536870912 bytes), cannot allocate chunk
>>>>>> of 1048576 bytes
>>>>>> INFO [OptionalTasks:1] 2016-08-02 15:55:04,387 NoSpamLogger.java:91
>>>>>> - Maximum memory usage reached (536870912 bytes), cannot allocate chunk
>>>>>> of 1048576 bytes
>>>>>>
>>>>>> Any ideas?
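The repeated "Maximum memory usage reached" line is easier to read once the byte counts are converted: 536870912 bytes is 512 MiB and 1048576 bytes is 1 MiB. The following is an assumption to verify against the Cassandra 3.5 documentation: in 3.x this message appears to come from the buffer pool used when reading SSTable data, with the 512 MiB cap set by `file_cache_size_in_mb` in cassandra.yaml. The Python snippet below only parses the quoted line and does the unit arithmetic. `parse_limits` and the regex are hypothetical helpers.

```python
import re
from typing import Tuple

# The INFO line quoted in the thread, reassembled onto one logical line.
LOG_LINE = ("INFO [OptionalTasks:1] 2016-08-02 15:40:04,310 NoSpamLogger.java:91 - "
            "Maximum memory usage reached (536870912 bytes), "
            "cannot allocate chunk of 1048576 bytes")

PATTERN = re.compile(r"Maximum memory usage reached \((\d+) bytes\), "
                     r"cannot allocate chunk of (\d+) bytes")

MIB = 1024 * 1024


def parse_limits(line: str) -> Tuple[int, int]:
    """Return (limit_bytes, chunk_bytes) from a memory-usage log line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a 'Maximum memory usage reached' log line")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    limit, chunk = parse_limits(LOG_LINE)
    print(f"limit = {limit // MIB} MiB, chunk = {chunk // MIB} MiB, "
          f"chunks that fit = {limit // chunk}")
```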