Re: question when using SASI indexing
OK, the fact that you see some rows and after a while you see 0 rows means that those rows are deleted.

Since SASI only indexes INSERT and UPDATE but not DELETE, tombstone management is left to Cassandra itself.

It means that if you do an INSERT you'll have an entry in the SASI index file, but when you do a DELETE, SASI does not remove the entry from its index file.

When reading, SASI gives the partition offset to Cassandra; Cassandra fetches the data from the SSTables, realises that there is a tombstone, and thus returns 0 rows.

The only moment those entries are removed from the SASI index file is when your SSTables get compacted and the data is purged.

The fact that you can see some rows and then 0 rows means that some of your replicas have missed the tombstones.

"However, after about 20 attempts, all servers started to only return 0 results." --> Read repair kicks in, so the tombstones are propagated and then you see 0 rows.

On Tue, Aug 2, 2016 at 10:52 PM, George Webster wrote:

> The indexes were written about 1-2 months ago. No data has been added to the servers since the indexes were created. Additionally, the indexes appeared to be stable until I noticed the issue today ... which occurred after I made a large query without setting a LIMIT.
>
> I set the consistency level and moved the select statement between different nodes. The results remained inconsistent, returning a random number between 0 and 8. It did not appear to make much difference between the different nodes or consistency levels. However, after about 20 attempts, all servers started to only return 0 results.
>
> Lastly, this appeared in the logs during that time:
>
> INFO [IndexSummaryManager:1] 2016-08-02 22:11:43,245 IndexSummaryRedistribution.java:74 - Redistributing index summaries
> INFO [OptionalTasks:1] 2016-08-02 22:25:06,508 NoSpamLogger.java:91 - Maximum memory usage reached (536870912 bytes), cannot allocate chunk of 1048576 bytes
>
> On Tue, Aug 2, 2016 at 6:58 PM, DuyHai Doan wrote:
>
>> One possible explanation is that you're querying data while the index files are being built, so the results are different.
>> The second possible explanation is the consistency level.
>>
>> Try the query again using CL = QUORUM, and try on several nodes to see if the results are different.
>>
>> On Tue, Aug 2, 2016 at 6:32 PM, George Webster wrote:
>>
>>> Hey DuyHai,
>>> Thank you for your help.
>>>
>>> 1) Cassandra version
>>> [cqlsh 5.0.1 | Cassandra 3.5 | CQL spec 3.4.0 | Native protocol v4]
>>>
>>> 2) CREATE CUSTOM INDEX statement for your index
>>>
>>> CREATE CUSTOM INDEX objects_mime_idx ON test.objects (mime) USING
>>> 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'analyzed' : 'true',
>>> 'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>>> 'tokenization_enable_stemming' : 'false', 'tokenization_locale' : 'en',
>>> 'tokenization_normalize_lowercase' : 'true', 'tokenization_skip_stop_words' : 'true'};
>>>
>>> 3) Consistency level used for your SELECT
>>> I am using the default consistency
>>> cassandra@cqlsh> CONSISTENCY
>>> Current consistency level is ONE.
>>>
>>> 4) Replication factor
>>>
>>> CREATE KEYSPACE system_distributed WITH REPLICATION = {
>>>     'class' : 'org.apache.cassandra.locator.SimpleStrategy',
>>>     'replication_factor': '3' }
>>> AND DURABLE_WRITES = true;
>>>
>>> 5) Are you creating the index when the table is EMPTY or have you created the index when the table already contains some data ?
>>> I created the indexes after the tables contained data.
>>>
>>> On Tue, Aug 2, 2016 at 5:22 PM, DuyHai Doan wrote:
>>>
>>>> Hello George
>>>>
>>>> Can you provide more details ?
>>>> 1) Cassandra version
>>>> 2) CREATE CUSTOM INDEX statement for your index
>>>> 3) Consistency level used for your SELECT
>>>> 4) Replication factor
>>>> 5) Are you creating the index when the table is EMPTY or have you created the index when the table already contains some data ?
>>>>
>>>> On Tue, Aug 2, 2016 at 4:05 PM, George Webster wrote:
>>>>
>>>>> Hey guys and gals,
>>>>>
>>>>> I am having a strange issue with Cassandra SASI and I was hoping you could help solve the mystery. My issue is inconsistency between returned results and strange log errors.
>>>>>
>>>>> The biggest issue is that when I perform a query I am getting back inconsistent results. The first few times I received between 3 and 7 results, and then I finally received 187 results. At no point in time did I change the query statement. However, after I received the 187 results, any subsequent queries returned zero results.
>>>>>
>>>>> My query:
>>>>> SELECT *
>>>>> FROM test.objects
>>>>> WHERE mime LIKE 'ELF%';
>>>>>
>>>>> When I look in the system.log file I see the following:
>>>>> WARN [SharedPool-Worker-1] 2016-08-02 15:58:53,256 SelectStatement.java:351 - Aggregation query used without
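To make the lifecycle described above concrete, here is a minimal cqlsh sketch; the keyspace, index name, and values are hypothetical, and only the mechanism is taken from the thread:

CREATE KEYSPACE demo WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': '3'};
CREATE TABLE demo.objects (sha256 text PRIMARY KEY, mime text);
CREATE CUSTOM INDEX demo_mime_idx ON demo.objects (mime)
    USING 'org.apache.cassandra.index.sasi.SASIIndex'
    WITH OPTIONS = {'analyzed' : 'true',
                    'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer'};

INSERT INTO demo.objects (sha256, mime) VALUES ('abc123', 'ELF 64-bit LSB executable');
SELECT * FROM demo.objects WHERE mime LIKE 'ELF%';   -- 1 row: the SASI index file now has an entry

DELETE FROM demo.objects WHERE sha256 = 'abc123';
SELECT * FROM demo.objects WHERE mime LIKE 'ELF%';   -- 0 rows: the index entry still exists, but Cassandra
                                                     -- finds the tombstone; the entry only disappears once
                                                     -- compaction purges the deleted data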
Re: question when using SASI indexing
Thanks DuyHai,

I would agree, but we have not performed any delete operations in over a month. To me this looks like a potential bug or misconfiguration (on my end) with SASI.

I say this for a few reasons:
1) We have not performed a delete operation since the indexes were created.
2) When I perform a query against the same table for the sha256 of an ELF file, I do receive a result.

SELECT * FROM testing.objects WHERE sha256 = '1b218c991960d48f3a6d7a7139ae8789886365606be9213c5b371e57115f';

 sha256                                                       | mime
--------------------------------------------------------------+--------------------------------------------------------------------
 1b218c991960d48f3a6d7a7139ae8789886365606be9213c5b371e57115f | ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV)

3) If I don't use the SASI index and instead loop through the entries manually, I get 187 results.
4) When I attempted the same SASI query again today, I again received inconsistent results between 0 and 7. After a few attempts it again began to return 0.

Do you see any errors in my index command?

CREATE CUSTOM INDEX objects_mime_idx ON testing.objects (mime) USING
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'analyzed' : 'true',
'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
'tokenization_enable_stemming' : 'false', 'tokenization_locale' : 'en',
'tokenization_normalize_lowercase' : 'true', 'tokenization_skip_stop_words' : 'true'};

Some of our SASI indexes are fairly large, as we were testing the ability to use SASI instead of Elasticsearch or basic processing through Spark. I will run some more tests today and see if I can uncover anything.
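For reference, the SASI-backed query from the thread and the manual check from point 3) could look roughly like this in cqlsh (table and index names follow the thread; the LIMIT is an arbitrary placeholder for paging through everything):

SELECT * FROM testing.objects WHERE mime LIKE 'ELF%';     -- served by the SASI index objects_mime_idx; returned 0-8 rows
SELECT sha256, mime FROM testing.objects LIMIT 1000000;   -- full scan, filtered for 'ELF%' on the client; yielded 187 matches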
Re: question when using SASI indexing
For the record, we've found the issue: it is not related to SASI. The inconsistencies are due to inconsistent data between replicas, and a good repair is needed to put them back in sync. Using CL = QUORUM gives consistent results when querying.
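A sketch of that verification, assuming the schema from the thread: after a full repair of the keyspace (for example, nodetool repair run on each node), re-issuing the query at QUORUM forces a majority of the three replicas to agree:

CONSISTENCY QUORUM;
SELECT * FROM testing.objects WHERE mime LIKE 'ELF%';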
Re: Merging cells in compaction / compression?
Hi,

Spark is an example of something I really don't want. It's resource heavy, it involves copying data, and it involves managing yet another distributed system. I would also need a distributed system to schedule the Spark jobs.

That sounds like a nightmare just to implement a compression method. Might as well run Hadoop.

- Micke

- Original Message -
From: "DuyHai Doan"
To: user@cassandra.apache.org
Sent: Thursday, August 4, 2016 11:26:09 PM
Subject: Re: Merging cells in compaction / compression?

Looks like you're asking for some sort of ETL on your C* data, why not use Spark to compress those data into blobs and use a User-Defined-Function to explode them when reading ?

On Thu, Aug 4, 2016 at 10:08 PM, Michael Burman wrote:

> Hi,
>
> No, I don't want to lose precision (if that's what you meant), but yes if you meant just storing them in a larger bucket (which I could decompress either on the client side or server side). To clarify, it could be like:
>
> 04082016T230215.1234, value
> 04082016T230225.4321, value
> 04082016T230235.2563, value
> 04082016T230245.1145, value
> 04082016T230255.0204, value
>
> ->
>
> 04082016T230200 -> blob (that has all the points for this minute stored - no data is lost to aggregated avgs or sums or anything).
>
> That's acceptable. Of course the prettiest solution would be to keep this hidden from the client, so that it would see the original rows while the decompression happens underneath (like with byte[] compressors), but this is acceptable for my use-case. If this is what you meant, then yes.
>
> - Micke
>
> - Original Message -
> From: "Eric Stevens"
> To: user@cassandra.apache.org
> Sent: Thursday, August 4, 2016 10:26:30 PM
> Subject: Re: Merging cells in compaction / compression?
>
> When you say merge cells, do you mean re-aggregating the data into coarser time buckets?
>
> On Thu, Aug 4, 2016 at 5:59 AM Michael Burman wrote:
>
> > Hi,
> >
> > Considering the following example structure:
> >
> > CREATE TABLE data (
> >     metric text,
> >     value double,
> >     time timestamp,
> >     PRIMARY KEY((metric), time)
> > ) WITH CLUSTERING ORDER BY (time DESC)
> >
> > The natural inserting order is metric, value, timestamp pairs, one metric/value pair per second for example. That means creating more and more cells in the same partition, which creates a large amount of overhead and reduces the compression ratio of LZ4 & Deflate (LZ4 reaches ~0.26 and Deflate ~0.10 ratios in some of the examples I've run). Now, to improve the compression ratio, how could I merge the cells on the actual Cassandra node? I looked at ICompress and it provides only byte-level compression.
> >
> > Could I do this in the compaction phase, by extending the DateTieredCompaction for example? It has SSTableReader/Writer facilities and it seems to be able to see the rows? I'm fine with the fact that a repair run might have to do some conflict resolution, as the final merged rows would be quite "small" (50kB) in size. The naive approach is of course to fetch all the rows from Cassandra, merge them on the client and send them back to Cassandra, but this seems very wasteful and has its own problems. Compared to table-LZ4 I was able to reduce the required size to 1/20th (context-aware compression is sometimes just so much better), so there are real benefits to this approach, even if I would probably violate multiple design decisions.
> >
> > One approach is of course to write to another storage first and, once the blocks are ready, write them to Cassandra. But that again seems idiotic (I know some people are using Kafka in front of Cassandra for example, but that means maintaining yet another distributed solution and defeats the benefit of Cassandra's easy management & scalability).
> >
> > Has anyone done something similar? Even planned? If I need to extend something in Cassandra I can accept that approach also - but as I'm not that familiar with the Cassandra source code I could use some hints.
> >
> > - Micke
Sync failed between in AntiEntropySessions - Repair
Hi guys,

Doing a repair I got this error for 2 token ranges.

ERROR [Thread-2499244] 2016-08-04 20:05:24,288 StorageService.java:3068 - Repair session 41e4bab0-5a63-11e6-9993-e11d93fd5b40 for range (487410372471205090,492009442088088379] failed with error org.apache.cassandra.exceptions.RepairException: [repair #41e4bab0-5a63-11e6-9993-e11d93fd5b40 on ks/cf_adv, (487410372471205090,492009442088088379]] Sync failed between /192.168.0.144 and /192.168.0.37
java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #41e4bab0-5a63-11e6-9993-e11d93fd5b40 on ks/cf_adv, (487410372471205090,492009442088088379]] Sync failed between /192.168.0.144 and /192.168.0.37
        at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.8.0_60]
        at java.util.concurrent.FutureTask.get(FutureTask.java:192) [na:1.8.0_60]
        at org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:3059) ~[apache-cassandra-2.1.14.jar:2.1.14]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) [apache-cassandra-2.1.14.jar:2.1.14]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_60]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_60]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #41e4bab0-5a63-11e6-9993-e11d93fd5b40 on ks/cf_adv, (487410372471205090,492009442088088379]] Sync failed between /10.234.72.144 and /10.234.86.37
        at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) [apache-cassandra-2.1.14.jar:2.1.14]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_60]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_60]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_60]
        ... 1 common frames omitted
Caused by: org.apache.cassandra.exceptions.RepairException: [repair #41e4bab0-5a63-11e6-9993-e11d93fd5b40 on ks/cf_adv, (487410372471205090,492009442088088379]] Sync failed between /192.168.0.144 and /192.168.0.37
        at org.apache.cassandra.repair.RepairSession.syncComplete(RepairSession.java:223) ~[apache-cassandra-2.1.14.jar:2.1.14]
        at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:422) ~[apache-cassandra-2.1.14.jar:2.1.14]
        at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:134) ~[apache-cassandra-2.1.14.jar:2.1.14]
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) ~[apache-cassandra-2.1.14.jar:2.1.14]
        ... 3 common frames omitted

Is it a network problem, or are we hitting a bug? There is nothing else in the Cassandra log beyond the "Sync failed between" messages.

We are running Cassandra 2.1.14.

Best regards

Jean Carlo

"The best way to predict the future is to invent it"  Alan Kay
Re: Merging cells in compaction / compression?
Hadoop and Cassandra have very different use cases. If the ability to write a custom compression system is the primary factor in how you choose your database, I suspect you may run into some trouble.

Jon
Re: Sync failed between in AntiEntropySessions - Repair
It seems you have a streaming error. Look for ERROR statements from the streaming classes before that one, which may give you a more specific root cause. In any case, I'd suggest you upgrade to 2.1.15, as there were a couple of streaming fixes in that version that might help.
Re: Merging cells in compaction / compression?
Hi,

For storing time series data, disk usage is quite a significant factor - time series applications generate a lot of data (and of course the newest data is the most important). Given that even DateTiered compaction was designed with these peculiarities of time series data in mind, wouldn't it make sense to also improve the storage efficiency? One of the key improvements in Cassandra 3.x was the improved storage engine, but it is still far from being efficient with time series data.

Efficient compression methods for both floating-point and integer data have a lot of research behind them and can be applied to time series data. I wish to apply these methods to improve storage efficiency - and performance*.

* In my experience, storing blocks of data and decompressing them on the client side, instead of letting Cassandra read more rows, improves performance by several times. Query patterns for time series data usually request a range of data (instead of a single datapoint).

And I wasn't comparing Cassandra & Hadoop, but the combination of Spark + Cassandra + a distributed scheduler + other stuff vs. a Hadoop installation. At that point they are quite comparable in many cases, with the latter being easier to manage in the end. I don't want either for a simple time series storage solution, as I have no need for components other than data storage.

- Micke
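One possible shape for the packed layout described above, as a rough sketch; the table and column names are hypothetical, and packing and unpacking the blob happens entirely on the client side:

-- Hypothetical bucketed layout: one row per metric per coarse time bucket,
-- with all points for that bucket packed into one client-compressed blob.
CREATE TABLE data_packed (
    metric text,
    bucket timestamp,             -- e.g. the minute the points fall into
    points blob,                  -- encoded (timestamp, value) pairs, decompressed by the client
    PRIMARY KEY ((metric), bucket)
) WITH CLUSTERING ORDER BY (bucket DESC);

A range read then fetches a handful of blobs instead of thousands of individual cells, which matches the range-oriented query pattern described above.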
Re: Sync failed between in AntiEntropySessions - Repair
Hi Paulo,

I found the lines; we got an "Outgoing stream handler has been closed" exception. These are:

ERROR [STREAM-IN-/10.234.86.36] 2016-08-04 16:55:53,772 StreamSession.java:621 - [Stream #c4e79260-5a46-11e6-9993-e11d93fd5b40] Remote peer 192.168.0.36 failed stream session.
INFO [STREAM-IN-/10.234.86.36] 2016-08-04 16:55:53,772 StreamResultFuture.java:180 - [Stream #c4e79260-5a46-11e6-9993-e11d93fd5b40] Session with /192.168.0.36 is complete
WARN [STREAM-IN-/10.234.86.36] 2016-08-04 16:55:53,773 StreamResultFuture.java:207 - [Stream #c4e79260-5a46-11e6-9993-e11d93fd5b40] Stream failed
ERROR [StreamReceiveTask:107995] 2016-08-04 16:55:53,782 StreamReceiveTask.java:183 - Error applying streamed data:
java.lang.RuntimeException: Outgoing stream handler has been closed
        at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:138) ~[apache-cassandra-2.1.14.jar:2.1.14]
        at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:697) ~[apache-cassandra-2.1.14.jar:2.1.14]
        at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:653) ~[apache-cassandra-2.1.14.jar:2.1.14]
        at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:179) ~[apache-cassandra-2.1.14.jar:2.1.14]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_60]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_60]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
ERROR [StreamReceiveTask:107995] 2016-08-04 16:55:53,782 StreamSession.java:505 - [Stream #c4e79260-5a46-11e6-9993-e11d93fd5b40] Streaming error occurred
java.lang.RuntimeException: Outgoing stream handler has been closed
        at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:138) ~[apache-cassandra-2.1.14.jar:2.1.14]
        at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:697) ~[apache-cassandra-2.1.14.jar:2.1.14]
        at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:653) ~[apache-cassandra-2.1.14.jar:2.1.14]
        at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:179) ~[apache-cassandra-2.1.14.jar:2.1.14]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_60]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_60]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]

I am looking through the CHANGES file to see if it is a bug fixed in 2.1.15.

Saludos

Jean Carlo

"The best way to predict the future is to invent it"  Alan Kay
Re: Merging cells in compaction / compression?
I think Duy Hai was suggesting Spark Streaming, which gives you the tools to build exactly what you asked for: a custom compression system for packing batches of values for a partition into an optimized byte array.
Re: Merging cells in compaction / compression?
Btw, I'm not trying to say that what you're asking for is a bad idea, or that it shouldn't / can't be done. If you're asking for a new feature, you should file a JIRA with all the details you provided above; just keep in mind it'll be a while before it ends up in a stable version. The advice on this ML will usually gravitate towards solving your problem with the tools that are available today, as "wait a year or so" is usually unacceptable.

https://issues.apache.org/jira/browse/cassandra/
Re: Sync failed between in AntiEntropySessions - Repair
You need to check 192.168.0.36 / 10.234.86.36 for streaming ERRORs.
In any case, I'd suggest you to upgrade to 2.1.15 as there were a >> couple of streaming fixes on this version that might help. >> >> 2016-08-05 11:15 GMT-03:00 Jean Carlo : >> >>> >>> Hi guys, Doing a repair I got this error for 2 tokenranges. >>> >>> ERROR [Thread-2499244] 2016-08-04 20:05:24,288 StorageService.java:3068 >>> - Repair session 41e4bab0-5a63-11e6-9993-e11d93fd5b40 for range >>> (487410372471205090,492009442088088379] failed with error >>> org.apache.cassandra.exceptions.RepairExcep >>> tion: [repair #41e4bab0-5a63-11e6-9993-e11d93fd5b40 on ks/cf_adv, >>> (487410372471205090,492009442088088379]] Sync failed between / >>> 192.168.0.144 and /192.168.0.37 >>> java.util.concurrent.ExecutionException: java.lang.RuntimeException: >>> org.apache.cassandra.exceptions.RepairException: [repair >>> #41e4bab0-5a63-11e6-9993-e11d93fd5b40 on ks/cf_adv, (487410372471205090, >>> 492009442088088379]] Sync failed between /192.168.0.144 and / >>> 192.168.0.37 >>> at java.util.concurrent.FutureTask.report(FutureTask.java:122) >>> [na:1.8.0_60] >>> at java.util.concurrent.FutureTask.get(FutureTask.java:192) >>> [na:1.8.0_60] >>> at >>> org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:3059) >>> ~[apache-cassandra-2.1.14.jar:2.1.14] >>> at >>> o
Re: Sync failed between in AntiEntropySessions - Repair
Hello Paulo,

Thanks for your fast reply. You are right about that node; I had not spotted it at first. On this node we have java.net.SocketTimeoutException errors:

ERROR [STREAM-IN-/192.168.0.146] 2016-08-04 19:10:59,456 StreamSession.java:505 - [Stream #06c02460-5a5e-11e6-8e9a-a5bf51981ad8] Streaming error occurred
java.net.SocketTimeoutException: null
        at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:211) ~[na:1.8.0_60]
        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) ~[na:1.8.0_60]
        at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385) ~[na:1.8.0_60]
        at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:51) ~[apache-cassandra-2.1.14.jar:2.1.14]
        at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:257) ~[apache-cassandra-2.1.14.jar:2.1.14]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]

We see this error for at least 5 more nodes in the same log, but the error itself doesn't say much.

Saludos

Jean Carlo

"The best way to predict the future is to invent it"  Alan Kay
org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:697) >> ~[apache-cassandra-2.1.14.jar:2.1.14] >> at >> org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:653) >> ~[apache-cassandra-2.1.14.jar:2.1.14] >> at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletio >> nRunnable.run(StreamReceiveTask.java:179) ~[apache-cassandra-2.1.14.jar: >> 2.1.14] >> at >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) >> [na:1.8.0_60] >> at java.util.concurrent.FutureTask.run(FutureTask.java:266) >> [na:1.8.0_60] >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) >> [na:1.8.0_60] >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) >> [na:1.8.0_60] >> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] >> >> I al looking through the changes files to see if it is a bug fixed in the >> 2.1.15 >> >> >> >> Saludos >> >> Jean Carlo >> >> "The best way to predict the future is to invent it" Alan Kay >> >> On Fri, Aug 5, 2016 at 4:24 PM, Paulo Motta >> wrote: >> >>> It seems you have a streaming error, look for ERROR statement in the >>> streaming classes before that which may give you a more specific root >>>
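A practical way to act on Paulo's suggestion is to grep the suspect peer's log for the same stream session id, since the first ERROR logged for that session on 192.168.0.36 usually points at the real root cause. A rough sketch, assuming the default packaged log location (adjust the path and session id to your setup):

    grep -n "c4e79260-5a46-11e6-9993-e11d93fd5b40" /var/log/cassandra/system.log
    # or, more broadly, every streaming error on that node:
    grep -n "Streaming error occurred" /var/log/cassandra/system.log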
Re: Sync failed between in AntiEntropySessions - Repair
https://issues.apache.org/jira/browse/CASSANDRA-11840

increase streaming_socket_timeout to 8640 or upgrade to cassandra-2.1.15.
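The property Paulo is referring to is presumably streaming_socket_timeout_in_ms in cassandra.yaml, and the value quoted above looks truncated; the more conservative default adopted by CASSANDRA-11840 is 86400000 ms (24 hours). A minimal sketch of the workaround on 2.1.14, applied to every node and followed by a rolling restart (property name and value assumed here, verify against the ticket):

    # cassandra.yaml
    streaming_socket_timeout_in_ms: 86400000    # 24 hours, so long repair streams don't time out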
Re: Sync failed between in AntiEntropySessions - Repair
Thank you very much Paulo

On Aug 5, 2016 17:31, "Paulo Motta" wrote:
> https://issues.apache.org/jira/browse/CASSANDRA-11840
>
> increase streaming_socket_timeout to 8640 or upgrade to cassandra-2.1.15.
Re: CPU high load
Thank you, Alain. There was no frequent GC nor compaction, so it had been a mystery; however, once I stopped chef-client (we're managing the cluster through a Chef cookbook), the load eased on almost all of the servers. So we're now refactoring our cookbook; in the meanwhile, we also decided to rebuild the cluster with DSE 5.0.1.

Thank you very much for your advice on the debugging process,
Aoi

2016-07-20 4:03 GMT-07:00 Alain RODRIGUEZ :
> Hi Aoi,
>
>> since few weeks ago, all of the cluster nodes are hitting avg. 15-20 cpu load.
>> These nodes are running on VMs (VMware vSphere) that have 8 vcpu (1 core/socket) - 16 vRAM. (JVM options: -Xms8G -Xmx8G -Xmn800M)
>
> I take my chance, a few ideas / questions below:
>
> What Cassandra version are you running?
> How is your GC doing?
>
> Run something like: grep "GC" /var/log/cassandra/system.log
> If you have a lot of long CMS pauses you might not be keeping things in the new gen long enough: Xmn800M looks too small to me, it has been a default but I never saw a case where this setting worked better than a higher value (let's say 2G); the tenuring threshold also gives better results if set a bit higher than the default (let's say 16). Those options are in cassandra-env.sh.
>
> Do you have other warnings or errors? Anything about tombstones or compacting wide rows incrementally?
> What compaction strategy are you using?
> How many concurrent compactors do you use? (If you have 8 cores, this value should probably be between 2 and 6; 4 is a good starting point.)
> If your compaction is not fast enough and the disks are doing fine, consider increasing the compaction throughput from the default 16 to 32 or 64 MB/s to mitigate the impact of the point above.
> Do you use compression? What kind?
> Did the request count increase recently? Do you consider adding capacity, or do you think you're hitting a new bug / issue that is worth investigating / solving?
> Are you using the default configuration? What did you change?
>
> No matter what you try, do it as much as possible on one canary node first, and incrementally (one change at a time - using NEWHEAP = 2GB + tenuringThreshold = 16 would be one change, it makes sense to move those 2 values together).
>
>> I have enabled an auto repair service on opscenter and it's running behind
>
> Also, when did you do that, starting repairs? Repair is an expensive operation, consuming a lot of resources, that is often needed but hard to tune correctly. Are you sure you have enough CPU power to handle the load + repairs?
>
> Some other comments, probably not directly related:
>
>> I also realized that my cluster isn't well balanced
>
> Well, your cluster looks balanced to me: 7 GB isn't that far from 11 GB. To get more accurate information, use 'nodetool status mykeyspace'. This way ownership will be displayed, replacing (?) with ownership (xx %). Total ownership = 300 % in your case (RF=3).
>
>> I am running 6 nodes vnode cluster with DSE 4.8.1, and since few weeks ago, all of the cluster nodes are hitting avg. 15-20 cpu load.
>
> By the way, from https://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/RNdse.html:
>
> "Warning: DataStax does not recommend 4.8.1 or 4.8.2 versions for production, see warning. Use 4.8.3 instead.".
>
> I am not sure what happened there but I would move to 4.8.3+ asap, datastax people know their products and I don't like this kind of orange and bold warnings :-).
>
> C*heers,
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-07-14 4:36 GMT+02:00 Aoi Kadoya :
>>
>> Hi Romain,
>>
>> No, I don't think we upgraded the cassandra version or changed any of those schema elements. After I realized this high load issue, I found that some of the tables have a shorter gc_grace_seconds (1 day) than the rest, and because it seemed to be causing constant compaction cycles, I have changed them to 10 days. But again, that's after the load hit this high number. Some of the nodes eased a little bit after changing the gc_grace_seconds values and repairing nodes, but since a few days ago, all of the nodes are constantly reporting a load of 15-20.
>>
>> Thank you for the suggestion about logging, let me try to change the log level to see what I can get from it.
>>
>> Thanks,
>> Aoi
>>
>> 2016-07-13 13:28 GMT-07:00 Romain Hardouin :
>> > Did you upgrade from a previous version? Did you make some schema changes like compaction strategy, compression, bloom filter, etc.?
>> > What about the R/W requests?
>> > SharedPool Workers are... shared ;-) Put logs in debug to see some examples of what services are using this pool (many actually).
>> >
>> > Best,
>> >
>> > Romain
>> >
>> > Le Mercredi 13 juillet 2016 18h15, Patrick McFadin a écrit :
>> >
>> > Might be more clear looking at nodetool tpstats
>> >
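The checks Alain describes map to a handful of commands; a rough sketch for a single canary node, assuming a stock packaged install (log path, cassandra-env.sh variables) rather than anything DSE-specific, with illustrative values only:

    # 1. Look for long or frequent GC pauses
    grep -i "GCInspector" /var/log/cassandra/system.log | tail -20

    # 2. cassandra-env.sh: larger new gen plus a higher tenuring threshold (one combined change)
    HEAP_NEWSIZE="2G"
    # and raise the existing -XX:MaxTenuringThreshold=1 entry in JVM_OPTS to 16

    # 3. Compaction: throughput can be changed live while watching pending compactions
    nodetool setcompactionthroughput 32
    nodetool compactionstats

    # 4. Per-node ownership instead of '?'
    nodetool status mykeyspace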
OutOfMemoryError when initializing a secondary index
Running Cassandra 3.0.7, we have 3 out of 6 nodes that threw an OOM error when a developer created a secondary index. I'm trying to repair the cluster. I stopped all nodes, deleted all traces of the table and secondary index from disk, removed commit logs and saved caches, and restarted the instances. The 3 nodes that didn't have the OOM error started fine, but the other three are getting stuck while trying to initialize the secondary index – which shouldn't even have data to load.

"""
...
INFO 19:51:59 Initializing notifications_v1.notifications_tray
INFO 19:51:59 Initializing notifications_v1.notifications_tray.notifications_tray_event_id
"""

The instances spin for a long time and then throw an OutOfMemoryError. I don't need to save this table, but I do need to save other keyspaces. Is there any way I can get these nodes operational again?
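For reference, the recovery attempt described above corresponds roughly to the shell steps below; this is only a sketch assuming default package paths, with the keyspace, table and index names taken from the log excerpt. One additional avenue that may be worth testing on a single node is temporarily raising MAX_HEAP_SIZE in cassandra-env.sh so the node can get through index initialization, then dropping the index once a node accepts CQL again:

    # on each affected node, with Cassandra stopped
    sudo service cassandra stop
    rm -rf /var/lib/cassandra/data/notifications_v1/notifications_tray-*
    rm -rf /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
    # optionally raise MAX_HEAP_SIZE in cassandra-env.sh before restarting
    sudo service cassandra start

    # from a node that is up, drop the offending index (name taken from the log above)
    cqlsh -e "DROP INDEX notifications_v1.notifications_tray_event_id;"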