Re: Pending compactions not going down on some nodes of the cluster

2016-03-21 Thread Gianluca Borello
increase compaction_throughput to more than 16MB/s (48 may be a good > start) > > > What kind of data are you storing in these tables ? timeseries ? > > > > 2016-03-21 23:37 GMT+01:00 Gianluca Borello : > > Thank you for your reply, definitely appreciate the ti

Re: Pending compactions not going down on some nodes of the cluster

2016-03-21 Thread Gianluca Borello
e size reported by compactionstats is the uncompressed size – if you’re > using compression, it’s perfectly reasonable for 30G of data to show up as > 118G of data during compaction. > > - Jeff > > From: Gianluca Borello > Reply-To: "user@cassandra.apache.org" > Date:

Re: Pending compactions not going down on some nodes of the cluster

2016-03-21 Thread Gianluca Borello
On Mon, Mar 21, 2016 at 12:50 PM, Gianluca Borello wrote: > > - It's also interesting to notice how the compaction in the previous > example is trying to compact ~37 GB, which is essentially the whole size of > the column family message_data1 as reported by cfstats: > Also r

Re: Pending compactions not going down on some nodes of the cluster

2016-03-21 Thread Gianluca Borello
On Mon, Mar 21, 2016 at 2:15 PM, Alain RODRIGUEZ wrote: > > What hardware do you use? Can you see it running at the limits (CPU / > disks IO)? Is there any error on system logs, are disks doing fine? > > Nodes are c3.2xlarge instances on AWS. The nodes are relatively idle, and, as said in the ori

Pending compactions not going down on some nodes of the cluster

2016-03-21 Thread Gianluca Borello
Hi, We added a bunch of new nodes to a cluster (2.1.13) and everything went fine, except for the number of pending compactions that is staying quite high on a subset of the new nodes. Over the past 3 days, the pending compactions have never been less than ~130 on such nodes, with peaks of ~200. On

Re: Unexpected high internode network activity

2016-02-26 Thread Gianluca Borello
Thank you for your reply. - Repairs are not running on the cluster, in fact we've been "slacking" when it comes to repair, mainly because we never manually delete our data as it's always TTLed and we haven't had major failures or outages that required repairing data (I know that's not a good reaso

Re: Unexpected high internode network activity

2016-02-26 Thread Gianluca Borello
e but really dc=az) I am not seeing the bandwidth as that much out of > line. > > > > *...* > > > > *Daemeon C.M. Reiydelle USA (+1) 415.501.0198 London (+44) (0) 20 8144 9872* > > On Thu, Feb 25, 2016 at 11:00 PM, Gianluca Borello > wrote: > >> I

Re: Unexpected high internode network activity

2016-02-25 Thread Gianluca Borello
on C.M. Reiydelle USA (+1) 415.501.0198 London (+44) (0) 20 8144 9872* > > On Thu, Feb 25, 2016 at 8:12 PM, Gianluca Borello > wrote: > >> Thank you for your reply. >> >> To ans

Re: Unexpected high internode network activity

2016-02-25 Thread Gianluca Borello
ould not be charged if intra AZ, > but inter AZ and inter DC will get that double count. > > So, my guess is reverse indexes, and you forgot to include receive and > transmit. > > > > *...* > > > > *Daemeon C.M. Reiydelle USA (+1) 415.501.0198 London (+44)

Unexpected high internode network activity

2016-02-25 Thread Gianluca Borello
Hello, We have a Cassandra 2.1.9 cluster on EC2 for one of our live applications. There's a total of 21 nodes across 3 AWS availability zones, c3.2xlarge instances. The configuration is pretty standard, we use the default settings that come with the datastax AMI and the driver in our application

Re: Performance issues with "many" CQL columns

2016-02-14 Thread Gianluca Borello
mp selecting a single row a few rows or many rows > (dozens, hundreds, thousands)? > > > -- Jack Krupansky > > On Sun, Feb 14, 2016 at 7:40 PM, Gianluca Borello > wrote: > >> Thanks again. >> >> One clarification about "reading in a single SELECT": in m

Re: Performance issues with "many" CQL columns

2016-02-14 Thread Gianluca Borello
> You can definitely read all of columns in a single SELECT. And the > n-INSERTS can be batched and will insert fewer cells in the storage engine > than the previous approach. > > -- Jack Krupansky > > On Sun, Feb 14, 2016 at 7:31 PM, Gianluca Borello > wrote: > >>

Re: Performance issues with "many" CQL columns

2016-02-14 Thread Gianluca Borello
STORAGE for even more efficient storage and > access (assuming there is only a single non-PK data column, the blob > value.) You can then access (read or write) an individual column/blob or a > slice of them. > > -- Jack Krupansky > > On Sun, Feb 14, 2016 at 5:22 PM, Gianluca

Performance issues with "many" CQL columns

2016-02-14 Thread Gianluca Borello
Hi, I've just painfully discovered a "little" detail in Cassandra: Cassandra touches all columns on a CQL select (related issues https://issues.apache.org/jira/browse/CASSANDRA-6586, https://issues.apache.org/jira/browse/CASSANDRA-6588, https://issues.apache.org/jira/browse/CASSANDRA-7085). My dat

Re: Error on nodetool cleanup

2015-02-28 Thread Gianluca Borello
Jeff > > On Fri, Feb 27, 2015 at 6:01 PM, Gianluca Borello > wrote: > >> Hello, >> >> I have a cluster of four nodes running 2.0.12. I added one more node and >> then went on with the cleanup procedure on the other four nodes, but I get >> this error (the sa

Error on nodetool cleanup

2015-02-27 Thread Gianluca Borello
Hello, I have a cluster of four nodes running 2.0.12. I added one more node and then went on with the cleanup procedure on the other four nodes, but I get this error (the same error on each node): INFO [CompactionExecutor:10] 2015-02-28 01:55:15,097 CompactionManager.java (line 619) Cleaned up t

Re: Wide rows best practices and GC impact

2014-12-03 Thread Gianluca Borello
tion. On Dec 3, 2014 6:33 PM, "Robert Coli" wrote: > > On Tue, Dec 2, 2014 at 5:01 PM, Gianluca Borello wrote: >> >> We mainly store time series-like data, where each data point is a binary blob of 5-20KB. We use wide rows, and try to put in the same row all the data that

Wide rows best practices and GC impact

2014-12-02 Thread Gianluca Borello
Hi, We have a cluster (2.0.11) of 6 nodes (RF=3), c3.4xlarge instances, about 50 column families. Cassandra heap takes 8GB out of the 30GB of every instance. We mainly store time series-like data, where each data point is a binary blob of 5-20KB. We use wide rows, and try to put in the same row a

How to avoid column family duplication (when query requires multiple restrictions)

2014-09-22 Thread Gianluca Borello
Hi, I have a column family storing very large blobs that I would not like to duplicate, if possible. Here's a simplified version: CREATE TABLE timeline ( key text, a int, b int, value blob, PRIMARY KEY (key, a, b) ); On this, I run exactly two types of query. Both of them must hav