Huge daily outbound network traffic

2018-08-07 Thread Behnam B.Marandi
Hi,
I have a 3 node Cassandra cluster (version 3.11.1) on m4.xlarge EC2
instances with separate EBS volumes for root (gp2), data (gp2) and
commitlog (io1).
I get a spike in outbound traffic at the same time every day. As you can see in
the attached screenshot, while my normal network load hardly reaches 200MB,
this outbound spike (orange) goes up to 2GB while inbound (purple) is less than
800MB.
There is no repair or backup process going on in that time window, so I am
wondering where to look. Any idea?


Re: Huge daily outbound network traffic

2018-08-07 Thread Rahul Singh
Are you sure you don't have an outside process doing an export, a Spark job,
or a non-AWS-managed backup process?

Is this network out coming from Cassandra itself, or from elsewhere on the network?


Rahul
On Aug 7, 2018, 4:09 AM -0400, Behnam B.Marandi , wrote:
> Hi,
> I have a 3 node Cassandra cluster (version 3.11.1) on m4.xlarge EC2 instances 
> with separate EBS volumes for root (gp2), data (gp2) and commitlog (io1).
> I get a spike in outbound traffic at the same time every day. As you can see in 
> the attached screenshot, while my normal network load hardly reaches 200MB, 
> this outbound spike (orange) goes up to 2GB while inbound (purple) is less than 
> 800MB.
> There is no repair or backup process going on in that time window, so I am 
> wondering where to look. Any idea?
>
>
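
As a minimal first check (assuming Linux hosts; iftop/nethogs and the interface name eth0 are assumptions), one could attribute the spike to a process during that window:

# Per-process bandwidth: is the cassandra JVM the one sending, or something else?
sudo nethogs eth0
# Per-connection view: which remote hosts and ports receive the traffic?
sudo iftop -i eth0 -P
# If it is Cassandra, look for active streams (repair, bootstrap, rebuild)
nodetool netstats
# And check for scheduled jobs (snapshot uploads, log shipping, etc.) around that time
crontab -l; ls /etc/cron.d/

If the spike lines up with a cron entry or with active streams in nodetool netstats, that usually narrows down the culprit.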


Re: Bootstrap OOM issues with Cassandra 3.11.1

2018-08-07 Thread Laszlo Szabo
Hi,

Thanks for the fast response!

We are not using any materialized views, but there are several indexes.  I
don't have a recent heap dump, and it will be about 24 hours before I can
generate an interesting one, but most of the memory was allocated to byte
buffers, so that's not entirely helpful.

nodetool cfstats is also below.

I also see a lot of flushing happening, but it seems like there are too
many small allocations for the flushing to be effective.  Here are the messages I see:

DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,459 ColumnFamilyStore.java:1305
> - Flushing largest CFS(Keyspace='userinfo', ColumnFamily='gpsmessages') to
> free up room. Used total: 0.54/0.05, live: 0.00/0.00, flushing: 0.40/0.04,
> this: 0.00/0.00

DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,459 ColumnFamilyStore.java:915
> - Enqueuing flush of gpsmessages: 0.000KiB (0%) on-heap, 0.014KiB (0%)
> off-heap

DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,460 ColumnFamilyStore.java:1305
> - Flushing largest CFS(Keyspace='userinfo', ColumnFamily='user_history') to
> free up room. Used total: 0.54/0.05, live: 0.00/0.00, flushing: 0.40/0.04,
> this: 0.00/0.00

DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,461 ColumnFamilyStore.java:915
> - Enqueuing flush of user_history: 0.000KiB (0%) on-heap, 0.011KiB (0%)
> off-heap

DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,465 ColumnFamilyStore.java:1305
> - Flushing largest CFS(Keyspace='userinfo', ColumnFamily='tweets') to free
> up room. Used total: 0.54/0.05, live: 0.00/0.00, flushing: 0.40/0.04, this:
> 0.00/0.00

DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,465 ColumnFamilyStore.java:915
> - Enqueuing flush of tweets: 0.000KiB (0%) on-heap, 0.188KiB (0%) off-heap

DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,470 ColumnFamilyStore.java:1305
> - Flushing largest CFS(Keyspace='userinfo', ColumnFamily='user_history') to
> free up room. Used total: 0.54/0.05, live: 0.00/0.00, flushing: 0.40/0.04,
> this: 0.00/0.00

DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,470 ColumnFamilyStore.java:915
> - Enqueuing flush of user_history: 0.000KiB (0%) on-heap, 0.024KiB (0%)
> off-heap

DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,470 ColumnFamilyStore.java:1305
> - Flushing largest CFS(Keyspace='userinfo', ColumnFamily='tweets') to free
> up room. Used total: 0.54/0.05, live: 0.00/0.00, flushing: 0.40/0.04, this:
> 0.00/0.00

DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,470 ColumnFamilyStore.java:915
> - Enqueuing flush of tweets: 0.000KiB (0%) on-heap, 0.188KiB (0%) off-heap

DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,472 ColumnFamilyStore.java:1305
> - Flushing largest CFS(Keyspace='userinfo', ColumnFamily='gpsmessages') to
> free up room. Used total: 0.54/0.05, live: 0.00/0.00, flushing: 0.40/0.04,
> this: 0.00/0.00

DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,472 ColumnFamilyStore.java:915
> - Enqueuing flush of gpsmessages: 0.000KiB (0%) on-heap, 0.013KiB (0%)
> off-heap



Stack traces from errors are below.


> java.io.IOException: Broken pipe

at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
> ~[na:1.8.0_181]

at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
> ~[na:1.8.0_181]

at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
> ~[na:1.8.0_181]

at sun.nio.ch.IOUtil.write(IOUtil.java:51) ~[na:1.8.0_181]

at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
> ~[na:1.8.0_181]

at
> org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.doFlush(BufferedDataOutputStreamPlus.java:323)
> ~[apache-cassandra-3.11.1.jar:3.11.1]

at
> org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.flush(BufferedDataOutputStreamPlus.java:331)
> ~[apache-cassandra-3.11.1.jar:3.11.1]

at
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:409)
> [apache-cassandra-3.11.1.jar:3.11.1]

at
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:380)
> [apache-cassandra-3.11.1.jar:3.11.1]

at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]

ERROR [MutationStage-226] 2018-08-06 07:16:08,236
> JVMStabilityInspector.java:142 - JVM state determined to be unstable.
> Exiting forcefully due to:

java.lang.OutOfMemoryError: Direct buffer memory

at java.nio.Bits.reserveMemory(Bits.java:694) ~[na:1.8.0_181]

at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123)
> ~[na:1.8.0_181]

at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> ~[na:1.8.0_181]

at
> org.apache.cassandra.utils.memory.SlabAllocator.getRegion(SlabAllocator.java:139)
> ~[apache-cassandra-3.11.1.jar:3.11.1]

at
> org.apache.cassandra.utils.memory.SlabAllocator.allocate(SlabAllocator.java:104)
> ~[apache-cassandra-3.11.1.jar:3.11.1]

at
> org.apache.cassandra.utils.memory.ContextAllocator.allocate(ContextAllocator.java:57)
> ~[apache-cassandra-3.11.1.jar:3.11.1]

at
> org.apache.cassandra.utils.memory.

Re: Hinted Handoff

2018-08-07 Thread Rahul Singh
What is the data size that you are talking about? What is your compaction 
strategy?

I wouldn't recommend having such an aggressive TTL. Why not use a clustering 
key that allows you to get the data fairly quickly, but keep a longer TTL?



Cassandra can still be used if there is a legitimate need for multi-DC global 
replication and redundancy, which isn't quite available at the same level of 
uptime in distributed caches like Redis.


Rahul
On Aug 7, 2018, 1:19 AM -0400, kurt greaves , wrote:
> > Does Cassandra TTL out the hints after max_hint_window_in_ms? From my 
> > understanding, Cassandra only stops collecting hints after 
> > max_hint_window_in_ms but can still keep replaying the hints if the node 
> > comes back again. Is this correct? Is there a way to TTL out hints?
>
> No, but it won't send hints that have passed HH window. Also, this shouldn't 
> be caused by HH as the hints maintain the original timestamp with which they 
> were written.
>
> Honestly, this sounds more like a use case for a distributed cache rather 
> than Cassandra. Keeping data for 30 minutes and then deleting it is going to 
> be a nightmare to manage in Cassandra.
>
> > On 7 August 2018 at 07:20, Agrawal, Pratik  
> > wrote:
> > > Does Cassandra TTL out the hints after max_hint_window_in_ms? From my 
> > > understanding, Cassandra only stops collecting hints after 
> > > max_hint_window_in_ms but can still keep replaying the hints if the node 
> > > comes back again. Is this correct? Is there a way to TTL out hints?
> > >
> > > Thanks,
> > > Pratik
> > >
> > > From: Kyrylo Lebediev 
> > > Reply-To: "user@cassandra.apache.org" 
> > > Date: Monday, August 6, 2018 at 4:10 PM
> > > To: "user@cassandra.apache.org" 
> > > Subject: Re: Hinted Handoff
> > >
> > > Small gc_grace_seconds value lowers max allowed node downtime, which is 
> > > 15 minutes in your case. After 15 minutes of downtime you'll need to 
> > > replace the node, as you described. This interval looks too short to be 
> > > able to do planned maintenance. So, in case you set a larger value for 
> > > gc_grace_seconds (let's say, hours or a day), will you get visible read 
> > > amplification / waste a lot of disk space / issues with compactions?
> > >
> > > Hinted handoff may be the reason in case hinted handoff window is longer 
> > > than gc_grace_seconds. To me it looks like hinted handoff window 
> > > (max_hint_window_in_ms in cassandra.yaml, which defaults to 3h) must 
> > > always be set to a value less than gc_grace_seconds.
> > >
> > > Regards,
> > > Kyrill
> > > From: Agrawal, Pratik 
> > > Sent: Monday, August 6, 2018 8:22:27 PM
> > > To: user@cassandra.apache.org
> > > Subject: Hinted Handoff
> > >
> > > Hello all,
> > > We use Cassandra in non-conventional way, where our data is short termed 
> > > (life cycle of about 20-30 minutes) where each record is updated ~5 times 
> > > and then deleted. We have GC grace of 15 minutes.
> > > We are seeing 2 problems
> > > 1.) A certain number of Cassandra nodes goes down and then we remove it 
> > > from the cluster using Cassandra removenode command and replace the dead 
> > > nodes with new nodes. While new nodes are joining in, we see more nodes 
> > > down (which are not actually down) but we see following errors in the log
> > > “Gossip not settled after 321 polls. Gossip Stage 
> > > active/pending/completed: 1/816/0”
> > >
> > > To fix the issue, I restarted the server and the nodes now appear to be 
> > > up and the problem is solved
> > >
> > > Can this problem be related to 
> > > https://issues.apache.org/jira/browse/CASSANDRA-6590 ?
> > >
> > > 2.) Meanwhile, after restarting the nodes mentioned above, we see that 
> > > some old deleted data is resurrected (because of short lifecycle of our 
> > > data). My guess at the moment is that these data is resurrected due to 
> > > hinted handoff. Interesting point to note here is that data keeps 
> > > resurrecting at periodic intervals (like an hour) and then finally stops. 
> > > Could this be caused by hinted handoff? if so is there any setting which 
> > > we can set to specify that “invalidate, hinted handoff data after 5-10 
> > > minutes”.
> > >
> > > Thanks,
> > > Pratik
>


Re: ETL options from Hive/Presto/s3 to cassandra

2018-08-07 Thread Rahul Singh
Spark is scalable to as many nodes as you want and could be co-located with the 
data nodes; sstableloader won't be as performant for larger datasets. Although 
it can be run in parallel on different nodes, I don't believe it to be as fault 
tolerant.

If you have to do it continuously I would even think about leveraging Kafka as 
the transport layer and using Kafka Connect. It brings other tooling to get 
data into Cassandra from a variety of sources.

Rahul
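
For reference, a minimal sstableloader invocation looks roughly like the sketch below (host addresses, the throttle value, and paths are placeholders; it expects SSTables laid out in <keyspace>/<table>/ directories, e.g. generated offline with CQLSSTableWriter):

# Stream pre-built SSTables into the cluster; -d takes one or more initial contact points
sstableloader -d 10.0.0.1,10.0.0.2 \
  --throttle 50 \
  /path/to/generated/my_keyspace/my_table
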
On Aug 6, 2018, 3:16 PM -0400, srimugunthan dhandapani 
, wrote:
> Hi all,
> We have data that gets filled into Hive/Presto every few hours.
> We want that data to be transferred to Cassandra tables.
> What are some of the high-performance ETL options for transferring data 
> from Hive or Presto into Cassandra?
>
> Also does anybody have any performance numbers comparing
> - loading data from S3 to cassandra using SStableloader
> - and loading data from S3 to cassandra using other means (like spark-api)?
>
> Thanks,
> mugunthan


Re: Bootstrap OOM issues with Cassandra 3.11.1

2018-08-07 Thread Jonathan Haddad
By default Cassandra is set to generate a heap dump on OOM. It can be a bit
tricky to figure out what’s going on exactly but it’s the best evidence you
can work with.
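
A minimal sketch of digging into that evidence (the paths, the pid lookup, and the use of Eclipse MAT are assumptions, not something prescribed in this thread):

# The dump lands where -XX:HeapDumpPath points (see cassandra-env.sh / jvm.options)
ls -lh /var/lib/cassandra/*.hprof
# Quick class histogram of live objects without a full dump (briefly pauses the JVM)
jmap -histo:live $(pgrep -f CassandraDaemon) | head -30
# For a very large .hprof, Eclipse MAT's headless parser is more practical than the GUI
./ParseHeapDump.sh /var/lib/cassandra/java_pid12345.hprof org.eclipse.mat.api:suspects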

On Tue, Aug 7, 2018 at 6:30 AM Laszlo Szabo 
wrote:

> Hi,
>
> Thanks for the fast response!
>
> We are not using any materialized views, but there are several indexes.  I
> don't have a recent heap dump, and it will be about 24 hours before I can
> generate an interesting one, but most of the memory was allocated to byte
> buffers, so not entirely helpful.
>
> nodetool cfstats is also below.
>
> I also see a lot of flushing happening, but it seems like there are too
> many small allocations to be effective.  Here are the messages I see,
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,459
>> ColumnFamilyStore.java:1305 - Flushing largest CFS(Keyspace='userinfo',
>> ColumnFamily='gpsmessages') to free up room. Used total: 0.54/0.05, live:
>> 0.00/0.00, flushing: 0.40/0.04, this: 0.00/0.00
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,459 ColumnFamilyStore.java:915
>> - Enqueuing flush of gpsmessages: 0.000KiB (0%) on-heap, 0.014KiB (0%)
>> off-heap
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,460
>> ColumnFamilyStore.java:1305 - Flushing largest CFS(Keyspace='userinfo',
>> ColumnFamily='user_history') to free up room. Used total: 0.54/0.05, live:
>> 0.00/0.00, flushing: 0.40/0.04, this: 0.00/0.00
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,461 ColumnFamilyStore.java:915
>> - Enqueuing flush of user_history: 0.000KiB (0%) on-heap, 0.011KiB (0%)
>> off-heap
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,465
>> ColumnFamilyStore.java:1305 - Flushing largest CFS(Keyspace='userinfo',
>> ColumnFamily='tweets') to free up room. Used total: 0.54/0.05, live:
>> 0.00/0.00, flushing: 0.40/0.04, this: 0.00/0.00
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,465 ColumnFamilyStore.java:915
>> - Enqueuing flush of tweets: 0.000KiB (0%) on-heap, 0.188KiB (0%) off-heap
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,470
>> ColumnFamilyStore.java:1305 - Flushing largest CFS(Keyspace='userinfo',
>> ColumnFamily='user_history') to free up room. Used total: 0.54/0.05, live:
>> 0.00/0.00, flushing: 0.40/0.04, this: 0.00/0.00
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,470 ColumnFamilyStore.java:915
>> - Enqueuing flush of user_history: 0.000KiB (0%) on-heap, 0.024KiB (0%)
>> off-heap
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,470
>> ColumnFamilyStore.java:1305 - Flushing largest CFS(Keyspace='userinfo',
>> ColumnFamily='tweets') to free up room. Used total: 0.54/0.05, live:
>> 0.00/0.00, flushing: 0.40/0.04, this: 0.00/0.00
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,470 ColumnFamilyStore.java:915
>> - Enqueuing flush of tweets: 0.000KiB (0%) on-heap, 0.188KiB (0%) off-heap
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,472
>> ColumnFamilyStore.java:1305 - Flushing largest CFS(Keyspace='userinfo',
>> ColumnFamily='gpsmessages') to free up room. Used total: 0.54/0.05, live:
>> 0.00/0.00, flushing: 0.40/0.04, this: 0.00/0.00
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,472 ColumnFamilyStore.java:915
>> - Enqueuing flush of gpsmessages: 0.000KiB (0%) on-heap, 0.013KiB (0%)
>> off-heap
>
>
>>
>
> Stack traces from errors are below.
>
>
>> java.io.IOException: Broken pipe
>
> at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>> ~[na:1.8.0_181]
>
> at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>> ~[na:1.8.0_181]
>
> at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>> ~[na:1.8.0_181]
>
> at sun.nio.ch.IOUtil.write(IOUtil.java:51) ~[na:1.8.0_181]
>
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
>> ~[na:1.8.0_181]
>
> at
>> org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.doFlush(BufferedDataOutputStreamPlus.java:323)
>> ~[apache-cassandra-3.11.1.jar:3.11.1]
>
> at
>> org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.flush(BufferedDataOutputStreamPlus.java:331)
>> ~[apache-cassandra-3.11.1.jar:3.11.1]
>
> at
>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:409)
>> [apache-cassandra-3.11.1.jar:3.11.1]
>
> at
>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:380)
>> [apache-cassandra-3.11.1.jar:3.11.1]
>
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
>
> ERROR [MutationStage-226] 2018-08-06 07:16:08,236
>> JVMStabilityInspector.java:142 - JVM state determined to be unstable.
>> Exiting forcefully due to:
>
> java.lang.OutOfMemoryError: Direct buffer memory
>
> at java.nio.Bits.reserveMemory(Bits.java:694) ~[na:1.8.0_181]
>
> at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123)
>> ~[na:1.8.0_181]
>
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
>> ~[na:1.8.0_181]
>
> at
>> org.apache.cassandra.utils.memory.SlabAllocator.getRegion(SlabAl

Re: Bootstrap OOM issues with Cassandra 3.11.1

2018-08-07 Thread Jeff Jirsa
That's a direct memory OOM - it's not the heap, it's the offheap.

You can see
that gpsmessages.addressreceivedtime_idx
is holding about 2GB of offheap memory (most of it for the bloom filter),
but none of the others look like they're holding a ton offheap (either in
bloom filter, memtable, etc).  With what JVM args are you starting
cassandra (how much direct memory are you allocating)? Are all of your OOMs
in direct memory?
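
As a hedged illustration of what is being asked here (the flag and values are examples only, not the poster's settings): direct memory is what backs those byte buffers, and unless it is capped explicitly it roughly defaults to the heap size.

# See whether the running JVM caps direct memory at all
ps -ef | grep -o 'MaxDirectMemorySize=[^ ]*'
# An explicit cap can be set in conf/jvm.options (example value)
#   -XX:MaxDirectMemorySize=16G
# Off-heap consumers to watch: bloom filters, compression metadata, memtables, buffer pools
nodetool info | grep -i 'off heap'
nodetool cfstats userinfo | grep -i 'off heap'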



On Tue, Aug 7, 2018 at 6:30 AM, Laszlo Szabo 
wrote:

> Hi,
>
> Thanks for the fast response!
>
> We are not using any materialized views, but there are several indexes.  I
> don't have a recent heap dump, and it will be about 24 hours before I can
> generate an interesting one, but most of the memory was allocated to byte
> buffers, so not entirely helpful.
>
> nodetool cfstats is also below.
>
> I also see a lot of flushing happening, but it seems like there are too
> many small allocations to be effective.  Here are the messages I see,
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,459
>> ColumnFamilyStore.java:1305 - Flushing largest CFS(Keyspace='userinfo',
>> ColumnFamily='gpsmessages') to free up room. Used total: 0.54/0.05, live:
>> 0.00/0.00, flushing: 0.40/0.04, this: 0.00/0.00
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,459 ColumnFamilyStore.java:915
>> - Enqueuing flush of gpsmessages: 0.000KiB (0%) on-heap, 0.014KiB (0%)
>> off-heap
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,460
>> ColumnFamilyStore.java:1305 - Flushing largest CFS(Keyspace='userinfo',
>> ColumnFamily='user_history') to free up room. Used total: 0.54/0.05, live:
>> 0.00/0.00, flushing: 0.40/0.04, this: 0.00/0.00
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,461 ColumnFamilyStore.java:915
>> - Enqueuing flush of user_history: 0.000KiB (0%) on-heap, 0.011KiB (0%)
>> off-heap
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,465
>> ColumnFamilyStore.java:1305 - Flushing largest CFS(Keyspace='userinfo',
>> ColumnFamily='tweets') to free up room. Used total: 0.54/0.05, live:
>> 0.00/0.00, flushing: 0.40/0.04, this: 0.00/0.00
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,465 ColumnFamilyStore.java:915
>> - Enqueuing flush of tweets: 0.000KiB (0%) on-heap, 0.188KiB (0%) off-heap
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,470
>> ColumnFamilyStore.java:1305 - Flushing largest CFS(Keyspace='userinfo',
>> ColumnFamily='user_history') to free up room. Used total: 0.54/0.05, live:
>> 0.00/0.00, flushing: 0.40/0.04, this: 0.00/0.00
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,470 ColumnFamilyStore.java:915
>> - Enqueuing flush of user_history: 0.000KiB (0%) on-heap, 0.024KiB (0%)
>> off-heap
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,470
>> ColumnFamilyStore.java:1305 - Flushing largest CFS(Keyspace='userinfo',
>> ColumnFamily='tweets') to free up room. Used total: 0.54/0.05, live:
>> 0.00/0.00, flushing: 0.40/0.04, this: 0.00/0.00
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,470 ColumnFamilyStore.java:915
>> - Enqueuing flush of tweets: 0.000KiB (0%) on-heap, 0.188KiB (0%) off-heap
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,472
>> ColumnFamilyStore.java:1305 - Flushing largest CFS(Keyspace='userinfo',
>> ColumnFamily='gpsmessages') to free up room. Used total: 0.54/0.05, live:
>> 0.00/0.00, flushing: 0.40/0.04, this: 0.00/0.00
>
> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,472 ColumnFamilyStore.java:915
>> - Enqueuing flush of gpsmessages: 0.000KiB (0%) on-heap, 0.013KiB (0%)
>> off-heap
>
>
>>
>
> Stack traces from errors are below.
>
>
>> java.io.IOException: Broken pipe
>
> at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>> ~[na:1.8.0_181]
>
> at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>> ~[na:1.8.0_181]
>
> at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>> ~[na:1.8.0_181]
>
> at sun.nio.ch.IOUtil.write(IOUtil.java:51) ~[na:1.8.0_181]
>
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
>> ~[na:1.8.0_181]
>
> at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.
>> doFlush(BufferedDataOutputStreamPlus.java:323)
>> ~[apache-cassandra-3.11.1.jar:3.11.1]
>
> at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.
>> flush(BufferedDataOutputStreamPlus.java:331)
>> ~[apache-cassandra-3.11.1.jar:3.11.1]
>
> at org.apache.cassandra.streaming.ConnectionHandler$
>> OutgoingMessageHandler.sendMessage(ConnectionHandler.java:409)
>> [apache-cassandra-3.11.1.jar:3.11.1]
>
> at org.apache.cassandra.streaming.ConnectionHandler$
>> OutgoingMessageHandler.run(ConnectionHandler.java:380)
>> [apache-cassandra-3.11.1.jar:3.11.1]
>
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
>
> ERROR [MutationStage-226] 2018-08-06 07:16:08,236
>> JVMStabilityInspector.java:142 - JVM state determined to be unstable.
>> Exiting forcefully due to:
>
> java.lang.OutOfMemoryError: Direct buffer memory
>
> at java.nio.Bits.r

Re: Bootstrap OOM issues with Cassandra 3.11.1

2018-08-07 Thread Laszlo Szabo
The last run I attempted used 135GB of RAM allocated to the JVM (arguments
below), and while there are OOM errors, there is no stack trace in
either the system or debug log.  On direct-memory runs, there is a stack
trace.  The last direct-memory run used a 60GB heap and 60GB off heap
(that was the stack trace attached).  The HPROF file is 135GB and I'm
trying to generate the heap information from that now, but it's been running
for 2 hours.

The closest I can get to the stack trace for the 135GB heap run is below.

ERROR [PERIODIC-COMMIT-LOG-SYNCER] 2018-08-07 00:34:13,980
> JVMStabilityInspector.java:82 - Exiting due to error while processing
> commit log during initialization.
> java.lang.OutOfMemoryError: Java heap space
> ERROR [MessagingService-Incoming-/10.1.1.11] 2018-08-07 00:34:13,980
> CassandraDaemon.java:228 - Exception in thread
> Thread[MessagingService-Incoming-/10.1.1.11,5,main]
> java.lang.OutOfMemoryError: Java heap space
> ERROR [HintsWriteExecutor:1] 2018-08-07 00:34:13,981
> CassandraDaemon.java:228 - Exception in thread
> Thread[HintsWriteExecutor:1,5,main]
> java.lang.OutOfMemoryError: Java heap space
> ERROR [MessagingService-Incoming-/10.1.1.13] 2018-08-07 00:34:13,980
> CassandraDaemon.java:228 - Exception in thread
> Thread[MessagingService-Incoming-/10.1.1.13,5,main]
> java.lang.OutOfMemoryError: Java heap space
> INFO  [Service Thread] 2018-08-07 00:34:13,982 StatusLogger.java:101 -
> system.schema_triggers0,0



JVM Arguments: [-Xloggc:/var/log/cassandra/gc.log, -ea,
> -XX:+UseThreadPriorities, -XX:ThreadPriorityPolicy=42,
> -XX:+HeapDumpOnOutOfMemoryError, -Xss256k, -XX:StringTableSize=103,
> -XX:+AlwaysPreTouch, -XX:-UseBiasedLocking, -XX:+UseTLAB,
> -XX:+ResizeTLAB, -XX:+UseNUMA, -XX:+PerfDisableSharedMem,
> -Djava.net.preferIPv4Stack=true, -Xms135G, -Xmx135G, -XX:+UseParNewGC,
> -XX:+UseConcMarkSweepGC, -XX:+CMSParallelRemarkEnabled,
> -XX:SurvivorRatio=8, -XX:MaxTenuringThreshold=1,
> -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly,
> -XX:CMSWaitDuration=1, -XX:+CMSParallelInitialMarkEnabled,
> -XX:+CMSEdenChunksRecordAlways, -XX:+CMSClassUnloadingEnabled,
> -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintHeapAtGC,
> -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime,
> -XX:+PrintPromotionFailure, -XX:+UseGCLogFileRotation,
> -XX:NumberOfGCLogFiles=10, -XX:GCLogFileSize=10M, -Xmn2048M,
> -XX:+UseCondCardMark,
> -XX:CompileCommandFile=/etc/cassandra/hotspot_compiler,
> -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar,
> -Dcassandra.jmx.local.port=7199,
> -Dcom.sun.management.jmxremote.authenticate=false,
> -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password,
> -Djava.library.path=/usr/share/cassandra/lib/sigar-bin,
> -Dlogback.configurationFile=logback.xml,
> -Dcassandra.logdir=/var/log/cassandra,
> -Dcassandra.storagedir=/var/lib/cassandra,
> -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid,
> -XX:HeapDumpPath=/var/lib/cassandra/java_1533559747.hprof,
> -XX:ErrorFile=/var/lib/cassandra/hs_err_1533559747.log]



On Tue, Aug 7, 2018 at 11:12 AM, Jeff Jirsa  wrote:

> That's a direct memory OOM - it's not the heap, it's the offheap.
>
> You can see that 
> gpsmessages.addressreceivedtime_idx
> is holding about 2GB of offheap memory (most of it for the bloom filter),
> but none of the others look like they're holding a ton offheap (either in
> bloom filter, memtable, etc).  With what JVM args are you starting
> cassandra (how much direct memory are you allocating)? Are all of your OOMs
> in direct memory?
>
>
>
> On Tue, Aug 7, 2018 at 6:30 AM, Laszlo Szabo <
> laszlo.viktor.sz...@gmail.com> wrote:
>
>> Hi,
>>
>> Thanks for the fast response!
>>
>> We are not using any materialized views, but there are several indexes.
>> I don't have a recent heap dump, and it will be about 24 hours before I can
>> generate an interesting one, but most of the memory was allocated to byte
>> buffers, so not entirely helpful.
>>
>> nodetool cfstats is also below.
>>
>> I also see a lot of flushing happening, but it seems like there are too
>> many small allocations to be effective.  Here are the messages I see,
>>
>> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,459
>>> ColumnFamilyStore.java:1305 - Flushing largest CFS(Keyspace='userinfo',
>>> ColumnFamily='gpsmessages') to free up room. Used total: 0.54/0.05, live:
>>> 0.00/0.00, flushing: 0.40/0.04, this: 0.00/0.00
>>
>> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,459
>>> ColumnFamilyStore.java:915 - Enqueuing flush of gpsmessages: 0.000KiB (0%)
>>> on-heap, 0.014KiB (0%) off-heap
>>
>> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,460
>>> ColumnFamilyStore.java:1305 - Flushing largest CFS(Keyspace='userinfo',
>>> ColumnFamily='user_history') to free up room. Used total: 0.54/0.05, live:
>>> 0.00/0.00, flushing: 0.40/0.04, this: 0.00/0.00
>>
>> DEBUG [SlabPoolCleaner] 2018-08-06 07:16:08,461
>>> ColumnFa

Re: Hinted Handoff

2018-08-07 Thread Agrawal, Pratik
Please find my comments inline.

From: kurt greaves 
Reply-To: "user@cassandra.apache.org" 
Date: Tuesday, August 7, 2018 at 1:20 AM
To: User 
Subject: Re: Hinted Handoff

Does Cassandra TTL out the hints after max_hint_window_in_ms? From my 
understanding, Cassandra only stops collecting hints after 
max_hint_window_in_ms but can still keep replaying the hints if the node comes 
back again. Is this correct? Is there a way to TTL out hints?

No, but it won't send hints that have passed HH window. Also, this shouldn't be 
caused by HH as the hints maintain the original timestamp with which they were 
written.

  *   We actually saw data resurrecting after HH window. One interesting thing 
to notice is that, the data was resurrecting in intervals (after ~1Hr).
  *   Original timestamp doesn’t help since the other copies of the data are 
actually deleted and tombstones are wiped out after 15 minutes.
  *   The Cassandra version we are using is 2.2.8

Honestly, this sounds more like a use case for a distributed cache rather than 
Cassandra. Keeping data for 30 minutes and then deleting it is going to be a 
nightmare to manage in Cassandra.

  *   Agreed, we are looking into other databases (like Redis, Aerospike). We 
have a write heavy use case and also need optimistic locking + columnar updates.

Thanks,
Pratik

On 7 August 2018 at 07:20, Agrawal, Pratik 
mailto:paagr...@amazon.com.invalid>> wrote:
Does Cassandra TTL out the hints after max_hint_window_in_ms? From my 
understanding, Cassandra only stops collecting hints after 
max_hint_window_in_ms but can still keep replaying the hints if the node comes 
back again. Is this correct? Is there a way to TTL out hints?

Thanks,
Pratik

From: Kyrylo Lebediev 
mailto:kyrylo_lebed...@epam.com>.INVALID>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Monday, August 6, 2018 at 4:10 PM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Re: Hinted Handoff


Small gc_grace_seconds value lowers max allowed node downtime, which is 15 
minutes in your case. After 15 minutes of downtime you'll need to replace the 
node, as you described. This interval looks too short to be able to do planned 
maintenance. So, in case you set a larger value for gc_grace_seconds (let's say, 
hours or a day), will you get visible read amplification / waste a lot of disk 
space / issues with compactions?



Hinted handoff may be the reason in case hinted handoff window is longer than 
gc_grace_seconds. To me it looks like hinted handoff window 
(max_hint_window_in_ms in cassandra.yaml, which defaults to 3h) must always be 
set to a value less than gc_grace_seconds.



Regards,

Kyrill


From: Agrawal, Pratik 
Sent: Monday, August 6, 2018 8:22:27 PM
To: user@cassandra.apache.org
Subject: Hinted Handoff


Hello all,

We use Cassandra in non-conventional way, where our data is short termed (life 
cycle of about 20-30 minutes) where each record is updated ~5 times and then 
deleted. We have GC grace of 15 minutes.

We are seeing 2 problems

1.) A certain number of Cassandra nodes go down, and we remove them from 
the cluster using the Cassandra removenode command and replace the dead nodes with 
new nodes. While the new nodes are joining, we see more nodes reported down (which are 
not actually down), and we see the following errors in the log:

“Gossip not settled after 321 polls. Gossip Stage active/pending/completed: 
1/816/0”



To fix the issue, I restarted the server and the nodes now appear to be up and 
the problem is solved



Can this problem be related to 
https://issues.apache.org/jira/browse/CASSANDRA-6590 ?



2.) Meanwhile, after restarting the nodes mentioned above, we see that some old 
deleted data is resurrected (because of the short lifecycle of our data). My guess 
at the moment is that this data is resurrected due to hinted handoff. An 
interesting point to note here is that data keeps resurrecting at periodic 
intervals (like an hour) and then finally stops. Could this be caused by hinted 
handoff? If so, is there any setting we can use to specify "invalidate hinted 
handoff data after 5-10 minutes"?



Thanks,
Pratik
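
A minimal sketch of the knobs discussed above, i.e. keeping the hint window below gc_grace_seconds and dropping stored hints by hand (keyspace/table names and values are illustrative only):

# cassandra.yaml: hint window shorter than the table's gc_grace_seconds
grep max_hint_window_in_ms /etc/cassandra/cassandra.yaml
#   e.g. max_hint_window_in_ms: 600000   # 10 minutes (example value)
# per-table tombstone GC grace (900s = the 15 minutes used in this thread)
cqlsh -e "ALTER TABLE my_ks.my_table WITH gc_grace_seconds = 900;"
# hints already sitting on disk can be dropped manually on a node
nodetool truncatehints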



Re: Secure data

2018-08-07 Thread rajpal reddy
Hi Jon,

I was trying LUKS encryption, following this doc,
https://aws.amazon.com/blogs/security/how-to-protect-data-at-rest-with-amazon-ec2-instance-store-encryption/
 

on an EC2 i3.large machine.
I don't see the disk mounted, and I see the mapper at 100% use. Do you see 
anything wrong in the statements below?
I see this error in /var/log/messages:
 ERROR [instanceID=i-0de508d7fc188ab20] [MessagingDeliveryService] 
[Association] Unable to load instance associations, unable to retrieve 
associations unable to retrieve associations NoCredentialProviders: no valid 
providers in chain. Deprecated

df -h /dev/mapper/
Filesystem  Size  Used Avail Use% Mounted on
devtmpfs7.5G  7.5G 0 100% /dev

#!/bin/bash

## Initial setup to be executed on boot
##

# Create an empty file. This file will be used to host the file system.
# In this example we create a 2 GB file called secretfs (Secret File System).
dd of=secretfs bs=1G count=0 seek=2
# Lock down normal access to the file.
chmod 600 secretfs
# Associate a loopback device with the file.
losetup /dev/nvme0 secretfs
# Copy the encrypted password file from S3. The password is used to configure LUKS
# later on.
aws s3 cp s3://mybucket/LuksInternalStorageKey .
# Decrypt the password from the file with KMS; save the secret password in
# LuksClearTextKey.
LuksClearTextKey=$(aws --region us-east-1 kms decrypt --ciphertext-blob 
fileb://LuksInternalStorageKey --output text --query Plaintext | base64 
--decode)
# Encrypt storage in the device. cryptsetup will use the Linux
# device mapper to create, in this case, /dev/mapper/secretfs.
# Initialize the volume and set an initial key.
echo "$LuksClearTextKey" | cryptsetup -y luksFormat /dev/nvme0
# Open the partition, and create a mapping to /dev/mapper/secretfs.
echo "$LuksClearTextKey" | cryptsetup luksOpen /dev/nvme0 secretfs
# Clear the LuksClearTextKey variable because we don't need it anymore.
unset LuksClearTextKey
# Check its status (optional).
cryptsetup status secretfs
# Zero out the new encrypted device.
dd if=/dev/zero of=/dev/mapper/secretfs
# Create a file system and verify its status.
mke2fs -j -O dir_index /dev/mapper/secretfs
# List file system configuration (optional).
tune2fs -l /dev/mapper/secretfs
# Mount the new file system to /data_e/secretfs.
sudo mkdir /data_e/secretfs
sudo mount /dev/mapper/secretfs /data_e/secretfs
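
One detail worth noting about the script above: losetup expects a loop device rather than the NVMe block device. A minimal sketch of that step as the AWS guide normally has it (device names are assumptions; alternatively the i3 instance-store volume, typically /dev/nvme0n1, can be LUKS-formatted directly without a loopback file):

# Attach the backing file to the first free loop device instead of /dev/nvme0
LOOPDEV=$(losetup -f)                      # e.g. /dev/loop0
losetup "$LOOPDEV" secretfs
# LUKS-format and open the loop device, then build the filesystem on the mapper
echo "$LuksClearTextKey" | cryptsetup -y luksFormat "$LOOPDEV"
echo "$LuksClearTextKey" | cryptsetup luksOpen "$LOOPDEV" secretfs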


> On Aug 1, 2018, at 3:38 PM, Jonathan Haddad  wrote:
> 
> You can also get full disk encryption with LUKS, which I've used before.
> 
> On Wed, Aug 1, 2018 at 12:36 PM Jeff Jirsa  > wrote:
> EBS encryption worked well on gp2 volumes (never tried it on any others)
> 
> -- 
> Jeff Jirsa
> 
> 
> On Aug 1, 2018, at 7:57 AM, Rahul Reddy  > wrote:
> 
>> Hello,
>> 
>> Any one tried aws ec2 volume encryption for Cassandra instances?
>> 
>> On Tue, Jul 31, 2018, 12:25 PM Rahul Reddy > > wrote:
>> Hello,
>> 
>> I'm trying to find a good document on how to enable encryption for Apache 
>> Cassandra (not DSE) tables and commitlogs, and how to store the keystore in KMS 
>> or Vault. If any of you have already configured this, please direct me to the 
>> documentation for it.
> 
> 
> -- 
> Jon Haddad
> http://www.rustyrazorblade.com 
> twitter: rustyrazorblade



Re: dynamic_snitch=false, prioritisation/order or reads from replicas

2018-08-07 Thread Alain RODRIGUEZ
Hello Kyrill,

But in case of CL=QUORUM/LOCAL_QUORUM, if I'm not wrong, read request is
> sent to all replicas waiting for first 2 to reply.
>

My understanding is that this sentence is wrong. It is as you described it
for writes indeed: all the replicas get the write (in all the data centers).
It's not the case for reads. For reads, only x nodes are picked and queried
(x depending on ONE, QUORUM, ALL, ...).

Looks like the only change for dynamic_snitch=false is that "data" request
> is sent to a determined node instead of "currently the fastest one".
>

Indeed, the problem is that the 'currently the fastest one' changes very
often in certain cases, thus removing the efficiency from the cache without
enough compensation in many cases.
The idea of not using the 'bad' nodes is interesting to have more
predictable latencies when a node is slow for some reason. Yet one of the
side effects of this (and of the scoring that does not seem to be
absolutely reliable) is that the clients are often routed to distinct nodes
when under pressure, due to GC pauses for example or any other pressure.
Saving disk reads in read-heavy workloads under pressure is more important
than trying to save a few milliseconds picking the 'best' node I guess.
I can imagine that alleviating these disks (reducing the amount of disk
IO/throughput) ends up lowering the latency for all the nodes, so the
client application latency improves overall. That is my understanding of
why it is so often good to disable the dynamic_snitch.

Did you get improved response for CL=ONE only or for higher CL's as well?
>

I must admit I don't remember for sure, but many people are using
'LOCAL_QUORUM' and I think I saw this for this consistency level as well.
Plus this question might no longer stand as reads in Cassandra work
slightly differently than what you thought.

I am not 100% comfortable with this 'dynamic_snitch theory' topic, so I
hope someone else can correct me if I am wrong, confirm or add information
:). But for sure I have seen this disabled giving some really nice
improvement (as many others here as you mentioned). Sometimes it was not
helpful, but I have never seen this change being really harmful though.

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com
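
A minimal way to trial this on a canary node, as suggested above (the config path, restart command, and table names are assumptions):

# cassandra.yaml on the canary node (needs a restart; add the line if it is absent)
#   dynamic_snitch: false
sudo sed -i 's/^dynamic_snitch:.*/dynamic_snitch: false/' /etc/cassandra/cassandra.yaml
sudo systemctl restart cassandra
# Compare read latency percentiles before and after, on the canary and its neighbours
nodetool proxyhistograms
nodetool tablehistograms my_ks my_table    # cfhistograms on 2.x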

2018-08-06 22:27 GMT+01:00 Kyrylo Lebediev :

> Thank you for replying, Alain!
>
>
> Better use of cache for 'pinned' requests explains good the case when
> CL=ONE.
>
>
> But in case of CL=QUORUM/LOCAL_QUORUM, if I'm not wrong, read request is
> sent to all replicas waiting for first 2 to reply.
>
> When dynamic snitching is turned on, "data" request is sent to "the
> fastest replica", and "digest" requests - to the rest of replicas.
>
> But anyway digest is the same read operation [from SSTables through
> filesystem cache] + calculating and sending hash to coordinator. Looks like
> the only change for dynamic_snitch=false is that "data" request is sent to
> a determined node instead of "currently the fastest one".
>
> So, if there are no mistakes in above description, improvement shouldn't
> be much visible for CL=*QUORUM...
>
>
> Did you get improved response for CL=ONE only or for higher CL's as well?
>
>
> Indeed an interesting thread in Jira.
>
>
> Thanks,
>
> Kyrill
> --
> *From:* Alain RODRIGUEZ 
> *Sent:* Monday, August 6, 2018 8:26:43 PM
> *To:* user cassandra.apache.org
> *Subject:* Re: dynamic_snitch=false, prioritisation/order or reads from
> replicas
>
> Hello,
>
>
> There are reports (in this ML too) that disabling dynamic snitching
> decreases response time.
>
>
> I confirm that I have seen this improvement on clusters under pressure.
>
> What effects stand behind this improvement?
>
>
> My understanding is that this is due to the fact that the clients are then
> 'pinned', more sticking to specific nodes when the dynamic snitching is
> off. I guess there is a better use of caches and in-memory structures,
> reducing the amount of disk read needed, which can lead to way more
> performances than switching from node to node as soon as the score of some
> node is not good enough.
> I am also not sure that the score calculation is always relevant, thus
> increasing the threshold before switching reads to another node is still
> often worst than disabling it completely. I am not sure if the score
> calculation was fixed, but in most cases, I think it's safer to run with
> 'dynamic_snitch: false'. Anyway, it's possible to test it on a canary node
> (or entire rack) and look at the p99 for read latencies for example :).
>
> This ticket is old, but was precisely on that topic:
> https://issues.apache.org/jira/browse/CASSANDRA-6908
>
> C*heers
> ---
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2018-08-04 15:37

TWCS Compaction backed up

2018-08-07 Thread Brian Spindler
Hey guys, quick question:

I've got a v2.1 Cassandra cluster, 12 nodes on AWS i3.2xl, commit log on
one drive, data on NVMe.  That was working very well; it's a time-series DB and
has been accumulating data for about 4 weeks.

The nodes have increased in load and compaction seems to be falling
behind.  I used to get about 1 file per day for this column family, roughly
one ~30GB Data.db file per day.  I am now getting hundreds per day at 1MB -
50MB.

How to recover from this?

I can scale out to give some breathing room but will it go back and compact
the old days into nicely packed files for the day?

I tried setting compaction throughput to 1000 from 256 and it seemed to
make things worse for the CPU, it's configured on i3.2xl with 8 compaction
threads.

-B

Lastly, I have mixed TTLs in this CF and need to run a repair (I think) to
get rid of old tombstones, however running repairs in 2.1 on TWCS column
families causes a very large spike in sstable counts due to anti-compaction
which causes a lot of disruption, is there any other way?


Re: TWCS Compaction backed up

2018-08-07 Thread Jonathan Haddad
What's your window size?

When you say backed up, how are you measuring that?  Are there pending
tasks or do you just see more files than you expect?

On Tue, Aug 7, 2018 at 4:38 PM Brian Spindler 
wrote:

> Hey guys, quick question:
>
> I've got a v2.1 cassandra cluster, 12 nodes on aws i3.2xl, commit log on
> one drive, data on nvme.  That was working very well, it's a ts db and has
> been accumulating data for about 4weeks.
>
> The nodes have increased in load and compaction seems to be falling
> behind.  I used to get about 1 file per day for this column family, about
> ~30GB Data.db file per day.  I am now getting hundreds per day at  1mb -
> 50mb.
>
> How to recover from this?
>
> I can scale out to give some breathing room but will it go back and
> compact the old days into nicely packed files for the day?
>
> I tried setting compaction throughput to 1000 from 256 and it seemed to
> make things worse for the CPU, it's configured on i3.2xl with 8 compaction
> threads.
>
> -B
>
> Lastly, I have mixed TTLs in this CF and need to run a repair (I think) to
> get rid of old tombstones, however running repairs in 2.1 on TWCS column
> families causes a very large spike in sstable counts due to anti-compaction
> which causes a lot of disruption, is there any other way?
>
>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade
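
For reference, a rough way to answer both questions at once (paths and keyspace/table names are placeholders; the /1000 divisor assumes millisecond write timestamps, use 1000000 for the default microsecond timestamps):

# Is compaction actually behind, or are there just many files?
nodetool compactionstats
nodetool cfstats my_ks.my_cf | grep -i 'sstable count'
# Rough per-day distribution of sstables by their max timestamp
for f in /var/lib/cassandra/data/my_ks/my_cf-*/*Data.db; do
  ts=$(sstablemetadata "$f" | awk '/Maximum timestamp/ {print $3}')
  date -d @$((ts/1000)) +%F          # /1000 assumes millisecond timestamps
done | sort | uniq -c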


Re: TWCS Compaction backed up

2018-08-07 Thread Brian Spindler
Hi Jonathan, both I believe.

The window size is 1 day, full settings:
AND compaction = {'timestamp_resolution': 'MILLISECONDS',
'unchecked_tombstone_compaction': 'true', 'compaction_window_size': '1',
'compaction_window_unit': 'DAYS', 'tombstone_compaction_interval': '86400',
'tombstone_threshold': '0.2', 'class':
'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'}


nodetool tpstats

Pool NameActive   Pending  Completed   Blocked  All
time blocked
MutationStage 0 068582241832 0
   0
ReadStage 0 0  209566303 0
   0
RequestResponseStage  0 044680860850 0
   0
ReadRepairStage   0 0   24562722 0
   0
CounterMutationStage  0 0  0 0
   0
MiscStage 0 0  0 0
   0
HintedHandoff 1 1203 0
   0
GossipStage   0 08471784 0
   0
CacheCleanupExecutor  0 0122 0
   0
InternalResponseStage 0 0 552125 0
   0
CommitLogArchiver 0 0  0 0
   0
CompactionExecutor8421433715 0
   0
ValidationExecutor0 0   2521 0
   0
MigrationStage0 0 527549 0
   0
AntiEntropyStage  0 0   7697 0
   0
PendingRangeCalculator0 0 17 0
   0
Sampler   0 0  0 0
   0
MemtableFlushWriter   0 0 116966 0
   0
MemtablePostFlush 0 0 209103 0
   0
MemtableReclaimMemory 0 0 116966 0
   0
Native-Transport-Requests 1 0 1715937778 0
  176262

Message type   Dropped
READ 2
RANGE_SLICE  0
_TRACE   0
MUTATION  4390
COUNTER_MUTATION 0
BINARY   0
REQUEST_RESPONSE  1882
PAGED_RANGE  0
READ_REPAIR  0


On Tue, Aug 7, 2018 at 7:57 PM Jonathan Haddad  wrote:

> What's your window size?
>
> When you say backed up, how are you measuring that?  Are there pending
> tasks or do you just see more files than you expect?
>
> On Tue, Aug 7, 2018 at 4:38 PM Brian Spindler 
> wrote:
>
>> Hey guys, quick question:
>>
>> I've got a v2.1 cassandra cluster, 12 nodes on aws i3.2xl, commit log on
>> one drive, data on nvme.  That was working very well, it's a ts db and has
>> been accumulating data for about 4weeks.
>>
>> The nodes have increased in load and compaction seems to be falling
>> behind.  I used to get about 1 file per day for this column family, about
>> ~30GB Data.db file per day.  I am now getting hundreds per day at  1mb -
>> 50mb.
>>
>> How to recover from this?
>>
>> I can scale out to give some breathing room but will it go back and
>> compact the old days into nicely packed files for the day?
>>
>> I tried setting compaction throughput to 1000 from 256 and it seemed to
>> make things worse for the CPU, it's configured on i3.2xl with 8 compaction
>> threads.
>>
>> -B
>>
>> Lastly, I have mixed TTLs in this CF and need to run a repair (I think)
>> to get rid of old tombstones, however running repairs in 2.1 on TWCS column
>> families causes a very large spike in sstable counts due to anti-compaction
>> which causes a lot of disruption, is there any other way?
>>
>>
>>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>


Re: TWCS Compaction backed up

2018-08-07 Thread Jeff Jirsa
You could toggle off the tombstone compaction to see if that helps, but that 
should be lower priority than normal compactions

Are the lots-of-little-files from memtable flushes or repair/anticompaction?

Do you do normal deletes? Did you try to run Incremental repair?  

-- 
Jeff Jirsa
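
A minimal sketch of toggling the tombstone sub-compactions off for that table, as suggested (keyspace/table names are placeholders; the other TWCS options are kept as posted earlier in the thread):

cqlsh -e "ALTER TABLE my_ks.my_cf WITH compaction = {
  'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy',
  'compaction_window_unit': 'DAYS',
  'compaction_window_size': '1',
  'timestamp_resolution': 'MILLISECONDS',
  'unchecked_tombstone_compaction': 'false' };"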


> On Aug 7, 2018, at 5:00 PM, Brian Spindler  wrote:
> 
> Hi Jonathan, both I believe.  
> 
> The window size is 1 day, full settings: 
> AND compaction = {'timestamp_resolution': 'MILLISECONDS', 
> 'unchecked_tombstone_compaction': 'true', 'compaction_window_size': '1', 
> 'compaction_window_unit': 'DAYS', 'tombstone_compaction_interval': '86400', 
> 'tombstone_threshold': '0.2', 'class': 
> 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'} 
> 
> 
> nodetool tpstats 
> 
> Pool NameActive   Pending  Completed   Blocked  All 
> time blocked
> MutationStage 0 068582241832 0
>  0
> ReadStage 0 0  209566303 0
>  0
> RequestResponseStage  0 044680860850 0
>  0
> ReadRepairStage   0 0   24562722 0
>  0
> CounterMutationStage  0 0  0 0
>  0
> MiscStage 0 0  0 0
>  0
> HintedHandoff 1 1203 0
>  0
> GossipStage   0 08471784 0
>  0
> CacheCleanupExecutor  0 0122 0
>  0
> InternalResponseStage 0 0 552125 0
>  0
> CommitLogArchiver 0 0  0 0
>  0
> CompactionExecutor8421433715 0
>  0
> ValidationExecutor0 0   2521 0
>  0
> MigrationStage0 0 527549 0
>  0
> AntiEntropyStage  0 0   7697 0
>  0
> PendingRangeCalculator0 0 17 0
>  0
> Sampler   0 0  0 0
>  0
> MemtableFlushWriter   0 0 116966 0
>  0
> MemtablePostFlush 0 0 209103 0
>  0
> MemtableReclaimMemory 0 0 116966 0
>  0
> Native-Transport-Requests 1 0 1715937778 0
> 176262
> 
> Message type   Dropped
> READ 2
> RANGE_SLICE  0
> _TRACE   0
> MUTATION  4390
> COUNTER_MUTATION 0
> BINARY   0
> REQUEST_RESPONSE  1882
> PAGED_RANGE  0
> READ_REPAIR  0
> 
> 
>> On Tue, Aug 7, 2018 at 7:57 PM Jonathan Haddad  wrote:
>> What's your window size?
>> 
>> When you say backed up, how are you measuring that?  Are there pending tasks 
>> or do you just see more files than you expect?
>> 
>>> On Tue, Aug 7, 2018 at 4:38 PM Brian Spindler  
>>> wrote:
>>> Hey guys, quick question: 
>>>  
>>> I've got a v2.1 cassandra cluster, 12 nodes on aws i3.2xl, commit log on 
>>> one drive, data on nvme.  That was working very well, it's a ts db and has 
>>> been accumulating data for about 4weeks.  
>>> 
>>> The nodes have increased in load and compaction seems to be falling behind. 
>>>  I used to get about 1 file per day for this column family, about ~30GB 
>>> Data.db file per day.  I am now getting hundreds per day at  1mb - 50mb.
>>> 
>>> How to recover from this? 
>>> 
>>> I can scale out to give some breathing room but will it go back and compact 
>>> the old days into nicely packed files for the day?
>>> 
>>> I tried setting compaction throughput to 1000 from 256 and it seemed to 
>>> make things worse for the CPU, it's configured on i3.2xl with 8 compaction 
>>> threads. 
>>> 
>>> -B
>>> 
>>> Lastly, I have mixed TTLs in this CF and need to run a repair (I think) to 
>>> get rid of old tombstones, however running repairs in 2.1 on TWCS column 
>>> families causes a very large spike in sstable counts due to anti-compaction 
>>> which causes a lot of disruption, is there any other way?  
>>> 
>>> 
>> 
>> 
>> -- 
>> Jon Haddad
>> http://www.rustyrazorblade.com
>> twitter: rustyrazorblade


Re: TWCS Compaction backed up

2018-08-07 Thread Brian Spindler
Hi Jeff, mostly lots of little files, like there will be 4-5 that are
1-1.5gb or so and then many at 5-50MB and many at 40-50MB each.

Re incremental repair: yes, one of my engineers started an incremental
repair on this column family that we had to abort.  In fact, the node that
the repair was initiated on ran out of disk space, and we ended up replacing
that node like a dead node.

Oddly the new node is experiencing this issue as well.

-B


On Tue, Aug 7, 2018 at 8:04 PM Jeff Jirsa  wrote:

> You could toggle off the tombstone compaction to see if that helps, but
> that should be lower priority than normal compactions
>
> Are the lots-of-little-files from memtable flushes or
> repair/anticompaction?
>
> Do you do normal deletes? Did you try to run Incremental repair?
>
> --
> Jeff Jirsa
>
>
> On Aug 7, 2018, at 5:00 PM, Brian Spindler 
> wrote:
>
> Hi Jonathan, both I believe.
>
> The window size is 1 day, full settings:
> AND compaction = {'timestamp_resolution': 'MILLISECONDS',
> 'unchecked_tombstone_compaction': 'true', 'compaction_window_size': '1',
> 'compaction_window_unit': 'DAYS', 'tombstone_compaction_interval': '86400',
> 'tombstone_threshold': '0.2', 'class':
> 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'}
>
>
> nodetool tpstats
>
> Pool NameActive   Pending  Completed   Blocked
> All time blocked
> MutationStage 0 068582241832 0
>  0
> ReadStage 0 0  209566303 0
>  0
> RequestResponseStage  0 044680860850 0
>  0
> ReadRepairStage   0 0   24562722 0
>  0
> CounterMutationStage  0 0  0 0
>  0
> MiscStage 0 0  0 0
>  0
> HintedHandoff 1 1203 0
>  0
> GossipStage   0 08471784 0
>  0
> CacheCleanupExecutor  0 0122 0
>  0
> InternalResponseStage 0 0 552125 0
>  0
> CommitLogArchiver 0 0  0 0
>  0
> CompactionExecutor8421433715 0
>  0
> ValidationExecutor0 0   2521 0
>  0
> MigrationStage0 0 527549 0
>  0
> AntiEntropyStage  0 0   7697 0
>  0
> PendingRangeCalculator0 0 17 0
>  0
> Sampler   0 0  0 0
>  0
> MemtableFlushWriter   0 0 116966 0
>  0
> MemtablePostFlush 0 0 209103 0
>  0
> MemtableReclaimMemory 0 0 116966 0
>  0
> Native-Transport-Requests 1 0 1715937778 0
> 176262
>
> Message type   Dropped
> READ 2
> RANGE_SLICE  0
> _TRACE   0
> MUTATION  4390
> COUNTER_MUTATION 0
> BINARY   0
> REQUEST_RESPONSE  1882
> PAGED_RANGE  0
> READ_REPAIR  0
>
>
> On Tue, Aug 7, 2018 at 7:57 PM Jonathan Haddad  wrote:
>
>> What's your window size?
>>
>> When you say backed up, how are you measuring that?  Are there pending
>> tasks or do you just see more files than you expect?
>>
>> On Tue, Aug 7, 2018 at 4:38 PM Brian Spindler 
>> wrote:
>>
>>> Hey guys, quick question:
>>>
>>> I've got a v2.1 cassandra cluster, 12 nodes on aws i3.2xl, commit log on
>>> one drive, data on nvme.  That was working very well, it's a ts db and has
>>> been accumulating data for about 4weeks.
>>>
>>> The nodes have increased in load and compaction seems to be falling
>>> behind.  I used to get about 1 file per day for this column family, about
>>> ~30GB Data.db file per day.  I am now getting hundreds per day at  1mb -
>>> 50mb.
>>>
>>> How to recover from this?
>>>
>>> I can scale out to give some breathing room but will it go back and
>>> compact the old days into nicely packed files for the day?
>>>
>>> I tried setting compaction throughput to 1000 from 256 and it seemed to
>>> make things worse for the CPU, it's configured on i3.2xl with 8 compaction
>>> threads.
>>>
>>> -B
>>>
>>> Lastly, I have mixed TTLs in this CF and need to run a repair (I think)
>>> to get rid of old tombstones, however running repairs in 2.1 on TWCS column
>>> families causes a very large spike in sstable counts due to anti-compaction
>>> which causes a lot of disruptio

Re: TWCS Compaction backed up

2018-08-07 Thread Jeff Jirsa
May be worth seeing if any of the sstables got promoted to repaired - if so 
they’re not eligible for compaction with unrepaired sstables and that could 
explain some higher counts

Do you actually do deletes or is everything ttl’d?
 

-- 
Jeff Jirsa


> On Aug 7, 2018, at 5:09 PM, Brian Spindler  wrote:
> 
> Hi Jeff, mostly lots of little files, like there will be 4-5 that are 1-1.5gb 
> or so and then many at 5-50MB and many at 40-50MB each.   
> 
> Re incremental repair; Yes one of my engineers started an incremental repair 
> on this column family that we had to abort.  In fact, the node that the 
> repair was initiated on ran out of disk space and we ended replacing that 
> node like a dead node.   
> 
> Oddly the new node is experiencing this issue as well.  
> 
> -B
> 
> 
>> On Tue, Aug 7, 2018 at 8:04 PM Jeff Jirsa  wrote:
>> You could toggle off the tombstone compaction to see if that helps, but that 
>> should be lower priority than normal compactions
>> 
>> Are the lots-of-little-files from memtable flushes or repair/anticompaction?
>> 
>> Do you do normal deletes? Did you try to run Incremental repair?  
>> 
>> -- 
>> Jeff Jirsa
>> 
>> 
>>> On Aug 7, 2018, at 5:00 PM, Brian Spindler  wrote:
>>> 
>>> Hi Jonathan, both I believe.  
>>> 
>>> The window size is 1 day, full settings: 
>>> AND compaction = {'timestamp_resolution': 'MILLISECONDS', 
>>> 'unchecked_tombstone_compaction': 'true', 'compaction_window_size': '1', 
>>> 'compaction_window_unit': 'DAYS', 'tombstone_compaction_interval': '86400', 
>>> 'tombstone_threshold': '0.2', 'class': 
>>> 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'} 
>>> 
>>> 
>>> nodetool tpstats 
>>> 
>>> Pool NameActive   Pending  Completed   Blocked  All 
>>> time blocked
>>> MutationStage 0 068582241832 0  
>>>0
>>> ReadStage 0 0  209566303 0  
>>>0
>>> RequestResponseStage  0 044680860850 0  
>>>0
>>> ReadRepairStage   0 0   24562722 0  
>>>0
>>> CounterMutationStage  0 0  0 0  
>>>0
>>> MiscStage 0 0  0 0  
>>>0
>>> HintedHandoff 1 1203 0  
>>>0
>>> GossipStage   0 08471784 0  
>>>0
>>> CacheCleanupExecutor  0 0122 0  
>>>0
>>> InternalResponseStage 0 0 552125 0  
>>>0
>>> CommitLogArchiver 0 0  0 0  
>>>0
>>> CompactionExecutor8421433715 0  
>>>0
>>> ValidationExecutor0 0   2521 0  
>>>0
>>> MigrationStage0 0 527549 0  
>>>0
>>> AntiEntropyStage  0 0   7697 0  
>>>0
>>> PendingRangeCalculator0 0 17 0  
>>>0
>>> Sampler   0 0  0 0  
>>>0
>>> MemtableFlushWriter   0 0 116966 0  
>>>0
>>> MemtablePostFlush 0 0 209103 0  
>>>0
>>> MemtableReclaimMemory 0 0 116966 0  
>>>0
>>> Native-Transport-Requests 1 0 1715937778 0  
>>>   176262
>>> 
>>> Message type   Dropped
>>> READ 2
>>> RANGE_SLICE  0
>>> _TRACE   0
>>> MUTATION  4390
>>> COUNTER_MUTATION 0
>>> BINARY   0
>>> REQUEST_RESPONSE  1882
>>> PAGED_RANGE  0
>>> READ_REPAIR  0
>>> 
>>> 
 On Tue, Aug 7, 2018 at 7:57 PM Jonathan Haddad  wrote:
 What's your window size?
 
 When you say backed up, how are you measuring that?  Are there pending 
 tasks or do you just see more files than you expect?
 
> On Tue, Aug 7, 2018 at 4:38 PM Brian Spindler  
> wrote:
> Hey guys, quick question: 
>  
> I've got a v2.1 cassandra cluster, 12 nodes on aws i3.2xl, commit log on 
> one drive, data on nvme.  That was working very well, it's a ts db and 
> has been accumulating data for about 4weeks.  
> 
> The nodes have increased in load and compaction seems to be falling 
> behind.  I used to get about 1 file per day for this column family, about 
> ~30GB Data.db file per day.  I am now getting hundreds per

Re: TWCS Compaction backed up

2018-08-07 Thread brian . spindler
Everything is ttl’d.

I suppose I could use sstablemetadata to see the repaired bit; could I just set 
that to unrepaired somehow to fix it?

Thanks!
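
A minimal sketch of checking and resetting that flag (paths are placeholders; sstablerepairedset lives under tools/bin in the tarball, and the node should be stopped, or at least handled carefully, while rewriting sstable metadata):

# Inspect the repairedAt field of the sstables for this table
for f in /var/lib/cassandra/data/my_ks/my_cf-*/*Data.db; do
  echo "$f"; sstablemetadata "$f" | grep -i 'repaired at'
done
# Mark them unrepaired again so they can compact with the rest
sstablerepairedset --really-set --is-unrepaired /var/lib/cassandra/data/my_ks/my_cf-*/*Data.db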

> On Aug 7, 2018, at 8:12 PM, Jeff Jirsa  wrote:
> 
> May be worth seeing if any of the sstables got promoted to repaired - if so 
> they’re not eligible for compaction with unrepaired sstables and that could 
> explain some higher counts
> 
> Do you actually do deletes or is everything ttl’d?
>  
> 
> -- 
> Jeff Jirsa
> 
> 
>> On Aug 7, 2018, at 5:09 PM, Brian Spindler  wrote:
>> 
>> Hi Jeff, mostly lots of little files, like there will be 4-5 that are 
>> 1-1.5gb or so and then many at 5-50MB and many at 40-50MB each.   
>> 
>> Re incremental repair; Yes one of my engineers started an incremental repair 
>> on this column family that we had to abort.  In fact, the node that the 
>> repair was initiated on ran out of disk space and we ended replacing that 
>> node like a dead node.   
>> 
>> Oddly the new node is experiencing this issue as well.  
>> 
>> -B
>> 
>> 
>>> On Tue, Aug 7, 2018 at 8:04 PM Jeff Jirsa  wrote:
>>> You could toggle off the tombstone compaction to see if that helps, but 
>>> that should be lower priority than normal compactions
>>> 
>>> Are the lots-of-little-files from memtable flushes or repair/anticompaction?
>>> 
>>> Do you do normal deletes? Did you try to run Incremental repair?  
>>> 
>>> -- 
>>> Jeff Jirsa
>>> 
>>> 

Re: TWCS Compaction backed up

2018-08-07 Thread Brian Spindler
Hi, I spot-checked a couple of the files that were ~200MB and they mostly
had "Repaired at: 0", so maybe that's not it?
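
If it helps, a quick way to check all of them is something like this (the
data path and keyspace/table names are placeholders for our layout; this
just greps the stock sstablemetadata output):

  for f in /data/cassandra/data/myks/mytable-*/*-Data.db; do
    echo "$f: $(sstablemetadata "$f" | grep 'Repaired at')"
  done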

-B


On Tue, Aug 7, 2018 at 8:16 PM  wrote:

> Everything is ttl’d
>
> I suppose I could use sstablemetadata to see the repaired bit. Could I just
> set that back to unrepaired somehow, and would that fix it?
>
> Thanks!
>

Re: TWCS Compaction backed up

2018-08-07 Thread Brian Spindler
In fact, all of them say "Repaired at: 0".
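
Given that, I think the next step is Jeff's earlier suggestion of turning
the unchecked tombstone compactions back off for a while. I'm assuming
something like this (keyspace/table name is a placeholder, and since the
ALTER replaces the whole compaction map, the TWCS settings have to be
restated):

  cqlsh -e "ALTER TABLE myks.mytable WITH compaction = {
    'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '1',
    'timestamp_resolution': 'MILLISECONDS',
    'unchecked_tombstone_compaction': 'false'};"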

On Tue, Aug 7, 2018 at 9:13 PM Brian Spindler 
wrote:

> Hi, I spot-checked a couple of the files that were ~200MB and they mostly
> had "Repaired at: 0", so maybe that's not it?
>
> -B
>
>

Apache Cassandra Blog is now live

2018-08-07 Thread sankalp kohli
Hi,
 Apache Cassandra Blog is now live. Check out the first blog post.

http://cassandra.apache.org/blog/2018/08/07/faster_streaming_in_cassandra.html

Thanks,
Sankalp


New community blog with inaugural post on faster streaming in 4.0

2018-08-07 Thread Nate McCall
Hi folks,
We just added a blog section to our site, with a post detailing
performance improvements of streaming coming in 4.0:
http://cassandra.apache.org/blog/2018/08/07/faster_streaming_in_cassandra.html

I think it's a good indicator of what we are going for that our first
author is not a committer or PMC member. If you have any subject ideas,
please bring them up on the dev list (d...@cassandra.apache.org) or open a
JIRA. As long as it's informative and about Apache Cassandra, we are
interested.

Thanks,
-Nate

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Apache Cassandra Blog is now live

2018-08-07 Thread Nate McCall
You can tell how psyched we are about it because we cross-posted!

Seriously though - this is by the community for the community, so any
ideas - please send them along.

On Wed, Aug 8, 2018 at 1:53 PM, sankalp kohli  wrote:
> Hi,
>  Apache Cassandra Blog is now live. Check out the first blog post.
>
> http://cassandra.apache.org/blog/2018/08/07/faster_streaming_in_cassandra.html
>
> Thanks,
> Sankalp

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org