Hi David, Edouard,

Depending on your data model for event_data, you might want to consider upgrading so you can use DTCS (C* 2.0.11+).
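Once you are on a Cassandra version that actually ships DTCS, the switch itself is just a table property change. Here is a minimal sketch with the DataStax Java driver (the contact point and the DTCS option values are placeholders to tune; the keyspace/table name is taken from the compaction warning in your logs):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class SwitchToDtcs {
    public static void main(String[] args) {
        // Contact point is a placeholder; point it at one of your nodes.
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        try {
            Session session = cluster.connect();
            // Keyspace/table come from the compaction warning below;
            // the DTCS option values are only starting points to tune.
            session.execute(
                "ALTER TABLE rgsupv.event_data WITH compaction = {"
                + " 'class': 'DateTieredCompactionStrategy',"
                + " 'base_time_seconds': '3600',"
                + " 'max_sstable_age_days': '30' }");
        } finally {
            cluster.close();
        }
    }
}

The same ALTER TABLE statement can of course be run straight from cqlsh; the driver is only used here for illustration.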
Basically, if those tombstones are due to a constant TTL and this is a time series, it could be a real improvement. See:

https://labs.spotify.com/2014/12/18/date-tiered-compaction/
http://www.datastax.com/dev/blog/datetieredcompactionstrategy

I am not sure this is related to your problem, but having 8904 tombstones read at once is pretty bad. Also, you might want to paginate your queries a bit, since it looks like you retrieve a lot of data at once (see the driver sketch after your quoted message below). Meanwhile, if you are using STCS, you can consider performing a major compaction on a regular basis (taking the downsides of major compaction into consideration).

C*heers,

Alain

2015-06-12 15:08 GMT+02:00 David CHARBONNIER <david.charbonn...@rgsystem.com>:

> Hi,
>
> We're using Cassandra 2.0.8.39 through DataStax Enterprise 4.5.1 and we're
> experiencing issues with the OpsCenter (version 5.1.3) Repair Service.
> When the Repair Service is running, we can see repairs timing out on a few
> ranges in OpsCenter's event log viewer. See the screenshot attached.
>
> On our Cassandra nodes, we can see a lot of these messages in the
> cassandra/system.log log file while a timeout shows up in OpsCenter:
>
> ERROR [Native-Transport-Requests:3372] 2015-06-12 02:22:33,231 ErrorMessage.java (line 222) Unexpected exception during request
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(Unknown Source)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
>         at sun.nio.ch.IOUtil.read(Unknown Source)
>         at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
>         at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
>         at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
>         at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
>         at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
>         at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>         at java.lang.Thread.run(Unknown Source)
>
> You'll find attached an extract of the system.log file with some more
> information.
>
> Do you have any idea of what's happening?
>
> We suspect the timeouts happen because we have some tables with many
> tombstones, and a warning is sometimes triggered. We have edited the
> configuration so that queries only warn but still run until they
> encounter 1,000,000 tombstones.
>
> During a compaction, we also get warning messages telling us that we have
> a lot of tombstones:
>
> WARN [CompactionExecutor:1584] 2015-06-11 19:22:24,904 SliceQueryFilter.java (line 225) Read 8640 live and 8904 tombstoned cells in rgsupv.event_data (see tombstone_warn_threshold). 10000 columns was requested, slices=[-], delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}
>
> Do you think it's related to our first problem?
>
> Our cluster is configured as follows:
> - 8 nodes with Debian 7.8 x64
> - 16 GB of memory and 4 CPUs
> - 2 HDDs: one for the system, the other for the data directory
>
> Best regards,
>
> *David CHARBONNIER*
> Sysadmin
> T : +33 411 934 200
> david.charbonn...@rgsystem.com
> ZAC Aéroport
> 125 Impasse Adam Smith
> 34470 Pérols - France
> *www.rgsystem.com* <http://www.rgsystem.com/>
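Regarding the pagination point above: with the native protocol and a recent 2.x DataStax Java driver, setting a fetch size lets the driver pull rows back page by page instead of in one huge response. A rough sketch (the partition key "device_id", the literal value and the fetch size are invented for illustration; adapt them to your schema):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class PagedEventRead {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        try {
            Session session = cluster.connect();
            // "device_id" is an invented partition key; adjust to your schema.
            Statement stmt = new SimpleStatement(
                    "SELECT * FROM rgsupv.event_data WHERE device_id = 42");
            // Ask for 500 rows per page instead of the whole slice at once.
            stmt.setFetchSize(500);
            ResultSet rs = session.execute(stmt);
            for (Row row : rs) {
                // Further pages are fetched transparently while iterating.
                System.out.println(row);
            }
        } finally {
            cluster.close();
        }
    }
}

A smaller fetch size trades a few extra round trips for much smaller individual responses, which usually helps when wide rows or lots of tombstones are involved.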