Re: Open source equivalents of OpsCenter

2016-07-14 Thread Michał Łowicki
> …we use datadog (metrics emitted as raw statsd) for the dashboard. All
> repair & compaction is done via blender & serf [1].
>
> [1] https://github.com/pagerduty/blender
>
> On Wed, Jul 13, 2016 at 2:42 PM, Kevin O'Connor wrote:
>> Now that OpsCenter doesn't work with open source installs, are there
>> any runs at an open source equivalent? I'd be more interested in
>> looking at metrics of a running cluster and doing other tasks like
>> managing repairs/rolling restarts more so than historical data.

-- BR, Michał Łowicki
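
A minimal sketch of the "raw statsd" approach mentioned above, assuming the
statsd Python package and an agent (e.g. dogstatsd) listening on UDP 8125;
the metric name and the JMX helper below are hypothetical:

    import time
    import statsd  # pip install statsd

    client = statsd.StatsClient('localhost', 8125)  # assumed agent address

    def read_pending_compactions():
        # Hypothetical stand-in for a JMX read of
        # org.apache.cassandra.metrics:type=Compaction,name=PendingTasks.
        return 0

    while True:
        # Emit the value as a gauge; the agent forwards it to the dashboard.
        client.gauge('cassandra.compaction.pending_tasks',
                     read_pending_compactions())
        time.sleep(10)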

Re: Cassandra monitoring

2016-06-14 Thread Michał Łowicki
> …it does not support Cassandra versions greater than v2.1, which is
> pretty surprising considering Cassandra v2.1 came out in 2014.
>
> We would consider downgrading to DataStax Cassandra 2.1 just to have
> robust monitoring tools, but I am not sure whether having OpsCenter
> offsets all the improvements that have been added to Cassandra since 2.1.
>
> Sematext has integrations for monitoring Cassandra. Does anyone have
> good experience with it?
>
> How much work would be involved in setting up Ganglia or some such
> option for Cassandra?
>
> Thanks,
> Arun

-- BR, Michał Łowicki

Re: Replacing disks

2016-02-29 Thread Michał Łowicki
> …a compaction operation.
> (This is an overly simplistic description; reality is always more
> nuanced. DataStax had a blog post that describes this better, as well
> as the limitations of the algorithm in 2.1 which are addressed in the
> 3.x releases.)

Re: Replacing disks

2016-02-28 Thread Michał Łowicki
> …swap and restart of the node before the hinted handoff window expires
> on the other nodes. If you do not complete in time, you'll want to
> perform a repair on the node.

Yes. Thanks!

> Clint
> On Feb 28, 2016 9:33 AM, "Michał Łowicki" wrote: …

Replacing disks

2016-02-28 Thread Michał Łowicki
…foo-bar-ka-1630184-CompressionInfo.db
foo-bar-ka-1630184-Data.db
foo-bar-ka-1630184-Digest.sha1
foo-bar-ka-1630184-Filter.db
foo-bar-ka-1630184-Index.db
foo-bar-ka-1630184-Statistics.db
foo-bar-ka-1630184-Summary.db
foo-bar-ka-1630184-TOC.txt

Is this something which should work, or do you see some obstacles? (C* 2.1.13)
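
A sketch of the copy step being asked about, assuming the node is stopped and
that the paths below (which are made up) match your layout; every component
file sharing the generation prefix must travel together:

    import glob
    import os
    import shutil

    OLD_DIR = '/mnt/old_disk/cassandra/data/foo/bar'  # assumed layout
    NEW_DIR = '/mnt/new_disk/cassandra/data/foo/bar'

    os.makedirs(NEW_DIR, exist_ok=True)
    # Copy Data, Index, Filter, Statistics, Summary, TOC, ... as one unit.
    for path in glob.glob(os.path.join(OLD_DIR, 'foo-bar-ka-*')):
        shutil.copy2(path, NEW_DIR)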

Re: Increase compaction performance

2016-02-12 Thread Michał Łowicki
…ually to see if I can make it any better. Thanks!

On Thu, Feb 11, 2016 at 8:10 PM, Michał Łowicki wrote:
> On Thu, Feb 11, 2016 at 5:38 PM, Alain RODRIGUEZ wrote:
>> Also, are you using incremental repairs (not sure about the available
>> options in Spotify Rea…

Re: Increase compaction performance

2016-02-11 Thread Michał Łowicki
>> You can lower the stream throughput to make sure nodes can cope with
>> what repairs are feeding them.
>>
>>     nodetool getstreamthroughput
>>     nodetool setstreamthroughput X

Yes, this sounds interesting. As we're having problems with repair…

Increase compaction performance

2016-02-11 Thread Michał Łowicki
…compaction? I increased compaction throughput and concurrent compactors, but saw no change. There seem to be plenty of idle resources, but I can't force C* to use them. Any clue where the bottleneck might be?

-- BR, Michał Łowicki
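
A sketch of loosening the knob that can be changed live, assuming a
four-node cluster with hypothetical host names. In 2.1,
setcompactionthroughput takes effect immediately (0 removes the cap), while
concurrent_compactors is a cassandra.yaml setting that needs a restart:

    import subprocess

    HOSTS = ['db1', 'db2', 'db3', 'db4']  # assumed node names

    for host in HOSTS:
        # Remove the throughput cap so compaction can use spare I/O.
        subprocess.check_call(['nodetool', '-h', host,
                               'setcompactionthroughput', '0'])
        # Watch whether pending compactions actually start draining.
        print(subprocess.check_output(['nodetool', '-h', host,
                                       'compactionstats']).decode())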

Much less connected native clients after node join

2015-11-15 Thread Michał Łowicki
…hours, so I restarted the newly joined node at ~9:50 and everything looked much better. I guess the expected behaviour would be to have the same number of connected clients after some time.

-- BR, Michał Łowicki

compaction became super slow after interrupted repair

2015-09-26 Thread Michał Łowicki
…stem.log?dl=0). Note that I'm experiencing CASSANDRA-9935 while running repair on each node in the cluster. Any help will be much appreciated.

-- BR, Michał Łowicki

Re: Garbage collector launched on all nodes at once

2015-06-17 Thread Michał Łowicki
It looks like the memtable heap size is growing rapidly on some nodes (https://www.dropbox.com/s/3brloiy3fqang1r/Screenshot%202015-06-17%2019.21.49.png?dl=0). The drops are the points where nodes were restarted.

On Wed, Jun 17, 2015 at 6:53 PM, Michał Łowicki wrote:
> Hi,
>
> Two datacente…

Garbage collector launched on all nodes at once

2015-06-17 Thread Michał Łowicki
…-- BR, Michał Łowicki

Re: How to interpret some GC logs

2015-06-02 Thread Michał Łowicki
> …Stamps -Xloggc:/var/log/cassandra/gc.log
>
> Best Regards,
> Sebastian Martinka
>
> From: Michał Łowicki [mailto:mlowi...@gmail.com]
> Sent: Monday, 1 June 2015 11:47
> To: user@cassandra.apache.org
> Subject: How to interpret some GC logs

Re: How to interpret some GC logs

2015-06-02 Thread Michał Łowicki
On Mon, Jun 1, 2015 at 5:46 PM, Michał Łowicki wrote:
>> Hi,
>>
>> Normally I get logs like:
>>
>>     2015-06-01T09:19:50.610+0000: 4736.314: [GC 6505591K->4895804K(8178944K), 0.0494560 secs]
>>
>> which is fine and understandable, but oc…

How to interpret some GC logs

2015-06-01 Thread Michał Łowicki
…it? Is it missing only the part before "->", i.e. the memory occupied before the GC cycle?

-- BR, Michał Łowicki
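
A worked example of decoding the line format quoted in this thread:
"[GC 6505591K->4895804K(8178944K), 0.0494560 secs]" means heap usage went
from ~6.2GB before the cycle to ~4.7GB after, out of a ~7.8GB heap, and the
collection paused for ~49ms. A minimal parser:

    import re

    LINE = ('2015-06-01T09:19:50.610+0000: 4736.314: '
            '[GC 6505591K->4895804K(8178944K), 0.0494560 secs]')

    m = re.search(r'\[GC (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]', LINE)
    before_kb, after_kb, total_kb = (int(m.group(1)), int(m.group(2)),
                                     int(m.group(3)))
    secs = float(m.group(4))
    print('used before: %.1f GB' % (before_kb / 1024.0 / 1024.0))
    print('used after:  %.1f GB' % (after_kb / 1024.0 / 1024.0))
    print('heap size:   %.1f GB' % (total_kb / 1024.0 / 1024.0))
    print('pause:       %.0f ms' % (secs * 1000))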

Compaction freezes

2015-05-10 Thread Michał Łowicki
…compactor is doing currently? I've enabled DEBUG logging but it's too verbose, as the node is getting some traffic. Can I enable DEBUG for compaction only?

-- BR, Michał Łowicki
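
One way to scope DEBUG to compaction alone, assuming a 2.1-era nodetool that
includes setlogginglevel (the same change can be made permanent in
conf/logback.xml); the logger name is the compaction package in the 2.1 tree:

    import subprocess

    # Raise only the compaction logger to DEBUG at runtime.
    subprocess.check_call(['nodetool', 'setlogginglevel',
                           'org.apache.cassandra.db.compaction', 'DEBUG'])
    # Revert when done:
    # nodetool setlogginglevel org.apache.cassandra.db.compaction INFO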

Re: C* 2.1.2 invokes oom-killer

2015-02-23 Thread Michał Łowicki
After a couple of days it's still behaving fine. Case closed.

On Thu, Feb 19, 2015 at 11:15 PM, Michał Łowicki wrote:
> Upgrading to 2.1.3 seems to help so far. After ~12 hours, total memory
> consumption grew from 10GB to 10.5GB.
>
> On Thu, Feb 19, 2015 at 2:02 PM, Carlos Rolo…

Re: C* 2.1.2 invokes oom-killer

2015-02-19 Thread Michał Łowicki
> Tel: 1649
> www.pythian.com
>
> On Thu, Feb 19, 2015 at 12:16 PM, Michał Łowicki wrote:
>> trickle_fsync has been enabled for a long time in our settings (just
>> noticed):
>>
>>     trickle_fsync: true
>>     trickle_fsync_interval_in_kb: 10240

Re: C* 2.1.2 invokes oom-killer

2015-02-19 Thread Michał Łowicki
trickle_fsync has been enabled for a long time in our settings (just noticed):

    trickle_fsync: true
    trickle_fsync_interval_in_kb: 10240

On Thu, Feb 19, 2015 at 12:12 PM, Michał Łowicki wrote:
> On Thu, Feb 19, 2015 at 11:02 AM, Carlos Rolo wrote:
>> Do you have tri…

Re: C* 2.1.2 invokes oom-killer

2015-02-19 Thread Michał Łowicki
> …ian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
> Tel: 1649
> www.pythian.com
>
> On Thu, Feb 19, 2015 at 10:49 AM, Michał Łowicki wrote:
>> On Thu, Feb 19, 2015 at 10:41 AM, Carlos Rol…

Re: C* 2.1.2 invokes oom-killer

2015-02-19 Thread Michał Łowicki
> …this before, and there was a tipping point around 70ms.

Write request latency is below 0.05 ms/op (avg). Checked with OpsCenter.

-- BR, Michał Łowicki

Re: C* 2.1.2 invokes oom-killer

2015-02-19 Thread Michał Łowicki
> …linkedin.com/in/carlosjuzarterolo
> Tel: 1649
> www.pythian.com
>
> On Thu, Feb 19, 2015 at 9:16 AM, Michał Łowicki wrote:
>> We don't have other things running on these boxes and C* is consuming
>> all the memory.
>>
>> Will try to upgrade to 2.1.3 and if w…

Re: C* 2.1.2 invokes oom-killer

2015-02-19 Thread Michał Łowicki
> Sent from iPhone
>
>> On 19 Feb 2015, at 5:28 am, Michał Łowicki wrote:
>>
>> Hi,
>>
>> A couple of times a day, 2 out of 4 cluster nodes are killed:
>>
>>     root@db4:~# dmesg | grep -i oom
>>     [4811135.792657] [ pid ]   uid  tgid total_vm…

C* 2.1.2 invokes oom-killer

2015-02-18 Thread Michał Łowicki
…50GB. Any help will be appreciated.

-- BR, Michał Łowicki

Re: Timeouts but returned consistency level is invalid

2015-01-30 Thread Michał Łowicki
> …almost certainly running into
> https://issues.apache.org/jira/browse/CASSANDRA-7947, which is fixed
> in 2.1.3 and 2.0.12.
>
> On Fri, Jan 30, 2015 at 8:37 AM, Michał Łowicki wrote:
>> Hi Jan,
>>
>> I'm using only one keyspace. Even if it defaults to ONE, why sometimes AL…

Re: Timeouts but returned consistency level is invalid

2015-01-30 Thread Michał Łowicki
> …for the keyspace.
>
> Could it be that your queries are spanning multiple keyspaces which
> bear different levels of consistency?
>
> Cheers,
> Jan
> C* Architect
>
> On Friday, January 30, 2015 1:36 AM, Michał Łowicki wrote:
>> Hi, …

Timeouts but returned consistency level is invalid

2015-01-30 Thread Michał Łowicki
…onses] message="Operation timed out - received only 0 responses."
info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}

Any idea why it might happen?

-- BR, Michał Łowicki
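
For context, this is the shape of error the Python driver raises when a read
times out; a minimal sketch (keyspace, table and id are hypothetical) showing
where the requested consistency level enters the picture:

    from cassandra import ConsistencyLevel, ReadTimeout
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('my_keyspace')  # hypothetical keyspace

    stmt = SimpleStatement(
        'SELECT * FROM my_table WHERE id = %s',  # hypothetical table
        consistency_level=ConsistencyLevel.ONE,
    )
    try:
        session.execute(stmt, (42,))
    except ReadTimeout as exc:
        # The 'consistency' field in the error comes from the coordinator;
        # CASSANDRA-7947 (cited in the reply above) is about it being
        # misreported.
        print(exc)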

Re: Inconsistencies between two tables if BATCH used

2015-01-16 Thread Michał Łowicki
Done. https://issues.apache.org/jira/browse/CASSANDRA-8636

On Thu, Jan 15, 2015 at 7:46 PM, Robert Coli wrote:
> On Thu, Jan 15, 2015 at 9:09 AM, Michał Łowicki wrote:
>> We were using LOCAL_QUORUM. C* 2.1.2. Two datacenters. We didn't get
>> any exception…

Inconsistencies between two tables if BATCH used

2015-01-15 Thread Michał Łowicki
…) VALUES (%(5)s, %(6)s, %(7)s, %(8)s, %(9)s, %(10)s, %(11)s)\nAPPLY BATCH;',)

We suspect it's a problem in C* itself. Any ideas on how to debug what is going on, as BATCH is needed in this case?

-- BR, Michał Łowicki
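
A minimal sketch (hypothetical keyspace, tables and columns) of the kind of
logged batch the thread describes, where two tables are written atomically
so they should never diverge:

    from cassandra.cluster import Cluster
    from cassandra.query import BatchStatement, BatchType

    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('my_keyspace')  # hypothetical keyspace

    insert_a = session.prepare('INSERT INTO table_a (id, val) VALUES (?, ?)')
    insert_b = session.prepare('INSERT INTO table_b (id, val) VALUES (?, ?)')

    # LOGGED batches go through the batchlog: both writes should apply,
    # or neither.
    batch = BatchStatement(batch_type=BatchType.LOGGED)
    batch.add(insert_a, (42, 'x'))
    batch.add(insert_b, (42, 'x'))
    session.execute(batch)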

Re: Number of SSTables grows after repair

2015-01-05 Thread Michał Łowicki
@Robert, could you point me to some of those issues? I would be very grateful for some explanation of why this is semi-expected.

On Fri, Jan 2, 2015 at 8:01 PM, Robert Coli wrote:
> On Mon, Dec 15, 2014 at 1:51 AM, Michał Łowicki wrote:
>> We've noticed that the number of SSTa…

Number of SSTables grows after repair

2014-12-15 Thread Michał Łowicki
…are ~60 bytes in size (http://paste.ofcode.org/6yyH2X52emPNrKdw3WXW3d).
Table information: http://paste.ofcode.org/32RijfxQkNeb9cx9GAAnM45
We're using Cassandra 2.1.2.

-- BR, Michał Łowicki
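
A sketch for tracking the symptom above: count the SSTables for one table and
their sizes before and after a repair. The data path is the Debian default
and the keyspace/table names are assumptions:

    import glob
    import os

    DATA_DIR = '/var/lib/cassandra/data/my_keyspace/my_table'  # assumed

    data_files = glob.glob(os.path.join(DATA_DIR, '*-Data.db'))
    sizes = [os.path.getsize(p) for p in data_files]
    if sizes:
        print('%d SSTables, %.1f MB total, smallest %d bytes'
              % (len(data_files), sum(sizes) / 1024.0 / 1024.0, min(sizes)))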