Hello,
yet another question/issue with repair.
Cassandra 2.1.18, 3 nodes, RF=3, vnodes=256, data volume only ~ 5G per node. A
repair (nodetool repair -par) issued on a single node at this data volume takes
around 36 min with an AVG of ~ 15 MByte/s disk throughput (read+write) for the
entire time-
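For context, a rough back-of-the-envelope calculation, assuming the quoted
averages hold for the whole run:
~15 MByte/s * 36 min * 60 s/min ~= 32 GB of disk I/O (read+write)
vs. only ~5 GB of data on the node,
i.e. repair touches roughly 6x the node's data volume, presumably due to
validation compaction across all 256 vnode ranges plus follow-up streaming.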
Hi Sam,
in our pre-production stages, we are running Cassandra 3.0 and 3.11 with 8u172
(previously u102 then u162) without any visible troubles/regressions.
In case of Cassandra 3.11, you need 3.11.2 due to:
https://issues.apache.org/jira/browse/CASSANDRA-14173. Cassandra 3.0 is not
affected b
Hello,
on a quite capable machine with 32 physical cores (64 vCPUs) we see sporadic
CPU usage of up to 50% caused by nodetool on this box, so we dug a bit further.
A few observations:
1) nodetool is reusing the $MAX_HEAP_SIZE environment variable, thus if we are
running Cassandra with e.g. Xmx3
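A minimal sketch, assuming the nodetool wrapper script really does pick up
$MAX_HEAP_SIZE from the environment as described above (128m is just an
illustrative cap for the short-lived nodetool JVM, not a recommendation):
$ MAX_HEAP_SIZE=128m nodetool status
$ MAX_HEAP_SIZE=128m nodetool compactionstats -H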
Hi Kurt,
thanks for pointing me to the Xmx issue.
JIRA + patch (for Linux only based on C* 3.11) for the parallel GC thread issue
is available here: https://issues.apache.org/jira/browse/CASSANDRA-14475
Thanks,
Thomas
From: kurt greaves [mailto:k...@instaclustr.com]
Sent: Tuesday, 29. May 201
he minimum heapsize by default will be 256mb, which isn't hugely
problematic, and it's unlikely more than that would get allocated.
On 29 May 2018 at 09:29, Steinmaurer, Thomas
mailto:thomas.steinmau...@dynatrace.com>>
wrote:
Hi Kurt,
thanks for pointing me to the Xmx issue.
JIRA + p
Jeff,
FWIW, when talking about https://issues.apache.org/jira/browse/CASSANDRA-13929,
there is a patch available since March without getting further attention.
Regards,
Thomas
From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Tuesday, 05. June 2018 00:51
To: cassandra
Subject: Re: 3.11.2 memor
Hello,
most likely obvious and perhaps already answered in the past, but just want to
be sure ...
E.g. I have set:
concurrent_compactors: 4
compaction_throughput_mb_per_sec: 16
I guess this will lead to ~ 4 MB/s per thread if I have 4 compactions running in
parallel?
So, in case of upscaling
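For what it's worth, the arithmetic behind that guess, assuming the global
throttle is shared evenly across the running compaction threads:
compaction_throughput_mb_per_sec: 16
concurrent_compactors: 4
=> 16 / 4 = ~4 MB/s per compaction thread while 4 compactions run in parallel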
Hello,
on a 3 node loadtest cluster with very capable machines (32 physical cores,
512G RAM, 20T storage (26 disk RAID)), I'm trying to max out compaction, thus
currently testing with:
concurrent_compactors: 16
compaction_throughput_mb_per_sec: 0
With our simulated incoming load + compaction e
Sorry, should have first looked at the source code. In case of 0, it is set to
Double.MAX_VALUE.
Thomas
From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com]
Sent: Monday, 11. June 2018 08:53
To: user@cassandra.apache.org
Subject: compaction_throughput: Difference between 0
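As a side note, the same setting can be inspected and changed at runtime via
standard nodetool commands, no restart needed (values below are illustrations
only):
$ nodetool getcompactionthroughput
$ nodetool setcompactionthroughput 0    # 0 = unthrottled, same as compaction_throughput_mb_per_sec: 0
$ nodetool setcompactionthroughput 64   # re-apply a throttle later if needed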
Explicitly setting Xmn with G1 basically results in overriding the target
pause-time goal, thus should be avoided.
http://www.oracle.com/technetwork/articles/java/g1gc-1984535.html
Thomas
From: rajpal reddy [mailto:rajpalreddy...@gmail.com]
Sent: Wednesday, 13. June 2018 17:27
To: user@cassandra
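For illustration, a hedged jvm-options style sketch of the point above (heap
size and pause target are hypothetical values, not recommendations):
-XX:+UseG1GC
-XX:MaxGCPauseMillis=500
-Xms8G
-Xmx8G
# no explicit -Xmn / -XX:NewSize: fixing the young generation size overrides
# G1's pause-time goal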
Jon,
eager to try it out. :-) Just FYI, I followed the installation instructions on
http://cassandra-reaper.io/docs/download/install/ (Debian-based).
1) Importing the key results in:
XXX:~$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys
2895100917357435
Executing: /tmp/tmp.tP0KAKG6iT/
Hello,
with 2.1, in case a second Cassandra process/instance is started on a host (by
accident), can this result in some sort of corruption, even though Cassandra will
exit at some point in time due to not being able to bind TCP ports already in
use?
What we have seen in this scenario is somethin
Hello,
we are running Cassandra in AWS and On-Premise at customer sites, currently 2.1
in production with 3.11 in loadtest.
In a migration path from 2.1 to 3.11.x, I'm afraid that at some point in time
we end up with incremental repairs being enabled / run a first time
unintentionally, cause:
a)
540 should have gone to 2.1 in the first place, but
it just got missed. Very simple patch so I think a backport should be accepted.
On 7 August 2018 at 15:57, Steinmaurer, Thomas
mailto:thomas.steinmau...@dynatrace.com>>
wrote:
Hello,
with 2.1, in case a second Cassandra process/instance is s
2.1 processes?
New ticket for backporting, referencing the existing.
On Mon., 13 Aug. 2018, 22:50 Steinmaurer, Thomas,
mailto:thomas.steinmau...@dynatrace.com>>
wrote:
Thanks Kurt.
What is the proper workflow here to get this accepted? Create a new ticket
dedicated for the backport refer
Hello,
is it a known issue / limitation that cleanup compactions aren't counted in the
compaction remaining time?
nodetool compactionstats -H
pending tasks: 1
compaction type   keyspace   table   completed   total   unit   progress
Cleanup           XXX        YYY
to:jji...@gmail.com>> wrote:
Probably worth a JIRA (especially if you can repro in 3.0 or higher, since 2.1
is critical fixes only)
On Wed, Sep 5, 2018 at 10:46 PM Steinmaurer, Thomas
mailto:thomas.steinmau...@dynatrace.com>>
wrote:
Hello,
is it a known issue / limitation that
in 3.0 or higher, since 2.1
is critical fixes only)
On Wed, Sep 5, 2018 at 10:46 PM Steinmaurer, Thomas
mailto:thomas.steinmau...@dynatrace.com>>
wrote:
Hello,
is it a known issue / limitation that cleanup compactions aren't counted in the
compaction remaining time?
nodetool compactionst
incremental repair?
No flag currently exists. Probably a good idea considering the serious issues
with incremental repairs since forever, and the change of defaults since 3.0.
On 7 August 2018 at 16:44, Steinmaurer, Thomas
mailto:thomas.steinmau...@dynatrace.com>>
wrote:
Hello,
we are r
Hello,
is there a way to online scrub a particular SSTable file only and not the
entire column family?
According to the Cassandra logs we have a corrupted SSTable that is small
compared to the entire data volume of the column family in question.
To my understanding, both, nodetool scrub and sstable
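For reference, both tools mentioned take a keyspace and table (names below are
placeholders), not an individual SSTable file; the offline variant requires the
node to be stopped:
$ nodetool scrub ks cf      # online scrub, runs against the whole column family
$ sstablescrub ks cf        # offline scrub, also against the whole column family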
From: Jeff Jirsa
Sent: Monday, 10. September 2018 19:40
To: cassandra
Subject: Re: Drop TTLd rows: upgradesstables -a or scrub?
I think it's important to describe exactly what's going on for people who just
read the list but who don't have context. This blog does a really good job:
http://the
As far as I remember, in newer Cassandra versions, with STCS, nodetool compact
offers a '-s' command-line option to split the output into files with 50%, 25%,
... in size, thus in this case, not a single largish SSTable anymore. By default,
without -s, it is a single SSTable though.
Thomas
From:
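A hedged example of the split option mentioned above (keyspace/table names are
placeholders; whether -s / --split-output is available depends on the Cassandra
version):
$ nodetool compact -s ks cf   # major compaction, output split into 50%/25%/... sized SSTables
$ nodetool compact ks cf      # default: one single large output SSTable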
. September 2018 09:47
To: User
Subject: Re: Drop TTLd rows: upgradesstables -a or scrub?
On Tue, Sep 11, 2018 at 9:31 AM Steinmaurer, Thomas
mailto:thomas.steinmau...@dynatrace.com>>
wrote:
As far as I remember, in newer Cassandra versions, with STCS, nodetool compact
offers a '-s' comman
Hi,
I remember that a client using the native protocol can get notified too early
that Cassandra is ready, due to the following issue:
https://issues.apache.org/jira/browse/CASSANDRA-8236
which looks similar, but above was marked as fixed in 2.2.
Thomas
From: Riccardo Ferrari
Sent: Mit
Hello,
a Blackduck security scan of our product detected a security vulnerability in
the Apache Thrift library 0.9.2, which is shipped in Cassandra up to 3.11
(haven't checked 4.0), also pointed out here:
https://www.cvedetails.com/vulnerability-list/vendor_id-45/product_id-38295/Apache-Thrift.h
Alex,
any indications in Cassandra log about insufficient disk space during
compactions?
Thomas
From: Oleksandr Shulgin
Sent: Tuesday, 18. September 2018 10:01
To: User
Subject: Major compaction ignoring one SSTable? (was Re: Fresh SSTable files
(due to repair?) in a static table (was Re: D
Hello,
is there an ETA for 2.1.21 containing the logback update (security
vulnerability fix)?
Thanks,
Thomas
> On 9/21/18 3:28 AM, Steinmaurer, Thomas wrote:
> >
> > is there an ETA for 2.1.21 containing the logback update (security
> > vulnerability fix)?
>
> Are you using SocketServer? Is your cluster firewalled?
>
> Feb 2018 2.1->3.11 commits noting this in NEWS.txt:
> ht
age-
> From: Michael Shuler On Behalf Of Michael
> Shuler
> Sent: Friday, 21. September 2018 15:49
> To: user@cassandra.apache.org
> Subject: Re: Cassandra 2.1.21 ETA?
>
> On 9/21/18 3:28 AM, Steinmaurer, Thomas wrote:
> >
> > is there an ETA for 2.1.21 con
Hello,
while bootstrapping a new node into an existing cluster, a node which is acting
as a source for streaming unfortunately got restarted. Since then, from nodetool
netstats I don't see any progress for this particular node anymore.
E.g.:
/X.X.X.X
Receiving 94 files, 260.09 GB total.
Hello,
is there a JMX metric for monitoring dropped hints as a counter/rate,
equivalent to what we see in Cassandra log, e.g.:
WARN [HintedHandoffManager:1] 2018-11-13 13:28:46,991
HintedHandoffMetrics.java:79 - /XXX has 18180 dropped hints, because node is
down past configured hint window.
W
org.apache.cassandra.metrics/DroppedMessage/HINT/Attributes/FiveMinuteRate
org.apache.cassandra.metrics/DroppedMessage/HINT/Attributes/FifteenMinuteRate
Hayato
On Tue, 22 Jan 2019 at 07:45, Steinmaurer, Thomas
mailto:thomas.steinmau...@dynatrace.com>>
wrote:
Hello,
is there a JMX metric for monitoring d
Hello,
at a particular customer location, we are seeing the following NPE during
startup with Cassandra 2.1.18.
INFO [SSTableBatchOpen:2] 2019-02-03 13:32:56,131 SSTableReader.java:475 -
Opening
/var/opt/data/cassandra/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_key
Hello,
any ideas regarding below, cause it happened again on a different node.
Thanks
Thomas
From: Steinmaurer, Thomas
Sent: Tuesday, 05. February 2019 23:03
To: user@cassandra.apache.org
Subject: Cassandra 2.1.18 - NPE during startup
Hello,
at a particular customer location, we are seeing
Hello,
using 2.1.8, 3 nodes (m4.10xlarge, EBS SSD-based), vnodes=256, RF=3, we are
trying to add a 4th node.
The two options to my knowledge mainly affecting throughput, namely stream
output and compaction throttling, have been set to very high values (e.g. stream
output = 800 Mbit/s resp. comp
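For reference, a sketch of bumping both throttles at runtime with standard
nodetool commands (the values mirror the ones quoted above and are not
recommendations):
$ nodetool setstreamthroughput 800      # stream output cap in Mbit/s
$ nodetool setcompactionthroughput 0    # 0 = compaction unthrottled
$ nodetool getstreamthroughput
$ nodetool getcompactionthroughput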
: Oleksandr Shulgin
Sent: Tuesday, 22. October 2019 16:35
To: User
Subject: Re: Cassandra 2.1.18 - Question on stream/bootstrap throughput
On Tue, Oct 22, 2019 at 12:47 PM Steinmaurer, Thomas
mailto:thomas.steinmau...@dynatrace.com>>
wrote:
using 2.1.8, 3 nodes (m4.10xlarge, EBS SSD-based),
of
memory copies is not going to manifest that much CPU, it's memory bus bandwidth
you are fighting against then. It is easy to have a box that looks unused but
in reality it's struggling. Given that you've opened up the floodgates on
compaction, that would seem quite plausible to be what you
Hello,
our top contributor from a data volume perspective is time series data. We are
running with STCS since our initial production deployment in 2014 with several
clusters with a varying number of nodes, but currently with max. 9 nodes per
single cluster per different region in AWS with m4.xl
Hello,
we are currently in the process of upgrading from 2.1.18 to 3.0.14. After
upgrading a few test environments, we start to see some suspicious log entries
regarding repair issues.
We have a cron job on all nodes basically executing the following repair call
on a daily basis:
nodetool rep
ext node only
when nodetool or the logs show that repair is over (which will include the
anticompaction phase).
Cheers,
On Fri, Sep 15, 2017 at 8:42 AM Steinmaurer, Thomas
mailto:thomas.steinmau...@dynatrace.com>>
wrote:
Hello,
we are currently in the process of upgrading from 2.1.18 to 3.
Hello,
we have a test (regression) environment hosted in AWS, which is used for auto
deploying our software on a daily basis and attach constant load across all
deployments. Basically to allow us to detect any regressions in our software on
a daily basis.
On the Cassandra-side, this is single-
ps://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsSSTableRepairedSet.html>.
Cheers,
On Fri, Sep 15, 2017 at 10:27 AM Steinmaurer, Thomas
mailto:thomas.steinmau...@dynatrace.com>>
wrote:
Hi Alex,
thanks a lot. Somehow missed that incremental repairs are the default now.
We
ep 15, 2017, at 2:37 AM, Steinmaurer, Thomas
mailto:thomas.steinmau...@dynatrace.com>>
wrote:
Hello,
we have a test (regression) environment hosted in AWS, which is used for auto
deploying our software on a daily basis and attach constant load across all
deployments. Basically to allow u
Hi,
usually automatic minor compactions are fine, but you may need much more free
disk space to reclaim disk space via automatic minor compactions, especially in
a time series use case with size-tiered compaction strategy (possibly with
leveled as well, I'm not familiar with this strategy type)
t using incremental
repairs). If you wish to do this, you'll have to mark back all your sstables to
unrepaired, using nodetool
sstablerepairedset<https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsSSTableRepairedSet.html>.
Cheers,
On Fri, Sep 15, 2017 at 10:27 AM Ste
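A sketch of the marking-back step referenced above, assuming the syntax from
the linked 2.1 documentation (SSTable paths are placeholders; the tool is meant
to be run while the node is stopped):
$ sstablerepairedset --really-set --is-unrepaired /path/to/data/ks/cf-*/*-Data.db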
3.0 in context of CPU/GC and not disk savings?
Thanks,
Thomas
From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com]
Sent: Friday, 15. September 2017 13:51
To: user@cassandra.apache.org
Subject: RE: GC/CPU increase after upgrading to 3.0.14 (from 2.1.18)
Hi Jeff,
we are using
ant to fully avoid marking SSTables as
repaired (which you don't need anyway since you're not using incremental
repairs). If you wish to do this, you'll have to mark back all your sstables to
unrepaired, using nodetool
sstablerepairedset<https://docs.datastax.
users may want to keep
running full repairs without the additional cost of anti-compaction.
Would you mind opening a ticket for this?
2017-09-19 1:33 GMT-05:00 Steinmaurer, Thomas
:
> Hi Kurt,
>
>
>
> thanks for the link!
>
>
>
> Honestly, a pity, that in 3.0, we can
Nandan,
you may find the following useful.
Slideshare:
https://www.slideshare.net/DataStax/apache-cassandra-multidatacenter-essentials-julien-anguenot-iland-internet-solutions-c-summit-2016
Youtube:
https://www.youtube.com/watch?v=G6od16YKSsA
From a client perspective, if you are targeting quor
Hi,
additionally, with saved (key) caches, we had some sort of corruption once (I
think, for whatever reason). So, if you see something like the following upon
Cassandra startup:
INFO [main] 2017-01-04 15:38:58,772 AutoSavingCache.java (line 114) reading
saved cache /var/opt/xxx/cassandra/saved_caches/
Hi,
within the default hint window of 3 hours, the hinted handoff mechanism should
take care of that, but we have seen that failing from time to time (depending
on the load) in 2.1, with some sort of tombstone-related issues causing failing
requests on the system hints table. So, watch out for any s
In addition to Kurt's reply: double disk usage is really the worst case. Most of
the time you are fine having more free disk available than the largest column
family. Also take local snapshots into account. Even after a finished major
compaction, disk space may not have been reclaimed if snapshot sym links
://issues.apache.org/jira/browse/CASSANDRA-13900. Feel free
to request any further additional information on the ticket.
Unfortunately this is a real show-stopper for us upgrading to 3.0.
Thanks for your attention.
Thomas
From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com]
Sent
situation after upgrading from 2.1.14 to 3.11 in our
production.
Have you already tried G1GC instead of CMS? Our timeouts were mitigated after
replacing CMS with G1GC.
Thanks.
2017-09-25 20:01 GMT+09:00 Steinmaurer, Thomas
mailto:thomas.steinmau...@dynatrace.com>>:
Hello,
I have now some co
ted after
replacing CMS with G1GC.
Thanks.
2017-09-25 20:01 GMT+09:00 Steinmaurer, Thomas
mailto:thomas.steinmau...@dynatrace.com>>:
Hello,
I have now some concrete numbers from our 9 node loadtest cluster with constant
load, same infrastructure after upgrading to 3.0.14 from 2.1.18.
We see
Side-note: At least with 2.1 (or even later), be aware that you might run into
the following issue:
https://issues.apache.org/jira/browse/CASSANDRA-11155
We are doing cron-job based hourly snapshots in production and have tried to
also run cleanup after extending a cluster from 6 to 9 nodes. Thi
Dan,
do you see any major GC? We have been hit by the following memory leak in our
loadtest environment with 3.11.0.
https://issues.apache.org/jira/browse/CASSANDRA-13754
So, depending on the heap size and uptime, you might get into heap troubles.
Thomas
From: Dan Kinder [mailto:dkin...@turnit
Hi,
half of the free space does not make sense. Imagine your SSTables need 100G of
space and you have 20G of free disk: compaction won't be able to do its job with
10G. Half of the total disk being free makes more sense and is what you need for
a major-compaction worst case.
Thomas
From: Peng Xiao [mailto:2535..
Hello,
we were facing a memory leak with 3.11.0
(https://issues.apache.org/jira/browse/CASSANDRA-13754) thus upgraded our
loadtest environment to a snapshot build of 3.11.1. Having it running for > 48
hrs now, we still see a steady increase on heap utilization.
Eclipse memory analyzer shows 14
Hello Justin,
yes, but in the real world this is hard to accomplish for high-volume column
families >= 3-digit GB. Even with the default 10-day grace period, completing a
full repair may become a real challenge. :-)
Possibly back again to the discussion that incremental repair has some flaws,
full
QUORUM should succeed with a RF=3 and 2 of 3 nodes available.
Modern client drivers also have ways to 'downgrade' the CL of requests, in case
they fail. E.g. for the Java driver:
http://docs.datastax.com/en/latest-java-driver-api/com/datastax/driver/core/policies/DowngradingConsistencyRetryPolic
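The arithmetic behind the first statement:
quorum = floor(RF / 2) + 1 = floor(3 / 2) + 1 = 2
so with RF=3 and 2 of 3 replicas up, both reads and writes at CL QUORUM can
still be served.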
Marshall,
-pr should not be used with incremental repairs, which is the default since
2.2. But even when used with full repairs (-full option), this will cause
troubles when running nodetool repair -pr from several nodes concurrently. So,
unfortunately, this does not seem to work anymore and ma
Hi,
although not happening here with Cassandra (due to using CMS), we had some weird
problems with our server application, e.g. being hit by the following JVM/G1 bugs:
https://bugs.openjdk.java.net/browse/JDK-8140597
https://bugs.openjdk.java.net/browse/JDK-8141402 (more or less a duplicate of
above)
h
G1 suggested settings
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTAvOC8tLWdjLmxvZy4wLmN1cnJlbnQtLTE5LTExLTE3
@Steinmaurer, Thomas If this happens very frequently within a very short period,
then depending on your allocation rate in MB/s, a combination of the G1 bug and a
small heap might result
Hello,
due to performance/latency reasons, we are currently reading and writing time
series data at consistency level ONE/ANY.
In case of a node being down and recovering after the default hinted handoff
window of 3 hrs, we may potentially read stale data from the recovering node.
Of course,
e windows ?
On 18 October 2017 at 08:04, Steinmaurer, Thomas
mailto:thomas.steinmau...@dynatrace.com>>
wrote:
Hello,
due to performance/latency reasons, we are currently reading and writing time
series data at consistency level ONE/ANY.
In case of a node being down and recovering after
read requests while running nodetool repair
You can accomplish this by manually tweaking the values in the dynamic snitch
mbean so other nodes won't select it for reads
--
Jeff Jirsa
On Oct 18, 2017, at 3:24 AM, Steinmaurer, Thomas
mailto:thomas.steinmau...@dynatrace.com>>
wrote:
Hi N
Hello,
I know that Cassandra is built for scale out on commodity hardware, but I
wonder if anyone can share some experience when running Cassandra on rather
capable machines.
Let's say we have a 3 node cluster with 128G RAM, 32 physical cores (16 per CPU
socket), Large Raid with Spinning Disks
Latest DSE is based on 3.11 (possibly due to CASSANDRA-12269, but just a guess).
For us (only), none of 3.0+/3.11+ qualifies for production to be honest, when
you are familiar with having 2.1 in production.
- 3.0 needs more hardware resources to handle the same load =>
https://issues.a
Hello,
does anybody already have some experience/results on whether a Linux kernel
patched for Meltdown/Spectre affects Cassandra performance negatively?
In production, all nodes running in AWS with m4.xlarge, we see up to a 50%
relative (e.g. AVG CPU from 40% => 60%) CPU increase since Jan 4,
not production though), thus more or less double patched now. Additional
CPU impact by OS/VM level kernel patching is more or less negligible, so looks
highly Hypervisor related.
Regards,
Thomas
From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com]
Sent: Friday, 05. January 2018
AM, Steinmaurer, Thomas
mailto:thomas.steinmau...@dynatrace.com>>
wrote:
Quick follow up.
Others in AWS are reporting/seeing something similar, e.g.:
https://twitter.com/BenBromhead/status/950245250504601600
So, while we have seen a relative CPU increase of ~ 50% since Jan 4, 2018, we
no
On Jan 10, 2018, at 1:57 AM, Steinmaurer, Thomas
mailto:thomas.steinmau...@dynatrace.com>>
wrote:
m4.xlarge do have PCID to my knowledge, but possibly we need a rather new
kernel 4.14. But I fail to see how this could help anyway, cause this looks
highly Amazon Hypervisor patch relat
Hello,
we are running 2.1.18 with vnodes in production and due to
(https://issues.apache.org/jira/browse/CASSANDRA-11155) we can't run cleanup
e.g. after extending the cluster without blocking our hourly snapshots.
What options do we have to get rid of partitions a node does not own anymore?
*
th, we (TLP) just posted some results comparing pre and post
meltdown statistics:
http://thelastpickle.com/blog/2018/01/10/meltdown-impact-on-latency.html
On Jan 10, 2018, at 1:57 AM, Steinmaurer, Thomas
mailto:thomas.steinmau...@dynatrace.com>>
wrote:
m4.xlarge do have PCID to my kn
ay to reproduce?
On 14 January 2018 at 16:12, Steinmaurer, Thomas
mailto:thomas.steinmau...@dynatrace.com>>
wrote:
Hello,
we are running 2.1.18 with vnodes in production and due to
(https://issues.apache.org/jira/browse/CASSANDRA-11155) we can't run cleanup
e.g. after extending the clus
Hello,
after switching from JDK8u152 to JDK8u162, Cassandra fails with the following
stack trace upon startup.
ERROR [main] 2018-01-18 07:33:18,804 CassandraDaemon.java:706 - Exception
encountered during startup
java.lang.AbstractMethodError:
org.apache.cassandra.utils.JMXServerUtils$Exporter.
wngrade back to 152 then !
On 18 January 2018 at 08:34, Steinmaurer, Thomas
mailto:thomas.steinmau...@dynatrace.com>>
wrote:
Hello,
after switching from JDK8u152 to JDK8u162, Cassandra fails with the following
stack trace upon startup.
ERROR [main] 2018-01-18 07:33:18,804 CassandraDaemon
o into 3.0?
On Thu, Jan 18, 2018 at 2:32 AM, Steinmaurer, Thomas
mailto:thomas.steinmau...@dynatrace.com>>
wrote:
Sam,
thanks for the confirmation. Going back to u152 then.
Thomas
From: li...@beobal.com
Hello,
when triggering a "nodetool cleanup" with Cassandra 3.11, the nodetool call
almost returns instantly and I see the following INFO log.
INFO [CompactionExecutor:54] 2018-01-22 12:59:53,903
CompactionManager.java:1777 - Compaction interrupted:
Compaction@fc9b0073-1008-3a07-aeb9-baf6f3cd0
it wasn't worth fixing. If you
are triggering it easily maybe it is worth fixing in 2.1 as well. Does this
happen consistently? Can you provide some more logs on the JIRA or better yet a
way to reproduce?
On 14 January 2018 at 16:12, Steinmaurer, Thomas
mailto:thomas.steinmau...@dynat
Did you start with a 9-node cluster from the beginning, or did you extend /
scale out your cluster (with vnodes) beyond the replication factor?
If the latter applies and you are deleting via explicit deletes and not via TTL,
then nodes might not see the deletes anymore, as a node might not own the
of 3, then added another 3 nodes
and again another 3 nodes. So it is a good guess :)
But I have run both repair and cleanup against the table on all nodes, would
that not have removed any stray partitions?
tor. 1. feb. 2018 kl. 22.31 skrev Steinmaurer, Thomas
mailto:thomas.steinmau
Stick with 31G in your case. Another article on compressed Oops:
https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
Thomas
From: Eunsu Kim [mailto:eunsu.bil...@gmail.com]
Sent: Tuesday, 13. February 2018 08:09
To: user@cassandra.apache.org
Subject: if the heap s
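A quick way to double-check on a given JVM that a chosen -Xmx still gets
compressed ordinary object pointers (the reason for staying at ~31G or below):
$ java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops   # expected: true
$ java -Xmx33g -XX:+PrintFlagsFinal -version | grep UseCompressedOops   # expected: false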
Hello,
Production, 9 node cluster with Cassandra 2.1.18, vnodes, default 256 tokens,
RF=3, compaction throttling = 16, concurrent compactors = 4, running in AWS
using m4.xlarge at ~ 35% CPU AVG
We have a nightly cronjob starting a "nodetool repair -pr ks cf1 cf2"
concurrently on all nodes, whe
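For illustration, a hedged crontab sketch of such a nightly job (schedule, user
and log path are hypothetical; the repair command mirrors the one quoted above):
# /etc/cron.d/cassandra-repair
0 1 * * * cassandra /usr/bin/nodetool repair -pr ks cf1 cf2 >> /var/log/cassandra/repair-cron.log 2>&1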
Hi Kurt,
our provisioning layer allows extending a cluster only one-by-one, thus we
didn't add multiple nodes at the same time.
What we did have was some sort of overlap between our daily repair cronjob
and the newly added node still being in the process of joining. Don't know if
this sort of combinat
Hello,
after moving from 2.1.18 to 3.0.18, we are facing OOM situations several hours
after a node has successfully joined a cluster (via auto-bootstrap).
I have created the following ticket trying to describe the situation, including
hprof / MAT screens: https://issues.apache.org/jira/browse/C
arity.
Per-second metrics might show CPU cores getting pegged.
I'm not sure that GC tuning eliminates this problem, but if it isn't being
caused by that, GC tuning may at least improve the visibility of the underlying
problem.
From: "Steinmaurer, Thomas"
mailto:thomas.steinmau..
Hello,
sorry, I know, 3.0.19 has been released just recently. Any ETA for 3.0.20?
Reason is that we are having quite some pain with on-heap pressure after moving
from 2.1.18 to 3.0.18.
https://issues.apache.org/jira/browse/CASSANDRA-15400
Thanks a lot,
Thomas
Hello,
looks like 3.0.18 can't handle the same write ingest compared to 2.1.18 on the
same hardware. Basically it looks like the write path, processing batch
messages, shows 10x higher numbers in regard to on-heap allocations.
I've tried to summarize the finding on the following ticket:
https://
Hello,
https://issues.apache.org/jira/browse/CASSANDRA-15426. According to the ticket,
changes in https://issues.apache.org/jira/browse/CASSANDRA-15053 likely being
the root cause.
Will this be fixed in 3.0.20 and 3.11.6?
Thanks,
Thomas
If possible, prefer m5 over m4, cause they are running on a newer hypervisor
(KVM-based); single-core performance is ~ 10% better compared to m4, with m5
even being slightly cheaper than m4.
Thomas
From: Erick Ramirez
Sent: Thursday, 30. January 2020 03:00
To: user@cassandra.apache.org
Subject
Leon,
we had an awful performance/throughput experience with 3.x coming from 2.1.
3.11 is simply a memory hog, if you are using batch statements on the client
side. If so, you are likely affected by
https://issues.apache.org/jira/browse/CASSANDRA-16201
Regards,
Thomas
2c from someone who oversees a bunch (4-digit) of Cassandra JVMs out there,
where especially on-prem customers, in domains with partly very rigid
requirements in terms of not running a single piece of software in their
infrastructure which already reached EOL, not only including Cassandra bu
Hello,
any thoughts / reasons why ZSTD support for over-the-wire compression has been
left out despite being a compression option for SSTable on-disk compression?
Perhaps it is intentional, or is there a technical reason behind it?
Created https://issues.apache.org/jira/browse/CASSANDRA-20488, with the main
m