Re: Re: How to gracefully decommission a highly loaded node?

2018-12-17 Thread Riccardo Ferrari
I am having "the same" issue. One of my nodes seems to be struggling with its hardware: out of 6 nodes (same instance size) this one is the most likely to be marked down, it is constantly compacting, it has a high system load, it's just a big pain. My idea was to add nodes and decommission all the ones running on old hard

Re: AWS r5.xlarge vs i3.xlarge

2018-12-10 Thread Riccardo Ferrari
f spinning disks. Both i3 and r5d are EBS optimized Regards, On Mon, Dec 10, 2018 at 2:38 PM Oleksandr Shulgin < oleksandr.shul...@zalando.de> wrote: > On Mon, Dec 10, 2018 at 12:20 PM Riccardo Ferrari > wrote: > >> I am wondering what instance type is best for a sm

AWS r5.xlarge vs i3.xlarge

2018-12-10 Thread Riccardo Ferrari
Hi list! I am wondering what instance type is best for a small Cassandra cluster on AWS. Actually I'd like to compare, or have your opinion about, the following instances: - r5*d*.xlarge (4vCPU, *19*ecu, 32GB ram and 1 NVMe instance store 150GB) - Need to attach a 600/900GB EBS - i3

Re: Sporadic high IO bandwidth and Linux OOM killer

2018-12-06 Thread Riccardo Ferrari
18 at 11:14 AM Riccardo Ferrari > wrote: > >> >> I had a few instances in the past that were showing that unresponsiveness >> behaviour. Back then I saw with iotop/htop/dstat ... the system was stuck >> on a single thread processing (full throttle) for seconds. According to

Re: Sporadic high IO bandwidth and Linux OOM killer

2018-12-06 Thread Riccardo Ferrari
leksandr Shulgin < oleksandr.shul...@zalando.de> wrote: > On Wed, 5 Dec 2018, 19:34 Riccardo Ferrari >> Hi Alex, >> >> I saw that behaviour in the past. >> > > Riccardo, > > Thank you for the reply! > > Do you refer to kswapd issue only or have y

Re: Sporadic high IO bandwidth and Linux OOM killer

2018-12-05 Thread Riccardo Ferrari
Hi Alex, I saw that behaviour in the past. I can tell you the kswapd0 usage is connected to the `disk_access_mode` property. On 64-bit systems it defaults to mmap. That also explains why your virtual memory is so high (it somehow matches the node load, right?). I cannot find any good reference howeve

Re: Correct repair job

2018-11-21 Thread Riccardo Ferrari
Hi, Given the number of nodes, I would consider deploying a tool like Cassandra Reaper. Best, On Wed, Nov 21, 2018 at 6:28 AM Pranab Bordoloi wrote: > Hi, > Yes, it needs to run on every node. This may help - > https://www.datastax.com/dev/blog/repair-in-cassandra

Re: Upgraded to 3.0.17, stop here or move forward?

2018-10-10 Thread Riccardo Ferrari
ll new release > - review config updates (patch existing config) > - start Cassandra > - *upgradesstables* > > *Not to forget*: Perform upgrade on one node at a time. > > Regards, > > Anup Shirolkar > > Instaclustr <https://www.instaclustr.com/> > > > On

Upgraded to 3.0.17, stop here or move forward?

2018-10-09 Thread Riccardo Ferrari
Hi list, We recently upgraded our small cluster to the latest 3.0.17. Everything was nice and smooth, however I am wondering if it makes sense to keep moving forward and upgrade to the latest 3.11.3? We really need something like GROUP BY, and UDF/UDA seem limited wrt our use-case. Does it ma

Re: About UDF/UDA

2018-09-26 Thread Riccardo Ferrari
UNC accumulatorFunction > STYPE stateType > [FINALFUNC finalFunction] > INITCOND initCond; > > > The final return type will be the return type of the FINALFUNC and not > necessarily the stateType > > More details by reading my blog post on it: > http://www.doanduyhai.com/

About UDF/UDA

2018-09-26 Thread Riccardo Ferrari
Hi users! Given my Cassandra version 3.0.x I don't have the famous GROUP BY operator available. So looking around I turned to UDAs. I'm aware all/most of the magic happens on the coordinator and the plan is to keep the data volume low to avoid too much pressure. Q1: How much is low volume. It's

Re: jmxterm "#NullPointerException: No such PID "

2018-09-18 Thread Riccardo Ferrari
Hi Philip, I've used jmxterm myself without any particular problems. On my systems too, I don't get the Cassandra daemon listed when issuing the `jvms` command but I never spent much time investigating it. Assuming you have not changed anything relevant in the cassandra-env.sh you can con

Re: Read timeouts when performing rolling restart

2018-09-18 Thread Riccardo Ferrari
e things, but having > an interactive view of the pending requests might lead you to the root > cause of the issue. > > C*heers, > --- > Alain Rodriguez - @arodream - al...@thelastpickle.com > France / Spain > > The Last Pickle - Apache Cassandra Consulting > http:/

Re: Read timeouts when performing rolling restart

2018-09-13 Thread Riccardo Ferrari
uts (probably due to GC and hints). > > Hope this helps! > > On Thu, Sep 13, 2018 at 2:20 AM Riccardo Ferrari > wrote: > >> A little update on the progress. >> >> First: >> Thank you Thomas. I checked the code in the patch and briefly skimmed >> through

Re: Read timeouts when performing rolling restart

2018-09-12 Thread Riccardo Ferrari
g ready due to the following issue: >> >> https://issues.apache.org/jira/browse/CASSANDRA-8236 >> >> >> >> which looks similar, but above was marked as fixed in 2.2. >> >> >> >> Thomas >> >> >> >> *From:* Riccardo Fe

Re: Read timeouts when performing rolling restart

2018-09-12 Thread Riccardo Ferrari
Hi Alain, Thank you for chiming in! I was thinking of performing the 'start_native_transport=false' test as well and indeed the issue is not showing up. Starting the/a node with native transport disabled and letting it cool down led to no timeout exceptions, no dropped messages, simply a crystal cle

Read timeouts when performing rolling restart

2018-09-12 Thread Riccardo Ferrari
Hi list, We are seeing the following behaviour when performing a rolling restart: On the node I need to restart: * I run the 'nodetool drain' * Then 'service cassandra restart' so far so good. The load increase on the other 5 nodes is negligible. The node is generally out of service just for th

Re: How to downloading Cassandra 3.11.0 and 3.11.2 binaries for ubuntu

2018-08-04 Thread Riccardo Ferrari
You should be able to do: apt-get install cassandra=3.11.2; the same applies to cassandra-tools. Have a look here: https://askubuntu.com/a/92021 Also, I find apt-cache madison useful to list all the available versions. HTH On Sat, Aug 4, 2018 at 3:56 PM, R1 J1 wrote: > What are the steps to dow
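The version pinning suggested in the message above could be sketched roughly as follows. `pin_cassandra_cmd` is a hypothetical helper introduced only for illustration; it merely prints the install command, so nothing is installed when the snippet runs. Whether 3.11.2 is actually offered depends on the repositories configured on the machine, which is what `apt-cache madison` would show.

```shell
#!/bin/sh
# Hypothetical helper (not part of apt or Cassandra): prints the apt-get
# command that pins both packages to the same version.
pin_cassandra_cmd() {
    version="$1"
    echo "apt-get install cassandra=${version} cassandra-tools=${version}"
}

# Step 1 (manual, needs apt): list the versions the repositories offer.
#   apt-cache madison cassandra
# Step 2 (manual, needs root): run the printed command with sudo.
pin_cassandra_cmd 3.11.2
```

Keeping cassandra and cassandra-tools on the same version string mirrors the "the same applies to cassandra-tools" remark in the message.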

Re: how to make cassandra listen not on 127.0.0.1 on 9042

2018-07-20 Thread Riccardo Ferrari
Hi, Have a look at the rpc_address description: http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html?highlight=rpc_address#rpc-address. What does your hostname resolve to? Best, On Fri, Jul 20, 2018 at 3:09 PM, Vitaliy Semochkin wrote: > Hi > > I'm building a cluster

Re: concurrent_compactors via JMX

2018-07-19 Thread Riccardo Ferrari
ely while >> I did not even play much with G1GC. >> >> This (or similar settings) worked for distinct cases having heavy read >> patterns. In the mailing list I explained recently to someone else my >> understanding of JVM and GC, also there is a blog post my coll

Re: concurrent_compactors via JMX

2018-07-18 Thread Riccardo Ferrari
Chris, Thank you for the mbean reference. On Wed, Jul 18, 2018 at 6:26 PM, Riccardo Ferrari wrote: > Alain, thank you for the email. I really really appreciate it! > > I am actually trying to remove the disk io from the suspect list, thus I > want to reduce the number of concurrent

Re: concurrent_compactors via JMX

2018-07-18 Thread Riccardo Ferrari
ECT ...') > > You can also analyze the *Garbage Collection* activity. As Cassandra uses > the JVM, a badly tuned GC will induce long pauses. Depending on the > workload, and I must say for most of the clusters I work on, the default > tuning is not that good and can keep

concurrent_compactors via JMX

2018-07-17 Thread Riccardo Ferrari
Hi list, Cassandra 3.0.6 I'd like to test the change of concurrent compactors to see if it helps when the system is under stress. Can someone point me to the right mbean? I can not really find good docs about mbeans (or tools ...) Any suggestion much appreciated, best

Added a new node, now what repair is best?

2018-07-01 Thread Riccardo Ferrari
Hi list, After a long time in operation we need to grow our cluster. This cluster was born on 2.X and almost 2 years ago migrated to 3.0.6 (I know we are a bit prudent). The cluster was 3 m1.xlarge (we are on AWS) and table RF was 3. Thanks to your valuable hints we added a new node

Re: Cassandra 3.0.X migrate to VPC

2018-06-08 Thread Riccardo Ferrari
Thank you guys! Much appreciated. Leaving the snitch aside for a moment, we can fix this either before or after the migration. I understand I should prefer adding a new DC rather than extending/shrinking my current (and only) DC. Correct? Thanks, On Fri, Jun 8, 2018 at 2:22 AM, kurt greaves wr

Cassandra 3.0.X migrate to VPC

2018-06-07 Thread Riccardo Ferrari
Dear list, We have a small cluster on AWS EC2-Classic and we are planning to move it to a VPC. I know this has been discussed a few times already, including here: https://mail-archives.apache.org/mod_mbox/cassandra-user/201406.mbox/%3cca+vsrlopop7th8nx20aoz3as75g2jrjm3ryx119deklynhq...@mail.gmail.co

Re: 3.0.6 - CorruptSSTableException

2017-11-07 Thread Riccardo Ferrari
, Nov 7, 2017 at 6:54 PM, wrote: > Hi Riccardo, > > > > The following may help me, as the case described there is similar to yours > ! > > https://engineering.gosquared.com/dealing-corrupt-sstable-cassandra > > > > Regards. > > > > *From:* Riccar

3.0.6 - CorruptSSTableException

2017-11-06 Thread Riccardo Ferrari
Hi list, It happened that one of the EC2 instances of our cluster got rebooted. Unfortunately, when it came back, Cassandra 3.0.6 failed to restart, complaining about: ERROR [NonPeriodicTasks:1] 2017-11-04 03:44:20,019 LogTransaction.java:204 - Unable to delete //system/local/ma-292-big-Data.db as it does no

Re: Upgrade from 3.0.6, where's the documentation?

2017-06-15 Thread Riccardo Ferrari
Jeff, Thank you so much for your answer. If you say there are 2 very important fixes in the next release I believe we can wait a couple of weeks. Thanks! On Fri, Jun 16, 2017 at 12:35 AM, Jeff Jirsa wrote: > > > On 2017-06-14 07:05 (-0700), Riccardo Ferrari wrote: > > Hi list, >

Upgrade from 3.0.6, where's the documentation?

2017-06-14 Thread Riccardo Ferrari
Hi list, It's been a while since I upgraded my C* to 3.0.6, nevertheless I would like to give TWCS a try (available since 3.0.7). What happened to the upgrade documentation? I used to read some step-by-step procedure from datastax but it looks like they are not supporting it anymore, on the fli

Re: Failure when setting up cassandra in cluster

2016-08-22 Thread Riccardo Ferrari
Hi that's very likely because of: > > empty the listen_address entry and # Leaving it blank leaves it up to InetAddress.getLocalHost(). This # will always do the Right Thing _if_ the node is properly configured # (hostname, name resolution, etc), and the Right Thing is to use the # address asso

JVM Crash on 3.0.6

2016-08-11 Thread Riccardo Ferrari
Hi C* users, Recently I had a couple of my nodes crash (on different dates). I don't have core dumps, however my JVM crash logs go like this: === # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x7f8f

Re: Verify cassandra backup and restore in C * 2.1

2016-08-09 Thread Riccardo Ferrari
Hi Indranil, I think it really depends on what makes a backup "correct" for you. Do you have some test you can run on that data? When I want to test my data I usually restore it in a new cluster (ie. on AWS) and use Spark to perform some cross-tests. This is a bit cumbersome nevertheless does the

Re: Gossip Threshold

2016-07-25 Thread Riccardo Ferrari
Hi Jean, I think this is a good resource: https://www.youtube.com/watch?v=FuP1Fvrv6ZQ Best, On Mon, Jul 25, 2016 at 2:45 PM, jean paul wrote: > As i find in cassandra Documentation, the gossip process runs every second. > Please, why you have chosen 'running it *every second'* ? > > Thanks a

Re: (C)* stable version after 3.5

2016-07-18 Thread Riccardo Ferrari
Check the "Compatibility" section of the Cassandra Java driver. Since the driver is backward compatible, when we upgraded we first upgraded our applications to the latest Java driver version, then we upgraded our C* cluster. best, On Mon, Jul 18, 2016 at 9:06 AM, Varun Barala wrote: > Thanks

Re: Is my cluster normal?

2016-07-12 Thread Riccardo Ferrari
0 >>>> MemtablePostFlush 0 0 1973 0 0 >>>> ValidationExecutor 0 0 0 0 0 >>>>

Re: NoHostAvailableException coming up on our server

2016-07-12 Thread Riccardo Ferrari
What driver version are you using? You can look at the LoggingRetryPolicy to have more meaningful messages in your logs. best, On Tue, Jul 12, 2016 at 9:02 PM, Abhinav Solan wrote: > Thanks, Johnny > Actually, they were running .. it went through a series of read and writes > .. and recovered

Re: (C)* stable version after 3.5

2016-07-12 Thread Riccardo Ferrari
You may want to read more about the Cassandra release process, see: http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/ On Tue, Jul 12, 2016 at 4:01 PM, Alain RODRIGUEZ wrote: > Hi, > > The only "fix" release after 3.5 is 3.7. Yet hard to say if it is more > stable, we can hope so. >

Re: DTCS SSTable count issue

2016-07-11 Thread Riccardo Ferrari
le.com > > 2016-07-07 19:25 GMT+02:00 Jeff Jirsa : > >> 48 sstables isn’t unreasonable in a DTCS table. It will continue to grow >> over time, but ideally data will expire as it nears your 90 day TTL and >> those tables should start dropping away as they age. >>

Re: Problems with cassandra on AWS

2016-07-11 Thread Riccardo Ferrari
I would check your security group settings, you need to allow communication on the Cassandra ports (e.g. 9042, ...) On Mon, Jul 11, 2016 at 8:17 AM, daemeon reiydelle wrote: > Well, I seem to recall that the private IP's are valid for communications > WITHIN one VPC. I assume you can log into one mach

Re: Is my cluster normal?

2016-07-07 Thread Riccardo Ferrari
Hi Yuan, Your machine instance has 4 vCPUs, that is 4 threads (not cores!!!). Aside from any Cassandra-specific discussion, a system load of 10 on a 4-thread machine is way too much in my opinion. If that is the running average system load I would look deeper into system details. Is that IO wait? Is
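The rule of thumb in the message above (compare the load average against the number of hardware threads) can be sketched like this. `load_per_cpu` is a hypothetical helper name, and reading `/proc/loadavg` and `nproc` assumes a Linux host; a value well above 1 suggests more runnable or IO-blocked tasks than threads, though what counts as "too much" is a judgment call.

```shell
#!/bin/sh
# Hypothetical helper: divide a load average by a CPU (thread) count and
# print the result with two decimals.
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# The figures from the message: a load of 10 on a 4-vCPU instance.
load_per_cpu 10 4

# On a live Linux host the inputs could be read from the system instead:
#   load_per_cpu "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"
```

For the figures in the message this prints 2.50, i.e. two and a half times as many tasks competing for CPU or IO as there are threads.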

DTCS SSTable count issue

2016-07-07 Thread Riccardo Ferrari
Hi everyone, This is my first question, apologies if I do something wrong. I have a small Cassandra cluster built on 3 nodes. Originally born as a 2.0.X cluster, it was upgraded to 2.0.15, then 2.1.13, and finally to 3.0.4 and recently 3.0.6. Ubuntu is the OS. There are a few tables that have DateTieredCom