cassandra spark-connector-sqlcontext too many tasks

2018-03-17 Thread onmstester onmstester
I'm querying a single Cassandra partition using sqlContext and its tempView, which creates more than 2000 tasks on Spark and takes about 360 seconds: sqlContext.read().format("org.apache.spark.sql.cassandra").options(ops).load.createOrReplaceTempView("tableName") But using javaFunctions(sc).ca
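A minimal sketch (keyspace, table, and column names are assumptions) of the connector's RDD API that the truncated snippet appears to compare against; the where() predicate is pushed down to Cassandra, so only the requested partition is scanned instead of spawning a task per token range:

    import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
    import com.datastax.spark.connector.japi.CassandraRow;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    // sc is an existing JavaSparkContext; "my_ks", "my_table" and "part_key" are assumptions
    JavaRDD<CassandraRow> rows = javaFunctions(sc)
            .cassandraTable("my_ks", "my_table")
            .where("part_key = ?", partitionValue);   // predicate pushdown: reads one partition
    long count = rows.count();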

Cassandra client tuning

2018-03-18 Thread onmstester onmstester
I need to insert some millions of records in seconds in Cassandra. Using one client with asyncExecute and the following configs: maxConnectionsPerHost = 5, maxRequestsPerHost = 32K, maxAsyncQueue at client side = 100K, I could achieve 25% of the throughput I needed; client CPU is more than 80% and inc
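For reference, a hedged sketch of how those client-side limits map onto the Java driver 3.x PoolingOptions (the contact point and exact values are assumptions; maxAsyncQueue sounds like an application-level queue rather than a driver setting):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.HostDistance;
    import com.datastax.driver.core.PoolingOptions;
    import com.datastax.driver.core.Session;

    PoolingOptions pooling = new PoolingOptions()
            .setConnectionsPerHost(HostDistance.LOCAL, 5, 5)           // maxConnectionsPerHost = 5
            .setMaxRequestsPerConnection(HostDistance.LOCAL, 32768);   // ~32K in-flight requests per connection
    Cluster cluster = Cluster.builder()
            .addContactPoint("10.0.0.1")
            .withPoolingOptions(pooling)
            .build();
    Session session = cluster.connect();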

Re: Cassandra client tuning

2018-03-18 Thread onmstester onmstester
2018 at 19:23 onmstester onmstester <onmstes...@zoho.com> wrote: -- Ben Slater Chief Product Officer Read our latest technical blog posts here. This email has been sent on behalf of Instaclustr Pty. Limited (Australia) and Instaclustr Inc (USA). This email and any attach

Re: Cassandra client tuning

2018-03-18 Thread onmstester onmstester
client with batching and your Cassandra cluster doesn’t sound terribly stressed then there is room to increase threads on the client to up throughput (unless your bottlenecked on IO or something)? On Sun, 18 Mar 2018 at 20:27 onmstester onmstester <onmstes...@zoho.com> wrote: -- Ben Sla

write latency on single partition table

2018-04-06 Thread onmstester onmstester
I've defined a table like this: create table test ( hours int, key1 int, value1 varchar, primary key (hours,key1) ) For one hour every input would be written to a single partition, because I need to group some 500K records in the partition for a report with expected response time in

Re: write latency on single partition table

2018-04-06 Thread onmstester onmstester
Apr 7, 2018, 9:45 AM onmstester onmstester <onmstes...@zoho.com> wrote: I've defined a table like this create table test ( hours int, key1 int, value1 varchar, primary key (hours,key1) ) For one hour every input would be written to a single partition, because I need to g

copy from one table to another

2018-04-08 Thread onmstester onmstester
Is there any way to copy some part of a table to another table in cassandra? A large amount of data should be copied so i don't want to fetch data to client and stream it back to cassandra using cql. Sent using Zoho Mail

Re: copy from one table to another

2018-04-08 Thread onmstester onmstester
move to target tables directory and rename it. Restart target node or run nodetool refresh Sent from my iPhone On Apr 8, 2018, at 4:15 AM, onmstester onmstester <onmstes...@zoho.com> wrote: Is there any way to copy some part of a table to another table in cassandra? A large

Re: Can I sort it as a result of group by?

2018-04-10 Thread onmstester onmstester
I'm using apache spark on top of cassandra for such cases Sent using Zoho Mail On Mon, 09 Apr 2018 18:00:33 +0430 DuyHai Doan wrote No, sorting by column other than clustering column is not possible On Mon, Apr 9, 2018 at 11:42 AM, Eunsu Kim

A Cassandra Storage Estimation Mechanism

2018-04-17 Thread onmstester onmstester
I was going to estimate hardware requirements for a project which mainly uses Apache Cassandra. Because of the rule of thumb "Cassandra node size should be < 2 TB", the total disk usage determines the number of nodes, and in most cases the result of this calculation would be quite OK for satisfying the required

Insert-only application repair

2018-05-12 Thread onmstester onmstester
In an insert-only use case with TTL (6 months), should i run this command, every 5-7 days on all the nodes of production cluster (according to this: http://cassandra.apache.org/doc/latest/operating/repair.html )? nodetool repair -pr --full When none of the nodes was down in 4 months (ever sin

Re: Insert-only application repair

2018-05-12 Thread onmstester onmstester
e RF>CL then Repair needs to be run to make sure data is in sync. Sent from my iPhone On May 12, 2018, at 3:54 AM, onmstester onmstester <onmstes...@zoho.com> wrote: In an insert-only use case with TTL (6 months), should i run this command, every 5-7 days on all the

Solve Busy pool at Cassandra side

2018-05-13 Thread onmstester onmstester
Hi, I'm getting "Pool is Busy (limit is 256)" while connecting to a single-node Cassandra cluster. The whole client-side application is a 3rd-party lib whose source I can't change, and its session builder is not using any PoolingOptions. Is there any config on the Cassandra side that could h

Re: Interesting Results - Cassandra Benchmarks over Time Series Data for IoT Use Case I

2018-05-18 Thread onmstester onmstester
I recommend you review the Newts data model, which is a time-series data model on top of Cassandra: https://github.com/OpenNMS/newts/wiki/DataModel Sent using Zoho Mail First the use-case: We have time-series of data from devices on several sites, where each device (with a unique dev_id) c

Reading from big partitions

2018-05-19 Thread onmstester onmstester
Hi, Due to some unpredictable behavior in the input data, I end up with a few hundred partitions larger than 300MB. Reading any sequence of data from these partitions takes about 5 seconds, while reading from other partitions (smaller than 50MB) takes less than 10ms. Since I can't ch

Re: Reading from big partitions

2018-05-20 Thread onmstester onmstester
Data is spread between an SSD disk and a 15K disk. The table has 26 sstables in total. I haven't tried tracing, but I will and will let you know! Sent using Zoho Mail On Sun, 20 May 2018 08:26:33 +0430 Jonathan Haddad wrote What disks are you using? How many sstables a

Re: Reading from big partitions

2018-05-20 Thread onmstester onmstester
I've increased column_index_size_in_kb to 512 and then 4096 : no change in response time, it even got worse. Even increasing Key cache size and Row cache size did not help. Sent using Zoho Mail On Sun, 20 May 2018 08:52:03 +0430 Jeff Jirsa wrote Column in

Re: Reading from big partitions

2018-05-20 Thread onmstester onmstester
Should i run compaction after changing column_index_size_in_kb? Sent using Zoho Mail On Sun, 20 May 2018 15:06:57 +0430 onmstester onmstester <onmstes...@zoho.com> wrote I've increased column_index_size_in_kb to 512 and then 4096 : no change in response time,

IN clause of prepared statement

2018-05-20 Thread onmstester onmstester
The table is something like Samples ... primary key ((partition, resource), timestamp, metric_name). Creating the prepared statement: session.prepare("select * from samples where partition=:partition and resource=:resource and timestamp>=:start and timestamp<=:end and metric_name in :metric_n
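As a hedged illustration of binding a list to a named IN marker with the 3.x Java driver (the value column, literals, and variables are assumptions): note that Cassandra rejects an IN on a clustering column that follows a range-restricted clustering column, which appears to be the wall this thread ran into, so the sketch fixes the preceding clustering column with equality:

    import com.datastax.driver.core.BoundStatement;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.ResultSet;
    import java.util.Arrays;
    import java.util.Date;
    import java.util.List;

    PreparedStatement ps = session.prepare(
        "SELECT partition, resource, timestamp, metric_name, value FROM samples "
      + "WHERE partition = :partition AND resource = :resource "
      + "AND timestamp = :ts AND metric_name IN :metric_names");

    Date ts = new Date();                                            // hypothetical timestamp value
    List<String> metrics = Arrays.asList("cpu_idle", "mem_used");    // hypothetical metric names
    BoundStatement bound = ps.bind()
            .setString("partition", "p1")
            .setString("resource", "r1")
            .setTimestamp("ts", ts)
            .setList("metric_names", metrics);   // the whole list binds to the single IN marker
    ResultSet rs = session.execute(bound);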

RE: [EXTERNAL] IN clause of prepared statement

2018-05-21 Thread onmstester onmstester
a good practice, you shouldn’t do select * (as a production query) against any database. You want to list the columns you actually want to select. That way a later “alter table add column” (or similar) doesn’t cause unpredictable results to the application. Sean Durity From: onmstester onms

RE: [EXTERNAL] IN clause of prepared statement

2018-05-21 Thread onmstester onmstester
It seems that there is no way of doing this using Cassandra, and even something like Spark won't help because I'm going to read from a big Cassandra partition (the bottleneck is reading from Cassandra) Sent using Zoho Mail On Tue, 22 May 2018 09:08:55 +0430 onmstester onmstester

cassandra concurrent read performance problem

2018-05-26 Thread onmstester onmstester
By reading 90 partitions concurrently (each larger than 200 MB), my single-node Apache Cassandra became unresponsive; no read or write works for almost 10 minutes. I'm using these configs: memtable_allocation_type: offheap_buffers, gc: G1GC, heap: 128GB, concurrent_reads: 128 (having more tha

cassandra update vs insert + delete

2018-05-27 Thread onmstester onmstester
Hi, I want to load all rows from many partitions and change a column value in each row. Which of the following ways is better concerning disk space and performance? 1. create an update statement for every row and batch the updates per partition 2. create an insert statement for every row and batch

Fwd: Re: cassandra update vs insert + delete

2018-05-27 Thread onmstester onmstester
ally something to consider. Better to add / update or insert data and do a soft delete on old data and apply a TTL to remove it at a future time. -- Rahul Singh rahul.si...@anant.us Anant Corporation On May 27, 2018, 5:36 AM -0400, onmstester onmstester <onmstes...@zoho.com>, w

Fwd: 答复: Re: cassandra update vs insert + delete

2018-05-27 Thread onmstester onmstester
Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei Mob: +86 13797007811|Tel: + 86 27 5024 2516 From: onmstester onmstester <onmstes...@zoho.com> Sent: 28 May 2018 14:33 To: user <user@cassandra.apache.org> Subject: Fwd: Re: cassandra update vs insert + delete How upda

how to immediately delete tombstones

2018-05-31 Thread onmstester onmstester
Hi, I've deleted 50% of my data row by row, but disk usage of Cassandra data is still more than 80%. The gc_grace of the table was the default (10 days); I have now set it to 0, and although many compactions finished, no space has been reclaimed so far. How could I force deletion of tombstones in sstables and reclaim th
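A hedged sketch of per-table knobs that let STCS purge tombstones more aggressively without a full major compaction (keyspace/table names are hypothetical; gc_grace_seconds = 0, which the thread already did, risks deleted data reappearing if a replica missed the delete):

    // unchecked_tombstone_compaction lets single-sstable tombstone compactions run even when
    // sstables overlap; tombstone_threshold triggers them once ~20% of an sstable is tombstones
    session.execute(
        "ALTER TABLE my_ks.my_table WITH gc_grace_seconds = 0 "
      + "AND compaction = {'class': 'SizeTieredCompactionStrategy', "
      + "'unchecked_tombstone_compaction': 'true', "
      + "'tombstone_threshold': '0.2'}");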

Re: how to immediately delete tombstones

2018-06-01 Thread onmstester onmstester
Thanks for your replies. But my current situation is that I do not have enough free disk space for my biggest sstable, so I could not run a major compaction or nodetool garbagecollect Sent using Zoho Mail On Thu, 31 May 2018 22:32:32 +0430 Alain RODRIGUEZ wrote

data consistency without using nodetool repair

2018-06-09 Thread onmstester onmstester
I'm using RF=2 (I know it should be at least 3, but I'm short of resources) with WCL=ONE and RCL=ONE in a cluster of 10 nodes in an insert-only scenario. The problem: I don't want to use nodetool repair because it would put a huge load on my cluster for a long time, but I also need data consistency

Re: data consistency without using nodetool repair

2018-06-09 Thread onmstester onmstester
at 10:28 PM, onmstester onmstester <onmstes...@zoho.com> wrote: I'm using RF=2 (i know it should be at least 3 but i'm short of resources) and WCl=ONE and RCL=ONE in a cluster of 10 nodes in a insert-only scenario. The problem: i dont want to use nodetool repair because it w

saving distinct data in cassandra result in many tombstones

2018-06-12 Thread onmstester onmstester
Hi, I needed to save a distinct value for a key in each hour; the problem with saving everything and computing distincts in memory is that there is too much repeated data. Table schema: Table distinct( hourNumber int, key text, distinctValue long, primary key (hourNumber) ) I want t

Write performance degradation

2018-06-17 Thread onmstester onmstester
Hi, I was doing 500K inserts + 100K counter updates in seconds on my cluster of 12 nodes (20 core/128GB RAM/4 * 600 HDD 10K) using batch statements with no problem. I saw a lot of warnings showing that most batches do not concern a single node, so they should not be in a batch; on the other h

Re: saving distinct data in cassandra result in many tombstones

2018-06-18 Thread onmstester onmstester
ing deleted data if the table hasn't been repaired during the grace interval. You can also just increase the tombstone thresholds, but the queries will be pretty expensive/wasteful. On Tue, Jun 12, 2018 at 2:02 AM, onmstester onmstester <onmstes...@zoho.com> wrote: Hi,

Re: Write performance degradation

2018-06-18 Thread onmstester onmstester
16:24:48 +0430 DuyHai Doan <doanduy...@gmail.com> wrote Maybe the disk I/O cannot keep up with the high mutation rate ? Check the number of pending compactions On Sun, Jun 17, 2018 at 9:24 AM, onmstester onmstester <onmstes...@zoho.com> wrote: Hi, I wa

Re: saving distinct data in cassandra result in many tombstones

2018-06-18 Thread onmstester onmstester
On Tue, 19 Jun 2018 08:16:28 +0430 onmstester onmstester <onmstes...@zoho.com> wrote Can i set gc_grace_seconds to 0 in this case? because reappearing deleted data has no impact on my Business Logic, i'm just either creating a new row or replacing the exactly same row.

copy sstables while cassandra is running

2018-06-23 Thread onmstester onmstester
Hi, I'm using two directories on different disks as Cassandra data storage. The small disk is 90% full and the bigger disk is 30% full (the bigger one was added later, when we found out we needed more storage!!), so I want to move all data to the big disk. One way is to stop my application and copy al

adding a non-used column just to debug ttl

2018-07-07 Thread onmstester onmstester
Hi, Because of "Cannot use selection function ttl on PRIMARY KEY part type", I'm adding a boolean column to a table that has no non-primary-key columns; I'm just worried that someday I would need to debug TTL! Is this the right approach? Is anyone else doing this? Sent using Zoho Mail
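A hedged sketch of the workaround being described, with hypothetical names (my_ks.events, ttl_probe, pk/ck stand in for the real schema and key values); the dummy regular column is written in the same INSERT, so it carries the row's TTL and ttl() becomes queryable:

    import com.datastax.driver.core.Row;

    session.execute("ALTER TABLE my_ks.events ADD ttl_probe boolean");

    // normal write path, unchanged except for the probe column
    session.execute(
        "INSERT INTO my_ks.events (pk, ck, ttl_probe) VALUES (?, ?, true) USING TTL 2592000",
        pk, ck);

    // later, when TTL needs debugging:
    Row row = session.execute(
        "SELECT ttl(ttl_probe) FROM my_ks.events WHERE pk = ? AND ck = ?", pk, ck).one();
    int remaining = row.getInt(0);   // seconds left before the row expires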

Compaction out of memory

2018-07-12 Thread onmstester onmstester
Cassandra crashed in Two out of 10 nodes in my cluster within 1 day, the error is: ERROR [CompactionExecutor:3389] 2018-07-10 11:27:58,857 CassandraDaemon.java:228 - Exception in thread Thread[CompactionExecutor:3389,1,main] org.apache.cassandra.io.FSReadError: java.io.IOException: Map failed  

changing ip address of all nodes in cluster

2018-07-15 Thread onmstester onmstester
I need to assign a new ip range to my cluster, What's the procedure? Thanks in advance Sent using Zoho Mail

Fwd: changing ip address of all nodes in cluster

2018-07-15 Thread onmstester onmstester
I tested the single node scenario on all nodes iteratively and it worked: https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/operations/opsChangeIp.html Sent using Zoho Mail Forwarded message From : onmstester onmstester To : "user" Date : S

New cluster vs Increasing nodes to already existed cluster

2018-07-16 Thread onmstester onmstester
Currently I have a cluster with 10 nodes dedicated to one keyspace (hardware sizing was done according to input rate and TTL just for the current application requirements). I need to launch a new application with a new keyspace on another set of servers (8 nodes); there is no relation between the cu

Re: Cassandra node RAM amount vs data-per-node/total data?

2018-07-17 Thread onmstester onmstester
I actually never set Xmx > 32 GB, for any java application, unless it necessarily need more. Just because of the fact: "once you exceed this 32 GiB border JVM will stop using compressed object pointers, effectively reducing the available memory. That means increasing your JVM heap above 32 GiB y

how to fix too many native-transport-blocked?

2018-07-18 Thread onmstester onmstester
Hi, On a cluster with 10 nodes, out of 20K/second native transport requests, 200/second of them are blocked. They are mostly small single writes. I'm also experiencing random read delays, which I suspect are due to the filled native transport queue. On all nodes, CPU usage is less than 20 percent, and there is no problem in m

Cassandra crashed with no log

2018-07-21 Thread onmstester onmstester
Cassandra on one of my nodes crashed without any error/warning in the system/gc/debug log. All JMX metrics are being monitored; the last fetched values are 50% for heap usage and 20% for CPU usage. How can I find the cause of the crash? Sent using Zoho Mail

JMX metric to report number failed WCL ALL

2018-07-22 Thread onmstester onmstester
I'm using RF=2 and write consistency = ONE. Is there a counter in Cassandra JMX to report the number of writes that were only acknowledged by one node (instead of both replicas)? Although I don't require all replicas to acknowledge the write, I consider that the normal status of the cluster. Sent using Zoho M

Fwd: Re: Cassandra crashed with no log

2018-07-22 Thread onmstester onmstester
: Sun, 22 Jul 2018 10:43:38 +0430 Subject : Re: Cassandra crashed with no log Forwarded message Anything in non-Cassandra logs? Dmesg? --  Jeff Jirsa On Jul 21, 2018, at 11:07 PM, onmstester onmstester wrote: Cassandra in one of my nodes, crashed without any error/warning

Re: JMX metric to report number failed WCL ALL

2018-07-23 Thread onmstester onmstester
AM, onmstester onmstester wrote: I'm using RF=2 and Write consistency = ONE, is there a counter in cassandra jmx to report number of writes that only acknowledged by one node (instead of both replica's)?  Although i don't care all replicas acknowledge the write, but i consider

Data model storage optimization

2018-07-28 Thread onmstester onmstester
The current data model is described as table name: ((partition_key), cluster_key), other_column1, other_column2, ... user_by_name: ((time_bucket, username)), ts, request, email user_by_mail: ((time_bucket, email)), ts, request, username The reason that both keys (username, email) are repeated in all tables is

Fwd: Re: Data model storage optimization

2018-07-29 Thread onmstester onmstester
How many rows in average per partition? around 10K. Let me get this straight : You are bifurcating your partitions on either email or username , essentially potentially doubling the data because you don’t have a way to manage a central system of record of users ? We are just analyzing output log

full text search on some text columns

2018-07-31 Thread onmstester onmstester
I need to do a full text search (LIKE) on one of my clustering keys and one of my partition keys (they use text as the data type). The input rate is high, so only Cassandra could handle it. Is there any open source project which helps with using Cassandra + Solr or Cassandra + Elasticsearch? Any recommendation

Re: full text search on some text columns

2018-07-31 Thread onmstester onmstester
Thanks Jordan, There would be millions of rows per day; is SASI capable of sustaining such a rate? Sent using Zoho Mail On Tue, 31 Jul 2018 19:47:55 +0430 Jordan West wrote On Tue, Jul 31, 2018 at 7:45 AM, onmstester onmstester wrote: I need to do a full text search (like) on one of
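For reference, a minimal sketch of the SASI index being discussed (keyspace/table/column names are assumptions); CONTAINS mode is what enables '%term%'-style LIKE queries:

    import com.datastax.driver.core.ResultSet;

    session.execute(
        "CREATE CUSTOM INDEX IF NOT EXISTS logs_message_idx ON my_ks.logs (message) "
      + "USING 'org.apache.cassandra.index.sasi.SASIIndex' "
      + "WITH OPTIONS = {'mode': 'CONTAINS'}");

    // the LIKE predicate is then served by the index
    ResultSet rs = session.execute("SELECT * FROM my_ks.logs WHERE message LIKE '%timeout%'");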

RE: [EXTERNAL] full text search on some text columns

2018-07-31 Thread onmstester onmstester
urity   From: onmstester onmstester Sent: Tuesday, July 31, 2018 10:46 AM To: user Subject: [EXTERNAL] full text search on some text columns   I need to do a full text search (like) on one of my clustering keys and one of partition keys (it use text as data type). The input rate is high so

Fwd: Re: [EXTERNAL] full text search on some text columns

2018-07-31 Thread onmstester onmstester
Subject : Re: [EXTERNAL] full text search on some text columns Forwarded message Maybe this plugin could do the job: https://github.com/Stratio/cassandra-lucene-index On Tue, 31 Jul 2018 at 22:37, onmstester onmstester wrote:

updating old partitions in STCS

2018-08-04 Thread onmstester onmstester
I read in some best-practice documents on data modeling: do not update old partitions while using STCS. But I always use clustering keys in my queries, and cqlsh tracing reports that it only accesses sstables with data having the specified clustering key (not all sstables containing part of the partition)

data loss

2018-08-14 Thread onmstester onmstester
I am inserting to Cassandra with a simple insert query and a counter update query for every input record. The input rate is very high. I've configured the update query with idempotent = true (no config for the insert query; the default is false IMHO). I've seen multiple records having rows in the counter table (ide
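A hedged sketch of the usual convention around the idempotent flag (table and column names are hypothetical): plain inserts can be retried safely, but a counter increment re-applied on retry changes the result, which is one way counts can drift:

    import com.datastax.driver.core.PreparedStatement;

    // re-executing this INSERT just rewrites the same cells, so retries are safe
    PreparedStatement insert = session.prepare(
            "INSERT INTO my_ks.events (pk, ck, payload) VALUES (?, ?, ?)");
    insert.setIdempotent(true);

    // a retried increment can be applied twice (or reported failed but applied),
    // so counter updates are conventionally left non-idempotent
    PreparedStatement bump = session.prepare(
            "UPDATE my_ks.event_counts SET hits = hits + 1 WHERE pk = ?");
    bump.setIdempotent(false);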

bigger data density with Cassandra 4.0?

2018-08-25 Thread onmstester onmstester
I've noticed this new feature of 4.0: streaming optimizations (https://cassandra.apache.org/blog/2018/08/07/faster_streaming_in_cassandra.html). Does this mean that we could have much higher data density with Cassandra 4.0 (fewer problems than 3.X)? I mean > 10 TB of data on each node without worrying

Fwd: Re: bigger data density with Cassandra 4.0?

2018-08-29 Thread onmstester onmstester
Thanks Kurt, Actually my cluster has > 10 nodes, so there is only a tiny chance of streaming a complete SSTable. Logically, any columnar NoSQL db like Cassandra always needs to re-sort grouped data for fast later reads, and having nodes with a big amount of data (> 2 TB) would be annoying for this ba

Re: Re: bigger data density with Cassandra 4.0?

2018-08-29 Thread onmstester onmstester
C*'s mistakes. On 29 August 2018 at 19:43, onmstester onmstester wrote: Thanks Kurt, Actually my cluster has > 10 nodes, so there is a tiny chance to stream a complete SSTable. While logically any Columnar noSql db like Cassandra, needs always to re-sort grouped data for later-fast-rea

adding multiple node to a cluster, cleanup and num_tokens

2018-09-03 Thread onmstester onmstester
I'm going to add 6 more nodes to my cluster (it already has 4 nodes and RF=2) using GossipingPropertyFileSnitch, NetworkTopologyStrategy, and the default num_tokens = 256. It is recommended to join nodes one by one; although there is < 200GB on each node, I will do so. The documentation mentions that i

Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-03 Thread onmstester onmstester
Thanks Alex, So you suggest that i should not worry about this:  Failure to run this command (cleanup) after adding a node causes Cassandra to include the old data to rebalance the load on that node Would you kindly explain a little more? Sent using Zoho Mail It makes a lot of sense to run clean

Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-03 Thread onmstester onmstester
would be also moved between nodes everytime? Sent using Zoho Mail On Mon, 03 Sep 2018 14:39:37 +0430  onmstester onmstester wrote Thanks Alex, So you suggest that i should not worry about this:  Failure to run this command (cleanup) after adding a node causes Cassandra to include the old

counter mutation not persisted

2018-09-04 Thread onmstester onmstester
My application updates a counter table at a rate of 50K per second in a cluster with 10 nodes. The problem is that the counter value is less than what it should be in 20% of cases. Dropped counter mutations in JMX are always equal to 0. I'm using batch statements to update counters and executeAsync.  I
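For completeness, a hedged sketch of the counter-batch shape (the prepared statement, collection, and key type are hypothetical): counter mutations need a COUNTER batch and cannot be mixed with regular writes, and since counters are not idempotent the futures should be checked so failed increments are noticed rather than silently lost:

    import com.datastax.driver.core.BatchStatement;
    import com.datastax.driver.core.ResultSetFuture;

    BatchStatement batch = new BatchStatement(BatchStatement.Type.COUNTER);
    for (String key : keysInSamePartition) {         // keysInSamePartition is a hypothetical collection
        batch.add(bump.bind(key));                    // bump is a prepared counter UPDATE (hypothetical)
    }
    ResultSetFuture f = session.executeAsync(batch);
    f.getUninterruptibly();                           // surface write failures instead of dropping them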

Cluster CPU usage limit

2018-09-06 Thread onmstester onmstester
IMHO, Cassandra writes are more of a CPU-bound task, so when determining cluster write throughput, what CPU usage percentage (averaged among all cluster nodes) should be treated as the limit?   Rephrased: what's the normal CPU usage in a Cassandra cluster (while no compaction, streaming, or heavy read is running

Fwd: Re: Cluster CPU usage limit

2018-09-07 Thread onmstester onmstester
ut lower is definitely not enough), but every advice I've seen is for a lower write thread count being optimal for most cases. On Thu, Sep 6, 2018 at 5:51 AM, onmstester onmstester wrote: IMHO, Cassandra write is more of a CPU bound task, so while determining cluster write throughput, w

RE: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-07 Thread onmstester onmstester
Why not set the default vnode count to that recommendation in the Cassandra installation files?  Sent using Zoho Mail On Tue, 04 Sep 2018 17:35:54 +0430 Durity, Sean R wrote   Longer term, I agree with Oleksandr, the recommendation for number of vnodes is now much smaller than 256. I am

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-08 Thread onmstester onmstester
and join the rest of the nodes.  That gives even distribution. On Sat, Sep 8, 2018 at 1:40 AM onmstester onmstester wrote: -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-08 Thread onmstester onmstester
Thanks Jeff, You mean that with RF=2, num_tokens = 256 and having less than 256 nodes i should not worry about data distribution? Sent using Zoho Mail On Sat, 08 Sep 2018 21:30:28 +0430 Jeff Jirsa wrote Virtual nodes accomplish two primary goals 1) it makes it easier to gradually add

node replacement failed

2018-09-08 Thread onmstester onmstester
Hi, Cluster spec: 30 nodes, RF = 2, NetworkTopologyStrategy, GossipingPropertyFileSnitch + rack aware. Suddenly I lost all disks holding cassandra data on one of my racks; after replacing the disks, I tried to replace the nodes with the same IP using this: https://blog.alteroot.org/articles/2014-03-12/replace

Re: node replacement failed

2018-09-10 Thread onmstester onmstester
Any idea? Sent using Zoho Mail On Sun, 09 Sep 2018 11:23:17 +0430  onmstester onmstester wrote Hi, Cluster Spec: 30 nodes RF = 2 NetworkTopologyStrategy GossipingPropertyFileSnitch + rack aware Suddenly i lost all disks of cassandar-data on one of my racks, after replacing the disks

Re: node replacement failed

2018-09-10 Thread onmstester onmstester
ty nodes available. I am not sure about limitation you might face though and that's why I suggest a second option for you to consider if the first is not actionable. Let us know how it goes, C*heers, ------- Alain Rodriguez - @arodream - al...@thelastpickle.com France / Spain The La

Re: node replacement failed

2018-09-14 Thread onmstester onmstester
Thanks, I am still thinking about it, but before going deeper, is this still an issue for you at the moment? Yes, It is.

Scale SASI index

2018-09-17 Thread onmstester onmstester
By adding new nodes to the cluster, should I rebuild SASI indexes on all nodes?

stuck with num_tokens 256

2018-09-22 Thread onmstester onmstester
I noticed that there is currently a discussion on the ML with the subject: changing default token behavior for 4.0. Any recommendations for those of us who already have multiple clusters (> 30 nodes in each cluster) with random partitioner and num_tokens = 256? I should also add some nodes to existing c

Re: stuck with num_tokens 256

2018-09-22 Thread onmstester onmstester
you can decommission the old DC and hopefully end up with a balanced cluster. Definitely test beforehand though because that was just me theorising... I'll note though that if your existing clusters don't have any major issues it's probably not worth the migration at this point. On

Re: stuck with num_tokens 256

2018-09-22 Thread onmstester onmstester
If you have problems with balance you can add new nodes using the algorithm and it'll balance out the cluster. You probably want to stick to 256 tokens though. I read somewhere (don't remember the ref) that all nodes of the cluster should use the same algorithm, so if my cluster suffer from imba

Re: node replacement failed

2018-09-22 Thread onmstester onmstester
problem? On Mon, 10 Sep 2018 17:12:48 +0430 onmstester onmstester wrote Thanks Alain, First here it is more detail about my cluster: 10 racks + 3 nodes on each rack nodetool status: shows 27 nodes UN and 3 nodes all related to single rack as DN version 3.11.2 Option 1: (Change schema and

Re: node replacement failed

2018-09-22 Thread onmstester onmstester
Another question: is there a management tool to run nodetool cleanup one node at a time (wait until cleanup finishes on one node, then start cleanup on the next node in the cluster)? On Sat, 22 Sep 2018 16:02:17 +0330 onmstester onmstester wrote I have a cunning plan (Baldrick-wise) to solve

High CPU usage on writer application

2018-09-24 Thread onmstester onmstester
Hi,  My app writes 100K rows per second to a C* cluster (30 nodes, version 3.11.2). There are 20 threads, each writing 10K statements (the list size in the code below is 100K) using the async API: for (Statement s : list) { ResultSetFuture future = session.executeAsync(s); tasks.ad
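One common way to tame client CPU and memory in this pattern is to bound the number of in-flight async writes with a semaphore; a hedged sketch (the limit of 1024 is an assumption to tune, and session/list come from the surrounding code):

    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Statement;
    import com.google.common.util.concurrent.FutureCallback;
    import com.google.common.util.concurrent.Futures;
    import com.google.common.util.concurrent.MoreExecutors;
    import java.util.concurrent.Semaphore;

    final Semaphore inFlight = new Semaphore(1024);   // max concurrent async writes per client
    for (Statement s : list) {
        inFlight.acquireUninterruptibly();            // blocks instead of queueing 100K futures
        ResultSetFuture future = session.executeAsync(s);
        Futures.addCallback(future, new FutureCallback<ResultSet>() {
            @Override public void onSuccess(ResultSet rs) { inFlight.release(); }
            @Override public void onFailure(Throwable t) { inFlight.release(); /* log or retry */ }
        }, MoreExecutors.directExecutor());
    }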

how to configure the Token Allocation Algorithm

2018-09-30 Thread onmstester onmstester
Since I failed to find a document on how to configure and use the token allocation algorithm (to replace the random algorithm), I just wanted to be sure about the procedure I've followed: 1. Using Apache Cassandra 3.11.2. 2. Configured one of the seed nodes with num_tokens=8 and started it. 3. Using Cqlsh

Fwd: Re: how to configure the Token Allocation Algorithm

2018-10-01 Thread onmstester onmstester
Thanks Alain, What if, instead of running that Python script and having one node with a non-vnode config, I remove the first seed node and re-add it after the cluster is fully up, so the token ranges of the first seed node would also be assigned by the allocation algorithm? Forwarded message From

Fwd: Re: Re: how to configure the Token Allocation Algorithm

2018-10-01 Thread onmstester onmstester
ed message On Mon, Oct 1, 2018 at 12:18 PM onmstester onmstester wrote: What if instead of running that python and having one node with non-vnode config, i remove the first seed node and re-add it after cluster was fully up ? so the token ranges of first seed node would also be a

Re: Re: Re: how to configure the Token Allocation Algorithm

2018-10-01 Thread onmstester onmstester
Sent using Zoho Mail On Mon, 01 Oct 2018 18:36:03 +0330 Alain RODRIGUEZ wrote Hello again :), I thought a little bit more about this question, and I was actually wondering if something like this would work: Imagine 3 node cluster, and create them using: For the 3 nodes: `num_token: 4

High CPU usage on some of the nodes due to message coalesce

2018-10-20 Thread onmstester onmstester
3 nodes in my cluster have 100% CPU usage, and most of it is used by org.apache.cassandra.util.coalesceInternal and SepWorker.run. The most active threads are messaging-service-incoming. Other nodes are normal. The cluster has 30 nodes, using a rack-aware strategy with 10 racks, each having 3 nodes. The

How to validate if network infrastructure is efficient for Cassandra cluster?

2018-10-21 Thread onmstester onmstester
Currently, before launching the production cluster, I run 'iperf -s' on half of the cluster and then run 'iperf -c $nextIP' on the other half using parallel ssh, so all the cluster's nodes are connecting to each other (paired) simultaneously; I then examine the iperf results, doing the math that if

Fwd: Re: High CPU usage on some of the nodes due to message coalesce

2018-10-21 Thread onmstester onmstester
gs or if the load your application is producing exceeds what your cluster can handle (needs more nodes). Chris On Oct 20, 2018, at 5:18 AM, onmstester onmstester wrote: 3 nodes in my cluster have 100% cpu usage and most of it is used by org.apache.cassandra.util.coalesceInternal and SepWorke

Re: Re: High CPU usage on some of the nodes due to message coalesce

2018-10-21 Thread onmstester onmstester
What takes the most CPU? System or User?  most of it is used by  org.apache.cassandra.util.coalesceInternal and SepWorker.run Did you try removing a problematic node and installing a brand new one (instead of re-adding)? I did not install a new node, but did remove the problematic node and CPU l

Fwd: Re: Re: High CPU usage on some of the nodes due to message coalesce

2018-10-21 Thread onmstester onmstester
Any cron or other scheduler running on those nodes? No. Lots of Java processes running simultaneously? No, just Apache Cassandra. Heavy repair continuously running? None. Lots of pending compactions? None; the CPU goes to 100% in the first seconds of insert (write load), so no memtable has been flushed yet.  Is

Fwd: A quick question on unlogged batch

2018-11-01 Thread onmstester onmstester
Read this: https://docs.datastax.com/en/cql/3.3/cql/cql_reference/batch_r.html Please use batches (any type of batch) only for statements that concern a single partition; otherwise it causes a lot of performance degradation on your cluster, and after a while throughput would be a lot less than paral
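A minimal sketch of the single-partition case where an unlogged batch does help (the prepared statement, types, and names are hypothetical): every statement in the batch carries the same partition key, so the batch is applied as a single mutation on one replica set rather than coordinating writes across many nodes:

    import com.datastax.driver.core.BatchStatement;

    BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
    for (Reading r : readingsForOnePartition) {               // Reading is a hypothetical value type
        batch.add(insertReading.bind(partitionId, r.getTs(), r.getValue()));
    }
    session.execute(batch);                                   // all rows target the same partition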

Fwd: Re: How to set num tokens on live node

2018-11-02 Thread onmstester onmstester
IMHO, the best option with two datacenters is to configure the replication strategy to stream data from the DC with the wrong num_tokens to the correct one, and then a repair on each node would move your data to the other DC. Sent using Zoho Mail Forwarded message From : Goutham reddy To

Fwd: Re: Re: How to set num tokens on live node

2018-11-02 Thread onmstester onmstester
lenge I am facing. Any comments. Thanks and Regards, Goutham On Fri, Nov 2, 2018 at 1:08 AM onmstester onmstester wrote: -- Regards Goutham Reddy IMHO, the best option with two datacenters is to config replication strategy to stream data from dc with wrong num_token to correct one, and then a r

Fwd: Re: A quick question on unlogged batch

2018-11-02 Thread onmstester onmstester
unlogged batch meaningfully outperforms parallel execution of individual statements, especially at scale, and creates lower memory pressure on both the clients and the cluster.  They do outperform parallel individual statements, but at the cost of higher pressure on coordinators, which leads to more blocked Native

Multiple cluster for a single application

2018-11-05 Thread onmstester onmstester
Hi, One of my applications requires a cluster with more than 100 nodes. I've read documents recommending clusters with fewer than 50 or 100 nodes (Netflix has hundreds of clusters with fewer than 100 nodes each). Is it a good idea to use multiple clusters for a single application, ju

Fwd: Re: Multiple cluster for a single application

2018-11-08 Thread onmstester onmstester
Thank you all, Actually, "the documents" I mentioned in my question was a YouTube talk I saw a long time ago and could not find again. Also, noticing that a lot of companies like Netflix have built hundreds of clusters, each having tens of nodes, and say that it is much more stable, I just concluded that big c

Fwd: RE : issue while connecting to apache-cassandra-3.11.1 hosted on a remote VM.

2018-11-16 Thread onmstester onmstester
Also set rpc_address to your remote IP address and restart Cassandra. Run nodetool status on the Cassandra node to be sure that it's running properly. The port you should look for and connect to is 9042; 7199 is the JMX port. Sent using Zoho Mail Forwarded message From : Gaur

How to gracefully decommission a highly loaded node?

2018-12-04 Thread onmstester onmstester
One node suddenly uses 100% CPU. I suspect hardware problems and do not have time to trace that, so I decided to just remove the node from the cluster. Although the node state changed to UL, there is no sign of leaving: the node is still compacting and flushing memtables, writing mutations, and CPU

Fwd: Re: How to gracefully decommission a highly loaded node?

2018-12-04 Thread onmstester onmstester
nodetool netstats you can also disablebinary, disablethrift and disablehandoff to stop serving client requests.  -- SIMON FONTANA OSCARSSON Software Developer Ericsson Ölandsgatan 1 37133 Karlskrona, Sweden simon.fontana.oscars...@ericsson.com www.ericsson.com On tis, 2018-12-04 at 14:21 +0330, onmstes

Fwd: Re: How to gracefully decommission a highly loaded node?

2018-12-06 Thread onmstester onmstester
something wrong with decommissioning while someone is writing to the cluster? Using Apache Cassandra 3.11.2 Sent using Zoho Mail Forwarded message ==== From : onmstester onmstester To : "user" Date : Wed, 05 Dec 2018 09:00:34 +0330 Subject : Fwd: Re: How to grace

Fwd: Cassandra does launch since computer was accidentally unplugged

2018-12-08 Thread onmstester onmstester
Delete the file: C:\Program Files\DataStax-DDC\data\commitlog\CommitLog-6-1542650688953.log and restart Cassandra. It's possible that you lose a bit of data that existed only in this log (it doesn't matter if you have a replica or can re-insert the data). Sent using Zoho Mail Forwarded me

slow commitlog sync

2018-12-23 Thread onmstester onmstester
Hi, I'm seeing a lot of logs like this in all of my nodes (every 5 minutes): WARN  [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-23 08:59:19,075 NoSpamLogger.java:94 - Out of 50 commit log syncs over the past 300s with average duration of 300.00ms, 30 have exceeded the configured commit interval by an av

Fwd: Question about allocate_tokens_for_keyspace

2019-01-28 Thread onmstester onmstester
You can only specify one keyspace as the value of allocate_tokens_for_keyspace; it tells the algorithm which keyspace's replication to optimize for. So as long as your keyspaces use a similar replication strategy and replication factor, you should not worry about this.

forgot to run nodetool cleanup

2019-02-12 Thread onmstester onmstester
Hi, I should have run cleanup after adding a few nodes to my cluster about 2 months ago; the TTL is 6 months. What happens now? Should I worry about anything catastrophic? Should I run the cleanup now? Thanks in advance Sent using https://www.zoho.com/mail/
