cassandra spark-connector-sqlcontext too many tasks
I'm querying a single Cassandra partition using sqlContext and its temp view, which creates more than 2000 tasks on Spark and takes about 360 seconds:

sqlContext.read().format("org.apache.spark.sql.cassandra").options(ops).load().createOrReplaceTempView("tableName")

But using javaFunctions(sc).cassandraTable().where() it creates only one task, which responds in 200 ms! I'm using exactly the same where clause in both scenarios. The Spark UI shows about 60 GB of input for the sqlContext scenario and only a few KB for the javaFunctions scenario.
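For comparison, a minimal sketch of the two access paths (assuming a table with a partition key column named "key"; the column name and test value are placeholders, and whether the predicate is pushed down depends on the spark-cassandra-connector version). Filtering directly on the loaded DataFrame, instead of running SQL against the temp view, gives the connector a chance to push the partition-key predicate down to Cassandra, and explain() shows whether it did:

    import static org.apache.spark.sql.functions.col;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // DataFrame path: filter on the partition key so the connector can push it down.
    Dataset<Row> df = sqlContext.read()
            .format("org.apache.spark.sql.cassandra")
            .options(ops)                        // same keyspace/table options as above
            .load()
            .filter(col("key").equalTo("k1"));   // should become a single-partition read
    df.explain(true);                            // look for the predicate under "PushedFilters"

    // RDD path from the post, where the where() clause is sent to Cassandra as CQL:
    // javaFunctions(sc).cassandraTable("ks", "tableName").where("key = ?", "k1");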
Cassandra client tuning
I need to insert some millions of records in a matter of seconds into Cassandra. Using one client with executeAsync and the following config:

maxConnectionsPerHost = 5
maxRequestsPerHost = 32K
max async queue at client side = 100K

I could achieve 25% of the throughput I needed. Client CPU is above 80%, and increasing the number of threads causes some executeAsync calls to fail, so the config above is the best this client can handle. The Cassandra nodes' CPU is below 30% on average. The data has no locality in terms of partition keys, and I can't use the createSSTable (bulk load) mechanism. Is there any tuning I'm missing on the client side? The server side is already tuned with the DataStax recommendations.
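One client-side pattern worth checking (a hedged sketch, not from the thread; session and the 4096 window size are placeholders): bound the number of in-flight executeAsync requests with a semaphore instead of letting a 100K queue build up, so submissions block briefly rather than fail, and the client spends CPU on I/O instead of queue churn. This assumes the DataStax Java driver 3.x with its bundled Guava:

    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.ResultSetFuture;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.Statement;
    import com.google.common.util.concurrent.FutureCallback;
    import com.google.common.util.concurrent.Futures;
    import com.google.common.util.concurrent.MoreExecutors;
    import java.util.concurrent.Semaphore;

    // Allow at most 4096 writes in flight; tune this against maxRequestsPerHost.
    final Semaphore inFlight = new Semaphore(4096);

    void write(Session session, Statement stmt) throws InterruptedException {
        inFlight.acquire();                       // blocks instead of failing executeAsync
        ResultSetFuture f = session.executeAsync(stmt);
        Futures.addCallback(f, new FutureCallback<ResultSet>() {
            public void onSuccess(ResultSet r) { inFlight.release(); }
            public void onFailure(Throwable t) { inFlight.release(); /* log and retry */ }
        }, MoreExecutors.directExecutor());
    }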
Re: Cassandra client tuning
Input data does not preserve good locality, and I've already tested batch insert: it was worse than executeAsync in terms of throughput, but with much less CPU usage on the client side.

On Sun, 18 Mar 2018 12:46:02 +0330 Ben Slater <ben.sla...@instaclustr.com> wrote:
You will probably find grouping writes into small batches improves overall performance (if you are not doing it already). See the following presentation for some more info: https://www.slideshare.net/Instaclustr/microbatching-highperformance-writes
Cheers, Ben
Re: Cassandra client tuning
I'm using a queue of 100 executeAsyncs * 1000 statements in each batch, i.e. the same 100K insert queue as in the non-batch scenario. Using more than 1000 statements per batch throws the batch size limit exception, and some documents recommend not changing the batch size limit?!

On Sun, 18 Mar 2018 13:14:54 +0330 Ben Slater <ben.sla...@instaclustr.com> wrote:
When you say batch was worse than async in terms of throughput, are you comparing throughput with the same number of threads or something? I would have thought that if you have much less CPU usage on the client with batching, and your Cassandra cluster doesn't sound terribly stressed, then there is room to increase threads on the client to push throughput up (unless you're bottlenecked on IO or something)?
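The micro-batching the linked talk describes only pays off when each batch targets one partition; a hedged sketch of that grouping (session and statement names are placeholders, and the in-flight throttle from the earlier sketch still applies):

    import com.datastax.driver.core.BatchStatement;
    import com.datastax.driver.core.BoundStatement;
    import com.datastax.driver.core.Session;
    import java.util.List;
    import java.util.Map;

    // Group pending writes by partition key, then send each group as one UNLOGGED
    // batch: a single-partition batch is applied as one mutation on one replica set,
    // so it cuts per-row overhead instead of adding coordinator fan-out.
    void flush(Session session, Map<String, List<BoundStatement>> byPartition) {
        for (List<BoundStatement> group : byPartition.values()) {
            BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
            batch.addAll(group);             // every statement here shares one partition key
            session.executeAsync(batch);     // throttle in-flight batches as in the earlier sketch
        }
    }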
write latency on single partition table
I've defined a table like this:

create table test (
    hours int,
    key1 int,
    value1 varchar,
    primary key (hours, key1)
)

For one hour, every input is written into a single partition, because I need to group by some 500K records in the partition for a report with an expected response time below 1 second; putting key1 in the partition key would create 500K partitions, which would be slow on reads. Although this gains < 1 second response time on reads, the write latency increased surprisingly: for this table, the write latency reported by cfstats is more than 100 ms, while for other tables, which write to thousands of partitions per hour, it is 0.02 ms. I was expecting writes to the test table to be faster than the other tables, because only one node and one partition is ever accessed, so no memtable switch happens and all writes are local to a single node?! Should I add another key to my partition key to distribute the data over all nodes?
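A hedged sketch of the bucketing idea the question ends with (the bucket count of 16 is an arbitrary assumption, as is the table name): adding a small shard number to the partition key spreads each hour's writes over many replica sets, at the cost of a fan-out read that merges the buckets:

    create table test_sharded (
        hours int,
        bucket int,                      -- e.g. a hash of key1 modulo 16
        key1 int,
        value1 varchar,
        primary key ((hours, bucket), key1)
    );

    -- read side: fan out over the buckets for the hour and merge client-side
    select * from test_sharded where hours = ? and bucket in (0, 1, 2, /* ... */ 15);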
Re: write latency on single partition table
The size is less than 50 MB.

On Sat, 07 Apr 2018 09:09:41 +0430 Laxmikant Upadhyay <laxmikant@gmail.com> wrote:
It seems your partition size is too large. What is the size of the value1 field? Try to keep your partition size within 100 MB.
copy from one table to another
Is there any way to copy part of a table to another table in Cassandra? A large amount of data has to be copied, so I don't want to fetch the data to a client and stream it back to Cassandra using CQL.
Re: copy from one table to another
Thank you all. I need something like this:

insert into test2 select * from test1 where partition_key = 'SOME_KEYS';

The problem with copying sstables is that the original table contains some billions of records and I only want some hundreds of millions of them, so after copy/pasting big sstables on so many nodes I would have to wait for a deletion that would take very long to complete:

delete from test2 where partition_key != 'SOME_KEYS';

On Mon, 09 Apr 2018 06:14:02 +0430 Dmitry Saprykin <saprykin.dmi...@gmail.com> wrote:
IMHO the best step-by-step description of what you need to do is here: https://issues.apache.org/jira/browse/CASSANDRA-1585?focusedCommentId=13488959&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13488959 The only difference is that you need to copy data from one table only; I did it for a whole keyspace.

On Sun, Apr 8, 2018 at 3:06 PM Jean Carlo <jean.jeancar...@gmail.com> wrote:
You can use the same procedure as restoring a table from a snapshot, described on the DataStax page: https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_backup_snapshot_restore_t.html Just two modifications: after step 5, rename the sstables to the name of the table you want to copy to, and in step 6 copy the sstables to the directory corresponding to that table. Be sure you have a snapshot of the source table, and ignore step 4, of course.

On Sun, Apr 8, 2018 at 6:33 PM, Dmitry Saprykin <saprykin.dmi...@gmail.com> wrote:
You can copy hardlinks to ALL SSTables from the old to the new table and then delete the part of the data you do not need in the new one.

On Apr 8, 2018, at 4:15 AM, Nitan Kainth <nitankai...@gmail.com> wrote:
If it's for testing and you don't need any specific data, just copy a set of sstables with all files of that sequence into the target table's directory and rename them. Restart the target node or run nodetool refresh.
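Since CQL has no INSERT ... SELECT, the client-side fallback (if the sstable/hardlink route copies too much) is a paged, streamed copy; a hedged sketch for the Java driver, with table and column names as placeholders:

    import com.datastax.driver.core.*;

    // Stream one partition from test1 into test2, page by page.
    void copyPartition(Session session, Object partitionKey) {
        Statement select = new SimpleStatement(
                "select partition_key, ck, v from test1 where partition_key = ?", partitionKey)
                .setFetchSize(5000);                  // page size, so nothing is held in memory
        PreparedStatement insert = session.prepare(
                "insert into test2 (partition_key, ck, v) values (?, ?, ?)");
        for (Row row : session.execute(select)) {     // the iterator fetches further pages lazily
            session.executeAsync(insert.bind(
                    row.getObject("partition_key"),
                    row.getObject("ck"),
                    row.getObject("v")));              // throttle in-flight writes in real code
        }
    }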
Re: Can I sort it as a result of group by?
I'm using Apache Spark on top of Cassandra for such cases.

On Mon, 09 Apr 2018 18:00:33 +0430 DuyHai Doan wrote:
No, sorting by a column other than the clustering column is not possible.

On Mon, Apr 9, 2018 at 11:42 AM, Eunsu Kim wrote:
Hello, everyone. I am using 3.11.0 and I have the following table:

CREATE TABLE summary_5m (
    service_key text,
    hash_key int,
    instance_hash int,
    collected_time timestamp,
    count int,
    PRIMARY KEY ((service_key), hash_key, instance_hash, collected_time)
)

And I can sum count grouping by primary key:

select service_key, hash_key, instance_hash, sum(count) as count_sum from apm.ip_summary_5m where service_key='ABCED' group by service_key, hash_key, instance_hash;

But what I want is to get only the top 100 with the highest sums, as in the following attempt (a syntax error, of course): ... order by count_sum limit 100; Has anybody solved this problem? Thank you in advance.
A Cassandra Storage Estimation Mechanism
I was estimating hardware requirements for a project that mainly uses Apache Cassandra. Because of the rule of thumb "Cassandra node size had better be < 2 TB", total disk usage determines the number of nodes, and in most cases the result of this calculation is also enough to satisfy the required input rate. So IMHO storage estimation is the most important part of requirements analysis for this kind of project. There are formulas on the net for theoretical storage estimation, but they predict several KB per row while actual inserts show a few hundred bytes! So it seems the best estimation would be to insert a lot of real data into the real schema on a real production cluster, but I can't have the real data or the production cluster before doing the estimation! So I came up with an estimation idea:

1. Use the real schema on a cluster of more than 3 nodes.
2. Take the required assumptions: the real input rate (200K per second, which is 150 billion rows in total) and the real partition count (1.5 million unique partition keys in total).
3. Instead of 150 billion rows, write 1, 10 and 100 million rows, over 10, 100 and 1000 partitions proportionally. After each run, do a 'nodetool flush' and check the total disk usage of the keyspace with du -sh keyspace_dir; for example, at the 1 million rate the disk usage was 90 MB, so 150 billion rows extrapolate to about 13 TB. Then drop the schema and run the next rate, continuing until the difference between two consecutive results is tiny.

I got a good estimate at the 100 million rate. Actually, I was doing the estimation for an already running production cluster and knew the answer beforehand (I just wanted to validate the idea), and the estimate finally matched the answer! But I'm worried that this was accidental. So the question: is my estimation mechanism correct and applicable to any estimation and any project? If not, how do you estimate storage? Thanks in advance.
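The extrapolation in step 3, written out (numbers from the post; note the result carries whatever replication factor the test cluster used, so scale it if the production RF differs):

    rows measured : 1,000,000  ->  90 MB on disk after flush
    per row       : 90 MB / 1,000,000 ≈ 94 bytes
    target rows   : 150,000,000,000
    estimate      : (150e9 / 1e6) * 90 MB = 13,500,000 MB ≈ 13 TB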
Insert-only application repair
In an insert-only use case with a TTL (6 months), should I run this command every 5-7 days on all nodes of the production cluster (according to http://cassandra.apache.org/doc/latest/operating/repair.html)?

nodetool repair -pr --full

When none of the nodes has been down in 4 months (ever since the cluster was launched) and none of the rows has been deleted, why should I run nodetool repair?
Re: Insert-only application repair
Thank you Nitan. That's exactly my case (RF > CL). But as long as there is no node outage, shouldn't hinted handoff handle data consistency?

On Sat, 12 May 2018 16:26:13 +0430 Nitan Kainth <nitankai...@gmail.com> wrote:
If you have RF > CL then repair needs to be run to make sure data is in sync.
Solve Busy pool at Cassandra side
Hi, I'm getting "Pool is busy (limit is 256)" while connecting to a single-node Cassandra cluster. The whole client-side application is a 3rd-party lib whose source I can't change, and its session builder does not set any PoolingOptions. Is there any config on the Cassandra side that could raise the maximum requests per connection from 256 to 32K?
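For reference, that limit is enforced by the DataStax Java driver on the client, not by the server, so there is no cassandra.yaml setting that raises it; 256 matches the driver 3.x defaults in PoolingOptions. If the 3rd-party lib ever exposes its Cluster builder, the fix would look like this hedged sketch (contact point and values are placeholders):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.HostDistance;
    import com.datastax.driver.core.PoolingOptions;

    // Both knobs are client-side: the per-connection request cap and the
    // queue of requests waiting for a free connection (default 256 in 3.x).
    PoolingOptions pooling = new PoolingOptions()
            .setMaxRequestsPerConnection(HostDistance.LOCAL, 32768)
            .setMaxQueueSize(32768);

    Cluster cluster = Cluster.builder()
            .addContactPoint("127.0.0.1")     // placeholder address
            .withPoolingOptions(pooling)
            .build();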
Re: Interesting Results - Cassandra Benchmarks over Time Series Data for IoT Use Case I
I recommend reviewing the Newts data model, which is a time-series data model on top of Cassandra: https://github.com/OpenNMS/newts/wiki/DataModel

First, the use case: we have time series of data from devices on several sites, where each device (with a unique dev_id) can have several sensors attached to it. Most queries, however, are both time-limited and over a range of dev_ids, even for a single sensor (multi-sensor joins are a whole different beast for another day!). We want a schema where a query can complete in time linear in the query ranges for both devices and time range, largely independent of the total data size. So we explored several different primary key definitions, learning from the best practices communicated on this mailing list and over the interwebs. While details about the setup (Spark over C*) and schema are in a companion blog/site [1], the primary keys and the key points are:

1. PRIMARY KEY (dev_id, day, rec_time)
2. PRIMARY KEY ((dev_id, rec_time))
3. PRIMARY KEY (day, dev_id, rec_time)
4. PRIMARY KEY ((day, dev_id), rec_time)
5. PRIMARY KEY ((dev_id, day), rec_time)
6. Combinations of the above, adding a year field to the schema.

The main takeaway (again, please read through the details at [1]) is that we really don't have a single schema that answers the use case above without some drawback. While ((day, dev_id), rec_time) gives a constant response time, that time depends entirely on the total data size (a full scan). On the other hand, (dev_id, day, rec_time) and its counterpart (day, dev_id, rec_time) provide acceptable results, but we have the issue of a very large partition space with the first and write hotspots with the latter. We also observed that a multi-field partition key allows fast querying only if "=" is used going left to right; once an IN() (for specifying e.g. a range of time or a list of devices) is used in that order, any further use of IN() removes the benefit (i.e. a near-full table scan). Another useful learning was that using IN() to query for days is less useful than a range query. Currently, it seems we are in a bind: should we use a different data store for our use case (which seems quite typical for IoT)? Something like HDFS or Parquet? We would love to get feedback on the benchmarking results and how we can possibly improve this, and to share it widely.

[1] Cassandra Benchmarks over Time Series Data for IoT Use Case: https://sites.google.com/an10.io/timeseries-results

Regards, Arbab Khalil, Software Design Engineer
Reading from big partitions
Hi, due to some unpredictable behavior in the input data, I've ended up with a few hundred partitions larger than 300 MB. Reading any sequence of data from these partitions takes about 5 seconds, while reading from the other partitions (under 50 MB) takes less than 10 ms. Since I can't change the data model for the sake of a few problematic partitions, is there any tuning on the Cassandra side that could speed up reads from the big partitions? Thanks in advance.
Re: Reading from big partitions
The data is spread between an SSD and a 15K-RPM disk; the table has 26 sstables in total. I haven't tried tracing, but I will and report back!

On Sun, 20 May 2018 08:26:33 +0430 Jonathan Haddad wrote:
What disks are you using? How many sstables are you hitting? Did you try tracing the request?
Re: Reading from big partitions
I've increased column_index_size_in_kb to 512 and then to 4096: no change in response time, it even got worse. Increasing the key cache and row cache sizes did not help either.

On Sun, 20 May 2018 08:52:03 +0430 Jeff Jirsa wrote:
Column index size in the yaml (increase it to trade GC pressure for disk IO). If you're on anything other than 3.11.x, upgrade to 3.11.newest.
Re: Reading from big partitions
Should I run compaction after changing column_index_size_in_kb?
IN clause of prepared statement
The table is something like:

Samples ... primary key ((partition, resource), timestamp, metric_name)

Creating the prepared statement:

session.prepare("select * from samples where partition=:partition and resource=:resource and timestamp>=:start and timestamp<=:end and metric_name in :metric_names")

fails with the exception: "Cannot restrict clustering columns by IN relations when a collection is selected by the query". The query is OK in cqlsh, and using column names in the select did not help. Is there any way to achieve this in Cassandra? I'm aware of the performance problems of this query, but that does not matter in my case! I'm using DataStax driver 3.2 and Apache Cassandra 3.11.2.
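A client-side workaround worth trying (a hedged sketch, not from the thread; session, metricNames and the bind variables are placeholders mirroring the statement above): drop the IN restriction and issue one async query per metric name, merging results on the client, which for a short metric list is close to what the coordinator would do anyway:

    import com.datastax.driver.core.*;
    import java.util.ArrayList;
    import java.util.List;

    PreparedStatement ps = session.prepare(
        "select * from samples where partition=? and resource=? " +
        "and timestamp>=? and timestamp<=? and metric_name=?");

    // Fan out: one single-clustering query per metric name.
    List<ResultSetFuture> futures = new ArrayList<>();
    for (String metric : metricNames) {
        futures.add(session.executeAsync(
                ps.bind(partition, resource, start, end, metric)));
    }
    // Merge the result sets client-side.
    for (ResultSetFuture f : futures) {
        for (Row row : f.getUninterruptibly()) {
            // process row
        }
    }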
RE: [EXTERNAL] IN clause of prepared statement
I tried that too, using select ALL_NON_COLLECTION_COLUMNS ..., and encountered the error: "IN restrictions are not supported on indexed columns".

On Mon, 21 May 2018 20:10:29 +0430 Durity, Sean R <sean_r_dur...@homedepot.com> wrote:
One of the columns you are selecting is a list or map or other kind of collection. You can't do that with an IN clause against a clustering column. Either don't select the collection column OR don't use the IN clause. Cassandra is trying to protect itself (and you) from a query that won't scale well. Honor that. As a good practice, you shouldn't use select * (as a production query) against any database. You want to list the columns you actually want to select; that way a later "alter table add column" (or similar) doesn't cause unpredictable results in the application.
Sean Durity
RE: [EXTERNAL] IN clause of prepared statement
It seems there is no way to do this with Cassandra alone, and even something like Spark won't help, because I'm reading from a big Cassandra partition (the bottleneck is reading from Cassandra).
cassandra concurrent read performance problem
By reading 90 partitions concurrently (each larger than 200 MB), my single-node Apache Cassandra becomes unresponsive: no reads or writes work for almost 10 minutes. I'm using these configs:

memtable_allocation_type: offheap_buffers
gc: G1GC
heap: 128 GB
concurrent_reads: 128 (having more than 12 disks)

There is not much pressure on my resources, except for memory: the eden space of 70 GB is filled and collected in less than a minute. CPU is about 20% while reads are stalled, and iostat shows no significant load on the disks.
cassandra update vs insert + delete
Hi, I want to load all rows from many partitions and change a column value in each row. Which of the following ways is better for disk space and performance?

1. Create an update statement for every row and batch the updates per partition.
2. Create an insert statement for every row and batch the inserts per partition, then run a single statement to delete the whole old partition.

Thanks in advance.
Fwd: Re: cassandra update vs insert + delete
How does update work underneath? Does it create a new row (because I'm changing a column of the partition key) and add a tombstone to the old row?

Forwarded message
From: Jonathan Haddad <j...@jonhaddad.com>
Date: Mon, 28 May 2018 00:07:36 +0430
Subject: Re: cassandra update vs insert + delete

What is a "soft delete"? My 2 cents: if you want to update some information, just update it. There's no need to overthink it. Batches are good if they're constrained to a single partition, not so hot otherwise.

On Sun, May 27, 2018 at 8:19 AM Rahul Singh <rahul.xavier.si...@gmail.com> wrote:
Deletes create tombstones, so they're not really something to consider. Better to add/update or insert data, do a soft delete on old data, and apply a TTL to remove it at a future time.
Fwd: Re: cassandra update vs insert + delete
It seems that I can't update part of a primary key in a cqlsh update; the message is: "PRIMARY KEY part column found in SET part".

Forwarded message
From: Xiangfei Ni <xiangfei...@cm-dt.com>
Date: Mon, 28 May 2018 11:04:06 +0430
Subject: Re: cassandra update vs insert + delete

Yes, you are correct.

Best Regards, David Ni, Virtue Intelligent Network Ltd, co.
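Since a primary key column can never appear in the SET part, changing one is always modeled as inserting the new row and deleting the old one; a minimal CQL sketch (table and column names are placeholders):

    -- CQL cannot update a key in place: write the new row, then remove the old row.
    begin batch
        insert into t (pk, ck, v) values ('new-pk', 1, 'v1');
        delete from t where pk = 'old-pk' and ck = 1;
    apply batch;

    -- note: this is a cross-partition logged batch, used here only for atomicity;
    -- the old row gets a tombstone, exactly as the question above suspected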
how to immediately delete tombstones
Hi, I've deleted 50% of my data row by row, and the disk usage of the Cassandra data directory is still more than 80%. The table's gc_grace_seconds was the default (10 days); I've now set it to 0, and although many compactions have finished, no space has been reclaimed so far. How can I force the tombstones in the sstables to be purged and reclaim the disk space used by the deleted rows? I'm running Cassandra on a single node.
Re: how to immediately delete tombstones
Thanks for your replies. But my current situation is that I don't have enough free disk space for my biggest sstable, so I can't run a major compaction or nodetool garbagecollect.
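One option that doesn't need headroom for a full rewrite (a hedged sketch; keyspace/table names are placeholders and the values are illustrative): allow STCS to run single-sstable tombstone compactions, so each sstable is rewritten individually and the peak extra disk is roughly one sstable rather than the whole table:

    alter table ks.t with compaction = {
        'class': 'SizeTieredCompactionStrategy',
        'unchecked_tombstone_compaction': 'true',  -- permit single-sstable compactions
        'tombstone_threshold': '0.2'               -- trigger at >20% estimated tombstones
    };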
data consistency without using nodetool repair
I'm using RF=2 (I know it should be at least 3, but I'm short on resources) with WCL=ONE and RCL=ONE on a cluster of 10 nodes, in an insert-only scenario. The problem: I don't want to use nodetool repair because it would put a huge load on my cluster for a long time, but I also need data consistency and fault tolerance, in the sense that if one of my nodes fails:

1. there would be no loss of even a single record, and
2. reads and writes would continue with no problem.

I know the current config won't satisfy No. 1, so I changed the write consistency level to ALL; to satisfy No. 2, I catch the "1 replicas needed but 2 required" exceptions, write those records again with WCL=ONE, and put them somewhere to be rewritten later with WCL=TWO. Is there anything wrong with this workaround? Any better solution? (Strong requirements: I can't change the RF, and the system should tolerate one node failure with no data loss and no read/write failure.)
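For what it's worth, the 3.x Java driver ships a retry policy that automates the downgrade half of this workaround (a sketch; the contact point is a placeholder, and the "rewrite later at TWO" bookkeeping would still be yours, since the policy only retries the failed write at a lower consistency level):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.QueryOptions;
    import com.datastax.driver.core.policies.DowngradingConsistencyRetryPolicy;
    import com.datastax.driver.core.policies.LoggingRetryPolicy;

    // Write at ALL by default; on "1 replica responded, 2 required" the policy
    // retries the same write at the highest level the live replicas can satisfy.
    Cluster cluster = Cluster.builder()
            .addContactPoint("127.0.0.1")   // placeholder
            .withQueryOptions(new QueryOptions().setConsistencyLevel(ConsistencyLevel.ALL))
            .withRetryPolicy(new LoggingRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE))
            .build();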
Re: data consistency without using nodetool repair
Thanks Jeff. If I run repair every 10 days, would there still be a chance of losing data when losing one node (data inserted since the last repair)?

On Sun, 10 Jun 2018 10:14:46 +0430 Jeff Jirsa <jji...@gmail.com> wrote:
> 1. there would be no single record data loss
This requires a write CL > 1.
> 2. write/read of data would be continued with no problem
This requires more replicas in the ring than the number of replicas required for reads and writes.
> Is there anything wrong with this workaround? any better solution?
Sorta works, but it's dirty and has a lot of edge cases. Depending on the budget and the value of the data, maybe it's OK, but I wouldn't trust it to be good enough for critical use cases. The transient replication work that Ariel is doing (https://issues.apache.org/jira/browse/CASSANDRA-14404) will likely benefit you here in 4.0 (but it will require you to run repair).
saving distinct data in cassandra results in many tombstones
Hi, I need to save a distinct value per key in each hour; the problem with saving everything and computing distincts in memory is that there is too much repeated data. Table schema:

create table distinct (
    hourNumber int,
    key text,
    distinctValue long,
    primary key (hourNumber)
)

I want to retrieve the distinct count of all keys in a specific hour, and with this data model that is achieved by reading a single partition. The problem: I can't read from this table; system.log indicates that more than 100K tombstones were read and there is no live data among them. gc_grace_seconds is the default (10 days), so I thought of decreasing it to 1 hour and running compaction. But is this the right approach at all? I mean, the whole idea of replacing some millions of rows, each about 10 times, in a partition again and again, creating a lot of tombstones just to get distinct behavior?
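For reference, the usual shape of this pattern (a hedged sketch; it assumes the intent is one row per (hour, key), and the table/column names are illustrative) puts the key in a clustering column, so each repeated observation is an upsert of the same row; plain upserts with no null columns and no TTL write no tombstones at all:

    create table distinct_keys (
        hourNumber int,
        key text,
        distinctValue bigint,
        primary key ((hourNumber), key)   -- one row per key per hour
    );

    -- a repeated observation just overwrites the same row; no tombstone is
    -- written as long as no column is set to null and no TTL is used
    insert into distinct_keys (hourNumber, key, distinctValue) values (425000, 'k1', 42);

    -- distinct count for an hour = one partition read
    select count(*) from distinct_keys where hourNumber = 425000;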
Write performance degradation
Hi, I was doing 500K inserts + 100K counter updates per second on my cluster of 12 nodes (20 cores / 128 GB RAM / 4 x 600 GB 10K HDDs each) using batch statements, with no problem. I saw a lot of warnings saying that most batches do not concern a single node and therefore should not be batches; on the other hand, the input load of my application was about to increase by 50%, so I switched to non-batch async inserts and increased the number of client threads, and the load went up by 50%. The system worked for 2 days with no problem at 750K inserts + 150K counter updates per second, but suddenly a lot of insert timeouts appeared in the log files. Decreasing the input load to the previous level, and even below it, did not help. When I restart my client (after some hours of it logging timeouts and errors), it works fine for 20 minutes but then starts logging timeout errors again. The CPU load of the cluster nodes is less than 25%. How can I solve this problem? I'm saving all of Cassandra's JMX metrics in a monitoring system; what should I check?
Re: saving distinct data in cassandra results in many tombstones
Can I set gc_grace_seconds to 0 in this case? Reappearing deleted data has no impact on my business logic; I'm just either creating a new row or replacing an exactly identical row.

On Wed, 13 Jun 2018 03:41:51 +0430 Elliott Sims <elli...@backblaze.com> wrote:
If this is data that expires after a certain amount of time, you probably want to look into using TWCS and TTLs to minimize the number of tombstones. Decreasing gc_grace_seconds and then compacting will reduce the number of tombstones, but at the cost of potentially resurrecting deleted data if the table hasn't been repaired within the grace interval. You can also just increase the tombstone thresholds, but the queries will be pretty expensive/wasteful.
Re: Write performance degradation
I think I've pinpointed the problem: I have a table whose partition key is derived from the timestamp, so for one hour a great deal of data is inserted into one single node. This table creates very big partitions (300-600 MB). Whichever node currently holds that table's active partition reports too many dropped mutations (sometimes 6M in 5 minutes), and when the load increases, that single node slows down the whole cluster. So I think I should change my data model and shard the partition key of the problematic table.

On Mon, 18 Jun 2018 16:24:48 +0430 DuyHai Doan <doanduy...@gmail.com> wrote:
Maybe the disk I/O cannot keep up with the high mutation rate? Check the number of pending compactions.
Re: saving distinct data in cassandra results in many tombstones
Two other questions:
1. How do I shard the partition key in a way that the shards end up on different nodes?
2. If I set gc_grace_seconds to 0, is the row replaced in the memtable (so repeated rows never reach the sstables), or does that only happen at the first compaction?
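On question 1, a hedged sketch of one common scheme (the bucket count of 16 and the hash are arbitrary choices): derive a small bucket number deterministically from the non-time part of the key, so the same key always lands in the same bucket while the hour's 16 (hour, bucket) partitions hash to 16 different positions on the token ring:

    import java.nio.charset.StandardCharsets;

    // Deterministic bucket for a key: the same key always maps to the same
    // bucket, and the 16 (hour, bucket) partitions spread across the ring.
    static int bucketOf(String key, int numBuckets) {
        int h = 0;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h = 31 * h + (b & 0xff);
        }
        return Math.floorMod(h, numBuckets);   // floorMod avoids negative buckets
    }

    // write: insert with (hours, bucketOf(key, 16), key, ...)
    // read : query all 16 buckets for the hour and merge client-side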
copy sstables while cassandra is running
Hi, I'm using two directories on different disks as Cassandra data storage; the small disk is 90% full and the bigger disk is 30% full (the bigger one was added later, when we found out we needed more storage!!). So I want to move all data to the big disk. One way is to stop my application and copy all sstables from the small disk to the big one, but that would take some hours, which is not acceptable for our QoS. I thought maybe I could copy the big sstables (the ones that won't be compacted for weeks) to the big disk (near the Cassandra data directory, but not right in it) while Cassandra and my app are still running, then stop Cassandra and the app, move the big files into the exact Cassandra data directory on the big disk (a few seconds), and then move the remaining small sstables from the small disk to the big one. Are all of an sstable's files (data, index, summary, ...) immutable, changed only by compaction? Any better workaround for this scenario would be appreciated. Thanks in advance.
adding a non-used column just to debug ttl
Hi, because of "Cannot use selection function ttl on PRIMARY KEY part", I'm adding an unused boolean column to a table that otherwise has no non-primary-key columns; I'm just worried that someday I will need to debug TTLs! Is this a reasonable approach? Is anyone else doing this?
Compaction out of memory
Cassandra crashed on two out of 10 nodes in my cluster within one day. The error is:

ERROR [CompactionExecutor:3389] 2018-07-10 11:27:58,857 CassandraDaemon.java:228 - Exception in thread Thread[CompactionExecutor:3389,1,main]
org.apache.cassandra.io.FSReadError: java.io.IOException: Map failed
    at org.apache.cassandra.io.util.ChannelProxy.map(ChannelProxy.java:157) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.io.util.MmappedRegions$State.add(MmappedRegions.java:310) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.io.util.MmappedRegions$State.access$400(MmappedRegions.java:246) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.io.util.MmappedRegions.updateState(MmappedRegions.java:170) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.io.util.MmappedRegions.<init>(MmappedRegions.java:73) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.io.util.MmappedRegions.<init>(MmappedRegions.java:61) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.io.util.MmappedRegions.map(MmappedRegions.java:104) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.io.util.FileHandle$Builder.complete(FileHandle.java:362) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.io.sstable.format.big.BigTableWriter.openEarly(BigTableWriter.java:290) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.io.sstable.SSTableRewriter.maybeReopenEarly(SSTableRewriter.java:179) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:134) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.realAppend(DefaultCompactionWriter.java:65) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:142) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:201) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:85) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:275) ~[apache-cassandra-3.11.2.jar:3.11.2]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_65]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_65]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_65]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_65]
    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.2.jar:3.11.2]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_65]
Caused by: java.io.IOException: Map failed
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:939) ~[na:1.8.0_65]
    at org.apache.cassandra.io.util.ChannelProxy.map(ChannelProxy.java:153) ~[apache-cassandra-3.11.2.jar:3.11.2]
    ... 23 common frames omitted
Caused by: java.lang.OutOfMemoryError: Map failed
    at sun.nio.ch.FileChannelImpl.map0(Native Method) ~[na:1.8.0_65]
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:936) ~[na:1.8.0_65]
    ... 24 common frames omitted

Each node has 128 GB of RAM, of which 32 GB is allocated as the Cassandra heap.
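This OutOfMemoryError means mmap() itself failed, not that the heap is exhausted. A common cause on Linux (an assumption worth checking here, not a confirmed diagnosis) is hitting the kernel's limit on memory-mapped areas, since Cassandra maps every sstable segment. Checking and raising it:

    sysctl vm.max_map_count                                    # default is often 65530
    echo "vm.max_map_count = 1048575" | sudo tee -a /etc/sysctl.conf
    sudo sysctl -p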
changing ip address of all nodes in cluster
I need to assign a new IP range to my cluster. What's the procedure? Thanks in advance.
Fwd: changing ip address of all nodes in cluster
I tested the single-node scenario on all nodes iteratively and it worked: https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/operations/opsChangeIp.html
New cluster vs Increasing nodes to already existed cluster
Currently I have a cluster of 10 nodes dedicated to one keyspace (hardware sizing was done according to the input rate and TTL of the current application's requirements alone). I need to launch a new application, with a new keyspace, on another set of servers (8 nodes); there is no relation between the current and the new application. I have two options:

1. Add the new nodes to the existing cluster (10 + 8 nodes) and share the power and storage between the keyspaces.
2. Create a new cluster for the new application (isolated clusters).

Which option do you recommend, and why? (I care about maintenance cost, read and write performance, and isolation of problems.)
Re: Cassandra node RAM amount vs data-per-node/total data?
I actually never set Xmx > 32 GB for any Java application unless it truly needs more, because of the fact that "once you exceed this 32 GiB border the JVM will stop using compressed object pointers, effectively reducing the available memory. That means when increasing your JVM heap above 32 GiB you must go way above it; increasing the heap from 32 GiB to anything below 48 GiB will actually decrease the amount of available memory (!) because compressed object pointers are no longer there." Another thing: why does the default Cassandra setup not go above an 8 GB heap, even when 256 GB of RAM is available (considering that default configs should be suitable for most cases)? Also, most data structures can be moved off-heap, even memtables, and that is recommended for better performance (although I have never changed the defaults to move anything off-heap).

On Tue, 17 Jul 2018, 17:22, Rahul Singh wrote:
I usually don't want to put more than 1.0-1.5 TB (at the most) per node. It makes streaming slow beyond my patience and keeps the repair/compaction processes lean. Memory depends on how much you plan to keep in memory in terms of key/row cache. For my uses, no less than 64 GB, if not more, ~128 GB. The lowest I've gone is 16 GB, but that's for dev purposes only.

On Jul 17, 2018, 8:26 AM -0400, Vsevolod Filaretov wrote:
What are general community and/or your personal experience viewpoints on the Cassandra node RAM amount vs data stored per node question? Thank you very much. Best regards, Vsevolod.
how to fix too many native-transport-blocked?
Hi, on a cluster of 10 nodes, out of 20K native transport requests per second, about 200 per second are blocked. They are mostly small single writes. I'm also experiencing random read delays, which I suspect are caused by the filled native transport queue. On all nodes CPU usage is below 20 percent, and there is no problem with memory or disk usage so far. I'm going to try to eliminate all native transport blocks. I'm aware of two parameters for tuning this:

native_transport_max_threads
max_queued_native_transport_requests

I think I should increase native_transport_max_threads until blocked requests reach 0 on each node, while monitoring the impact on CPU and RAM usage. The reason to increase only native_transport_max_threads, and not max_queued_native_transport_requests, is that read requests need to be processed quickly rather than waiting in a queue. Is this a good approach to tuning the cluster? Should I insist on 0 native transport blocks, or is some blocking normal in a heavy-write scenario? Thanks in advance.
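For reference, where the two knobs live in 3.11 (a sketch; the values shown are illustrative, not recommendations, and the queue property name comes from CASSANDRA-11363, so verify it against your exact version): the thread cap is a cassandra.yaml setting, while the queue bound is a startup property:

    # cassandra.yaml: upper bound on native transport (CQL) request threads
    native_transport_max_threads: 256

    # jvm options / startup flag: bound on requests queued for those threads
    JVM_OPTS="$JVM_OPTS -Dcassandra.max_queued_native_transport_requests=1024"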
Cassandra crashed with no log
Cassandra on one of my nodes crashed without any error or warning in the system/gc/debug logs. All JMX metrics are being monitored; the last values fetched show 50% heap usage and 20% CPU usage. How can I find the cause of the crash?
JMX metric to report number failed WCL ALL
I'm using RF=2 and write consistency ONE. Is there a counter in Cassandra's JMX that reports the number of writes acknowledged by only one node (instead of both replicas)? Although I don't require all replicas to acknowledge a write, I consider both acknowledging to be the normal status of the cluster.
Fwd: Re: Cassandra crashed with no log
Thanks Jeff. At the time of the crash, dmesg said:

.../linux-4.4.0/mm/pgtable-generic.c:33: bad pmd

So I just ran this on all of my nodes:

echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

Forwarded message
From: Jeff Jirsa
Date: Sun, 22 Jul 2018 10:43:38 +0430
Subject: Re: Cassandra crashed with no log

Anything in non-Cassandra logs? Dmesg?
Re: JMX metric to report number failed WCL ALL
Thanks Jordan.

On Mon, 23 Jul 2018 21:28:54 +0430 Jordan West wrote:
https://issues.apache.org/jira/browse/CASSANDRA-13289 is a new feature, not yet available (it's merged into trunk), that I believe will let you monitor what you want.
Data model storage optimization
The current data model is described as table_name: ((partition_key), clustering_key), other_column1, other_column2, ...:

user_by_name: ((time_bucket, username)), ts, request, email
user_by_mail: ((time_bucket, email)), ts, request, username

The reason both keys (username, email) are repeated in both tables is that there may be different usernames with the same email, or different emails with the same username. The queries against this data model are:

1. username = X
2. email = Y
3. username = X and email = Y (we query one of the tables and, because the result set is small, filter on the other column)

This data model wastes a lot of storage. I thought of using a UUID, a hash code, or a sequence to deduplicate, but I can't keep track of old vs. new records (the ones that already have a UUID). Any recommendation for optimizing the data model to save storage?
Fwd: Re: Data model storage optimization
> How many rows on average per partition?
Around 10K.

> Let me get this straight: you are bifurcating your partitions on either email or username, essentially potentially doubling the data, because you don't have a way to manage a central system of record for users?
We are just analyzing the output logs of a "perfectly" running application!, so nobody will let me change its data design. I just thought this might be a more general problem for Cassandra users, where someone both:
1. needed to access an identical set of columns by multiple keys (all of which must be present in the rows), and
2. had a storage limit (since TTL * input rate would amount to some TBs).
I know there is a strict rule in Cassandra data modeling, "never use foreign keys; sacrifice disk instead", but has anyone ever been forced to do such a thing, and how?
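A hedged sketch of a content-addressed variant (assuming the payload columns (ts, request) dominate row size; all names are placeholders): because the id is a hash of (username, email), it is recomputable from every incoming record, which sidesteps the "which records already have a UUID" bookkeeping; the cost is two reads per query (index first, then payload):

    -- payload stored once, keyed by a deterministic id,
    -- e.g. id = md5(username + ':' + email) computed client-side
    create table user_event (
        time_bucket int,
        id blob,
        ts timestamp,
        request text,
        primary key ((time_bucket, id), ts)
    );

    -- thin index tables: keys only, no repeated payload
    create table user_by_name (
        time_bucket int, username text, id blob,
        primary key ((time_bucket, username), id)
    );
    create table user_by_mail (
        time_bucket int, email text, id blob,
        primary key ((time_bucket, email), id)
    );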
full text search on some text columns
I need to do a full-text search (LIKE) on one of my clustering keys and one of my partition keys (both use text as their data type). The input rate is so high that only Cassandra can handle it. Is there any open-source project that helps with using Cassandra + Solr or Cassandra + Elasticsearch? Any recommendation for doing this with a home-made solution would be appreciated.
Re: full text search on some text columns
Thanks Jordan. There will be millions of rows per day; is SASI capable of standing such a rate?

On Tue, 31 Jul 2018 19:47:55 +0430 Jordan West wrote:
> I need to do a full text search (like) on one of my clustering keys and one of partition keys ...
For simple LIKE queries on existing columns you could give SASI (https://docs.datastax.com/en/dse/5.1/cql/cql/cql_using/useSASIIndex.html) a try without having to stand up a separate piece of software. It's relatively new and isn't as battle-tested as other parts of Cassandra, but it has been used in production. There are some performance issues with wider CQL partitions, if you have those (https://issues.apache.org/jira/browse/CASSANDRA-11990); I hope to address that for 4.0, time permitting. Full disclosure: I was one of the original SASI authors.
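What the SASI route looks like in CQL (a hedged sketch; table and column names are hypothetical, and note that SASI indexes cannot be created on partition key components, so the partition-key half of the original question would still need another approach):

    -- hypothetical log table with a text column to search
    create custom index body_idx on logs (body)
    using 'org.apache.cassandra.index.sasi.SASIIndex'
    with options = { 'mode': 'CONTAINS' };   -- CONTAINS mode permits LIKE '%term%'

    select * from logs where body like '%timeout%';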
RE: [EXTERNAL] full text search on some text columns
Actually, we can't afford to buy DataStax Search.

On Tue, 31 Jul 2018 19:38:28 +0430 Durity, Sean R wrote: That sounds like a problem tailor-made for the DataStax Search (embedded Solr) solution. I think that would be the fastest path to success. Sean Durity

From: onmstester onmstester Sent: Tuesday, July 31, 2018 10:46 AM To: user Subject: [EXTERNAL] full text search on some text columns [original question quoted above]
Fwd: Re: [EXTERNAL] full text search on some text columns
It seems to be an interesting project, but sort of abandoned: no updates in the last 8 months, and it does not support Cassandra 3.11.2 (the version I currently use).

Forwarded message From: Andrzej Śliwiński Date: Wed, 01 Aug 2018 08:16:06 +0430 Subject: Re: [EXTERNAL] full text search on some text columns

Maybe this plugin could do the job: https://github.com/Stratio/cassandra-lucene-index On Tue, 31 Jul 2018 at 22:37, onmstester onmstester wrote:
updating old partitions in STCS
I read in some best-practice documents on data modeling: do not update old partitions while using STCS. But I always use clustering keys in my queries, and cqlsh tracing reports that a query only touches the SSTables holding data for the specified clustering key (not every SSTable containing part of the partition). For example:

sstable1 contains (part_key1, cluster_key1)
sstable2 contains (part_key1, cluster_key2)

The query select * from my_table where partition_key = part_key1 and clustering_key = cluster_key1 only accesses sstable1. So should I care about this rule, and why? Thanks in advance.
data loss
I am writing to Cassandra with a simple INSERT plus a counter UPDATE for every input record; the input rate is very high. I configured the update statement with idempotent = true (no explicit config for the insert; the default is false, IMHO). I have seen multiple records that have rows in the counter table (the idempotent statement) while having no row at all in the table fed by the plain insert! I am using executeAsync, and in the catch block I retry the insert/update for the whole batch of statements (a while-true loop, retrying until every statement has been applied), so I assumed everything would be persisted in Cassandra. If a non-idempotent insert times out, shouldn't it throw an exception and be retried by my Java code?
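To make the distinction concrete, a sketch of marking a plain INSERT idempotent and retrying it, against a hypothetical events table (driver 3.x); the counter caveat is in the comments:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;
import java.util.concurrent.TimeUnit;

public class IdempotentRetrySketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("logs")) {
            PreparedStatement insert = session.prepare(
                    "INSERT INTO events (id, payload) VALUES (?, ?)");
            // A plain INSERT (no counters, no non-idempotent functions) is safe to
            // mark idempotent, which makes driver/application retries safe.
            Statement bound = insert.bind("42", "hello").setIdempotent(true);
            while (true) {
                try {
                    session.executeAsync(bound).getUninterruptibly(10, TimeUnit.SECONDS);
                    break; // acknowledged
                } catch (Exception e) {
                    // Write timeouts land here; retrying is only safe because the
                    // statement is idempotent. A counter UPDATE must NOT be retried
                    // blindly this way: if the original write actually landed, the
                    // retry double-counts.
                }
            }
        }
    }
}
```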
bigger data density with Cassandra 4.0?
I've noticed this new feature of 4.0: streaming optimizations (https://cassandra.apache.org/blog/2018/08/07/faster_streaming_in_cassandra.html). Does this mean we could have much higher data density with Cassandra 4.0 (fewer problems than 3.x)? I mean > 10 TB of data on each node without worrying about node join/remove. This is something needed for write-heavy applications that do not read much: when you ingest around 2 TB of data per day and need to keep it for 6 months, it would be a waste of money to purchase 180 servers (even commodity or cloud). IMHO, even if 4.0 fixes the problem with streaming when joining a new node, compaction remains the other evil on a big node, but we could tolerate that somehow.
Fwd: Re: bigger data density with Cassandra 4.0?
Thanks Kurt. Actually, my cluster has > 10 nodes, so there is only a tiny chance of streaming a complete SSTable. Logically, any columnar NoSQL database like Cassandra always needs to re-sort grouped data for later fast reads, and having nodes with a big amount of data (> 2 TB) would hurt this background process. So how is it that some of these databases, like HBase and ScyllaDB, do not emphasize small nodes the way Cassandra does?

Forwarded message From: kurt greaves To: "User" Date: Wed, 29 Aug 2018 12:03:47 +0430 Subject: Re: bigger data density with Cassandra 4.0?

My reasoning was: if you have a small cluster with vnodes, you're more likely to have enough overlap between nodes that whole SSTables will be streamed on major ops. As N gets > RF you'll have fewer common ranges and thus be less likely to stream complete SSTables. Correct me if I've misunderstood.
Re: Re: bigger data density with Cassandra 4.0?
Could you please explain more about "HBase tends to be quite average for transactional data" (do you mean slower performance compared to Cassandra?), and about "ScyllaDB IDK, I'd assume they just sorted out streaming by learning from C*'s mistakes"? ScyllaDB is a much younger project than Cassandra, with much less usage and attention. Currently I face a dilemma when launching new clusters: should I wait for the Cassandra community to apply all the enhancements and bug fixes already applied by its main competitors (ScyllaDB or Cosmos DB), or just switch to a competitor (afraid of the new world!)? For example, is there a motivation right now to handle denser nodes in the near future? Again, thank you for your time.

On Wed, 29 Aug 2018 15:16:40 +0430 kurt greaves wrote: Most of the issues around big nodes relate to streaming, which is currently quite slow (it should be a bit better in 4.0). HBase is built on top of Hadoop, which is much better at large files / very dense nodes, and tends to be quite average for transactional data. ScyllaDB IDK, I'd assume they just sorted out streaming by learning from C*'s mistakes. On 29 August 2018 at 19:43, onmstester onmstester wrote: [previous message quoted above]
adding multiple node to a cluster, cleanup and num_tokens
I'm going to add 6 more nodes to my cluster (it already has 4 nodes and RF=2), using GossipingPropertyFileSnitch, NetworkTopologyStrategy, and the default num_tokens = 256. It is recommended to join nodes one by one; although there is < 200 GB on each node, I will do so. The documentation says I should run nodetool cleanup after joining a new node: "Run nodetool cleanup on the source node and on neighboring nodes that shared the same subrange after the new node is up and running. Failure to run this command after adding a node causes Cassandra to include the old data to rebalance the load on that node." It also says cleanup can safely be postponed to low-usage hours.

Should I run nodetool cleanup on every node after adding each node? (Considering that cleanups should also run one by one, that would be a lot of tasks!) Is it possible to run cleanup once, after all the new nodes have joined, on all the nodes?

I also don't understand the part about allocate_tokens_for_local_replication_factor. I didn't change num_tokens: 256 or anything related to the vnode config in the yaml, and load is already distributed evenly (is this a good approach and a good num_tokens, given that all my nodes have the same spec?). So should I consider this config (allocate_tokens_for_local_replication_factor) while adding new nodes, having a single keyspace with RF=2?
Re: adding multiple node to a cluster, cleanup and num_tokens
Thanks Alex. So you suggest that I should not worry about this: "Failure to run this command (cleanup) after adding a node causes Cassandra to include the old data to rebalance the load on that node"? Would you kindly explain a little more?

It makes a lot of sense to run cleanup once after you have added all the new nodes. Cheers, -- Alex
Re: adding multiple node to a cluster, cleanup and num_tokens
What I have understood from this part of the documentation is: when I already have nodes A, B, and C in the cluster, there will be some stale data left on A, B, and C after a new node D has fully joined (the data that was streamed to D). So if I then immediately add node E to the cluster, would that stale data on A, B, and C be moved between nodes again each time?

On Mon, 03 Sep 2018 14:39:37 +0430 onmstester onmstester wrote: [previous message quoted above]
counter mutation not persisted
My application updates a counter table at a rate of 50K per second on a cluster with 10 nodes. The problem is that the counter value is less than it should be in 20% of cases; the dropped counter mutation count in JMX is always 0. I'm using batch statements with executeAsync to update the counters. I use BatchStatement.Type.UNLOGGED, but I noticed there is also a BatchStatement.Type.COUNTER. Is there any benefit to using this type in my counter-update method? Thanks in advance.
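For reference, a minimal sketch of what Type.COUNTER usage looks like with the driver (3.x). Whether it changes throughput is the open question here; functionally, counter mutations belong in a COUNTER batch, and the table/column names below are assumptions.

```java
import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class CounterBatchSketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("logs")) {
            PreparedStatement inc = session.prepare(
                    "UPDATE event_counter SET hits = hits + ? WHERE bucket = ? AND key = ?");
            // Counter and non-counter statements cannot be mixed in one batch.
            BatchStatement batch = new BatchStatement(BatchStatement.Type.COUNTER);
            batch.add(inc.bind(1L, 10, "a"));
            batch.add(inc.bind(1L, 10, "b"));
            session.execute(batch);
        }
    }
}
```

Also worth remembering in the context of the missing counts: counter updates are not idempotent, so a counter batch retried after a timeout can double-count if the original write actually landed.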
Cluster CPU usage limit
IMHO, a Cassandra write is mostly a CPU-bound task, so when determining cluster write throughput, what CPU usage percentage (averaged across all cluster nodes) should be treated as the limit? Rephrased: what is a normal CPU usage level for a Cassandra cluster (while no compaction, streaming, or heavy reads are running)? On a cluster with 10 nodes I get 700K writes per second for my data model, with average CPU load about 40%. I'm going to increase the number of native transport threads (now 256) and the native queue size (1024) to increase throughput (and CPU usage accordingly).
Fwd: Re: Cluster CPU usage limit
Actually, it's 256 native transport threads; the number of concurrent write threads on each node is 32. My main concern is: how much CPU capacity should I keep free for tasks other than writes, i.e., compaction and reads?

Forwarded message From: Elliott Sims Date: Fri, 07 Sep 2018 08:05:27 +0430 Subject: Re: Cluster CPU usage limit

It's interesting and a bit surprising that 256 write threads isn't enough. Even with a lot of cores, I'd expect you to be able to saturate the CPU with that many threads. I'd make sure you don't have other bottlenecks, like GC, IOPS, network, or "microbursts" where your load actually fluctuates between 20-100% CPU. Admittedly, I got my best results with 256 threads (and haven't tested higher, though lower is definitely not enough), but all the advice I've seen suggests a lower write-thread count is optimal for most cases. On Thu, Sep 6, 2018 at 5:51 AM, onmstester onmstester wrote: [original question quoted above]
RE: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens
Why not set the default vnode count in the Cassandra installation files to that recommendation?

On Tue, 04 Sep 2018 17:35:54 +0430 Durity, Sean R wrote: Longer term, I agree with Oleksandr; the recommendation for the number of vnodes is now much smaller than 256. I am using 8 or 16. Sean Durity
Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens
Thanks Jon. But I was never concerned about the num_tokens config before, because no official cluster-setup document (on DataStax: https://docs.datastax.com/en/cassandra/3.0/cassandra/initialize/initSingleDS.html, or other blogs) warned us beginners to pay attention to it. I always set up my clusters with nodes of the same hardware spec (homogeneous) and num_tokens = 256, and data seems to be evenly distributed; at least nodetool status reports it that way, and after killing any node I still get all of my data and the application keeps working, so I assumed data was perfectly and evenly distributed among nodes. Could you please explain why I should run that Python command and configure allocate_tokens_for_keyspace? I have only one keyspace per cluster, I'm using NetworkTopologyStrategy, and a rack-aware topology config.

On Sat, 08 Sep 2018 17:17:10 +0430 Jonathan Haddad wrote: 256 tokens is a pretty terrible default setting, especially post 3.0. I recommend folks use 4 tokens for new clusters, with some caveats. When you fire up a cluster, there's no way to make the initial tokens be distributed evenly; you'll get random ones. You'll want to set them explicitly using:

python -c 'print( [str(((2**64 / 4) * i) - 2**63) for i in range(4)])'

After you fire up the first seed, create a keyspace using RF=3 (or whatever you're planning on using) and set allocate_tokens_for_keyspace to that keyspace in your config, then join the rest of the nodes. That gives even distribution. -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade
Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens
Thanks Jeff. You mean that with RF=2, num_tokens = 256, and fewer than 256 nodes, I should not worry about data distribution?

On Sat, 08 Sep 2018 21:30:28 +0430 Jeff Jirsa wrote: Virtual nodes accomplish two primary goals: 1) they make it easier to gradually add/remove capacity in your cluster by distributing the new host's capacity around the ring in smaller increments; 2) they increase the number of sources for streaming, which speeds up bootstrap and decommission. Whether either of these actually holds depends on a number of factors, like your cluster size (for #1) and your replication factor (for #2). If you have 4 hosts with 4 tokens each and add a 5th host, you'll probably add a neighbor near each existing host (#1) and stream from every other host (#2), so that's great. If you have 20 hosts and add a new host with 4 tokens, most of your existing ranges won't change at all: you're nominally adding 5% of your cluster capacity, but you won't see a 5% improvement because you don't have enough tokens to move 5% of the ranges. If you had 32 tokens, you'd probably see that 5% improvement, because you'd likely add a new range near each of the existing ranges. Going down to 1 token would mean you'd probably need to manually move tokens after each bootstrap to rebalance, which is fine; it just takes more operator awareness. I don't know how DSE calculates which replication factor to use for their token allocation logic; maybe they guess or take the highest or something. Cassandra doesn't; we require you to be explicit, but we could probably do better here.
node replacement failed
Hi,

Cluster spec: 30 nodes, RF = 2, NetworkTopologyStrategy, GossipingPropertyFileSnitch + rack aware.

Suddenly I lost all the Cassandra data disks on one of my racks. After replacing the disks, I tried to replace the nodes, reusing the same IPs, following this: https://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html. Starting the to-be-replaced node fails with: java.lang.IllegalStateException: unable to find sufficient sources for streaming range in keyspace system_traces. The problem is that I had not changed the default replication config for the system keyspaces. I then altered the system_traces keyspace strategy to NetworkTopologyStrategy with RF=2, but running nodetool repair failed: Endpoint not alive: /IP of the dead node I'm trying to replace. What should I do now? Can I just remove the previous nodes, change the dead nodes' IPs, and re-join them to the cluster?
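For reference, the ALTER described above might look like the following (driver 3.x); the datacenter name dc1 is an assumption, and the real names come from nodetool status:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class SystemKeyspaceReplicationFix {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // Replace "dc1" with the DC name shown by nodetool status.
            session.execute("ALTER KEYSPACE system_traces WITH replication ="
                    + " {'class': 'NetworkTopologyStrategy', 'dc1': 2}");
            // system_distributed and system_auth are usually altered the same way,
            // normally followed by a repair once all replicas are reachable.
        }
    }
}
```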
Re: node replacement failed
Any idea?

On Sun, 09 Sep 2018 11:23:17 +0430 onmstester onmstester wrote: [original question quoted above]
Re: node replacement failed
[...] fully on your side.

* You can also 'nodetool removenode' each of the dead nodes. This will have nodes streaming around, and the rack isolation guarantee will no longer be valid. It's hard to reason about what would happen to the data and in terms of streaming.
* Alternatively, if you don't have enough space, you can even 'force' the 'nodetool removenode' (see the documentation). Forcing it will prevent streaming and remove the node (token ranges are handed over, but not the data). If that does not work you can use the 'nodetool assassinate' command as well.

When adding nodes back to the broken DC, the first nodes will probably take 100% of the ownership, which is often too much. You can consider adding back all the nodes with 'auto_bootstrap: false' and repairing them once they have their final token ownership, the same way we do when building a new data center. This option is not really clean and has some caveats you need to consider before starting, as there are token range movements and available nodes that do not have the data. Yet this should work. I imagine it would work nicely with RF=3 and QUORUM; with RF=2 (if you have 2+ racks) I guess it should work as well, but you will have to pick one of availability or consistency while repairing the data. Be aware that read requests hitting these nodes will not find data! Plus, you are using RF=2, so using a consistency of 2+ (TWO, QUORUM, ALL) for at least one of reads or writes is needed to preserve consistency while re-adding the nodes in this case. Otherwise, reads will not detect the mismatch with certainty and might show inconsistent data until the nodes are repaired.

I must say that I really prefer odd values for the RF, starting with RF=3. Using RF=2 you have to pick: consistency or availability. With a consistency of ONE everywhere, the service is available with no single point of failure; using anything bigger than that for writes or reads brings consistency but creates single points of failure (actually, any node becomes a point of failure). RF=3 with QUORUM for both writes and reads takes the best of the two worlds, somehow. The tradeoff with RF=3 and quorum reads is the latency increase and the resource usage.

Maybe there is a better approach, I am not too sure, but I think I would try option 1 first in any case. It's less destructive and less risky: no token range movements, no empty nodes available. I am not sure about the limitations you might face, though, and that's why I suggest a second option for you to consider if the first is not actionable. Let us know how it goes.

C*heers,
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain
The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

On Mon, 10 Sep 2018 at 09:09, onmstester onmstester wrote: Any idea?
Re: node replacement failed
Thanks. "I am still thinking about it, but before going deeper: is this still an issue for you at the moment?" Yes, it is.
Scale SASI index
After adding new nodes to the cluster, should I rebuild SASI indexes on all nodes?
stuck with num_tokens 256
I noticed there is currently a discussion on the ML with the subject "changing default token behavior for 4.0". Any recommendation for guys like me who already have multiple clusters (> 30 nodes in each) with random token allocation and num_tokens = 256? I also need to add some nodes to the existing clusters; is that possible with num_tokens = 256? How could we fix this (reduce num_tokens on existing clusters)? Cassandra version: 3.11.2.
Re: stuck with num_tokens 256
Thanks. Because all my clusters are already balanced, I won't change their config. One more question, though: for new clusters I'm going to set up, should I use num_tokens: 8 (following the DataStax recommendation) with allocate_tokens_for_local_replication_factor = 3 (the maximum RF among my keyspaces)? Is the allocation algorithm now the recommended one, mature enough to replace the random algorithm? If so, shouldn't it be the default in 4.0?

On Sat, 22 Sep 2018 13:41:47 +0330 kurt greaves wrote: If you have problems with balance you can add new nodes using the algorithm and it'll balance out the cluster. You probably want to stick to 256 tokens though. To reduce your number of tokens you'll have to do a DC migration (best way). Spin up a new DC using the algorithm on the nodes, setting a lower number of tokens. You'll want to test first, but if you create a keyspace for the new DC prior to creating the new nodes, with the desired RF (i.e., a keyspace just in the "new" DC with your RF), and then add your nodes using that keyspace for allocation, tokens should be distributed evenly amongst that DC; when you migrate, you can decommission the old DC and hopefully end up with a balanced cluster. Definitely test beforehand, because that was just me theorising... I'll note, though, that if your existing clusters don't have any major issues, it's probably not worth the migration at this point. On Sat, 22 Sep 2018 at 17:40, onmstester onmstester wrote: [original question quoted above]
Re: stuck with num_tokens 256
"If you have problems with balance you can add new nodes using the algorithm and it'll balance out the cluster. You probably want to stick to 256 tokens though." I read somewhere (I don't remember the reference) that all nodes of the cluster should use the same algorithm, so if my cluster suffers from imbalanced nodes under the random algorithm, I cannot add new nodes that use the allocation algorithm. Isn't that correct?
Re: node replacement failed
I have a cunning plan (Baldrick-wise) to solve this problem:

1. stop the client application
2. run nodetool flush on all nodes to persist memtables to disk
3. stop Cassandra on all of the nodes
4. rename the original Cassandra data directory to data-old
5. start Cassandra on all the nodes to create a fresh cluster, including the previously dead nodes
6. create the application keyspaces in cqlsh, and this time set RF=2 on the system keyspaces too (to never encounter this problem again!)
7. move the SSTables from the data-old directories back into the current data directories, then restart Cassandra or reload the SSTables

Should this work and solve my problem?

On Mon, 10 Sep 2018 17:12:48 +0430 onmstester onmstester wrote: Thanks Alain. First, here is more detail about my cluster: 10 racks, 3 nodes on each rack; nodetool status shows 27 nodes UN, and 3 nodes, all in a single rack, DN; version 3.11.2. "Option 1: (Change schema and) use replace method (preferred method) * Did you try to have the replace going, without any former repairs, ignoring the fact 'system_traces' might be inconsistent? You probably don't care about this table, so if Cassandra allows it with some of the nodes down, going this way is relatively safe, probably. I really do not see what you could lose that matters in this table. * Another option, if the first schema change was accepted, is to make a second one, to drop this table. You can always rebuild it in case you need it, I assume." I would really love to let the replace proceed, but it stops with the error: java.lang.IllegalStateException: unable to find sufficient sources for streaming range in keyspace system_traces. Also, I could delete system_traces, which is empty anyway, but there are system_auth and system_distributed keyspaces too, and they are not empty; could I delete them safely as well? If I could somehow just skip streaming the system keyspaces during the node-replace phase, option 1 would be great. P.S.: It's clear to me that I should use at least RF=3 in production, but I have not managed to acquire enough resources yet (I hope this will be fixed in the near future). Again, thank you for your time.

On Mon, 10 Sep 2018 16:20:10 +0430 Alain RODRIGUEZ wrote: Hello, I am sorry it took us (the community) more than a day to answer this rather critical situation. That being said, my recommendation at this point would be to make sure about the impact of whatever you try. Working on a broken cluster as an emergency might lead to a second mistake, possibly more destructive than the first one. It has happened to me and people around me, on many clusters. Move forward even more carefully in these situations, as a global piece of advice. "Suddenly i lost all disks of cassandra-data on one of my racks" With RF=2, I guess operations use LOCAL_ONE consistency; thus you should have all the data in the safe rack(s) with your configuration. You probably did not lose anything yet, and the service is using only the nodes that are up, which hold the right data. "tried to replace the nodes with same ip using this: https://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html" As a side note, I would recommend you use 'replace_address_first_boot' instead of 'replace_address'. It does basically the same thing but is ignored after the first bootstrap. A detail, but hey, it's there and somewhat safer; I would use this one. "java.lang.IllegalStateException: unable to find sufficient sources for streaming range in keyspace system_traces" By default, non-user keyspaces use 'SimpleStrategy' and a small RF.
Ideally, this should be changed in a production cluster, and you're seeing an example of why. "Now when i altered the system_traces keyspace strategy to NetworkTopologyStrategy and RF=2 but then running nodetool repair failed: Endpoint not alive /IP of dead node that i'm trying to replace." By changing the replication strategy you made the dead rack an owner of part of the token ranges, so repairs just can't work: one of the nodes involved will always be down, since the whole rack is down. Repair won't work, but you probably don't need it! 'system_traces' is a temporary/debug table; it's probably empty or holds irrelevant data. Here are some thoughts:

* It would be awesome at this point for us (and for you, if you haven't done it) to see the status of the cluster:
** 'nodetool status'
** 'nodetool describecluster' --> this one will tell whether the nodes that are up agree on the schema. I have seen schema changes with nodes down induce issues.
** Cassandra version
** number of racks (I assume #racks >= 2 in this email)

Option 1: (Change schema and) use the replace method (preferred method) [quoted in full above]
Re: node replacement failed
Another question: is there a management tool that runs nodetool cleanup node by node (waiting for cleanup to finish on one node before starting it on the next)?

On Sat, 22 Sep 2018 16:02:17 +0330 onmstester onmstester wrote: [plan and earlier thread quoted above]
High CPU usage on writer application
Hi, my app writes 100K rows per second to a C* cluster (30 nodes, version 3.11.2). There are 20 threads, each writing 10K statements at a time (the list in the code below holds 100K statements) using the async API:

    for (Statement s : list) {
        ResultSetFuture future = session.executeAsync(s);
        tasks.add(future);
        if (tasks.size() < 1)
            continue;
        for (ResultSetFuture t : tasks)
            t.getUninterruptibly(1, TimeUnit.MILLISECONDS);
        tasks.clear();
    }
    if (tasks.size() != 0) {
        for (ResultSetFuture t : tasks)
            t.getUninterruptibly(1, TimeUnit.MILLISECONDS);
    }

CPU usage of the loader application is > 80% on a 20-core Xeon. Sampling with jvisualvm, the top methods by percentage of total CPU time are:

io.netty.channel.epoll.Native.epollWait0 -- 40%
shade.com.datastax.spark.connector.google.common.util.concurrent.AbstractFuture$Sync.get() -- 10%
com.datastax.driver.core.RequestHandler.init -- 10%

It looks like the code checks for completion of all tasks every few nanoseconds. Is there any workaround to decrease the CPU usage of my application, which is currently the bottleneck?
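One commonly suggested alternative (a sketch under assumptions, not the list's reply): instead of polling every future with a 1 ms getUninterruptibly, bound the number of in-flight requests with a semaphore released from a completion callback, so threads block instead of spin. This assumes the Guava bundled with driver 3.x, where the two-argument Futures.addCallback overload exists; the limit of 1024 is an arbitrary tunable.

```java
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import java.util.List;
import java.util.concurrent.Semaphore;

public class BoundedAsyncWriter {
    private static final int MAX_IN_FLIGHT = 1024; // tunable assumption
    private static final Semaphore inFlight = new Semaphore(MAX_IN_FLIGHT);

    static void write(Session session, List<Statement> list) throws InterruptedException {
        for (Statement s : list) {
            inFlight.acquire(); // blocks cheaply instead of busy-polling futures
            ResultSetFuture future = session.executeAsync(s);
            Futures.addCallback(future, new FutureCallback<ResultSet>() {
                public void onSuccess(ResultSet rs) { inFlight.release(); }
                public void onFailure(Throwable t) { inFlight.release(); /* log or retry here */ }
            });
        }
        // Drain the tail: reacquire all permits, then hand them back.
        inFlight.acquire(MAX_IN_FLIGHT);
        inFlight.release(MAX_IN_FLIGHT);
    }
}
```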
how to configure the Token Allocation Algorithm
Since I failed to find a document on how to configure and use the token allocation algorithm (to replace the random algorithm), I just want to confirm the procedure I followed:

1. Using Apache Cassandra 3.11.2.
2. Configured one of the seed nodes with num_tokens: 8 and started it.
3. Using cqlsh, created keyspace test with NetworkTopologyStrategy and RF=3.
4. Stopped the seed node.
5. Added this line to cassandra.yaml on all nodes (all have num_tokens: 8) and started the cluster: allocate_tokens_for_keyspace: test

My cluster size won't go beyond 150 nodes. Should I still use the allocation algorithm instead of random with 256 tokens (performance-wise or load-balance-wise)? Is the allocation algorithm widely used and tested by the community, and can clusters of any size be migrated to it safely? Out of curiosity, I wonder how people (e.g., at Apple) configure and maintain token management for clusters with thousands of nodes?
Fwd: Re: how to configure the Token Allocation Algorithm
Thanks Alain. What if, instead of running that Python and having one node with a non-vnode config, I remove the first seed node and re-add it after the cluster is fully up, so that the first seed node's token ranges are also assigned by the allocation algorithm?

Forwarded message From: Alain RODRIGUEZ To: "user cassandra.apache.org" Date: Mon, 01 Oct 2018 13:14:21 +0330 Subject: Re: how to configure the Token Allocation Algorithm

Hello, your process looks good to me :). Still, a couple of comments to make it more efficient (hopefully).

Improving step 2: I believe you can actually get a slightly better distribution by picking the tokens for the (first) seed node, to prevent the node from randomly calculating its token ranges. You can calculate the token ranges using the following Python code:

$ python  # start the Python shell
>>> number_of_tokens = 8
>>> [str(((2**64 / number_of_tokens) * i) - 2**63) for i in range(number_of_tokens)]
['-9223372036854775808', '-6917529027641081856', '-4611686018427387904', '-2305843009213693952', '0', '2305843009213693952', '4611686018427387904', '6917529027641081856']

Set 'initial_token' to the above list (comma-separated) and the number of vnodes to 'num_tokens: 8'. This technique proved to be way more efficient (especially for low token numbers / small numbers of nodes). Luckily, it's also easy to test.
Fwd: Re: Re: how to configure the Token Allocation Algorithm
Thanks Alex, you are right; that would be a mistake.

Forwarded message From: Oleksandr Shulgin To: "User" Date: Mon, 01 Oct 2018 13:53:37 +0330 Subject: Re: Re: how to configure the Token Allocation Algorithm

On Mon, Oct 1, 2018 at 12:18 PM onmstester onmstester wrote: "What if instead of running that python and having one node with non-vnode config, i remove the first seed node and re-add it after cluster was fully up? so the token ranges of first seed node would also be assigned by Allocation Alg" I think this is tricky, because the random allocation of the very first tokens from the first seed affects the choice of tokens made by the algorithm on the rest of the nodes: it basically tries to divide the token ranges into more or less equal parts. If your very first 8 tokens resulted in really bad balance, you are not going to remove that imbalance by removing the node; it would still have a lasting effect on the rest of your cluster. -- Alex
Re: Re: Re: how to configure the Token Allocation Algorithm
On Mon, 01 Oct 2018 18:36:03 +0330 Alain RODRIGUEZ wrote: Hello again :). I thought a little bit more about this question, and I was actually wondering if something like this would work. Imagine a 3-node cluster, and create the nodes using:

For all 3 nodes: `num_tokens: 4`
Node 1: `initial_token: -9223372036854775808, -4611686018427387905, -2, 4611686018427387901`
Node 2: `initial_token: -7686143364045646507, -3074457345618258604, 1537228672809129299, 6148914691236517202`
Node 3: `initial_token: -6148914691236517206, -1537228672809129303, 3074457345618258600, 7686143364045646503`

If you know the initial size of your cluster, you can calculate the total number of tokens (number of nodes * vnodes) and use the formula / Python code above to get the tokens. Then use the first token for the first node, move to the second node, use the second token, and repeat. In my case there is a total of 12 tokens (3 nodes, 4 tokens each):

```
>>> number_of_tokens = 12
>>> [str(((2**64 / number_of_tokens) * i) - 2**63) for i in range(number_of_tokens)]
['-9223372036854775808', '-7686143364045646507', '-6148914691236517206', '-4611686018427387905', '-3074457345618258604', '-1537228672809129303', '-2', '1537228672809129299', '3074457345618258600', '4611686018427387901', '6148914691236517202', '7686143364045646503']
```

Using manual initial_token (your idea), what would the procedure be for adding a new node to a long-running cluster?
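The interleaving Alain describes can be automated; a small sketch that reproduces his three token lists exactly (same formula as the Python above, with node n taking every nodes-th token):

```java
import java.math.BigInteger;

public class InitialTokenSketch {
    public static void main(String[] args) {
        int nodes = 3, vnodes = 4;
        int total = nodes * vnodes;
        BigInteger slice = BigInteger.valueOf(2).pow(64).divide(BigInteger.valueOf(total));
        BigInteger min = BigInteger.valueOf(2).pow(63).negate();
        for (int n = 0; n < nodes; n++) {
            StringBuilder tokens = new StringBuilder();
            for (int v = 0; v < vnodes; v++) {
                int i = n + v * nodes; // interleave tokens across nodes
                if (v > 0) tokens.append(", ");
                // token(i) = (2^64 / total) * i - 2^63
                tokens.append(slice.multiply(BigInteger.valueOf(i)).add(min));
            }
            System.out.println("node " + (n + 1) + " initial_token: " + tokens);
        }
    }
}
```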
High CPU usage on some of the nodes due to message coalesce
Three nodes in my cluster show 100% CPU usage, most of it spent in org.apache.cassandra.util.coalesceInternal and SepWorker.run; the most active threads are MessagingService-Incoming. The other nodes are normal. The cluster has 30 nodes, using a rack-aware strategy with 10 racks of 3 nodes each, and the problematic nodes are all configured in one rack. Under normal write load, system.log reports many dropped hint messages (cross node), there are a lot of ParNew GCs of about 700-1000 ms, and the isolated commit log disk is utilized at 80-90%. On startup of these 3 nodes there are a lot of "updating topology" logs (thousands of them pending). Using iperf, I'm sure the network is OK. Checking NTP and mutation counts on each node, load is balanced among the nodes. Using Apache Cassandra 3.11.2. I cannot figure out the root cause of the problem, although there are some obvious symptoms. Best regards
How to validate if network infrastructure is efficient for Cassandra cluster?
Currently, before launching a production cluster, I run 'iperf -s' on half of the cluster and then 'iperf -c $nextIP' on the other half using parallel SSH, so all cluster nodes are connected (paired) simultaneously. Then I examine the iperf results and do the math on whether the switches can keep up with the Cassandra load. I'm afraid that I can't determine the typical packet size of Cassandra traffic, and in real scenarios each node streams to many other nodes. Any better ideas on validating the network before running a cluster?
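If iperf is unavailable, a crude Java equivalent of the same pairing test can be sketched (a probe, not a substitute for iperf's statistics); port 5001 and the 10-second window are arbitrary assumptions:

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class BandwidthProbe {
    // Usage: "java BandwidthProbe server" on half the nodes,
    //        "java BandwidthProbe client <peerIp>" on the other half.
    public static void main(String[] args) throws Exception {
        int port = 5001;
        byte[] buf = new byte[64 * 1024];
        if (args[0].equals("server")) {
            try (ServerSocket ss = new ServerSocket(port);
                 Socket s = ss.accept();
                 InputStream in = s.getInputStream()) {
                long total = 0, start = System.nanoTime();
                int n;
                while ((n = in.read(buf)) != -1) total += n; // read until the client closes
                double secs = (System.nanoTime() - start) / 1e9;
                System.out.printf("%.1f MB/s%n", total / 1e6 / secs);
            }
        } else {
            try (Socket s = new Socket(args[1], port);
                 OutputStream out = s.getOutputStream()) {
                long deadline = System.currentTimeMillis() + 10_000; // stream for ~10 s
                while (System.currentTimeMillis() < deadline) out.write(buf);
            }
        }
    }
}
```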
Fwd: Re: High CPU usage on some of the nodes due to message coalesce
I don't think the root cause is related to Cassandra config, because the nodes are homogeneous and the config is identical on all of them (16 GB heap with default GC); the mutation and native transport counters are also the same on all nodes, yet only these 3 nodes experience 100% CPU usage (the others are below 20%). I even decommissioned these 3 nodes from the cluster and re-added them, but the result is the same. The cluster is fine without these 3 nodes (while they are decommissioned).

Forwarded message From: Chris Lohfink Date: Sat, 20 Oct 2018 23:24:03 +0330 Subject: Re: High CPU usage on some of the nodes due to message coalesce

1s young GCs are horrible and likely the cause of some of your bad metrics. How large are your mutations/query results, and what GC/heap settings are you using? You can use https://github.com/aragozin/jvm-tools to see the threads generating allocation pressure and using the CPU (ttop) and what garbage is being created (hh --dead-young). Just a shot in the dark: I would guess you have rather large mutations putting pressure on the commit log and heap. G1 with a larger heap might help in that scenario, reducing fragmentation and adjusting its eden and survivor regions to the allocation rate (give it a bigger reserve space), but there are limits to what can help if you can't change your workload. Without more info on the schema etc. it's hard to tell, but maybe that can give you some ideas on places to look. It could just as likely be repair coordination, wide-partition reads, or compactions, so you need to look more at what within the app is causing the pressure, to know whether it can be improved with settings or whether the load your application produces exceeds what your cluster can handle (it needs more nodes). Chris

On Oct 20, 2018, at 5:18 AM, onmstester onmstester wrote: [original question quoted above]
Re: Re: High CPU usage on some of the nodes due to message coalesce
"What takes the most CPU? System or user?" Most of it is spent in org.apache.cassandra.util.coalesceInternal and SepWorker.run. "Did you try removing a problematic node and installing a brand new one (instead of re-adding)?" I did not install a new node, but I did remove the problematic node, and CPU load across the whole cluster became normal again. "When you decommissioned these nodes, did the high CPU 'move' to other nodes (probably data model/query issues), or was it completely gone (server issues)?" It was completely gone.
Fwd: Re: Re: High CPU usage on some of the nodes due to message coalesce
"Any cron or other scheduler running on those nodes?" No. "Lots of Java processes running simultaneously?" No, just Apache Cassandra. "Heavy repair continuously running?" None. "Lots of pending compactions?" None; the CPU goes to 100% in the first seconds of insert (write load), before any memtable has been flushed. "Is the number of CPU cores the same on all the nodes?" Yes, 12. "Did you try rebooting one of the nodes?" Yes, I cold-rebooted all of them once; no luck! Thanks for your time.
Fwd: A quick question on unlogged batch
Read this: https://docs.datastax.com/en/cql/3.3/cql/cql_reference/batch_r.html. Please use batches (of any type) only for statements that concern a single partition; otherwise they cause a lot of performance degradation on your cluster, and after a while throughput will be much lower than with parallel single statements via executeAsync.

Forwarded message From: wxn...@zjqunshuo.com To: "user" Date: Thu, 01 Nov 2018 10:48:33 +0330 Subject: A quick question on unlogged batch

Hi all, what's the difference between a logged batch and an unlogged batch? I'm asking because I'm seeing the WARNINGs below after a new app started writing to the cluster. WARNING in system.log: Unlogged batch covering 135 partitions detected against table [cargts.eventdata]. You should use a logged batch for atomicity, or asynchronous writes for performance. Best regards, -Simon
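To make the single-partition rule concrete, a sketch of the acceptable kind of unlogged batch: every statement targets the same partition key, so the batch is applied by one replica set instead of fanning out to 135 partitions as in Simon's warning. The cargts.eventdata column names and types are assumptions, since only the table name appears in the log line.

```java
import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class SinglePartitionBatchSketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("cargts")) {
            PreparedStatement ins = session.prepare(
                    "INSERT INTO eventdata (device_id, ts, value) VALUES (?, ?, ?)");
            BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
            // All rows share device_id = 1, i.e. one partition -> one coordinator hop.
            for (int i = 0; i < 100; i++) {
                batch.add(ins.bind(1, (long) i, "v" + i));
            }
            session.execute(batch);
        }
    }
}
```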
Fwd: Re: How to set num tokens on live node
IMHO, the best option with two datacenters is to configure the replication strategy to stream data from the DC with the wrong num_tokens to the correct one, and then run a repair on each node to move your data to the other DC.

Forwarded message From: Goutham reddy Date: Fri, 02 Nov 2018 10:46:10 +0330 Subject: Re: How to set num tokens on live node

Elliott, thanks. How about if we have two datacenters, any comments? Thanks and regards, Goutham.

On Thu, Nov 1, 2018 at 5:40 PM Elliott Sims wrote: As far as I know, it's not possible to change it live. You have to create a new "datacenter" with new hosts using the new num_tokens value, then switch everything to the new DC and tear down the old one.

On Thu, Nov 1, 2018 at 6:16 PM Goutham reddy wrote: Hi team, can someone help me out? I can't find anywhere how to change num_tokens on running nodes. Any help is appreciated. Thanks and regards, Goutham.
Fwd: Re: Re: How to set num tokens on live node
I think that is not possible. If both DCs are currently in use, you should remove one of them gently (by changing the replication config), change num_tokens in the removed DC, add it back (changing the replication config again), and finally do the same for the other DC.

P.S. A while ago there was a thread in this forum arguing that num_tokens = 256 is not a good default in Cassandra and that a smaller number like 4, 8, or 16 should be used. I recommend you read it through; maybe the whole migration (from 8 to 256) is unnecessary.

Forwarded message From: Goutham reddy Date: Fri, 02 Nov 2018 11:52:53 +0330 Subject: Re: Re: How to set num tokens on live node

Onmstester, thanks for the reply, but for both DCs I need to change my num_tokens value from 8 to 256, so that is the challenge I am facing. Any comments? Thanks and regards, Goutham. On Fri, Nov 2, 2018 at 1:08 AM onmstester onmstester wrote: [earlier messages quoted above]
Fwd: Re: A quick question on unlogged batch
"Unlogged batch meaningfully outperforms parallel execution of individual statements, especially at scale, and creates lower memory pressure on both the clients and the cluster." They do outperform parallel individual statements, but at the cost of higher pressure on coordinators, which leads to more blocked native transport requests and dropped mutations. Actually, I think the 10-20% better write performance plus 20-30% less CPU usage on client machines (we care less about client machines than cluster machines) that multi-partition batches buy is not worth it: less-busy cluster nodes are needed to serve read queries, compactions, repairs, etc.

"The biggest downside to unlogged batches is that the unit of retry on failure is the entire batch. So if you use a retry policy, write timeouts will tip over your cluster a lot faster than individual statements. Bounding your batch sizes helps mitigate this risk." I assume that in most scenarios the client machines are on the same network as the Cassandra cluster; is it still faster then?

"Thank you all. Now I understand whether to use batch or asynchronous writes really depends on the use case. So far batch writes work for me on an 8-node cluster with over 500 million requests per day." Did you compare cluster performance (blocked native transport requests, dropped mutations, 95th percentiles, cluster CPU usage, etc.) in the two scenarios (batch vs. single)? Although 500M per day is not that much for an 8-node cluster (if the node spec complies with DataStax recommendations) and async single statements could handle it (just demanding more CPU on the client machines), the impact of such things (non-compliant batch statements stressing the cluster) tends to show up after some weeks, when suddenly a lot of cluster tasks must run simultaneously: one or two big compactions running on most of the nodes, some hinted handoffs, and the cluster can't keep up and becomes slower and slower. The way to prevent that is to keep the error counters as low as possible: blocked native transport requests, drops, errors, hinted handoffs, latencies, etc.
Multiple cluster for a single application
Hi, one of my applications requires a cluster with more than 100 nodes, and I've read documents recommending clusters of fewer than 50 or 100 nodes (Netflix runs hundreds of clusters with fewer than 100 nodes each). Is it a good idea to use multiple clusters for a single application, just to reduce maintenance problems and system complexity, and to improve performance? If so, which of the policies below is more suitable for distributing data among clusters, and why?

1. Each cluster is responsible for a specific subset of tables only (table sizes are almost equal, so the math is easy); for example, inserts to table X go to cluster Y.
2. Shard the data at the loader level by some business-logic grouping; for example, all rows with some column starting with X go to cluster Y (see the sketch after this list).

I would appreciate you sharing your experiences with big clusters: problems encountered and solutions. Thanks in advance.
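A sketch of what policy 2 could look like on the loader side (assumptions: keyspace logs, shard choice by hash of a business key; a real deployment would also need retry and shutdown handling):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class MultiClusterRouter {
    private final Session[] shards;

    MultiClusterRouter(String... contactPoints) {
        // One driver Cluster/Session per physical Cassandra cluster.
        shards = new Session[contactPoints.length];
        for (int i = 0; i < contactPoints.length; i++) {
            shards[i] = Cluster.builder().addContactPoint(contactPoints[i]).build().connect("logs");
        }
    }

    // Route each record by a business key so a given logical group
    // always lands on the same (small) cluster.
    Session route(String businessKey) {
        return shards[Math.floorMod(businessKey.hashCode(), shards.length)];
    }
}
```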
Fwd: Re: Multiple cluster for a single application
Thank you all. Actually, "the documents" I mentioned in my question were a YouTube talk I saw a long time ago and could not find again. Also, noticing that companies like Netflix build hundreds of clusters of tens of nodes each and say it's much more stable, I concluded that big clusters are not recommended. I see some of the reasons in your answers: the problem with the dynamic snitch, and the probability of node failures, which simply increases with more nodes in a cluster and could even cause a cluster outage.
Fwd: RE : issue while connecting to apache-cassandra-3.11.1 hosted on a remote VM.
Also set rpc_address to your remote IP address and restart Cassandra. Run nodetool status on the Cassandra node to be sure it's running properly. The port you should look for and connect to is 9042; 7199 is the JMX port.

Forwarded message From: Gaurav Kumar To: "d...@cassandra.apache.org" Date: Fri, 16 Nov 2018 13:13:56 +0330 Subject: RE: issue while connecting to apache-cassandra-3.11.1 hosted on a remote VM.

Hi, whenever I try to connect to apache-cassandra-3.11.1, I get the exception: Unexpected client failure - null. The detailed explanation: JMXConnectionPool.getJMXConnection - Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host; nested exception is: java.net.ConnectException: Connection refused (Connection refused)]. I tried the following workarounds: 1) adding the IP address of the machine where the server is installed to /etc/hosts (for Linux); 2) adding the IP address of the machine to cassandra.yaml (as seed and listen addresses); 3) checking that the proper variables are set for Java and Cassandra. However, when executing "netstat -an | grep 7199", I still see 127.0.0.1 as the bound IP. Can you suggest any change in the configuration or the connection mechanism of apache-cassandra-3.11.1? My application works fine with apache-cassandra-3.0.15. Kindly revert ASAP. Thanks and regards, Gaurav Kumar - Software Engineer
How to gracefully decommission a highly loaded node?
One node suddenly went to 100% CPU. I suspect hardware problems and don't have time to trace them, so I decided to just remove the node from the cluster. Although the node's state changed to UL, there is no sign of leaving: the node is still compacting, flushing memtables, and writing mutations, and the CPU has been at 100% for hours since. Is there any way to force a Cassandra node to just decommission and stop doing its normal work? Because writes use CL=ONE, I cannot simply run removenode and shut the node down. Best regards
Fwd: Re: How to gracefully decommission a highly loaded node?
After a long time stuck in LEAVING, and "not doing any streams", I killed the Cassandra process, restarted it, and ran nodetool decommission again (the DataStax recipe for a stuck decommission); now it says LEAVING, "unbootstrap $(the node id)". What's going on? Should I forget about decommission and just remove the node? There is an issue to make decommission resumable: https://issues.apache.org/jira/browse/CASSANDRA-12008 but I couldn't figure out how this is supposed to work. I was expecting that after restarting the stuck decommissioning node, it would resume the decommissioning process, but the node came back as UN after the restart. Sent using Zoho Mail
Forwarded message From: Simon Fontana Oscarsson To: "user@cassandra.apache.org" Date: Tue, 04 Dec 2018 15:20:15 +0330 Subject: Re: How to gracefully decommission a highly loaded node?
Hi, If it is already using 100% CPU, I have a hard time seeing it being able to do a decommission while serving requests. If you have a lot of free space, I would first try nodetool disableautocompaction. If you don't see any progress in nodetool netstats, you can also use disablebinary, disablethrift and disablehandoff to stop serving client requests. -- SIMON FONTANA OSCARSSON, Software Developer, Ericsson
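Simon's suggestion above, spelled out as a plain command sequence on the leaving node (Cassandra 3.x; run in this order and re-check netstats between steps):

    nodetool disableautocompaction   # stop starting new compactions (needs free disk headroom)
    nodetool disablebinary           # stop serving native-protocol (CQL) clients
    nodetool disablethrift           # stop serving Thrift clients
    nodetool disablehandoff          # stop accepting hinted handoff
    nodetool netstats                # check whether streaming now makes progress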
Fwd: Re: How to gracefully decommission a highly loaded node?
After a few hours, I just removed the node. Another node was then decommissioned, and it finished successfully (the writer application was down, so there was no pressure on the cluster). I started a third node's decommission; since I didn't have time to wait for it to finish, I started the writer application when almost all of the decommissioning node's streaming was done and only a few GBs to two other nodes remained to be streamed. After 12 hours I checked the decommissioning node and netstats said: LEAVING, Restore Replica Count! So I ran removenode on this one too. Is there something wrong with decommissioning while someone is writing to the cluster? Using Apache Cassandra 3.11.2. Sent using Zoho Mail
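For reference, the removenode path used here looks like this (run from any live node; the host ID placeholder comes from nodetool status):

    nodetool status                 # note the Host ID of the dead/stuck node
    nodetool removenode <host-id>   # re-replicate its ranges from surviving nodes
    nodetool removenode status      # check progress of the removal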
Fwd: Cassandra doesn't launch since computer was accidentally unplugged
Delete the file C:\Program Files\DataStax-DDC\data\commitlog\CommitLog-6-1542650688953.log and restart Cassandra. It's possible that you will lose the little bit of data that existed only in this log (which does not matter if you have replicas or can re-insert the data). Sent using Zoho Mail
Forwarded message From: Will Mackle Date: Sat, 08 Dec 2018 11:56:00 +0330 Subject: Cassandra doesn't launch since computer was accidentally unplugged
Hello, I am a novice Cassandra user and am looking for some insight into my situation: the computer I was using to run Cassandra was accidentally unplugged by my friend, and since then I have not been able to relaunch Cassandra. I have included a chunk of the log file below. It looks to me like the corrupt commit log files are the issue, but I would like to confirm that this error does not depend on the earlier JMX error. Does the JMX error affect Cassandra's launch if Cassandra is only accessed from the computer it is running on? I have the port assigned in cassandra-env.sh, so it is really confusing to me why this error occurs. With respect to the log file corruption, is there any way to recover/repair it? I'm assuming that if I delete the log file to get Cassandra to launch, I will lose data; am I correct in this assumption? I left out some lines of the log file that were not errors or warnings; if it is important for me to include them, I can do so. I'm simply not sure whether any info from the log file is a security risk to share.
INFO 14:50:01 JVM Arguments: [-ea, -javaagent:C:\Program Files\DataStax-DDC\apache-cassandra\lib\jamm-0.3.0.jar, -Xms1G, -Xmx1G, -XX:+HeapDumpOnOutOfMemoryError, -XX:+UseParNewGC, -XX:+UseConcMarkSweepGC, -XX:+CMSParallelRemarkEnabled, -XX:SurvivorRatio=8, -XX:MaxTenuringThreshold=1, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -Dcom.sun.management.jmxremote.port=7199, -Dcom.sun.management.jmxremote.ssl=false, -Dcom.sun.management.jmxremote.authenticate=false, -Dlog4j.configuration=log4j-server.properties, -Dlog4j.defaultInitOverride=true, -DCassandra]
WARN 14:50:01 JNA link failure, one or more native method will be unavailable.
WARN 14:50:01 JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info.
ERROR 14:50:01 cassandra.jmx.local.port missing from cassandra-env.sh, unable to start local JMX service.
WARN 14:50:01 Use of com.sun.management.jmxremote.port at startup is deprecated. Please use cassandra.jmx.remote.port instead.
---
INFO 14:50:05 Not submitting build tasks for views in keyspace system as storage service is not initialized
WARN 14:50:05 JMX settings in cassandra-env.sh have been bypassed as the JMX connector server is already initialized. Please refer to cassandra-env.(sh|ps1) for JMX configuration info
INFO 14:50:07 Populating token metadata from system tables
---
INFO 14:50:08 Completed loading (15 ms; 26 keys) KeyCache cache
INFO 14:50:08 Replaying C:\Program Files\DataStax-DDC\data\commitlog\CommitLog-6-1542650688952.log, C:\Program Files\DataStax-DDC\data\commitlog\CommitLog-6-1542650688953.log, C:\Program Files\DataStax-DDC\data\commitlog\CommitLog-6-1542987010987.log, C:\Program Files\DataStax-DDC\data\commitlog\CommitLog-6-1542987613467.log, C:\Program Files\DataStax-DDC\data\commitlog\CommitLog-6-1542990216101.log
ERROR 14:50:09 Exiting due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Could not read commit log descriptor in file C:\Program Files\DataStax-DDC\data\commitlog\CommitLog-6-1542650688953.log
    at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:155) [apache-cassandra-3.9.0.jar:3.9.0]
    at org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:85) [apache-cassandra-3.9.0.jar:3.9.0]
    at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:135) [apache-cassandra-3.9.0.jar:3.9.0]
    at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:187) [apache-cassandra-3.9.0.jar:3.9.0]
    at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:167) [apache-cassandra-3.9.0.jar:3.9.0]
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:323) [apache-cassandra-3.9.0.jar:3.9.0]
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601) [apache-cassandra-3.9.0.jar:3.9.0]
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:730) [apache-cassandra-3.9.0.jar:3.9.0]
Any help/insight is much appreciated, Thanks
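A minimal sketch of the suggested fix on Windows (stop the Cassandra service first; moving the segment aside is safer than deleting it outright, and the backup path is a placeholder):

    :: run from an elevated command prompt after stopping the Cassandra service
    mkdir C:\commitlog-backup
    move "C:\Program Files\DataStax-DDC\data\commitlog\CommitLog-6-1542650688953.log" C:\commitlog-backup\
    :: then start the Cassandra service again and watch the log for a clean replay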
slow commitlog sync
Hi, I'm seeing a lot of log entries like this on all of my nodes (every 5 minutes):
WARN [PERIODIC-COMMIT-LOG-SYNCER] 2018-05-23 08:59:19,075 NoSpamLogger.java:94 - Out of 50 commit log syncs over the past 300s with average duration of 300.00ms, 30 have exceeded the configured commit interval by an average of 400.00ms
Should I worry about it? If not, which parameter should I tune? Using C* 3.11.2 and a separate disk for the commitlog (7200 rpm). Best Regards Sent using Zoho Mail
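The "configured commit interval" in that warning comes from these cassandra.yaml settings (the values shown are the 3.11 defaults; the warning usually means the commitlog disk cannot keep up rather than that the settings themselves are wrong):

    commitlog_sync: periodic
    commitlog_sync_period_in_ms: 10000   # the interval the warning is measured against

Raising the period only hides the symptom; a faster (ideally SSD) commitlog disk addresses the cause.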
Fwd: Question about allocate_tokens_for_keyspace
You can only specify one keyspace as the value of allocate_tokens_for_keyspace; it tells the algorithm which keyspace's replication to optimize token allocation for. So as long as your keyspaces use similar replication strategies and the same replication factor, you should not worry about this. For more detail, read this doc: https://www.datastax.com/dev/blog/token-allocation-algorithm Sent using https://www.zoho.com/mail/
Forwarded message From: Ahmed Eljami Date: Mon, 28 Jan 2019 12:14:24 +0330 Subject: Question about allocate_tokens_for_keyspace
Hi Folks, I'm about to configure a new cluster with num_tokens = 32, using the new token allocation. For the first keyspace, I understood that it will be used to start my cluster: allocate_tokens_for_keyspace = my_first_ks. My question is: what about the rest of the keyspaces? Will they take the same configuration as the first keyspace, or do I have to add them to cassandra.yaml and restart my cluster each time? Thanks
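Concretely, the setting under discussion is a single pair of lines in cassandra.yaml, in effect when each new node bootstraps (keyspace name taken from the question below):

    num_tokens: 32
    allocate_tokens_for_keyspace: my_first_ks   # set once; not repeated per keyspace

Other keyspaces with the same replication factor get well-balanced ownership automatically; no per-keyspace restart is needed.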
forgot to run nodetool cleanup
Hi, I should have run cleanup after adding a few nodes to my cluster about 2 months ago; the TTL is 6 months. What happens now? Should I worry about anything catastrophic? Should I run cleanup now? Thanks in advance Sent using https://www.zoho.com/mail/
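For reference, cleanup is a per-node operation on the nodes that existed before the expansion; it only rewrites SSTables locally to drop ranges the node no longer owns, so running it late is safe:

    nodetool cleanup             # run on each pre-existing node, one at a time
    nodetool compactionstats     # watch the cleanup compactions progress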