Re: No node was available to execute query error

2021-03-17 Thread Kane Wilson
2021-03-16 … Also, there are implementations of Spark that will create the proper, single partition queries for large data sets. DataStax Analytics is one example (spark runs on each node). Sean Durity – Staff Systems Engineer, Cassand

Re: No node was available to execute query error

2021-03-17 Thread Joe Obernberger
user@cassandra.apache.org *Subject:* [EXTERNAL] Re: No node was available to execute query error There are different approaches, depending on the application's logic. Roughly speaking, there are two distinct scenarios: 1. Your application knows all the partition keys of the requi

RE: No node was available to execute query error

2021-03-16 Thread Durity, Sean R
, single partition queries for large data sets. DataStax Analytics is one example (spark runs on each node). Sean Durity – Staff Systems Engineer, Cassandra From: Bowen Song Sent: Monday, March 15, 2021 5:27 PM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: No node was available to execute

Re: No node was available to execute query error

2021-03-15 Thread Bowen Song
There are different approaches, depending on the application's logic. Roughly speaking, there are two distinct scenarios: 1. Your application knows all the partition keys of the required data in advance, either by reading them from another data source (e.g.: another Cassandra table, other da

Re: No node was available to execute query error

2021-03-15 Thread Joe Obernberger
Thank you. What is the best way to iterate over a very large number of rows in Cassandra?  I know the DataStax driver lets Java do blocks of n records, but is that the best way? -joe On 3/15/2021 1:42 PM, Bowen Song wrote: I personally try to avoid using secondary indexes, especially in l

Re: No node was available to execute query error

2021-03-15 Thread Bowen Song
I personally try to avoid using secondary indexes, especially in large clusters. SI is not scalable: because an SI query doesn't have the partition key information, Cassandra must send it to nearly all nodes in a DC to get the answer. Thus, the more nodes you have in a cluster, the slower and

Re: No node was available to execute query error

2021-03-15 Thread Joe Obernberger
Great stuff - thank you.  I've spent the morning here redesigning with smaller partitions. If I have a large number of unique IDs that I want to regularly 'do something' with, would it make sense to have a table where a UUID is the partition key, and create a secondary index on a field (call

Re: No node was available to execute query error

2021-03-15 Thread Bowen Song
To be clear, this CREATE TABLE ... PRIMARY KEY (k1, k2); is the same as: CREATE TABLE ... PRIMARY KEY ((k1), k2); but they are NOT the same as: CREATE TABLE ... PRIMARY KEY ((k1, k2)); The first two statements create a table with a partition key k1 and a clustering key k2. The 3rd
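The difference between the two key layouts can be illustrated with a small sketch. This is plain Python, not Cassandra or driver code: the helper `group_into_partitions` and the sample rows are hypothetical, and it only models the general rule that rows sharing the same partition key values land in the same partition.

```python
from collections import defaultdict


def group_into_partitions(rows, partition_key_columns):
    """Group rows by the values of their partition key columns.

    Hypothetical helper for illustration only: it mimics how rows that
    share a partition key end up in the same partition.
    """
    partitions = defaultdict(list)
    for row in rows:
        key = tuple(row[column] for column in partition_key_columns)
        partitions[key].append(row)
    return dict(partitions)


# Made-up sample rows with two key columns.
rows = [
    {"k1": "a", "k2": 1},
    {"k1": "a", "k2": 2},
    {"k1": "b", "k2": 1},
]

# PRIMARY KEY (k1, k2) or PRIMARY KEY ((k1), k2): k1 alone is the partition key,
# so both "a" rows share one partition and k2 only orders rows inside it.
by_k1 = group_into_partitions(rows, ["k1"])

# PRIMARY KEY ((k1, k2)): both columns form the partition key,
# so every distinct (k1, k2) pair is its own partition.
by_k1_k2 = group_into_partitions(rows, ["k1", "k2"])

print(len(by_k1), len(by_k1_k2))  # prints "2 3"
```

With k1 alone as the partition key, a partition grows with every new k2 value written under the same k1, which is why the choice between these layouts affects partition size.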

Re: No node was available to execute query error

2021-03-15 Thread Joe Obernberger
Thank you Bowen - I'm redesigning the tables now.  When you give Cassandra two parts to the primary key like create table xyz (uuid text, source text, primary key (source, uuid)); How is the second part of the primary key used to determine partition size? -Joe On 3/12/2021 5:27 PM, Bowen Song

Re: No node was available to execute query error

2021-03-12 Thread Bowen Song
The partition size min/avg/max of 8409008/15096925/25109160 bytes looks fine for the table fieldcounts, but the number of partitions is a bit worrying. Only 3 partitions? Are you expecting the partition size (instead of number of partitions) to grow in the future? That can lead to a lot of hea

Re: No node was available to execute query error

2021-03-12 Thread Joe Obernberger
Thank you very much for helping me out on this!  The table fieldcounts is currently pretty small - 6.4 million rows. cfstats are: Total number of tables: 81 Keyspace : doc         Read Count: 3713134         Read Latency: 0.2664131157130338 ms         Writ

Re: No node was available to execute query error

2021-03-12 Thread Bowen Song
The highlight is "millions rows in a **single** query". Fetching that amount of data in a single query is bad, because of the Java heap memory overhead. You can fetch millions of rows in Cassandra, just make sure you do that over thousands or millions of queries, not one single query. On 12/03/2
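One common way to spread a large read over many small queries is to split the token ring into sub-ranges and issue one query per sub-range. The sketch below is an assumption-laden illustration, not something taken from this thread: it assumes the default Murmur3Partitioner (tokens from -2^63 to 2^63-1), the table name `doc.example_table` and partition key column `id` are hypothetical, and it only builds CQL strings rather than talking to a cluster.

```python
# Token bounds under Murmur3Partitioner (assumed to be the partitioner in use).
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_ring(num_splits):
    """Return num_splits contiguous (start, end) token ranges covering the ring."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    ring_size = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (ring_size * i) // num_splits
              for i in range(num_splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def range_queries(table, partition_key, num_splits):
    """Build one CQL SELECT per token sub-range.

    Every range is (start, end], except the first, which also includes its
    start so that no token is skipped.
    """
    queries = []
    for i, (start, end) in enumerate(split_token_ring(num_splits)):
        lower_op = ">=" if i == 0 else ">"
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE token({partition_key}) {lower_op} {start} "
            f"AND token({partition_key}) <= {end}"
        )
    return queries


if __name__ == "__main__":
    # Hypothetical table and partition key column, for illustration only.
    for query in range_queries("doc.example_table", "id", 4):
        print(query)
```

Each generated query touches only a slice of the ring, so the number of splits can be raised until every query returns a manageable amount of data; driver-side paging can still be applied within each query.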

Re: No node was available to execute query error

2021-03-12 Thread Bowen Song
The fact that sleep-then-retry works is just another indicator that it's likely a GC pause related issue. I'd recommend checking your Cassandra servers' GC logs first. Do you know what the maximum partition size for the doc.fieldcounts table is? (Try the "nodetool cfstats doc.fieldcounts" command) I susp

Re: No node was available to execute query error

2021-03-12 Thread Joe Obernberger
One question on the 'millions rows in a single query'.  How would you process that many rows?  At some point, I'd like to be able to process 10-100 billion rows.  Isn't that something that can be done with Cassandra?  I'm coming from HBase where we'd run map reduce jobs. Thank you. -Joe O

Re: No node was available to execute query error

2021-03-12 Thread Joe Obernberger
The queries that are failing are: select fieldvalue, count from doc.ordered_fieldcounts where source=? and fieldname=? limit 10 Created with: CREATE TABLE doc.ordered_fieldcounts (     source text,     fieldname text,     count bigint,     fieldvalue text,     PRIMARY KEY ((sour

Re: No node was available to execute query error

2021-03-12 Thread Bowen Song
Millions of rows in a single query? That sounds like a bad idea to me. Your "NoNodeAvailableException" could be caused by stop-the-world GC pauses, and the GC pauses are likely caused by the query itself. On 12/03/2021 13:39, Joe Obernberger wrote: Thank you Paul and Erick.  The keyspace is defi

Re: No node was available to execute query error

2021-03-12 Thread Joe Obernberger
Thank you Paul and Erick.  The keyspace is defined like this: CREATE KEYSPACE doc WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'}  AND durable_writes = true; Would that cause this? The program that is having the problem selects data, calculates stuff, and inserts.

Re: No node was available to execute query error

2021-03-12 Thread Paul Chandler
Hi Joe, this could also be caused by the replication factor of the keyspace: if you have NetworkTopologyStrategy and it doesn’t list a replication factor for the datacenter datacenter1, then you will get this error message too. Paul On 12 Mar 2021, at 13:07, Erick Ramirez wrote: Does it

Re: No node was available to execute query error

2021-03-12 Thread Erick Ramirez
Does it get returned by the driver every single time? The NoNodeAvailableException gets thrown when (1) all nodes are down, or (2) all the contact points are invalid from the driver's perspective. Is it possible there's no route/connectivity from your app server(s) to the 172.16.x.x network? If yo