2021-03-16
> …
> Also, there are implementations of Spark that will create the proper,
> single partition queries for large data sets. DataStax Analytics is one
> example (spark runs on each node).
>
> Sean Durity – Staff Systems Engineer, Cassandra
*To:* user@cassandra.apache.org
*Subject:* [EXTERNAL] Re: No node was available to execute query error
There are different approaches, depending on the application's logic.
Roughly speaking, there are two distinct scenarios:
1. Your application knows all the partition keys of the required data …
Also, there are implementations of Spark that will create the proper, single
partition queries for large data sets. DataStax Analytics is one example (spark
runs on each node).
Sean Durity – Staff Systems Engineer, Cassandra
From: Bowen Song
Sent: Monday, March 15, 2021 5:27 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: No node was available to execute query error
There are different approaches, depending on the application's logic.
Roughly speaking, there are two distinct scenarios:
1. Your application knows all the partition keys of the required data
in advance, either by reading them from another data source (e.g.:
another Cassandra table, other da…
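For scenario 1, the usual pattern is to turn one huge query into many small single-partition queries, one per known key. A minimal sketch, assuming the DataStax Java driver 4.x; the doc.fieldcounts column names used here are assumptions for illustration, not confirmed by the thread:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.Row;
import java.util.List;

public class KnownKeysFanOut {
    public static void main(String[] args) {
        // Hypothetical list of partition keys read from another data source.
        List<String> sources = List.of("sourceA", "sourceB", "sourceC");

        try (CqlSession session = CqlSession.builder().build()) {
            // Prepare once, then issue one cheap single-partition query per key.
            PreparedStatement ps = session.prepare(
                "SELECT fieldname, count FROM doc.fieldcounts WHERE source = ?");
            for (String source : sources) {
                for (Row row : session.execute(ps.bind(source))) {
                    // Process each row as it arrives instead of collecting them all.
                    System.out.println(row.getString("fieldname") + " -> " + row.getLong("count"));
                }
            }
        }
    }
}

Each of those queries is routed straight to the replicas that own the partition, which is essentially what the Spark-based approach does for you automatically.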
Thank you.
What is the best way to iterate over a very large number of rows in
Cassandra? I know the datastax driver lets java do blocks of n
records, but is that the best way?
-joe
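The "blocks of n records" the driver gives you is automatic paging, and for iterating a large result set that is usually all that is needed. A minimal sketch, assuming the DataStax Java driver 4.x and the doc.ordered_fieldcounts table shown elsewhere in the thread; the bound values are placeholders:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.Row;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

public class PagedIteration {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // Fetch rows in pages of 500 instead of pulling everything in one go.
            SimpleStatement stmt = SimpleStatement
                .newInstance(
                    "SELECT fieldvalue, count FROM doc.ordered_fieldcounts "
                  + "WHERE source = ? AND fieldname = ?",
                    "someSource", "someField")   // placeholder values
                .setPageSize(500);

            // The result set fetches the next page transparently as you iterate,
            // so the client only holds roughly one page in memory at a time.
            for (Row row : session.execute(stmt)) {
                System.out.println(row.getString("fieldvalue") + " = " + row.getLong("count"));
            }
        }
    }
}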
On 3/15/2021 1:42 PM, Bowen Song wrote:
I personally try to avoid using secondary indexes, especially in large
clusters.
SI is not scalable: because a SI query doesn't have the partition key
information, Cassandra must send it to nearly all nodes in a DC to get
the answer. Thus, the more nodes you have in a cluster, the slower and …
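To make that contrast concrete, here is a small illustration using a hypothetical doc.items table and index (neither is from the thread); the difference is entirely in whether the WHERE clause pins down a partition key:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;
import java.util.UUID;

public class IndexVsPartitionKey {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // Hypothetical schema, for illustration only:
            //   CREATE TABLE doc.items (id uuid PRIMARY KEY, status text);
            //   CREATE INDEX items_status_idx ON doc.items (status);

            // Secondary-index query: no partition key in the WHERE clause, so the
            // coordinator has to ask (nearly) every node in the DC for its slice.
            session.execute(SimpleStatement.newInstance(
                "SELECT id FROM doc.items WHERE status = ?", "NEW"));

            // Partition-key query: routed only to the replicas owning that partition.
            session.execute(SimpleStatement.newInstance(
                "SELECT status FROM doc.items WHERE id = ?", UUID.randomUUID()));
        }
    }
}

The first query's cost grows with the number of nodes, which is why the secondary-index approach stops scaling as the cluster grows.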
Great stuff - thank you. I've spent the morning here redesigning with
smaller partitions.
If I have a large number of unique IDs that I want to regularly 'do
something' with, would it make sense to have a table where a UUID is the
partition key, and create a secondary index on a field (call …
To be clear, this
CREATE TABLE ... PRIMARY KEY (k1, k2);
is the same as:
CREATE TABLE ... PRIMARY KEY ((k1), k2);
but they are NOT the same as:
CREATE TABLE ... PRIMARY KEY ((k1, k2));
The first two statements create a table with a partition key k1 and a
clustering key k2. The 3rd creates a composite partition key made of both
k1 and k2, with no clustering key.
Thank you Bowen - I'm redesigning the tables now. When you give
Cassandra two parts to the primary key like
create table xyz (uuid text, source text, primary key (source, uuid));
How is the second part of the primary key used to determine partition size?
-Joe
On 3/12/2021 5:27 PM, Bowen Song wrote:
The partition size min/avg/max of 8409008/15096925/25109160 bytes looks
fine for the table fieldcounts, but the number of partitions is a bit
worrying. Only 3 partitions? Are you expecting the partition size
(instead of the number of partitions) to grow in the future? That can lead
to a lot of hea…
Thank you very much for helping me out on this! The table fieldcounts
is currently pretty small - 6.4 million rows.
cfstats are:
Total number of tables: 81
Keyspace : doc
       Read Count: 3713134
       Read Latency: 0.2664131157130338 ms
       Write Count: …
The highlight is "millions of rows in a **single** query". Fetching that
amount of data in a single query is bad because of the Java heap memory
overhead. You can fetch millions of rows in Cassandra, just make sure
you do that over thousands or millions of queries, not one single query.
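One way to spread the work over many queries without going fully serial is to run the single-partition queries asynchronously with a cap on how many are in flight at once. A rough sketch, again assuming driver 4.x and illustrative table/column names (the doc.fieldcounts schema is not shown in the thread):

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.Row;
import java.util.List;
import java.util.concurrent.Semaphore;

public class ThrottledFanOut {
    public static void main(String[] args) throws InterruptedException {
        List<String> partitionKeys = List.of("a", "b", "c"); // hypothetical keys
        int maxInFlight = 32;                                // tune for the cluster
        Semaphore permits = new Semaphore(maxInFlight);

        try (CqlSession session = CqlSession.builder().build()) {
            PreparedStatement ps = session.prepare(
                "SELECT fieldname, count FROM doc.fieldcounts WHERE source = ?");
            for (String key : partitionKeys) {
                permits.acquire(); // block while too many queries are already running
                session.executeAsync(ps.bind(key)).whenComplete((rs, error) -> {
                    try {
                        if (error != null) {
                            error.printStackTrace();     // real code: retry or record it
                        } else {
                            for (Row row : rs.currentPage()) {
                                // process row; call rs.fetchNextPage() if rs.hasMorePages()
                            }
                        }
                    } finally {
                        permits.release();
                    }
                });
            }
            // Wait for every outstanding query to finish before closing the session.
            permits.acquire(maxInFlight);
        }
    }
}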
On 12/03/2021 …
That sleep-then-retry works is just another indicator that it's likely a GC
pause related issue. I'd recommend you check your Cassandra servers'
GC logs first.
Do you know what's the maximum partition size for the doc.fieldcounts
table? (Try the "nodetool cfstats doc.fieldcounts" command.) I susp…
One question on the 'millions of rows in a single query'. How would you
process that many rows? At some point, I'd like to be able to process
10-100 billion rows. Isn't that something that can be done with
Cassandra? I'm coming from HBase where we'd run map reduce jobs.
Thank you.
-Joe
The queries that are failing are:
select fieldvalue, count from doc.ordered_fieldcounts where source=? and
fieldname=? limit 10
Created with:
CREATE TABLE doc.ordered_fieldcounts (
   source text,
   fieldname text,
   count bigint,
   fieldvalue text,
   PRIMARY KEY ((sour…
Millions of rows in a single query? That sounds like a bad idea to me. Your
"NoNodeAvailableException" could be caused by stop-the-world GC pauses,
and the GC pauses are likely caused by the query itself.
On 12/03/2021 13:39, Joe Obernberger wrote:
Thank you Paul and Erick. The keyspace is defined like this:
CREATE KEYSPACE doc WITH replication = {'class': 'SimpleStrategy',
'replication_factor': '3'} AND durable_writes = true;
Would that cause this?
The program that is having the problem selects data, calculates stuff,
and inserts.
Hi Joe
This could also be caused by the replication factor of the keyspace: if you
have NetworkTopologyStrategy and it doesn't list a replication factor for the
datacenter datacenter1, then you will get this error message too.
Paul
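For what it's worth, a keyspace using NetworkTopologyStrategy needs each datacenter spelled out explicitly. A sketch of what that could look like for the doc keyspace and the datacenter1 name mentioned above (changing replication settings also normally calls for a repair afterwards):

import com.datastax.oss.driver.api.core.CqlSession;

public class KeyspaceReplicationExample {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // Every datacenter that should hold replicas must be listed explicitly;
            // a DC that is missing from the map effectively has no replicas there.
            session.execute(
                "ALTER KEYSPACE doc WITH replication = "
              + "{'class': 'NetworkTopologyStrategy', 'datacenter1': '3'}");
        }
    }
}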
> On 12 Mar 2021, at 13:07, Erick Ramirez wrote:
Does it get returned by the driver every single time? The
NoNodeAvailableException gets thrown when (1) all nodes are down, or (2)
all the contact points are invalid from the driver's perspective.
Is it possible there's no route/connectivity from your app server(s) to the
172.16.x.x network? If yo…
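If connectivity or driver configuration is the suspect, it can help to be explicit about the contact points and the local datacenter when building the session. A minimal sketch; the 172.16.x.x addresses are placeholders in the spirit of the thread, and 9042 is the default CQL port:

import com.datastax.oss.driver.api.core.CqlSession;
import java.net.InetSocketAddress;

public class ExplicitContactPoints {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                // Contact points must be reachable from the app server(s).
                .addContactPoint(new InetSocketAddress("172.16.0.10", 9042))
                .addContactPoint(new InetSocketAddress("172.16.0.11", 9042))
                // Must match the DC name the cluster reports, e.g. "datacenter1".
                .withLocalDatacenter("datacenter1")
                .build()) {
            // A trivial query to confirm the driver can actually reach a node.
            System.out.println(session
                .execute("SELECT release_version FROM system.local")
                .one()
                .getString("release_version"));
        }
    }
}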