Hi,
I have to add my own two cents here as the main thing that keeps me from
really running Cassandra is the amount of pain running it incurs.
Not so much because it's actually painful but because the tools are so
different and the documentation and best practices are scattered across a
dozen outdated
Technically, no. Cassandra is a NoSQL database. It is a wide-column store, so
it's not a set of relations that can be arbitrarily queried. The sstable
structure is built for heavy writes and for a specific, known set of queries.
If you need the ability to run arbitrary queries you are using the wrong tool.
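As a rough illustration of what "specific queries" means in practice (table and
column names here are invented), the table is shaped around one known read:

  CREATE TABLE user_events_by_day (
      user_id   uuid,
      day       date,
      event_ts  timestamp,
      payload   text,
      PRIMARY KEY ((user_id, day), event_ts)
  ) WITH CLUSTERING ORDER BY (event_ts DESC);

  -- the partition key (user_id, day) and clustering column event_ts are chosen
  -- so this one query is a single-partition read:
  SELECT * FROM user_events_by_day
  WHERE user_id = 3f2504e0-4f89-11d3-9a0c-0305e82c3301 AND day = '2018-02-20'
  LIMIT 100;

A query that doesn't start from that partition key typically needs either a
second table holding the same data in a different shape, or a scan.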
Node density is the amount of active data managed in the cluster divided by the
number of active nodes. E.g. if you have 500TB of active data under management,
then you would need 250-500 nodes to get something close to optimum performance.
It also depends on how much memory is on the boxes and whether you are using SSDs.
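For example, assuming (as a rough rule of thumb) 1-2 TB of active data per node:

  500 TB / 2 TB per node = 250 nodes
  500 TB / 1 TB per node = 500 nodes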
Agree with you, Daniel, regarding gaps in documentation.
---
At the same time I disagree with the folks in this thread who are complaining
that some functionality like 'advanced backup' etc. is missing out of the box.
We all live in a time where there are literally tons of open-source tools
Hi, I'm wondering whether it is possible, or would make sense, to limit
concurrent streaming when joining a new node to the cluster.
I'm currently operating a 15-node C* cluster (v3.11.1) and joining another
node every day.
'nodetool netstats' shows it always streams data from all other nodes.
Ho
Hi Jurgen,
stream_throughput_outbound_megabits_per_sec is the "given total throughput
in Mbps", so it does limit the "concurrent throughput" IMHO; is it not what
you are looking for?
The only limits I can think of are:
- the number of connections between every node and the one bootstrapping
- number o
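For reference, the throttle lives in cassandra.yaml and can also be changed at
runtime; the value below is only an example:

  # cassandra.yaml (megabits per second of outbound streaming, per node)
  stream_throughput_outbound_megabits_per_sec: 200

  # or at runtime on each node:
  nodetool setstreamthroughput 200
  nodetool getstreamthroughput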
Hi Nicolas,
I have seen that 'stream_throughput_outbound_megabits_per_sec', but AFAIK
this limits what each node will provide at a maximum.
What I'm more concerned about is the vast number of connections to handle and
the concurrent threads, of which at least two get started for every single
streaming
Dear Apache Enthusiast,
(You’re receiving this message because you’re subscribed to a user@ or
dev@ list of one or more Apache Software Foundation projects.)
We’re pleased to announce the upcoming ApacheCon [1] in Montréal,
September 24-27. This event is all about you — the Apache project com
Yes, you are right, it limits how much data a node will send in total to other
nodes while streaming data (repair, bootstrap etc.), so that it does not affect
this node's performance.
Bootstrapping is initiated by the bootstrapping node itself, which determines,
based on its tokens, which nodes to ask data from,
I think what is really necessary is providing table-level recipes for
storing data. We need a lot of real-world examples and the resulting
schemas, compaction strategies, and tunings that were performed for them.
Right now I don't see such crucial cookbook data in the project.
AI is a bit ridiculou
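As a sketch of what such a cookbook entry could look like (all names invented,
and TimeWindowCompactionStrategy assumes a 3.x cluster): pair the query pattern
with the schema, compaction strategy and TTL chosen for it.

  -- append-only time series, read per sensor per day, kept for 30 days
  CREATE TABLE sensor_readings (
      sensor_id text,
      day       date,
      ts        timestamp,
      value     double,
      PRIMARY KEY ((sensor_id, day), ts)
  ) WITH compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'DAYS',
        'compaction_window_size': '1'
    }
    AND default_time_to_live = 2592000;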
Ok, so vnodes are random assignments under normal circumstances (I'm on
2.1.x; I'm assuming a derivative approach was in the works that would avoid
some hot-node aspects of random primary range assignment for new nodes once
you had one or two or three in a cluster).
So... couldn't I just "engineer
As I understand it: replicas of data are replicated to the next primary
range owner.
As tokens are randomly generated (at least on 2.1.x, which I am on), can't we
have this situation:
Say we have RF3, but the tokens happen to line up where:
NodeA handles 0-10
NodeB handles 11-20
NodeA handles 21-
So in theory, one could double a cluster by:
1) moving snapshots of each node to a new node.
2) for each snapshot moved, figure out the primary range of the new node by
taking the old node's primary range token and calculating the midpoint
value between that and the next primary range start token
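With made-up numbers, the midpoint calculation in step 2 would be (real Murmur3
tokens span -2^63 to 2^63-1, but the arithmetic is the same):

  old node's primary range start token:   0
  next primary range start token:       100
  new node's token = 0 + (100 - 0) / 2 =  50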
That's why you use NTS (NetworkTopologyStrategy) + a snitch; it picks replicas based on rack awareness.
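For example (keyspace name, DC name and RF are invented), combined with a
rack-aware snitch such as GossipingPropertyFileSnitch:

  CREATE KEYSPACE my_ks WITH replication =
    {'class': 'NetworkTopologyStrategy', 'dc1': 3};

NTS then tries to place the replicas of each range on distinct racks within the
DC rather than simply on the next token owners.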
> On Feb 20, 2018, at 9:33 AM, Carl Mueller
> wrote:
>
> So in theory, one could double a cluster by:
>
> 1) moving snapshots of each node to a new node.
> 2) for each snapshot moved, figure out the primary range o
I ask Cassandra to be a database that is high-performance, highly
scalable, with no single point of failure. Anything "cool" that's added
beyond that must be added only as a separate, optional ring around Cassandra
and must not get in the way of my usage.
Yes, I would like some help with some of wha
Ahhh, the topology strategy does that.
But if one were to maintain the same rack topology and was adding nodes
just within the racks... Hm, might not be possible on new nodes. Although
AWS "racks" are at the availability-zone level IIRC, so that would be doable.
Outside of rack awareness, would the nex
How "hot" are your partition keys in these counters?
I would think, theoretically, if specific partition keys are getting
thousands of counter increments/mutation updates, then compaction won't
"compact" those together into the final value, and you'll start
experiencing the problems people get wi
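For context, repeated updates to the same counter can sit in many sstables until
they are merged at read/compaction time; e.g. (hypothetical table):

  CREATE TABLE page_hits (
      page_id text PRIMARY KEY,
      hits    counter
  );

  -- every one of these lands more counter cells in the same partition,
  -- which stay spread across sstables until compacted or read:
  UPDATE page_hits SET hits = hits + 1 WHERE page_id = 'home';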
Hello,
When querying large wide rows for multiple specific values, is it
better to do separate queries for each value... or to do it with one query
and an "IN"? I am using Cassandra 2.1.14.
I am asking because I had changed my app to use 'IN' queries and it
**appears** to be slower rather than faster.
The scenario you describe is the typical point where people move away from
vnodes and towards single-token-per-node (or a much smaller number of
vnodes).
The default setting puts you in a situation where virtually all hosts are
adjacent/neighbors to all others (at least until you're way into the
hundreds
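For reference, these are cassandra.yaml settings chosen before a node first
joins (the token value is just a placeholder), and 3.x adds an optional smarter
vnode allocation keyed to a target keyspace:

  # vnodes (the long-standing default):
  num_tokens: 256

  # single token per node instead:
  num_tokens: 1
  initial_token: <token computed for this node>

  # 3.x optional improved vnode token allocation:
  # allocate_tokens_for_keyspace: my_keyspace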
Best approach to replace the existing 8 smaller nodes in a production cluster
with 8 new nodes that are bigger in capacity, without downtime.
We have 4 nodes each in 2 DCs, and we want to replace these 8 nodes with 8 new
nodes that are bigger in capacity in terms of RAM, CPU and disk space without
downtime.
Thanks Jeff,
your answer is really not what I expected to learn, which is again more manual
work as soon as we start really using C*. But I'm happy to be able to learn it
now and to still have time to learn the necessary skills and ask the right
questions on how to correctly drive big data with
At a past job, we set the limit at around 60 hosts per cluster - anything
bigger than that got single token. Anything smaller, and we'd just tolerate
the inconveniences of vnodes. But that was before the new vnode token
allocation went into 3.0, and really assumed things that may not be true
for yo
Hi,
Consider using this approach, replacing nodes one by one:
https://mrcalonso.com/2016/01/26/cassandra-instantaneous-in-place-node-replacement/
Regards,
Kyrill
From: Leena Ghatpande
Sent: Tuesday, February 20, 2018 10:24:24 PM
To: user@cassandra.apache
We archive data in order to run analyses on it in the future. So yes, we
expect to grow continuously. In the meantime I learned to go for predictable
growth per partition rather than unpredictably large partitions. So today we
are adding 250,000,000 records per day, going into a single table
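As a made-up illustration of "predictable growth per partition": if those
250,000,000 rows per day are spread over, say, 1,000 daily bucket keys, then

  250,000,000 rows/day / 1,000 buckets = 250,000 rows per partition per day

and a partition never grows past its own day.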
The file format is independent of compaction. A compaction strategy only
selects sstables to be compacted; that's its only job. It could have side
effects, like generating other files, but any decent compaction strategy will
account for the fact that those other files don't exist.
I wrote
> We archive data in order to run analyses on it in the future. So yes, we
> expect to grow continuously. In the meantime I learned to go for
> predictable growth per partition rather than unpredictably large
> partitions. So today we are adding 250,000,000 records per day, going
> into a single
There are some arguments to be made that the flush should consider compaction
strategy - it would allow a big flush to respect LCS file sizes or break it into
smaller pieces to try to minimize range overlaps going from L0 into L1, for
example.
I have no idea how much work would be involved, but may be
Hi,
I have Cassandra running on my machine (Windows). I have downloaded
commons-daemon-1.1.0-bin-windows.zip and extracted it to
cassandra\bin\daemon. I successfully created the service using
cassandra.bat -install.
When I go to start the service I get the error below. When I start from the
command
You can also create a new DC and then terminate the old one.
Sent from my iPhone
> On Feb 20, 2018, at 2:49 PM, Kyrylo Lebediev wrote:
>
> Hi,
> Consider using this approach, replacing nodes one by one:
> https://mrcalonso.com/2016/01/26/cassandra-instantaneous-in-place-node-replacement/
>
> Rega
Someone can correct me if I'm wrong, but I believe if you do a large IN()
on a single partition's clustering keys, all the reads are going to be served
from a single replica. Compared to many concurrent individual equal
statements, where you can get the performance gain of leaning on several
replicas for pa
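To make the two shapes concrete (table and column names are invented):

  -- one statement: a single coordinator/replica path does all the work
  SELECT val FROM wide_table WHERE pk = 'abc' AND ck IN ('k1', 'k2', 'k3');

  -- many small statements, issued asynchronously, can be spread across the
  -- partition's replicas by the driver/coordinator
  SELECT val FROM wide_table WHERE pk = 'abc' AND ck = 'k1';
  SELECT val FROM wide_table WHERE pk = 'abc' AND ck = 'k2';
  SELECT val FROM wide_table WHERE pk = 'abc' AND ck = 'k3';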
Typically you'll end up with an unrepaired SSTable and a repaired SSTable.
You'll only end up with one if there's absolutely no unrepaired data (which
is very unlikely).
On 13 February 2018 at 09:54, Bo Finnerup Madsen
wrote:
> Hi Eric,
>
> I had not seen your talk, it was very informative thank
>
> Then could it be that calling `nodetool drain` after calling `nodetool
> disablegossip` is what causes the problem?
Seems unlikely, but I guess it's a possibility. Did you just try drain on
its own?
> I'm following this intensely as the discussion is important to me as well
> in understanding t
>
> Outside of rack awareness, would the next primary ranges take the replica
> ranges?
Yes.
Probably a lot of work but it would be incredibly useful for vnodes if
flushing was range aware (to be used with RangeAwareCompactionStrategy).
The writers are already range aware for JBOD, but that's not terribly
valuable ATM.
On 20 February 2018 at 21:57, Jeff Jirsa wrote:
> There are some arg
I'd say the "add new DC, then remove old DC" approach is more risky, especially
if they use QUORUM CL (in this case they will need to change CL to LOCAL_QUORUM,
otherwise they'll run into a lot of blocking read repairs).
Also, if there is a chance to get rid of streaming, it's worth doing, as usually
d
You add the nodes with rf=0 so there’s no streaming, then bump it to rf=1 and
run repair, then rf=2 and run repair, then rf=3 and run repair, then you either
change the app to use local quorum in the new dc, or reverse the process by
decreasing the rf in the original dc by 1 at a time
--
Jeff
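A rough sketch of that sequence (keyspace and DC names invented), assuming
NetworkTopologyStrategy and running nodetool repair on the new DC's nodes
between each step:

  -- new_dc not listed yet = RF 0 there, so its joining nodes stream nothing
  ALTER KEYSPACE my_ks WITH replication =
    {'class': 'NetworkTopologyStrategy', 'old_dc': 3, 'new_dc': 1};
  -- repair, then:
  ALTER KEYSPACE my_ks WITH replication =
    {'class': 'NetworkTopologyStrategy', 'old_dc': 3, 'new_dc': 2};
  -- repair, then:
  ALTER KEYSPACE my_ks WITH replication =
    {'class': 'NetworkTopologyStrategy', 'old_dc': 3, 'new_dc': 3};
  -- repair, then switch the app to LOCAL_QUORUM in new_dc, or walk old_dc
  -- down from 3 to 0 the same way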
If you watch this video through you'll see why usability is so important. You
can't ignore usability issues.
Cassandra does not exist in a vacuum. The competitors are world class.
The video is on the New Cassandra API for Azure Cosmos DB:
https://www.youtube.com/watch?v=1Sf4McGN1AQ
Kennet
Hello.
Could you help me with a LEAK DETECTED error during a minor compaction?
There is a table with a lot of small records, 6.6*10^9 (mapping
(eventId, boxId) -> cellId).
Minor compaction starts and then fails at 99% done with an error:
Stacktrace
ERROR [Reference-Reaper:1] 2018-02-05 10:06:1
Your bloom filter settings look broken. Did you set the FP ratio to 0? If so
that’s a bad idea and we should have stopped you from doing it.
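If that is what happened, the per-table setting can be put back to a sane value,
e.g. (table name is hypothetical):

  ALTER TABLE my_ks.my_table WITH bloom_filter_fp_chance = 0.01;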
--
Jeff Jirsa
> On Feb 20, 2018, at 11:01 PM, Дарья Меленцова wrote:
>
> Hello.
>
> Could you help me with LEAK DETECTED error while minor compaction
Jeff,
I don't think you can push the topic of usability back to developers by
asking them to open JIRAs. It is upon the technical leaders of the
Cassandra community to take the initiative in this regard. We can argue
back and forth on the dynamics of open source projects, but the usability
concern