First, GROUP BY is only allowed on partition keys and clustering columns,
not on arbitrary columns. The internal implementation of GROUP BY fetches
data in clustering order to avoid having to "re-sort" it in memory, which
would be very expensive.
Second, GROUP BY works best when restricted to
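As a rough CQL illustration of the first rule (the table and column names here are hypothetical):

    CREATE TABLE sensor_readings (
        device_id uuid,
        ts timestamp,
        value double,
        PRIMARY KEY (device_id, ts)
    );

    -- Allowed: grouping on the partition key
    SELECT device_id, avg(value) FROM sensor_readings GROUP BY device_id;

    -- Allowed: partition key plus a clustering-column prefix
    SELECT device_id, ts, avg(value) FROM sensor_readings GROUP BY device_id, ts;

    -- Rejected: value is a regular column, not part of the primary key
    SELECT device_id, avg(value) FROM sensor_readings GROUP BY value;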
Hi all,
Wondering if anyone had any thoughts on this? At the moment the long-running
repairs cause us to be running them on two nodes at once for a bit of time,
which obviously increases the cluster load.
On 2017-05-25 16:18 (+0100), Chris Stokesmore wrote:
> Hi,
>
> We are running a 7 node
Hi,
we have a cluster of 11 nodes running Cassandra 2.2.9 where we regularly
get READ messages dropped:
> READ messages were dropped in last 5000 ms: 974 for internal timeout
> and 0 for cross node timeout
Looking at the logs, some are logged at the same time as Old Gen GCs.
These GCs all take aro
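For anyone following along, two quick checks that tie the drops to the pauses (assuming shell access to a node; the log path is the default for package installs):

    nodetool tpstats
    # the "Dropped Messages" section at the bottom shows the READ drop counters

    grep GCInspector /var/log/cassandra/system.log
    # Cassandra logs each long GC pause here, with its duration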
Hi Vincent,
dropped messages are indeed common in the case of long GC pauses.
Having 4s to 6s pauses is not normal and is a sign of an unhealthy
cluster. Minor GCs are usually faster, but you can have long ones too.
If you can share your hardware specs along with your current GC settings
(CMS or G1,
Hi Alexander.
Yeah, the minor GCs I see are usually around 300ms but sometimes jump
to 1s or even more.
Hardware specs are:
- 8 core CPUs
- 32 GB of RAM
- 4 SSDs in hardware RAID 0, around 3TB of space per node
GC settings: -Xmx12G -Xms12G -XX:+UseG1GC -XX:G1RSetUpdatingPauseTimePer
Hi Vincent,
it is very clear, thanks for all the info.
I would not stick with G1 in your case, as it requires much more heap to
perform correctly (>24GB).
CMS/ParNew should be much more efficient here and I would go with some
settings I usually apply on big workloads: 16GB heap / 6GB new gen / M
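As a sketch of what such CMS settings look like in cassandra-env.sh / jvm.options (standard flags; the exact values beyond the 16GB heap / 6GB new gen figures are assumptions, not necessarily what Alexander goes on to list):

    -Xms16G -Xmx16G -Xmn6G
    -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
    -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=75
    -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=4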
Hi Chris,
Using pr with incremental repairs does not make sense. Primary range repair is
an optimization over full repair. If you run full repair on an n-node cluster
with RF=3, you would be repairing each piece of data three times. E.g. in a 5
node cluster with RF=3, a range may exist on nodes A, B and C. When
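In nodetool terms (2.2-era flags; a sketch, since the message above is truncated):

    # full repair of the primary range only -- run on *every* node so the
    # whole ring is covered exactly once
    nodetool repair -full -pr my_keyspace

    # incremental repair (the 2.2 default) -- already skips data marked
    # repaired, so combining it with -pr buys nothing
    nodetool repair my_keyspace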
Thanks Alexander for the help, lots of good info in there.
I'll try to switch back to CMS and see how it fares.
On Tue, Jun 6, 2017, at 05:06 PM, Alexander Dejanovski wrote:
> Hi Vincent,
>
> it is very clear, thanks for all the info.
>
> I would not stick with G1 in your case, as it requires
Thank you for the excellent and clear description of the different versions of
repair, Anuj; that has cleared up what I expect to be happening.
The problem now is that, in our cluster, we are running repairs with the options
(parallelism: parallel, primary range: false, incremental: true, job threads:
1,
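For reference, on 2.2 that combination of logged options is what a plain invocation produces, since incremental and parallel are the defaults there (a hedged guess; the option list above is truncated):

    nodetool repair <keyspace>
    # logs: parallelism: parallel, primary range: false, incremental: true, ...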
Hi DuyHai,
thanks for your response.
I understand the reservations about implementing sorting in Cassandra. But I
think it is analogous to filtering. It may be bad in the general case, but can
be useful for particular use cases.
If Cassandra does not provide “order-by”, then the ordering has t
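A minimal Python sketch of what that client-side ordering amounts to (the rows and field names are made up, standing in for the output of a GROUP BY query):

    # aggregates as they might come back from Cassandra, one row per device
    rows = [
        {"device_id": "d1", "avg_value": 0.7},
        {"device_id": "d2", "avg_value": 0.9},
        {"device_id": "d3", "avg_value": 0.4},
    ]

    # the client re-sorts by the aggregated value and applies the LIMIT itself
    top_n = sorted(rows, key=lambda r: r["avg_value"], reverse=True)[:2]
    print(top_n)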
Hi DuyHai,
this is in response to the other points in your response.
My application is a real-time application. It monitors devices in the network
and displays the top N devices for various parameters averaged over a time
period. A query may involve anywhere from 10 to 50k devices, and anywhere
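To make that scale concrete, a hypothetical data model for such a workload (all names invented for illustration):

    CREATE TABLE device_metrics (
        device_id uuid,
        interval_start timestamp,
        bytes_in bigint,
        PRIMARY KEY (device_id, interval_start)
    );

    -- averaging over a time period is one query per device...
    SELECT avg(bytes_in) FROM device_metrics
    WHERE device_id = ? AND interval_start >= ? AND interval_start < ?;

    -- ...so "top N out of 50k devices" means 50k such queries,
    -- followed by a client-side sort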
Hi Chris,
Can you share the following info:
1. The exact repair commands you use for inc repair and pr repair
2. Repair time should be measured at the cluster level for inc repair. So, what's
the total time it takes to run repair on all nodes for incremental vs pr
repairs?
3. You are repairing one DC, DC3. Ho
The problem is not that it's not feasible from the Cassandra side; it is
that when doing an arbitrary ORDER BY, Cassandra needs to resort to
in-memory sorting of a potentially huge amount of data --> more pressure on
the heap --> impact on cluster stability.
Whereas delegating this kind of job to Sp
Unfortunately this feature falls in a category of *incredibly useful*
features that have gotten the -1 over the years because it doesn't scale
like we want it to. As far as basic aggregations go, it's remarkably
trivial to roll up 100K-1MM items using very little memory, so at first it
seems like
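To illustrate the memory claim: a streaming top-N over a million aggregates needs only a bounded heap, independent of input size (plain-Python sketch):

    import heapq
    import random

    # stand-in for a stream of (device_id, aggregated_value) pairs
    stream = ((f"d{i}", random.random()) for i in range(1_000_000))

    # heapq.nlargest keeps at most N items in memory while consuming the stream
    top_10 = heapq.nlargest(10, stream, key=lambda pair: pair[1])
    print(top_10)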
All the explanation for why just one non-PK column can be used in the PK of an
MV is here:
https://skillsmatter.com/skillscasts/7446-cassandra-udf-and-materialised-views-in-depth
Skip to 19:18 for the explanation
On Mon, May 8, 2017 at 8:08 PM, Fridtjof Sander
<fridtjof.san...@googlemail.com> wrote:
>
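In CQL terms, the rule is that a view's primary key may add at most one column that is not part of the base table's primary key (a sketch with made-up names):

    CREATE TABLE users (
        id uuid PRIMARY KEY,
        email text,
        name text
    );

    -- Allowed: email is the single non-PK base column promoted into the view's PK
    CREATE MATERIALIZED VIEW users_by_email AS
        SELECT * FROM users
        WHERE email IS NOT NULL AND id IS NOT NULL
        PRIMARY KEY (email, id);

    -- Rejected: two non-PK base columns (email and name) in the view's PK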
I can't recommend *anyone* use incremental repair as there are some pretty
horrible bugs in it that can cause Merkle trees to wildly mismatch & result
in massive overstreaming. Check out
https://issues.apache.org/jira/browse/CASSANDRA-9143.
TL;DR: Do not use incremental repair before 4.0.
On Tue,
On 2017-06-05 19:00 (-0700), "Roger Fischer (CW)" wrote:
> Hello,
>
> is there any intent to support "order by" and "limit" on aggregated values?
>
> For time series data, top n queries are quite common. Group-by was the first
> step towards supporting such queries, but ordering by value and
>
>
> My application is a real-time application. It monitors devices in the
> network and displays the top N devices for various parameters averaged over
> a time period. A query may involve anywhere from 10 to 50k devices, and
> anywhere from 5 to 2000 intervals. We expect a query to take less tha
Hi,
I am trying to set up Cassandra 3.9 across multiple DCs.
Currently I have 2 DCs, with 3 and 2 nodes respectively.
DC1 Name :- India
Nodes :- 192.16.0.1 , 192.16.0.2, 192.16.0.3
DC2 Name :- USA
Nodes :- 172.16.0.1 , 172.16.0.2
Please help me to know which files I need to make changes for configu
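The usual files are cassandra.yaml and cassandra-rackdc.properties; a sketch (the snitch choice is an assumption, and the seeds are picked arbitrarily from your node lists):

    # cassandra.yaml, on every node
    endpoint_snitch: GossipingPropertyFileSnitch
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "192.16.0.1,172.16.0.1"

    # cassandra-rackdc.properties, per node (e.g. on the India nodes)
    dc=India
    rack=rack1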
Hi All,
We have a 2-DC setup, each DC consisting of 20-odd nodes, and recently we
decided to add 6 more nodes to DC1. We are using LWTs; the application
drivers are configured to use LOCAL_SERIAL.
As we are adding multiple nodes at a time, we used the option
"-Dcassandra.consistent.rangemovement=false"
we