Re: Inter-node messaging latency

2018-11-28 Thread Jeff Jirsa
Are you sure you’re blocked on internode and not commitlog? Batch is typically 
not what people expect (group commitlog in 4.0 is probably closer to what you 
think batch does).
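
For reference, the two modes are selected in cassandra.yaml roughly like this
(the window values below are the stock defaults as far as I recall; check the
yaml shipped with your build):

    # batch: fsync the commitlog before acking each write
    commitlog_sync: batch
    commitlog_sync_batch_window_in_ms: 2

    # group (4.0): batch several writes into one fsync window
    # commitlog_sync: group
    # commitlog_sync_group_window_in_ms: 15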

-- 
Jeff Jirsa


> On Nov 27, 2018, at 10:55 PM, Yuji Ito  wrote:
> 
> Hi,
> 
> Thank you for the reply.
> I've measured LWT throughput in 4.0.
> 
> I used the cassandra-stress tool to insert rows with LWT for 3 minutes on
> i3.xlarge and i3.4xlarge.
> For 3.11, I modified the tool to support LWT.
> Before each measurement, I cleaned up all Cassandra data.
> 
> The throughput in 4.0 is 5% faster than in 3.11.
> The CPU load on i3.4xlarge (16 vCPUs) only goes up to 75% in both versions.
> Also, the i3.4xlarge throughput was less than 4 times that of i3.xlarge.
> So I don't think the throughput is CPU-bound in 4.0 either.
> 
> The CPU load on i3.4xlarge goes up to 80% with non-LWT writes.
> 
> I wonder what the bottleneck for writes on a many-core machine is, if the
> messaging issue has been resolved in 4.0.
> Is there any parameter I can change so that inserts fully use the CPU?
> 
> # LWT insert
> * Cassandra 3.11.3
> | instance type | # of threads | concurrent_writes | Throughput [op/s] |
> | i3.xlarge |   64 |32 |  2815 |
> |i3.4xlarge |  256 |   128 |  9506 |
> |i3.4xlarge |  512 |   256 | 10540 |
> 
> * Cassandra 4.0 (trunk)
> | instance type | # of threads | concurrent_writes | Throughput [op/s] |
> | i3.xlarge |   64 |32 |  2951 |
> |i3.4xlarge |  256 |   128 |  9816 |
> |i3.4xlarge |  512 |   256 | 11055 |
> 
> * Environment
> - 3 node cluster
> - Replication factor: 3
> - Node instance: AWS EC2 i3.xlarge / i3.4xlarge
> 
> * C* configuration
> - Apache Cassandra 3.11.3 / 4.0 (trunk)
> - commitlog_sync: batch
> - concurrent_writes: 32, 256
> - native_transport_max_threads: 128 (default), 256 (when concurrent_writes
> is 256)
> 
> Thanks,
> Yuji
> 
> 
> Nov 26, 2018 (Mon) 17:27 sankalp kohli :
>> Inter-node messaging is rewritten using Netty in 4.0. It will be better to 
>> test it using that as potential changes will mostly land on top of that. 
>> 
>>> On Mon, Nov 26, 2018 at 7:39 AM Yuji Ito  wrote:
>>> Hi,
>>> 
>>> I'm investigating LWT performance with C* 3.11.3.
>>> It looks like the performance is bounded by messaging latency when many
>>> requests are issued concurrently.
>>> 
>>> According to the source code, the number of messaging threads per node is
>>> only one thread for incoming and one thread for outbound "small" messages
>>> to another node.
>>> 
>>> I guess these threads are frequently interrupted because many other threads
>>> are running when many requests are issued.
>>> In particular, I think this affects LWT performance, because LWT requests
>>> need a lot of inter-node messaging.
>>> 
>>> I measured that latency. With 96 concurrent LWT writes, it took 2.5 ms on
>>> average between enqueuing a message at a node and receiving that message at
>>> the **same** node.
>>> Is that normal? The latency seems too high to me, given that the message
>>> was sent to the same node.
>>> 
>>> Decreasing the numbers of other threads, such as `concurrent_counter_writes`
>>> and `concurrent_materialized_view_writes`, reduced the latency a bit.
>>> Is there any other parameter I can change to reduce the latency?
>>> I've also tried message coalescing, but it didn't reduce the latency.
>>> 
>>> * Environment
>>> - 3 node cluster
>>> - Replication factor: 3
>>> - Node instance: AWS EC2 i3.xlarge
>>> 
>>> * C* configuration
>>> - Apache Cassandra 3.11.3
>>> - commitlog_sync: batch
>>> - concurrent_reads: 32 (default)
>>> - concurrent_writes: 32 (default)
>>> 
>>> Thanks,
>>> Yuji
>>> 
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org


Re: request_scheduler functionalities for CQL Native Transport

2018-11-28 Thread dinesh.jo...@yahoo.com.INVALID
I think what you're looking for might be solved by CASSANDRA-8303. However, I 
am not sure if anybody is working on it. Generally you want to create different 
clusters for users to physically isolate them. What you propose has been 
discussed in the past and it is something that is currently unsupported.
Dinesh 

On Tuesday, November 27, 2018, 11:05:32 PM PST, Shaurya Gupta wrote:

Hi,

We want to throttle the maximum number of queries on any keyspace for clients
connecting via CQL native transport. This option is available for clients
connecting via thrift through the request_scheduler property in cassandra.yaml.
Is there some option available for clients connecting via CQL native transport?
If not, is there any plan to add one in the future? It is a must-have feature
if we want to support multiple teams on a single Cassandra cluster, or to
prevent one keyspace from interfering with the performance of the other
keyspaces.

Regards
Shaurya Gupta


  

Re: Inter-node messaging latency

2018-11-28 Thread Yuji Ito
Hi Jeff,

I've not looked at the new inter-node latency in 4.0 yet.

I think it isn't blocked by commitlog.
In 3.11.3, I've probed each Paxos phase and the commitlog sync.
(In this investigation, I didn't use the cassandra-stress tool. The workload
includes LWT read requests.)
The table below shows the average latency of each phase.
The figures include inter-node messaging, because I added `metrics` to
StorageProxy#cas().

It takes only 2,607 microseconds on average to sync the commitlog in
`BatchCommitLogService`.
But each Paxos phase takes more than a few milliseconds on top of the
commitlog sync.
In particular, the read phase takes about 5 milliseconds even though it does
no write processing.

Metric           Latency [us]
CAS Read                13556
CAS Write               32625
Prepare phase            8677
Read phase               4889
Propose phase            8706
Commit phase            10619

Thanks,
Yuji

Nov 28, 2018 (Wed) 17:44 Jeff Jirsa :

> Are you sure you’re blocked on internode and not commitlog? Batch is
> typically not what people expect (group commitlog in 4.0 is probably closer
> to what you think batch does).
>
> --
> Jeff Jirsa
>
>
> On Nov 27, 2018, at 10:55 PM, Yuji Ito  wrote:
>
> Hi,
>
> Thank you for the reply.
> I've measured LWT throughput in 4.0.
>
> I used the cassandra-stress tool to insert rows with LWT for 3 minutes on
> i3.xlarge and i3.4xlarge
> For 3.11, I modified the tool to support LWT.
> Before each measurement, I cleaned up all Cassandra data.
>
> The throughput in 4.0 is 5 % faster than 3.11.
> The CPU load of i3.4xlarge (16 vCPUs) is only up to 75% in both versions.
> And, the throughput was slower than 4 times that of i3.xlarge.
> I think the throughput wasn't bounded by CPU also in 4.0.
>
> The CPU load of i3.4xlarge is up to 80 % with non-LWT write.
>
> I wonder what is the bottleneck for writes on a many-core machine if the
> issue about messaging has been resolved in 4.0.
> Can I use up CPU to insert rows by changing any parameter?
>
> # LWT insert
> * Cassandra 3.11.3
> | instance type | # of threads | concurrent_writes | Throughput [op/s] |
> | i3.xlarge |   64 |32 |  2815 |
> |i3.4xlarge |  256 |   128 |  9506 |
> |i3.4xlarge |  512 |   256 | 10540 |
>
> * Cassandra 4.0 (trunk)
> | instance type | # of threads | concurrent_writes | Throughput [op/s] |
> | i3.xlarge |   64 |32 |  2951 |
> |i3.4xlarge |  256 |   128 |  9816 |
> |i3.4xlarge |  512 |   256 | 11055 |
>
> * Environment
> - 3 node cluster
> - Replication factor: 3
> - Node instance: AWS EC2 i3.xlarge / i3.4xlarge
>
> * C* configuration
> - Apache Cassandra 3.11.3 / 4.0 (trunk)
> - commitlog_sync: batch
> - concurrent_writes: 32, 256
> - native_transport_max_threads: 128(default), 256 (when concurrent_writes
> is 256)
>
> Thanks,
> Yuji
>
>
> Nov 26, 2018 (Mon) 17:27 sankalp kohli :
>
>> Inter-node messaging is rewritten using Netty in 4.0. It will be better
>> to test it using that as potential changes will mostly land on top of that.
>>
>> On Mon, Nov 26, 2018 at 7:39 AM Yuji Ito  wrote:
>>
>>> Hi,
>>>
>>> I'm investigating LWT performance with C* 3.11.3.
>>> It looks that the performance is bounded by messaging latency when many
>>> requests are issued concurrently.
>>>
>>> According to the source code, the number of messaging threads per node
>>> is only 1 thread for incoming and 1 thread for outbound "small" message to
>>> another node.
>>>
>>> I guess these threads are frequently interrupted because many threads
>>> are executed when many requests are issued.
>>> Especially, I think it affects the LWT performance when many LWT
>>> requests which need lots of inter-node messaging are issued.
>>>
>>> I measured that latency. It took 2.5 ms in average to enqueue a message
>>> at a node and to receive the message at the **same** node with 96
>>> concurrent LWT writes.
>>> Is it normal? I think it is too big latency, though a message was sent
>>> to the same node.
>>>
>>> Decreasing numbers of other threads like `concurrent_counter_writes`,
>>> `concurrent_materialized_view_writes` reduced a bit the latency.
>>> Can I change any other parameter to reduce the latency?
>>> I've tried using message coalescing, but they didn't reduce that.
>>>
>>> * Environment
>>> - 3 node cluster
>>> - Replication factor: 3
>>> - Node instance: AWS EC2 i3.xlarge
>>>
>>> * C* configuration
>>> - Apache Cassandra 3.11.3
>>> - commitlog_sync: batch
>>> - concurrent_reads: 32 (default)
>>> - concurrent_writes: 32 (default)
>>>
>>> Thanks,
>>> Yuji
>>>
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>
>>


Re: request_scheduler functionalities for CQL Native Transport

2018-11-28 Thread Shaurya Gupta
Hi,

CASSANDRA-8303 talks about more granular control at the query level. What
we are looking at is throttling on the basis of the number of queries
received for different keyspaces. This is what request_scheduler and
request_scheduler_options provide for clients connecting via thrift.
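
For reference, this is roughly the thrift-side configuration we would like an
equivalent of for the native transport (the keyspace names and values below
are only an illustration):

    request_scheduler: org.apache.cassandra.scheduler.RoundRobinScheduler
    request_scheduler_id: keyspace
    request_scheduler_options:
        throttle_limit: 80        # in-flight requests before queueing
        default_weight: 5         # weight for keyspaces not listed below
        weights:
            team_a_keyspace: 1
            team_b_keyspace: 5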

Regards

On Wed, Nov 28, 2018 at 2:27 PM dinesh.jo...@yahoo.com.INVALID
 wrote:

> I think what you're looking for might be solved by CASSANDRA-8303.
> However, I am not sure if anybody is working on it. Generally you want to
> create different clusters for users to physically isolate them. What you
> propose has been discussed in the past and it is something that is
> currently unsupported.
>
> Dinesh
>
>
> On Tuesday, November 27, 2018, 11:05:32 PM PST, Shaurya Gupta <
> shaurya.n...@gmail.com> wrote:
>
>
> Hi,
>
> We want to throttle maximum queries on any keyspace for clients connecting
> via CQL native transport. This option is available for clients connecting
> via thrift by property of request_scheduler in cassandra.yaml.
> Is there some option available for clients connecting via CQL native
> transport.
> If not is there any plan to do so in future.
> It is a must have feature if we want to support multiple teams on a single
> cassandra cluster or to prevent one keyspace from interfering with the
> performance of the other keyspaces.
>
> Regards
> Shaurya Gupta
>
>
>

-- 
Shaurya Gupta


multiple node bootstrapping

2018-11-28 Thread Osman YOZGATLIOĞLU
Hello,

I have a 2-DC Cassandra 3.0.14 setup. I need to add 2 new nodes to each DC.

I started one node in dc1 and it is already joining; 3 TB of 50 TB finished in
2 weeks. The data is one-year-TTL time series data with TWCS.

I know it's not best practice..

I wanted to start one node in dc2, but Cassandra refused to start, saying that
one node is already in the joining state.

I found a workaround with JMX directives, but I'm not sure whether I broke
something along the way.

Is it wise to bootstrap in both DCs at the same time?


Regards,

Osman


Data storage space unbalance issue

2018-11-28 Thread Eunsu Kim
(I am sending the previous mail again because it seems that it has not been 
sent properly.)

Hi experts,

I am running 2 datacenters, each containing five nodes (10 nodes in total, all
3.11.3).

My data is stored with one replica in each data center. (REPLICATION = { 'class' :
'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '1',
'datacenter2': '1' })

Most of my data has a short TTL (14 days). The gc_grace_seconds value for all
tables is also 600 seconds.

I expect the two data centers to use the same amount of space, but datacenter2
is using more. It seems that the data in datacenter2 is rarely deleted. While
the disk usage for datacenter1 remains constant, the disk usage for datacenter2
continues to grow.

——
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load         Tokens  Owns (effective)  Host ID                               Rack
UN  10.61.58.228  925.48 GiB   256     21.5%             60d1bac8-b4d6-4e02-a05f-badee0bb36f5  rack1
UN  10.61.58.167  840 GiB      256     20.0%             a04fc77a-907f-490c-971c-4e1f964c7b14  rack1
UN  10.61.75.86   1.13 TiB     256     19.3%             618c101b-036d-42e7-bf9f-2bcbd429cbd1  rack1
UN  10.61.59.22   844.19 GiB   256     20.0%             d8a4a165-13f0-4f4a-9278-4024730b8116  rack1
UN  10.61.59.82   737.88 GiB   256     19.2%             054a4eb5-6d1c-46fa-b550-34da610da4e0  rack1
Datacenter: datacenter2
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load         Tokens  Owns (effective)  Host ID                               Rack
UN  10.42.6.120   1.11 TiB     256     18.6%             69f15be0-e5a1-474e-87cf-b063e6854402  rack1
UN  10.42.5.207   1.17 TiB     256     20.0%             f78bdce5-cb01-47e0-90b9-fcc31568e49e  rack1
UN  10.42.6.47    1.01 TiB     256     20.1%             3ff93b47-2c15-4e1a-a4ea-2596f26b4281  rack1
UN  10.42.6.48    1007.67 GiB  256     20.4%             8cbbe76d-6496-403a-8b09-fe6812c9dea2  rack1
UN  10.42.5.208   1.29 TiB     256     20.9%             4aa96c6a-6083-417f-a58a-ec847bcbfc7e  rack1
--

A few days ago, one node in datacenter1 broke down and was replaced, and I then
ran rebuild, repair, and cleanup.


What else can I do?

Thank you in advance.

Re: multiple node bootstrapping

2018-11-28 Thread Vitali Dyachuk
You can set auto_bootstrap to false to add a new node to the ring; the node
will calculate its token ranges but will not start streaming the data.
That way you can add several nodes to the ring quickly. After that you can run
nodetool rebuild <src-dc-name> on each of them to start streaming data (see the
sketch below).
In your case, 50 TB of data per node is quite a large amount. Based on my own
experience I would recommend keeping about 1 TB per node, since streaming can
be interrupted and cannot be resumed, so you would have to restart it. There
will also be compaction problems.
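
A rough outline of that sequence (illustrative only; the source DC name is
whatever your cluster uses, and keep in mind the node will serve requests for
its ranges before the data has been streamed):

    # cassandra.yaml on each new node, before the first start
    auto_bootstrap: false

    # start the node: it takes ownership of token ranges without streaming

    # then stream the data for those ranges from an existing DC
    nodetool rebuild <source-dc-name>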

Vitali.
On Wed, Nov 28, 2018 at 12:03 PM Osman YOZGATLIOĞLU <
osman.yozgatlio...@krontech.com> wrote:

> Hello,
>
> I have 2 dc cassandra 3.0.14 setup. I need to add 2 new nodes to each dc.
>
> I started one node in dc1 and its already joining. 3TB of 50TB finished in
> 2 weeks. One year ttl time series data with twcs.
>
> I know, its not best practise..
>
> I want to start one node in dc2 and cassandra refused to start with
> mentioning already one node in joining state.
>
> I find some workaround with jmx directives, but i'm not sure if I broke
> something on the way.
>
> Is it wise to bootstrap in both dc at the same time?
>
>
> Regards,
>
> Osman
>


Re: multiple node bootstrapping

2018-11-28 Thread Jeff Jirsa
This violates any consistency guarantees you have and isn’t the right approach 
unless you know what you’re giving up (correctness, typically)

-- 
Jeff Jirsa


> On Nov 28, 2018, at 2:40 AM, Vitali Dyachuk  wrote:
> 
> You can use auto_bootstrap set to false to add a new node to the ring, it 
> will calculate the token range for the new node, but will not start streaming 
> the data.
> In this case you can add several nodes into the ring quickly. After that you 
> can start nodetool rebuild -dc  <> to start streaming data. 
> In your case 50Tb of data per node is quite a large amount of data i would 
> recommend, based on own experience keeping 1Tb per node, since when streaming 
> can be interrupted for some reason and it cannot be resumed so you'll have to 
> restart streaming. Also there will be compaction problems.
> 
> Vitali.
>> On Wed, Nov 28, 2018 at 12:03 PM Osman YOZGATLIOĞLU 
>>  wrote:
>> Hello,
>> 
>> I have 2 dc cassandra 3.0.14 setup. I need to add 2 new nodes to each dc.
>> 
>> I started one node in dc1 and its already joining. 3TB of 50TB finished in 2 
>> weeks. One year ttl time series data with twcs.
>> 
>> I know, its not best practise..
>> 
>> I want to start one node in dc2 and cassandra refused to start with 
>> mentioning already one node in joining state.
>> 
>> I find some workaround with jmx directives, but i'm not sure if I broke 
>> something on the way.
>> 
>> Is it wise to bootstrap in both dc at the same time?
>> 
>> 
>> 
>> Regards,
>> 
>> Osman


Apache Cassandra transactions commit and rollback

2018-11-28 Thread Ramya K
Hi All,

  I'm exploring Cassandra for our project and would like to know the best
practices for handling transactions in real time. Also, please suggest any
drivers or tools that are available for this.

  I've read about the Apache Kundera transaction layer over Cassandra; are
there any bottlenecks with it?

  Please suggest your views on this.

Regards,
Ramya.


Re: multiple node bootstrapping

2018-11-28 Thread Jonathan Haddad
Agree with Jeff here, using auto_bootstrap:false is probably not what you
want.

Have you increased your streaming throughput?
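
If not, it can be raised at runtime, for example (the value is in megabits per
second, matching stream_throughput_outbound_megabits_per_sec; 400 is just an
illustration):

    nodetool getstreamthroughput       # show the current cap
    nodetool setstreamthroughput 400   # raise the cap; 0 disables throttling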

Upgrading to 3.11 might reduce the time by quite a bit:
https://issues.apache.org/jira/browse/CASSANDRA-9766

You'd be doing committers a huge favor if you grabbed some histograms and
flame graphs on both the sending and receiving nodes:
http://thelastpickle.com/blog/2018/01/16/cassandra-flame-graphs.html and
sent them to the dev mailing list.



On Wed, Nov 28, 2018 at 3:59 AM Jeff Jirsa  wrote:

> This violates any consistency guarantees you have and isn’t the right
> approach unless you know what you’re giving up (correctness, typically)
>
> --
> Jeff Jirsa
>
>
> On Nov 28, 2018, at 2:40 AM, Vitali Dyachuk  wrote:
>
> You can use auto_bootstrap set to false to add a new node to the ring, it
> will calculate the token range for the new node, but will not start
> streaming the data.
> In this case you can add several nodes into the ring quickly. After that
> you can start nodetool rebuild -dc  <> to start streaming data.
> In your case 50Tb of data per node is quite a large amount of data i would
> recommend, based on own experience keeping 1Tb per node, since when
> streaming can be interrupted for some reason and it cannot be resumed so
> you'll have to restart streaming. Also there will be compaction problems.
>
> Vitali.
> On Wed, Nov 28, 2018 at 12:03 PM Osman YOZGATLIOĞLU <
> osman.yozgatlio...@krontech.com> wrote:
>
>> Hello,
>>
>> I have 2 dc cassandra 3.0.14 setup. I need to add 2 new nodes to each dc.
>>
>> I started one node in dc1 and its already joining. 3TB of 50TB finished
>> in 2 weeks. One year ttl time series data with twcs.
>>
>> I know, its not best practise..
>>
>> I want to start one node in dc2 and cassandra refused to start with
>> mentioning already one node in joining state.
>>
>> I find some workaround with jmx directives, but i'm not sure if I broke
>> something on the way.
>>
>> Is it wise to bootstrap in both dc at the same time?
>>
>>
>> Regards,
>>
>> Osman
>>
>

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


C* as fluent data storage, 10MB/sec/node?

2018-11-28 Thread Adam Smith
Hi All,

I need to use C* somehow as fluent (transient) data storage - maybe this is
different from the queue antipattern? Lots of data comes in (10 MB/sec/node),
remains for e.g. 1 hour, and should then be evicted. It is not critical if
data occasionally disappears or gets lost.

Thankful for any advice!

Is this possible nowadays without suffering too much from compaction? I would
not have range tombstones, and depending on the solution I would only use point
deletes (PK+CK). There is only one CK, which could also be empty.

1) The data is usually 1 MB. Can I just update with empty data? The PK + CK
would remain, but I would not care about that. Would this create tombstones, or
is it equivalent to a DELETE?

2) Like 1), but then later set a TTL == only a small amount of data to be
deleted then? And hopefully little compaction?

3) Simply set a TTL of 1h and hope for the best, because my worries are
unfounded?

4) Any optimization strategies, like setting the RF to 1? Which compaction
strategy is advised?

5) Are there any recent performance benchmarks for one of the scenarios?

What else could I do?

Thanks a lot!
Adam


Re: Data storage space unbalance issue

2018-11-28 Thread Elliott Sims
I think you answered your own question, sort of.

When you expand a cluster, it copies the appropriate rows to the new
node(s) but doesn't automatically remove them from the old nodes.  When you
ran cleanup on datacenter1, it cleared out those old extra copies.  I would
suggest running a repair first for safety on datacenter2, then a "nodetool
cleanup" on those hosts.

Also run "nodetool snapshot" to make sure you don't have any old snapshots
sitting around taking up space.
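
Roughly, on the datacenter2 nodes (commands are illustrative; adjust keyspace
names and repair options to your setup):

    nodetool repair -full <keyspace>   # repair first, for safety
    nodetool cleanup <keyspace>        # then drop data the node no longer owns
    nodetool listsnapshots             # check for leftover snapshots
    nodetool clearsnapshot -t <tag>    # remove any that are no longer needed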

On Wed, Nov 28, 2018 at 5:29 AM Eunsu Kim  wrote:

> (I am sending the previous mail again because it seems that it has not
> been sent properly.)
>
> HI experts,
>
> I am running 2 datacenters each containing five nodes. (total 10 nodes,
> all 3.11.3)
>
> My data is stored one at each data center. (REPLICATION = { 'class' :
> 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '1'
> , 'datacenter2': '1’ })
>
> Most of my data have a short TTL(14days). The gc_grace_seconds value for
> all tables is also 600sec.
>
> I expect the two data centers to use the same size but datacenter2 is
> using more size. It seems that the datas of datacenter2 is rarely
> deleted. While the disk usage for datacenter1 remains constant, the disk
> usage for datacenter2 continues to grow.
>
> ——
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address   Load   Tokens   Owns (effective)  Host ID
> Rack
> UN  10.61.58.228  925.48 GiB  256  21.5%
> 60d1bac8-b4d6-4e02-a05f-badee0bb36f5  rack1
> UN  10.61.58.167  840 GiB256  20.0%
> a04fc77a-907f-490c-971c-4e1f964c7b14  rack1
> UN  10.61.75.86   1.13 TiB   256  19.3%
> 618c101b-036d-42e7-bf9f-2bcbd429cbd1  rack1
> UN  10.61.59.22   844.19 GiB  256  20.0%
> d8a4a165-13f0-4f4a-9278-4024730b8116  rack1
> UN  10.61.59.82   737.88 GiB  256  19.2%
> 054a4eb5-6d1c-46fa-b550-34da610da4e0  rack1
> Datacenter: datacenter2
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address   Load   Tokens   Owns (effective)  Host ID
> Rack
> UN  10.42.6.120   1.11 TiB   256  18.6%
> 69f15be0-e5a1-474e-87cf-b063e6854402  rack1
> UN  10.42.5.207   1.17 TiB   256  20.0%
> f78bdce5-cb01-47e0-90b9-fcc31568e49e  rack1
> UN  10.42.6.471.01 TiB   256  20.1%
> 3ff93b47-2c15-4e1a-a4ea-2596f26b4281  rack1
> UN  10.42.6.481007.67 GiB  256  20.4%
> 8cbbe76d-6496-403a-8b09-fe6812c9dea2  rack1
> UN  10.42.5.208   1.29 TiB   256  20.9%
> 4aa96c6a-6083-417f-a58a-ec847bcbfc7e  rack1
> --
>
> A few days ago, one node of datacenter1 broke down and replaced it, and I
> worked on rebuild, repair, and cleanup.
>
>
> What else can I do?
>
> Thank you in advance.
>


Re: C* as fluent data storage, 10MB/sec/node?

2018-11-28 Thread Jeff Jirsa
Probably fine as long as there’s some concept of time in the partition key to 
keep them from growing unbounded. 

Use TWCS, TTLs and something like 5-10 minute buckets. Don’t use RF=1, but you 
can write at CL ONE. TWCS will largely just drop whole sstables as they expire 
(especially with 3.11 and the more aggressive expiration logic there)
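
Something along these lines, as a sketch (table and column names are made up;
tune the window size and TTL to your retention):

    CREATE TABLE events (
        source  text,
        bucket  timestamp,   -- e.g. the 10-minute window the row falls in
        ts      timestamp,
        payload blob,
        PRIMARY KEY ((source, bucket), ts)
    ) WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                         'compaction_window_unit': 'MINUTES',
                         'compaction_window_size': 10}
      AND default_time_to_live = 3600;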



-- 
Jeff Jirsa


> On Nov 28, 2018, at 11:24 AM, Adam Smith  wrote:
> 
> Hi All,
> 
> I need to use C* somehow as fluent data storage - maybe this is different to 
> the queue antipattern? Lots of data come in (10MB/sec/node), remains for e.g. 
> 1 hour and should then be evicted. It is somehow not critical when data would 
> occasionally disappear/get lost.
> 
> Thankful for any advice!
> 
> Is this nowadays possible without suffering too much from compactation? I 
> would not have ranged tombstones, and depending on a possible solution only 
> using point deletes (PK+CK). There is only one CK, could also be empty.
> 
> 1) The data is usually 1 MB. Can I just update with empty data? PK + CK would 
> remain, but I would not carry about that. Would this create tombstones or is 
> equivalent to a DELETE?
> 
> 2) Like 1) and later then set a TTL == small amount of data to be deleted 
> then? And hopefully small compactation?
> 
> 3) Simply setting TTL 1h and hoping the best, because I am wrong with my 
> worries?
> 
> 4) Any optimization strategies like setting the RF to 1? Which compactation 
> strategy is advised?
> 
> 5) Are there any recent performance benchmarks for one of the scenarios? 
> 
> What else could I do?
> 
> Thanks a lot!
> Adam

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: C* as fluent data storage, 10MB/sec/node?

2018-11-28 Thread Adam Smith
Thanks for the excellent advice, this was extremely helpful! I did not know
about TWCS... that cures a lot of headaches.

Adam

On Wed, Nov 28, 2018 at 20:47, Jeff Jirsa  wrote:

> Probably fine as long as there’s some concept of time in the partition key
> to keep them from growing unbounded.
>
> Use TWCS, TTLs and something like 5-10 minute buckets. Don’t use RF=1, but
> you can write at CL ONE. TWCS will largely just drop whole sstables as they
> expire (especially with 3.11 and the more aggressive expiration logic there)
>
>
>
> --
> Jeff Jirsa
>
>
> > On Nov 28, 2018, at 11:24 AM, Adam Smith 
> wrote:
> >
> > Hi All,
> >
> > I need to use C* somehow as fluent data storage - maybe this is
> different to the queue antipattern? Lots of data come in (10MB/sec/node),
> remains for e.g. 1 hour and should then be evicted. It is somehow not
> critical when data would occasionally disappear/get lost.
> >
> > Thankful for any advice!
> >
> > Is this nowadays possible without suffering too much from compactation?
> I would not have ranged tombstones, and depending on a possible solution
> only using point deletes (PK+CK). There is only one CK, could also be empty.
> >
> > 1) The data is usually 1 MB. Can I just update with empty data? PK + CK
> would remain, but I would not carry about that. Would this create
> tombstones or is equivalent to a DELETE?
> >
> > 2) Like 1) and later then set a TTL == small amount of data to be
> deleted then? And hopefully small compactation?
> >
> > 3) Simply setting TTL 1h and hoping the best, because I am wrong with my
> worries?
> >
> > 4) Any optimization strategies like setting the RF to 1? Which
> compactation strategy is advised?
> >
> > 5) Are there any recent performance benchmarks for one of the scenarios?
> >
> > What else could I do?
> >
> > Thanks a lot!
> > Adam
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: Data storage space unbalance issue

2018-11-28 Thread Eunsu Kim
Thank you for your response.

Following your advice, I will run a repair on datacenter2. Do I have to run
repair on every node in datacenter2?

There are no snapshots when checked with nodetool listsnapshots.

Thank you.

> On 29 Nov 2018, at 4:31 AM, Elliott Sims  wrote:
> 
> I think you answered your own question, sort of.
> 
> When you expand a cluster, it copies the appropriate rows to the new node(s) 
> but doesn't automatically remove them from the old nodes.  When you ran 
> cleanup on datacenter1, it cleared out those old extra copies.  I would 
> suggest running a repair first for safety on datacenter2, then a "nodetool 
> cleanup" on those hosts.  
> 
> Also run "nodetool snapshot" to make sure you don't have any old snapshots 
> sitting around taking up space.
> 
> On Wed, Nov 28, 2018 at 5:29 AM Eunsu Kim  wrote:
> (I am sending the previous mail again because it seems that it has not been 
> sent properly.)
> 
> HI experts,
> 
> I am running 2 datacenters each containing five nodes. (total 10 nodes, all 
> 3.11.3)
> 
> My data is stored one at each data center. (REPLICATION = { 'class' : 
> 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '1', 
> 'datacenter2': '1’ })
> 
> Most of my data have a short TTL(14days). The gc_grace_seconds value for all 
> tables is also 600sec.
> 
> I expect the two data centers to use the same size but datacenter2 is using 
> more size. It seems that the datas of datacenter2 is rarely deleted. While 
> the disk usage for datacenter1 remains constant, the disk usage for 
> datacenter2 continues to grow.
> 
> ——
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address   Load   Tokens   Owns (effective)  Host ID   
> Rack
> UN  10.61.58.228  925.48 GiB  256  21.5% 
> 60d1bac8-b4d6-4e02-a05f-badee0bb36f5  rack1
> UN  10.61.58.167  840 GiB256  20.0% 
> a04fc77a-907f-490c-971c-4e1f964c7b14  rack1
> UN  10.61.75.86   1.13 TiB   256  19.3% 
> 618c101b-036d-42e7-bf9f-2bcbd429cbd1  rack1
> UN  10.61.59.22   844.19 GiB  256  20.0% 
> d8a4a165-13f0-4f4a-9278-4024730b8116  rack1
> UN  10.61.59.82   737.88 GiB  256  19.2% 
> 054a4eb5-6d1c-46fa-b550-34da610da4e0  rack1
> Datacenter: datacenter2
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address   Load   Tokens   Owns (effective)  Host ID   
> Rack
> UN  10.42.6.120   1.11 TiB   256  18.6% 
> 69f15be0-e5a1-474e-87cf-b063e6854402  rack1
> UN  10.42.5.207   1.17 TiB   256  20.0% 
> f78bdce5-cb01-47e0-90b9-fcc31568e49e  rack1
> UN  10.42.6.471.01 TiB   256  20.1% 
> 3ff93b47-2c15-4e1a-a4ea-2596f26b4281  rack1
> UN  10.42.6.481007.67 GiB  256  20.4% 
> 8cbbe76d-6496-403a-8b09-fe6812c9dea2  rack1
> UN  10.42.5.208   1.29 TiB   256  20.9% 
> 4aa96c6a-6083-417f-a58a-ec847bcbfc7e  rack1
> --
> 
> A few days ago, one node of datacenter1 broke down and replaced it, and I 
> worked on rebuild, repair, and cleanup.
> 
> 
> What else can I do?
> 
> Thank you in advance.