token distribution in multi-dc

2017-05-01 Thread vasu gunja
Hi,

I have a question regarding token distribution in a multi-DC setup.

We have a multi-DC (DC1+DC2) setup with vnodes enabled.
How will token ranges be distributed in the cluster?

Does the cluster as a whole cover one complete token range,
or does each DC have a complete token range?


Seed nodes as part of cluster

2017-05-01 Thread Roman Naumenko
Hi,

I’d like to confirm that seed nodes don’t contain any data. Is that correct?

Can the instances for seed nodes be smaller than those for data nodes?

Thank you
Roman
-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Service discovery in the Cassandra cluster

2017-05-01 Thread Roman Naumenko
If I understand how Cassandra nodes work, they must contain a list of seed IP 
addresses in the config file.

This requirement makes cluster setup unnecessarily complicated. Is it possible 
to use DNS names for seed nodes?

Thanks,

—
Roman



Re: Seed nodes as part of cluster

2017-05-01 Thread vasu gunja
Seed nodes contain metadata and actual data too.



Re: Seed nodes as part of cluster

2017-05-01 Thread Roman Naumenko
So they are like any other “data” node… but special?

I’m so freaking confused by this seed nodes design.

—
Roman




Re: Seed nodes as part of cluster

2017-05-01 Thread daemeon reiydelle
Caps below for emphasis, not shouting ;{)

Seed nodes are IDENTICAL to all other Cassandra nodes, or you will wish
otherwise. Folks get confused because of the terminology. I refer to this
as "the seed node service of a normal node". ANY NODE IS ABLE TO ACT AS A
SEED NODE BY DEFINITION, but ONLY the nodes listed as seeds in
cassandra.yaml will be contacted.

The seed "function" is primarily used by new nodes when they FIRST join the
cluster (once a node has joined the cluster it learns about the other nodes
through gossip, keeps its own list of peers, etc.).




...

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872



Re: Seed nodes as part of cluster

2017-05-01 Thread Roman Naumenko
Awesome, thanks for clarification.

So why can’t new nodes connect to ANY seed node IP that is returned by DNS?
Why must the IPs be “hardcoded”?

—
Roman




Re: Service discovery in the Cassandra cluster

2017-05-01 Thread Jon Haddad
Sure, you could use DNS.  Where does it say IP addresses are a requirement?
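
To make the DNS point concrete, here is a minimal sketch of turning a
comma-separated seed list that mixes host names and IP addresses into plain
addresses. It is an illustration only, not Cassandra's own seed provider code;
the function name, the example host names, and the choice to skip names that
do not resolve are all assumptions.

```python
import socket


def resolve_seeds(seeds_csv, resolver=socket.gethostbyname):
    """Turn a comma-separated seed list (host names or IPs) into IP strings.

    `resolver` is injectable so the logic can be exercised without real DNS.
    """
    addresses = []
    for entry in seeds_csv.split(","):
        host = entry.strip()
        if not host:
            continue
        try:
            addresses.append(resolver(host))
        except OSError:
            # Assumed policy for this sketch: skip names that do not resolve.
            continue
    return addresses


if __name__ == "__main__":
    # IPv4 literals pass through unchanged; host names go through DNS.
    print(resolve_seeds("127.0.0.1, localhost"))
```

With something like this in mind, a seed list can point at stable DNS names
while the addresses behind them change.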




Re: Service discovery in the Cassandra cluster

2017-05-01 Thread daemeon reiydelle
Yes, you can use host names. That merely adds another level of
configuration. When using Terraform, I often use node names like
 and just use those. They are only routable within the
region/VPC but are in fact already in DNS. You do have to watch out: if
you change the seeds (in tf), the cluster can get terminated and rebuilt.
If you have a way to capture these (you can do it in Ansible; I have been
told it is really hard to do in Chef/Puppet), then your configuration
management system can just adjust cassandra.yaml as needed without fussing
with Route 53.


...

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872



Re: Service discovery in the Cassandra cluster

2017-05-01 Thread Roman Naumenko
The docs mention IP addresses everywhere.

http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/operations/ops_replace_seed_node.html
 

Promote an existing node to a seed node by adding its IP address to -seeds list 
and remove (demote) the IP address of the dead seed node from the 
cassandra.yaml file for each node in the cluster.

http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/operations/ops_replace_node_t.html
 

Note the Address of the dead node; it is used in step 5.

http://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeSingleDS.html
 

> Properties to set:
> num_tokens: recommended value: 256
> -seeds: internal IP address of each seed node

I also saw hostnames mentioned a few times, but that just makes it even more 
confusing.

—
Roman




Re: Service discovery in the Cassandra cluster

2017-05-01 Thread Jon Haddad
The in-tree docs do not mention an IP address requirement anywhere, and even 
answer some of the questions you’re asking:

https://cassandra.apache.org/doc/latest/faq/index.html?highlight=seed#what-are-seeds
 


The DataStax docs are maintained outside of the project; you’ll have to ask 
them why they’re wrong or misleading.

Jon




Re: Service discovery in the Cassandra cluster

2017-05-01 Thread Roman Naumenko
Well, I guess I have to figure out what’s up with IPs/hostnames by experiment.
Information about service discovery is practically absent.
Not to mention all the important details about FQDNs/hostnames, automatic 
replacement of seed nodes, and whatnot.

—
Roman




Re: Service discovery in the Cassandra cluster

2017-05-01 Thread Jon Haddad
Why do you have to figure out what’s up w/ them by accident?  You’ve gotten all 
the information you need.  Seeds are used to get the initial state of the 
cluster and as an optimization to spread gossip faster.  That’s it.  






Re: Service discovery in the Cassandra cluster

2017-05-01 Thread Roman Naumenko
Lol yeah, why
I guess I run some ec2 instances, drop some cassandra deb packages on 'em -
the thing will figure out how to run...

Also, how would you get "initial state of the cluster" if the cluster... is
being initialized?
Or that's easy, according to the docs - just hardcode some seed IPs into
each node, lol

It's all kinda funny, but in a sad way.



Re: token distribution in multi-dc

2017-05-01 Thread Justin Cameron
Hi Vasu,

Each DC has a complete token range.

Cheers,
Justin
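
One way to picture this answer is a toy model of replica placement. The sketch
below assumes a single shared ring holding nodes from both DCs and a
NetworkTopologyStrategy-like walk that picks replicas per DC. The tokens, node
names, and DC names are made up, and rack awareness is ignored; it is not
Cassandra's actual placement code.

```python
from bisect import bisect_left

# Hypothetical toy ring: (token, node, datacenter). Real clusters use 64-bit
# Murmur3 tokens and many vnodes per node; small integers keep it readable.
RING = sorted([
    (10, "a1", "DC1"), (40, "b1", "DC2"), (25, "a2", "DC1"),
    (60, "b2", "DC2"), (75, "a3", "DC1"), (90, "b3", "DC2"),
])


def replicas(token, rf_per_dc):
    """Walk the ring clockwise from `token` and take the first `rf` distinct
    nodes encountered in each datacenter."""
    tokens = [t for t, _, _ in RING]
    # The owner is the first node whose token is >= the key's token (wrapping).
    start = bisect_left(tokens, token) % len(RING)
    chosen = {dc: [] for dc in rf_per_dc}
    for step in range(len(RING)):
        _, node, dc = RING[(start + step) % len(RING)]
        if dc in chosen and node not in chosen[dc] and len(chosen[dc]) < rf_per_dc[dc]:
            chosen[dc].append(node)
    return chosen


if __name__ == "__main__":
    # Prints {'DC1': ['a3', 'a1'], 'DC2': ['b2', 'b3']}
    print(replicas(50, {"DC1": 2, "DC2": 2}))
```

In this model every token, wherever it falls on the ring, ends up with
replicas in both DCs, which is the sense in which each DC covers the complete
token range.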



Justin Cameron
Senior Software Engineer





This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


Migrating a cluster

2017-05-01 Thread Voytek Jarnot
Have a scenario where it's necessary to migrate a cluster to a different
set of hardware with minimal downtime. Setup is:

Current cluster: 4 nodes, RF 3
New cluster: 6 nodes, RF 3

My initial inclination is to follow this writeup on setting up the 6 new
nodes as a new DC:
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html

Basically, set up new DC, nodetool rebuild on new nodes to instruct
Cassandra to migrate data, change client to hit new DC, kill original DC.

First question - is this the recommended way to migrate an in-use cluster
to new hardware?

Secondly, on the assumption that it is: That link gives the impression that
DC-aware clients will not hit the "remote" DC - is that the case for the
Java driver? We don't currently explicitly set PoolingOptions
ConnectionsPerHost for HostDistance.REMOTE to 0 - seems like that would be
an important thing to do?

Thank you.


Re: Migrating a cluster

2017-05-01 Thread Justin Cameron
Yes - this is the recommended way to migrate to another DC.

Before you start the migration you'll need to ensure:
1. that the replication strategy of all your keyspaces is
NetworkTopologyStrategy (if not, change it to this using ALTER KEYSPACE),
and
2. that each of your clients is using the DCAwareRoundRobinPolicy load
balancing policy, and that the localDc parameter is set to the name of your
existing data centre.
https://github.com/datastax/java-driver/tree/3.x/manual/load_balancing#dcawareroundrobinpolicy
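
For point 1, a small helper like the one below can generate the ALTER KEYSPACE
statement. This is only a sketch: the keyspace and data centre names are
placeholders, and the real data centre names must match what your snitch
reports.

```python
def alter_keyspace_cql(keyspace, rf_per_dc):
    """Build an ALTER KEYSPACE statement that uses NetworkTopologyStrategy.

    `rf_per_dc` maps a data centre name to its replication factor.
    """
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_per_dc.items())
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


if __name__ == "__main__":
    # Placeholder names; substitute your own keyspace and data centre names.
    print(alter_keyspace_cql("my_keyspace", {"DC_old": 3}))
```

The same helper can be reused later with both data centres listed, once the
new one is ready to receive replicas.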

In addition to points 1&2, in order to ensure that your clients do not
contact nodes in the new data centre, you will also need to use a LOCAL
consistency level for all your queries (e.g. LOCAL_QUORUM instead of QUORUM)
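
The arithmetic behind the LOCAL consistency level advice can be sketched as
follows, assuming RF 3 in both the old and the new data centre (the data
centre names are placeholders):

```python
def quorum(replication_factor):
    """Quorum size: a strict majority of the replicas."""
    return replication_factor // 2 + 1


rf = {"DC_old": 3, "DC_new": 3}  # placeholder data centre names
local_dc = "DC_old"

# QUORUM counts replicas across all data centres.
quorum_needed = quorum(sum(rf.values()))
# LOCAL_QUORUM counts only the replicas in the local data centre.
local_quorum_needed = quorum(rf[local_dc])

# With 4 acknowledgements required and only 3 local replicas, QUORUM cannot
# be satisfied without at least one replica in the other data centre.
needs_remote = quorum_needed > rf[local_dc]

if __name__ == "__main__":
    print(quorum_needed, local_quorum_needed, needs_remote)  # 4 2 True
```

So once the new data centre holds replicas, a plain QUORUM request depends on
it, whereas LOCAL_QUORUM keeps the request within the existing data centre.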

Cheers,
Justin


-- 


Justin Cameron
Senior Software Engineer







Re: what is MemtableReclaimMemory mean ??

2017-05-01 Thread Pranay akula
Hi Alain,

When "*MemtableReclaimMemory*" pending tasks increase, reads and writes
(mostly writes) slowly back up. Yes, I am seeing somewhat high GC pressure;
currently we are using a 24 GB heap with G1GC. I tried changing the
memtable flush threshold; it helped a little, but not much. I am not seeing
any errors in the logs.


Thanks
Pranay.

On Thu, Apr 27, 2017 at 6:08 AM, Alain RODRIGUEZ  wrote:

> Hi Pranay,
>
> According to
> http://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsTPstats.html,
> "*MemtableReclaimMemory*" is the thread pool used for "Making unused memory
> available". I don't know much about it since it was never an issue for me,
> nor have I heard much about it.
>
> - Are pending tasks staying high for a long period? `watch -d nodetool
>   tpstats`
> - What are your GC settings?
> - Any other threads pending, blocked or dropped?
> - Do you have errors or warnings in your logs?
> - Any GC pressure? (monitored through charts or logs at INFO level, or
>   WARN on recent versions)
>
>
> C*heers,
> ---
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
>
> 2017-04-16 16:04 GMT+02:00 Pranay akula :
>
>> Hi,
>>
>> What does *MemtableReclaimMemory* mean in nodetool tpstats? Does it
>> mean flushing memtables from memory to SSTables?
>>
>> I sometimes see an increase in pending MemtableReclaimMemory tasks
>> in nodetool tpstats, and at those times I see increased load on those nodes.
>>
>> Will decreasing memtable_cleanup_threshold help?
>>
>> Thanks
>> Pranay.
>>
>
>


Re: Migrating a cluster

2017-05-01 Thread Bhuvan Rawal
+1 to Justin's answer!

As an additional step, it's always good to run a full repair before deleting
data on the existing nodes, as there is a possibility of IOExceptions during
rebuild (things like https://issues.apache.org/jira/browse/CASSANDRA-12830).

Also, if you are on 3.8+, you may go for a CDC (change data capture) approach
and create a new cluster instead of adding a DC, though this will involve
some downtime. Probable steps in that case:
1. Create a cluster on the new hardware
2. Migrate the existing sstables
3. Bring the app down and load the CDC data for the time elapsed during
step 2 into the new cluster
4. Bring the app up




Re: what is MemtableReclaimMemory mean ??

2017-05-01 Thread Chris Lohfink
There's a read barrier to stop reclaiming a memtable while there are requests
actively reading it. The *MemtableReclaimMemory* pool offloads that wait
instead of blocking the caller. In itself it is not going to use any CPU or
increase load. It will, however, block the release of the memtable's
resources, which might cause additional heap allocation pressure. It's more
likely a symptom of GCs or reads being slow than the cause of the issue.
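
As a rough mental model of the behaviour described above (and not Cassandra's
actual implementation), the toy sketch below queues reclaim tasks and only
completes them once no reads are using the memtable. All class and attribute
names are invented for illustration.

```python
from collections import deque


class Memtable:
    """Toy memtable that only tracks how many reads are currently using it."""

    def __init__(self, name):
        self.name = name
        self.active_reads = 0
        self.reclaimed = False


class ReclaimPool:
    """Toy stand-in for a reclaim pool: tasks wait here until readers drain."""

    def __init__(self):
        self.pending = deque()

    def submit(self, memtable):
        # The caller returns immediately; the waiting is offloaded to the pool.
        self.pending.append(memtable)

    def run_once(self):
        """Reclaim queued memtables with no active reads; return what is left pending."""
        still_waiting = deque()
        for memtable in self.pending:
            if memtable.active_reads == 0:
                memtable.reclaimed = True
            else:
                still_waiting.append(memtable)
        self.pending = still_waiting
        return len(self.pending)


if __name__ == "__main__":
    pool = ReclaimPool()
    busy = Memtable("busy")
    busy.active_reads = 1          # a slow read is still in flight
    pool.submit(busy)
    print(pool.run_once())         # the task stays pending while the read runs
    busy.active_reads = 0          # the read finishes
    print(pool.run_once())         # now the memtable can be reclaimed
```

In this model, a growing pending count says more about slow reads (or pauses
that stall them) than about the pool itself.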

Chris



Re: what is MemtableReclaimMemory mean ??

2017-05-01 Thread Chris Lohfink
Question though: how many tables do you have? If you have more than a few
hundred, that could bottleneck flushing if it is flushing very
frequently.

On Mon, May 1, 2017 at 9:32 PM, Chris Lohfink  wrote:

> Theres a read barrier to stop reclaiming a memtable when there are
> requests actively reading it. The *MemtableReclaimMemory* pool offloads
> that wait instead of blocking the caller. It in itself is not going to use
> any cpu or increase load. It will however block the releasing of the
> memtable resources which might cause additional heap allocation pressure.
> Its more likely a symptom of GCs or reads being slow than the cause of the
> issue however.
>
> Chris
>
> On Mon, May 1, 2017 at 9:01 PM, Pranay akula 
> wrote:
>
>> Hi Alain,
>>
>> when  "*MemtableReclaimMemory*"  Pending Tasks increasing, its slowly
>> backing up reads and writes mostly writes. yes i am seeing bit high GC
>> pressure, currently we are using 24Gb Heap  and G1GC collection. I tried
>> changing Memtable flush threshold it did helped a little but not much. I am
>> not seeing any Errors in the Logs.
>>
>>
>> Thanks
>> Pranay.
>>
>> On Thu, Apr 27, 2017 at 6:08 AM, Alain RODRIGUEZ 
>> wrote:
>>
>>> Hi Pranay,
>>>
>>> According to http://docs.datastax.com/en/ca
>>> ssandra/3.0/cassandra/tools/toolsTPstats.html, "*MemtableReclaimMemory*"
>>> is the thread pool used for "Making unused memory available". I don't know
>>> much about it since it was never an issue for me. Neither did I heard much
>>> about it.
>>>
>>>
>>>- Are pending tasks staying high for a long period? `watch -d
>>>nodetool tpstats`
>>>- What are your GC settings?
>>>- Any other threads pending, blocked or dropped?
>>>- Do you have errors or warnings in your logs?
>>>- Any GC pressure? (monitored through charts or logs at INFO level,
>>>or WARN on recent versions)
>>>
>>>
>>> C*heers,
>>> ---
>>> Alain Rodriguez - @arodream - al...@thelastpickle.com
>>> France
>>>
>>> The Last Pickle - Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>>
>>>
>>> 2017-04-16 16:04 GMT+02:00 Pranay akula :
>>>
>>>> Hi,
>>>>
>>>> What does *MemtableReclaimMemory* mean in nodetool tpstats? Does it
>>>> mean that memtables are being flushed from memory to SSTables?
>>>>
>>>> I sometimes see an increase in pending tasks for MemtableReclaimMemory
>>>> in nodetool tpstats, and at those times I see increased load on those
>>>> nodes.
>>>>
>>>> Will decreasing memtable_cleanup_threshold help?
>>>>
>>>> Thanks
>>>> Pranay.
>>>>
>>>
>>>
>>
>


Weird Bootstrapping Issue

2017-05-01 Thread Gareth Collins
Hi,

We are running Cassandra 2.1.14 on an IBM AIX cluster using IBM Java 7
(1.7.1.64). I am having problems adding new nodes to the cluster. I am
seeing the following exception. It appears that the new node is
getting stuck trying to send the magic number on the first streaming
socket, while the receiving node never receives it and times out
after 10 seconds.
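
The 10-second gap can be read straight off the log timestamps quoted further down in this message. As a rough Python sketch, the snippet below parses two of the new node's log lines (abbreviated here, with the wrapped text joined and the stream ID dropped) and computes the time between "Starting streaming to /5.6.7.8" and "Streaming error occurred". `log_time` is a hypothetical helper and assumes the timestamp directly follows the bracketed thread name.

```python
from datetime import datetime

# Two lines from the new node's log in this message, abbreviated and re-joined.
START = ("INFO  [StreamConnectionEstablisher:2] 2017-04-28 17:39:20,197 "
         "StreamSession.java:220 - Starting streaming to /5.6.7.8")
ERROR = ("ERROR [StreamConnectionEstablisher:2] 2017-04-28 17:39:30,207 "
         "StreamSession.java:505 - Streaming error occurred")


def log_time(line):
    """Parse the 'YYYY-MM-DD HH:MM:SS,mmm' timestamp that follows the [thread] field."""
    stamp = line.split("] ", 1)[1][:23]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


gap = (log_time(ERROR) - log_time(START)).total_seconds()
print(f"seconds between stream start and failure: {gap:.3f}")
```

The same helper can be pointed at the existing node's log to line up its SocketTimeoutException with the new node's connection attempt.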

New Node:

INFO  [StreamConnectionEstablisher:1] 2017-04-28 17:39:20,196
StreamSession.java:220 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Starting streaming to /1.2.3.4

INFO  [StreamConnectionEstablisher:2] 2017-04-28 17:39:20,197
StreamSession.java:220 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Starting streaming to /5.6.7.8

INFO  [StreamConnectionEstablisher:1] 2017-04-28 17:39:20,209
StreamCoordinator.java:209 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92, ID#0] Beginning stream session
with /1.2.3.4

INFO  [STREAM-IN-/1.2.3.4] 2017-04-28 17:39:20,276
StreamResultFuture.java:166 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92 ID#0] Prepare completed.
Receiving 2 files(43103 bytes), sending 0 files(0 bytes)

INFO  [StreamReceiveTask:2] 2017-04-28 17:39:20,410
StreamResultFuture.java:180 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Session with /1.2.3.4 is
complete

ERROR [StreamConnectionEstablisher:2] 2017-04-28 17:39:30,207
StreamSession.java:505 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Streaming error occurred

java.nio.channels.AsynchronousCloseException: null

at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:224)
~[na:1.7.0]

at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:538)
~[na:1.7.0]

at 
org.apache.cassandra.io.util.DataOutputStreamAndChannel.write(DataOutputStreamAndChannel.java:48)
~[apache-cassandra-2.1.14.jar:2.1.14]

at 
org.apache.cassandra.streaming.ConnectionHandler$MessageHandler.sendInitMessage(ConnectionHandler.java:191)
~[apache-cassandra-2.1.14.jar:2.1.14]

at 
org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:81)
~[apache-cassandra-2.1.14.jar:2.1.14]

at 
org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:223)
~[apache-cassandra-2.1.14.jar:2.1.14]

at 
org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:208)
[apache-cassandra-2.1.14.jar:2.1.14]

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1157)
[na:1.7.0]

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:627)
[na:1.7.0]

at java.lang.Thread.run(Thread.java:809) [na:1.7.0]

INFO  [StreamConnectionEstablisher:2] 2017-04-28 17:39:30,208
StreamResultFuture.java:180 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Session with /5.6.7.8 is
complete

WARN  [StreamConnectionEstablisher:2] 2017-04-28 17:39:30,211
StreamResultFuture.java:207 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92] Stream failed

INFO  [StreamConnectionEstablisher:2] 2017-04-28 17:39:30,212
StreamCoordinator.java:209 - [Stream
#22c10290-2c5b-11e7-a33c-8f9ab3a4bd92, ID#0] Beginning stream session
with /5.6.7.8

ERROR [main] 2017-04-28 17:39:30,213 CassandraDaemon.java:581 -
Exception encountered during startup

java.lang.RuntimeException: Error during boostrap: Stream failed

at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:86)
~[apache-cassandra-2.1.14.jar:2.1.14]

at 
org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1166)
~[apache-cassandra-2.1.14.jar:2.1.14]


Existing node:

DEBUG [ACCEPT-/5.6.7.8] 2017-04-28 17:39:29,914
MessagingService.java:1014 - Error reading the socket
Socket[addr=/9.0.1.2,port=55848,localport=7000]

java.net.SocketTimeoutException: null

at 
sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:242)
~[na:1.7.0]

at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:116)
~[na:1.7.0]

at java.io.DataInputStream.readFully(DataInputStream.java:207)
~[na:1.7.0]

at java.io.DataInputStream.readInt(DataInputStream.java:399) ~[na:1.7.0]

at 
org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:988)
~[apache-cassandra-2.1.14.jar:2.1.14]

TRACE [MessagingService-Incoming-/9.0.1.2] 2017-04-28 17:39:29,989
IncomingTcpConnection.java:92 - eof reading from socket; closing

java.io.EOFException: null

at java.io.DataInputStream.readFully(DataInputStream.java:209)
~[na:1.7.0]

at java.io.DataInputStream.readInt(DataInputStream.java:399) ~[na:1.7.0]

at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:171)
~[apache-cassandra-2.1.14.jar:2.1.14]

at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:88)
~[apache-cassandra-2.1.14.jar:2.1.14]

TRACE [MessagingService-Incoming-/9.0.1.2] 2017-04-28 17:39:29,990
IncomingTcpConnection.

Re: [Cassandra] nodetool compactionstats not showing pending task.

2017-05-01 Thread kurt greaves
I believe this is a bug in the estimation of tasks; however, I'm not aware
of any JIRA ticket that covers the issue.
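
To make the mismatch concrete, here is a small Python sketch that puts the two numbers from this thread side by side. It assumes, as I understand it, that the "pending tasks" line of `nodetool compactionstats` is an estimate while the CompactionExecutor row in `nodetool tpstats` reflects tasks actually queued on the executor. The tpstats row below is illustrative (the thread only says it showed 0 pending), and both helper functions are hypothetical.

```python
import re

# Output quoted in this thread (nodetool compactionstats on 3.0.9).
COMPACTIONSTATS = "pending tasks: 3\n"

# Illustrative tpstats excerpt; only the Pending value of 0 comes from the thread.
TPSTATS = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
CompactionExecutor                0         0           9876         0                 0
"""


def estimated_pending(compactionstats_text):
    """Return the number on the 'pending tasks:' line, or None if it is absent."""
    match = re.search(r"^pending tasks:\s*(\d+)", compactionstats_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def executor_pending(tpstats_text):
    """Return the Pending column of the CompactionExecutor row, or None."""
    for line in tpstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "CompactionExecutor":
            return int(fields[2])
    return None


estimate = estimated_pending(COMPACTIONSTATS)
queued = executor_pending(TPSTATS)
if estimate is not None and queued == 0 and estimate > 0:
    print(f"estimate reports {estimate} pending, but the executor queue is empty")
```

If the two keep disagreeing over time with no compactions listed as running, that would be consistent with an estimation problem rather than real queued work.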

On 28 April 2017 at 06:19, Abhishek Kumar Maheshwari <
abhishek.maheshw...@timesinternet.in> wrote:

> Hi ,
>
>
>
> I will try with JMX, but I have already tried with tpstats. tpstats shows
> pending compactions as 0, but nodetool compactionstats shows 3, so it
> seems strange to me.
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
> *P** Please do not print this email unless it is absolutely necessary.
> Spread environmental awareness.*
>
>
>
> *From:* Alain RODRIGUEZ [mailto:arodr...@gmail.com]
> *Sent:* Thursday, April 27, 2017 4:45 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: [Cassandra] nodetool compactionstats not showing pending
> task.
>
>
>
> Maybe try to monitor through JMX with 
> 'org.apache.cassandra.db:type=CompactionManager',
> attribute 'Compactions' or 'CompactionsSummary'
>
>
>
> C*heers
>
> ---
>
> Alain Rodriguez - @arodream - al...@thelastpickle.com
>
> France
>
>
>
> The Last Pickle - Apache Cassandra Consulting
>
> http://www.thelastpickle.com
>
>
>
> 2017-04-27 12:27 GMT+02:00 Alain RODRIGUEZ :
>
> Hi,
>
>
>
> I am not sure about this one. It happened to me in the past as well. I
> never really looked into it because, as far as I recall, it went away
> after a while or after a restart. To get rid of it, a restart might be enough.
>
>
>
> But if you feel like troubleshooting this, I think the first thing is to
> try to see if compactions are really happening, maybe using JMX. I believe
> `org.apache.cassandra.metrics:type=Compaction,name=PendingTasks` is what
> is used by 'nodetool compactionstats', but there might be more info there.
> Actually, I don't really know what 'system.compactions_in_progress'
> was replaced by, but any way you can think of to double-check would
> probably help in understanding what's happening.
>
>
>
> Does anyone know how to check pending compaction details in 3.0.9?
>
>
>
> C*heers,
>
> ---
>
> Alain Rodriguez - @arodream - al...@thelastpickle.com
>
> France
>
>
>
> The Last Pickle - Apache Cassandra Consulting
>
> http://www.thelastpickle.com
>
>
>
> 2017-04-25 15:13 GMT+02:00 Abhishek Kumar Maheshwari:
>
> Hi All,
>
>
>
> In Production, I am using Cassandra 3.0.9.
>
>
>
> When I run the nodetool compactionstats command, it shows only the count
> and no other information, as below:
>
>
>
> [mohit.kundra@AdtechApp bin]$ ./nodetool -h XXX.XX.XX.XX
> compactionstats
>
> pending tasks: 3
>
> [mohit.kundra@AdtechAppX bin]$
>
>
>
> Is this a Cassandra bug? I am not able to understand it.
>
>
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
>
>
>
>
>