Re: Node is UNREACHABLE after decommission

2020-09-19 Thread onmstester onmstester
Another workaround that I used for the UNREACHABLE nodes problem is to restart the
whole cluster, which fixes it, but I don't know whether it causes any problems
or not.

On Fri, 18 Sep 2020 01:19:35 +0430, Paulo Motta wrote:


Oh, if you're adding the same hosts to another cluster, then the old cluster
might try to communicate with the decommissioned nodes if you do that before the
3-day grace period ends. The cluster name not matching is a good protection;
otherwise the two clusters would connect to each other and mayhem would ensue.
You definitely don't want that to happen!



Unfortunately this delay of 3 days is hard-coded and non-configurable before 
4.0 (see https://issues.apache.org/jira/browse/CASSANDRA-15596).



As long as all the old cluster nodes are UP and don't see the decommissioned 
node on "nodetool status", you can safely assassinate decommissioned nodes to 
prevent this.
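
A minimal sketch of that sequence (the IP below is a hypothetical placeholder;
run the status check on every remaining node):

    # 1. On every remaining node: all nodes UP, decommissioned node absent
    nodetool status

    # 2. Only then clear the leftover gossip state
    nodetool assassinate 10.0.0.42

    # 3. The node should no longer appear as UNREACHABLE
    nodetool describecluster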





The requirement that all nodes in the cluster be UP before safely
assassinating a node exists to prevent a down node from trying to connect to the
decommissioned node after it recovers.





On Thu, Sep 17, 2020 at 17:25, Krish Donald wrote:

Thanks Paulo,

We have to decommission multiple nodes from the cluster and move those nodes
to other clusters.

So if we have to wait 3 days for every node, it is going to take a lot
of time.

If I try to add a decommissioned node to the other cluster, it gives
me an error that the cluster_name does not match, even though the cluster name
is correct as per the new cluster.

So until I issue assassinate, I am not able to move forward.



On Thu, Sep 17, 2020 at 1:13 PM Paulo Motta wrote:

After decommissioning, the node remains in gossip for a period of 3 days (if I
recall correctly) and it will show up in describecluster during that period, so
this is expected behavior. This allows other nodes that happened to be down
when the node decommissioned to learn that this node left the cluster.



What assassinate does is remove the node from gossip, which is why it no
longer shows up in describecluster, but this shouldn't be necessary. You can
verify that the node decommissioned successfully by checking that it no longer
shows up in "nodetool status".



On Thu, Sep 17, 2020 at 14:26, Krish Donald wrote:

We are on open-source Cassandra 3.11.5.


On Thu, Sep 17, 2020 at 10:25 AM Krish Donald wrote:

Hi,

We decommissioned a node from the cluster.

On the decommissioned node, system.log said that the node had been decommissioned.

But after only a couple of minutes, the node shows as UNREACHABLE on the rest
of the nodes when we issue nodetool describecluster.



nodetool status does not show the node; however, nodetool describecluster shows
it as UNREACHABLE.



I tried nodetool assassinate, and now the node does not show in nodetool
describecluster; however, that seems to be a last resort.



Ideally it should leave the cluster immediately after decommission.  

Once the decommission is completed as per the log, is there any issue with issuing
nodetool assassinate?



Thanks

Re: data modeling qu: use a Map datatype, or just simple rows... ?

2020-09-19 Thread Sagar Jambhulkar
I don't really see a difference between the two options. Won't the partitioner run on
the user id and create a hash for you? Unless your hash function is better than
the partitioner's.
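
A quick way to see this from cqlsh, as a sketch against the "users" table from
Approach 1 below - the token() function exposes the hash the partitioner
already computes for each partition key:

    SELECT user_id, token(user_id) FROM users LIMIT 5;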

On Fri, 18 Sep 2020, 21:33 Attila Wind wrote:

> Hey guys,
>
> I'm curious about your experiences regarding a data modeling question we
> are facing.
> At the moment we see 2 major different approaches in terms of how to build
> the tables.
> But I've been googling around for days with no luck finding any useful
> material explaining how a Map (as a collection datatype) works on the
> storage engine, and what could surprise us later if we use it. So I decided
> to ask this question... (If someone has some nice pointers here, that is
> also much appreciated!)
>
> So
> *To describe the problem* in a simplified form
>
>    - Imagine you have users (everyone is identified with a UUID),
>    - and we want to answer a simple question: "have we seen this guy
>    before?"
>    - we "just" want to be able to answer this question for a limited time
>    - let's say for 3 months
>    - but... there are lots and lots of users we run into... many
>    millions each day...
>    - and only ~15-20% of them are returning users - so there are many guys
>    we might see just once
>
> We are thinking about something like a big, big Map, of the form
> userId => lastSeenTimestamp
>
> Obviously, if we had something like that, then answering the above
> question is simply:
> if (map.get(userId) != null)  => TRUE - we have seen the guy before
>
> Regarding the 2 major modelling approaches I mentioned above
>
> *Approach 1*
> Just simply use a table, something like this
>
> CREATE TABLE IF NOT EXISTS users (
>     user_id      varchar,
>     last_seen    int,    -- a UNIX timestamp is enough, that's why int
>     PRIMARY KEY (user_id)
> ) WITH <other table options>
>   AND default_time_to_live = <3 months of seconds>;
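>
> A minimal sketch of the two access paths this table gives us (bind
> markers stand in for real values):
>
>     -- WRITE: a plain upsert; the table-level TTL handles expiry
>     INSERT INTO users (user_id, last_seen) VALUES (?, ?);
>     -- READ: "have we seen this guy before?" == does the row exist?
>     SELECT last_seen FROM users WHERE user_id = ?;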
>
>
> *Approach 2* to avoid producing that many rows, "cluster" the guys a bit
> together (into 1 row):
> introduce a hashing function over the userId, producing a value between
> [0; 1]
> and go with a table like
>
> CREATE TABLE IF NOT EXISTS users (
>     user_id_hash    int,
>     users_seen      map<varchar, int>,    -- this is a userId =>
>                                           -- last timestamp map
>     PRIMARY KEY (user_id_hash)
> ) WITH <other table options>
>   AND default_time_to_live = <3 months of seconds>;    -- yes, it's
> clearly not a good enough way ...
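>
> The same two paths against the map-based table, again as a sketch (bind
> markers stand in for real values):
>
>     -- WRITE: set a single map entry, still no read-before-write
>     UPDATE users SET users_seen[?] = ? WHERE user_id_hash = ?;
>     -- READ: this pulls the whole bucket's map back to the client
>     -- (selecting a single map entry isn't supported before Cassandra 4.0)
>     SELECT users_seen FROM users WHERE user_id_hash = ?;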
>
>
> In theory:
>
>    - on the WRITE path both representations give us a way to do the write
>    without the need of a read
>    - even the READ path is pretty efficient in both cases
>    - Approach 2 is definitely worse when we come to the cleanup - "remove
>    info if older than 3 months"
>    - Approach 2 might affect the balance of the cluster more - that's clear
>    (however not that much, due to the "law of large numbers" and enough
>    random factors)
>
> And what we are struggling with is: what do you think -
> *which approach would be better over time?* That is, which will slow down
> the cluster less, considering compaction etc.?
>
> As far as we can see the real question is:
>
> which hurts more?
>
>    - many more rows, but very small rows (regarding data size), or
>    - many fewer rows, but much bigger rows (regarding data size)
>
> ?
>
> Any thoughts, comments, pointers to some related case studies, articles,
> etc. are highly appreciated!! :-)
>
> thanks!
> --
> Attila Wind
>
> http://www.linkedin.com/in/attilaw
> Mobile: +49 176 43556932
>
>
>


Re: data modeling qu: use a Map datatype, or just simple rows... ?

2020-09-19 Thread Rahul Singh
Not necessarily. A deterministic hash randomizes a key that may be susceptible
to "clustering", and that key may also need to be used in other, non-Cassandra systems.

This way records can be accessed in both systems while leveraging the 
partitioner in Cassandra without pitfalls.

The same can be done with natural string keys like “email.”

Best regards,
Rahul Singh
From: Sagar Jambhulkar
Sent: Saturday, September 19, 2020 6:45:25 AM
To: user@cassandra.apache.org; Attila Wind
Subject: Re: data modeling qu: use a Map datatype, or just simple rows... ?

I don't really see a difference between the two options. Won't the partitioner run on
the user id and create a hash for you? Unless your hash function is better than
the partitioner's.



Re: Cassandra timeout during read query

2020-09-19 Thread Deepak Sharma
Thanks Attila and Aaron for the response. These are great insights. I will
check and get back to you in case I have any questions.

Best,
Deepak

On Tue, Sep 15, 2020 at 4:33 AM Attila Wind wrote:

> Hi Deepak,
>
> Aaron is right - in order to be able to help (better) you need to share
> those details.
>
> That 5 secs timeout comes from the coordinator node, I think - see the
> cassandra.yaml "read_request_timeout_in_ms" setting - that is what influences
> this.
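>
> For reference, the relevant coordinator-side settings with their stock
> defaults, as an illustrative cassandra.yaml snippet:
>
>     read_request_timeout_in_ms: 5000      # single-partition reads
>     range_request_timeout_in_ms: 10000    # range scans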
>
> But it does not matter too much... The point is that none of the replicas
> could complete your query within those 5 secs. And this is a clear
> indication that something is slow with your query.
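>
> One way to see where that time goes, sketched for cqlsh (substitute your
> actual slow query):
>
>     TRACING ON;
>     -- run the slow SELECT here; the trace lists per-replica timings
>     TRACING OFF;
>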
> Maybe 4) is a bit less important here, or I would make it a bit more
> precise: consider it together with your fetchSize (a driver setting at the
> query level).
>
> In our experience, one reason a query which used to work stops working is
> a growing amount of data. And a possible "wide partition" problem.
> Do you have monitoring on the Cassandra machines? What does iowait show?
> (for us that is a clear indication when things like this start happening)
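>
> A minimal way to sample that, assuming the sysstat package is installed:
>
>     iostat -x 5    # watch the %iowait column over 5-second intervals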
>
> cheers
> Attila Wind
>
> http://www.linkedin.com/in/attilaw
> Mobile: +49 176 43556932
>
>
> On 14.09.2020 at 18:36, Aaron Ploetz wrote:
>
> Deepak,
>
> Can you reply with:
>
> 1) The query you are trying to run.
> 2) The table definition (PRIMARY KEY, specifically).
> 3) Maybe a little description of what the table is designed to do.
> 4) How much data you're expecting returned (both # of rows and data size).
>
> Thanks,
>
> Aaron
>
>
> On Mon, Sep 14, 2020 at 10:58 AM Deepak Sharma wrote:
>
>> Hi There,
>>
>> We are running into a strange issue in our Cassandra cluster, where one
>> specific query is failing with the following error:
>>
>> Cassandra timeout during read query at consistency QUORUM (3 responses
>> were required but only 0 replica responded)
>>
>> This is not a typical query read timeout - that we know for sure. The
>> error gets spit out within 5 seconds, while the query timeout we have
>> set is around 30 seconds.
>>
>> Can we find out what is happening here, and how can we reproduce this in our
>> local environment?
>>
>> Thanks,
>> Deepak
>>
>>