Python driver consistency problem

2019-05-22 Thread Vlad
Hi,
we have a three-node cluster with a keyspace defined as 

CREATE KEYSPACE someks WITH REPLICATION = { 'class' : 
'org.apache.cassandra.locator.NetworkTopologyStrategy', 'some-dc': '3' } AND 
DURABLE_WRITES = true;

Next I read with Python (cassandra-driver 3.11) from Cassandra 3.11.3 and get 
the error 

Error from server: code=1200 [Coordinator node timed out waiting for replica 
nodes' responses] message="Operation timed out - received only 2 responses." 
info={'received_responses': 2, 'required_responses': 3, 'consistency': 'ALL'}
Neither session.default_consistency_level = ConsistencyLevel.QUORUM nor 

query = SimpleStatement("SELECT *", 
consistency_level=ConsistencyLevel.QUORUM)
rows = session.execute(query)

helps. What could it be?
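
A self-contained sketch of both approaches, with the imports (the contact
point, keyspace, and table name below are placeholders):

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    # Placeholder contact point and keyspace for this sketch.
    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect('someks')

    # Option 1: change the session-wide default consistency level.
    session.default_consistency_level = ConsistencyLevel.QUORUM

    # Option 2: set the consistency level on an individual statement.
    stmt = SimpleStatement(
        "SELECT * FROM sometable",  # hypothetical table name
        consistency_level=ConsistencyLevel.QUORUM,
    )
    rows = session.execute(stmt)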

Thanks in advance.



Re: Python driver consistency problem

2019-05-22 Thread Chakravarthi Manepalli
Hi Vlad,

Maybe the consistency level has been set manually in cqlsh. Did you try
checking your consistency level and setting it back to normal? (Just a
thought, not sure!)



Re: Python driver consistency problem

2019-05-22 Thread Vlad
Hi,
I do reads in my own Python code; how can cqlsh affect it? 


Re: Python driver consistency problem

2019-05-22 Thread shalom sagges
In a lot of cases, the issue is with the data model.
Can you describe the table?
Can you provide the query you use to retrieve the data?
What's the load on your cluster?
Are there lots of tombstones?

You can set the consistency level to ONE just to check whether you get
responses, although normally I would never use ALL unless I'm running a DDL
command.
I prefer LOCAL_QUORUM if I want my consistency to be strong while keeping
Cassandra's high availability.
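
For illustration, a minimal sketch of making LOCAL_QUORUM the default for
every request via an execution profile (supported by recent 3.x drivers; the
contact point is a placeholder):

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT

    # Default execution profile: every request runs at LOCAL_QUORUM
    # unless a statement overrides it.
    profile = ExecutionProfile(consistency_level=ConsistencyLevel.LOCAL_QUORUM)

    cluster = Cluster(
        ['127.0.0.1'],  # placeholder contact point
        execution_profiles={EXEC_PROFILE_DEFAULT: profile},
    )
    session = cluster.connect()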

Regards,


Re: Python driver consistency problem

2019-05-22 Thread Vlad
That's the issue: I do not use consistency ALL. I set QUORUM or ONE, but the 
read is still performed with ALL. 


Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-22 Thread Vsevolod Filaretov
Hello everyone,

We have an 8-node C* cluster with a large volume of unbalanced data. Usual
per-partition selects work somewhat fine and are processed by a limited
number of nodes, but if a user issues SELECT ... WHERE ... IN () ALLOW FILTERING,
the command stalls all 8 nodes into unresponsiveness to external requests
while disk IO jumps to 100% across the whole cluster. Within several minutes
all nodes seem to finish processing the request and the cluster goes back to
being responsive. The replication factor across all data is 3.

1) Why such behavior? I thought any given SELECT request is handled by a
limited subset of C* nodes and not by all of them, as per the connection
consistency and table replication settings.

2) Is it possible to forbid ALLOW FILTERING flag for given users/groups?

Thank you all very much in advance,
Vsevolod Filaretov.


Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-22 Thread shalom sagges
Hi Vsevolod,

1) Why such behavior? I thought any given SELECT request is handled by a
limited subset of C* nodes and not by all of them, as per the connection
consistency and table replication settings.
When you run a query with allow filtering, Cassandra doesn't know where the
data is located, so it has to go node by node, searching for the requested
data.
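
As a rough illustration (the events table and its partition key user_id are
hypothetical): the first query below is routed only to the replicas that own
the requested partition, while the second carries no partition key and forces
a scan on every node.

    from cassandra.cluster import Cluster

    # Placeholder contact point; 'events' with partition key user_id is
    # a made-up table for this sketch.
    session = Cluster(['127.0.0.1']).connect('someks')

    # Partition-restricted read: the coordinator only contacts the
    # replicas that own this partition.
    session.execute("SELECT * FROM events WHERE user_id = %s", (42,))

    # No partition key in the predicate: every node has to scan its local
    # data for matching rows, which is what ALLOW FILTERING permits.
    session.execute(
        "SELECT * FROM events WHERE status = %s ALLOW FILTERING", ('failed',)
    )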

2) Is it possible to forbid ALLOW FILTERING flag for given users/groups?
I'm not familiar with such a flag. In my case, I just try to educate the
R&D teams.

Regards,



Re: A cluster (RF=3) not recovering after two nodes are stopped

2019-05-22 Thread Hiroyuki Yamada
Hi,

FYI: I created a bug ticket since I think the behavior is just not right.
https://issues.apache.org/jira/browse/CASSANDRA-15138

Thanks,
Hiro

On Mon, May 13, 2019 at 10:58 AM Hiroyuki Yamada  wrote:

> Hi,
>
> Should I post a bug?
> It doesn't seem to be expected behavior,
> so I think it should at least be documented somewhere.
>
> Thanks,
> Hiro
>
>
> On Fri, Apr 26, 2019 at 3:17 PM Hiroyuki Yamada 
> wrote:
>
>> Hello,
>>
>> Thank you for some feedbacks.
>>
>> >Ben
>> Thank you.
>> I've tested with lower concurrency on my side; the issue still occurs.
>> We are using 3 x T3.xlarge instances for C* and a small, separate
>> instance for the client program.
>> But when we tried 1 host running 3 C* nodes, the issue didn't occur.
>>
>> > Alok
>> We also thought so and tested with hints disabled, but it doesn't make
>> any difference. (the issue still occurs)
>>
>> Thanks,
>> Hiro
>>
>>
>>
>>
>> On Fri, Apr 26, 2019 at 8:19 AM Alok Dwivedi <
>> alok.dwiv...@instaclustr.com> wrote:
>>
>>> Could it be related to hinted handoffs being stored on Node 1 and then
>>> replayed to Node 2 when it comes back, causing more load while new
>>> mutations are also being applied from cassandra-stress at the same time?
>>>
>>> Alok Dwivedi
>>> Senior Consultant
>>> https://www.instaclustr.com/
>>>
>>>
>>>
>>>
>>> On 26 Apr 2019, at 09:04, Ben Slater  wrote:
>>>
>>> In the absence of anyone else having any bright ideas - it still sounds
>>> to me like the kind of scenario that can occur in a heavily overloaded
>>> cluster. I would try again with a lower load.
>>>
>>> What size machines are you using for stress client and the nodes? Are
>>> they all on separate machines?
>>>
>>> Cheers
>>> Ben
>>>
>>> ---
>>>
>>> Ben Slater
>>> Chief Product Officer
>>>
>>> Read our latest technical blog posts here.
>>>
>>>
>>> On Thu, 25 Apr 2019 at 17:26, Hiroyuki Yamada 
>>> wrote:
>>>
 Hello,

 Sorry again.
 We found yet another weird thing in this.
 If we stop nodes with systemctl or just kill (TERM), it causes the
 problem,
 but if we kill -9, it doesn't cause the problem.

 Thanks,
 Hiro

 On Wed, Apr 24, 2019 at 11:31 PM Hiroyuki Yamada 
 wrote:

> Sorry, I didn't write the version and the configurations.
> I've tested with C* 3.11.4, and
> the configurations are mostly set to default except for the
> replication factor and listen_address for proper networking.
>
> Thanks,
> Hiro
>
> On Wed, Apr 24, 2019 at 5:12 PM Hiroyuki Yamada 
> wrote:
>
>> Hello Ben,
>>
>> Thank you for the quick reply.
>> I haven't tried that case, but it doesn't recover even if I stop
>> the stress.
>>
>> Thanks,
>> Hiro
>>
>> On Wed, Apr 24, 2019 at 3:36 PM Ben Slater <
>> ben.sla...@instaclustr.com> wrote:
>>
>>> Is it possible that stress is overloading node 1 so it’s not
>>> recovering state properly when node 2 comes up? Have you tried running 
>>> with
>>> a lower load (say 2 or 3 threads)?
>>>
>>> Cheers
>>> Ben
>>>
>>> ---
>>>
>>> Ben Slater
>>> Chief Product Officer
>>>
>>> Read our latest technical blog posts here.
>>>
>>>
>>>
>>> On Wed, 24 Apr 2019 at 16:28, Hiroyuki Yamada 
>>> wrote:
>>>
 Hello,

 I faced a weird issue when recovering a cluster after two nodes are
 stopped.
 It is easily reproducible and looks like a bug or an issue to fix,
 so let me write down the steps to reproduce.

 === STEPS TO REPRODUCE ===
 * Create

Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-22 Thread Attila Wind

Hi,

"When you run a query with allow filtering, Cassandra doesn't know where 
the data is located, so it has to go node by node, searching for the 
requested data."


a) Interesting... But only in the case where you do not provide the 
partition key, right? (So the IN() is on the partition key?)


b) That still does not explain or justify the "all 8 nodes halt and become 
unresponsive to external requests" behavior... Even if the servers are busy 
with the request, should they seriously become non-responsive?


cheers

Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +36 31 7811355




Necessary consistency level for LWT writes

2019-05-22 Thread Craig Pastro
Hello!

I am trying to understand the consistency level (not the serial consistency
level) required for LWTs. Basically, what I am trying to understand is whether
a consistency level of ONE is enough for an LWT write operation if I do my
read with a consistency level of SERIAL.

It would seem so based on what is written for the datastax python driver:

http://datastax.github.io/python-driver/api/cassandra/query.html#cassandra.query.Statement.serial_consistency_level

However, that is the only place where I can find this information, so I am a
little hesitant to believe it 100%.
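
For reference, a minimal sketch of where the two settings live in the Python
driver (the contact point, keyspace, table, and column names are made up);
the consistency_level=ONE on the write is exactly the part I am unsure about:

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    # Placeholder contact point and keyspace; the 'users' table is made up.
    session = Cluster(['127.0.0.1']).connect('someks')

    # LWT write: serial_consistency_level governs the Paxos rounds,
    # consistency_level governs the commit of the accepted value.
    insert = SimpleStatement(
        "INSERT INTO users (id, name) VALUES (%s, %s) IF NOT EXISTS",
        consistency_level=ConsistencyLevel.ONE,
        serial_consistency_level=ConsistencyLevel.SERIAL,
    )
    result = session.execute(insert, (1, 'craig'))  # result row has an [applied] column

    # Read back at SERIAL so the read participates in Paxos and sees the
    # latest committed value.
    select = SimpleStatement(
        "SELECT name FROM users WHERE id = %s",
        consistency_level=ConsistencyLevel.SERIAL,
    )
    row = session.execute(select, (1,)).one()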

By the way, I did find basically the same question (
https://www.mail-archive.com/user@cassandra.apache.org/msg45453.html) but I
am unsure if the answer there really answers my question.

Thank you in advance for any help!

Best regards,
Craig