different query result after a rerun of the same query

2019-04-29 Thread Marco Gasparini
Hi all,

I'm using Cassandra 3.11.3.5.

I have just noticed that when I perform a query I get 0 results, but if I
run that same query again after a few seconds I get the right result.

I have traced the query:

cqlsh> select event_datetime, id_url, uuid, num_pages from
mkp_history.mkp_lookup where id_url= 1455425 and url_type='mytype' ;

 event_datetime | id_url | uuid | num_pages
----------------+--------+------+-----------

(0 rows)

Tracing session: dda9d1a0-6a51-11e9-9e36-f54fe3235e69

 activity | timestamp | source | source_elapsed | client
----------+-----------+--------+----------------+--------
 Execute CQL3 query | 2019-04-29 09:39:05.53 | 10.8.0.10 | 0 | 10.8.0.10
 Parsing select event_datetime, id_url, uuid, num_pages from mkp_history.mkp_lookup where id_url= 1455425 and url_type='mytype'; [Native-Transport-Requests-2] | 2019-04-29 09:39:05.53 | 10.8.0.10 | 238 | 10.8.0.10
 Preparing statement [Native-Transport-Requests-2] | 2019-04-29 09:39:05.53 | 10.8.0.10 | 361 | 10.8.0.10
 reading data from /10.8.0.38 [Native-Transport-Requests-2] | 2019-04-29 09:39:05.531000 | 10.8.0.10 | 527 | 10.8.0.10
 Sending READ message to /10.8.0.38 [MessagingService-Outgoing-/10.8.0.38-Small] | 2019-04-29 09:39:05.531000 | 10.8.0.10 | 620 | 10.8.0.10
 READ message received from /10.8.0.10 [MessagingService-Incoming-/10.8.0.10] | 2019-04-29 09:39:05.535000 | 10.8.0.8 | 44 | 10.8.0.10
 speculating read retry on /10.8.0.8 [Native-Transport-Requests-2] | 2019-04-29 09:39:05.535000 | 10.8.0.10 | 4913 | 10.8.0.10
 Executing single-partition query on mkp_lookup [ReadStage-2] | 2019-04-29 09:39:05.535000 | 10.8.0.8 | 304 | 10.8.0.10
 Sending READ message to /10.8.0.8 [MessagingService-Outgoing-/10.8.0.8-Small] | 2019-04-29 09:39:05.535000 | 10.8.0.10 | 4970 | 10.8.0.10
 Acquiring sstable references [ReadStage-2] | 2019-04-29 09:39:05.536000 | 10.8.0.8 | 391 | 10.8.0.10
 Bloom filter allows skipping sstable 1 [ReadStage-2] | 2019-04-29 09:39:05.536000 | 10.8.0.8 | 490 | 10.8.0.10
 Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-2] | 2019-04-29 09:39:05.536000 | 10.8.0.8 | 549 | 10.8.0.10
 Merged data from memtables and 0 sstables [ReadStage-2] | 2019-04-29 09:39:05.536000 | 10.8.0.8 | 697 | 10.8.0.10
 Read 0 live rows and 0 tombstone cells [ReadStage-2] | 2019-04-29 09:39:05.536000 | 10.8.0.8 | 808 | 10.8.0.10
 Enqueuing response to /10.8.0.10 [ReadStage-2] | 2019-04-29 09:39:05.536000 | 10.8.0.8 | 896 | 10.8.0.10
 Sending REQUEST_RESPONSE message to /10.8.0.10 [MessagingService-Outgoing-/10.8.0.10-Small] | 2019-04-29 09:39:05.536000 | 10.8.0.8 | 1141 | 10.8.0.10
 REQUEST_RESPONSE message received from /10.8.0.8 [MessagingService-Incoming-/10.8.0.8] | 2019-04-29 09:39:05.539000 | 10.8.0.10 | 8627 | 10.8.0.10
 Processing response from /10.8.0.8 [RequestResponseStage-3] | 2019-04-29 09:39:05.539000 | 10.8.0.10 | 8739 | 10.8.0.10
 Request complete | 2019-04-29 09:39:05.538823 | 10.8.0.10 | 8823 | 10.8.0.10



And here I rerun the same query just a few seconds later:


cqlsh> select event_datetime, id_url, uuid, num_pages from
mkp_history.mkp_lookup where id_url= 1455425 and url_type='mytype';

 event_datetime                  | id_url  | uuid                                 | num_pages
---------------------------------+---------+--------------------------------------+-----------
 2019-04-15 21:32:27.031000+0000 | 1455425 | 91114c7d-3dd3-4913-ac9c-0dfa12b4198b |         1
 2019-04-14 21:34:23.630000+0000 | 1455425 | e97b160d-3901-4550-9ce6-36893a6dcd90 |         1
 2019-04-11 21:57:23.025000+0000 | 1455425 | 1566cc7c-7893-43f0-bffe-caab47dec851 |         1

(3 rows)

Tracing session: f4b7eb20-6a51-11e9-9e36-f54fe3235e69

 activity | timestamp | source | source_elapsed | client
----------+-----------+--------+----------------+--------
 Execute CQL3 query | 2019-04-29 09:39:44.21 | 10.8.0.10 | 0 | 10.8.0.10
 Parsing select event_datetime, id_url, uuid, num_pages from
mkp_history.mkp_lookup where id_url= 1455425 and url_type='m

Re: different query result after a rerun of the same query

2019-04-29 Thread Ben Slater
You haven’t said what consistency level you are using. CQLSH by default
uses consistency level ONE, which may be part of the issue - try using a
higher level (eg CONSISTENCY QUORUM).
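For context on what a higher level buys you: QUORUM waits on a strict
majority of replicas, floor(RF/2) + 1, so a QUORUM read always overlaps a
QUORUM write on at least one replica. A quick plain-Python sketch of the
arithmetic (illustrative only, not driver code):

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1

# With RF=3, ONE waits on a single replica while QUORUM waits on 2;
# a read at ONE can therefore hit the one replica that missed the write.
for rf in (1, 3, 5):
    print(f"RF={rf}: ONE waits on 1, QUORUM waits on {quorum(rf)}")
```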

After results are returned correctly are they then returned correctly for
all future runs? When was the data inserted (relative to your attempt to
query it)?

Cheers
Ben

---

Ben Slater, Chief Product Officer, Instaclustr

Read our latest technical blog posts here.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.



Re: Re: Re: how to configure the Token Allocation Algorithm

2019-04-29 Thread Jean Carlo
Hello Anthony,

Effectively I did not start the seed of every rack first. Thank you for
the post. I believe this is something important to have as official
documentation on cassandra.apache.org; this issue, like many others, is
not documented properly.

Of course I find The Last Pickle blog very useful in these matters, but
having proper documentation on how to start a fresh new Cassandra cluster
is essential.

I have one question about your post. When you mention
"*However, therein lies the problem, for existing clusters updating this
setting is easy, as a keyspace already exists*",
what is the benefit of using allocate_tokens_for_keyspace in a cluster
with data if the tokens are already distributed? In the worst case
scenario, the cluster is already unbalanced.


Cheers

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay


On Mon, Apr 29, 2019 at 2:45 AM Anthony Grasso 
wrote:

> Hi Jean,
>
> It sounds like there are no nodes in one of the racks for the eu-west-3
> datacenter. What does the output of nodetool status look like currently?
>
> Note, you will need to start a node in each rack before creating the
> keyspace. I wrote a blog post with the procedure to set up a new cluster
> using the predictive token allocation algorithm:
> http://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
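The "Fatal configuration error" quoted below boils down to a simple
precondition on racks versus replication factor. A hypothetical sketch of
that check (invented helper name, not Cassandra's actual code; the
single-rack case is assumed to be exempt):

```python
def check_token_allocation(racks_in_dc, rf):
    """Sketch of the precondition behind the error in this thread:
    with more than one rack defined in a datacenter, rack-aware token
    allocation needs at least RF racks to place one replica per rack."""
    num_racks = len(set(racks_in_dc))
    if 1 < num_racks < rf:
        raise ValueError(
            "Token allocation failed: the number of racks %d is lower "
            "than the replication factor %d." % (num_racks, rf))

# Only 2 of the 3 racks had a running node when the keyspace was
# created, so allocation fails until a node in the third rack is up.
check_token_allocation(["rack1", "rack2", "rack3"], rf=3)  # passes
```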
>
> Regards,
> Anthony
>
> On Fri, 26 Apr 2019 at 19:53, Jean Carlo 
> wrote:
>
>> Creating a fresh new cluster in AWS using this procedure, I got this
>> problem while bootstrapping the second rack of a cluster of 6 machines
>> with 3 racks and a keyspace with RF 3:
>>
>> WARN  [main] 2019-04-26 11:37:43,845 TokenAllocation.java:63 - Selected
>> tokens [-5106267594614944625, 623001446449719390, 7048665031315327212,
>> 3265006217757525070, 5054577454645148534, 314677103601736696,
>> 7660890915606146375, -5329427405842523680]
>> ERROR [main] 2019-04-26 11:37:43,860 CassandraDaemon.java:749 - Fatal
>> configuration error
>> org.apache.cassandra.exceptions.ConfigurationException: Token allocation
>> failed: the number of racks 2 in datacenter eu-west-3 is lower than its
>> replication factor 3.
>>
>> Someone got this problem ?
>>
>> I am not quite sure why I have this, since my cluster has 3 racks.
>>
>> Cluster Information:
>> Name: test
>> Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
>> DynamicEndPointSnitch: enabled
>> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>> Schema versions:
>> 3bf63440-fad7-3371-9c14-4855ad11ee83: [192.0.0.1, 192.0.0.2]
>>
>>
>>
>> Jean Carlo
>>
>> "The best way to predict the future is to invent it" Alan Kay
>>
>>
>> On Thu, Jan 24, 2019 at 10:32 AM Ahmed Eljami 
>> wrote:
>>
>>> Hi folks,
>>>
>>> What about adding new keyspaces to the existing cluster, e.g. test_2
>>> with the same RF?
>>>
>>> Will it use the same logic as the existing keyspace test? Or should I
>>> restart the nodes and add the new keyspace to cassandra.yaml?
>>>
>>> Thanks.
>>>
>>> On Tue, 2 Oct 2018 at 10:28, Varun Barala wrote:
>>>
 Hi,

 Managing `initial_token` yourself will give you more control over
 scale-in and scale-out.
 Let's say you have a three-node cluster with `num_tokens: 1`, and your
 initial ranges look like this:

 Datacenter: datacenter1
 =======================
 Address    Rack   Status  State   Load       Owns    Token
                                                      3074457345618258602
 127.0.0.1  rack1  Up      Normal  98.96 KiB  66.67%  -9223372036854775808
 127.0.0.2  rack1  Up      Normal  98.96 KiB  66.67%  -3074457345618258603
 127.0.0.3  rack1  Up      Normal  98.96 KiB  66.67%  3074457345618258602

 Now let's say you want to scale out the cluster to twice the current
 throughput (meaning you are adding 3 more nodes).

 If you are using AWS EBS volumes then you can use the same volumes and
 spin up three more nodes by selecting the midpoints of the existing
 ranges, which means your new nodes already have the data.
 Once you have mounted the volumes on your new nodes:
 * You need to delete every system table except the schema-related tables.
 * You need to generate the system.local table yourself, with
 `Bootstrap state` set to completed and the schema version the same as the
 other existing nodes.
 * You need to remove the extra data on all machines using cleanup
 commands.

 This is how you can scale out a Cassandra cluster in minutes. In case
 you want to add nodes one by one, you need to write a small tool which
 always figures out the biggest range in the existing cluster and splits
 it in half.
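The even spacing in the ring above and the "split the biggest range in
half" idea are both plain arithmetic on the Murmur3 token space
[-2^63, 2^63). A sketch under that assumption (illustrative helpers, not
Cassandra code):

```python
MIN_TOKEN = -(2 ** 63)
RING_SIZE = 2 ** 64  # Murmur3Partitioner token space size

def initial_tokens(num_nodes):
    """Evenly spaced initial_token values for num_tokens: 1 nodes."""
    return [MIN_TOKEN + i * RING_SIZE // num_nodes for i in range(num_nodes)]

def midpoint(left, right):
    """Token halfway through the range (left, right], handling wraparound."""
    width = (right - left) % RING_SIZE
    mid = left + width // 2
    # wrap back into [-2^63, 2^63)
    return (mid + 2 ** 63) % RING_SIZE - 2 ** 63

print(initial_tokens(3))
# matches the ring above:
# [-9223372036854775808, -3074457345618258603, 3074457345618258602]
```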

 However, I have never tested this thoroughly, but it should work
 conceptually. Here we are taking advantage of the fact that we have the
 volumes (data) for the new node beforehand, so we do not need to boot

Re: different query result after a rerun of the same query

2019-04-29 Thread Marco Gasparini
Thank you Ben for the reply.

> You haven’t said what consistency level you are using. CQLSH by default
uses consistency level one which may be part of the issue - try using a
higher level (eg CONSISTENCY QUORUM)
Yes, actually I used CQLSH, so the consistency level was set to ONE. After
I changed it I got the right results.

>After results are returned correctly are they then returned correctly for
all future runs?
Yes, it seems that once they are returned I can access them on every run
of the same query, on every node I run it on.

> When was the data inserted (relative to your attempt to query it)?
About a day before the query.


Thanks



Increasing the size limits implications

2019-04-29 Thread Bobbie Haynes
Hi,
  I'm inserting into Cassandra in batches (each containing a single
partition key), but my batches are failing and throwing exceptions.
I want to know: if we increase batch_size_warn_threshold_in_kb to 200KB
and batch_size_fail_threshold_in_kb to 300KB, what potential issues
could I be facing?

Thanks,
Bobbie


Re: different query result after a rerun of the same query

2019-04-29 Thread Ben Slater
My guess is the initial query was causing a read repair, so on subsequent
queries there were replicas of the data on every node and it still got
returned at consistency ONE.

There are a number of ways the data could have become inconsistent in the
first place - eg badly overloaded or down nodes, changes in topology
without following proper procedure, etc.

Cheers
Ben




Re: Increasing the size limits implications

2019-04-29 Thread Nitan Kainth
Increasing the batch size could potentially lead to longer GC pauses.

Try to break up your batches. 300KB is a decent limit for most use cases.
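One way to stay under the thresholds without raising them is to split the
work client-side. A rough sketch of greedy splitting (the byte estimate
here is a stand-in for illustration; real drivers compute the serialized
statement size differently):

```python
def split_batch(statements, max_bytes=50 * 1024):
    """Greedily group statements so each group's estimated serialized
    size stays under max_bytes (Cassandra's default
    batch_size_fail_threshold_in_kb is 50KB)."""
    groups, current, current_size = [], [], 0
    for stmt in statements:
        size = len(stmt.encode("utf-8"))  # crude size estimate
        if current and current_size + size > max_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(stmt)
        current_size += size
    if current:
        groups.append(current)
    return groups
```

Each resulting group can then be executed as its own (smaller) batch, or
as individual statements when they do not share a partition key.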


Regards,
Nitan
Cell: 510 449 9629



Re: cassandra node was put down with oom error

2019-04-29 Thread yeomii999
Hello,

I'm suffering from a similar problem with OSS Cassandra version 3.11.3.
My Cassandra cluster has been running for more than a year and there was
no problem until this year.
The cluster is write-intensive, consists of 70 nodes, and all rows have a
2 hr TTL.
The only change is the read consistency from QUORUM to ONE. (I cannot
revert this change because of the read latency.)
Below is my compaction strategy:
```
compaction = {'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
              'compaction_window_size': '3', 'compaction_window_unit': 'MINUTES',
              'enabled': 'true', 'max_threshold': '32', 'min_threshold': '4',
              'tombstone_compaction_interval': '60', 'tombstone_threshold': '0.2',
              'unchecked_tombstone_compaction': 'false'}
```
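As a rough sanity check on those settings: a 2 hr TTL with 3-minute
windows means data spans on the order of 40 TWCS windows at any time,
each holding at least one sstable. A back-of-the-envelope sketch
(invented helper name; assumes fully compacted windows and expiration
exactly at TTL):

```python
import math

def expected_twcs_windows(ttl_minutes, window_minutes):
    """Rough count of live TWCS windows: data younger than the TTL spans
    ceil(ttl/window) windows, plus the currently filling one."""
    return math.ceil(ttl_minutes / window_minutes) + 1

print(expected_twcs_windows(ttl_minutes=120, window_minutes=3))
```

More windows means more sstables, and off-heap structures such as bloom
filters scale with the sstable count, which may be relevant to the memory
growth described above.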
I've tried rolling-restarting the cluster several times,
but the memory usage of the Cassandra process always keeps climbing.
I also tried Native Memory Tracking, but it measured less memory usage
than the system reports (RSS in /proc/{cassandra-pid}/status).


On 2019/01/26 20:53:26, Jeff Jirsa  wrote: 
> You’re running DSE so the OSS list may not be much help. DataStax may
> have more insight.
> 
> In open source, the only things off-heap that vary significantly are
> bloom filters and compression offsets - both scale with disk space, and
> both increase during compaction. Large STCS compactions can cause pretty
> meaningful allocations for these. Also, if you have an unusually low
> compression chunk size or a very low bloom filter FP ratio, those will
> be larger.
> 
> 
> -- 
> Jeff Jirsa
> 
> 
> > On Jan 26, 2019, at 12:11 PM, Ayub M  wrote:
> > 
> > Cassandra node went down due to OOM, and checking /var/log/messages I
> > see the below.
> > 
> > ```
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: java invoked oom-killer: 
> > gfp_mask=0x280da, order=0, oom_score_adj=0
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: java cpuset=/ mems_allowed=0
> > 
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 DMA: 1*4kB (U) 0*8kB 
> > 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 
> > 1*2048kB (M) 3*4096kB (M) = 15908kB
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 DMA32: 1294*4kB (UM) 
> > 932*8kB (UEM) 897*16kB (UEM) 483*32kB (UEM) 224*64kB (UEM) 114*128kB (UEM) 
> > 41*256kB (UEM) 12*512kB (UEM) 7*1024kB (UE
> > M) 2*2048kB (EM) 35*4096kB (UM) = 242632kB
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 Normal: 5319*4kB (UE) 
> > 3233*8kB (UEM) 960*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 
> > 0*2048kB 0*4096kB = 62500kB
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 hugepages_total=0 
> > hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Node 0 hugepages_total=0 
> > hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 38109 total pagecache pages
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 0 pages in swap cache
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Swap cache stats: add 0, delete 
> > 0, find 0/0
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Free swap  = 0kB
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: Total swap = 0kB
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 16394647 pages RAM
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 0 pages HighMem/MovableOnly
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: 310559 pages reserved
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ pid ]   uid  tgid total_vm 
> >  rss nr_ptes swapents oom_score_adj name
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ 2634] 0  263441614 
> >  326  820 0 systemd-journal
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ 2690] 0  269029793 
> >  541  270 0 lvmetad
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [ 2710] 0  271011892 
> >  762  250 -1000 systemd-udevd
> > .
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [13774] 0 13774   459778
> > 97729 4290 0 Scan Factory
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [14506] 0 1450621628 
> > 5340  240 0 macompatsvc
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [14586] 0 1458621628 
> > 5340  240 0 macompatsvc
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [14588] 0 1458821628 
> > 5340  240 0 macompatsvc
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [14589] 0 1458921628 
> > 5340  240 0 macompatsvc
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [14598] 0 1459821628 
> > 5340  240 0 macompatsvc
> > Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: [14599] 0 1459921628 
> > 5340  240 0 macompatsvc
> > Jan 23 20:07:17 ip-xxx-xxx-xxx