Re: Very slow cluster

2017-05-05 Thread Eduardo Alonso
Thank you Anthony.

Eduardo Alonso
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // @stratiobd

2017-05-01 2:27 GMT+02:00 Anthony Grasso :

> Hi Eduardo,
>
> Please see my comment inline below regarding your third question.
>
> Regards,
> Anthony
>
> On 28 April 2017 at 21:26, Eduardo Alonso 
> wrote:
>
>> Hi to all:
>>
>> I am having some problems with two clients' Cassandra 3.0.8 clusters that I
>> want to share with you. These clusters are for QA and DEV.
>>
>> Cluster 1 (1 DC) is composed of 3 VMs (heap=4G, RAM=8G) sharing the
>> same physical machine and one SSD. I know this is not the best
>> environment, but it is only for testing purposes.
>>
>> The entire cluster runs very slowly and sometimes has failing inserts,
>> which cause hints to be saved and replayed, and some data inconsistency with 2i
>> queries.
>>
>> I know it is not the best environment (virtual machines sharing a physical
>> machine and one physical disk), but it is very strange to me that the exact
>> same test case works like a charm in 3 Docker containers on my
>> laptop (i7, 16 GB, SSD) but causes a lot of problems in their cluster.
>>
>> *listen_address* and *rpc_address* are set to an external domain name
>> (i.e. NODE_NAME.clientdomain.com). I have activated TRACE logs and get some
>> strange messages.
>>
>> So, my questions:
>>
>> *1.- Is it possible that one node sends a message to itself, triggering
>> READ_REPAIR?*
>>
>> TRACE [SharedPool-Worker-1] 2017-04-24 08:58:28,558
>> MessagingService.java:750 - Message-to-self TYPE:MUTATION VERB:
>> READ_REPAIR going over MessagingService
>>
>> TRACE [SharedPool-Worker-1] 2017-04-16 04:38:47,513
>> MessagingService.java:747 - 01a.clientdomain.com/10.63.24.238 sending
>> READ_REPAIR to 3426@/10.63.24.238
>>
>> *Does this log line show one node asking itself for a portion of data
>> that it does not have?*
>>
>> *2.-* I have another suspicious log line about slow vms:
>>
>> -WARN  [GossipTasks:1] 2017-04-14 00:32:44,371 FailureDetector.java:287
>> - Not marking nodes down due to local pause of 11195193520 > 50
>>
>> *Does this line say that there was a JVM pause of 11 secs*? There are
>> no garbage collector log lines. *Is it possible that this 11-sec pause
>> is caused by a DNS lookup of the domain?*
>>
>>
>> *3.-* I know that listen_address should be the external IP (inter-node
>> communication will be faster; no DNS lookup needed).
>>
>> *If I set listen_address to an external IP, is it necessary that the IP be
>> pingable from all the other datacenter nodes?*
>> *Do inter-datacenter communications use 'rpc_address' or
>> 'listen_address'*?
>>
>>
> All nodes in the cluster should be configured so that they can contact
> each other. As for being able to ping each other, enabling ICMP can be
> useful for debugging internode communication problems.
>
> Regarding internode communication; the *listen_address* is used for
> internode communication in the cluster. Note that if you don't want to
> manually specify an IP to *listen_address* for each node in your cluster,
> leave it blank and Cassandra will use *InetAddress.getLocalHost()* to
> pick an address.
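>
> For example, a minimal cassandra.yaml sketch (the address below is just a
> placeholder):
>
> # cassandra.yaml -- hypothetical node settings
> listen_address: 10.0.0.1   # used for internode (storage) traffic
> rpc_address: 10.0.0.1      # used for client (native protocol) traffic
> # Left blank, listen_address falls back to InetAddress.getLocalHost(),
> # which relies on correct hostname resolution on the node.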
>
>
>> Thank you in advance
>>
>> Eduardo Alonso
>> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
>> 28224 Pozuelo de Alarcón, Madrid
>> Tel: +34 91 828 6473 // www.stratio.com // @stratiobd
>>
>
>


Re: [Cassandra] nodetool compactionstats not showing pending task.

2017-05-05 Thread Alain RODRIGUEZ
Hi,

Sorry to hear the restart did not help.

> Maybe try to monitor through JMX with
> 'org.apache.cassandra.db:type=CompactionManager',
> attribute 'Compactions' or 'CompactionsSummary'

What are these attributes showing?
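
If you don't have a JMX client at hand, one way to read them from a shell is
the jmxterm jar (the jar file name/version and the default JMX port 7199 are
assumptions here):

echo "get -b org.apache.cassandra.db:type=CompactionManager Compactions" \
  | java -jar jmxterm-1.0.0-uber.jar -l localhost:7199 -n

# The pending-compactions gauge, likely what 'nodetool compactionstats' reads:
echo "get -b org.apache.cassandra.metrics:type=Compaction,name=PendingTasks Value" \
  | java -jar jmxterm-1.0.0-uber.jar -l localhost:7199 -n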

Here is the Apache Cassandra Jira:
https://issues.apache.org/jira/browse/CASSANDRA. You can search there, for
example:
https://issues.apache.org/jira/browse/CASSANDRA-12529?jql=project%20%3D%20CASSANDRA%20AND%20text%20~%20%22pending%20compactions%22%20ORDER%20BY%20created%20DESC

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2017-05-05 6:01 GMT+01:00 Abhishek Kumar Maheshwari <
abhishek.maheshw...@timesinternet.in>:

> I just restarted the cluster but I am still facing the same issue. Please let me
> know where I can search on JIRA, or I will raise a new ticket for the same.
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
> *P** Please do not print this email unless it is absolutely necessary.
> Spread environmental awareness.*
>
>
>
> *From:* kurt greaves [mailto:k...@instaclustr.com]
> *Sent:* Tuesday, May 2, 2017 11:30 AM
> *To:* Abhishek Kumar Maheshwari 
> *Cc:* Alain RODRIGUEZ ; user@cassandra.apache.org
>
> *Subject:* Re: [Cassandra] nodetool compactionstats not showing pending
> task.
>
>
>
> I believe this is a bug with the estimation of tasks; however, I am not aware of
> any JIRA that covers the issue.
>
>
>
> On 28 April 2017 at 06:19, Abhishek Kumar Maheshwari wrote:
>
> Hi ,
>
>
>
> I will try with JMX, but I tried with tpstats. tpstats shows pending
> compactions as 0, but nodetool compactionstats shows 3. So it seems
> strange to me.
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
> *P** Please do not print this email unless it is absolutely necessary.
> Spread environmental awareness.*
>
>
>
> *From:* Alain RODRIGUEZ [mailto:arodr...@gmail.com]
> *Sent:* Thursday, April 27, 2017 4:45 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: [Cassandra] nodetool compactionstats not showing pending
> task.
>
>
>
> Maybe try to monitor through JMX with 
> 'org.apache.cassandra.db:type=CompactionManager',
> attribute 'Compactions' or 'CompactionsSummary'
>
>
>
> C*heers
>
> ---
>
> Alain Rodriguez - @arodream - al...@thelastpickle.com
>
> France
>
>
>
> The Last Pickle - Apache Cassandra Consulting
>
> http://www.thelastpickle.com
>
>
>
> 2017-04-27 12:27 GMT+02:00 Alain RODRIGUEZ :
>
> Hi,
>
>
>
> I am not sure about this one. It happened to me in the past as well. I
> never really wondered about it as, off the top of my head, it was gone after a
> while or after a restart. To get rid of it, a restart might be enough.
>
>
>
> But if you feel like troubleshooting this, I think the first thing is to
> try to see if compactions are really happening, maybe using JMX. I believe
> `org.apache.cassandra.metrics:type=Compaction,name=PendingTasks` is what
> is used by 'nodetool compactionstats', but there might be more info there.
> Actually I don't really know what 'system.compactions_in_progress'
> was replaced by, but any way to double-check that you can think of would
> probably help in understanding better what's happening.
>
>
>
> Does someone know the way to check pending compaction details in 3.0.9?
>
>
>
> C*heers,
>
> ---
>
> Alain Rodriguez - @arodream - al...@thelastpickle.com
>
> France
>
>
>
> The Last Pickle - Apache Cassandra Consulting
>
> http://www.thelastpickle.com
>
>
>
> 2017-04-25 15:13 GMT+02:00 Abhishek Kumar Maheshwari:
>
> Hi All,
>
>
>
> In Production, I am using Cassandra 3.0.9.
>
>
>
> When I run the nodetool compactionstats command, it just shows the count
> and no other information, like below:
>
>
>
> [mohit.kundra@AdtechApp bin]$ ./nodetool -h XXX.XX.XX.XX
> compactionstats
>
> pending tasks: 3
>
> [mohit.kundra@AdtechAppX bin]$
>
>
>
> So, is this some Cassandra bug or what? I am not able to understand it.
>
>
>
>
>
> *Thanks & Regards,*
> *Abhishek Kumar Maheshwari*
> *+91- 805591 (Mobile)*
>
> Times Internet Ltd. | A Times of India Group Company
>
> FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
>
> *P** Please do not print this email unless it is absolutely necessary.
> Spread environmental awareness.*
>
>
>
> "Learn journalism at India's largest media house - The Times of India
> Group. Last Date 28 April,

Re: Totally unbalanced cluster

2017-05-05 Thread Alain RODRIGUEZ
Hi,


> but it's so easy to add nodes


Apache Cassandra has some kind of magic pieces ;-). Sometimes it is dark
magic though :p. Yet adding a node is indeed no harder when using
NetworkTopologyStrategy, as Jon mentioned above, once the configuration has
been done properly.
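
For reference, switching a keyspace over is a single statement; a hedged
sketch with a made-up keyspace, datacenter name, and replication factor:

ALTER KEYSPACE mykeyspace WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc1': 3
};

Note that after changing the replication strategy you should run a full
repair on the keyspace so replicas end up where the new strategy expects
them.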

> Number of keys (estimate): 442779640
>
> Number of keys (estimate): 736380940
>
> Number of keys (estimate): 451097313
>

This is indeed possible, and is most certainly creating imbalances. But also
look at the partition sizes using 'nodetool cfstats': combining the
previous information with the 'Compacted partition mean bytes', you should
get an idea of how imbalanced the disk space used is. If you would like
more details on the partition size distribution, partition size percentiles
are available through 'nodetool cfhistograms'.
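
For reference, with the keyspace and table from the 'getendpoints' commands
quoted below, those two checks would be:

nodetool cfstats mykeyspace.data        # look for 'Compacted partition mean bytes'
nodetool cfhistograms mykeyspace data   # partition size percentiles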

Regarding the global load (CPU, GC, disk IO, etc), it also depends on the
workload (ie, what partitions are being read).

*Should I use nodetool rebuild?*


No, I see no reason to. This command, 'nodetool rebuild', is meant to be used
when adding a new datacenter to the cluster. Which, by the way, will not
happen as long as you are using 'SimpleStrategy', which basically
creates one big ring and considers all the nodes as being part of it, no
matter their placement in the network, if I remember correctly.

The nodetool repair by default in this C* version is incremental and since
> the repair is run in all nodes in different hours


Incremental repairs are quite new to me. But I have heard they bring some
issues, often due to anti-compactions inducing a high number of SSTables
and a growing number of pending compactions. But it does not look bad in
your case.

Yet the '-pr' option should not be used when doing incremental repairs.
This thread mentions it and is probably worth reading:
https://groups.google.com/forum/#!topic/nosql-databases/peTArLfhXMU. Also I
believe it is mentioned in the video about repairs from Alexander I shared
in my last mail.

and I don't want snapshots that's why I'm cleaning twice a day (not sure
> that with -pr a snapshot is created).


So the option using snapshots is not '-pr', but '-seq' (sequential) or
'-par' (parallel); more info:
https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsRepair.html

If you want to keep using sequential repairs, then you could check the
automatically generated snapshot names and aim at deleting those
specifically, to avoid removing another manually created and
possibly important snapshot.

I'm using garbagecollect to force the cleanup since I'm running out of
> space.


Oh that's a whole topic. These blogposts should hopefully be helpful:

- thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html (From
myself, how to handle tombstones)
- http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html (From Alexander,
a coworker @TLP - TWCS and expiring tables)

Hopefully some information picked from those 2 blog posts will help you
free some disk space.

It is probably not needed to use 'garbagecollect' as a routine operation.
Some tuning in the compaction strategy or options (using defaults
currently) might be enough to solve the issue.
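
As an illustration of the kind of tuning I mean, purely a sketch, and only
applicable if your Cassandra version ships TWCS and the table is a pure
TTLed time series (the window size here is a guess):

ALTER TABLE mykeyspace.data WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': 1
};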

Yet the data is not correctly distributed. Something in the data model
design is inducing it. The primary key (hashed) is what is used to assign
the data to a specific node. A highly variable partition size can also lead
to hotspots.

As a side note, I strongly believe that understanding internals is very
important to operate Apache Cassandra correctly, I mean playing with it to
learn can put you in some undesirable situations. That's why I keep
mentioning some blog posts, talks or documentations that I think could be
helpful to know Apache Cassandra internals and processes a bit more.

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2017-05-04 17:10 GMT+01:00 Jon Haddad :

> Adding nodes with NTS is easier, in my opinion.  You don’t need to worry
> about replica placement, if you do it right.
>
> On May 4, 2017, at 7:43 AM, Cogumelos Maravilha <
> cogumelosmaravi...@sapo.pt> wrote:
>
> Hi Alain, thanks for your quick reply.
>
>
> Regarding SimpleStrategy perhaps you are right, but it's so easy to add
> nodes.
>
> I'm *not* using vnodes and the default 256. The information that I've
> posted is from a regular nodetool status for the keyspace.
>
> My partition key is a sequential big int, but nodetool cfstats shows that
> the number of keys is not balanced (data from 3 nodes):
>
> Number of keys (estimate): 442779640
>
> Number of keys (estimate): 736380940
>
> Number of keys (estimate): 451097313
>
> *Should I use nodetool rebuild?*
>
> Running:
>
> nodetool getendpoints mykeyspace data 9213395123941039285
>
> 10.1.1.52
> 10.1.1.185
>
> nodetool getendpoints mykeyspace data 9213395123941039286
>
> 10.1.1.161
> 10.1.1.19
>
> All nodes are working hard because my TTL is for 18 days and daily

Re: Totally unbalanced cluster

2017-05-05 Thread Cogumelos Maravilha
Hi,

Regarding the documentation, I already knew:

- thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html (from
Alain, how to handle tombstones)
- http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html (from
Alexander, a coworker @TLP - TWCS and expiring tables)

Anyway fantastic docs.

I desperately need to free up disk space, and nodetool repair can do an
anticompaction.
In my case C* is only used to insert data that expires with a TTL of 18
days. No updates or deletes; some selects using the partition key.
gc_grace_seconds is set to 3 hours.

Best practices to free up disk space, please?

Thanks in advance



On 05/05/2017 03:09 PM, Alain RODRIGUEZ wrote:
> [...]

Cassandra Schema version mismatch

2017-05-05 Thread Nitan Kainth
Hi Experts,

We found schema version mismatch in our cluster. We fixed it by bouncing C* on 
nodes where version was mismatched. Can someone suggest, what are the possible 
reasons for this? We are trying to figure out the root cause.
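
For reference, the mismatch is visible via 'nodetool describecluster', which
lists each schema version present and the nodes reporting it:

nodetool describecluster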

thank you!
-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Cassandra as a key/object store for many small (10-60k) files

2017-05-05 Thread Jonathan Guberman
Hello,

We’re currently testing Cassandra for use as a pure key-object store for data 
blobs around 10kB - 60kB each. Our use case is storing on the order of 10 
billion objects with about 5-20 million new writes per day. A written object 
will never be updated or deleted. Objects will be read at least once, some time 
within 10 days of being written. This will generally happen as a batch; that 
is, all of the images written on a particular day will be read together at the 
same time. This batch read will only happen one time; future reads will happen 
on individual objects, with no grouping, and they will follow a long-tail 
distribution, with popular objects read thousands of times per year but most 
read never or virtually never.

I’ve set up a small four node test cluster and have written test scripts to 
benchmark writing and reading our data. The table I’ve set up is very simple: 
an ascii primary key column with the object ID and a blob column for the data. 
All other settings were left at their defaults.
 
I’ve found write speeds to be very fast most of the time. However, 
periodically, writes will slow to a crawl for anywhere between half an hour to 
two hours, after which speeds recover to their previous levels. I assume this 
is some sort of data compaction or flushing to disk, but I haven’t been able to 
figure out the exact cause.

Read speeds have been more disappointing. Cached reads are very fast, but 
random read speed averages about 2 MB/sec, which is too slow when we need to 
read out a batch of several million objects. I don’t think it’s reasonable to 
assume that these rows will all still be cached by the time we need to read 
them for that first large batch read.

My general question is whether anyone has any suggestions for how to improve 
performance for our use case. More specifically:

- Is there a way to mitigate or eliminate the huge slowdowns I see when writing 
millions of rows?
- Are there settings I should be using in order to maximize read speeds for 
random reads?
- Is there a way to design our tables to improve the read speeds for the
initial large batched reads? I was thinking of using a batch ID column that
could be used to retrieve the data for the initial block. However, future reads
would need to be done by the object ID, not the batch ID, so it seems to me I’d
need to duplicate the data, one copy in an “objects by batch” table and the other
in a simple “objects” table (sketched below). Is there a better approach than this?
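
For clarity, a rough sketch of that two-table idea (all names and types below
are made up):

CREATE TABLE objects (
    object_id ascii PRIMARY KEY,
    data blob
);

CREATE TABLE objects_by_batch (
    batch_id text,
    object_id ascii,
    data blob,
    PRIMARY KEY (batch_id, object_id)
);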

Thank you!



-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Cassandra as a key/object store for many small (10-60k) files

2017-05-05 Thread daemeon reiydelle
These numbers do not match e.g. AWS, so guessing you are using local
storage?


...
Making a billion dollar startup is easy: "take a human desire, preferably
one that has been around for a really long time … Identify that desire and
use modern technology to take out steps."
...
Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Fri, May 5, 2017 at 12:19 PM, Jonathan Guberman  wrote:

> [...]


Re: Cassandra as a key/object store for many small (10-60k) files

2017-05-05 Thread Jonathan Guberman
Yes, local storage volumes on each machine.

> On May 5, 2017, at 3:25 PM, daemeon reiydelle  wrote:
> 
> These numbers do not match e.g. AWS, so guessing you are using local storage?
> 
> [...]



Re: Cassandra Schema version mismatch

2017-05-05 Thread Carlos Rolo
Are you changing the schema in a dynamic fashion? If problems occur
(network, GC pauses, etc.) during the schema changes, it might lead to that.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant / Datastax Certified Architect / Cassandra MVP

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
linkedin.com/in/carlosjuzarterolo
Mobile: +351 918 918 100
www.pythian.com

On Fri, May 5, 2017 at 7:00 PM, Nitan Kainth  wrote:

> Hi Experts,
>
> We found schema version mismatch in our cluster. We fixed it by bouncing
> C* on nodes where version was mismatched. Can someone suggest, what are the
> possible reasons for this? We are trying to figure out the root cause.
>
> thank you!
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: Cassandra as a key/object store for many small (10-60k) files

2017-05-05 Thread daemeon reiydelle
I would guess you have network overload issues; I have seen pretty much
exactly what you describe many times, and (so far ;{) this has always been
the issue, especially with 1 Gbit networks, no jumbo frames, etc. Get your
network guys to monitor the error/retry packets across ALL of the interfaces
(all the nodes, top-of-rack switch, network switches, etc.). If you see ANY
retries, timeouts, or errors, you have found your problem.
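
A couple of standard commands for that kind of spot check (the interface
name is an assumption):

ip -s link show eth0   # per-interface RX/TX error and drop counters
netstat -i             # error/drop summary across all interfaces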

Or it could be something like Java garbage collection pauses, CPU overload,
etc.


...
Making a billion dollar startup is easy: "take a human desire, preferably
one that has been around for a really long time … Identify that desire and
use modern technology to take out steps."
...
Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Fri, May 5, 2017 at 12:26 PM, Jonathan Guberman  wrote:

> Yes, local storage volumes on each machine.
>
> [...]


Re: Cassandra Schema version mismatch

2017-05-05 Thread Nitan Kainth
No schema change!

Sent from my iPhone

> On May 5, 2017, at 2:30 PM, Carlos Rolo  wrote:
> 
> [...]


Smart Table creation for 2D range query

2017-05-05 Thread Lydia
Hi all,

I am new to Apache Cassandra and I would like to get some advice on how to 
tackle a table creation / indexing in a sophisticated way.

My aim is to store x- and y-coordinates, accompanied by some columns with meta 
information (m1, ... ,m5). There will be around 100,000,000 rows overall. Some 
rows might have the same (x,y) pairs but always distinct meta information. 

In the end I want to do a rather simple range query of the form, e.g., (0 <= x
<= 1) AND (0 <= y <= 1).

What would be the best choice of columns for the primary key and partition
key? Or should I use an index? And if so, on what column(s)?

Thanks in advance!
Best regards, 
Lydia
-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Smart Table creation for 2D range query

2017-05-05 Thread Nitan Kainth
Make the metadata the partition key and x, y part of the primary key (i.e.,
clustering columns). It should work.

Sent from my iPhone

> On May 5, 2017, at 2:40 PM, Lydia  wrote:
> 
> [...]

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



New Cassandra Luxembourg Meetup

2017-05-05 Thread Johnny Miller
Moien all!

I have just started a new Cassandra Luxembourg meetup if people are
interested (and why would you not)!

https://www.meetup.com/Cassandra-Luxembourg/

My company will sponsor and schedule some talks etc.. if there is enough
demand.

So sign up - could be interesting!!

Johnny

-- 

Any views or opinions presented are solely those of the author and do not 
necessarily represent those of the company. digitalis.io is a trading name 
of Digitalis.io Ltd. Company Number: 98499457 Registered in England and 
Wales. Registered Office: Kemp House, 152 City Road, London, EC1V 2NX, 
United Kingdom



Re: Smart Table creation for 2D range query

2017-05-05 Thread Jon Haddad
I think you’ll want to model your table similar to how an R-Tree [1] / Quad 
tree [2] works.  Let’s suppose you had a 10x10 meter land area and you wanted 
to put stuff in there.  In order to find “all the things in point x,y”, you 
could break your land area into a grid.  A partition would contain all the 
items that are in that grid space.  In my simple example, I’d have 100 
partitions.

For example:

// space is a simple "x.y" text field
CREATE TABLE geospatial (
    space text,
    item text,
    primary key (space, item)
);

insert into geospatial (space, item) values ('1.1', 'hat');
insert into geospatial (space, item) values ('1.1', 'bird');
insert into geospatial (space, item) values ('6.4', 'dog');

This example is pretty trivial, and doesn’t take into account hot partitions.
That’s where the process of subdividing a space when it reaches a certain size
comes in.

[1] https://en.wikipedia.org/wiki/R-tree 
[2] https://en.wikipedia.org/wiki/Quadtree 
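
To answer a bounding-box query you would then enumerate the grid cells the
box covers and hit them with an IN on the partition key; a hypothetical query
against the toy schema above:

select * from geospatial where space in ('1.1', '1.2', '2.1', '2.2');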

> On May 5, 2017, at 12:54 PM, Nitan Kainth  wrote:
> 
> Make the metadata the partition key and x, y part of the primary key (i.e.,
> clustering columns). It should work.
> 
> Sent from my iPhone
> 
>> On May 5, 2017, at 2:40 PM, Lydia  wrote:
>> 
>> [...]



manual deletes with TWCS

2017-05-05 Thread John Sanda
How problematic is it to perform deletes when using TWCS? I am currently
using TWCS and have some new use cases for performing deletes. So far I
have avoided performing deletes, but I am wondering what issues I might run
into.


- John


Re: manual deletes with TWCS

2017-05-05 Thread Jon Haddad
You cannot.

From Alex’s TLP post: http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html 


TWCS is not a fit for workloads that perform deletes on non-TTLed data. Consider
that SSTables from different time windows will never be compacted together, so
data inserted on day 1 and deleted on day 2 will have the tombstone and the
shadowed cells living in different time windows. Unless a major compaction is
performed (which it shouldn’t be), and while the deletion will seem effective
when running queries, space will never be reclaimed on disk.
Deletes can be performed on TTLed data if needed, but the partition will then 
exist in different time windows, which will postpone actual deletion from disk 
until both time windows fully expire.


> On May 5, 2017, at 1:54 PM, John Sanda  wrote:
> 
> How problematic is it to perform deletes when using TWCS? I am currently 
> using TWCS and have some new use cases for performing deletes. So far I have 
> avoided performing deletes, but I am wondering what issues I might run into.
> 
> 
> - John



Re: manual deletes with TWCS

2017-05-05 Thread John Sanda
This involves TTLed data, and I actually would want to delete all
related partitions across all time windows. Let's say I have a time series
partitioned by day with a 7-day TTL and a window size of one day. If I
delete partitions for the past seven days, would I still run into the issue
of data purge being postponed?
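
For concreteness, the layout I have in mind looks roughly like this (all
names here are made up):

CREATE TABLE metrics (
    series text,
    day date,
    ts timestamp,
    value double,
    PRIMARY KEY ((series, day), ts)
) WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': 1
} AND default_time_to_live = 604800;  -- 7 days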

On Fri, May 5, 2017 at 4:57 PM, Jon Haddad 
wrote:

> [...]


-- 

- John


Re: Cassandra Schema version mismatch

2017-05-05 Thread Jeff Jirsa


On 2017-05-05 11:00 (-0700), Nitan Kainth  wrote: 
> Hi Experts,
> 
> We found schema version mismatch in our cluster. We fixed it by bouncing C* 
> on nodes where version was mismatched. Can someone suggest, what are the 
> possible reasons for this? We are trying to figure out the root cause.
> 

Do all of your versions match? You didn't accidentally upgrade half the cluster 
did you?



-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: DTCS to TWCS

2017-05-05 Thread Jeff Jirsa


On 2017-05-04 14:08 (-0700), Jon Haddad  wrote: 
> We (The Last Pickle) wrote a blog post on using TWCS pre-3.0: 
> http://thelastpickle.com/blog/2017/01/10/twcs-part2.html 
> 
> 
> Alex Dejanovski wrote a very comprehensive guide to TWCS I recommend reading 
> before putting it in prod: 
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html 
> 
> 


Not to detract from the (well written) blog posts, here's the short answer for 
the list:

TWCS is officially in 3.0.8 and 3.8 

You can run it on recent versions of 2.0, 2.1, and 2.2 by following the steps 
in the blog (I suspect the largest users of TWCS use it on pre-3.0 versions, so 
while they're unofficial/unsupported, they're not untested). 
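
On versions where it ships, enabling it is a single ALTER; a hedged example
with a made-up keyspace/table and a guessed window size:

ALTER TABLE ks.events WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': 1
};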




-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Cassandra Schema version mismatch

2017-05-05 Thread Nitan Kainth
No, just two nodes out of 18 have a mismatch. We upgraded long ago.

Sent from my iPhone

> On May 5, 2017, at 5:17 PM, Jeff Jirsa  wrote:
> 
> [...]

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: DTCS to TWCS

2017-05-05 Thread vasu gunja
Thanks Jeff 

Thanks,
Vasu

> On May 5, 2017, at 5:22 PM, Jeff Jirsa  wrote:
> [...]

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Cassandra Schema version mismatch

2017-05-05 Thread Jeff Jirsa
Generally shouldn't happen in most modern versions of cassandra. Could be
simultaneous conflicting statements (two "CREATE TABLE" statements at the
same time, which can happen with programatic schema changes), or unhealthy
schema tables (lots and lots of changes create tombstones in the schema
tables, which can cause reading/calculating schema versions to be
inefficient).

In the future, you can work around this with 'nodetool resetlocalschema'
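
A hypothetical workflow for that:

nodetool describecluster     # confirm which node(s) report a divergent schema version
nodetool resetlocalschema    # on the divergent node: drop the local schema and re-sync it from peers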



On Fri, May 5, 2017 at 3:45 PM, Nitan Kainth  wrote:

> No, just two nodes have mismatch out of 18 nodes. We upgraded long back.
>
> [...]


Re: Cassandra Schema version mismatch

2017-05-05 Thread James Rothering
I've heard about this ... how did the problem present itself?

Sent from my iPhone

> On May 5, 2017, at 3:17 PM, Jeff Jirsa  wrote:
> [...]

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org