Re: snapshot strategy?

2018-11-05 Thread Lou DeGenaro
The issue really is how to manage disk space.  It is certainly possible to
take snapshots by name and delete them by name, perhaps one for each day of
the week.  But how do you clear the automatic ones (i.e., ones whose names
are unknown) without clearing the named ones?

Thanks.

Lou.

On Fri, Nov 2, 2018 at 12:28 PM Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> On Fri, Nov 2, 2018 at 5:15 PM Lou DeGenaro 
> wrote:
>
>> I'm looking to hear how others are coping with snapshots.
>>
>> According to the doc:
>> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsBackupDeleteSnapshot.html
>>
>> *When taking a snapshot, previous snapshot files are not automatically
>> deleted. You should remove old snapshots that are no longer needed.*
>>
>> *The nodetool clearsnapshot
>> 
>> command removes all existing snapshot files from the snapshot directory of
>> each keyspace. You should make it part of your back-up process to clear old
>> snapshots before taking a new one.*
>>
>> But if you delete first, then there is a window of time when no snapshot
>> exists until the new one is created.  And with a single snapshot there is
>> no recovery further back than it.
>>
> You can also delete a specific snapshot by passing its name to the
> clearsnapshot command.  For example, you could use the snapshot date as
> part of the name.  This will also prevent removing snapshots that were
> taken for reasons other than backup, like the automatic snapshots created
> by running TRUNCATE or DROP commands, or any other snapshots that might
> have been created manually by operators.
>
> Regards,
> --
> Alex
>
>
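
For illustration, a sketch of the date-named approach suggested above (the
keyspace name "myks" and the 7-day retention are hypothetical, and GNU date
syntax is assumed; both nodetool subcommands take a tag via '-t'):

    # take today's backup snapshot, tagged with the date
    nodetool snapshot -t backup-$(date +%F) myks

    # drop only the snapshot from 8 days ago, leaving newer backups and any
    # differently-named snapshots untouched
    nodetool clearsnapshot -t backup-$(date -d '8 days ago' +%F) myks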


Multiple cluster for a single application

2018-11-05 Thread onmstester onmstester
Hi,

One of my applications requires a cluster with more than 100 nodes, but I've
read documents recommending clusters of fewer than 50 or 100 nodes (Netflix
runs hundreds of clusters with fewer than 100 nodes each). Is it a good idea
to use multiple clusters for a single application, just to reduce maintenance
problems and system complexity, and to improve performance? If so, which of
the policies below is more suitable for distributing data among the clusters,
and why?

1. Each cluster is responsible for a specific subset of the tables (table
sizes are almost equal, so the calculations are easy here); for example,
inserts to table X would go to cluster Y.

2. Shard the data at the loader level by some business-logic grouping; for
example, all rows with some column starting with X would go to cluster Y
(see the sketch below).

I would appreciate you sharing your experiences of working with big clusters,
problems encountered, and solutions.

Thanks in advance.

Sent using Zoho Mail
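
A minimal sketch of option 2, assuming the DataStax Python driver; the
contact points, keyspace, and table are hypothetical, and CRC32 is one
arbitrary choice of stable routing function:

    # Route each insert to one of several clusters by hashing a business key.
    import zlib

    from cassandra.cluster import Cluster

    # One Cluster object per physical cluster (contact points hypothetical).
    clusters = [
        Cluster(["10.0.0.1", "10.0.0.2"]),  # cluster Y1
        Cluster(["10.0.1.1", "10.0.1.2"]),  # cluster Y2
    ]
    sessions = [c.connect("app_keyspace") for c in clusters]

    def session_for(business_key: str):
        # CRC32 is stable across processes and restarts, unlike Python's
        # built-in hash(), so the same key always routes to the same cluster.
        return sessions[zlib.crc32(business_key.encode()) % len(sessions)]

    session_for("customer-42").execute(
        "INSERT INTO events (key, payload) VALUES (%s, %s)",
        ("customer-42", "some payload"),
    )

One known drawback of this style: rebalancing keys between clusters later
means moving the data yourself, since no single cluster knows about the
others.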

Info about sstableloader

2018-11-05 Thread Kalyan Chakravarthy
Hi,

I'm new to Cassandra; please help me with sstableloader. Thank you in advance.

I'm trying to migrate data between two clusters which are on different
networks, from 'c1' to 'c2'. Which one will be the source and which one will
be the destination? And where should I run the sstableloader command: on c1
or c2?

Cheers 
LAD
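
For what it's worth, sstableloader streams SSTable files that are local to
the machine where you run it into the cluster named by -d, so c1 holds the
source files and c2 is the destination. A sketch (host names and the path
are hypothetical):

    # run this where the c1 SSTable files are accessible, pointing -d at c2
    sstableloader -d c2-node1,c2-node2 /path/to/myks/mytable/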



Exception when running sstableloader

2018-11-05 Thread Kalyan Chakravarthy
I'm trying to migrate data between two clusters on different networks. Ports
7001, 7199, 9046, and 9160 are open between them, but port 7000 is not. When
I run the sstableloader command, I get the following exception.
Command:

:/a/cassandra/bin# ./sstableloader -d 192.168.98.99 /abc/cassandra/data/apps/ads-0fdd9ff0a7d711e89107ff9c3da22254

Error/Exception: 

Could not retrieve endpoint ranges:
org.apache.thrift.transport.TTransportException: Frame size (352518912) larger than max length (15728640)!
java.lang.RuntimeException: Could not retrieve endpoint ranges:
    at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:342)
    at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:156)
    at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:109)
Caused by: org.apache.thrift.transport.TTransportException: Frame size (352518912) larger than max length (15728640)!
    at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:137)
    at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_describe_partitioner(Cassandra.java:1368)
    at org.apache.cassandra.thrift.Cassandra$Client.describe_partitioner(Cassandra.java:1356)
    at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:304)
    ... 2 more




In the yaml file, 'thrift_framed_transport_size_in_mb' is set to 15, which
matches the max length in the error (15728640 bytes = 15 MB), so I increased
its value to 40. But even after increasing 'thrift_framed_transport_size_in_mb'
in the yaml file, I'm getting the same error.
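
The setting in question, for reference (the note on where it must be changed
is an assumption):

    # cassandra.yaml -- value is in MB; 15 MB is the default, matching the
    # 15728640-byte max length in the error above.
    # Assumption: the value is read at startup, so the server nodes need a
    # restart, and the cassandra.yaml local to where sstableloader runs may
    # need the same change.
    thrift_framed_transport_size_in_mb: 40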

What could be the solution for this? Can somebody please help me with this?

Cheers 
LAD


Compacting more than the actual used space

2018-11-05 Thread Pedro Gordo
Hi

We have an ongoing compaction for roughly 2.5 TB, but "nodetool status"
reports a load of 1.09 TB. Even if we take into account that the load
presented by "nodetool status" is the compressed size, I very much doubt
that compression would reduce 2.5 TB to 1.09 TB.
We can also take into account that, even if this is the biggest table,
there are other tables in the system, so the 1.09 TB reported is not just
for the table being compacted.

What could lead to results like this? We have 4 attached volumes for data
directories. Could this be a likely cause for such discrepancy?

Bonus question: changing the compaction throughput to 0 (removing the
throttling) had no impact on the current compaction. Do new compaction
throughput values only come into effect when a new compaction kicks in?
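
For reference, the throttling change described above corresponds to the
standard nodetool commands (0 disables throttling):

    nodetool setcompactionthroughput 0
    nodetool getcompactionthroughput   # confirm the current setting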

Cheers
Pedro Gordo


Re: Compacting more than the actual used space

2018-11-05 Thread Alexander Dejanovski
You can check cfstats to see what the compression ratio is.
It's entirely possible to have the values you're reporting, as a compression
ratio of 0.2 is quite common depending on the data you're storing
(the compressed size is then 20% of the original data).
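
For example (a sketch; the keyspace and table names are placeholders, and
the exact field wording in the cfstats output varies by version):

    nodetool cfstats myks.mytable | grep -i 'compression ratio'

The ratio is compressed/uncompressed, so 1.09 TB on disk for roughly 2.5 TB
of uncompressed data would correspond to a ratio of about 0.44.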

Compaction throughput changes are taken into account for running
compactions starting with Cassandra 2.2, if I remember correctly. In that
case your compaction could be bound by CPU, not I/O.

Cheers

On Mon, Nov 5, 2018 at 8:41 PM Pedro Gordo  wrote:

> Hi
>
> We have an ongoing compaction for roughly 2.5 TB, but "nodetool status"
> reports a load of 1.09 TB. Even if we take into account that the load
> presented by "nodetool status" is the compressed size, I very much doubt
> that compression would reduce 2.5 TB to 1.09 TB.
> We can also take into account that, even if this is the biggest table,
> there are other tables in the system, so the 1.09 TB reported is not just
> for the table being compacted.
>
> What could lead to results like this? We have 4 attached volumes for data
> directories. Could this be a likely cause for such discrepancy?
>
> Bonus question: changing the compaction throughput to 0 (removing the
> throttling) had no impact on the current compaction. Do new compaction
> throughput values only come into effect when a new compaction kicks in?
>
> Cheers
>
> Pedro Gordo
>
-- 
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Compacting more than the actual used space

2018-11-05 Thread Pedro Gordo
Hi Alexander

Thanks. Using the compression ratio, the sizes check out.

Regarding the new values for compaction throughput, that explains it then.
We are using 2.1. :-)

Cheers
Pedro Gordo


On Mon, 5 Nov 2018 at 19:53, Alexander Dejanovski 
wrote:

> You can check cfstats to see what the compression ratio is.
> It's entirely possible to have the values you're reporting, as a compression
> ratio of 0.2 is quite common depending on the data you're storing
> (the compressed size is then 20% of the original data).
>
> Compaction throughput changes are taken into account for running
> compactions starting with Cassandra 2.2, if I remember correctly. In that
> case your compaction could be bound by CPU, not I/O.
>
> Cheers
>
> On Mon, Nov 5, 2018 at 8:41 PM Pedro Gordo  wrote:
>
>> Hi
>>
>> We have an ongoing compaction for roughly 2.5 TB, but "nodetool status"
>> reports a load of 1.09 TB. Even if we take into account that the load
>> presented by "nodetool status" is the compressed size, I very much doubt
>> that compression would reduce 2.5 TB to 1.09 TB.
>> We can also take into account that, even if this is the biggest table,
>> there are other tables in the system, so the 1.09 TB reported is not just
>> for the table being compacted.
>>
>> What could lead to results like this? We have 4 attached volumes for data
>> directories. Could this be a likely cause for such discrepancy?
>>
>> Bonus question: changing the compaction throughput to 0 (removing the
>> throttling) had no impact on the current compaction. Do new compaction
>> throughput values only come into effect when a new compaction kicks in?
>>
>> Cheers
>>
>> Pedro Gordo
>>
> --
> -
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>


Re: snapshot strategy?

2018-11-05 Thread Alain RODRIGUEZ
Hello Lou,

how do you clear the automatic ones (i.e., ones whose names are unknown)
> without clearing the named ones?
>

The option '-t' might be what you are looking for: 'nodetool clearsnapshot
-t nameOfMySnapshot'.

From the documentation here:
http://cassandra.apache.org/doc/latest/tools/nodetool/clearsnapshot.html?highlight=clearsnapshot

On Mon, Nov 5, 2018 at 1:38 PM Lou DeGenaro  wrote:

> The issue really is how to manage disk space.  It is certainly possible to
> take snapshots by name and delete them by name, perhaps one for each day of
> the week.  But how do you clear the automatic ones (i.e., ones whose names
> are unknown) without clearing the named ones?
>
> Thanks.
>
> Lou.
>
> On Fri, Nov 2, 2018 at 12:28 PM Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
>> On Fri, Nov 2, 2018 at 5:15 PM Lou DeGenaro 
>> wrote:
>>
>>> I'm looking to hear how others are coping with snapshots.
>>>
>>> According to the doc:
>>> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsBackupDeleteSnapshot.html
>>>
>>> *When taking a snapshot, previous snapshot files are not automatically
>>> deleted. You should remove old snapshots that are no longer needed.*
>>>
>>> *The nodetool clearsnapshot
>>> 
>>> command removes all existing snapshot files from the snapshot directory of
>>> each keyspace. You should make it part of your back-up process to clear old
>>> snapshots before taking a new one.*
>>>
>>> But if you delete first, then there is a window of time when no snapshot
>>> exists until the new one is created.  And with a single snapshot there is
>>> no recovery further back than it.
>>>
>> You can also delete a specific snapshot by passing its name to the
>> clearsnapshot command.  For example, you could use the snapshot date as
>> part of the name.  This will also prevent removing snapshots that were
>> taken for reasons other than backup, like the automatic snapshots created
>> by running TRUNCATE or DROP commands, or any other snapshots that might
>> have been created manually by operators.
>>
>> Regards,
>> --
>> Alex
>>
>>


Re: snapshot strategy?

2018-11-05 Thread Lou DeGenaro
Alain,

Thanks for the suggestion, but I think I did not make myself clear.  In
order to use disk space efficiently, we want to keep snapshots that are no
more than X days old while purging the older ones.  My understanding is
that there are 2 kinds of snapshots:  (a) those created on demand with a
given name, and (b) those created automatically, for example as a result of
a TRUNCATE, that do not have a well-known name.  Getting rid of the named
ones (a) seems straightforward.  How do I locate and get rid of the
automatically created ones (b)?

Or if I am under some misconception, I'd be happily educated.

Thanks.

Lou.
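
One way to enumerate every snapshot on a node, named and automatic alike,
is 'nodetool listsnapshots' (assuming a release recent enough to have it);
a sketch, where the automatic tag shown is only an example of the
generated-name shape, not a real tag:

    # list every snapshot tag on this node, named and automatic alike
    nodetool listsnapshots

    # then clear an unwanted one by its tag
    nodetool clearsnapshot -t truncated-1541454683786-mytable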

On Mon, Nov 5, 2018 at 3:49 PM Alain RODRIGUEZ  wrote:

> Hello Lou,
>
>> how do you clear the automatic ones (i.e., ones whose names are unknown)
>> without clearing the named ones?
>>
>
> The option '-t' might be what you are looking for: 'nodetool clearsnapshot
> -t nameOfMySnapshot'.
>
> From the documentation here:
> http://cassandra.apache.org/doc/latest/tools/nodetool/clearsnapshot.html?highlight=clearsnapshot
>
> On Mon, Nov 5, 2018 at 1:38 PM Lou DeGenaro  wrote:
>
>> The issue really is how to manage disk space.  It is certainly possible
>> to take snapshots by name and delete them by name, perhaps one for each
>> day of the week.  But how do you clear the automatic ones (i.e., ones
>> whose names are unknown) without clearing the named ones?
>>
>> Thanks.
>>
>> Lou.
>>
>> On Fri, Nov 2, 2018 at 12:28 PM Oleksandr Shulgin <
>> oleksandr.shul...@zalando.de> wrote:
>>
>>> On Fri, Nov 2, 2018 at 5:15 PM Lou DeGenaro 
>>> wrote:
>>>
 I'm looking to hear how others are coping with snapshots.

 According to the doc:
 https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsBackupDeleteSnapshot.html

 *When taking a snapshot, previous snapshot files are not automatically
 deleted. You should remove old snapshots that are no longer needed.*

 *The nodetool clearsnapshot
 
 command removes all existing snapshot files from the snapshot directory of
 each keyspace. You should make it part of your back-up process to clear old
 snapshots before taking a new one.*

 But if you delete first, then there is a window of time when no
 snapshot exists until the new one is created.  And with a single snapshot
 there is no recovery further back than it.

>>> You can also delete a specific snapshot by passing its name to the
>>> clearsnapshot command.  For example, you could use the snapshot date as
>>> part of the name.  This will also prevent removing snapshots that were
>>> taken for reasons other than backup, like the automatic snapshots created
>>> by running TRUNCATE or DROP commands, or any other snapshots that might
>>> have been created manually by operators.
>>>
>>> Regards,
>>> --
>>> Alex
>>>
>>>


Query With Limit Clause

2018-11-05 Thread shalom sagges
Hi All,

If I run, for example:
select * from myTable limit 3;

Does Cassandra do a full table scan regardless of the limit?

Thanks!
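
One way to check empirically is to trace the query from cqlsh (the table
name here is hypothetical); the trace output shows how many partitions and
SSTables each replica actually touched:

    TRACING ON;
    SELECT * FROM myTable LIMIT 3;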