Getting error while issuing Cassandra stress

2016-01-22 Thread Bhuvan Rawal
Hi,

I have created a POC cluster with 2 DCs, each having 4 nodes, with DSE 4.8.1
installed.

On issuing cassandra-stress I'm getting an error and data is not being
inserted:
*com.datastax.driver.core.exceptions.UnavailableException: Not enough
replica available for query at consistency LOCAL_ONE (1 required but only 0
alive)*

However, I'm able to create keyspaces, tables and insert data using cqlsh and
it is replicating fine to all the nodes.

Details of the cluster can be found below (all the nodes seem to be alive
and kicking):

$ nodetool status
Datacenter: Analytics
=====================
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns  Host ID                               Rack
UN  10.41.55.17  428.5 KB   256     ?     39d6d585-e641-4046-9d0b-797356597b5e  rack1
UN  10.41.55.19  404.44 KB  256     ?     69edf930-efd9-4d74-a798-f3d4ac02e516  rack1
UN  10.41.55.18  423.21 KB  256     ?     b74bab13-09b2-4760-bce9-c8ef05e50f6d  rack1
UN  10.41.55.20  683.23 KB  256     ?     fb5c4fed-6e1e-4ea8-838d-358106906830  rack1
Datacenter: Cassandra
=====================
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns  Host ID                               Rack
UN  10.41.55.15  209.4 KB   256     ?     ffc3b9a0-5d5c-4a3d-a99e-49d255731278  rack1
UN  10.41.55.21  227.44 KB  256     ?     c68deba4-b9a2-43fc-bb13-6af74c88c210  rack1
UN  10.41.55.23  222.71 KB  256     ?     8229aa87-af00-48fa-ad6b-3066d3dc0e58  rack1
UN  10.41.55.22  218.72 KB  256     ?     c7ba84fd-7992-41de-8c88-11574a72db99  rack1

Regards,
Bhuvan Rawal


Re: Getting error while issuing Cassandra stress

2016-01-22 Thread Alain RODRIGUEZ
Hi,

The exact command you ran (stress-tool with options) would be useful to
help you with that.

> However, I'm able to create keyspaces, tables and insert data using cqlsh
> and it is replicating fine to all the nodes.


Having the schema might be useful too.

Did you run cqlsh and the stress-tool from the same server? If not,
you might want to check that the ports you use (9042/9160/...) are open.
Also, cqlsh uses LOCAL_ONE by default too. If both commands were run
against the same DC, from the same machine, they should behave the same way.
Are they?
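
One quick cross-check, using the keyspace Bhuvan creates later in this thread
and one of the node IPs from nodetool status as a placeholder coordinator:
force the same consistency level in cqlsh and write against the same node the
stress tool targets. If this succeeds while stress fails, the difference is
likely in which keyspace or DC each client is writing to rather than in node
availability.

$ cqlsh 10.41.55.17
cqlsh> CONSISTENCY LOCAL_ONE;
cqlsh> INSERT INTO mykeyspace.mytable (id, name, address, phone)
   ... VALUES (2, 'Test', 'Texas', '555-1212');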

C*heers,

-
Alain

The Last Pickle
http://www.thelastpickle.com


2016-01-22 9:57 GMT+01:00 Bhuvan Rawal :

> Hi,
>
> i have created a POC cluster with 2 DC , each having 4 nodes with DSE
> 4.8.1 installed.
>
> On issuing cassandra stress im getting an error  and data is not being
> inserted:
> *com.datastax.driver.core.exceptions.UnavailableException: Not enough
> replica available for query at consistency LOCAL_ONE (1 required but only 0
> alive)*
>
> However, Im able to create keyspace, tables and insert data using cqlsh
> and it is replicating fine to all the nodes.
>
> Details of the cluster can be found below (all the nodes seem to be alive
> and kicking):
>
> $ nodetool status Datacenter: Analytics =
> Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load
> Tokens Owns Host ID Rack UN 10.41.55.17 428.5 KB 256 ?
> 39d6d585-e641-4046-9d0b-797356597b5e rack1 UN 10.41.55.19 404.44 KB 256 ?
> 69edf930-efd9-4d74-a798-f3d4ac02e516 rack1 UN 10.41.55.18 423.21 KB 256 ?
> b74bab13-09b2-4760-bce9-c8ef05e50f6d rack1 UN 10.41.55.20 683.23 KB 256 ?
> fb5c4fed-6e1e-4ea8-838d-358106906830 rack1 Datacenter: Cassandra
> = Status=Up/Down |/
> State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID
> Rack UN 10.41.55.15 209.4 KB 256 ? ffc3b9a0-5d5c-4a3d-a99e-49d255731278
> rack1 UN 10.41.55.21 227.44 KB 256 ? c68deba4-b9a2-43fc-bb13-6af74c88c210
> rack1 UN 10.41.55.23 222.71 KB 256 ? 8229aa87-af00-48fa-ad6b-3066d3dc0e58
> rack1 UN 10.41.55.22 218.72 KB 256 ? c7ba84fd-7992-41de-8c88-11574a72db99
> rack1
>
> Regards,
> Bhuvan Rawal
>
>
>


RE: Using TTL for data purge

2016-01-22 Thread SEAN_R_DURITY
An upsert is a second insert. Cassandra’s sstables are immutable. There are no 
real “overwrites” (of the data on disk). It is another record/row. Upon read, 
it acts like an overwrite, because Cassandra will read both inserts and take 
the last one in as the correct data. This strategy will work for changing the 
TTL (and anything else that changes in the data).
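
As an illustration (the table and values here are hypothetical, not from this
thread), re-issuing the full insert with a fresh TTL simply writes a newer
version of the row; the latest write wins on read, and the older cells are
dropped at compaction:

-- first write, expires in ~90 days
INSERT INTO users (user_id, name, last_seen)
VALUES (42, 'joe', '2015-12-01') USING TTL 7776000;

-- the user comes back: re-insert the same columns with a new TTL;
-- this supersedes the earlier cells rather than modifying them on disk
INSERT INTO users (user_id, name, last_seen)
VALUES (42, 'joe', '2016-01-22') USING TTL 7776000;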

Compaction creates a new sstable from existing ones. It will (if the inserts 
are in the compacted sstables) write only the latest data, so the older insert 
is effectively deleted/dropped from the new sstable now on disk.

As I understand TTL, if there is a compaction of a cell (or row) with a TTL 
that has been reached, a tombstone will be written.

Sean Durity – Lead Cassandra Admin
Big DATA Team
For support, create a JIRA

From: Joseph TechMails [mailto:jaalex.t...@gmail.com]
Sent: Wednesday, December 30, 2015 3:59 AM
To: user@cassandra.apache.org
Subject: Re: Using TTL for data purge

Thanks, Sean. Our use case is to delete records after a few months of inactivity, 
and that period is fixed, but the TTL could get reset if the record is accessed 
within that timeframe - similar to extending a session. All reads are done 
based on the key, and there would be multiple upserts (all columns are 
re-INSERTed, including TTL) while it's active, so it's not exactly 
write-once/read-many. Are there any overheads for processes like compaction due 
to this overwriting of TTL? I guess reads won't be affected since it's always 
done with the key, and won't have to filter out tombstones.

Regarding the data size, I could see a small decrease in the disk usage (du) of 
the "data" directory immediately after the rows with TTL expired, and still 
further reduction after running compaction on the CF (though this wasn't always 
reproducible). Since the tombstones should ideally stay for 10 days, I 
assume this observation is not related to data expiry. Please confirm.

Thanks,
Joseph


On Tue, Dec 29, 2015 at 11:20 PM, 
mailto:sean_r_dur...@homedepot.com>> wrote:
If you know how long the records should last, TTL is a good way to go. Remember 
that neither TTL nor deletes are right-away purge strategies. Each inserts a 
special record called a tombstone to indicate a deleted record. After 
compaction (that is, after gc_grace_seconds for the table, default 10 days), the 
data will be removed and you will regain disk space.

If the data is relatively volatile and read speeds are important, you might 
look at leveled compaction, though it can keep your nodes a bit busier than 
size-tiered. (An issue with size-tiered, over time, is that the tombstoned data 
in the larger and older sstables may rarely, if ever, get compacted out.)
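
If you want to try it, switching an existing table to leveled compaction is a
plain schema change; a minimal sketch with placeholder keyspace/table names:

ALTER TABLE mykeyspace.users
  WITH compaction = {'class': 'LeveledCompactionStrategy'};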


Sean Durity – Lead Cassandra Admin
From: jaalex.tech [mailto:jaalex.t...@gmail.com]
Sent: Tuesday, December 22, 2015 4:36 AM
To: user@cassandra.apache.org
Subject: Using TTL for data purge

Hi,

I'm looking for suggestions/caveats on using TTL as a substitute for a manual 
data purge job.

We have a few tables that hold user information - this could be guest or 
registered users, and there could be between 500K and 1M records created per day 
per table. Currently, these tables have a secondary-indexed updated_date column 
which is populated on each update. However, we have been getting timeouts when 
running queries using updated_date when the number of records is high, so I 
don't think this would be a reliable option in the long term when we need to 
purge records that have not been used for the last X days.

In this scenario, is it advisable to include a high enough TTL (i.e. the amount 
of time we want these to last, could be 3 to 6 months) when inserting/updating 
records?

There could be cases where the TTL may get reset after a couple of days/weeks, 
when the user visits the site again.

The tables have a fixed number of columns, except for one which has a clustering 
key, and may have at most 10 entries per partition key.

I need to know the overhead of having so many rows with TTL hanging around for 
a relatively long duration (weeks/months), and the impact it could have on 
performance/storage. If this is not a recommended approach, what would be an 
alternate design that could be used for a manual purge job, without using 
secondary indices?

We are using Cassandra 2.0.x.

Thanks,
Joseph





Re: Using TTL for data purge

2016-01-22 Thread Jeff Jirsa
"As I understand TTL, if there is a compaction of a cell (or row) with a TTL 
that has been reached, a tombstone will be written.”

The expiring cell is treated as a tombstone once it reaches its end of life; 
it does not write an additional tombstone to disk.
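
For what it's worth, the remaining lifetime of a cell can be inspected
directly, which makes it easy to verify that a later upsert really did refresh
the TTL (the table and column names here are illustrative):

SELECT user_id, TTL(name) AS name_ttl_seconds
FROM users WHERE user_id = 42;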



From:  "sean_r_dur...@homedepot.com"
Reply-To:  "user@cassandra.apache.org"
Date:  Friday, January 22, 2016 at 7:27 AM
To:  "user@cassandra.apache.org"
Subject:  RE: Using TTL for data purge

An upsert is a second insert. Cassandra’s sstables are immutable. There are no 
real “overwrites” (of the data on disk). It is another record/row. Upon read, 
it acts like an overwrite, because Cassandra will read both inserts and take 
the last one in as the correct data. This strategy will work for changing the 
TTL (and anything else that changes in the data).

 

Compaction creates a new sstable from existing ones. It will (if the inserts 
are in the compacted sstables) write only the latest data, so the older insert 
is effectively deleted/dropped from the new sstable now on disk.

 

As I understand TTL, if there is a compaction of a cell (or row) with a TTL 
that has been reached, a tombstone will be written.

 

Sean Durity – Lead Cassandra Admin

Big DATA Team

For support, create a JIRA

 

From: Joseph TechMails [mailto:jaalex.t...@gmail.com] 
Sent: Wednesday, December 30, 2015 3:59 AM
To: user@cassandra.apache.org
Subject: Re: Using TTL for data purge

 

Thanks, Sean. Our usecase is to delete records after few months of inactivity, 
and that period is fixed, but the TTL could get reset if the record is accessed 
within that timeframe - similar to extending a session. All reads are done 
based on the key, and there would be multiple upserts (all columns are 
re-INSERTed, including TTL) while it's active, so it's not exactly 
write-once/read-many. Are there any overheads for processes like compaction due 
to this overwriting of TTL? . I guess reads won't be affected since it's always 
done with the key, and won't have to filter out tombstones.

 

Regarding the data size, i could see a small decrease in the disk usage (du) of 
the "data" directory immediately after the rows with TTL expired, and still 
further reduction after running compaction on the CF (though this wasn't 
replicable always). Since the tombstones should ideally stay for 10 days, i 
assume this observation is not related to data expiry. Please confirm

 

Thanks,

Joseph

 

 

On Tue, Dec 29, 2015 at 11:20 PM,  wrote:

If you know how long the records should last, TTL is a good way to go. Remember 
that neither TTL or deletes are right-away purge strategies. Each inserts a 
special record called a tombstone to indicate a deleted record. After 
compaction (that is after gc_grace_seconds for the table, default 10 days), the 
data will be removed and you will regain disk space.

 

If the data is relatively volatile and read speeds are important, you might 
look at leveled compaction, though it can keep your nodes a bit busier than 
size-tiered. (An issue with size-tiered, over time, is that the tombstoned data 
in the larger and older sstables may rarely, if ever, get compacted out.)

 

 

Sean Durity – Lead Cassandra Admin

From: jaalex.tech [mailto:jaalex.t...@gmail.com] 
Sent: Tuesday, December 22, 2015 4:36 AM
To: user@cassandra.apache.org
Subject: Using TTL for data purge

 

Hi,

 

I'm looking for suggestions/caveats on using TTL as a subsitute for a manual 
data purge job. 

 

We have few tables that hold user information - this could be guest or 
registered users, and there could be between 500K to 1M records created per day 
per table. Currently, these tables have a secondary indexed updated_date column 
which is populated on each update. However, we have been getting timeouts when 
running queries using updated_date when the number of records are high, so i 
don't think this would be a reliable option in the long term when we need to 
purge records that have not been used for the last X days. 

 

In this scenario, is it advisable to include a high enough TTL (i.e the amount 
of time we want these to last, could be 3 to 6 months) when inserting/updating 
records? 

 

There could be cases where the TTL may get reset after couple of days/weeks, 
when the user visits the site again.

 

The tables have fixed number of columns, except for one which has a clustering 
key, and may have max 10 entries per  partition key.

 

I need to know the overhead of having so many rows with TTL hanging around for 
a relatively longer duration (weeks/months), and the impacts it could have on 
performance/storage. If this is not a recommended approach, what would be an 
alternate design which could be used for a manual purge job, without using 
secondary indices.

 

We are using Cassandra 2.0.x.

 

Thanks,

Joseph

 

 



Re: Using TTL for data purge

2016-01-22 Thread Anuj Wadehra
Hi Joseph,

I am personally in favour of the second approach, because I don't want to do a 
lot of IO just because a user is accessing a site several times a day.

Options I see:

1. If you are on SSDs, test LCS and update the TTL of all columns at each 
access. This will make sure that the system can tolerate the extra IO.
Advantage: No scheduling job needed. Deletion is seamless. Improved read 
performance compared to STCS.
Disadvantage: To reinsert records with a new TTL you would do a read before 
write, which is an anti-pattern and slow. Active users will cause unnecessary 
IO just for updating the TTL. High IO due to LCS too.

2. Create a new table with the user id as key and the last access time, instead 
of relying on inbuilt secondary indexes. Overwrite the last access time at each 
access. Schedule jobs to read this table at regular intervals, maybe once a 
week, and manually delete users from the main table based on the last access 
time. You can test using LCS with the new table. (See the sketch after this 
list.)
Advantage: Lightweight writes for updating the access time. Flexibility to 
update the deletion logic.
Disadvantage: A manual scheduling job and code need to be implemented. The 
scheduler would need a slow full table scan of users to know the last access 
time. Full table scans could be done via token-based parallel CQL queries for 
performance. Using an Apache Spark job to find the users to be purged would do 
that at tremendous speeds.

Secondary indexes are not suitable and don't scale well. I would suggest 
dropping them.
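
A minimal sketch of option 2, with hypothetical table and column names: a small
last-access table that is overwritten on each visit, plus a periodic job that
deletes stale users from the main table.

CREATE TABLE user_last_access (
    user_id    bigint PRIMARY KEY,
    last_seen  timestamp
);

-- on every site access: a cheap, write-only upsert
INSERT INTO user_last_access (user_id, last_seen)
VALUES (42, '2016-01-22 10:00:00');

-- scheduled job (e.g. weekly): scan user_last_access for stale entries,
-- then purge those users from the main table
DELETE FROM users WHERE user_id = 42;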



Thanks
Anuj




 
 
  On Tue, 22 Dec, 2015 at 3:06 pm, jaalex.tech wrote:   
Hi,
I'm looking for suggestions/caveats on using TTL as a subsitute for a manual 
data purge job. 
We have few tables that hold user information - this could be guest or 
registered users, and there could be between 500K to 1M records created per day 
per table. Currently, these tables have a secondary indexed updated_date column 
which is populated on each update. However, we have been getting timeouts when 
running queries using updated_date when the number of records are high, so i 
don't think this would be a reliable option in the long term when we need to 
purge records that have not been used for the last X days. 
In this scenario, is it advisable to include a high enough TTL (i.e the amount 
of time we want these to last, could be 3 to 6 months) when inserting/updating 
records? 
There could be cases where the TTL may get reset after couple of days/weeks, 
when the user visits the site again.
The tables have fixed number of columns, except for one which has a clustering 
key, and may have max 10 entries per  partition key.
I need to know the overhead of having so many rows with TTL hanging around for 
a relatively longer duration (weeks/months), and the impacts it could have on 
performance/storage. If this is not a recommended approach, what would be an 
alternate design which could be used for a manual purge job, without using 
secondary indices.
We are using Cassandra 2.0.x.
Thanks,Joseph
  


Re: Getting error while issuing Cassandra stress

2016-01-22 Thread Sebastian Estevez
The output of `nodetool status` would help us diagnose.

All the best,



Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com








DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the worlds
most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Fri, Jan 22, 2016 at 1:39 PM, Bhuvan Rawal  wrote:

> Thanks for the response Alain,
>
> cqlsh> create keyspace mykeyspace WITH replication =
> {'class':'NetworkTopologyStrategy', 'Analytics':2, 'Cassandra':3}
> cqlsh> use mykeyspace;
> cqlsh:mykeyspace>create table mytable (id int primary key, name text,
> address text, phone text);
> cqlsh:mykeyspace> insert into mytable (id, name, address, phone) values
> (1, 'Kiyu','Texas', '555-1212'); # and other similar statement
> I then issued the below command from every node and found consistent
> results.
> cqlsh:mykeyspace> select * from mytable;
>
> // Then i repeated the above steps for NetworkTopologyStrategy and found
> same results
>
> I ran basic cassandra stress
> seed1 - seed of datacenter 1
>  $ cassandra-stress write n=5 -rate threads=4 -node any_random_ip
>  $ cassandra-stress write n=5 -rate threads=4 -node seed1
>  $ cassandra-stress write n=5 -rate threads=4 -node seed1,seed2
>  $ cassandra-stress write n=5 -rate threads=4 -node
> all_8_ip_comma_seperated
>  $ cassandra-stress write n=100 cl=one -mode native cql3 -schema
> keyspace="keyspace1" -pop seq=1..100 -node ip1,ip2,ip3,ip4
>
> All of them threw the exception
> *com.datastax.driver.core.exceptions.UnavailableException: Not enough
> replica available for query at consistency LOCAL_ONE (1 required but only 0
> alive)*
>
>
> I have a feeling that the issue is with datacenter name for some reason,
> because in some config files I found DC name to be like DC1/DC2/DC3 in some
> it is like Cassandra/Analytics (The ones I had specified while
> installation). Im unsure which yaml/property file to look for correct
> inconsistency.
>
> (C*heers :) - im so tempted to copy that)
>
> Regards,
> Bhuvan
>
> On Fri, Jan 22, 2016 at 8:47 PM, Alain RODRIGUEZ 
> wrote:
>
>> Hi,
>>
>> The the exact command you ran (stress-tool with options) could be useful
>> to help you on that.
>>
>> However, Im able to create keyspace, tables and insert data using cqlsh
>>> and it is replicating fine to all the nodes.
>>
>>
>> Having the schema might be useful too.
>>
>> Did you ran the cqlsh and the stress-tool from the same server ? If not,
>> you might want to check the port you use (9042/9160/...) are open.
>> Also, cqlsh uses local_one by default too. If both commands were run
>> against the same DC, from the same machine they should behave the same way.
>> Are they ?
>>
>> C*heers,
>>
>> -
>> Alain
>>
>> The Last Pickle
>> http://www.thelastpickle.com
>>
>>
>> 2016-01-22 9:57 GMT+01:00 Bhuvan Rawal :
>>
>>> Hi,
>>>
>>> i have created a POC cluster with 2 DC , each having 4 nodes with DSE
>>> 4.8.1 installed.
>>>
>>> On issuing cassandra stress im getting an error  and data is not being
>>> inserted:
>>> *com.datastax.driver.core.exceptions.UnavailableException: Not enough
>>> replica available for query at consistency LOCAL_ONE (1 required but only 0
>>> alive)*
>>>
>>> However, Im able to create keyspace, tables and insert data using cqlsh
>>> and it is replicating fine to all the nodes.
>>>
>>> Details of the cluster can be found below (all the nodes seem to be
>>> alive and kicking):
>>>
>>> $ nodetool status Datacenter: Analytics =
>>> Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load
>>> Tokens Owns Host ID Rack UN 10.41.55.17 428.5 KB 256 ?
>>> 39d6d585-e641-4046-9d0b-797356597b5e rack1 UN 10.41.55.19 404.44 KB 256 ?
>>> 69edf930-efd9-4d74-a798-f3d4ac02e516 rack1 UN 10.41.55.18 423.21 KB 256 ?
>>> b74bab13-09b2-4760-bce9-c8ef05e50f6d rack1 UN 10.41.55.20 683.23 KB 256 ?
>>> fb5c4fed-6e1e-4ea8-838d-358106906830 rack1 Datacenter: Cassandra
>>> = Status=Up/Down |/
>>> State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID
>>> Rack UN 10.41.55.15 209.4 KB 256 ? ffc3b9a0-5d5c-4a3d-a99e-49d255731278
>>> rack1 UN 10.41.55.21 227.44 KB 256 ? c68deba4-b9a2-43fc-bb13-6af74c88c210
>>> rack1 UN 10.41.55.23 222.71 KB 256 ? 8229a

Re: Getting error while issuing Cassandra stress

2016-01-22 Thread Bhuvan Rawal
Hi Sebastian,

I had attached the nodetool status output in the previous mail, pasting it again:

$ nodetool status
Datacenter: Analytics
=====================
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns  Host ID                               Rack
UN  10.41.55.17  428.5 KB   256     ?     39d6d585-e641-4046-9d0b-797356597b5e  rack1
UN  10.41.55.19  404.44 KB  256     ?     69edf930-efd9-4d74-a798-f3d4ac02e516  rack1
UN  10.41.55.18  423.21 KB  256     ?     b74bab13-09b2-4760-bce9-c8ef05e50f6d  rack1
UN  10.41.55.20  683.23 KB  256     ?     fb5c4fed-6e1e-4ea8-838d-358106906830  rack1
Datacenter: Cassandra
=====================
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns  Host ID                               Rack
UN  10.41.55.15  209.4 KB   256     ?     ffc3b9a0-5d5c-4a3d-a99e-49d255731278  rack1
UN  10.41.55.21  227.44 KB  256     ?     c68deba4-b9a2-43fc-bb13-6af74c88c210  rack1
UN  10.41.55.23  222.71 KB  256     ?     8229aa87-af00-48fa-ad6b-3066d3dc0e58  rack1
UN  10.41.55.22  218.72 KB  256     ?     c7ba84fd-7992-41de-8c88-11574a72db99  rack1

Regards,
Bhuvan Rawal

On Sat, Jan 23, 2016 at 12:11 AM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> The output of `nodetool status` would help us diagnose.
>
> All the best,
>
>
> [image: datastax_logo.png] 
>
> Sebastián Estévez
>
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>
> [image: linkedin.png]  [image:
> facebook.png]  [image: twitter.png]
>  [image: g+.png]
> 
> 
> 
>
>
> 
>
> DataStax is the fastest, most scalable distributed database technology,
> delivering Apache Cassandra to the world’s most innovative enterprises.
> Datastax is built to be agile, always-on, and predictably scalable to any
> size. With more than 500 customers in 45 countries, DataStax is the
> database technology and transactional backbone of choice for the worlds
> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>
> On Fri, Jan 22, 2016 at 1:39 PM, Bhuvan Rawal  wrote:
>
>> Thanks for the response Alain,
>>
>> cqlsh> create keyspace mykeyspace WITH replication =
>> {'class':'NetworkTopologyStrategy', 'Analytics':2, 'Cassandra':3}
>> cqlsh> use mykeyspace;
>> cqlsh:mykeyspace>create table mytable (id int primary key, name text,
>> address text, phone text);
>> cqlsh:mykeyspace> insert into mytable (id, name, address, phone) values
>> (1, 'Kiyu','Texas', '555-1212'); # and other similar statement
>> I then issued the below command from every node and found consistent
>> results.
>> cqlsh:mykeyspace> select * from mytable;
>>
>> // Then i repeated the above steps for NetworkTopologyStrategy and found
>> same results
>>
>> I ran basic cassandra stress
>> seed1 - seed of datacenter 1
>>  $ cassandra-stress write n=5 -rate threads=4 -node any_random_ip
>>  $ cassandra-stress write n=5 -rate threads=4 -node seed1
>>  $ cassandra-stress write n=5 -rate threads=4 -node seed1,seed2
>>  $ cassandra-stress write n=5 -rate threads=4 -node
>> all_8_ip_comma_seperated
>>  $ cassandra-stress write n=100 cl=one -mode native cql3 -schema
>> keyspace="keyspace1" -pop seq=1..100 -node ip1,ip2,ip3,ip4
>>
>> All of them threw the exception
>> *com.datastax.driver.core.exceptions.UnavailableException: Not enough
>> replica available for query at consistency LOCAL_ONE (1 required but only 0
>> alive)*
>>
>>
>> I have a feeling that the issue is with datacenter name for some reason,
>> because in some config files I found DC name to be like DC1/DC2/DC3 in some
>> it is like Cassandra/Analytics (The ones I had specified while
>> installation). Im unsure which yaml/property file to look for correct
>> inconsistency.
>>
>> (C*heers :) - im so tempted to copy that)
>>
>> Regards,
>> Bhuvan
>>
>> On Fri, Jan 22, 2016 at 8:47 PM, Alain RODRIGUEZ 
>> wrote:
>>
>>> Hi,
>>>
>>> The the exact command you ran (stress-tool with options) could be useful
>>> to help you on that.
>>>
>>> However, Im able to create keyspace, tables and insert data using cqlsh
 and it is replicating fine to all the nodes.
>>>
>>>
>>> Having the schema might be useful too.
>>>
>>> Did you ran the cqlsh and the stress-tool from the same server ? If
>>> not, you might want to check the port you use (9042/9160/...) are open.
>>> Also, cqlsh uses local_one by default too. If both commands were run
>>> against the same DC, from the same machine they should behave the same way.
>>> Are they ?
>>>
>>> C*heers,
>>>
>>> -
>>> Alain
>>>
>>> The Last Pickle
>>> http://www.thelastpickle.com
>>>
>>>
>>> 2016-01-22 9:57 GMT+01:00 Bhuvan Rawal :
>>>
 Hi,

 i have created a POC cluster with 2 DC , each having 4 nodes with DSE
 4.8.1 installed.

 On issuing cassandra stress im getting an error  and data is not being
 insert

Re: Using TTL for data purge

2016-01-22 Thread Anuj Wadehra

Give a deep thought to your use case. Different user tables/types may have 
different purge strategies based on how frequently a user account type is 
usually accessed, what the user count for each user type is, and so on.

Thanks
Anuj

Sent from Yahoo Mail on Android 
 
  On Fri, 22 Jan, 2016 at 11:37 pm, Anuj Wadehra wrote: 
  Hi Joseph,

I am personally in favour of Second approach because I dont want to do lot of 
IO just because a user is accessing a site several times a day. 
Options I see:

1.If you are on SSDs, Test LCS and update TTL of all columns at each access. 
This will make sure that the system can tolerate the extra IO. Advantage: No 
scheduling job needed. Deletion is seemless. Improved read performace than STCS.
Disadvantage: To reinsert records with new TTL you would do read before write 
which is an Anti oattern and slow thing. Active users will cause unnecessary IO 
for just updating TTL.High IO due to LCS too.
2.Create a new table with user id key and last access time instead of relying 
on inbuilt secondary indexes. Overwrite the last access time at each access. 
Schedule jobs to read this table at regular intervals may be once a week and 
manually delete users from the main table based on the last access time. You 
can test using LCS with new table.
Advantage: Light weight writes for updating access time. Flexibility to update 
deletion logic.
Disadvantage: Manual scheduling job and code needs to be implemented. Scheduler 
would need a slow full table scan of users to know last access time. Full table 
scans could be done via token based parallel CQL queries for achieving 
performance. Using a Apache Spark job to find users to be purged would do that 
at tremendous speeds.
Secondary indexes are not suitable and dont scale well. I would suggest 
dropping them.



ThanksAnuj




 
 
  On Tue, 22 Dec, 2015 at 3:06 pm, jaalex.tech wrote:   
Hi,
I'm looking for suggestions/caveats on using TTL as a subsitute for a manual 
data purge job. 
We have few tables that hold user information - this could be guest or 
registered users, and there could be between 500K to 1M records created per day 
per table. Currently, these tables have a secondary indexed updated_date column 
which is populated on each update. However, we have been getting timeouts when 
running queries using updated_date when the number of records are high, so i 
don't think this would be a reliable option in the long term when we need to 
purge records that have not been used for the last X days. 
In this scenario, is it advisable to include a high enough TTL (i.e the amount 
of time we want these to last, could be 3 to 6 months) when inserting/updating 
records? 
There could be cases where the TTL may get reset after couple of days/weeks, 
when the user visits the site again.
The tables have fixed number of columns, except for one which has a clustering 
key, and may have max 10 entries per  partition key.
I need to know the overhead of having so many rows with TTL hanging around for 
a relatively longer duration (weeks/months), and the impacts it could have on 
performance/storage. If this is not a recommended approach, what would be an 
alternate design which could be used for a manual purge job, without using 
secondary indices.
We are using Cassandra 2.0.x.
Thanks,Joseph
  
  


Re: Getting error while issuing Cassandra stress

2016-01-22 Thread Bhuvan Rawal
Thanks for the response Alain,

cqlsh> create keyspace mykeyspace WITH replication =
{'class':'NetworkTopologyStrategy', 'Analytics':2, 'Cassandra':3}
cqlsh> use mykeyspace;
cqlsh:mykeyspace>create table mytable (id int primary key, name text,
address text, phone text);
cqlsh:mykeyspace> insert into mytable (id, name, address, phone) values (1,
'Kiyu','Texas', '555-1212'); # and other similar statement
I then issued the below command from every node and found consistent
results.
cqlsh:mykeyspace> select * from mytable;

// Then I repeated the above steps for NetworkTopologyStrategy and found
the same results

I ran basic cassandra-stress (seed1 = the seed of datacenter 1):
 $ cassandra-stress write n=5 -rate threads=4 -node any_random_ip
 $ cassandra-stress write n=5 -rate threads=4 -node seed1
 $ cassandra-stress write n=5 -rate threads=4 -node seed1,seed2
 $ cassandra-stress write n=5 -rate threads=4 -node
all_8_ip_comma_seperated
 $ cassandra-stress write n=100 cl=one -mode native cql3 -schema
keyspace="keyspace1" -pop seq=1..100 -node ip1,ip2,ip3,ip4

All of them threw the exception
*com.datastax.driver.core.exceptions.UnavailableException: Not enough
replica available for query at consistency LOCAL_ONE (1 required but only 0
alive)*


I have a feeling that the issue is with the datacenter name for some reason,
because in some config files I found the DC names to be like DC1/DC2/DC3 and in
some they are Cassandra/Analytics (the ones I had specified during
installation). I'm unsure which yaml/property file to look in to correct the
inconsistency.
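
One way to confirm which DC name each node actually reports (and which names
the NetworkTopologyStrategy options must match, case-sensitively) is to ask a
node directly; a small sketch using one of the IPs above as a placeholder:

$ nodetool status        # DC names as gossip sees them
$ cqlsh 10.41.55.17
cqlsh> SELECT data_center, rack FROM system.local;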

(C*heers :) - I'm so tempted to copy that)

Regards,
Bhuvan

On Fri, Jan 22, 2016 at 8:47 PM, Alain RODRIGUEZ  wrote:

> Hi,
>
> The the exact command you ran (stress-tool with options) could be useful
> to help you on that.
>
> However, Im able to create keyspace, tables and insert data using cqlsh
>> and it is replicating fine to all the nodes.
>
>
> Having the schema might be useful too.
>
> Did you ran the cqlsh and the stress-tool from the same server ? If not,
> you might want to check the port you use (9042/9160/...) are open.
> Also, cqlsh uses local_one by default too. If both commands were run
> against the same DC, from the same machine they should behave the same way.
> Are they ?
>
> C*heers,
>
> -
> Alain
>
> The Last Pickle
> http://www.thelastpickle.com
>
>
> 2016-01-22 9:57 GMT+01:00 Bhuvan Rawal :
>
>> Hi,
>>
>> i have created a POC cluster with 2 DC , each having 4 nodes with DSE
>> 4.8.1 installed.
>>
>> On issuing cassandra stress im getting an error  and data is not being
>> inserted:
>> *com.datastax.driver.core.exceptions.UnavailableException: Not enough
>> replica available for query at consistency LOCAL_ONE (1 required but only 0
>> alive)*
>>
>> However, Im able to create keyspace, tables and insert data using cqlsh
>> and it is replicating fine to all the nodes.
>>
>> Details of the cluster can be found below (all the nodes seem to be alive
>> and kicking):
>>
>> $ nodetool status Datacenter: Analytics =
>> Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load
>> Tokens Owns Host ID Rack UN 10.41.55.17 428.5 KB 256 ?
>> 39d6d585-e641-4046-9d0b-797356597b5e rack1 UN 10.41.55.19 404.44 KB 256 ?
>> 69edf930-efd9-4d74-a798-f3d4ac02e516 rack1 UN 10.41.55.18 423.21 KB 256 ?
>> b74bab13-09b2-4760-bce9-c8ef05e50f6d rack1 UN 10.41.55.20 683.23 KB 256 ?
>> fb5c4fed-6e1e-4ea8-838d-358106906830 rack1 Datacenter: Cassandra
>> = Status=Up/Down |/
>> State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID
>> Rack UN 10.41.55.15 209.4 KB 256 ? ffc3b9a0-5d5c-4a3d-a99e-49d255731278
>> rack1 UN 10.41.55.21 227.44 KB 256 ? c68deba4-b9a2-43fc-bb13-6af74c88c210
>> rack1 UN 10.41.55.23 222.71 KB 256 ? 8229aa87-af00-48fa-ad6b-3066d3dc0e58
>> rack1 UN 10.41.55.22 218.72 KB 256 ? c7ba84fd-7992-41de-8c88-11574a72db99
>> rack1
>>
>> Regards,
>> Bhuvan Rawal
>>
>>
>>
>


Re: Using TTL for data purge

2016-01-22 Thread Anuj Wadehra
On second thought, if you are anyway reading the user table on each website 
access and can afford the extra IO, the first option looks more appropriate, as 
it will ease the pain of manual purge maintenance and won't need full table scans.

Thanks
Anuj


Sent from Yahoo Mail on Android 
 
  On Sat, 23 Jan, 2016 at 12:16 am, Anuj Wadehra wrote: 
  
Give a deep thought on your use case. Different user tables/types may have 
different purge strategy based on how frequently a user account type is usually 
accessed, whats the user count for each user type and so on.
ThanksAnuj

Sent from Yahoo Mail on Android 
 
 On Fri, 22 Jan, 2016 at 11:37 pm, Anuj Wadehra wrote:  
Hi Joseph,

I am personally in favour of Second approach because I dont want to do lot of 
IO just because a user is accessing a site several times a day. 
Options I see:

1.If you are on SSDs, Test LCS and update TTL of all columns at each access. 
This will make sure that the system can tolerate the extra IO. Advantage: No 
scheduling job needed. Deletion is seemless. Improved read performace than STCS.
Disadvantage: To reinsert records with new TTL you would do read before write 
which is an Anti oattern and slow thing. Active users will cause unnecessary IO 
for just updating TTL.High IO due to LCS too.
2.Create a new table with user id key and last access time instead of relying 
on inbuilt secondary indexes. Overwrite the last access time at each access. 
Schedule jobs to read this table at regular intervals may be once a week and 
manually delete users from the main table based on the last access time. You 
can test using LCS with new table.
Advantage: Light weight writes for updating access time. Flexibility to update 
deletion logic.
Disadvantage: Manual scheduling job and code needs to be implemented. Scheduler 
would need a slow full table scan of users to know last access time. Full table 
scans could be done via token based parallel CQL queries for achieving 
performance. Using a Apache Spark job to find users to be purged would do that 
at tremendous speeds.
Secondary indexes are not suitable and dont scale well. I would suggest 
dropping them.



ThanksAnuj




 
 
 On Tue, 22 Dec, 2015 at 3:06 pm, jaalex.tech wrote:  Hi,
I'm looking for suggestions/caveats on using TTL as a subsitute for a manual 
data purge job. 
We have few tables that hold user information - this could be guest or 
registered users, and there could be between 500K to 1M records created per day 
per table. Currently, these tables have a secondary indexed updated_date column 
which is populated on each update. However, we have been getting timeouts when 
running queries using updated_date when the number of records are high, so i 
don't think this would be a reliable option in the long term when we need to 
purge records that have not been used for the last X days. 
In this scenario, is it advisable to include a high enough TTL (i.e the amount 
of time we want these to last, could be 3 to 6 months) when inserting/updating 
records? 
There could be cases where the TTL may get reset after couple of days/weeks, 
when the user visits the site again.
The tables have fixed number of columns, except for one which has a clustering 
key, and may have max 10 entries per  partition key.
I need to know the overhead of having so many rows with TTL hanging around for 
a relatively longer duration (weeks/months), and the impacts it could have on 
performance/storage. If this is not a recommended approach, what would be an 
alternate design which could be used for a manual purge job, without using 
secondary indices.
We are using Cassandra 2.0.x.
Thanks,Joseph
  
  
  


Production with Single Node

2016-01-22 Thread John Lammers
After deploying a number of production systems with up to 10 Cassandra
nodes each, we are looking at deploying a small, all-in-one-server system
with only a single, local node (Cassandra 2.1.11).

What are the risks of such a configuration?

The virtual disk would be running RAID 5 and the disk controller would have
a flash backed write-behind cache.

What's the best way to configure Cassandra and/or respecify the hardware
for an all-in-one-box solution?

Thanks-in-advance!

--John


Re: Getting error while issuing Cassandra stress

2016-01-22 Thread Sebastian Estevez
Sorry I missed that.

Both your nodetool status and keyspace replication settings say Cassandra
and Analytics for the DC names. I'm not sure where you're seeing DC1, DC2,
etc. and why you suspect that is the problem.

All the best,



Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com








DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the worlds
most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Fri, Jan 22, 2016 at 1:45 PM, Bhuvan Rawal  wrote:

> Hi Sebastian,
>
> I had attached nodetool status output in previous mail, pasting it again :
>
> $ nodetool status Datacenter: Analytics =
> Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load
> Tokens Owns Host ID Rack UN 10.41.55.17 428.5 KB 256 ?
> 39d6d585-e641-4046-9d0b-797356597b5e rack1 UN 10.41.55.19 404.44 KB 256 ?
> 69edf930-efd9-4d74-a798-f3d4ac02e516 rack1 UN 10.41.55.18 423.21 KB 256 ?
> b74bab13-09b2-4760-bce9-c8ef05e50f6d rack1 UN 10.41.55.20 683.23 KB 256 ?
> fb5c4fed-6e1e-4ea8-838d-358106906830 rack1 Datacenter: Cassandra
> = Status=Up/Down |/
> State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID
> Rack UN 10.41.55.15 209.4 KB 256 ? ffc3b9a0-5d5c-4a3d-a99e-49d255731278
> rack1 UN 10.41.55.21 227.44 KB 256 ? c68deba4-b9a2-43fc-bb13-6af74c88c210
> rack1 UN 10.41.55.23 222.71 KB 256 ? 8229aa87-af00-48fa-ad6b-3066d3dc0e58
> rack1 UN 10.41.55.22 218.72 KB 256 ? c7ba84fd-7992-41de-8c88-11574a72db99
> rack1
>
> Regards,
> Bhuvan Rawal
>
> On Sat, Jan 23, 2016 at 12:11 AM, Sebastian Estevez <
> sebastian.este...@datastax.com> wrote:
>
>> The output of `nodetool status` would help us diagnose.
>>
>> All the best,
>>
>>
>> [image: datastax_logo.png] 
>>
>> Sebastián Estévez
>>
>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>
>> [image: linkedin.png]  [image:
>> facebook.png]  [image: twitter.png]
>>  [image: g+.png]
>> 
>> 
>> 
>>
>>
>> 
>>
>> DataStax is the fastest, most scalable distributed database technology,
>> delivering Apache Cassandra to the world’s most innovative enterprises.
>> Datastax is built to be agile, always-on, and predictably scalable to any
>> size. With more than 500 customers in 45 countries, DataStax is the
>> database technology and transactional backbone of choice for the worlds
>> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>>
>> On Fri, Jan 22, 2016 at 1:39 PM, Bhuvan Rawal 
>> wrote:
>>
>>> Thanks for the response Alain,
>>>
>>> cqlsh> create keyspace mykeyspace WITH replication =
>>> {'class':'NetworkTopologyStrategy', 'Analytics':2, 'Cassandra':3}
>>> cqlsh> use mykeyspace;
>>> cqlsh:mykeyspace>create table mytable (id int primary key, name text,
>>> address text, phone text);
>>> cqlsh:mykeyspace> insert into mytable (id, name, address, phone) values
>>> (1, 'Kiyu','Texas', '555-1212'); # and other similar statement
>>> I then issued the below command from every node and found consistent
>>> results.
>>> cqlsh:mykeyspace> select * from mytable;
>>>
>>> // Then i repeated the above steps for NetworkTopologyStrategy and found
>>> same results
>>>
>>> I ran basic cassandra stress
>>> seed1 - seed of datacenter 1
>>>  $ cassandra-stress write n=5 -rate threads=4 -node any_random_ip
>>>  $ cassandra-stress write n=5 -rate threads=4 -node seed1
>>>  $ cassandra-stress write n=5 -rate threads=4 -node seed1,seed2
>>>  $ cassandra-stress write n=5 -rate threads=4 -node
>>> all_8_ip_comma_seperated
>>>  $ cassandra-stress write n=100 cl=one -mode native cql3 -schema
>>> keyspace="keyspace1" -pop seq=1..100 -node ip1,ip2,ip3,ip4
>>>
>>> All of them threw the exception
>>> *com.datastax.driver.core.exceptions.UnavailableException: Not enough
>>> replica available for query at consistency LOCAL_ONE (1 required but only 0
>>> alive)*
>>>
>>>
>>> I have a feeling that the issue is with datacenter name for some reason,
>>> because in some config files I fou

Re: Production with Single Node

2016-01-22 Thread Jack Krupansky
The risks would be about the same as with a single-node Postgres or MySQL
database, except that you wouldn't have the benefit of full SQL.

How much data (rows, columns), what kind of load pattern (heavy write,
heavy update, heavy query), and what types of queries (primary key-only,
slices, filtering, secondary indexes, etc.)?

-- Jack Krupansky

On Fri, Jan 22, 2016 at 3:24 PM, John Lammers 
wrote:

> After deploying a number of production systems with up to 10 Cassandra
> nodes each, we are looking at deploying a small, all-in-one-server system
> with only a single, local node (Cassandra 2.1.11).
>
> What are the risks of such a configuration?
>
> The virtual disk would be running RAID 5 and the disk controller would
> have a flash backed write-behind cache.
>
> What's the best way to configure Cassandra and/or respecify the hardware
> for an all-in-one-box solution?
>
> Thanks-in-advance!
>
> --John
>
>


Re: Production with Single Node

2016-01-22 Thread Jonathan Haddad
My opinion:
http://rustyrazorblade.com/2013/09/cassandra-faq-can-i-start-with-a-single-node/

TL;DR: the only reason to run 1 node in prod is if you're super broke but
know you'll need to scale up almost immediately after going to prod (maybe
after getting some funding).

If you're planning on doing it as a more permanent solution, you've chosen
the wrong database.

On Fri, Jan 22, 2016 at 12:30 PM Jack Krupansky 
wrote:

> The risks would be about the same as with a single-node Postgres or MySQL
> database, except that you wouldn't have the benefit of full SQL.
>
> How much data (rows, columns), what kind of load pattern (heavy write,
> heavy update, heavy query), and what types of queries (primary key-only,
> slices, filtering, secondary indexes, etc.)?
>
> -- Jack Krupansky
>
> On Fri, Jan 22, 2016 at 3:24 PM, John Lammers <
> john.lamm...@karoshealth.com> wrote:
>
>> After deploying a number of production systems with up to 10 Cassandra
>> nodes each, we are looking at deploying a small, all-in-one-server system
>> with only a single, local node (Cassandra 2.1.11).
>>
>> What are the risks of such a configuration?
>>
>> The virtual disk would be running RAID 5 and the disk controller would
>> have a flash backed write-behind cache.
>>
>> What's the best way to configure Cassandra and/or respecify the hardware
>> for an all-in-one-box solution?
>>
>> Thanks-in-advance!
>>
>> --John
>>
>>
>


Fwd: Production with Single Node

2016-01-22 Thread John Lammers
Thanks for your reply Jonathan.

We chose Cassandra for its incredible performance and robustness for large
sites.  Our application is designed from the ground up to take full
advantage of its column oriented data store (giving up the ability to also
run with a relational database backend).

The challenge now is a new market consisting of many small sites that
reportedly can't afford a multi-server solution.  These would be permanent,
one node systems.

--John

-- Forwarded message --
From: Jonathan Haddad 
Date: Fri, Jan 22, 2016 at 3:34 PM
Subject: Re: Production with Single Node
To: user@cassandra.apache.org


My opinion:
http://rustyrazorblade.com/2013/09/cassandra-faq-can-i-start-with-a-single-node/

TL;DR: the only reason to run 1 node in prod is if you're super broke but
know you'll need to scale up almost immediately after going to prod (maybe
after getting some funding).

If you're planning on doing it as a more permanent solution, you've chosen
the wrong database.


Re: Production with Single Node

2016-01-22 Thread Dan Kinder
I could see this being desirable if you are deploying the exact same
application as you deploy in other places with many nodes, and you know the
load will be low. It may be a rare situation but in such a case you save
big effort by not having to change your application logic.

Not that I necessarily recommend it, but to answer John's question: my
understanding is that if you want to keep it snappy and low-latency you should
watch out for GC pauses and consider your GC tuning carefully, since with a
single node a pause stops the whole show. Presumably your load won't be very
high.

Also, if you are concerned with durability you may want to consider changing
commitlog_sync to batch. I believe this is the only way to guarantee write
durability with one node. Again with the performance caveat; under high load it
could cause problems.
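
A sketch of the relevant cassandra.yaml lines (the batch window value is
illustrative; batch sync trades write latency for durability):

# cassandra.yaml
commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 2
# the default is: commitlog_sync: periodic
#                 commitlog_sync_period_in_ms: 10000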

On Fri, Jan 22, 2016 at 12:34 PM, Jonathan Haddad  wrote:

> My opinion:
> http://rustyrazorblade.com/2013/09/cassandra-faq-can-i-start-with-a-single-node/
>
> TL;DR: the only reason to run 1 node in prod is if you're super broke but
> know you'll need to scale up almost immediately after going to prod (maybe
> after getting some funding).
>
> If you're planning on doing it as a more permanent solution, you've chosen
> the wrong database.
>
> On Fri, Jan 22, 2016 at 12:30 PM Jack Krupansky 
> wrote:
>
>> The risks would be about the same as with a single-node Postgres or MySQL
>> database, except that you wouldn't have the benefit of full SQL.
>>
>> How much data (rows, columns), what kind of load pattern (heavy write,
>> heavy update, heavy query), and what types of queries (primary key-only,
>> slices, filtering, secondary indexes, etc.)?
>>
>> -- Jack Krupansky
>>
>> On Fri, Jan 22, 2016 at 3:24 PM, John Lammers <
>> john.lamm...@karoshealth.com> wrote:
>>
>>> After deploying a number of production systems with up to 10 Cassandra
>>> nodes each, we are looking at deploying a small, all-in-one-server system
>>> with only a single, local node (Cassandra 2.1.11).
>>>
>>> What are the risks of such a configuration?
>>>
>>> The virtual disk would be running RAID 5 and the disk controller would
>>> have a flash backed write-behind cache.
>>>
>>> What's the best way to configure Cassandra and/or respecify the hardware
>>> for an all-in-one-box solution?
>>>
>>> Thanks-in-advance!
>>>
>>> --John
>>>
>>>
>>


-- 
Dan Kinder
Principal Software Engineer
Turnitin – www.turnitin.com
dkin...@turnitin.com


Fwd: Production with Single Node

2016-01-22 Thread John Lammers
Thanks for your reply Sebastian.

They are specialized data storage & retrieval systems.  The Cassandra
database is mainly used to store meta-data for searching.

Jonathan, I had seen your article.  But what are some of the technical
reasons why a one node Cassandra cluster is a bad idea?  I need ammo to
convince others.  Or failing that, what can be done to make this
configuration as safe & robust as possible?

Thanks!

--John

-- Forwarded message --
From: Sebastian Estevez 
Date: Fri, Jan 22, 2016 at 3:41 PM
Subject: Fwd: Production with Single Node
To: john.lamm...@karoshealth.com


Hi John,

Can you share a bit more about your use case? What's the purpose of these
little clusters? Jon has good points but I'm cautious to dismiss your idea
without hearing specifics about your plans.


All the best,



Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com








DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the worlds
most innovative companies such as Netflix, Adobe, Intuit, and eBay.

-- Forwarded message --
From: Jack Krupansky 
Date: Fri, Jan 22, 2016 at 3:30 PM
Subject: Re: Production with Single Node
To: user@cassandra.apache.org


The risks would be about the same as with a single-node Postgres or MySQL
database, except that you wouldn't have the benefit of full SQL.

How much data (rows, columns), what kind of load pattern (heavy write,
heavy update, heavy query), and what types of queries (primary key-only,
slices, filtering, secondary indexes, etc.)?

-- Jack Krupansky

On Fri, Jan 22, 2016 at 3:24 PM, John Lammers 
wrote:

> After deploying a number of production systems with up to 10 Cassandra
> nodes each, we are looking at deploying a small, all-in-one-server system
> with only a single, local node (Cassandra 2.1.11).
>
> What are the risks of such a configuration?
>
> The virtual disk would be running RAID 5 and the disk controller would
> have a flash backed write-behind cache.
>
> What's the best way to configure Cassandra and/or respecify the hardware
> for an all-in-one-box solution?
>
> Thanks-in-advance!
>
> --John
>
>


Re: Getting error while issuing Cassandra stress

2016-01-22 Thread Bhuvan Rawal
I had a look at the jira below:
https://issues.apache.org/jira/browse/CASSANDRA-7905

When I opened my cassandra-rackdc.properties I saw that the DC names were DC1 &
DC2 and the rack name was RAC1. Please note that this is the default
configuration; I have not modified any file.

There is another point of concern here which might be related to the previous
one as well: I'm not able to log in to cqlsh directly, i.e. I have to specify
the IP even when I'm logged in to that machine.

$ cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1':
error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
Connection refused")})

whereas
$ cqlsh 
works fine

is that the reason why the cassandra-stress is not able to communicate with
other replicas?
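
A common cause of that particular symptom (not necessarily of the stress error)
is the node's rpc_address being bound to the machine's own IP rather than
localhost: a bare cqlsh tries 127.0.0.1:9042 and is refused, while cqlsh with
the node IP works. A hedged sketch of the relevant cassandra.yaml lines, with a
placeholder address:

# cassandra.yaml (per node)
rpc_address: 10.41.55.17       # native/CQL clients connect here, not 127.0.0.1
native_transport_port: 9042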

On Sat, Jan 23, 2016 at 1:37 AM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> Sorry I missed that.
>
> Both your nodetool status and keyspace replication settings say Cassandra
> and Analytics for the DC names. I'm not sure where you're seeing DC1, DC2,
> etc. and why you suspect that is the problem.
>
> All the best,
>
>
> [image: datastax_logo.png] 
>
> Sebastián Estévez
>
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>
> [image: linkedin.png]  [image:
> facebook.png]  [image: twitter.png]
>  [image: g+.png]
> 
> 
> 
>
>
> 
>
> DataStax is the fastest, most scalable distributed database technology,
> delivering Apache Cassandra to the world’s most innovative enterprises.
> Datastax is built to be agile, always-on, and predictably scalable to any
> size. With more than 500 customers in 45 countries, DataStax is the
> database technology and transactional backbone of choice for the worlds
> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>
> On Fri, Jan 22, 2016 at 1:45 PM, Bhuvan Rawal  wrote:
>
>> Hi Sebastian,
>>
>> I had attached nodetool status output in previous mail, pasting it again :
>>
>> $ nodetool status Datacenter: Analytics =
>> Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load
>> Tokens Owns Host ID Rack UN 10.41.55.17 428.5 KB 256 ?
>> 39d6d585-e641-4046-9d0b-797356597b5e rack1 UN 10.41.55.19 404.44 KB 256 ?
>> 69edf930-efd9-4d74-a798-f3d4ac02e516 rack1 UN 10.41.55.18 423.21 KB 256 ?
>> b74bab13-09b2-4760-bce9-c8ef05e50f6d rack1 UN 10.41.55.20 683.23 KB 256 ?
>> fb5c4fed-6e1e-4ea8-838d-358106906830 rack1 Datacenter: Cassandra
>> = Status=Up/Down |/
>> State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID
>> Rack UN 10.41.55.15 209.4 KB 256 ? ffc3b9a0-5d5c-4a3d-a99e-49d255731278
>> rack1 UN 10.41.55.21 227.44 KB 256 ? c68deba4-b9a2-43fc-bb13-6af74c88c210
>> rack1 UN 10.41.55.23 222.71 KB 256 ? 8229aa87-af00-48fa-ad6b-3066d3dc0e58
>> rack1 UN 10.41.55.22 218.72 KB 256 ? c7ba84fd-7992-41de-8c88-11574a72db99
>> rack1
>>
>> Regards,
>> Bhuvan Rawal
>>
>> On Sat, Jan 23, 2016 at 12:11 AM, Sebastian Estevez <
>> sebastian.este...@datastax.com> wrote:
>>
>>> The output of `nodetool status` would help us diagnose.
>>>
>>> All the best,
>>>
>>>
>>> [image: datastax_logo.png] 
>>>
>>> Sebastián Estévez
>>>
>>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>>
>>> [image: linkedin.png]  [image:
>>> facebook.png]  [image: twitter.png]
>>>  [image: g+.png]
>>> 
>>> 
>>> 
>>>
>>>
>>> 
>>>
>>> DataStax is the fastest, most scalable distributed database technology,
>>> delivering Apache Cassandra to the world’s most innovative enterprises.
>>> Datastax is built to be agile, always-on, and predictably scalable to any
>>> size. With more than 500 customers in 45 countries, DataStax is the
>>> database technology and transactional backbone of choice for the worlds
>>> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>>>
>>> On Fri, Jan 22, 2016 at 1:39 PM, Bhuvan Rawal 
>>> wrote:
>>>
 Thanks for the response Alain,

 cqlsh> create keyspace mykeyspace WITH replication =
 {'class':'NetworkTopologyStrategy', 'Analytics':2, 'Cassandra':3}
 cqlsh> use mykeyspace;
 cqlsh:mykeyspace>create table mytable (id int primary key, name text,
 address text, phone text);
 cqlsh:mykeyspace> insert into mytable (id, name, address, phone) values
 (1, 'Kiyu','Texas', '555-1212'); # and other similar statement
 I then issued the below command from every node and found consistent
 results.
>>>

Re: Production with Single Node

2016-01-22 Thread Jeff Jirsa
The value of cassandra is in its replication – as a single node solution, it’s 
slower and less flexible than alternatives

From:  John Lammers
Reply-To:  "user@cassandra.apache.org"
Date:  Friday, January 22, 2016 at 12:57 PM
To:  Cassandra Mailing List
Subject:  Fwd: Production with Single Node

Thanks for your reply Sebastian.

They are specialized data storage & retrieval systems.  The Cassandra database 
is mainly used to store meta-data for searching.

Jonathan, I had seen your article.  But what are some of the technical reasons 
why a one node Cassandra cluster is a bad idea?  I need ammo to convince 
others.  Or failing that, what can be done to make this configuration as safe & 
robust as possible?

Thanks!

--John

-- Forwarded message --
From: Sebastian Estevez 
Date: Fri, Jan 22, 2016 at 3:41 PM
Subject: Fwd: Production with Single Node
To: john.lamm...@karoshealth.com


Hi John, 

Can you share a bit more about your use case? What's the purpose of these 
little clusters? Jon has good points but I'm cautious to dismiss your idea 
without hearing specifics about your plans.


All the best,



Sebastián Estévez

Solutions Architect |954 905 8615 | sebastian.este...@datastax.com






DataStax is the fastest, most scalable distributed database technology, 
delivering Apache Cassandra to the world’s most innovative enterprises. 
Datastax is built to be agile, always-on, and predictably scalable to any size. 
With more than 500 customers in 45 countries, DataStax is the database 
technology and transactional backbone of choice for the worlds most innovative 
companies such as Netflix, Adobe, Intuit, and eBay.

-- Forwarded message --
From: Jack Krupansky 
Date: Fri, Jan 22, 2016 at 3:30 PM
Subject: Re: Production with Single Node
To: user@cassandra.apache.org


The risks would be about the same as with a single-node Postgres or MySQL 
database, except that you wouldn't have the benefit of full SQL. 

How much data (rows, columns), what kind of load pattern (heavy write, heavy 
update, heavy query), and what types of queries (primary key-only, slices, 
filtering, secondary indexes, etc.)?

-- Jack Krupansky

On Fri, Jan 22, 2016 at 3:24 PM, John Lammers  
wrote:
After deploying a number of production systems with up to 10 Cassandra nodes 
each, we are looking at deploying a small, all-in-one-server system with only a 
single, local node (Cassandra 2.1.11).

What are the risks of such a configuration?

The virtual disk would be running RAID 5 and the disk controller would have a 
flash backed write-behind cache.

What's the best way to configure Cassandra and/or respecify the hardware for an 
all-in-one-box solution?

Thanks-in-advance!

--John









Re: Getting error while issuing Cassandra stress

2016-01-22 Thread Sebastian Estevez
>
> when i opened my cassandra-rackdc.properties i saw that DC names were DC1
> & DC2, rack name was RAC1 . Please note that this is the default
> configuration, I have not modified any file.


cassandra-rackdc.properties is only respected based on your snitch

.
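(One quick way to see which snitch is actually in effect, and therefore whether
that file is read at all, is to check the "Snitch:" line in the output of:

$ nodetool describecluster
)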

$ cqlsh
> Connection error: ('Unable to connect to any servers', {'127.0.0.1':
> error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
> Connection refused")})
> whereas
> $ cqlsh 
> works fine
> is that the reason why the cassandra-stress is not able to communicate
> with other replicas?


Are you providing the -node parameter to stress

?
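For example, an invocation along these lines (the IPs are simply nodes taken
from your nodetool status output, and the operation count is a placeholder):

$ cassandra-stress write n=100000 -node 10.41.55.15,10.41.55.21,10.41.55.22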



All the best,



On Fri, Jan 22, 2016 at 4:07 PM, Bhuvan Rawal  wrote:

> I had a look at the jira below:
> https://issues.apache.org/jira/browse/CASSANDRA-7905
>
> when i opened my cassandra-rackdc.properties i saw that DC names were DC1
> & DC2, rack name was RAC1 . Please note that this is the default
> configuration, I have not modified any file.
>
> There is another point of concern here which might be relevant to previous
> one as well, im not able to login to cqlsh directly, i.e. I have to specify
> ip as well even when im logged in to that machine.
>
> $ cqlsh
> Connection error: ('Unable to connect to any servers', {'127.0.0.1':
> error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
> Connection refused")})
>
> whereas
> $ cqlsh 
> works fine
>
> is that the reason why the cassandra-stress is not able to communicate
> with other replicas?
>
> On Sat, Jan 23, 2016 at 1:37 AM, Sebastian Estevez <
> sebastian.este...@datastax.com> wrote:
>
>> Sorry I missed that.
>>
>> Both your nodetool status and keyspace replication settings say Cassandra
>> and Analytics for the DC names. I'm not sure where you're seeing DC1, DC2,
>> etc. and why you suspect that is the problem.
>>
>> All the best,
>>
>>
>>
>> On Fri, Jan 22, 2016 at 1:45 PM, Bhuvan Rawal 
>> wrote:
>>
>>> Hi Sebastian,
>>>
>>> I had attached nodetool status output in previous mail, pasting it again
>>> :
>>>
>>> $ nodetool status
>>> Datacenter: Analytics
>>> =====================
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address      Load       Tokens  Owns  Host ID                               Rack
>>> UN  10.41.55.17  428.5 KB   256     ?     39d6d585-e641-4046-9d0b-797356597b5e  rack1
>>> UN  10.41.55.19  404.44 KB  256     ?     69edf930-efd9-4d74-a798-f3d4ac02e516  rack1
>>> UN  10.41.55.18  423.21 KB  256     ?     b74bab13-09b2-4760-bce9-c8ef05e50f6d  rack1
>>> UN  10.41.55.20  683.23 KB  256     ?     fb5c4fed-6e1e-4ea8-838d-358106906830  rack1
>>> Datacenter: Cassandra
>>> =====================
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address      Load       Tokens  Owns  Host ID                               Rack
>>> UN  10.41.55.15  209.4 KB   256     ?     ffc3b9a0-5d5c-4a3d-a99e-49d255731278  rack1
>>> UN  10.41.55.21  227.44 KB  256     ?     c68deba4-b9a2-43fc-bb13-6af74c88c210  rack1
>>> UN  10.41.55.23  222.71 KB  256     ?     8229aa87-af00-48fa-ad6b-3066d3dc0e58  rack1
>>> UN  10.41.55.22  218.72 KB  256     ?     c7ba84fd-7992-41de-8c88-11574a72db99  rack1

Re: Production with Single Node

2016-01-22 Thread John Lammers
Jeff, that may be true for many ... but for our application, the
performance of a single Cassandra node blows the doors off Oracle and
PostgreSQL.


On Fri, Jan 22, 2016 at 4:24 PM, Jeff Jirsa 
wrote:

> The value of cassandra is in its replication – as a single node solution,
> it’s slower and less flexible than alternatives
>


>
> From: John Lammers
> Reply-To: "user@cassandra.apache.org"
> Date: Friday, January 22, 2016 at 12:57 PM
> To: Cassandra Mailing List
>
> Subject: Fwd: Production with Single Node
>
> Thanks for your reply Sebastian.
>
> They are specialized data storage & retrieval systems.  The Cassandra
> database is mainly used to store meta-data for searching.
>
> Jonathan, I had seen your article.  But what are some of the technical
> reasons why a one node Cassandra cluster is a bad idea?  I need ammo to
> convince others.  Or failing that, what can be done to make this
> configuration as safe & robust as possible?
>
> Thanks!
>
> --John
>
> -- Forwarded message --
> From: Sebastian Estevez 
> Date: Fri, Jan 22, 2016 at 3:41 PM
> Subject: Fwd: Production with Single Node
> To: john.lamm...@karoshealth.com
>
>
> Hi John,
>
> Can you share a bit more about your use case? What's the purpose of these
> little clusters? Jon has good points but I'm cautious to dismiss your idea
> without hearing specifics about your plans.
>
>
> All the best,
>
>
>
> -- Forwarded message --
> From: Jack Krupansky 
> Date: Fri, Jan 22, 2016 at 3:30 PM
> Subject: Re: Production with Single Node
> To: user@cassandra.apache.org
>
>
> The risks would be about the same as with a single-node Postgres or MySQL
> database, except that you wouldn't have the benefit of full SQL.
>
> How much data (rows, columns), what kind of load pattern (heavy write,
> heavy update, heavy query), and what types of queries (primary key-only,
> slices, filtering, secondary indexes, etc.)?
>
> -- Jack Krupansky
>
> On Fri, Jan 22, 2016 at 3:24 PM, John Lammers <
> john.lamm...@karoshealth.com> wrote:
>
>> After deploying a number of production systems with up to 10 Cassandra
>> nodes each, we are looking at deploying a small, all-in-one-server system
>> with only a single, local node (Cassandra 2.1.11).
>>
>> What are the risks of such a configuration?
>>
>> The virtual disk would be running RAID 5 and the disk controller would
>> have a flash backed write-behind cache.
>>
>> What's the best way to configure Cassandra and/or respecify the hardware
>> for an all-in-one-box solution?
>>
>> Thanks-in-advance!
>>
>> --John
>>
>>
>
>
>


Re: Production with Single Node

2016-01-22 Thread Jonathan Haddad
If you're going to go with a bunch of smaller, single node servers, use
Postgres.  It's going to be more flexible with a smaller memory footprint.
You could even use sqlite.

Would you run a single node zookeeper cluster?   Single node map reduce?
Single node HDFS?  I hope not.

Cassandra's strengths are high availability and linear scalability.  If
you're not planning on taking advantage of either of those you're using the
wrong tool for the job.

On Fri, Jan 22, 2016 at 1:25 PM Jeff Jirsa 
wrote:

> The value of cassandra is in its replication – as a single node solution,
> it’s slower and less flexible than alternatives
>
> From: John Lammers
> Reply-To: "user@cassandra.apache.org"
> Date: Friday, January 22, 2016 at 12:57 PM
> To: Cassandra Mailing List
>
> Subject: Fwd: Production with Single Node
>
> Thanks for your reply Sebastian.
>
> They are specialized data storage & retrieval systems.  The Cassandra
> database is mainly used to store meta-data for searching.
>
> Jonathan, I had seen your article.  But what are some of the technical
> reasons why a one node Cassandra cluster is a bad idea?  I need ammo to
> convince others.  Or failing that, what can be done to make this
> configuration as safe & robust as possible?
>
> Thanks!
>
> --John
>
> -- Forwarded message --
> From: Sebastian Estevez 
> Date: Fri, Jan 22, 2016 at 3:41 PM
> Subject: Fwd: Production with Single Node
> To: john.lamm...@karoshealth.com
>
>
> Hi John,
>
> Can you share a bit more about your use case? What's the purpose of these
> little clusters? Jon has good points but I'm cautious to dismiss your idea
> without hearing specifics about your plans.
>
>
> All the best,
>
>
>
> -- Forwarded message --
> From: Jack Krupansky 
> Date: Fri, Jan 22, 2016 at 3:30 PM
> Subject: Re: Production with Single Node
> To: user@cassandra.apache.org
>
>
> The risks would be about the same as with a single-node Postgres or MySQL
> database, except that you wouldn't have the benefit of full SQL.
>
> How much data (rows, columns), what kind of load pattern (heavy write,
> heavy update, heavy query), and what types of queries (primary key-only,
> slices, filtering, secondary indexes, etc.)?
>
> -- Jack Krupansky
>
> On Fri, Jan 22, 2016 at 3:24 PM, John Lammers <
> john.lamm...@karoshealth.com> wrote:
>
>> After deploying a number of production systems with up to 10 Cassandra
>> nodes each, we are looking at deploying a small, all-in-one-server system
>> with only a single, local node (Cassandra 2.1.11).
>>
>> What are the risks of such a configuration?
>>
>> The virtual disk would be running RAID 5 and the disk controller would
>> have a flash backed write-behind cache.
>>
>> What's the best way to configure Cassandra and/or respecify the hardware
>> for an all-in-one-box solution?
>>
>> Thanks-in-advance!
>>
>> --John
>>
>>
>
>
>


Re: Getting error while issuing Cassandra stress

2016-01-22 Thread Bhuvan Rawal
Yes, I'm specifying the -node parameter to stress; otherwise it fails with a
network connection error.

Can you point me to a sample Java application to test pushing data from an
external server? Let's see if that works.

On Sat, Jan 23, 2016 at 2:55 AM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> when i opened my cassandra-rackdc.properties i saw that DC names were DC1
>> & DC2, rack name was RAC1 . Please note that this is the default
>> configuration, I have not modified any file.
>
>
> cassandra-rackdc.properties is only respected based on your snitch
> 
> .
>
> $ cqlsh
>> Connection error: ('Unable to connect to any servers', {'127.0.0.1':
>> error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
>> Connection refused")})
>> whereas
>> $ cqlsh 
>> works fine
>> is that the reason why the cassandra-stress is not able to communicate
>> with other replicas?
>
>
> Are you providing the -node parameter to stress
> 
> ?
>
>
>
> All the best,
>
>
>
> On Fri, Jan 22, 2016 at 4:07 PM, Bhuvan Rawal  wrote:
>
>> I had a look at the jira below:
>> https://issues.apache.org/jira/browse/CASSANDRA-7905
>>
>> when i opened my cassandra-rackdc.properties i saw that DC names were DC1
>> & DC2, rack name was RAC1 . Please note that this is the default
>> configuration, I have not modified any file.
>>
>> There is another point of concern here which might be relevant to
>> previous one as well, im not able to login to cqlsh directly, i.e. I have
>> to specify ip as well even when im logged in to that machine.
>>
>> $ cqlsh
>> Connection error: ('Unable to connect to any servers', {'127.0.0.1':
>> error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
>> Connection refused")})
>>
>> whereas
>> $ cqlsh 
>> works fine
>>
>> is that the reason why the cassandra-stress is not able to communicate
>> with other replicas?
>>
>> On Sat, Jan 23, 2016 at 1:37 AM, Sebastian Estevez <
>> sebastian.este...@datastax.com> wrote:
>>
>>> Sorry I missed that.
>>>
>>> Both your nodetool status and keyspace replication settings say
>>> Cassandra and Analytics for the DC names. I'm not sure where you're seeing
>>> DC1, DC2, etc. and why you suspect that is the problem.
>>>
>>> All the best,
>>>
>>>
>>> [image: datastax_logo.png] 
>>>
>>> Sebastián Estévez
>>>
>>> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>>>
>>> [image: linkedin.png]  [image:
>>> facebook.png]  [image: twitter.png]
>>>  [image: g+.png]
>>> 
>>> 
>>> 
>>>
>>>
>>> 
>>>
>>> DataStax is the fastest, most scalable distributed database technology,
>>> delivering Apache Cassandra to the world’s most innovative enterprises.
>>> Datastax is built to be agile, always-on, and predictably scalable to any
>>> size. With more than 500 customers in 45 countries, DataStax is the
>>> database technology and transactional backbone of choice for the worlds
>>> most innovative companies such as Netflix, Adobe, Intuit, and eBay.
>>>
>>> On Fri, Jan 22, 2016 at 1:45 PM, Bhuvan Rawal 
>>> wrote:
>>>
 Hi Sebastian,

 I had attached nodetool status output in previous mail, pasting it
 again :

 $ nodetool status Datacenter: Analytics =
 Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load
 Tokens Owns Host ID Rack UN 10.41.55.17 428.5 KB 256 ?
 39d6d585-e641-4046-9d0b-797356597b5e rack1 UN 10.41.55.19 404.44 KB 256 ?
 69edf930-efd9-4d74-a798-f3d4ac02e516 rack1 UN 10.41.55.18 423.21 KB 256 ?
 b74bab13-09b2-4760-bce9-c8ef05e50f6d rack1 UN 10.41.55.20 683.23 KB 256 ?
>

Re: Getting error while issuing Cassandra stress

2016-01-22 Thread Sebastian Estevez
https://github.com/brianmhess/cassandra-loader

All the best,



On Fri, Jan 22, 2016 at 4:37 PM, Bhuvan Rawal  wrote:

> Yes im specifying -node parameter to stress, otherwise it throws network
> connection failed.
>
> Can you point me to a sample java application to test pushing data from
> external server? Let's see if that works
>
> On Sat, Jan 23, 2016 at 2:55 AM, Sebastian Estevez <
> sebastian.este...@datastax.com> wrote:
>
>> when i opened my cassandra-rackdc.properties i saw that DC names were DC1
>>> & DC2, rack name was RAC1 . Please note that this is the default
>>> configuration, I have not modified any file.
>>
>>
>> cassandra-rackdc.properties is only respected based on your snitch
>> 
>> .
>>
>> $ cqlsh
>>> Connection error: ('Unable to connect to any servers', {'127.0.0.1':
>>> error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
>>> Connection refused")})
>>> whereas
>>> $ cqlsh 
>>> works fine
>>> is that the reason why the cassandra-stress is not able to communicate
>>> with other replicas?
>>
>>
>> Are you providing the -node parameter to stress
>> 
>> ?
>>
>>
>>
>> All the best,
>>
>>
>>
>> On Fri, Jan 22, 2016 at 4:07 PM, Bhuvan Rawal 
>> wrote:
>>
>>> I had a look at the jira below:
>>> https://issues.apache.org/jira/browse/CASSANDRA-7905
>>>
>>> when i opened my cassandra-rackdc.properties i saw that DC names were
>>> DC1 & DC2, rack name was RAC1 . Please note that this is the default
>>> configuration, I have not modified any file.
>>>
>>> There is another point of concern here which might be relevant to
>>> previous one as well, im not able to login to cqlsh directly, i.e. I have
>>> to specify ip as well even when im logged in to that machine.
>>>
>>> $ cqlsh
>>> Connection error: ('Unable to connect to any servers', {'127.0.0.1':
>>> error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
>>> Connection refused")})
>>>
>>> whereas
>>> $ cqlsh 
>>> works fine
>>>
>>> is that the reason why the cassandra-stress is not able to communicate
>>> with other replicas?
>>>
>>> On Sat, Jan 23, 2016 at 1:37 AM, Sebastian Estevez <
>>> sebastian.este...@datastax.com> wrote:
>>>
 Sorry I missed that.

 Both your nodetool status and keyspace replication settings say
 Cassandra and Analytics for the DC names. I'm not sure where you're seeing
 DC1, DC2, etc. and why you suspect that is the problem.

 All the best,



Re: Production with Single Node

2016-01-22 Thread John Lammers
Thanks for your reply Jonathan.

We usually deploy clusters of application nodes running on a Cassandra
database cluster, often with two data centers.  Our application is married
to / designed for Cassandra and we can't support any relational database
without rearchitecting and rewriting a lot of code.

For these small sites, we need to scale *down*, not up.

Like it says in Sebastián's email signature "predictably scalable to any
size," only the size this time is smaller, not larger.

--
John Lammers | karoshealth

+1 519 594 0940 x225 | Skype: johnatkaros
7 Father David Bauer Drive
Waterloo, ON, N2L 0A2, Canada
www.karoshealth.com

On Fri, Jan 22, 2016 at 4:32 PM, Jonathan Haddad  wrote:

> If you're going to go with a bunch of smaller, single node servers, use
> Postgres.  It's going to be more flexible with a smaller memory footprint.
> You could even use sqlite.
>
> Would you run a single node zookeeper cluster?   Single node map reduce?
> Single node HDFS?  I hope not.
>
> Cassandra's strengths are high availability and linear scalability.  If
> you're not planning on taking advantage of either of those you're using the
> wrong tool for the job.
>
> On Fri, Jan 22, 2016 at 1:25 PM Jeff Jirsa 
> wrote:
>
>> The value of cassandra is in its replication – as a single node solution,
>> it’s slower and less flexible than alternatives
>>
>> From: John Lammers
>> Reply-To: "user@cassandra.apache.org"
>> Date: Friday, January 22, 2016 at 12:57 PM
>> To: Cassandra Mailing List
>>
>> Subject: Fwd: Production with Single Node
>>
>> Thanks for your reply Sebastian.
>>
>> They are specialized data storage & retrieval systems.  The Cassandra
>> database is mainly used to store meta-data for searching.
>>
>> Jonathan, I had seen your article.  But what are some of the technical
>> reasons why a one node Cassandra cluster is a bad idea?  I need ammo to
>> convince others.  Or failing that, what can be done to make this
>> configuration as safe & robust as possible?
>>
>> Thanks!
>>
>> --John
>>
>> -- Forwarded message --
>> From: Sebastian Estevez 
>> Date: Fri, Jan 22, 2016 at 3:41 PM
>> Subject: Fwd: Production with Single Node
>> To: john.lamm...@karoshealth.com
>>
>>
>> Hi John,
>>
>> Can you share a bit more about your use case? What's the purpose of these
>> little clusters? Jon has good points but I'm cautious to dismiss your idea
>> without hearing specifics about your plans.
>>
>>
>> All the best,
>>
>>
>>
>> -- Forwarded message --
>> From: Jack Krupansky 
>> Date: Fri, Jan 22, 2016 at 3:30 PM
>> Subject: Re: Production with Single Node
>> To: user@cassandra.apache.org
>>
>>
>> The risks would be about the same as with a single-node Postgres or MySQL
>> database, except that you wouldn't have the benefit of full SQL.
>>
>> How much data (rows, columns), what kind of load pattern (heavy write,
>> heavy update, heavy query), and what types of queries (primary key-only,
>> slices, filtering, secondary indexes, etc.)?
>>
>> -- Jack Krupansky
>>
>> On Fri, Jan 22, 2016 at 3:24 PM, John Lammers <
>> john.lamm...@karoshealth.com> wrote:
>>
>>> After deploying a number of production systems with up to 10 Cassandra
>>> nodes each, we are looking at deploying a small, all-in-one-server system
>>> with only a single, local node (Cassandra 2.1.11).
>>>
>>> What are the risks of such a configuration?
>>>
>>> The virtual disk would be running RAID 5 and the disk controller would
>>> have a flash backed write-behind cache.
>>>
>>> What's the best way to configure Cassandra and/or respecify the hardware
>>> for an all-in-one-box solution?
>>>
>>> Thanks-in-advance!
>>>
>>> --John
>>>
>>>
>>
>>
>>


Re: Production with Single Node

2016-01-22 Thread Jack Krupansky
If single-node Cassandra has the performance (and capacity) you need, the
NoSQL data model and API are sufficient for your app, your dev and ops and
support teams are already familiar with and committed to Cassandra, and you
don't need HA or scaling, then it sounds like you are set.

You asked about risks, and normally lack of HA and scaling are unacceptable
risks when people are looking at distributed databases.

Most people on this list are dedicated to and passionate about distributed
databases, HA, and scaling, so it is distinctly unsettling when somebody
comes along who isn't interested in and committed to those same three
qualities. But if single-node happens to work for you, then that's great.

-- Jack Krupansky

On Fri, Jan 22, 2016 at 4:32 PM, John Lammers 
wrote:

> Jeff, that may be true for many ... but for our application, the
> performance of a single Cassandra node blows the doors off Oracle and
> PostgreSQL.
>
>
> On Fri, Jan 22, 2016 at 4:24 PM, Jeff Jirsa 
> wrote:
>
>> The value of cassandra is in its replication – as a single node solution,
>> it’s slower and less flexible than alternatives
>>
>
>
>>
>> From: John Lammers
>> Reply-To: "user@cassandra.apache.org"
>> Date: Friday, January 22, 2016 at 12:57 PM
>> To: Cassandra Mailing List
>>
>> Subject: Fwd: Production with Single Node
>>
>> Thanks for your reply Sebastian.
>>
>> They are specialized data storage & retrieval systems.  The Cassandra
>> database is mainly used to store meta-data for searching.
>>
>> Jonathan, I had seen your article.  But what are some of the technical
>> reasons why a one node Cassandra cluster is a bad idea?  I need ammo to
>> convince others.  Or failing that, what can be done to make this
>> configuration as safe & robust as possible?
>>
>> Thanks!
>>
>> --John
>>
>> -- Forwarded message --
>> From: Sebastian Estevez 
>> Date: Fri, Jan 22, 2016 at 3:41 PM
>> Subject: Fwd: Production with Single Node
>> To: john.lamm...@karoshealth.com
>>
>>
>> Hi John,
>>
>> Can you share a bit more about your use case? What's the purpose of these
>> little clusters? Jon has good points but I'm cautious to dismiss your idea
>> without hearing specifics about your plans.
>>
>>
>> All the best,
>>
>>
>>
>> -- Forwarded message --
>> From: Jack Krupansky 
>> Date: Fri, Jan 22, 2016 at 3:30 PM
>> Subject: Re: Production with Single Node
>> To: user@cassandra.apache.org
>>
>>
>> The risks would be about the same as with a single-node Postgres or MySQL
>> database, except that you wouldn't have the benefit of full SQL.
>>
>> How much data (rows, columns), what kind of load pattern (heavy write,
>> heavy update, heavy query), and what types of queries (primary key-only,
>> slices, filtering, secondary indexes, etc.)?
>>
>> -- Jack Krupansky
>>
>> On Fri, Jan 22, 2016 at 3:24 PM, John Lammers <
>> john.lamm...@karoshealth.com> wrote:
>>
>>> After deploying a number of production systems with up to 10 Cassandra
>>> nodes each, we are looking at deploying a small, all-in-one-server system
>>> with only a single, local node (Cassandra 2.1.11).
>>>
>>> What are the risks of such a configuration?
>>>
>>> The virtual disk would be running RAID 5 and the disk controller would
>>> have a flash backed write-behind cache.
>>>
>>> What's the best way to configure Cassandra and/or respecify the hardware
>>> for an all-in-one-box solution?
>>>
>>> Thanks-in-advance!
>>>
>>> --John
>>>
>>>
>>
>>
>>
>


Re: Production with Single Node

2016-01-22 Thread John Lammers
Thanks for your response Jack.

We are already sold on distributed databases, HA and scaling.  We just have
some small deployments coming up where there's no money for servers to run
multiple Cassandra nodes.

So, aside from the lack of HA, I'm asking if a single Cassandra node would
be viable in a production environment.  (There would be RAID 5 and the RAID
controller cache is backed by flash memory).

I'm asking because I'm concerned about using Cassandra in a way that it's
not designed for.  That to me is the unsettling aspect.

If this is a bad idea, give me the ammo I need to shoot it down.  I need
specific technical reasons.

Thanks!

--John

On Fri, Jan 22, 2016 at 4:47 PM, Jack Krupansky 
wrote:

> Is single-node Cassandra has the performance (and capacity) you need and
> the NoSQL data model and API are sufficient for your app, and your dev and
> ops and support teams are already familiar with and committed to Cassandra,
> and you don't need HA or scaling, then it sounds like you are set.
>
> You asked about risks, and normally lack of HA and scaling are
> unacceptable risks when people are looking at distributed databases.
>
> Most people on this list are dedicated to and passionate about distributed
> databases, HA, and scaling, so it is distinctly unsettling when somebody
> comes along who isn't interested in and committed to those same three
> qualities. But if single-node happens to work for you, then that's great.
>
> -- Jack Krupansky
>


Re: Production with Single Node

2016-01-22 Thread Jonathan Haddad
Have you considered running smaller clusters with 1 customer per keyspace?

If you're going to run 1 node (and you want to benchmark it properly) then
you probably want to switch commitlog_sync to 'batch' and redo your
performance tests.  Without it, you're risking data loss and you aren't
comparing apples to apples.  Something like Postgres is giving you durable
writes by default.  Cassandra doesn't do that by default because you've
got redundant commit logs.
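For reference, a sketch of the cassandra.yaml change being suggested here (the
2 ms window is just the commented-out example value from the stock file, so
tune it for your hardware, and comment out commitlog_sync_period_in_ms while
batch mode is in use):

commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 2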

On Fri, Jan 22, 2016 at 1:48 PM Jack Krupansky 
wrote:

> Is single-node Cassandra has the performance (and capacity) you need and
> the NoSQL data model and API are sufficient for your app, and your dev and
> ops and support teams are already familiar with and committed to Cassandra,
> and you don't need HA or scaling, then it sounds like you are set.
>
> You asked about risks, and normally lack of HA and scaling are
> unacceptable risks when people are looking at distributed databases.
>
> Most people on this list are dedicated to and passionate about distributed
> databases, HA, and scaling, so it is distinctly unsettling when somebody
> comes along who isn't interested in and committed to those same three
> qualities. But if single-node happens to work for you, then that's great.
>
> -- Jack Krupansky
>
> On Fri, Jan 22, 2016 at 4:32 PM, John Lammers <
> john.lamm...@karoshealth.com> wrote:
>
>> Jeff, that may be true for many ... but for our application, the
>> performance of a single Cassandra node blows the doors off Oracle and
>> PostgreSQL.
>>
>>
>> On Fri, Jan 22, 2016 at 4:24 PM, Jeff Jirsa 
>> wrote:
>>
>>> The value of cassandra is in its replication – as a single node
>>> solution, it’s slower and less flexible than alternatives
>>>
>>
>>
>>>
>>> From: John Lammers
>>> Reply-To: "user@cassandra.apache.org"
>>> Date: Friday, January 22, 2016 at 12:57 PM
>>> To: Cassandra Mailing List
>>>
>>> Subject: Fwd: Production with Single Node
>>>
>>> Thanks for your reply Sebastian.
>>>
>>> They are specialized data storage & retrieval systems.  The Cassandra
>>> database is mainly used to store meta-data for searching.
>>>
>>> Jonathan, I had seen your article.  But what are some of the technical
>>> reasons why a one node Cassandra cluster is a bad idea?  I need ammo to
>>> convince others.  Or failing that, what can be done to make this
>>> configuration as safe & robust as possible?
>>>
>>> Thanks!
>>>
>>> --John
>>>
>>> -- Forwarded message --
>>> From: Sebastian Estevez 
>>> Date: Fri, Jan 22, 2016 at 3:41 PM
>>> Subject: Fwd: Production with Single Node
>>> To: john.lamm...@karoshealth.com
>>>
>>>
>>> Hi John,
>>>
>>> Can you share a bit more about your use case? What's the purpose of
>>> these little clusters? Jon has good points but I'm cautious to dismiss your
>>> idea without hearing specifics about your plans.
>>>
>>>
>>> All the best,
>>>
>>>
>>>
>>> -- Forwarded message --
>>> From: Jack Krupansky 
>>> Date: Fri, Jan 22, 2016 at 3:30 PM
>>> Subject: Re: Production with Single Node
>>> To: user@cassandra.apache.org
>>>
>>>
>>> The risks would be about the same as with a single-node Postgres or
>>> MySQL database, except that you wouldn't have the benefit of full SQL.
>>>
>>> How much data (rows, columns), what kind of load pattern (heavy write,
>>> heavy update, heavy query), and what types of queries (primary key-only,
>>> slices, filtering, secondary indexes, etc.)?
>>>
>>> -- Jack Krupansky
>>>
>>> On Fri, Jan 22, 2016 at 3:24 PM, John Lammers <
>>> john.lamm...@karoshealth.com> wrote:
>>>
 After deploying a number of production systems with up to 10 Cassandra
 nodes each, we are looking at deploying a small, all-in-one-server system
 with only a single, local node (Cassandra 2.1.11).

 What are the risks of such a configuration?

 The virtual disk would be running RAID 5 and the disk controller would
 have a flash backed write-behind cache.

 What's the best way to configure

Re: Production with Single Node

2016-01-22 Thread Jack Krupansky
You do of course have the simple technical matters, most of which need to
be addressed with a proof of concept implementation, related to memory,
storage, latency, and throughput. I mean, with a scaled cluster you can
always add nodes to increase capacity and throughput, and reduce latency,
but with a single node you have limited flexibility.

Just to be clear, Cassandra is still not recommended for "fat nodes" - even
if you can fit tons of data on the node, you may not have the computes to
satisfy throughput and latency requirements. And if you don't have enough
system memory the amount of storage is irrelevant.

Back to my original question:
How much data (rows, columns), what kind of load pattern (heavy write,
heavy update, heavy query), and what types of queries (primary key-only,
slices, filtering, secondary indexes, etc.)?

I do recall a customer who ran into problems because they had SSD but only
a very limited amount so they were running out of storage. Having enough
system memory for file system caching and offheap data is important as well.


-- Jack Krupansky

On Fri, Jan 22, 2016 at 5:07 PM, John Lammers 
wrote:

> Thanks for your response Jack.
>
> We are already sold on distributed databases, HA and scaling.  We just
> have some small deployments coming up where there's no money for servers to
> run multiple Cassandra nodes.
>
> So, aside from the lack of HA, I'm asking if a single Cassandra node would
> be viable in a production environment.  (There would be RAID 5 and the RAID
> controller cache is backed by flash memory).
>
> I'm asking because I'm concerned about using Cassandra in a way that it's
> not designed for.  That to me is the unsettling aspect.
>
> If this is a bad idea, give me the ammo I need to shoot it down.  I need
> specific technical reasons.
>
> Thanks!
>
> --John
>
> On Fri, Jan 22, 2016 at 4:47 PM, Jack Krupansky 
> wrote:
>
>> Is single-node Cassandra has the performance (and capacity) you need and
>> the NoSQL data model and API are sufficient for your app, and your dev and
>> ops and support teams are already familiar with and committed to Cassandra,
>> and you don't need HA or scaling, then it sounds like you are set.
>>
>> You asked about risks, and normally lack of HA and scaling are
>> unacceptable risks when people are looking at distributed databases.
>>
>> Most people on this list are dedicated to and passionate about
>> distributed databases, HA, and scaling, so it is distinctly unsettling when
>> somebody comes along who isn't interested in and committed to those same
>> three qualities. But if single-node happens to work for you, then that's
>> great.
>>
>> -- Jack Krupansky
>>
>
>


Re: Getting error while issuing Cassandra stress

2016-01-22 Thread Alain RODRIGUEZ
Hi Bhuvan,

I guess this info will be useful -->
https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCStress_t.html#task_ds_rnm_r53_gk__view-schema-help

You could try defining your own settings for the schema
(NetworkTopologyStrategy,
RF of your choice).
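For example, something in this direction (treat it as a sketch: the exact
replication sub-options are listed in the schema help linked above, and the
per-DC factors here are only picked to mirror your keyspace definition):

$ cassandra-stress write n=100000 -node 10.41.55.15 \
    -schema "replication(strategy=NetworkTopologyStrategy,Cassandra=3,Analytics=2)"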

Yet SimpleStrategy should work just fine with local_one since
https://issues.apache.org/jira/browse/CASSANDRA-6238. So your version
should be ok. I might have missed a lot of info recently though. That's the
flip side of holidays...

Anyway, there is no point in not using NTS if you have 2 DCs, as they will
be considered as one. Also you should change this
https://github.com/apache/cassandra/blob/cassandra-2.1.12/conf/cassandra.yaml#L696
before going to production. I think it is fine for testing purposes.

I still think something else might be wrong. Maybe about your network, just
a guess. You could try this on the node you run the stress tool from:

$ telnet <node_ip> 9042

There is another point of concern here which might be relevant to previous
> one as well, im not able to login to cqlsh directly, i.e. I have to specify
> ip as well even when im logged in to that machine.


This should help you with that (the "cqlsh environment variables" part at
the bottom)
http://docs.datastax.com/en/cql/3.1/cql/cql_reference/cqlsh.html#refCqlsh__using-files-as-input

You have to use your listen_address to connect with cqlsh / native off the
top of my head. You'll find out soon enough.
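A minimal illustration, assuming 10.41.55.15 is the address the node's native
transport is actually bound to (check rpc_address / listen_address in your
cassandra.yaml):

$ export CQLSH_HOST=10.41.55.15
$ cqlsh

or simply: $ cqlsh 10.41.55.15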

> (C*heers :) - im so tempted to copy that)


Feel free, there is no trademark ™ © ® or whatever, I am glad to share.

Who knows, we might be starting today, a new trend :p.

C*heers ;-)

-
Alain

The Last Pickle
http://www.thelastpickle.com

2016-01-22 22:40 GMT+01:00 Sebastian Estevez :

> https://github.com/brianmhess/cassandra-loader
>
> All the best,
>
>
>
> On Fri, Jan 22, 2016 at 4:37 PM, Bhuvan Rawal  wrote:
>
>> Yes im specifying -node parameter to stress, otherwise it throws network
>> connection failed.
>>
>> Can you point me to a sample java application to test pushing data from
>> external server? Let's see if that works
>>
>> On Sat, Jan 23, 2016 at 2:55 AM, Sebastian Estevez <
>> sebastian.este...@datastax.com> wrote:
>>
>>> when i opened my cassandra-rackdc.properties i saw that DC names were
 DC1 & DC2, rack name was RAC1 . Please note that this is the default
 configuration, I have not modified any file.
>>>
>>>
>>> cassandra-rackdc.properties is only respected based on your snitch
>>> 
>>> .
>>>
>>> $ cqlsh
 Connection error: ('Unable to connect to any servers', {'127.0.0.1':
 error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
 Connection refused")})
 whereas
 $ cqlsh 
 works fine
 is that the reason why the cassandra-stress is not able to communicate
 with other replicas?
>>>
>>>
>>> Are you providing the -node parameter to stress
>>> 
>>> ?
>>>
>>>
>>>
>>> All the best,
>>>
>>>

Re: Getting error while issuing Cassandra stress

2016-01-22 Thread Bhuvan Rawal
Getting the same exception again. Should I use the nodetool repair utility?

On Sat, Jan 23, 2016 at 3:10 AM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> https://github.com/brianmhess/cassandra-loader
>
> All the best,
>
>
>
> On Fri, Jan 22, 2016 at 4:37 PM, Bhuvan Rawal  wrote:
>
>> Yes im specifying -node parameter to stress, otherwise it throws network
>> connection failed.
>>
>> Can you point me to a sample java application to test pushing data from
>> external server? Let's see if that works
>>
>> On Sat, Jan 23, 2016 at 2:55 AM, Sebastian Estevez <
>> sebastian.este...@datastax.com> wrote:
>>
>>> when i opened my cassandra-rackdc.properties i saw that DC names were
 DC1 & DC2, rack name was RAC1 . Please note that this is the default
 configuration, I have not modified any file.
>>>
>>>
>>> cassandra-rackdc.properties is only respected based on your snitch
>>> 
>>> .
>>>
>>> $ cqlsh
 Connection error: ('Unable to connect to any servers', {'127.0.0.1':
 error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
 Connection refused")})
 whereas
 $ cqlsh 
 works fine
 is that the reason why the cassandra-stress is not able to communicate
 with other replicas?
>>>
>>>
>>> Are you providing the -node parameter to stress
>>> 
>>> ?
>>>
>>>
>>>
>>> All the best,
>>>
>>>
>>>
>>> On Fri, Jan 22, 2016 at 4:07 PM, Bhuvan Rawal 
>>> wrote:
>>>
 I had a look at the jira below:
 https://issues.apache.org/jira/browse/CASSANDRA-7905

 when i opened my cassandra-rackdc.properties i saw that DC names were
 DC1 & DC2, rack name was RAC1 . Please note that this is the default
 configuration, I have not modified any file.

 There is another point of concern here which might be relevant to
 previous one as well, im not able to login to cqlsh directly, i.e. I have
 to specify ip as well even when im logged in to that machine.

 $ cqlsh
 Connection error: ('Unable to connect to any servers', {'127.0.0.1':
 error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
 Connection refused")})

 whereas
 $ cqlsh 
 works fine

 is that the reason why the cassandra-stress is not able to communicate
 with other replicas?

 On Sat, Jan 23, 2016 at 1:37 AM, Sebastian Estevez <
 sebastian.este...@datastax.com> wrote:

> Sorry I missed that.
>
> Both your nodetool status and keyspace replication settings say
> Cassandra and Analytics for the DC names. I'm not sure where you're seeing
> DC1, DC2, etc. and why you suspect that is the problem.
>
> All the best,
>
>

Re: Getting error while issuing Cassandra stress

2016-01-22 Thread Alain RODRIGUEZ
>
> Should I use nodetool repair utility
>

That wouldn't help, this is an anti-entropy mechanism (see
https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsRepair.html#toolsRepair__description_unique_27
).

It is something really important too often left aside.

Yet, your issue here is not about consistency. The client can't find any
node in charge of the read / written tokens in the ring. This depends on
the topology, the replication factor and your network mainly. I think there
is something wrong in your setup. I would try this:

- Make sure connection / port are ok
- Try increasing the RF / Strategy in the stress tool
- Try with another consistency level, not LOCAL_* (as mentioned here:
http://stackoverflow.com/questions/32055251/not-enough-replica-available-for-query-at-consistency-local-one-1-required-but
); a quick sketch follows below this list
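For instance, something along these lines (illustrative only: the two IPs are
just nodes taken from your nodetool status output, and the counts are
arbitrary):

$ cassandra-stress write n=50000 cl=one -node 10.41.55.15,10.41.55.21

If that succeeds while the default LOCAL_ONE run fails, the problem is most
likely the datacenter names / topology the client resolves, not node health.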

Good luck,

-
Alain

The Last Pickle
http://www.thelastpickle.com

2016-01-22 23:02 GMT+01:00 Bhuvan Rawal :

> Getting same exception again. Should I use nodetool repair utility?
>
> On Sat, Jan 23, 2016 at 3:10 AM, Sebastian Estevez <
> sebastian.este...@datastax.com> wrote:
>
>> https://github.com/brianmhess/cassandra-loader
>>
>> All the best,
>>
>>
>>
>> On Fri, Jan 22, 2016 at 4:37 PM, Bhuvan Rawal 
>> wrote:
>>
>>> Yes im specifying -node parameter to stress, otherwise it throws network
>>> connection failed.
>>>
>>> Can you point me to a sample java application to test pushing data from
>>> external server? Let's see if that works
>>>
>>> On Sat, Jan 23, 2016 at 2:55 AM, Sebastian Estevez <
>>> sebastian.este...@datastax.com> wrote:
>>>
 when i opened my cassandra-rackdc.properties i saw that DC names were
> DC1 & DC2, rack name was RAC1 . Please note that this is the default
> configuration, I have not modified any file.


 cassandra-rackdc.properties is only respected based on your snitch
 
 .

 $ cqlsh
> Connection error: ('Unable to connect to any servers', {'127.0.0.1':
> error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error:
> Connection refused")})
> whereas
> $ cqlsh 
> works fine
> is that the reason why the cassandra-stress is not able to communicate
> with other replicas?


 Are you providing the -node parameter to stress
 
 ?



 All the best,



 On Fri, Jan 22, 2016 at 4:07 PM, Bhuvan Rawal 
 wrote:

> I had a look at the jira below:
> https://issues.apache.org/jira/browse/CASSANDRA-7905
>
> when i opened my cassandra-rackdc.properties i saw that DC names were
> DC1 & DC2, rack name was RAC1 . Please note that this is the default
> configuration, I have not modified any file.
>
> There is another point of concern here which

automated CREATE TABLE just nuked my cluster after a 2.0 -> 2.1 upgrade....

2016-01-22 Thread Kevin Burton
Not sure if this is a bug or not or kind of a *fuzzy* area.

In 2.0 this worked fine.

We have a bunch of automated scripts that go through and create tables...
one per day.

at midnight UTC our entire CQL went offline... took down our whole app.  ;-/

The resolution was a full CQL shut down and then a drop table to remove the
bad tables...

pretty sure the issue was with schema disagreement.

All our CREATE TABLE statements use IF NOT EXISTS, but I think the IF NOT
EXISTS only checks locally?
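(For what it's worth, a quick way to confirm a schema disagreement is to run
this on any node and look at the "Schema versions" section; more than one
schema UUID listed there means the nodes do not agree:

$ nodetool describecluster
)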

My work around is going to be to use zookeeper to create a mutex lock
during this operation.

Any other things I should avoid?


-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Re: automated CREATE TABLE just nuked my cluster after a 2.0 -> 2.1 upgrade....

2016-01-22 Thread Jonathan Haddad
Instead of using ZK, why not solve your concurrency problem by removing
it?  By that, I mean simply have 1 process that creates all your tables
instead of creating a race condition intentionally?
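A minimal sketch of that approach (the host, keyspace and table layout below
are made up for illustration): a single cron entry on one admin box that runs
the day's DDL through cqlsh, so only one client ever issues the CREATE:

0 0 * * * cqlsh 10.0.0.5 -e "CREATE TABLE IF NOT EXISTS myks.events_$(date -u +\%Y\%m\%d) (id uuid PRIMARY KEY, payload text);"

(the % signs need escaping inside crontab; adjust host and schema to taste)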

On Fri, Jan 22, 2016 at 6:16 PM Kevin Burton  wrote:

> Not sure if this is a bug or not or kind of a *fuzzy* area.
>
> In 2.0 this worked fine.
>
> We have a bunch of automated scripts that go through and create tables...
> one per day.
>
> at midnight UTC our entire CQL went offline.. .took down our whole app.
>  ;-/
>
> The resolution was a full CQL shut down and then a drop table to remove
> the bad tables...
>
> pretty sure the issue was with schema disagreement.
>
> All our CREATE TABLE use IF NOT EXISTS but I think the IF NOT EXISTS
> only checks locally?
>
> My work around is going to be to use zookeeper to create a mutex lock
> during this operation.
>
> Any other things I should avoid?
>
>
>
>


Re: automated CREATE TABLE just nuked my cluster after a 2.0 -> 2.1 upgrade....

2016-01-22 Thread Kevin Burton
I sort of agree... but we are also considering migrating to hourly tables...
and what if the single script doesn't run?

I like having N nodes make changes like this because in my experience that
central / single box will usually fail at the wrong time :-/



On Fri, Jan 22, 2016 at 6:47 PM, Jonathan Haddad  wrote:

> Instead of using ZK, why not solve your concurrency problem by removing
> it?  By that, I mean simply have 1 process that creates all your tables
> instead of creating a race condition intentionally?
>
> On Fri, Jan 22, 2016 at 6:16 PM Kevin Burton  wrote:
>
>> Not sure if this is a bug or not or kind of a *fuzzy* area.
>>
>> In 2.0 this worked fine.
>>
>> We have a bunch of automated scripts that go through and create tables...
>> one per day.
>>
>> at midnight UTC our entire CQL went offline.. .took down our whole app.
>>  ;-/
>>
>> The resolution was a full CQL shut down and then a drop table to remove
>> the bad tables...
>>
>> pretty sure the issue was with schema disagreement.
>>
>> All our CREATE TABLE use IF NOT EXISTS but I think the IF NOT EXISTS
>> only checks locally?
>>
>> My work around is going to be to use zookeeper to create a mutex lock
>> during this operation.
>>
>> Any other things I should avoid?
>>
>>
>>
>>


-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Re: Production with Single Node

2016-01-22 Thread Anuj Wadehra
I think Jonathan said it earlier. You may be happy with the performance for now
because you are using the same commitlog settings that you use in large
clusters. Test the recommended setting so that you know the real picture, or be
prepared to lose some data in case of failure.
Other than durability, your single-node cluster would be a single point of
failure for your site. RAID 5 will only protect you against a disk failure, but
a server may be down for other reasons too. The question is: are you ok with
the site going down?
I would suggest using hardware with a smaller configuration to save on cost for
smaller sites and going ahead with a 3-node minimum. That way you will provide
all the good features of your design irrespective of the site. Cassandra is
known to work on commodity servers too.

Thanks,
Anuj



Sent from Yahoo Mail on Android 
 
  On Sat, 23 Jan, 2016 at 4:23 am, Jack Krupansky 
wrote:   You do of course have the simple technical matters, most of which need 
to be addressed with a proof of concept implementation, related to memory, 
storage, latency, and throughput. I mean, with a scaled cluster you can always 
add nodes to increase capacity and throughput, and reduce latency, but with a 
single node you have limited flexibility.
Just to be clear, Cassandra is still not recommended for "fat nodes" - even if 
you can fit tons of data on the node, you may not have the computes to satisfy 
throughput and latency requirements. And if you don't have enough system memory 
the amount of storage is irrelevant.
Back to my original question:How much data (rows, columns), what kind of load 
pattern (heavy write, heavy update, heavy query), and what types of queries 
(primary key-only, slices, filtering, secondary indexes, etc.)?

I do recall a customer who ran into problems because they had SSD but only a 
very limited amount so they were running out of storage. Having enough system 
memory for file system caching and offheap data is important as well.

-- Jack Krupansky
On Fri, Jan 22, 2016 at 5:07 PM, John Lammers  
wrote:

Thanks for your response Jack.
We are already sold on distributed databases, HA and scaling.  We just have 
some small deployments coming up where there's no money for servers to run 
multiple Cassandra nodes.
So, aside from the lack of HA, I'm asking if a single Cassandra node would be 
viable in a production environment.  (There would be RAID 5 and the RAID 
controller cache is backed by flash memory).
I'm asking because I'm concerned about using Cassandra in a way that it's not 
designed for.  That to me is the unsettling aspect.
If this is a bad idea, give me the ammo I need to shoot it down.  I need 
specific technical reasons.
Thanks!
--John
On Fri, Jan 22, 2016 at 4:47 PM, Jack Krupansky  
wrote:

If single-node Cassandra has the performance (and capacity) you need and the 
NoSQL data model and API are sufficient for your app, and your dev and ops and 
support teams are already familiar with and committed to Cassandra, and you 
don't need HA or scaling, then it sounds like you are set.
You asked about risks, and normally lack of HA and scaling are unacceptable 
risks when people are looking at distributed databases.
Most people on this list are dedicated to and passionate about distributed 
databases, HA, and scaling, so it is distinctly unsettling when somebody comes 
along who isn't interested in and committed to those same three qualities. But 
if single-node happens to work for you, then that's great.
-- Jack Krupansky



  


Re: Production with Single Node

2016-01-22 Thread Anuj Wadehra
And I think in a 3-node cluster, RAID 0 would do the job instead of RAID 5, so 
you will need less raw storage to get the same usable disk space. And you will 
still get protection against disk failures, and in fact against entire node failure.
Anuj

Sent from Yahoo Mail on Android 
 


Re: automated CREATE TABLE just nuked my cluster after a 2.0 -> 2.1 upgrade....

2016-01-22 Thread Jack Krupansky
I recall that there was some discussion last year about this issue of how
risky it is to do an automated CREATE TABLE IF NOT EXISTS, due to the
unpredictable amount of time it takes for the table creation to fully
propagate across the full cluster. I think it was recognized as a real
problem, but without an immediate solution, so the recommended practice for
now is to perform the operation only manually (sure, it can be scripted,
but only under manual control) to ensure that the operation completes and
that only one attempt is made to create the table. I don't recall if there
was a specific Jira assigned, and the antipattern doc doesn't appear to
reference this scenario. Maybe a committer can shed some more light.
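
For illustration only (this is not from the thread): a minimal sketch of what a
single, controlled table-creation step could look like with the DataStax Java
driver, assuming a driver version that exposes Metadata.checkSchemaAgreement()
(2.1.7 or later); the contact point, keyspace, and table name are placeholders.

  import com.datastax.driver.core.Cluster;
  import com.datastax.driver.core.Session;

  public class CreateDailyTable {
      public static void main(String[] args) throws InterruptedException {
          // Placeholder contact point and names (not from the original thread).
          Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
          Session session = cluster.connect();

          // Issue the DDL from one controlled place only.
          session.execute("CREATE TABLE IF NOT EXISTS my_keyspace.events_2016_01_23 "
                  + "(id uuid PRIMARY KEY, payload text)");

          // Block until every reachable node reports the same schema version,
          // so the application does not start writing to a table some nodes lack.
          int attempts = 0;
          while (!cluster.getMetadata().checkSchemaAgreement() && attempts++ < 30) {
              Thread.sleep(1000);
          }
          cluster.close();
      }
  }

I believe recent driver versions also wait for schema agreement on their own after
a DDL statement (Cluster.Builder has a withMaxSchemaAgreementWaitSeconds option),
but an explicit check makes the intent obvious.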

-- Jack Krupansky

On Fri, Jan 22, 2016 at 10:29 PM, Kevin Burton  wrote:

> I sort of agree... but we are also considering migrating to hourly tables...
> and what if the single script doesn't run?
>
> I like having N nodes make changes like this because in my experience that
> central / single box will usually fail at the wrong time :-/
>
>
>
> On Fri, Jan 22, 2016 at 6:47 PM, Jonathan Haddad 
> wrote:
>
>> Instead of using ZK, why not solve your concurrency problem by removing
>> it?  By that, I mean simply have 1 process that creates all your tables
>> instead of creating a race condition intentionally?
>>
>> On Fri, Jan 22, 2016 at 6:16 PM Kevin Burton  wrote:
>>
>>> Not sure if this is a bug or not or kind of a *fuzzy* area.
>>>
>>> In 2.0 this worked fine.
>>>
>>> We have a bunch of automated scripts that go through and create
>>> tables... one per day.
>>>
>>> At midnight UTC our entire CQL went offline... took down our whole app.
>>>  ;-/
>>>
>>> The resolution was a full CQL shut down and then a drop table to remove
>>> the bad tables...
>>>
>>> pretty sure the issue was with schema disagreement.
>>>
>>> All our CREATE TABLE statements use IF NOT EXISTS, but I think the IF
>>> NOT EXISTS only checks locally?
>>>
>>> My workaround is going to be to use ZooKeeper to create a mutex lock
>>> during this operation.
>>>
>>> Any other things I should avoid?
>>>
>>>
>>> --
>>>
>>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>>> Engineers!
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> 
>>>
>>>
>
>
> --
>
> We’re hiring if you know of any awesome Java Devops or Linux Operations
> Engineers!
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
>
>
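
Since Kevin mentions a ZooKeeper mutex as his workaround above, here is a minimal
sketch of that approach, assuming Apache Curator is on the classpath; the connect
string, lock path, and surrounding code are placeholders, not the original
poster's implementation.

  import org.apache.curator.framework.CuratorFramework;
  import org.apache.curator.framework.CuratorFrameworkFactory;
  import org.apache.curator.framework.recipes.locks.InterProcessMutex;
  import org.apache.curator.retry.ExponentialBackoffRetry;

  public class SchemaChangeWithLock {
      public static void main(String[] args) throws Exception {
          // Placeholder ZooKeeper connect string.
          CuratorFramework zk = CuratorFrameworkFactory.newClient(
                  "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
          zk.start();

          // One cluster-wide mutex so only one process at a time runs the DDL.
          InterProcessMutex lock = new InterProcessMutex(zk, "/locks/cassandra-schema");
          lock.acquire();
          try {
              // Run CREATE TABLE IF NOT EXISTS here (e.g. via the Java driver),
              // then wait for schema agreement before releasing the lock.
          } finally {
              lock.release();
              zk.close();
          }
      }
  }

Note this only serializes the writers; it does not remove the need to wait for
schema agreement before the rest of the application starts using the new table.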