Primary/secondary index question / best practices?

2012-12-11 Thread Stephen.M.Thompson
Hi folks - I'm doing an informal proof-of-concept with Cassandra and I've been 
getting some conflicting information about how my data layout should go.  
Perhaps somebody could point me in the right direction.

I have a column family that will have billions of rows of data.  The data do 
not have any unique identifier intrinsically.  A given row will have, say, 50 
columns, and I'll need to be able to efficiently query on 8-10 of them.

I've been told that I should just pick the most common search item and make 
that my primary key, even though it will not be unique.  That seems contrary to 
the documentation I am seeing online.

From my reading, it seems like I need a UUID column that will be my primary 
index, and then I should set up secondary indexes on the 8-10 primary search 
columns.  Am I understanding this correctly?  Any advice you can offer on this 
would be tremendously helpful.  I'm quite limited in how specific I can be 
about the data, of course.

Steve


RE: Primary/secondary index question / best practices?

2012-12-11 Thread Stephen.M.Thompson
Dean, thank you for your response.  To the second half of the query, I'm a 
little concerned about the secondary index approach since the indexes that I 
want to create are columns with high entropy.



For example, I would like to query by User name and IP address, values which 
are decidedly NOT like the pattern recommended in the Secondary Index field.   
The 8-10 columns I need to search by all have a similarly high scatter rate.  
Since the documentation seems to suggest that this is a bad idea, what would 
the correct pattern look like?



In an RDBMS I would just slap an alternate key index on the table and let it 
roll.   It seems like maybe that is not the right approach for Cassandra?



Thanks again,

Steve



-Original Message-
From: Hiller, Dean [mailto:dean.hil...@nrel.gov]
Sent: Tuesday, December 11, 2012 4:57 PM
To: user@cassandra.apache.org
Subject: Re: Primary/secondary index question / best practices?



Hard to help out on a design without specifics but here is some advice based on 
the limited information



Primary key: yes, it must be unique across the cluster.  Use a TimeUUID or UUID.  
PlayOrm has unique TimeUUID-like keys, as in this one: 7AL2S8Y.b1 (b1 is the 
hostname and the prefix is a "unique" timestamp, but encoded as a shorter 
string, which gives nice readable primary keys).



There are some patterns you can look into here that may help 
https://github.com/deanhiller/playorm/wiki/Patterns-Page



If you can partition your data virtually, it may help a lot so you can query 
into the partitions.



Later,

Dean





Best Java Driver for Cassandra?

2012-12-13 Thread Stephen.M.Thompson
There seem to be a number of good options listed ... FireBrand and Hector seem 
to have the most attractive sites, but that doesn't necessarily mean anything.  
:)  Can anybody make a case for one of the drivers over another, especially in 
terms of which ones seem to be most used in major implementations?

Thanks
Steve


Partition maintenance

2012-12-18 Thread Stephen.M.Thompson
Hi folks.  Still working through the details of building out a Cassandra 
solution and I have an interesting requirement that I'm not sure how to 
implement in Cassandra:

In our current Oracle world, we have the data for this system partitioned by 
month, and each month the data that are now 18-months old are archived to 
tape/cold storage and then the partition for that month is dropped.  Is there a 
way to do something similar with Cassandra without destroying our overall 
performance?

Thanks in advance,
Steve


RE: Partition maintenance

2012-12-18 Thread Stephen.M.Thompson
Michael - That is one approach I have considered, but that also makes querying 
the system particularly onerous since every column family would require its own 
query – I don’t think there is any good way to “join” those, right?

Chris – that is an interesting concept, but as Viktor and Keith note, it seems 
to have problems.

Could we do this simply by mass deletes?  For example, if I created a column 
which was just /MM, then during our maintenance we could spool off records 
that match the month we are archiving, then do a bulk delete by that key.  We 
would need to have a secondary index for that, I would assume.
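
A minimal sketch of the bookkeeping for that maintenance job, assuming the bucket column holds a year/month string such as 2011/06 (the exact format is not shown above) and an 18-month retention window. The function names are hypothetical, and the actual spool and delete steps against Cassandra are left out.

```python
from datetime import date


def month_bucket(d: date) -> str:
    """Format a date as a year/month bucket string (assumed column value)."""
    return f"{d.year:04d}/{d.month:02d}"


def archive_bucket(today: date, retention_months: int = 18) -> str:
    """Return the bucket that lies `retention_months` months before `today`."""
    # Count months since year 0 so that year boundaries need no special casing.
    total = today.year * 12 + (today.month - 1) - retention_months
    year, month0 = divmod(total, 12)
    return f"{year:04d}/{month0 + 1:02d}"


if __name__ == "__main__":
    # A maintenance run in December 2012 would archive June 2011.
    print(archive_bucket(date(2012, 12, 18)))
```

Every record would carry the month_bucket value of its own date, so the job only needs the single string returned by archive_bucket to find what to spool off and delete.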


From: Michael Kjellman [mailto:mkjell...@barracuda.com]
Sent: Tuesday, December 18, 2012 11:15 AM
To: user@cassandra.apache.org
Subject: Re: Partition maintenance

You could make a column family for each period of time and then drop the column 
family when you want to destroy it. Before you drop it you could use the 
sstabletojson converter and write the json files out to tape.

Might make your life difficult however if you need an input split for map 
reduce between each time period because you would be limited to working on one 
column family at a time.


--
Join Barracuda Networks in the fight against hunger.
To learn how you can help in your community, please visit: 
http://on.fb.me/UAdL4f


Cassandra / Windows Server 2008

2013-01-04 Thread Stephen.M.Thompson
Hi folks - I have a Windows 2008 server that I'm trying to get Cassandra 
working on.  I have disabled the Windows Firewall for the moment but I still 
cannot connect to the server.

I have tried editing the cassandra.yaml to update the listen_address to the 
machine address as well as blank or commented out altogether - no change found 
at all.

Any suggestion at all would be most welcome!

-steve

SERVER STARTUP
(* snip *)
INFO 13:58:47,161 Binding thrift service to localhost/127.0.0.1:9160
(* snip *)


LOCAL CLIENT
(default/localhost)

C:\Java\apache-cassandra-1.1.7\bin>cassandra-cli
Starting Cassandra Client
Column Family assumptions read from x\assumptions.json
Connected to: "Test Cluster" on 127.0.0.1/9160
Welcome to Cassandra CLI version 1.1.6

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown]

(Success!)

LOCAL CLIENT USING IP ADDRESS
(connecting to localhost but using ip address)

C:\Java\apache-cassandra-1.1.7\bin>cassandra-cli -h xxx.xxx.xxx.xxx
Starting Cassandra Client
org.apache.thrift.transport.TTransportException: java.net.ConnectException: 
Connection refused: connect
at org.apache.thrift.transport.TSocket.open(TSocket.java:183)
at 
org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at org.apache.cassandra.cli.CliMain.connect(CliMain.java:79)
at org.apache.cassandra.cli.CliMain.main(CliMain.java:255)
Caused by: java.net.ConnectException: Connection refused: connect
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at org.apache.thrift.transport.TSocket.open(TSocket.java:178)
... 3 more
Exception connecting to xxx.xxx.xxx.xxx/9160. Reason: Connection refused: 
connect.
Column Family assumptions read from xxx\assumptions.json
Welcome to Cassandra CLI version 1.1.6

I get the same result trying to connect from a remote machine.
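
The startup log above shows the Thrift service bound to 127.0.0.1 only, which would be consistent with connections to the machine's other address being refused. In cassandra.yaml the Thrift binding is governed by rpc_address rather than listen_address, so that setting is worth checking. A small probe like the following (a sketch; the helper name is made up) can confirm which addresses accept connections on port 9160:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


if __name__ == "__main__":
    # Replace the second entry with the machine's real address.
    for host in ("127.0.0.1", "xxx.xxx.xxx.xxx"):
        state = "open" if can_connect(host, 9160) else "not reachable"
        print(f"{host}:9160 is {state}")
```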


RE: Cassandra / Windows Server 2008

2013-01-04 Thread Stephen.M.Thompson
Good suggestion ... I added -Djava.net.preferIPv4Stack=true as a JVM arg in 
cassandra.bat and got exactly the same result though.

Stephen Thompson
Wells Fargo Corporation
Internet Authentication & Fraud Prevention
704.427.3137 (W) | 704.807.3431 (C)

UPCOMING PTO:  JAN 14-18

This message may contain confidential and/or privileged information, and is 
intended for the use of the addressee only. If you are not the addressee or 
authorized to receive this for the addressee, you must not use, copy, disclose, 
or take any action based on this message or any information herein. If you have 
received this message in error, please advise the sender immediately by reply 
e-mail and delete this message. Thank you for your cooperation.

From: Michael Kjellman [mailto:mkjell...@barracuda.com]
Sent: Friday, January 04, 2013 2:26 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra / Windows Server 2008

Use linux ;)

More seriously, I'm wondering if it is binding to the IPV6 address? Is that 
enabled on that NIC? You could try disabling IPv6 and seeing if RPC binds 
correctly..



Date Index?

2013-01-08 Thread Stephen.M.Thompson
Hi folks -

Question about secondary indexes.  How are people doing date indexes?  I have 
a date column in my tables in RDBMS that we use frequently, such as look at all 
records recorded in the last month.  What is the best practice for being able 
to do such a query?  It seems like there could be an advantage to adding a 
couple of columns like this:

{timestamp=2013/01/08 12:32:01 -0500}
{month=201301}
{day=08}

And then I could do secondary index on the month and day columns?  Would that 
be the best way to do something like this?  Is there any accepted "best 
practice" on this yet?

Thanks!
Steve
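
As a sketch of the extra columns proposed above, assuming the values are derived from the event timestamp in application code before the row is written. The helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def date_columns(ts: datetime) -> dict:
    """Derive the timestamp, month and day column values from one datetime."""
    return {
        "timestamp": ts.strftime("%Y/%m/%d %H:%M:%S %z"),
        "month": ts.strftime("%Y%m"),  # bucket used for the equality lookup
        "day": ts.strftime("%d"),
    }


if __name__ == "__main__":
    eastern = timezone(timedelta(hours=-5))
    print(date_columns(datetime(2013, 1, 8, 12, 32, 1, tzinfo=eastern)))
```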


RE: Date Index?

2013-01-09 Thread Stephen.M.Thompson
Thanks Aaron, that helps.  So is there anything approaching a "consensus" of 
how to do something like this?

You mention a custom index ... is there a good document on creating a custom 
index?  Google doesn't show me much.

Steve

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Tuesday, January 08, 2013 9:35 PM
To: user@cassandra.apache.org
Subject: Re: Date Index?

There has to be one equality clause in there, and that's the thing Cassandra 
uses to select from disk. The others are in-memory filters.

So if you have one on the year+month you can have a simple select clause and it 
limits the amount of data that has to be read.

If you have many tens to hundreds of millions of things in the same month, you 
may want to do some performance testing. There can still be times when you want 
to support common read paths by using custom / hand rolled indexes.
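
To illustrate the read path described above, here is a toy in-memory model (not Cassandra code; the class and method names are invented): one indexed column answers the equality clause, and any further conditions are applied as filters over the rows that lookup returns.

```python
from collections import defaultdict


class IndexedRows:
    """Toy table with a single indexed column."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                 # row key -> column dict
        self.index = defaultdict(set)  # indexed value -> set of row keys

    def insert(self, key, columns):
        old = self.rows.get(key)
        if old is not None:
            # Drop the stale index entry when a row is overwritten.
            self.index[old[self.indexed_column]].discard(key)
        self.rows[key] = dict(columns)
        self.index[columns[self.indexed_column]].add(key)

    def query(self, equals, predicate=lambda row: True):
        # Step 1: the equality clause picks the candidate rows via the index.
        candidates = self.index.get(equals, set())
        # Step 2: remaining clauses are evaluated row by row in memory.
        return sorted(k for k in candidates if predicate(self.rows[k]))


if __name__ == "__main__":
    table = IndexedRows("month")
    table.insert("r1", {"month": "201301", "day": "08", "user": "john"})
    table.insert("r2", {"month": "201301", "day": "09", "user": "susan"})
    print(table.query("201301", lambda row: row["day"] == "08"))
```

The cost of the second step grows with the number of rows sharing the indexed value, which is why a month holding hundreds of millions of rows deserves a performance test.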

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com



RE: Date Index?

2013-01-09 Thread Stephen.M.Thompson
OK ... I think I understand these.  So the idea is that you would use the time 
as the column key?

So when I might have something like this:

 | time=2013/01/03 08:19:01 | user=john | site=Chicago
 | time=2013/01/05 01:55:34 | user=john | site=Chicago
 | time=2013/01/09 16:21:42 | user=john | site=New York
 | time=2013/01/09 17:27:41 | user=susan | site=Boston
 | time=2013/01/09 17:27:41 | user=asok | site=Dallas

Instead it would be better to do something like this:

 | 2013/01/03 08:19:01={user=john, site=Chicago} | 2013/01/05 01:55:34={user=john, site=Chicago} | 2013/01/09 16:21:42={user=john, site=New York}
 | time=2013/01/09 17:27:41 = {user=susan, site=Boston}
 | time=2013/01/09 17:27:41={user=asok,site=Dallas}

Am I understanding this correctly?  This seems to have the HUGE disadvantage 
that I am no longer going to be able to create secondary indexes on user and 
site.  Is that right?

This seems like an impossible solution for my requirements.

Steve
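
A toy model of the wide-row layout sketched above (plain Python, not Cassandra code; the class name and the month-bucket row key are assumptions). Columns within a row are kept sorted by time, so a date-range query becomes a slice of one row instead of an index lookup.

```python
import bisect
from collections import defaultdict


class TimeSeriesRows:
    """Each row key holds its columns sorted by a time string."""

    def __init__(self):
        self._times = defaultdict(list)   # row key -> sorted time strings
        self._values = defaultdict(list)  # row key -> values, same order

    def add(self, row_key, time, value):
        times = self._times[row_key]
        pos = bisect.bisect_right(times, time)
        times.insert(pos, time)
        # Real column names must be unique within a row (a TimeUUID is the
        # usual choice); this sketch simply tolerates duplicate times.
        self._values[row_key].insert(pos, value)

    def slice(self, row_key, start, end):
        """Return (time, value) pairs with start <= time <= end."""
        times = self._times.get(row_key, [])
        values = self._values.get(row_key, [])
        lo = bisect.bisect_left(times, start)
        hi = bisect.bisect_right(times, end)
        return list(zip(times[lo:hi], values[lo:hi]))


if __name__ == "__main__":
    events = TimeSeriesRows()
    events.add("201301", "2013/01/03 08:19:01", {"user": "john", "site": "Chicago"})
    events.add("201301", "2013/01/09 16:21:42", {"user": "john", "site": "New York"})
    events.add("201301", "2013/01/09 17:27:41", {"user": "susan", "site": "Boston"})
    for time, value in events.slice("201301", "2013/01/09 00:00:00", "2013/01/09 23:59:59"):
        print(time, value)
```

The time strings used in this thread (YYYY/MM/DD HH:MM:SS) sort lexicographically in chronological order, which is what makes the plain string comparison work here.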

From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Wednesday, January 09, 2013 2:21 PM
To: user@cassandra.apache.org
Subject: Re: Date Index?

If you're going to be looking data up by date ranges frequently, I strongly 
suggest you go with a typical time-series pattern (what Aaron described as 
hand-rolled indexes):

http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/
http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra

If you're just running these date-based queries occasionally and the result set 
won't be huge, then using secondary indexes as you described is a convenient 
but not terribly efficient way to do that.

On Wed, Jan 9, 2013 at 10:04 AM, Michael Kjellman 
<mkjell...@barracuda.com> wrote:
ElasticSearch is a nice option for ordered lists. In 2.0, triggers should make 
pushing updates to ElasticSearch much easier; right now it's up to your 
application logic to detect changes and update the index.


--
Tyler Hobbs
DataStax


initial_token

2013-01-31 Thread Stephen.M.Thompson
Hi folks, I'm trying to get a multi-node setup working, which seems like it 
should be really simple from the documentation.

ERROR 11:41:20,773 Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: For input string: 
"85070591730234615865843651857942052864"
at 
org.apache.cassandra.dht.Murmur3Partitioner$1.validate(Murmur3Partitioner.java:180)
at 
org.apache.cassandra.config.DatabaseDescriptor.loadYaml(DatabaseDescriptor.java:433)
at 
org.apache.cassandra.config.DatabaseDescriptor.(DatabaseDescriptor.java:121)
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:178)
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:397)
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:440)
For input string: "85070591730234615865843651857942052864"
Fatal configuration error; unable to start server.  See log for stacktrace.

From my cassandra.yaml ...

initial_token: 85070591730234615865843651857942052864

From the wiki this certainly looks correct:
http://www.datastax.com/docs/1.1/initialize/cluster_init

I've tried this with a couple of values but seem to always get the same result 
... am I missing something?

Thanks,
Steve
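
One possible reading of the error: the stack trace comes from Murmur3Partitioner, whose tokens are signed 64-bit values, while 85070591730234615865843651857942052864 is 2**126, the midpoint of the RandomPartitioner range described in the 1.1 documentation. A sketch that computes evenly spaced tokens for both ranges (function names are hypothetical):

```python
def random_partitioner_tokens(node_count):
    """Evenly spaced tokens over RandomPartitioner's range, 0 to 2**127."""
    return [i * (2 ** 127) // node_count for i in range(node_count)]


def murmur3_tokens(node_count):
    """Evenly spaced tokens over Murmur3Partitioner's range, -2**63 to 2**63 - 1."""
    return [i * (2 ** 64) // node_count - 2 ** 63 for i in range(node_count)]


def fits_murmur3(token):
    """True if the token can be represented as a signed 64-bit integer."""
    return -2 ** 63 <= token <= 2 ** 63 - 1


if __name__ == "__main__":
    print(random_partitioner_tokens(2))
    print(murmur3_tokens(2))
    print(fits_murmur3(85070591730234615865843651857942052864))
```

If that reading is right, either the partitioner setting or the token value would need to change so that the two agree.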


Not enough replicas???

2013-02-01 Thread Stephen.M.Thompson
I need to offer my profound thanks to this community which has been so helpful 
in trying to figure this system out.

I've set up a simple ring with two nodes and I'm trying to insert data into them.  
Every insert fails with this error:

me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough 
replicas present to handle consistency level.

I'm not doing anything fancy - this is just from setting up the cluster 
following the basic instructions from datastax for a simple one data center 
cluster.  My config is basically the default except for the changes they 
discuss (except that I have configured for my IP addresses... my two boxes are 
.126 and .127)

cluster_name: 'MyDemoCluster'
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.28.205.126"
listen_address: 10.28.205.126
rpc_address: 0.0.0.0
endpoint_snitch: RackInferringSnitch

Nodetool shows both nodes active in the ring, status = up, state = normal.

For the CF:

   ColumnFamily: SystemEvent
 Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
 Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
 Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
 GC grace seconds: 864000
 Compaction min/max thresholds: 4/32
 Read repair chance: 0.1
 DC Local Read repair chance: 0.0
 Replicate on write: true
 Caching: KEYS_ONLY
 Bloom Filter FP chance: default
 Built indexes: [SystemEvent.IdxName]
 Column Metadata:
   Column Name: eventTimeStamp
 Validation Class: org.apache.cassandra.db.marshal.DateType
   Column Name: name
 Validation Class: org.apache.cassandra.db.marshal.UTF8Type
 Index Name: IdxName
 Index Type: KEYS
 Compaction Strategy: 
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
 Compression Options:
   sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor

Any ideas?


RE: Not enough replicas???

2013-02-04 Thread Stephen.M.Thompson
Hi Edward - thanks for responding.   The keyspace could not have been created 
more simply:



create keyspace KEYSPACE_NAME;



According to the help, this should have created a replication factor of 1:



Keyspace Attributes (all are optional):
- placement_strategy: Class used to determine how replicas
  are distributed among nodes. Defaults to NetworkTopologyStrategy with
  one datacenter defined with a replication factor of 1 ("[datacenter1:1]").



Steve



-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Friday, February 01, 2013 5:49 PM
To: user@cassandra.apache.org
Subject: Re: Not enough replicas???



Please include the information on how your keyspace was created. This may 
indicate you set the replication factor to 3, when you only have 1 node, or 
some similar condition.
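
For reference, a sketch of the arithmetic behind this kind of error, using the usual definitions of the consistency levels (QUORUM is a majority of the replication factor). The function names are hypothetical; this is not Hector or Cassandra code.

```python
def required_replicas(consistency_level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for the given consistency level."""
    level = consistency_level.upper()
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def is_available(consistency_level, replication_factor, live_replicas):
    """True if enough live replicas exist to satisfy the consistency level."""
    return live_replicas >= required_replicas(consistency_level, replication_factor)


if __name__ == "__main__":
    # Replication factor 3 with a single live node cannot satisfy QUORUM.
    print(is_available("QUORUM", 3, 1))
    # Zero replicas fail at every level, including ONE.
    print(is_available("ONE", 1, 0))
```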





RE: Not enough replicas???

2013-02-04 Thread Stephen.M.Thompson
Thanks Tyler ... so I created my keyspace to explicitly indicate the datacenter 
and replication, as follows:

create keyspace KEYSPACE_NAME
  with placement_strategy = 
'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options={DC28:2};

And yet I still get the exact same error message:

me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough 
replicas present to handle consistency level.

It certainly is showing that it took my change:

[default@KEYSPACE_NAME] describe;
Keyspace: KEYSPACE_NAME:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
Options: [DC28:2]

Looking at the ring 

[root@Config3482VM1 apache-cassandra-1.2.0]# bin/nodetool -h localhost ring

Datacenter: 28
==
Replicas: 0

Address         Rack  Status  State   Load      Owns   Token
                                                       9187343239835811839
10.28.205.126   205   Up      Normal  95.89 KB  0.00%  -9187343239835811840
10.28.205.126   205   Up      Normal  95.89 KB  0.00%  -9151314442816847872
10.28.205.126   205   Up      Normal  95.89 KB  0.00%  -9115285645797883904

( HUGE SNIP )

10.28.205.127   205   Up      Normal  84.63 KB  0.00%  9115285645797883903
10.28.205.127   205   Up      Normal  84.63 KB  0.00%  9151314442816847871
10.28.205.127   205   Up      Normal  84.63 KB  0.00%  9187343239835811839

So both boxes are showing up in the ring.

Thank you guys SO MUCH for helping me figure this stuff out.


From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Monday, February 04, 2013 11:17 AM
To: user@cassandra.apache.org
Subject: Re: Not enough replicas???

RackInferringSnitch determines each node's DC and rack by looking at the second 
and third octets in its IP address 
(http://www.datastax.com/docs/1.0/cluster_architecture/replication#rackinferringsnitch),
 so your nodes are in DC "28".

Your replication strategy says to put one replica in DC "datacenter1", but 
doesn't mention DC "28" at all, so you don't have any replicas for your 
keyspace.
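
A sketch of the inference described above, assuming dotted-quad IPv4 addresses (the function name is hypothetical; this is not the snitch's actual code):

```python
def infer_location(ip: str):
    """Return (datacenter, rack) taken from the second and third octets."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"expected a dotted-quad IPv4 address, got {ip!r}")
    return octets[1], octets[2]


if __name__ == "__main__":
    for address in ("10.28.205.126", "10.28.205.127"):
        datacenter, rack = infer_location(address)
        print(f"{address}: datacenter {datacenter}, rack {rack}")
```

For the two nodes in this thread that gives datacenter "28" and rack "205", so the keyspace's strategy options have to name a datacenter called exactly 28.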


RE: Not enough replicas???

2013-02-04 Thread Stephen.M.Thompson
Sweet!  That worked!  THANK YOU!


From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Monday, February 04, 2013 1:43 PM
To: user@cassandra.apache.org
Subject: Re: Not enough replicas???

Sorry, to be more precise, the name of the datacenter is just the string "28", 
not "DC28".

On Mon, Feb 4, 2013 at 12:07 PM, 
mailto:stephen.m.thomp...@wellsfargo.com>> 
wrote:
Thanks Tyler ... so I created my keyspace to explicitly indicate the datacenter 
and replication, as follows:

create keyspace KEYSPACE_NAME
  with placement_strategy = 
'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options={DC28:2};

And yet I still get the exact same error message:

me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough 
replicas present to handle consistency level.

It certainly is showing that it took my change:

[default@KEYSPACE_NAME] describe;
Keyspace: KEYSPACE_NAME:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
Options: [DC28:2]

Looking at the ring 

[root@Config3482VM1 apache-cassandra-1.2.0]# bin/nodetool -h localhost ring

Datacenter: 28
==
Replicas: 0

Address         Rack  Status  State   Load      Owns    Token
                                                        9187343239835811839
10.28.205.126   205   Up      Normal  95.89 KB  0.00%   -9187343239835811840
10.28.205.126   205   Up      Normal  95.89 KB  0.00%   -9151314442816847872
10.28.205.126   205   Up      Normal  95.89 KB  0.00%   -9115285645797883904

( HUGE SNIP )

10.28.205.127   205   Up      Normal  84.63 KB  0.00%   9115285645797883903
10.28.205.127   205   Up      Normal  84.63 KB  0.00%   9151314442816847871
10.28.205.127   205   Up      Normal  84.63 KB  0.00%   9187343239835811839

So both boxes are showing up in the ring.

Thank you guys SO MUCH for helping me figure this stuff out.


From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Monday, February 04, 2013 11:17 AM

To: user@cassandra.apache.org
Subject: Re: Not enough replicas???

RackInferringSnitch determines each node's DC and rack by looking at the second 
and third octets in its IP address 
(http://www.datastax.com/docs/1.0/cluster_architecture/replication#rackinferringsnitch),
 so your nodes are in DC "28".

Your replication strategy says to put one replica in DC "datacenter1", but 
doesn't mention DC "28" at all, so you don't have any replicas for your 
keyspace.
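
To make the snitch rule above concrete, here is a minimal Python sketch. It is not Cassandra code: it only assumes the rule as described (DC = second octet, rack = third octet), and the helper names infer_dc_and_rack and replicas_per_dc are made up for illustration. It also ignores everything else NetworkTopologyStrategy does when placing replicas.

```python
def infer_dc_and_rack(ip):
    """Mimic the rule described above: DC = 2nd octet, rack = 3rd octet."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted IPv4 address, got %r" % ip)
    return octets[1], octets[2]


def replicas_per_dc(node_ips, strategy_options):
    """Replica count requested for each DC that actually appears in the ring.

    strategy_options maps DC name -> replication factor, as in the keyspace
    definition. A DC that is not named there gets 0 replicas.
    """
    dcs = {infer_dc_and_rack(ip)[0] for ip in node_ips}
    return {dc: strategy_options.get(dc, 0) for dc in dcs}


nodes = ["10.28.205.126", "10.28.205.127"]
print(infer_dc_and_rack(nodes[0]))                 # ('28', '205')
print(replicas_per_dc(nodes, {"datacenter1": 1}))  # {'28': 0}
print(replicas_per_dc(nodes, {"28": 2}))           # {'28': 2}
```

The point of the second call is that a strategy option keyed on a DC name the snitch never reports leaves the real DC with zero replicas, which matches the "Replicas: 0" line in the ring output.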

On Mon, Feb 4, 2013 at 7:55 AM, 
stephen.m.thomp...@wellsfargo.com 
wrote:

Hi Edward - thanks for responding.   The keyspace could not have been created 
more simply:

create keyspace KEYSPACE_NAME;

According to the help, this should have created a replication factor of 1:

Keyspace Attributes (all are optional):
- placement_strategy: Class used to determine how replicas
  are distributed among nodes. Defaults to NetworkTopologyStrategy with
  one datacenter defined with a replication factor of 1 ("[datacenter1:1]").

Steve



-Original Message-
From: Edward Capriolo 
[mailto:edlinuxg...@gmail.com]
Sent: Friday, February 01, 2013 5:49 PM
To: user@cassandra.apache.org
Subject: Re: Not enough replicas???



Please include the information on how your keyspace was created. This may 
indicate you set the replication factor to 3, when you only have 1 node, or 
some similar condition.
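
A rough sketch of the arithmetic behind that error, written in Python rather than against Hector's Java API. It assumes the usual rule that QUORUM needs floor(RF/2) + 1 replicas; required_replicas and can_satisfy are illustrative names, not Cassandra or Hector functions.

```python
def required_replicas(consistency_level, replication_factor):
    """Replicas that must respond for a given consistency level."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError("unsupported level: %s" % consistency_level)


def can_satisfy(consistency_level, replication_factor, live_replicas):
    """True if enough replicas are alive to serve the request."""
    needed = required_replicas(consistency_level, replication_factor)
    return live_replicas >= needed


# RF=3 on a single-node cluster: QUORUM needs 2 replicas, only 1 exists.
print(can_satisfy("QUORUM", 3, 1))  # False
# RF=1 on the same node is fine.
print(can_satisfy("QUORUM", 1, 1))  # True
# With no replicas available, even ONE cannot be met.
print(can_satisfy("ONE", 1, 0))     # False
```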



On Fri, Feb 1, 2013 at 4:57 PM, 
stephen.m.thomp...@wellsfargo.com 
wrote:

> I need to offer my profound thanks to this community which has been so
> helpful in trying to figure this system out.
>
> I've set up a simple ring with two nodes and I'm trying to insert data
> to them.  I get failures 100% of the time with this error:
>
> me.prettyprint.hector.api.exceptions.HUnavailableException: : May not
> be enough replicas present to handle consistency level.
>
> I'm not doing anything fancy - this is just from setting up the
> cluster following the basic instructions from datastax for a simple
> one data center cluster.  My config is basically the default except
> for the changes they discus

unbalanced ring

2013-02-05 Thread Stephen.M.Thompson
So I have three nodes in a ring in one data center.  My configuration has 
num_tokens: 256 set and initial_token commented out.  When I look at the ring, 
it shows me all of the token ranges of course, and basically identical data for 
each range on each node.  Here is the Cliff's Notes version of what I see:

[root@Config3482VM2 apache-cassandra-1.2.0]# bin/nodetool ring

Datacenter: 28
==
Replicas: 1

Address         Rack  Status  State   Load      Owns     Token
                                                         9187343239835811839
10.28.205.125   205   Up      Normal  2.85 GB   33.69%   -3026347817059713363
10.28.205.125   205   Up      Normal  2.85 GB   33.69%   -3026276684526453414
10.28.205.125   205   Up      Normal  2.85 GB   33.69%   -3026205551993193465
  (etc)
10.28.205.126   205   Up      Normal  1.15 GB   100.00%  -9187343239835811840
10.28.205.126   205   Up      Normal  1.15 GB   100.00%  -9151314442816847872
10.28.205.126   205   Up      Normal  1.15 GB   100.00%  -9115285645797883904
  (etc)
10.28.205.127   205   Up      Normal  69.13 KB  66.30%   -9223372036854775808
10.28.205.127   205   Up      Normal  69.13 KB  66.30%   36028797018963967
10.28.205.127   205   Up      Normal  69.13 KB  66.30%   72057594037927935
  (etc)

So at this point I have a number of questions.   The biggest question is of 
Load.  Why does the .125 node have 2.85 GB, .126 has 1.15 GB, and .127 has only 
69.13 KB?  These boxes are all comparable and all configured identically.

partitioner: org.apache.cassandra.dht.Murmur3Partitioner

I'm sorry to ask so many questions - I'm having a hard time finding 
documentation that explains this stuff.

Stephen


RE: unbalanced ring

2013-02-06 Thread Stephen.M.Thompson
Thanks Aaron.  I ran the cassandra-shuffle job and did a rebuild and compact on 
each of the nodes.

[root@Config3482VM1 apache-cassandra-1.2.1]# bin/nodetool status
Datacenter: 28
==
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.28.205.125  1.7 GB     255     33.7%             3daab184-61f0-49a0-b076-863f10bc8c6c  205
UN  10.28.205.126  591.44 MB  256     99.9%             55bbd4b1-8036-4e32-b975-c073a7f0f47f  205
UN  10.28.205.127  112.28 MB  257     66.4%             d240c91f-4901-40ad-bd66-d374a0ccf0b9  205

So this is a little better.  At last node 3 has some content, but the nodes are 
still far from balanced.  If I understand this correctly, this is the 
distribution I would expect if the tokens were set at 15/5/1 rather than equal. 
As configured, I would expect roughly equal amounts of data on each node. Is 
that right?  Do you have any suggestions for what I can look at to get there?

I have about 11M rows of data in this keyspace and none of them are 
exceptionally long ... it's data pulled from Oracle and didn't include any 
BLOB, etc.
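
As a quick sanity check on the "Owns (effective)" column, the small Python sketch below adds up the three percentages from the status output above. It assumes the keyspace keeps two replicas in this datacenter (replication_factor = 2 is an assumption here; adjust it to match your keyspace), in which case the column should total about 200% and an even split across three nodes would be about 66.7% each.

```python
# Owns (effective) values copied from the nodetool status output above.
owns = {"10.28.205.125": 33.7, "10.28.205.126": 99.9, "10.28.205.127": 66.4}
replication_factor = 2  # assumption: two replicas in this datacenter

total = sum(owns.values())
expected_total = 100.0 * replication_factor
even_share = expected_total / len(owns)

print("total owned: %.1f%% (expected about %.1f%%)" % (total, expected_total))
for address, pct in sorted(owns.items()):
    # Positive means the node owns more than an even share, negative less.
    print("%s owns %.1f%%, %+.1f points from an even share of %.1f%%"
          % (address, pct, pct - even_share, even_share))
```

The total does come out at the expected figure, so the skew is in how the ranges are divided between nodes, not in the overall replica count.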

Stephen Thompson
Wells Fargo Corporation
Internet Authentication & Fraud Prevention
704.427.3137 (W) | 704.807.3431 (C)

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Tuesday, February 05, 2013 3:41 PM
To: user@cassandra.apache.org
Subject: Re: unbalanced ring

Use nodetool status with vnodes 
http://www.datastax.com/dev/blog/upgrading-an-existing-cluster-to-vnodes

The different load can be caused by rack affinity; are all the nodes in the 
same rack?  Another simple check: have you created any very big rows?
Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/02/2013, at 8:40 AM, 
stephen.m.thomp...@wellsfargo.com 
wrote:


So I have three nodes in a ring in one data center.  My configuration has 
num_tokens: 256 set and initial_token commented out.  When I look at the ring, 
it shows me all of the token ranges of course, and basically identical data for 
each range on each node.  Here is the Cliff's Notes version of what I see:

[root@Config3482VM2 apache-cassandra-1.2.0]# bin/nodetool ring

Datacenter: 28
==
Replicas: 1

Address         Rack  Status  State   Load      Owns     Token
                                                         9187343239835811839
10.28.205.125   205   Up      Normal  2.85 GB   33.69%   -3026347817059713363
10.28.205.125   205   Up      Normal  2.85 GB   33.69%   -3026276684526453414
10.28.205.125   205   Up      Normal  2.85 GB   33.69%   -3026205551993193465
  (etc)
10.28.205.126   205   Up      Normal  1.15 GB   100.00%  -9187343239835811840
10.28.205.126   205   Up      Normal  1.15 GB   100.00%  -9151314442816847872
10.28.205.126   205   Up      Normal  1.15 GB   100.00%  -9115285645797883904
  (etc)
10.28.205.127   205   Up      Normal  69.13 KB  66.30%   -9223372036854775808
10.28.205.127   205   Up      Normal  69.13 KB  66.30%   36028797018963967
10.28.205.127   205   Up      Normal  69.13 KB  66.30%   72057594037927935
  (etc)

So at this point I have a number of questions.   The biggest question is of 
Load.  Why does the .125 node have 2.85 GB, .126 has 1.15 GB, and .127 has only 
69.13 KB?  These boxes are all comparable and all configured identically.

partitioner: org.apache.cassandra.dht.Murmur3Partitioner

I'm sorry to ask so many questions - I'm having a hard time finding 
documentation that explains this stuff.

Stephen



RE: unbalanced ring

2013-02-11 Thread Stephen.M.Thompson
Aaron, thanks for your feedback.

.125
num_tokens: 256
# initial_token:

.126
num_tokens: 256
#initial_token:

.127
num_tokens: 256
# initial_token:

This all looks correct.  So when you say to do this with a "clean" setup, what 
are you asking me to do?  Is it enough to blow away /var/lib/cassandra and 
reload the data?  Also destroy my Cassandra install (which is just un-tar) and 
reinstall from nothing?

Stephen Thompson
Wells Fargo Corporation
Internet Authentication & Fraud Prevention
704.427.3137 (W) | 704.807.3431 (C)

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, February 11, 2013 12:51 PM
To: user@cassandra.apache.org
Subject: Re: unbalanced ring

The tokens are not right, not right at all. Some are too short and some are too 
tall.

More technically they do not appear to be randomly arranged. The tokens for the 
.125 node all start with -3, the 126 node only has negative tokens and the 127 
node mostly has positive tokens.
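
One way to see the "not randomly arranged" point is to look at the gaps between consecutive tokens. The Python sketch below uses only the first three tokens listed for two of the nodes in the nodetool ring output earlier in the thread, so it is an illustration, not a full analysis of the ring. The 2**64 ring size assumes Murmur3Partitioner's token range of -2**63 to 2**63 - 1.

```python
# Tokens copied from the nodetool ring output earlier in the thread
# (only the first three listed for each node).
tokens = {
    "10.28.205.125": [-3026347817059713363, -3026276684526453414,
                      -3026205551993193465],
    "10.28.205.126": [-9187343239835811840, -9151314442816847872,
                      -9115285645797883904],
}

RING_SIZE = 2 ** 64  # Murmur3Partitioner tokens span -2**63 .. 2**63 - 1


def gaps(sorted_tokens):
    """Differences between consecutive tokens."""
    return [b - a for a, b in zip(sorted_tokens, sorted_tokens[1:])]


for address, node_tokens in sorted(tokens.items()):
    node_gaps = gaps(sorted(node_tokens))
    # Randomly chosen tokens would almost never be exactly evenly spaced.
    evenly_spaced = len(set(node_gaps)) == 1
    share = node_gaps[0] / RING_SIZE
    print(address, "gaps:", node_gaps, "evenly spaced:", evenly_spaced)
    print("  each gap covers %.6f%% of the ring" % (100 * share))
```

For the sample shown, the gaps within each node are identical, which is what you would see from computed tokens and not from 256 tokens drawn at random.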

Check that on each node the initial_token yaml setting is commented out, and 
that num_tokens is set to 256.

If you can reproduce this fault with a clean setup please raise a ticket at 
https://issues.apache.org/jira/browse/CASSANDRA

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/02/2013, at 10:36 AM, 
stephen.m.thomp...@wellsfargo.com 
wrote:


I found when I tried to do queries after sending this that although it shows a 
ton of data, it would no longer return ANYTHING for any query ... always 0 
rows.  So something was severely hosed.  I blew away the data and reloaded from 
database ... the data set is a little smaller than before.  It shows up 
somewhat more balanced, although I'm still curious why the third node is so 
much smaller than the first two.

[root@Config3482VM1 apache-cassandra-1.2.1]# bin/nodetool status
Datacenter: 28
==
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.28.205.125  994.89 MB  255     33.7%             3daab184-61f0-49a0-b076-863f10bc8c6c  205
UN  10.28.205.126  966.17 MB  256     99.9%             55bbd4b1-8036-4e32-b975-c073a7f0f47f  205
UN  10.28.205.127  699.79 MB  257     66.4%             d240c91f-4901-40ad-bd66-d374a0ccf0b9  205
[root@Config3482VM1 apache-cassandra-1.2.1]#

And yes, that is the entire content of the output from the status call, 
unedited.   I have attached the output from nodetool ring.  To answer a couple 
of the questions from below from Eric:

* One data center (28)?  One rack (205)? Three nodes?
Yes, that's right.  We're just doing a proof of concept at the 
moment so this is three VMWare servers.

* How many keyspaces, and what are the replication strategies?
There is one keyspace, and it has only one CF at this point.

[default@KEYSPACE_NAME] describe;
Keyspace: KEYSPACE_NAME:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
Options: [28:2]

* TL;DR  What Aaron Said(tm)  In the absence of rack/dc aware replication, your 
allocation is suspicious.

I'm not sure what you mean by this.

Steve

-Original Message-
From: Eric Evans [mailto:eev...@acunu.com]
Sent: Thursday, February 07, 2013 9:56 AM
To: user@cassandra.apache.org
Subject: Re: unbalanced ring

On Wed, Feb 6, 2013 at 2:02 PM, 
stephen.m.thomp...@wellsfargo.com 
wrote:
> Thanks Aaron.  I ran the cassandra-shuffle job and did a rebuild and
> compact on each of the nodes.
>
> [root@Config3482VM1 apache-cassandra-1.2.1]# bin/nodetool status
> Datacenter: 28
> ==
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  10.28.205.125  1.7 GB     255     33.7%             3daab184-61f0-49a0-b076-863f10bc8c6c  205
> UN  10.28.205.126  591.44 MB  256     99.9%             55bbd4b1-8036-4e32-b975-c073a7f0f47f  205
> UN  10.28.205.127  112.28 MB  257     66.4%             d240c91f-4901-40ad-bd66-d374a0ccf0b9  205

Sorry, I have to ask: is this the complete output?  Have you perhaps sanitized 
it in some way?

It seems like there is some piece of missing context here.  Can you tell us:

* Is this a cluster that was upgraded to virtual nodes (that would include a 
1.2.x cluster initialized wi