Re: Multi-column range scans

2014-07-14 Thread DuyHai Doan
Hello Matthew

 Since Cassandra 2.0.6 it is possible to query over composites:
https://issues.apache.org/jira/browse/CASSANDRA-4851

For your example:

select * from skill_count where skill='Complaints' and
(interval_id,skill_level) >= (140235930,5) and interval_id <
140235990;


On Mon, Jul 14, 2014 at 6:09 AM, Matthew Allen 
wrote:

> Hi,
>
> We have a roll-up table, as follows.
>
> CREATE TABLE SKILL_COUNT (
>   skill text,
>   interval_id bigint,
>   skill_level int,
>   skill_count int,
>   PRIMARY KEY (skill, interval_id, skill_level));
>
> Essentially,
>   skill = a named skill, e.g. "Complaints"
>   interval_id = a rounded epoch time (15 minute intervals)
>   skill_level = a number/rating from 1-10
>   skill_count = the number of people with the specified skill, with the
> specified skill level, logged in at the interval_id
>
> We'd like to run the following query against it
>
> select * from skill_count where skill='Complaints' and interval_id >=
> 140235930 and interval_id < 140235990 and skill_level >= 5;
>
> to get a count of people with the relevant skill and level at the
> appropriate time.  However, I am getting the following message.
>
> Bad Request: PRIMARY KEY part skill_level cannot be restricted (preceding
> part interval_id is either not restricted or by a non-EQ relation)
>
> Looking at how the data is stored ...
>
> ---
> RowKey: Complaints
> => (name=140235930:2:, value=, timestamp=1405308260403000)
> => (name=140235930:2:skill_count, value=000a,
> timestamp=1405308260403000)
> => (name=140235930:5:, value=, timestamp=1405308260403001)
> => (name=140235930:5:skill_count, value=0014,
> timestamp=1405308260403001)
> => (name=140235930:8:, value=, timestamp=1405308260419000)
> => (name=140235930:8:skill_count, value=001e,
> timestamp=1405308260419000)
> => (name=140235930:10:, value=, timestamp=1405308260419001)
> => (name=140235930:10:skill_count, value=0001,
> timestamp=1405308260419001)
>
> Should Cassandra be able to allow for an extra level of filtering, or is
> this something that should be performed from within the application?
>
> We have a solution working in Oracle, but would like to store this data in
> Cassandra, as all the other data that this solution relies on already sits
> within Cassandra.
>
> Appreciate any guidance on this matter.
>
> Matt
>


[RELEASE] Achilles 3.0.4

2014-07-14 Thread DuyHai Doan
Hello all

 We are happy to announce the release of Achilles 3.0.4. Among the biggest
changes:

 - support for static columns: http://goo.gl/o7D5yo
 - dynamic statements logging & tracing at runtime: http://goo.gl/w4jlqZ
 - SchemaBuilder, the mirror of QueryBuilder for creating schema
programmatically: http://goo.gl/DspJQq

 Link to the changelog: http://goo.gl/tKqpFT

  Regards

 Duy Hai DOAN


Re: keyspace with hundreds of columnfamilies

2014-07-14 Thread Ilya Sviridov
Tommaso, looking at your description of the architecture, an idea came up.

You can perform sharding in the Cassandra client and write to different
Cassandra clusters, to keep the number of column families per cluster reasonable.

With best regards,
Ilya


On Thu, Jul 3, 2014 at 10:55 PM, tommaso barbugli 
wrote:

> thank you for the replies; I am rethinking the schema design. One possible
> solution is to "implode" one dimension and get N times fewer CFs.
> With this approach I would end up with (CQL) tables with up to 100
> columns; would that be a problem?
>
> Thank You,
> Tommaso
>
>
> 2014-07-02 23:43 GMT+02:00 Jack Krupansky :
>
>   The official answer, engraved in stone tablets, and carried down from
>> the mountain: “Although having more than dozens or hundreds of tables
>> defined is almost certainly a Bad Idea (just as it is a design smell in a
>> relational database), it's relatively straightforward to allow disabling
>> the SlabAllocator.” Emphasis on “almost certainly a Bad Idea.”
>>
>> See:
>> https://issues.apache.org/jira/browse/CASSANDRA-5935
>> “Allow disabling slab allocation”
>>
>> IOW, this is considered an anti-pattern, but...
>>
>> -- Jack Krupansky
>>
>>  *From:* tommaso barbugli 
>> *Sent:* Wednesday, July 2, 2014 2:16 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: keyspace with hundreds of columnfamilies
>>
>>  Hi,
>> thank you for your replies on this; regarding the arena memory, is this a
>> fixed memory allocation or some sort of in-memory caching? I ask because
>> I think that a substantial portion of the column families created will not
>> be queried that frequently (and some will become inactive and stay like
>> that for a really long time)
>>
>> Thank you,
>> Tommaso
>>
>>
>> 2014-07-02 18:35 GMT+02:00 Romain HARDOUIN :
>>
>>> Arena allocation is an improvement feature, not a limitation.
>>> It was introduced in Cassandra 1.0 in order to lower memory
>>> fragmentation (and therefore promotion failure).
>>> AFAIK it's not intended to be tweaked, so it might not be a good idea to
>>> change it.
>>>
>>> Best,
>>> Romain
>>>
>>> tommaso barbugli  a écrit sur 02/07/2014 17:40:18 :
>>>
>>> > De : tommaso barbugli 
>>> > A : user@cassandra.apache.org,
>>> > Date : 02/07/2014 17:40
>>> > Objet : Re: keyspace with hundreds of columnfamilies
>>>  >
>>> > 1 MB per column family sounds pretty bad to me; is this something I
>>> > can tweak or work around somehow?
>>> >
>>> > Thanks
>>> > Tommaso
>>> >
>>>
>>> > 2014-07-02 17:21 GMT+02:00 Romain HARDOUIN >> >:
>>> > The trap is that each CF will consume 1 MB of memory due to arena
>>> allocation.
>>> > This might seem harmless, but if you plan for thousands of CFs it means
>>> > thousands of megabytes...
>>> > Up to 1,000 CFs I think it could be doable, but not 10,000.
>>> >
>>> > Best,
>>> >
>>> > Romain
>>> >
>>> >
>>> > tommaso barbugli  a écrit sur 02/07/2014
>>> 10:13:41 :
>>> >
>>> > > De : tommaso barbugli 
>>> > > A : user@cassandra.apache.org,
>>> > > Date : 02/07/2014 10:14
>>> > > Objet : keyspace with hundreds of columnfamilies
>>> > >
>>> > > Hi,
>>> > > Are there any known issues or shortcomings with organising data in
>>> > > hundreds of column families?
>>> > > At present I am running with 300 column families, but I expect
>>> > > that to grow to a couple of thousand.
>>> > > Is this something discouraged / unsupported? (I am using Cassandra
>>> 2.0.)
>>> > >
>>> > > Thanks
>>> > > Tommaso
>>>
>>
>>
>
>


Re: Multi-column range scans

2014-07-14 Thread DuyHai Doan
Sorry, I've just checked, the correct query should be:

select * from skill_count where skill='Complaints' and
(interval_id,skill_level) >= (140235930,5) and
(interval_id,skill_level) < (140235990,11)


On Mon, Jul 14, 2014 at 9:45 AM, DuyHai Doan  wrote:

> Hello Matthew
>
>  Since Cassandra 2.0.6 it is possible to query over composites:
> https://issues.apache.org/jira/browse/CASSANDRA-4851
>
> For your example:
>
> select * from skill_count where skill='Complaints' and
> (interval_id,skill_level) >= (140235930,5) and interval_id <
> 140235990;
>
>
> On Mon, Jul 14, 2014 at 6:09 AM, Matthew Allen 
> wrote:
>
>> Hi,
>>
>> We have a roll-up table, as follows.
>>
>> CREATE TABLE SKILL_COUNT (
>>   skill text,
>>   interval_id bigint,
>>   skill_level int,
>>   skill_count int,
>>   PRIMARY KEY (skill, interval_id, skill_level));
>>
>> Essentially,
>>   skill = a named skill, e.g. "Complaints"
>>   interval_id = a rounded epoch time (15 minute intervals)
>>   skill_level = a number/rating from 1-10
>>   skill_count = the number of people with the specified skill, with the
>> specified skill level, logged in at the interval_id
>>
>> We'd like to run the following query against it
>>
>> select * from skill_count where skill='Complaints' and interval_id >=
>> 140235930 and interval_id < 140235990 and skill_level >= 5;
>>
>> to get a count of people with the relevant skill and level at the
>> appropriate time.  However, I am getting the following message.
>>
>> Bad Request: PRIMARY KEY part skill_level cannot be restricted (preceding
>> part interval_id is either not restricted or by a non-EQ relation)
>>
>> Looking at how the data is stored ...
>>
>> ---
>> RowKey: Complaints
>> => (name=140235930:2:, value=, timestamp=1405308260403000)
>> => (name=140235930:2:skill_count, value=000a,
>> timestamp=1405308260403000)
>> => (name=140235930:5:, value=, timestamp=1405308260403001)
>> => (name=140235930:5:skill_count, value=0014,
>> timestamp=1405308260403001)
>> => (name=140235930:8:, value=, timestamp=1405308260419000)
>> => (name=140235930:8:skill_count, value=001e,
>> timestamp=1405308260419000)
>> => (name=140235930:10:, value=, timestamp=1405308260419001)
>> => (name=140235930:10:skill_count, value=0001,
>> timestamp=1405308260419001)
>>
>> Should Cassandra be able to allow for an extra level of filtering, or is
>> this something that should be performed from within the application?
>>
>> We have a solution working in Oracle, but would like to store this data
>> in Cassandra, as all the other data that this solution relies on already
>> sits within Cassandra.
>>
>> Appreciate any guidance on this matter.
>>
>> Matt
>>
>
>


Re: Multi-column range scans

2014-07-14 Thread DuyHai Doan
or:

select * from skill_count where skill='Complaints'
and (interval_id,skill_level) >= (140235930,5)
and (interval_id) < (140235990)

Strangely enough, once you start using tuple notation you'll need to stick to
it, even if there is only one element in the tuple.


On Mon, Jul 14, 2014 at 1:40 PM, DuyHai Doan  wrote:

> Sorry, I've just checked, the correct query should be:
>
> select * from skill_count where skill='Complaints' and
> (interval_id,skill_level) >= (140235930,5) and
> (interval_id,skill_level) < (140235990,11)
>
>
> On Mon, Jul 14, 2014 at 9:45 AM, DuyHai Doan  wrote:
>
>> Hello Matthew
>>
>>  Since Cassandra 2.0.6 it is possible to query over composites:
>> https://issues.apache.org/jira/browse/CASSANDRA-4851
>>
>> For your example:
>>
>> select * from skill_count where skill='Complaints' and
>> (interval_id,skill_level) >= (140235930,5) and interval_id <
>> 140235990;
>>
>>
>> On Mon, Jul 14, 2014 at 6:09 AM, Matthew Allen > > wrote:
>>
>>> Hi,
>>>
>>> We have a roll-up table, as follows.
>>>
>>> CREATE TABLE SKILL_COUNT (
>>>   skill text,
>>>   interval_id bigint,
>>>   skill_level int,
>>>   skill_count int,
>>>   PRIMARY KEY (skill, interval_id, skill_level));
>>>
>>> Essentially,
>>>   skill = a named skill, e.g. "Complaints"
>>>   interval_id = a rounded epoch time (15 minute intervals)
>>>   skill_level = a number/rating from 1-10
>>>   skill_count = the number of people with the specified skill, with the
>>> specified skill level, logged in at the interval_id
>>>
>>> We'd like to run the following query against it
>>>
>>> select * from skill_count where skill='Complaints' and interval_id >=
>>> 140235930 and interval_id < 140235990 and skill_level >= 5;
>>>
>>> to get a count of people with the relevant skill and level at the
>>> appropriate time.  However, I am getting the following message.
>>>
>>> Bad Request: PRIMARY KEY part skill_level cannot be restricted
>>> (preceding part interval_id is either not restricted or by a non-EQ
>>> relation)
>>>
>>> Looking at how the data is stored ...
>>>
>>> ---
>>> RowKey: Complaints
>>> => (name=140235930:2:, value=, timestamp=1405308260403000)
>>> => (name=140235930:2:skill_count, value=000a,
>>> timestamp=1405308260403000)
>>> => (name=140235930:5:, value=, timestamp=1405308260403001)
>>> => (name=140235930:5:skill_count, value=0014,
>>> timestamp=1405308260403001)
>>> => (name=140235930:8:, value=, timestamp=1405308260419000)
>>> => (name=140235930:8:skill_count, value=001e,
>>> timestamp=1405308260419000)
>>> => (name=140235930:10:, value=, timestamp=1405308260419001)
>>> => (name=140235930:10:skill_count, value=0001,
>>> timestamp=1405308260419001)
>>>
>>> Should Cassandra be able to allow for an extra level of filtering, or
>>> is this something that should be performed from within the application?
>>>
>>> We have a solution working in Oracle, but would like to store this data
>>> in Cassandra, as all the other data that this solution relies on already
>>> sits within Cassandra.
>>>
>>> Appreciate any guidance on this matter.
>>>
>>> Matt
>>>
>>
>>
>


Re: Multi-column range scans

2014-07-14 Thread Ken Hancock
I don't think your query is doing what he wants.  Your query will correctly
set the starting point, but it will also return larger interval_ids with
lower skill_levels:

cqlsh:test> select * from skill_count where skill='Complaints' and
(interval_id, skill_level) >= (140235930, 5);

 skill  | interval_id   | skill_level | skill_count
+---+-+-
 Complaints | 140235930 |   5 |  20
 Complaints | 140235930 |   8 |  30
 Complaints | 140235930 |  10 |   1
 Complaints | 140235940 |   2 |  10
 Complaints | 140235940 |   8 |  30

(5 rows)

cqlsh:test> select * from skill_count where skill='Complaints' and
(interval_id, skill_level) >= (140235930, 5) and (interval_id) <
(140235990);

 skill  | interval_id   | skill_level | skill_count
+---+-+-
 Complaints | 140235930 |   5 |  20  <- desired
 Complaints | 140235930 |   8 |  30  <- desired
 Complaints | 140235930 |  10 |   1  <- desired
 Complaints | 140235940 |   2 |  10  <- SKIP
 Complaints | 140235940 |   8 |  30  <- desired

The query would result in a discontinuous range slice, so it isn't supported.
Essentially, the client will have to read the entire range and perform
client-side filtering.  Whether this is efficient depends on the
cardinality of skill_level.

I tried playing with the "allow filtering" CQL clause, but it would appear
from the documentation that it's very restrictive...
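
For what it's worth, here is a minimal sketch of that client-side filtering
with the DataStax Java driver (untested; the contact point and keyspace name
are placeholders, not taken from the thread):

    import com.datastax.driver.core.*;

    public class FilterSkillLevel {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1").build();
            try {
                Session session = cluster.connect("test");
                // Fetch the contiguous interval_id range server-side...
                ResultSet rs = session.execute(
                    "SELECT * FROM skill_count WHERE skill='Complaints' " +
                    "AND interval_id >= 140235930 AND interval_id < 140235990");
                // ...then drop rows below the desired skill_level on the client.
                for (Row row : rs) {
                    if (row.getInt("skill_level") >= 5) {
                        System.out.printf("%d %d %d%n",
                            row.getLong("interval_id"),
                            row.getInt("skill_level"),
                            row.getInt("skill_count"));
                    }
                }
            } finally {
                cluster.close();
            }
        }
    }

Whether that is acceptable depends, as above, on how many rows per interval
you end up discarding.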





On Mon, Jul 14, 2014 at 7:44 AM, DuyHai Doan  wrote:

> or:
>
>
> select * from skill_count where skill='Complaints'
> and (interval_id,skill_level) >= (140235930,5)
> and (interval_id) < (140235990)
>
> Strangely enough, once you start using tuple notation you'll need to stick to
> it, even if there is only one element in the tuple.
>
>
> On Mon, Jul 14, 2014 at 1:40 PM, DuyHai Doan  wrote:
>
>> Sorry, I've just checked, the correct query should be:
>>
>> select * from skill_count where skill='Complaints' and
>> (interval_id,skill_level) >= (140235930,5) and
>> (interval_id,skill_level) < (140235990,11)
>>
>>
>> On Mon, Jul 14, 2014 at 9:45 AM, DuyHai Doan 
>> wrote:
>>
>>> Hello Matthew
>>>
>>>  Since Cassandra 2.0.6 it is possible to query over composites:
>>> https://issues.apache.org/jira/browse/CASSANDRA-4851
>>>
>>> For your example:
>>>
>>> select * from skill_count where skill='Complaints' and
>>> (interval_id,skill_level) >= (140235930,5) and interval_id <
>>> 140235990;
>>>
>>>
>>> On Mon, Jul 14, 2014 at 6:09 AM, Matthew Allen <
>>> matthew.j.al...@gmail.com> wrote:
>>>
 Hi,

 We have a roll-up table, as follows.

 CREATE TABLE SKILL_COUNT (
   skill text,
   interval_id bigint,
   skill_level int,
   skill_count int,
   PRIMARY KEY (skill, interval_id, skill_level));

 Essentially,
   skill = a named skill, e.g. "Complaints"
   interval_id = a rounded epoch time (15 minute intervals)
   skill_level = a number/rating from 1-10
   skill_count = the number of people with the specified skill, with the
 specified skill level, logged in at the interval_id

 We'd like to run the following query against it

 select * from skill_count where skill='Complaints' and interval_id >=
 140235930 and interval_id < 140235990 and skill_level >= 5;

 to get a count of people with the relevant skill and level at the
 appropriate time.  However, I am getting the following message.

 Bad Request: PRIMARY KEY part skill_level cannot be restricted
 (preceding part interval_id is either not restricted or by a non-EQ
 relation)

 Looking at how the data is stored ...

 ---
 RowKey: Complaints
 => (name=140235930:2:, value=, timestamp=1405308260403000)
 => (name=140235930:2:skill_count, value=000a,
 timestamp=1405308260403000)
 => (name=140235930:5:, value=, timestamp=1405308260403001)
 => (name=140235930:5:skill_count, value=0014,
 timestamp=1405308260403001)
 => (name=140235930:8:, value=, timestamp=1405308260419000)
 => (name=140235930:8:skill_count, value=001e,
 timestamp=1405308260419000)
 => (name=140235930:10:, value=, timestamp=1405308260419001)
 => (name=140235930:10:skill_count, value=0001,
 timestamp=1405308260419001)

 Should Cassandra be able to allow for an extra level of filtering, or
 is this something that should be performed from within the application?

 We have a solution working in Oracle, but would like to store this data
 in Cassandra, as all the other data that this solution relies on already
 sits within Cassandra.

 Appreciate any guidance on this matter.
>

Re: Multi-column range scans

2014-07-14 Thread DuyHai Doan
Exactly, Ken. I got bitten again by the semantics of composite tuples.

 This kind of query won't be possible until something like a wide-row end
slice predicate is available (
https://issues.apache.org/jira/browse/CASSANDRA-6167), if it ever is.




On Mon, Jul 14, 2014 at 5:02 PM, Ken Hancock 
wrote:

> I don't think your query is doing what he wants.  Your query will
> correctly set the starting point, but it will also return larger interval_ids
> with lower skill_levels:
>
> cqlsh:test> select * from skill_count where skill='Complaints' and
> (interval_id, skill_level) >= (140235930, 5);
>
>  skill  | interval_id   | skill_level | skill_count
> +---+-+-
>  Complaints | 140235930 |   5 |  20
>  Complaints | 140235930 |   8 |  30
>  Complaints | 140235930 |  10 |   1
>  Complaints | 140235940 |   2 |  10
>  Complaints | 140235940 |   8 |  30
>
> (5 rows)
>
> cqlsh:test> select * from skill_count where skill='Complaints' and
> (interval_id, skill_level) >= (140235930, 5) and (interval_id) <
> (140235990);
>
>  skill  | interval_id   | skill_level | skill_count
> +---+-+-
>  Complaints | 140235930 |   5 |  20  <- desired
>  Complaints | 140235930 |   8 |  30  <- desired
>  Complaints | 140235930 |  10 |   1  <- desired
>  Complaints | 140235940 |   2 |  10  <- SKIP
>  Complaints | 140235940 |   8 |  30  <- desired
>
> The query would result in a discontinuous range slice, so it isn't supported.
> Essentially, the client will have to read the entire range and perform
> client-side filtering.  Whether this is efficient depends on the
> cardinality of skill_level.
>
> I tried playing with the "allow filtering" CQL clause, but it would appear
> from the documentation that it's very restrictive...
>
>
>
>
>
> On Mon, Jul 14, 2014 at 7:44 AM, DuyHai Doan  wrote:
>
>> or:
>>
>>
>> select * from skill_count where skill='Complaints'
>> and (interval_id,skill_level) >= (140235930,5)
>> and (interval_id) < (140235990)
>>
>> Strangely enough, once you start using tuple notation you'll need to stick
>> to it, even if there is only one element in the tuple.
>>
>>
>> On Mon, Jul 14, 2014 at 1:40 PM, DuyHai Doan 
>> wrote:
>>
>>> Sorry, I've just checked, the correct query should be:
>>>
>>> select * from skill_count where skill='Complaints' and
>>> (interval_id,skill_level) >= (140235930,5) and
>>> (interval_id,skill_level) < (140235990,11)
>>>
>>>
>>> On Mon, Jul 14, 2014 at 9:45 AM, DuyHai Doan 
>>> wrote:
>>>
 Hello Matthew

  Since Cassandra 2.0.6 it is possible to query over composites:
 https://issues.apache.org/jira/browse/CASSANDRA-4851

 For your example:

 select * from skill_count where skill='Complaints' and
 (interval_id,skill_level) >= (140235930,5) and interval_id <
 140235990;


 On Mon, Jul 14, 2014 at 6:09 AM, Matthew Allen <
 matthew.j.al...@gmail.com> wrote:

> Hi,
>
> We have a roll-up table, as follows.
>
> CREATE TABLE SKILL_COUNT (
>   skill text,
>   interval_id bigint,
>   skill_level int,
>   skill_count int,
>   PRIMARY KEY (skill, interval_id, skill_level));
>
> Essentially,
>   skill = a named skill, e.g. "Complaints"
>   interval_id = a rounded epoch time (15 minute intervals)
>   skill_level = a number/rating from 1-10
>   skill_count = the number of people with the specified skill, with
> the specified skill level, logged in at the interval_id
>
> We'd like to run the following query against it
>
> select * from skill_count where skill='Complaints' and interval_id >=
> 140235930 and interval_id < 140235990 and skill_level >= 5;
>
> to get a count of people with the relevant skill and level at the
> appropriate time.  However, I am getting the following message.
>
> Bad Request: PRIMARY KEY part skill_level cannot be restricted
> (preceding part interval_id is either not restricted or by a non-EQ
> relation)
>
> Looking at how the data is stored ...
>
> ---
> RowKey: Complaints
> => (name=140235930:2:, value=, timestamp=1405308260403000)
> => (name=140235930:2:skill_count, value=000a,
> timestamp=1405308260403000)
> => (name=140235930:5:, value=, timestamp=1405308260403001)
> => (name=140235930:5:skill_count, value=0014,
> timestamp=1405308260403001)
> => (name=140235930:8:, value=, timestamp=1405308260419000)
> => (name=140235930:8:skill_count, value=001e,
> timestamp=1405308260419000)
> => (name=140235930:10:, value=, timestamp=1405308260419001)
> => (name=140235930

Upgrading from 1.1.9 to 1.2.18

2014-07-14 Thread Denning, Michael
Hello All,

I'm trying to upgrade from a 3 node 1.1.9 cluster to a 6 node 1.2.18 cluster on
Ubuntu. Can sstableloader be used to stream from the existing cluster to the
new cluster? If so, what is the suggested method? I keep getting the
following when trying this:

partitioner org.apache.cassandra.dht.RandomPartitioner does not match system 
partitioner org.apache.cassandra.dht.Murmur3Partitioner. Note that the default 
partitioner starting with Cassandra 1.2 is Murmur3Partitioner, so you will need 
to edit that to match your old partitioner if upgrading.

It would appear that 1.1.9 doesn't have Murmur3Partitioner though, so I changed 
the partitioner on the new cluster to RandomPartitioner. Even with that, I get 
the following error:

CLASSPATH=/etc/cassandra/conf/cassandra.yaml:/root/lib_cass15/apache-cassandra-1.2.18.jar:/root/lib_cass15/guava-13.0.1.jar:/etc/cassandra/conf:/usr/share/java/jna.jar:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/apache-cassandra-1.1.9.jar:/usr/share/cassandra/lib/apache-cassandra-clientutil-1.1.9.jar:/usr/share/cassandra/lib/apache-cassandra-thrift-1.1.9.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.4.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-r08.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jline-0.9.94.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.6.1.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.6.1.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/stress.jar
Could not retrieve endpoint ranges:
java.lang.RuntimeException: Could not retrieve endpoint ranges:
    at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:233)
    at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:119)
    at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:67)
Caused by: org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
    at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_describe_ring(Cassandra.java:1155)
    at org.apache.cassandra.thrift.Cassandra$Client.describe_ring(Cassandra.java:1142)
    at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:212)
    ... 2 more

Is there a way to get sstableloader to work? If not, can someone point me to 
documentation explaining other ways to migrate the data/keyspaces? I haven't 
been able to find any detailed docs.

Thank you





Re: UnavailableException

2014-07-14 Thread Ruchir Jha
Mark,

Here you go:

*NodeTool status:*

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load     Tokens  Owns  Host ID                               Rack
UN  10.10.20.15  1.62 TB  256     8.1%  01a01f07-4df2-4c87-98e9-8dd38b3e4aee  rack1
UN  10.10.20.19  1.66 TB  256     8.3%  30ddf003-4d59-4a3e-85fa-e94e4adba1cb  rack1
UN  10.10.20.35  1.62 TB  256     9.0%  17cb8772-2444-46ff-8525-33746514727d  rack1
UN  10.10.20.31  1.64 TB  256     8.3%  1435acf9-c64d-4bcd-b6a4-abcec209815e  rack1
UN  10.10.20.52  1.59 TB  256     9.1%  6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e  rack1
UN  10.10.20.27  1.66 TB  256     7.7%  76023cdd-c42d-4068-8b53-ae94584b8b04  rack1
UN  10.10.20.22  1.66 TB  256     8.9%  46af9664-8975-4c91-847f-3f7b8f8d5ce2  rack1
UN  10.10.20.39  1.68 TB  256     8.0%  b7d44c26-4d75-4d36-a779-b7e7bdaecbc9  rack1
UN  10.10.20.45  1.49 TB  256     7.7%  8d6bce33-8179-4660-8443-2cf822074ca4  rack1
UN  10.10.20.47  1.64 TB  256     7.9%  bcd51a92-3150-41ae-9c51-104ea154f6fa  rack1
UN  10.10.20.62  1.59 TB  256     8.2%  84b47313-da75-4519-94f3-3951d554a3e5  rack1
UN  10.10.20.51  1.66 TB  256     8.9%  0343cd58-3686-465f-8280-56fb72d161e2  rack1


*Astyanax Connection Settings:*

seeds   :12
maxConns   :16
maxConnsPerHost:16
connectTimeout :2000
socketTimeout  :6
maxTimeoutCount:16
maxBlockedThreadsPerHost:16
maxOperationsPerConnection:16
DiscoveryType: RING_DESCRIBE
ConnectionPoolType: TOKEN_AWARE
DefaultReadConsistencyLevel: CL_QUORUM
DefaultWriteConsistencyLevel: CL_QUORUM



On Fri, Jul 11, 2014 at 5:04 PM, Mark Reddy  wrote:

> Can you post the output of nodetool status and your Astyanax connection
> settings?
>
>
> On Fri, Jul 11, 2014 at 9:06 PM, Ruchir Jha  wrote:
>
>> This is how we create our keyspace. We just ran this command once through
>> a cqlsh session on one of the nodes, so I don't quite understand what you
>> mean by "check that your DC names match up"
>>
>> CREATE KEYSPACE prod WITH replication = {
>>   'class': 'NetworkTopologyStrategy',
>>   'datacenter1': '3'
>> };
>>
>>
>>
>> On Fri, Jul 11, 2014 at 3:48 PM, Chris Lohfink 
>> wrote:
>>
>>> What replication strategy are you using? If using NetworkTopologyStrategy,
>>> double-check that your DC names match up (case sensitive).
>>>
>>> Chris
>>>
>>> On Jul 11, 2014, at 9:38 AM, Ruchir Jha  wrote:
>>>
>>> Here's the complete stack trace:
>>>
>>> com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException:
>>> TokenRangeOfflineException:
>>> [host=ny4lpcas5.fusionts.corp(10.10.20.47):9160, latency=22784(42874),
>>> attempts=3]UnavailableException()
>>> at
>>> com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:165)
>>> at
>>> com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:65)
>>> at
>>> com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28)
>>> at
>>> com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:151)
>>> at
>>> com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:69)
>>> at
>>> com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:256)
>>> at
>>> com.netflix.astyanax.thrift.ThriftKeyspaceImpl.executeOperation(ThriftKeyspaceImpl.java:485)
>>> at
>>> com.netflix.astyanax.thrift.ThriftKeyspaceImpl.access$000(ThriftKeyspaceImpl.java:79)
>>> at
>>> com.netflix.astyanax.thrift.ThriftKeyspaceImpl$1.execute(ThriftKeyspaceImpl.java:123)
>>> Caused by: UnavailableException()
>>> at
>>> org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:20841)
>>> at
>>> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
>>> at
>>> org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:964)
>>> at
>>> org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:950)
>>> at
>>> com.netflix.astyanax.thrift.ThriftKeyspaceImpl$1$1.internalExecute(ThriftKeyspaceImpl.java:129)
>>> at
>>> com.netflix.astyanax.thrift.ThriftKeyspaceImpl$1$1.internalExecute(ThriftKeyspaceImpl.java:126)
>>> at
>>> com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)
>>> ... 12 more
>>>
>>>
>>>
>>> On Fri, Jul 11, 2014 at 9:11 AM, Prem Yadav 
>>> wrote:
>>>
 Please post the full exception.


 On Fri, Jul 11, 2014 at 1:50 PM, Ruchir Jha 
 wrote:

> We have a 12 node cluster and we are consistently seeing this
> exception being thrown during peak write traffic. We have a replication
> factor of 3 and a write consistency level of QUORUM. Also note there is no
> unusual or full GC activity during this time. Appreciate 

Re: Cassandra use cases/Strengths/Weakness

2014-07-14 Thread Keith Freeman
We've struggled to get consistent write latency and linear write 
scalability with a pretty heavy insert load (thousands of records/second), 
and our records are about 1k-2k of data (a mix of integer/string columns 
and a blob).  Wondering if you have any rough numbers from your "small to 
medium write sizes" experience?


On 07/04/2014 01:58 PM, James Horey wrote:

...
* Low write latency with respect to small to medium write sizes (logs, 
sensor data, etc.)

* Linear write scalability
* ...




Re: Upgrading from 1.1.9 to 1.2.18

2014-07-14 Thread Robert Coli
On Mon, Jul 14, 2014 at 9:54 AM, Denning, Michael <
michael.denn...@kavokerrgroup.com> wrote:

>  I'm trying to upgrade from a 3 node 1.1.9 cluster to a 6 node 1.2.18
> cluster on Ubuntu. Can sstableloader be used to stream from the existing
> cluster to the new cluster? If so, what is the suggested method? I keep
> getting the following when trying this:
>
http://palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra

One of the caveats mentioned there is that sstableloader often does not
work between major versions.

If I were you, I would accomplish this task by dividing it in two:

1) Upgrade my 3 node cluster from 1.1.9 to 1.2.18 via rolling
restart/upgradesstables.
2) Expand 3 node cluster to 6 nodes

Is there a reason you are not using this process?

=Rob


RE: COMMERCIAL:Re: Upgrading from 1.1.9 to 1.2.18

2014-07-14 Thread Denning, Michael
The 3 node cluster is in production.  It'd be difficult for me to get sign-off on the 
change control to upgrade it.  The 6 node cluster is already stood up (in 
AWS).  In an ideal scenario I'd just be able to bring the data over to the new 
cluster.


From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Monday, July 14, 2014 1:53 PM
To: user@cassandra.apache.org
Subject: COMMERCIAL:Re: Upgrading from 1.1.9 to 1.2.18

On Mon, Jul 14, 2014 at 9:54 AM, Denning, Michael
<michael.denn...@kavokerrgroup.com> wrote:

I'm trying to upgrade from a 3 node 1.1.9 cluster to a 6 node 1.2.18 cluster on 
Ubuntu. Can sstableloader be used to stream from the existing cluster to the 
new cluster? If so, what is the suggested method? I keep getting the 
following when trying this:
http://palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra

One of the caveats mentioned there is that sstableloader often does not work 
between major versions.

If I were you, I would accomplish this task by dividing it in two:

1) Upgrade my 3 node cluster from 1.1.9 to 1.2.18 via rolling 
restart/upgradesstables.
2) Expand 3 node cluster to 6 nodes

Is there a reason you are not using this process?

=Rob





Re: COMMERCIAL:Re: Upgrading from 1.1.9 to 1.2.18

2014-07-14 Thread Robert Coli
On Mon, Jul 14, 2014 at 11:12 AM, Denning, Michael <
michael.denn...@kavokerrgroup.com> wrote:

>  The 3 node cluster is in production.  It'd be difficult for me to get sign-off
> on the change control to upgrade it.  The 6 node cluster is already stood
> up (in AWS).  In an ideal scenario I'd just be able to bring the data over
> to the new cluster.
>

Ok, use the "copy the sstables" method from the previous link?

1) fork writes so all writes go to both clusters (see the sketch after this list)
2) nodetool flush on source cluster
3) copy all sstables to all target nodes, being careful to avoid name
collision (use rolling restart, probably, "refresh" is unsafe)
4) run cleanup on target nodes (this will have the same effect as doing an
upgradesstables, as a bonus)
5) turn off writes to old cluster/turn on reads to new cluster
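
For step 1, "fork writes" just means the application applies every mutation to
both clusters. A trivial sketch with a Thrift client such as Astyanax (which
can talk to both 1.1 and 1.2, unlike the native-protocol drivers); the column
family, names and values here are placeholders:

    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.MutationBatch;
    import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
    import com.netflix.astyanax.model.ColumnFamily;
    import com.netflix.astyanax.serializers.StringSerializer;

    // One Keyspace client per cluster; apply the same mutation to each.
    static final ColumnFamily<String, String> CF = new ColumnFamily<String, String>(
        "my_cf", StringSerializer.get(), StringSerializer.get());

    static void forkedWrite(Keyspace oldKs, Keyspace newKs,
                            String rowKey, String col, String value)
            throws ConnectionException {
        for (Keyspace ks : new Keyspace[] { oldKs, newKs }) {
            MutationBatch m = ks.prepareMutationBatch();
            m.withRow(CF, rowKey).putColumn(col, value, null);
            m.execute();
        }
    }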

If I were you, I would strongly consider not using vnodes on your new
cluster. Unless you are very confident the cluster will grow above appx 10
nodes in the near future, you are likely to Just Lose from vnodes.

=Rob


Re: UnavailableException

2014-07-14 Thread Chris Lohfink
Is there a line when doing nodetool info/status like: 

Datacenter: datacenter1
=======================

You need to make sure the datacenter name matches the name specified in your 
keyspace's replication options.

Chris

On Jul 14, 2014, at 12:04 PM, Ruchir Jha  wrote:

> Mark,
> 
> Here you go:
> 
> NodeTool status:
> 
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load     Tokens  Owns  Host ID                               Rack
> UN  10.10.20.15  1.62 TB  256     8.1%  01a01f07-4df2-4c87-98e9-8dd38b3e4aee  rack1
> UN  10.10.20.19  1.66 TB  256     8.3%  30ddf003-4d59-4a3e-85fa-e94e4adba1cb  rack1
> UN  10.10.20.35  1.62 TB  256     9.0%  17cb8772-2444-46ff-8525-33746514727d  rack1
> UN  10.10.20.31  1.64 TB  256     8.3%  1435acf9-c64d-4bcd-b6a4-abcec209815e  rack1
> UN  10.10.20.52  1.59 TB  256     9.1%  6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e  rack1
> UN  10.10.20.27  1.66 TB  256     7.7%  76023cdd-c42d-4068-8b53-ae94584b8b04  rack1
> UN  10.10.20.22  1.66 TB  256     8.9%  46af9664-8975-4c91-847f-3f7b8f8d5ce2  rack1
> UN  10.10.20.39  1.68 TB  256     8.0%  b7d44c26-4d75-4d36-a779-b7e7bdaecbc9  rack1
> UN  10.10.20.45  1.49 TB  256     7.7%  8d6bce33-8179-4660-8443-2cf822074ca4  rack1
> UN  10.10.20.47  1.64 TB  256     7.9%  bcd51a92-3150-41ae-9c51-104ea154f6fa  rack1
> UN  10.10.20.62  1.59 TB  256     8.2%  84b47313-da75-4519-94f3-3951d554a3e5  rack1
> UN  10.10.20.51  1.66 TB  256     8.9%  0343cd58-3686-465f-8280-56fb72d161e2  rack1
> 
> 
> Astyanax Connection Settings:
> 
> seeds   :12
> maxConns   :16
> maxConnsPerHost:16
> connectTimeout :2000
> socketTimeout  :6
> maxTimeoutCount:16
> maxBlockedThreadsPerHost:16
> maxOperationsPerConnection:16
> DiscoveryType: RING_DESCRIBE
> ConnectionPoolType: TOKEN_AWARE
> DefaultReadConsistencyLevel: CL_QUORUM
> DefaultWriteConsistencyLevel: CL_QUORUM
> 
> 
> 
> On Fri, Jul 11, 2014 at 5:04 PM, Mark Reddy  wrote:
> Can you post the output of nodetool status and your Astyanax connection 
> settings?
> 
> 
> On Fri, Jul 11, 2014 at 9:06 PM, Ruchir Jha  wrote:
> This is how we create our keyspace. We just ran this command once through a 
> cqlsh session on one of the nodes, so I don't quite understand what you mean by 
> "check that your DC names match up"
> 
> CREATE KEYSPACE prod WITH replication = {
>   'class': 'NetworkTopologyStrategy',
>   'datacenter1': '3'
> };
> 
> 
> 
> On Fri, Jul 11, 2014 at 3:48 PM, Chris Lohfink  
> wrote:
> What replication strategy are you using? If using NetworkTopologyStrategy, 
> double-check that your DC names match up (case sensitive).
> 
> Chris
> 
> On Jul 11, 2014, at 9:38 AM, Ruchir Jha  wrote:
> 
>> Here's the complete stack trace:
>> 
>> com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException: 
>> TokenRangeOfflineException: [host=ny4lpcas5.fusionts.corp(10.10.20.47):9160, 
>> latency=22784(42874), attempts=3]UnavailableException()
>> at 
>> com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:165)
>> at 
>> com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:65)
>> at 
>> com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28)
>> at 
>> com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:151)
>> at 
>> com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:69)
>> at 
>> com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:256)
>> at 
>> com.netflix.astyanax.thrift.ThriftKeyspaceImpl.executeOperation(ThriftKeyspaceImpl.java:485)
>> at 
>> com.netflix.astyanax.thrift.ThriftKeyspaceImpl.access$000(ThriftKeyspaceImpl.java:79)
>> at 
>> com.netflix.astyanax.thrift.ThriftKeyspaceImpl$1.execute(ThriftKeyspaceImpl.java:123)
>> Caused by: UnavailableException()
>> at 
>> org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:20841)
>> at 
>> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
>> at 
>> org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:964)
>> at 
>> org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:950)
>> at 
>> com.netflix.astyanax.thrift.ThriftKeyspaceImpl$1$1.internalExecute(ThriftKeyspaceImpl.java:129)
>> at 
>> com.netflix.astyanax.thrift.ThriftKeyspaceImpl$1$1.internalExecute(ThriftKeyspaceImpl.java:126)
>> at 
>> com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)
>> ... 12 more
>> 
>> 
>> 
>> On Fri, Jul 11, 2014 at 9:11 AM, Prem Yadav  wrote:
>> Please post the full exception

Re: UnavailableException

2014-07-14 Thread Chris Lohfink
If you list all 12 nodes in the seeds list, you can try using 
NodeDiscoveryType.NONE instead of RING_DESCRIBE.

It's been recommended that way by some anyway, so that if you add nodes to the 
cluster your app won't start using them until all bootstrapping is done and 
everything has settled down.
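
Something like the following, roughly (a sketch of the Astyanax builder API
from memory; the imports, cluster/keyspace names and seed list are
illustrative, so double-check them against your Astyanax version):

    import com.netflix.astyanax.AstyanaxContext;
    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
    import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
    import com.netflix.astyanax.connectionpool.impl.ConnectionPoolType;
    import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
    import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
    import com.netflix.astyanax.model.ConsistencyLevel;
    import com.netflix.astyanax.thrift.ThriftFamilyFactory;

    AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
        .forCluster("prod-cluster")
        .forKeyspace("prod")
        .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
            .setDiscoveryType(NodeDiscoveryType.NONE)  // instead of RING_DESCRIBE
            .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE)
            .setDefaultReadConsistencyLevel(ConsistencyLevel.CL_QUORUM)
            .setDefaultWriteConsistencyLevel(ConsistencyLevel.CL_QUORUM))
        .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("pool")
            .setPort(9160)
            .setMaxConnsPerHost(16)
            // with discovery set to NONE, list all 12 nodes here
            .setSeeds("10.10.20.15:9160,10.10.20.19:9160,10.10.20.22:9160"))
        .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
        .buildKeyspace(ThriftFamilyFactory.getInstance());
    context.start();
    Keyspace keyspace = context.getClient();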

Chris

On Jul 14, 2014, at 12:04 PM, Ruchir Jha  wrote:

> Mark,
> 
> Here you go:
> 
> NodeTool status:
> 
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load     Tokens  Owns  Host ID                               Rack
> UN  10.10.20.15  1.62 TB  256     8.1%  01a01f07-4df2-4c87-98e9-8dd38b3e4aee  rack1
> UN  10.10.20.19  1.66 TB  256     8.3%  30ddf003-4d59-4a3e-85fa-e94e4adba1cb  rack1
> UN  10.10.20.35  1.62 TB  256     9.0%  17cb8772-2444-46ff-8525-33746514727d  rack1
> UN  10.10.20.31  1.64 TB  256     8.3%  1435acf9-c64d-4bcd-b6a4-abcec209815e  rack1
> UN  10.10.20.52  1.59 TB  256     9.1%  6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e  rack1
> UN  10.10.20.27  1.66 TB  256     7.7%  76023cdd-c42d-4068-8b53-ae94584b8b04  rack1
> UN  10.10.20.22  1.66 TB  256     8.9%  46af9664-8975-4c91-847f-3f7b8f8d5ce2  rack1
> UN  10.10.20.39  1.68 TB  256     8.0%  b7d44c26-4d75-4d36-a779-b7e7bdaecbc9  rack1
> UN  10.10.20.45  1.49 TB  256     7.7%  8d6bce33-8179-4660-8443-2cf822074ca4  rack1
> UN  10.10.20.47  1.64 TB  256     7.9%  bcd51a92-3150-41ae-9c51-104ea154f6fa  rack1
> UN  10.10.20.62  1.59 TB  256     8.2%  84b47313-da75-4519-94f3-3951d554a3e5  rack1
> UN  10.10.20.51  1.66 TB  256     8.9%  0343cd58-3686-465f-8280-56fb72d161e2  rack1
> 
> 
> Astyanax Connection Settings:
> 
> seeds   :12
> maxConns   :16
> maxConnsPerHost:16
> connectTimeout :2000
> socketTimeout  :6
> maxTimeoutCount:16
> maxBlockedThreadsPerHost:16
> maxOperationsPerConnection:16
> DiscoveryType: RING_DESCRIBE
> ConnectionPoolType: TOKEN_AWARE
> DefaultReadConsistencyLevel: CL_QUORUM
> DefaultWriteConsistencyLevel: CL_QUORUM
> 
> 
> 
> On Fri, Jul 11, 2014 at 5:04 PM, Mark Reddy  wrote:
> Can you post the output of nodetool status and your Astyanax connection 
> settings?
> 
> 
> On Fri, Jul 11, 2014 at 9:06 PM, Ruchir Jha  wrote:
> This is how we create our keyspace. We just ran this command once through a 
> cqlsh session on one of the nodes, so I don't quite understand what you mean by 
> "check that your DC names match up"
> 
> CREATE KEYSPACE prod WITH replication = {
>   'class': 'NetworkTopologyStrategy',
>   'datacenter1': '3'
> };
> 
> 
> 
> On Fri, Jul 11, 2014 at 3:48 PM, Chris Lohfink  
> wrote:
> What replication strategy are you using? If using NetworkTopologyStrategy, 
> double-check that your DC names match up (case sensitive).
> 
> Chris
> 
> On Jul 11, 2014, at 9:38 AM, Ruchir Jha  wrote:
> 
>> Here's the complete stack trace:
>> 
>> com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException: 
>> TokenRangeOfflineException: [host=ny4lpcas5.fusionts.corp(10.10.20.47):9160, 
>> latency=22784(42874), attempts=3]UnavailableException()
>> at 
>> com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:165)
>> at 
>> com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:65)
>> at 
>> com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28)
>> at 
>> com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:151)
>> at 
>> com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:69)
>> at 
>> com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:256)
>> at 
>> com.netflix.astyanax.thrift.ThriftKeyspaceImpl.executeOperation(ThriftKeyspaceImpl.java:485)
>> at 
>> com.netflix.astyanax.thrift.ThriftKeyspaceImpl.access$000(ThriftKeyspaceImpl.java:79)
>> at 
>> com.netflix.astyanax.thrift.ThriftKeyspaceImpl$1.execute(ThriftKeyspaceImpl.java:123)
>> Caused by: UnavailableException()
>> at 
>> org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:20841)
>> at 
>> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
>> at 
>> org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:964)
>> at 
>> org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:950)
>> at 
>> com.netflix.astyanax.thrift.ThriftKeyspaceImpl$1$1.internalExecute(ThriftKeyspaceImpl.java:129)
>> at 
>> com.netflix.astyanax.thrift.ThriftKeyspaceImpl$1$1.internalExecute(ThriftKeyspaceImpl.java:126)
>> at 
>> com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)
>> ... 12 more
>> 
>> 
>> 
>> On Fri, Jul 1

Re: UnavailableException

2014-07-14 Thread Ruchir Jha
Yes, the line is "Datacenter: datacenter1", which matches my CREATE
KEYSPACE command. As for the NodeDiscoveryType, we will follow that advice, but I
don't believe it to be the root of my issue here, because the nodes start up
at least 6 hours before the UnavailableException, and as far as adding nodes
is concerned we would only do it after hours.


On Mon, Jul 14, 2014 at 2:34 PM, Chris Lohfink 
wrote:

> If you list all 12 nodes in seeds list, you can try using
> NodeDiscoveryType.NONE instead of RING_DESCRIBE.
>
> Its been recommended that way by some anyway so if you add nodes to
> cluster your app wont start using it until all bootstrapping and
> everythings settled down.
>
> Chris
>
> On Jul 14, 2014, at 12:04 PM, Ruchir Jha  wrote:
>
> Mark,
>
> Here you go:
>
> *NodeTool status:*
>
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load     Tokens  Owns  Host ID                               Rack
> UN  10.10.20.15  1.62 TB  256     8.1%  01a01f07-4df2-4c87-98e9-8dd38b3e4aee  rack1
> UN  10.10.20.19  1.66 TB  256     8.3%  30ddf003-4d59-4a3e-85fa-e94e4adba1cb  rack1
> UN  10.10.20.35  1.62 TB  256     9.0%  17cb8772-2444-46ff-8525-33746514727d  rack1
> UN  10.10.20.31  1.64 TB  256     8.3%  1435acf9-c64d-4bcd-b6a4-abcec209815e  rack1
> UN  10.10.20.52  1.59 TB  256     9.1%  6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e  rack1
> UN  10.10.20.27  1.66 TB  256     7.7%  76023cdd-c42d-4068-8b53-ae94584b8b04  rack1
> UN  10.10.20.22  1.66 TB  256     8.9%  46af9664-8975-4c91-847f-3f7b8f8d5ce2  rack1
> UN  10.10.20.39  1.68 TB  256     8.0%  b7d44c26-4d75-4d36-a779-b7e7bdaecbc9  rack1
> UN  10.10.20.45  1.49 TB  256     7.7%  8d6bce33-8179-4660-8443-2cf822074ca4  rack1
> UN  10.10.20.47  1.64 TB  256     7.9%  bcd51a92-3150-41ae-9c51-104ea154f6fa  rack1
> UN  10.10.20.62  1.59 TB  256     8.2%  84b47313-da75-4519-94f3-3951d554a3e5  rack1
> UN  10.10.20.51  1.66 TB  256     8.9%  0343cd58-3686-465f-8280-56fb72d161e2  rack1
>
>
> *Astyanax Connection Settings:*
>
> seeds   :12
> maxConns   :16
> maxConnsPerHost:16
> connectTimeout :2000
> socketTimeout  :6
> maxTimeoutCount:16
> maxBlockedThreadsPerHost:16
> maxOperationsPerConnection:16
> DiscoveryType: RING_DESCRIBE
> ConnectionPoolType: TOKEN_AWARE
> DefaultReadConsistencyLevel: CL_QUORUM
> DefaultWriteConsistencyLevel: CL_QUORUM
>
>
>
> On Fri, Jul 11, 2014 at 5:04 PM, Mark Reddy 
> wrote:
>
>> Can you post the output of nodetool status and your Astyanax connection
>> settings?
>>
>>
>> On Fri, Jul 11, 2014 at 9:06 PM, Ruchir Jha  wrote:
>>
>>> This is how we create our keyspace. We just ran this command once
>>> through a cqlsh session on one of the nodes, so I don't quite understand what
>>> you mean by "check that your DC names match up"
>>>
>>> CREATE KEYSPACE prod WITH replication = {
>>>   'class': 'NetworkTopologyStrategy',
>>>   'datacenter1': '3'
>>> };
>>>
>>>
>>>
>>> On Fri, Jul 11, 2014 at 3:48 PM, Chris Lohfink >> > wrote:
>>>
 What replication strategy are you using? If using
 NetworkTopologyStrategy, double-check that your DC names match up (case
 sensitive).

 Chris

 On Jul 11, 2014, at 9:38 AM, Ruchir Jha  wrote:

 Here's the complete stack trace:

 com.netflix.astyanax.connectionpool.exceptions.TokenRangeOfflineException:
 TokenRangeOfflineException:
 [host=ny4lpcas5.fusionts.corp(10.10.20.47):9160, latency=22784(42874),
 attempts=3]UnavailableException()
 at
 com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:165)
 at
 com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:65)
 at
 com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28)
 at
 com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:151)
 at
 com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:69)
 at
 com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:256)
 at
 com.netflix.astyanax.thrift.ThriftKeyspaceImpl.executeOperation(ThriftKeyspaceImpl.java:485)
 at
 com.netflix.astyanax.thrift.ThriftKeyspaceImpl.access$000(ThriftKeyspaceImpl.java:79)
 at
 com.netflix.astyanax.thrift.ThriftKeyspaceImpl$1.execute(ThriftKeyspaceImpl.java:123)
 Caused by: UnavailableException()
 at
 org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:20841)
 at
 org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
 at
 org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:964)
 at
 org.apache.c

Re: Multi-column range scans

2014-07-14 Thread Matthew Allen
Thanks to both of you for your help, greatly appreciated.

We'll proceed down the path of putting the filtering into the application
logic for the time being.

Matt.


On Tue, Jul 15, 2014 at 1:20 AM, DuyHai Doan  wrote:

> Exactly, Ken. I got bitten again by the semantics of composite tuples.
>
>  This kind of query won't be possible until something like a wide-row end
> slice predicate is available (
> https://issues.apache.org/jira/browse/CASSANDRA-6167), if it ever is.
>
>
>
>
> On Mon, Jul 14, 2014 at 5:02 PM, Ken Hancock 
> wrote:
>
>> I don't think your query is doing what he wants.  Your query will
>> correctly set the starting point, but it will also return larger interval_ids
>> with lower skill_levels:
>>
>> cqlsh:test> select * from skill_count where skill='Complaints' and
>> (interval_id, skill_level) >= (140235930, 5);
>>
>>  skill  | interval_id   | skill_level | skill_count
>> +---+-+-
>>  Complaints | 140235930 |   5 |  20
>>  Complaints | 140235930 |   8 |  30
>>  Complaints | 140235930 |  10 |   1
>>  Complaints | 140235940 |   2 |  10
>>  Complaints | 140235940 |   8 |  30
>>
>> (5 rows)
>>
>> cqlsh:test> select * from skill_count where skill='Complaints' and
>> (interval_id, skill_level) >= (140235930, 5) and (interval_id) <
>> (140235990);
>>
>>  skill  | interval_id   | skill_level | skill_count
>> +---+-+-
>>  Complaints | 140235930 |   5 |  20  <- desired
>>  Complaints | 140235930 |   8 |  30  <- desired
>>  Complaints | 140235930 |  10 |   1  <- desired
>>  Complaints | 140235940 |   2 |  10  <- SKIP
>>  Complaints | 140235940 |   8 |  30  <- desired
>>
>> The query would result in a discontinuous range slice, so it isn't supported.
>> Essentially, the client will have to read the entire range and perform
>> client-side filtering.  Whether this is efficient depends on the
>> cardinality of skill_level.
>>
>> I tried playing with the "allow filtering" CQL clause, but it would
>> appear from the documentation that it's very restrictive...
>>
>>
>>
>>
>>
>> On Mon, Jul 14, 2014 at 7:44 AM, DuyHai Doan 
>> wrote:
>>
>>> or:
>>>
>>>
>>> select * from skill_count where skill='Complaints'
>>> and (interval_id,skill_level) >= (140235930,5)
>>> and (interval_id) < (140235990)
>>>
>>> Strangely enough, once you start using tuple notation you'll need to stick
>>> to it, even if there is only one element in the tuple.
>>>
>>>
>>> On Mon, Jul 14, 2014 at 1:40 PM, DuyHai Doan 
>>> wrote:
>>>
 Sorry, I've just checked, the correct query should be:

 select * from skill_count where skill='Complaints' and
 (interval_id,skill_level) >= (140235930,5) and
 (interval_id,skill_level) < (140235990,11)


 On Mon, Jul 14, 2014 at 9:45 AM, DuyHai Doan 
 wrote:

> Hello Matthew
>
>  Since Cassandra 2.0.6 it is possible to query over composites:
> https://issues.apache.org/jira/browse/CASSANDRA-4851
>
> For your example:
>
> select * from skill_count where skill='Complaints' and
> (interval_id,skill_level) >= (140235930,5) and interval_id <
> 140235990;
>
>
> On Mon, Jul 14, 2014 at 6:09 AM, Matthew Allen <
> matthew.j.al...@gmail.com> wrote:
>
>> Hi,
>>
>> We have a roll-up table, as follows.
>>
>> CREATE TABLE SKILL_COUNT (
>>   skill text,
>>   interval_id bigint,
>>   skill_level int,
>>   skill_count int,
>>   PRIMARY KEY (skill, interval_id, skill_level));
>>
>> Essentially,
>>   skill = a named skill, e.g. "Complaints"
>>   interval_id = a rounded epoch time (15 minute intervals)
>>   skill_level = a number/rating from 1-10
>>   skill_count = the number of people with the specified skill, with
>> the specified skill level, logged in at the interval_id
>>
>> We'd like to run the following query against it
>>
>> select * from skill_count where skill='Complaints' and interval_id >=
>> 140235930 and interval_id < 140235990 and skill_level >= 5;
>>
>> to get a count of people with the relevant skill and level at the
>> appropriate time.  However, I am getting the following message.
>>
>> Bad Request: PRIMARY KEY part skill_level cannot be restricted
>> (preceding part interval_id is either not restricted or by a non-EQ
>> relation)
>>
>> Looking at how the data is stored ...
>>
>> ---
>> RowKey: Complaints
>> => (name=140235930:2:, value=, timestamp=1405308260403000)
>> => (name=140235930:2:skill_count, value=000a,
>> timestamp=1405308260403000)
>> => (name=140235930:5:, value=, timestamp=1405308260403001)
>>

Re: high pending compactions

2014-07-14 Thread Greg Bone
I'm looking into creating monitoring thresholds for Cassandra to report
on its health. Does it make sense to set an alert threshold on compaction
stats? If so, would setting it to a value equal to or greater than the
number of concurrent compactions make sense?
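
Concretely, I was thinking of polling the CompactionManager MBean over JMX,
along the lines of the rough sketch below (the host, port and threshold are
placeholders, and the bean/attribute names are from memory, so please correct
me if they're off):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class CompactionAlert {
        public static void main(String[] args) throws Exception {
            int alertThreshold = 32;  // e.g. >= the concurrent compaction slots
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://cassandra-host:7199/jmxrmi");
            JMXConnector jmx = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = jmx.getMBeanServerConnection();
                ObjectName cm = new ObjectName(
                    "org.apache.cassandra.db:type=CompactionManager");
                // Number of compactions queued but not yet running
                int pending = (Integer) mbs.getAttribute(cm, "PendingTasks");
                if (pending > alertThreshold) {
                    System.err.println("ALERT: pending compactions = " + pending);
                }
            } finally {
                jmx.close();
            }
        }
    }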

Thanks,
Greg




On Mon, Jun 9, 2014 at 2:14 PM, S C  wrote:

> Thank you all for the quick responses.
> --
> From: clohf...@blackbirdit.com
> Subject: Re: high pending compactions
> Date: Mon, 9 Jun 2014 14:11:36 -0500
> To: user@cassandra.apache.org
>
> Bean: org.apache.cassandra.db.CompactionManager
>
> also nodetool compactionstats gives you how many are in the queue +
> estimate of how many will be needed.
>
> in 1.1 you will OOM *far* before you hit the limit.  In theory though,
> the compaction executor is a little special-cased and will actually throw
> an exception (normally it will block).
>
> Chris
>
> On Jun 9, 2014, at 7:49 AM, S C  wrote:
>
> Thank you all for the valuable suggestions. A couple more questions:
>
> How do I check the compaction queue? MBean / C* system log?
> What happens if the queue is full?
>
> --
> From: colinkuo...@gmail.com
> Date: Mon, 9 Jun 2014 18:53:41 +0800
> Subject: Re: high pending compactions
> To: user@cassandra.apache.org
>
> As Jake suggested, you could first increase
> "compaction_throughput_mb_per_sec" and "concurrent_compactions" to suitable
> values if system resources allow. From my understanding, major
> compaction will internally acquire a lock before running. In your
> case, there might be a major compaction blocking the following pending
> compaction tasks. You could check the result of "nodetool compactionstats"
> and the C* system log to confirm.
>
> If the running compaction is compacting a wide row for a long time, you
> could try tuning the "in_memory_compaction_limit_in_mb" value.
>
> Thanks,
>
>
>
> On Sun, Jun 8, 2014 at 11:27 PM, S C  wrote:
>
> I am using Cassandra 1.1 (sorry, a bit old) and I am seeing a high pending
> compaction count: "pending tasks: 67" while active compaction tasks are
> no more than 5. I have a 24-CPU machine. Shouldn't I be seeing more
> compactions? Is this a pattern of high writes and compactions backing up?
> How can I improve this? Here are my thoughts.
>
>
>1. Increase memtable_total_space_in_mb
>2. Increase compaction_throughput_mb_per_sec
>3. Increase concurrent_compactions
>
>
> Sorry if this was discussed already. Any pointers is much appreciated.
>
> Thanks,
> Kumar
>
>
>


Index creation sometimes fails

2014-07-14 Thread Clint Kelly
Hi everyone,

I have some code that I've been fiddling with today that uses the
DataStax Java driver to create a table and then create a secondary
index on a column in that table.  I've testing this code fairly
thoroughly on a single-node Cassandra instance on my laptop and in
unit test (using the CassandraDaemon).

When running on a three-node cluster, however, I see strange behavior.
Although my table always gets created, the secondary index often does
not!  If I delete the table and then create it again (through the same
code that I've written), I've never seen the index fail to appear the
second time.

Does anyone have any idea what to look for here?  I have no experience
working on a Cassandra cluster and I wonder if maybe I am doing
something dumb (I basically just installed DSE and started up the
three nodes and that was it).  I don't see anything that looks unusual
in OpsCenter for DSE.

The only thing I've noticed is that the presence of output like the
following from my program after executing the command to create the
index is perfectly correlated with successful creation of the index:

14/07/14 17:40:01 DEBUG com.datastax.driver.core.Cluster: Received
event EVENT CREATED kiji_retail2.t_model_repo, scheduling delivery
14/07/14 17:40:01 DEBUG com.datastax.driver.core.ControlConnection:
[Control connection] Refreshing schema for kiji_retail2
14/07/14 17:40:01 DEBUG com.datastax.driver.core.Cluster: Refreshing
schema for kiji_retail2
14/07/14 17:40:01 DEBUG com.datastax.driver.core.ControlConnection:
Checking for schema agreement: versions are
[9a8d72f9-e384-3aa8-bc85-185e2c303ade,
b309518a-35d2-3790-bb66-ea39bb0d188c]
14/07/14 17:40:02 DEBUG com.datastax.driver.core.ControlConnection:
Checking for schema agreement: versions are
[9a8d72f9-e384-3aa8-bc85-185e2c303ade,
b309518a-35d2-3790-bb66-ea39bb0d188c]
14/07/14 17:40:02 DEBUG com.datastax.driver.core.ControlConnection:
Checking for schema agreement: versions are
[9a8d72f9-e384-3aa8-bc85-185e2c303ade,
b309518a-35d2-3790-bb66-ea39bb0d188c]
14/07/14 17:40:02 DEBUG com.datastax.driver.core.ControlConnection:
Checking for schema agreement: versions are
[9a8d72f9-e384-3aa8-bc85-185e2c303ade,
b309518a-35d2-3790-bb66-ea39bb0d188c]
14/07/14 17:40:02 DEBUG com.datastax.driver.core.ControlConnection:
Checking for schema agreement: versions are
[b309518a-35d2-3790-bb66-ea39bb0d188c]
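
One workaround I am considering is waiting for schema agreement myself
between the CREATE TABLE and CREATE INDEX statements, by polling the
schema_version columns in the system tables. A rough, untested sketch
(the method is mine, not driver API):

    import java.util.HashSet;
    import java.util.Set;
    import java.util.UUID;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    // Poll until every node reports the same schema version; only then
    // issue the CREATE INDEX statement.
    static void waitForSchemaAgreement(Session session)
            throws InterruptedException {
        while (true) {
            Set<UUID> versions = new HashSet<UUID>();
            for (Row peer : session.execute(
                    "SELECT schema_version FROM system.peers")) {
                versions.add(peer.getUUID("schema_version"));
            }
            versions.add(session.execute(
                    "SELECT schema_version FROM system.local WHERE key='local'")
                .one().getUUID("schema_version"));
            if (versions.size() == 1) {
                return;  // all nodes agree
            }
            Thread.sleep(200);
        }
    }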

If anyone can give me a hand, I would really appreciate it.  Beyond that
workaround, I am out of ideas!

Best regards,
Clint


Re: Index creation sometimes fails

2014-07-14 Thread Clint Kelly
BTW, I have seen this using versions 2.0.1 and 2.0.3 of the Java driver
on a three-node cluster with DSE 4.5.

On Mon, Jul 14, 2014 at 5:51 PM, Clint Kelly  wrote:
> Hi everyone,
>
> I have some code that I've been fiddling with today that uses the
> DataStax Java driver to create a table and then create a secondary
> index on a column in that table.  I've testing this code fairly
> thoroughly on a single-node Cassandra instance on my laptop and in
> unit test (using the CassandraDaemon).
>
> When running on a three-node cluster, however, I see strange behavior.
> Although my table always gets created, the secondary index often does
> not!  If I delete the table and then create it again (through the same
> code that I've written), I've never seen the index fail to appear the
> second time.
>
> Does anyone have any idea what to look for here?  I have no experience
> working on a Cassandra cluster and I wonder if maybe I am doing
> something dumb (I basically just installed DSE and started up the
> three nodes and that was it).  I don't see anything that looks unusual
> in OpsCenter for DSE.
>
> The only thing I've noticed is that the presence of output like the
> following from my program after executing the command to create the
> index is perfectly correlated with successful creation of the index:
>
> 14/07/14 17:40:01 DEBUG com.datastax.driver.core.Cluster: Received
> event EVENT CREATED kiji_retail2.t_model_repo, scheduling delivery
> 14/07/14 17:40:01 DEBUG com.datastax.driver.core.ControlConnection:
> [Control connection] Refreshing schema for kiji_retail2
> 14/07/14 17:40:01 DEBUG com.datastax.driver.core.Cluster: Refreshing
> schema for kiji_retail2
> 14/07/14 17:40:01 DEBUG com.datastax.driver.core.ControlConnection:
> Checking for schema agreement: versions are
> [9a8d72f9-e384-3aa8-bc85-185e2c303ade,
> b309518a-35d2-3790-bb66-ea39bb0d188c]
> 14/07/14 17:40:02 DEBUG com.datastax.driver.core.ControlConnection:
> Checking for schema agreement: versions are
> [9a8d72f9-e384-3aa8-bc85-185e2c303ade,
> b309518a-35d2-3790-bb66-ea39bb0d188c]
> 14/07/14 17:40:02 DEBUG com.datastax.driver.core.ControlConnection:
> Checking for schema agreement: versions are
> [9a8d72f9-e384-3aa8-bc85-185e2c303ade,
> b309518a-35d2-3790-bb66-ea39bb0d188c]
> 14/07/14 17:40:02 DEBUG com.datastax.driver.core.ControlConnection:
> Checking for schema agreement: versions are
> [9a8d72f9-e384-3aa8-bc85-185e2c303ade,
> b309518a-35d2-3790-bb66-ea39bb0d188c]
> 14/07/14 17:40:02 DEBUG com.datastax.driver.core.ControlConnection:
> Checking for schema agreement: versions are
> [b309518a-35d2-3790-bb66-ea39bb0d188c]
>
> If anyone can give me a hand, I would really appreciate it.  I am out of 
> ideas!
>
> Best regards,
> Clint