org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: Corrupt (negative) value length encountered

2014-02-28 Thread Shammi Jayasinghe
Hi,

We are running Apache Cassandra 1.2.13 with three nodes. Under high load we are
getting the following exception [1].

Could someone help with this? It has already been reported in [2].

[1]

 INFO [ScheduledTasks:1] 2014-02-27 21:56:59,928 GCInspector.java (line 119) GC for ParNew: 241 ms for 1 collections, 1191010416 used; max is 8375238656
 INFO [MemoryMeter:1] 2014-02-27 21:57:38,322 Memtable.java (line 516) CFS(Keyspace='QpidKeySpace', ColumnFamily='QueueEntries') liveRatio is 49.411764705882355 (just-counted was 49.411764705882355).  calculation took 0ms for 1 columns
ERROR [ReadStage:923] 2014-02-27 22:01:42,284 CassandraDaemon.java (line 191) Exception in thread Thread[ReadStage:923,5,main]
java.lang.RuntimeException: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: Corrupt (negative) value length encountered
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1614)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: Corrupt (negative) value length encountered
    at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.fetchMoreData(IndexedSliceReader.java:357)
    at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:166)
    at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:50)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:90)
    at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:171)
    at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:154)
    at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:143)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:122)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:96)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:160)
    at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:136)
    at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84)
    at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:291)
    at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1397)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1213)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1129)
    at org.apache.cassandra.db.Table.getRow(Table.java:344)
    at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70)
    at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1058)
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1610)
    ... 3 more
Caused by: java.io.IOException: Corrupt (negative) value length encountered
    at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:352)
    at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:102)
    at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:92)
    at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:73)
    at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:398)
    at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.fetchMoreData(IndexedSliceReader.java:353)
    ... 27 more

[2] https://issues.apache.org/jira/browse/CASSANDRA-6536
-- 
Best Regards,

Shammi Jayasinghe
Associate Tech Lead
WSO2, Inc.; http://wso2.com,
mobile: +94 71 4493085


CorruptSSTableException in system_auth keyspace

2014-02-28 Thread Ondřej Černoš
Hello,

we are trying to add authentication to our Cassandra cluster. We add our
authenticated users during puppet deployment using the default user, which
is then disabled.

We have the following issues:

- we see a CorruptSSTableException on the system_auth.users table
- we are not able to re-add users after deleting them, which may be explained by
the following statement found in the source code: "INSERT INTO %s.%s (username,
salted_hash) VALUES ('%s', '%s') USING TIMESTAMP 0" (note the TIMESTAMP 0 - is
this really correct?)

nodetool scrub didn't help, and compaction didn't help - the tombstones were
still there, as was the exception.

Has anybody else seen this?

It's cassandra 1.2.11 with vnodes on.

regards,
ondrej cernos


Re: CorruptSSTableException in system_auth keyspace

2014-02-28 Thread Ondřej Černoš
Sorry, I sent the mail too early.

This is the stack trace:

2014-02-28 10:56:03.205+0100 [SSTableBatchOpen:1] [ERROR] DebuggableThreadPoolExecutor.java(218) org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor: Error in ThreadPoolExecutor
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
   at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:108)
   at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63)
   at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
   at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:407)
   at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:198)
   at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157)
   at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:262)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.EOFException
   at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
   at java.io.DataInputStream.readUTF(DataInputStream.java:589)
   at java.io.DataInputStream.readUTF(DataInputStream.java:564)
   at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:83)
   ... 11 more

Snappy is used for compression on this table.

ondrej c.


On Fri, Feb 28, 2014 at 11:09 AM, Ondřej Černoš  wrote:

> Hello,
>
> we are trying to add authentication to our Cassandra cluster. We add our
> authenticated users during puppet deployment using the default user, which
> is then disabled.
>
> We have the following issues:
>
> - we see CorruptSSTableException in system_auth.users table
> - we are not able to add users after delete, which can be explained by the
> following statement found in the source code: "INSERT INTO %s.%s
> (username, salted_hash) VALUES ('%s', '%s') USING TIMESTAMP 0" (see the 0 -
> is this really correct?)
>
> nodetool scrub didn't help, compactation didn't help - tombstones were
> still there, as well as the exception.
>
> Has anybody else seen this?
>
> It's cassandra 1.2.11 with vnodes on.
>
> regards,
> ondrej cernos
>


Re: CQL: Any way to have inequalities on multiple clustering columns in a WHERE clause?

2014-02-28 Thread Clint Kelly
Yes, thank you!


On Thu, Feb 27, 2014 at 10:26 PM, DuyHai Doan  wrote:

> Clint, what you want is this :
> https://issues.apache.org/jira/browse/CASSANDRA-4851
>
> select * from foo where key=something and fam = 'Info' and (qual,version)
> > ('A',2013) and qual < 'D' ALLOW FILTERING
>
>
> On Fri, Feb 28, 2014 at 6:57 AM, Clint Kelly wrote:
>
>> All,
>>
>> Is there any way to have inequalities comparisons on multiple clustering
>> columns in a WHERE clause in CQL?  For example, I'd like to do:
>>
>> select * from foo where fam = 'Info' and qual > 'A' and qual < 'D' and
>> version > 2013 ALLOW FILTERING;
>>
>> I get an error:
>>
>> Bad Request: PRIMARY KEY part version cannot be restricted (preceding
>> part qual is either not restricted or by a non-EQ relation)
>>
>> when I try this.  Is there any way to make a query like this work?  I
>> understand that this query will not be a nice, continuous scan of data, but
>> I'd rather have a slower query than have to do all of this filtering on the
>> client side.  Any other suggestions?
>>
>> BTW my table looks like this:
>>
>> CREATE TABLE foo (
>>   key text,
>>   fam text,
>>   qual text,
>>   version int,
>>   val text,
>>   PRIMARY KEY (key, fam, qual, version)
>> ) WITH
>>   bloom_filter_fp_chance=0.01 AND
>>   caching='KEYS_ONLY' AND
>>   comment='' AND
>>   dclocal_read_repair_chance=0.00 AND
>>   gc_grace_seconds=864000 AND
>>   index_interval=128 AND
>>   read_repair_chance=0.10 AND
>>   replicate_on_write='true' AND
>>   populate_io_cache_on_flush='false' AND
>>   default_time_to_live=0 AND
>>   speculative_retry='99.0PERCENTILE' AND
>>   memtable_flush_period_in_ms=0 AND
>>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>>   compression={'sstable_compression': 'LZ4Compressor'};
>>
>> Best regards,
>> Clint
>>
>
>


Compaction does not remove tombstones if column has higher TTL

2014-02-28 Thread Keith Wright
FYI – I recently filed https://issues.apache.org/jira/browse/CASSANDRA-6654 and 
wanted to let everyone know the result as it was not what I expected.   I am 
using C* 1.2.12 and found that my droppable tombstone ratio kept increasing on 
an LCS table (currently > .3).  Documentation states that compactions should be 
triggered when that gets above .2 to help clean up tombstones, and in my case 
compactions are definitely not running behind.

I am setting different TTLs on different columns (this capability is one of the 
things I love about Cassandra), and the outcome of the ticket is that an expired 
column will NOT be removed until the maximum TTL of any column within its row 
has also elapsed. In my case, I was setting some columns to 6 months and others 
to 7 days, so the 7-day data will in fact NOT be removed for 6 months! This 
results in MUCH wider rows than I expected.
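
For illustration, the write pattern looks roughly like this (the table and 
column names here are made up, not my real schema); each TTL is set by a 
separate write with its own USING TTL clause:

UPDATE user_events USING TTL 15552000              -- roughly 6 months
   SET summary = 'login'
 WHERE user_id = 42;

UPDATE user_events USING TTL 604800                -- 7 days
   SET details = details + {'ip': '10.0.0.1'}      -- details is a map column
 WHERE user_id = 42;

Per the ticket, on 1.2/2.0 the expired 7-day cells in a row like this only 
become eligible for removal once the 6-month cells have also expired.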

It appears that this was likely fixed in 2.1, but obviously people will not be 
deploying that to production anytime soon. My workaround is to stop setting the 
6-month TTL and instead let those columns live forever, so that the smaller TTLs 
are respected. This is an acceptable tradeoff for me since the 7-day columns are 
the ones that grow much larger (they are written against a map column type).

So be warned, mixing TTLs in a row does not appear to result in the data being 
compacted away.

Thanks


Query on blob col using CQL3

2014-02-28 Thread Senthil, Athinanthny X. -ND
Can anyone suggest how to query a blob column via CQL3? I get a bad request 
error saying the data cannot be parsed. I want to look up rows by the key 
column, which is defined as a blob.

I am, however, able to look up the data via the OpsCenter data explorer. Is 
there a conversion function I need to use?




Sent from my Galaxy S®III


Re: Query on blob col using CQL3

2014-02-28 Thread Mikhail Stepura

Did you try http://cassandra.apache.org/doc/cql3/CQL.html#blobFun ?
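
For example, with the typeAsBlob()/blobAsType() functions documented there, the 
lookup would look something like this (the table and key below are just a 
sketch, not your schema):

-- assuming: CREATE TABLE blob_keyed (key blob PRIMARY KEY, val text);
SELECT * FROM blob_keyed WHERE key = textAsBlob('row-001');
SELECT * FROM blob_keyed WHERE key = 0x726f772d303031;   -- same key as a hex blob literal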


On 2/28/14, 9:14, Senthil, Athinanthny X. -ND wrote:

Anyone can suggest how to query on blob column via CQL3. I get  bad
request error saying cannot parse data. I want to lookup on key column
which is defined as blob.

But I am able to lookup data via opscenter data explorer.  Is there a
conversion functions I need to use?




Sent from my Galaxy S®III





Re: Query on blob col using CQL3

2014-02-28 Thread Peter Lin
Why are you trying to view a blob with CQL3, and what kind of blob is it?

If the blob is an object, there's no way to view it in CQL3. You'd need to do
extra work, like user-defined types, but I don't know of anyone who's actually
using that.


On Fri, Feb 28, 2014 at 12:14 PM, Senthil, Athinanthny X. -ND <
athinanthny.x.senthil@disney.com> wrote:

> Anyone can suggest how to query on blob column via CQL3. I get  bad
> request error saying cannot parse data. I want to lookup on key column
> which is defined as blob.
>
> But I am able to lookup data via opscenter data explorer.  Is there a
> conversion functions I need to use?
>
>
>
>
> Sent from my Galaxy S®III
>


Re: Getting the most-recent version from time-series data

2014-02-28 Thread Clint Kelly
Hi Tupshin,

Thanks for your help once again, I really appreciate it.  Quick question
regarding the issue of token-aware routing, etc.  Let's say that I am using
the table described earlier:

CREATE TABLE time_series_stuff (
  key text,
  family text,
  version int,
  val text,
  PRIMARY KEY (key, family, version)
) WITH CLUSTERING ORDER BY (family ASC, version DESC)

I want to retrieve values for the most-recent version of every family for a
given key, doing something like:

SELECT * from time_series_stuff where key='mykey'

but getting only one version per family.

All of this data should live on the same node (or set of replica nodes),
correct?  I am specifying the partition key here, and I thought that the
partition key alone determines which physical nodes the data lives on.
Therefore, I would think that all of the results from this query would come
from a single replica node (or set of replica nodes, if the consistency
level is greater than 1).

Would you mind clarifying?  Thanks a lot!

Best regards,
Clint






On Wed, Feb 26, 2014 at 4:56 AM, Tupshin Harper  wrote:

> And one last clarification. Where I said "stored procedure" earlier, I
> meant "prepared statement". Sorry for the confusion. Too much typing while
> tired.
>
> -Tupshin
>
>
> On Tue, Feb 25, 2014 at 10:36 PM, Tupshin Harper wrote:
>
>> I failed to address the matter of not knowing the families in advance.
>>
>> I can't really recommend any solution to that other than storing the list
>> of families in another structure that is readily queryable. I don't know
>> how many families you are thinking, but if it is in the millions or more,
>> You might consider constructing another table such as:
>> CREATE TABLE families (
>>   key int,
>>   family text,
>>   PRIMARY KEY (key, family)
>> );
>>
>>
>> store your families there, with a knowable set of keys (I suggest
>> something like the last 3 digits of the md5 hash of the family). So then
>> you could retrieve your families in nice sized batches
>> SELECT family FROM id WHERE key=0;
>> and then do the fan-out selects that I described previously.
>>
>> -Tupshin
>>
>>
>> On Tue, Feb 25, 2014 at 10:15 PM, Tupshin Harper wrote:
>>
>>> Hi Clint,
>>>
>>> What you are describing could actually be accomplished with the Thrift
>>> API and a multiget_slice with a slicerange having a count of 1. Initially I
>>> was thinking that this was an important feature gap between Thrift and CQL,
>>> and was going to suggest that it should be implemented (possible syntax is
>>> in https://issues.apache.org/jira/browse/CASSANDRA-6167 which is almost
>>> a superset of this feature).
>>>
>>> But then I was convinced by some colleagues, that with a modern CQL
>>> driver that is token aware, you are actually better off (in terms of
>>> latency, throughput, and reliability), by doing each query separately on
>>> the client.
>>>
>>> The reasoning is that if you did this with a single query, it would
>>> necessarily be sent to a coordinator that wouldn't own most of the data
>>> that you are looking for. That coordinator would then need to fan out the
>>> read to all the nodes owning the partitions you are looking for.
>>>
>>> Far better to just do it directly on the client. The token aware client
>>> will send each request for a row straight to a node that owns it. With a
>>> separate connection open to each node, this is done in parallel from the
>>> get-go. Fewer hops. Less load on the coordinator. No bottlenecks. And with
>>> a stored procedure, very very little additional overhead to the client,
>>> server, or network.
>>>
>>> -Tupshin
>>>
>>>
>>> On Tue, Feb 25, 2014 at 7:48 PM, Clint Kelly wrote:
>>>
 Hi everyone,

 Let's say that I have a table that looks like the following:

 CREATE TABLE time_series_stuff (
   key text,
   family text,
   version int,
   val text,
   PRIMARY KEY (key, family, version)
 ) WITH CLUSTERING ORDER BY (family ASC, version DESC) AND
   bloom_filter_fp_chance=0.01 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='99.0PERCENTILE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'LZ4Compressor'};

 cqlsh:fiddle> select * from time_series_stuff ;

  key| family  | version | val
 +-+-+
  monday | revenue |   3 | $$
  monday | revenue |   2 |$$$
  monday | revenue |   1 | $$
  monday | revenue |   0 |  $
  monday | traffic |   2 | medium
  monday | traffic |   1 |  light
  monday | traffic |   0 |  heavy

 (

Re: Combine multiple SELECT statements into one RPC?

2014-02-28 Thread Clint Kelly
Hi Sylvain,

Thanks for your response.  I am writing code to allow users to query a
table that looks something like this:

CREATE TABLE time_series_stuff (
  key text,
  family text,
  qualifier text,
  version bigint,
  val blob,
  PRIMARY KEY (key, family, qualifier, version)
) WITH CLUSTERING ORDER BY (family ASC, qualifier ASC, version DESC)

To a user, working with this table looks something like working with a
nested map (from family -> qualifier -> version -> value).

A user of our system can create a single data request that indicates that
he or she would like to fetch data for a given (family, qualifier, version)
value, but a user can also do things like:

   - Specify only a (family, qualifier) -> fetch all of the versions
   - Specify (family, qualifier, min and max versions) -> fetch all of the
   versions in a range
   - Specify (family) -> fetch all qualifiers, all versions in a family
   - Specify (family, *, min and max versions) -> fetch all values with
   version range for any qualifier

etc.

So some of the data requests from users can translate into multiple C*
SELECT statements.  I was just looking for a way to combine them into a
single client / server transaction, if possible.  At the moment I'm just
issuing multiple SELECT statements, which is fine, but I was wondering if
there was a way to combine them together somehow.
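
For reference, the request shapes above translate into separate statements like 
these (column names are from the table above; the bind markers stand in for 
user-supplied values in prepared statements):

SELECT val FROM time_series_stuff
 WHERE key = ? AND family = ? AND qualifier = ? AND version = ?;   -- one exact cell

SELECT version, val FROM time_series_stuff
 WHERE key = ? AND family = ? AND qualifier = ?;                   -- all versions of a qualifier

SELECT version, val FROM time_series_stuff
 WHERE key = ? AND family = ? AND qualifier = ?
   AND version <= ? AND version >= ?;                              -- a version range

SELECT qualifier, version, val FROM time_series_stuff
 WHERE key = ? AND family = ?;                                     -- a whole family

The "(family, *, version range)" case is the one that cannot be a single SELECT 
here (version cannot be restricted unless qualifier is), which is why such a 
request fans out into several statements.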

Thanks for your help!

Best regards,
Clint






On Thu, Feb 27, 2014 at 1:05 AM, Sylvain Lebresne wrote:

> On Thu, Feb 27, 2014 at 1:00 AM, Clint Kelly wrote:
>
>> Hi all,
>>
>> Is there any way to use the DataStax Java driver to combine multiple
>> SELECT statements into a single RPC?  I assume not (I could not find
>> anything about this in the documentation), but I just wanted to check.
>>
>
> The short answer is no.
>
> The slightly longer answer is that the DataStax Java driver uses the
> so-called native protocol. And that protocol does not allow to have
> multiple SELECT into a single protocol message (the protocol is not really
> RPC-based strictly speaking so I'll assume you meant one client->server
> message here), and it follows that the driver can't either. But I'll note
> that the reason why the protocol doesn't have such a thing is that it's
> generally a better idea to parallelize your SELECT client side, though
> since you haven't provided much context for you question I'd rather not go
> into too much details here since that might be off-topic.
>
> --
> Sylvain
>


Re: Getting the most-recent version from time-series data

2014-02-28 Thread Clint Kelly
Hi Tupshin,

BTW, you asked earlier about the number of distinct "family"
values.  There could easily be millions of different families, each with
many different values.  Right now I see two options:

   1. Query the table once just to get all of the distinct families, then
   do separate queries for each family to get the most-recent version.
   2. Read back all of the versions of all of the families and then filter
   on the client side.

Neither of these is a great solution, although once we have users
reading back millions of values in a single query, they will have to
indicate (to our software that sits on top of C*) that they are going to
use paging, and then we are going to be doing multiple client / server
operations anyway.  I'd just like to minimize them.  :)
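
For option 1, each per-family read would at least be a single cheap query; a 
sketch, reusing the time_series_stuff schema and the "families" lookup table 
Tupshin suggested (with made-up key and family values), looks like:

SELECT family FROM families WHERE key = 0;          -- repeated for each bucket key

-- then, for every family returned, CLUSTERING ORDER BY (family ASC, version DESC)
-- means the newest version comes back first:
SELECT * FROM time_series_stuff
 WHERE key = 'mykey' AND family = 'revenue'
 LIMIT 1;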

Best regards,
Clint




On Fri, Feb 28, 2014 at 9:47 AM, Clint Kelly  wrote:

> Hi Tupshin,
>
> Thanks for your help once again, I really appreciate it.  Quick question
> regarding the issue of token-aware routing, etc.  Let's say that I am using
> the table described earlier:
>
>
> CREATE TABLE time_series_stuff (
>   key text,
>   family text,
>   version int,
>   val text,
>   PRIMARY KEY (key, family, version)
> ) WITH CLUSTERING ORDER BY (family ASC, version DESC)
>
> I want to retrieve values for the most-recent version of every family for
> a given key, doing something like:
>
> SELECT * from time_series_stuff where key='mykey'
>
> but getting only one version per family.
>
> All of this data should live on the same node (or set of replica nodes),
> correct?  I am specifying the partition key here, and I thought that only
> the partition key determined on what physical nodes data exists.
> Therefore, I would think that all of the results from this query would come
> from a single replica node (or set of replica nodes, if the consistency
> level is greater than 1).
>
> Would you mind clarifying?  Thanks a lot!
>
> Best regards,
> Clint
>
>
>
>
>
>
> On Wed, Feb 26, 2014 at 4:56 AM, Tupshin Harper wrote:
>
>> And one last clarification. Where I said "stored procedure" earlier, I
>> meant "prepared statement". Sorry for the confusion. Too much typing while
>> tired.
>>
>> -Tupshin
>>
>>
>> On Tue, Feb 25, 2014 at 10:36 PM, Tupshin Harper wrote:
>>
>>> I failed to address the matter of not knowing the families in advance.
>>>
>>> I can't really recommend any solution to that other than storing the
>>> list of families in another structure that is readily queryable. I don't
>>> know how many families you are thinking, but if it is in the millions or
>>> more, You might consider constructing another table such as:
>>> CREATE TABLE families (
>>>   key int,
>>>   family text,
>>>   PRIMARY KEY (key, family)
>>> );
>>>
>>>
>>> store your families there, with a knowable set of keys (I suggest
>>> something like the last 3 digits of the md5 hash of the family). So then
>>> you could retrieve your families in nice sized batches
>>> SELECT family FROM id WHERE key=0;
>>> and then do the fan-out selects that I described previously.
>>>
>>> -Tupshin
>>>
>>>
>>> On Tue, Feb 25, 2014 at 10:15 PM, Tupshin Harper wrote:
>>>
 Hi Clint,

 What you are describing could actually be accomplished with the Thrift
 API and a multiget_slice with a slicerange having a count of 1. Initially I
 was thinking that this was an important feature gap between Thrift and CQL,
 and was going to suggest that it should be implemented (possible syntax is
 in https://issues.apache.org/jira/browse/CASSANDRA-6167 which is
 almost a superset of this feature).

 But then I was convinced by some colleagues, that with a modern CQL
 driver that is token aware, you are actually better off (in terms of
 latency, throughput, and reliability), by doing each query separately on
 the client.

 The reasoning is that if you did this with a single query, it would
 necessarily be sent to a coordinator that wouldn't own most of the data
 that you are looking for. That coordinator would then need to fan out the
 read to all the nodes owning the partitions you are looking for.

 Far better to just do it directly on the client. The token aware client
 will send each request for a row straight to a node that owns it. With a
 separate connection open to each node, this is done in parallel from the
 get-go. Fewer hops. Less load on the coordinator. No bottlenecks. And with
 a stored procedure, very very little additional overhead to the client,
 server, or network.

 -Tupshin


 On Tue, Feb 25, 2014 at 7:48 PM, Clint Kelly wrote:

> Hi everyone,
>
> Let's say that I have a table that looks like the following:
>
> CREATE TABLE time_series_stuff (
>   key text,
>   family text,
>   version int,
>   val text,
>   PRIMARY KEY (key, family, version)
> ) WITH CLUSTERING ORDER BY (family ASC, version DESC) AND
>   bloo

Caching prepared queries and different consistency levels

2014-02-28 Thread Wayne Schroeder
After upgrading to the 2.0 driver branch, I received a lot of warnings about 
re-preparing previously prepared statements.  I read about this issue, and my 
workaround was to cache my prepared statements in a Map<String, PreparedStatement> 
internally in my app via a common prepare method, where the string key was the 
CQL query itself.  This has been working perfectly, but I 
realized today that the consistency level I was setting on BoundStatement is 
actually inherited from Statement.  Now, while it is obviously not the same 
object instance (the BoundStatement vs the cached PreparedStatement), I was 
concerned that I was inadvertently changing the consistency level of the cached 
PreparedStatement in a non thread safe fashion.  My impression had been that 
the BoundStatement, even though created against a cached/shared 
PreparedStatement, was mine to do with what I pleased exclusively in my thread 
context.  Is this a correct/incorrect assumption?

I guess what it boils down to is the following: are the consistency levels of 
the PreparedStatement and the BoundStatement linked when the BoundStatement is 
created, so that modifying the consistency level of the BoundStatement modifies 
the underlying PreparedStatement?

What I am hoping is the case is that the PreparedStatement's consistency level 
is just used to initialize the BoundStatement and that the BoundStatement's 
consistency level is then used when executing the query.

Wayne



Any way to get a list of per-node token ranges using the DataStax Java driver?

2014-02-28 Thread Clint Kelly
Hi everyone,

I've been working on a rewrite of the Cassandra InputFormat for Hadoop 2
using the DataStax Java driver instead of the Thrift API.

I have a prototype working now, but there is one bit of code that I have
not been able to replace with code for the Java driver.  In the
InputFormat#getSplits method, the old code has a call like the following:

  map = client.describe_ring(ConfigHelper.getInputKeyspace(conf));

This gets a list of the distinct token ranges for the Cassandra cluster.

The rest of "getSplits" then takes these key ranges, breaks them up into
subranges (to match the user-specified input split size), and then gets the
replica nodes for the various token ranges (as the locations for the
splits).

Does anyone know how I can do the following with the native protocol?

   - Get the distinct token ranges for the C* cluster
   - Get the set of replica nodes for a given range of tokens?

I tried looking around in Cluster and Metadata, among other places, in the
API docs, but I didn't see anything that looked like it would do what I
want.

Thanks!

Best regards,
Clint


Re: Caching prepared queries and different consistency levels

2014-02-28 Thread Wayne Schroeder
Well, it may seem like I'm talking to myself now with this response, but I 
cracked open the source and found the answer in fairly short order so I figured 
I would share what I found.  Datastax folks, please do verify that I'm correct 
if you don't mind.

Long story short: a BoundStatement initializes its consistency level from the 
PreparedStatement, and that's where the link stops -- it is not connected back to 
the original PreparedStatement.  The thing I have to be careful about is changing 
the consistency level of the PreparedStatements that I am caching, as that would 
effectively change the default of ONE that I am expecting.  This is obviously 
specific to my application, but hopefully it helps anyone who has followed the 
same pattern.

Wayne


On Feb 28, 2014, at 12:18 PM, Wayne Schroeder 
 wrote:

> After upgrading to the 2.0 driver branch, I received a lot of warnings about 
> re-preparing previously prepared statements.  I read about this issue, and my 
> workaround was to cache my prepared statements in a Map<String, PreparedStatement> 
> internally in my app via a common prepare method, where 
> the string key was the CQL query itself.  This has been working perfectly, 
> but I realized today that the consistency level I was setting on 
> BoundStatement is actually inherited from Statement.  Now, while it is 
> obviously not the same object instance (the BoundStatement vs the cached 
> PreparedStatement), I was concerned that I was inadvertently changing the 
> consistency level of the cached PreparedStatement in a non thread safe 
> fashion.  My impression had been that the BoundStatement, even though created 
> against a cached/shared PreparedStatement, was mine to do with what I pleased 
> exclusively in my thread context.  Is this a correct/incorrect assumption?
> 
> I guess what it boils down to is the following:   Are the consistency level 
> in the PreparedStatement and BoundStatement linked when the BoundStatement is 
> created so that modifying the consistency level of the BoundStatement 
> modifies the underlying PreparedStatement?
> 
> What I am hoping is the case is that the PreparedStatement's consistency 
> level is just used to initialize the BoundStatement and that the 
> BoundStatement's consistency level is then used when executing the query.
> 
> Wayne
> 



Re:

2014-02-28 Thread Tyler Hobbs
Can you clarify exactly what you need help with?  It seems like you already
know how to fetch the timestamps.  Are you just looking for python code to
filter data that's not in a time range?

By the way, there's a pycassa-specific mailing list here:
https://groups.google.com/forum/#!forum/pycassa-discuss


On Thu, Feb 27, 2014 at 2:48 PM, Kumar Ranjan  wrote:

> Hey folks,
>
> I am dealing with a legacy CFs where super_column has been used and python
> client pycassa is being used. An example is given below. My question here
> is, can I make use of  include_timestamp to select data between two
> returned timestamps e.g between 1393516744591751 and 1393516772131811. This
> is not exactly timeseries but just selected between two. Please help on
> this?
>
>
> Data is inserted like this
>
> TEST_CF.insert('test_r_key',{'1234': {'key_name_1': 'taf_test_1'}})
>
>
> Data Fetch:
>
> TEST_CF.get('test_r_key', include_timestamp=True)
>
>
> OrderedDict([('1234', OrderedDict([('key_name_1', (u'taf_test_1',
> 1393451990902345))])),
>
>  ('1235', OrderedDict([('key_name_2', (u'taf_test_2',
> 1393516744591751))])),
>
>  ('1236', OrderedDict([('key_name_3', (u'taf_test_3',
> 1393516772131782))]))
>
>  ('1237', OrderedDict([('key_name_4', (u'taf_test_4',
> 1393516772131799))]))
>
>  ('1238', OrderedDict([('key_name_5', (u'taf_test_5',
> 1393516772131811))]))
>
>  ('1239', OrderedDict([('key_name_6', (u'taf_test_6',
> 1393516772131854))]))
>
>  ('1240', OrderedDict([('key_name_7', (u'taf_test_7',
> 1393516772131899))]))
>
> ])
>



-- 
Tyler Hobbs
DataStax 


Re:

2014-02-28 Thread Kumar Ranjan
Yes, filter out based on a time range. Currently I do this in Python. Just 
curious to see if this can be done using pycassa somehow?

Sent from Mailbox for iPhone

On Fri, Feb 28, 2014 at 2:13 PM, Tyler Hobbs  wrote:

> Can you clarify exactly what you need help with?  It seems like you already
> know how to fetch the timestamps.  Are you just looking for python code to
> filter data that's not in a time range?
> By the way, there's a pycassa-specific mailing list here:
> https://groups.google.com/forum/#!forum/pycassa-discuss
> On Thu, Feb 27, 2014 at 2:48 PM, Kumar Ranjan  wrote:
>> Hey folks,
>>
>> I am dealing with a legacy CFs where super_column has been used and python
>> client pycassa is being used. An example is given below. My question here
>> is, can I make use of  include_timestamp to select data between two
>> returned timestamps e.g between 1393516744591751 and 1393516772131811. This
>> is not exactly timeseries but just selected between two. Please help on
>> this?
>>
>>
>> Data is inserted like this
>>
>> TEST_CF.insert('test_r_key',{'1234': {'key_name_1': 'taf_test_1'}})
>>
>>
>> Data Fetch:
>>
>> TEST_CF.get('test_r_key', include_timestamp=True)
>>
>>
>> OrderedDict([('1234', OrderedDict([('key_name_1', (u'taf_test_1',
>> 1393451990902345))])),
>>
>>  ('1235', OrderedDict([('key_name_2', (u'taf_test_2',
>> 1393516744591751))])),
>>
>>  ('1236', OrderedDict([('key_name_3', (u'taf_test_3',
>> 1393516772131782))]))
>>
>>  ('1237', OrderedDict([('key_name_4', (u'taf_test_4',
>> 1393516772131799))]))
>>
>>  ('1238', OrderedDict([('key_name_5', (u'taf_test_5',
>> 1393516772131811))]))
>>
>>  ('1239', OrderedDict([('key_name_6', (u'taf_test_6',
>> 1393516772131854))]))
>>
>>  ('1240', OrderedDict([('key_name_7', (u'taf_test_7',
>> 1393516772131899))]))
>>
>> ])
>>
> -- 
> Tyler Hobbs
> DataStax 

Re:

2014-02-28 Thread Tyler Hobbs
No, pycassa won't do anything fancy with timestamps automatically; you'll
have to keep doing that filtering yourself.


On Fri, Feb 28, 2014 at 1:28 PM, Kumar Ranjan  wrote:

> Yes, filter out based on time range. Currently i do this in python . Just
> curious to see if this can be done using pycassa somehow?
> --
> Sent from Mailbox  for iPhone
>
>
> On Fri, Feb 28, 2014 at 2:13 PM, Tyler Hobbs  wrote:
>
>> Can you clarify exactly what you need help with?  It seems like you
>> already know how to fetch the timestamps.  Are you just looking for python
>> code to filter data that's not in a time range?
>>
>> By the way, there's a pycassa-specific mailing list here:
>> https://groups.google.com/forum/#!forum/pycassa-discuss
>>
>>
>> On Thu, Feb 27, 2014 at 2:48 PM, Kumar Ranjan wrote:
>>
>>>  Hey folks,
>>>
>>> I am dealing with a legacy CFs where super_column has been used and
>>> python client pycassa is being used. An example is given below. My question
>>> here is, can I make use of  include_timestamp to select data between two
>>> returned timestamps e.g between 1393516744591751 and 1393516772131811. This
>>> is not exactly timeseries but just selected between two. Please help on
>>> this?
>>>
>>>
>>> Data is inserted like this
>>>
>>> TEST_CF.insert('test_r_key',{'1234': {'key_name_1': 'taf_test_1'}})
>>>
>>>
>>> Data Fetch:
>>>
>>> TEST_CF.get('test_r_key', include_timestamp=True)
>>>
>>>
>>> OrderedDict([('1234', OrderedDict([('key_name_1', (u'taf_test_1',
>>> 1393451990902345))])),
>>>
>>>  ('1235', OrderedDict([('key_name_2', (u'taf_test_2',
>>> 1393516744591751))])),
>>>
>>>  ('1236', OrderedDict([('key_name_3', (u'taf_test_3',
>>> 1393516772131782))]))
>>>
>>>  ('1237', OrderedDict([('key_name_4', (u'taf_test_4',
>>> 1393516772131799))]))
>>>
>>>  ('1238', OrderedDict([('key_name_5', (u'taf_test_5',
>>> 1393516772131811))]))
>>>
>>>  ('1239', OrderedDict([('key_name_6', (u'taf_test_6',
>>> 1393516772131854))]))
>>>
>>>  ('1240', OrderedDict([('key_name_7', (u'taf_test_7',
>>> 1393516772131899))]))
>>>
>>>  ])
>>>
>>
>>
>>
>> --
>> Tyler Hobbs
>> DataStax 
>>
>
>


-- 
Tyler Hobbs
DataStax 


Re:

2014-02-28 Thread Kumar Ranjan
Thanks Tyler. Yes, I scanned through the pycassaShell code a couple of times but
did not find anything like that.


On Fri, Feb 28, 2014 at 3:24 PM, Tyler Hobbs  wrote:

> No, pycassa won't do anything fancy with timestamps automatically, you'll
> have to keep doing yourself.
>
>
> On Fri, Feb 28, 2014 at 1:28 PM, Kumar Ranjan wrote:
>
>> Yes, filter out based on time range. Currently i do this in python . Just
>> curious to see if this can be done using pycassa somehow?
>> --
>> Sent from Mailbox  for iPhone
>>
>>
>> On Fri, Feb 28, 2014 at 2:13 PM, Tyler Hobbs  wrote:
>>
>>> Can you clarify exactly what you need help with?  It seems like you
>>> already know how to fetch the timestamps.  Are you just looking for python
>>> code to filter data that's not in a time range?
>>>
>>> By the way, there's a pycassa-specific mailing list here:
>>> https://groups.google.com/forum/#!forum/pycassa-discuss
>>>
>>>
>>> On Thu, Feb 27, 2014 at 2:48 PM, Kumar Ranjan wrote:
>>>
  Hey folks,

 I am dealing with a legacy CFs where super_column has been used and
 python client pycassa is being used. An example is given below. My question
 here is, can I make use of  include_timestamp to select data between two
 returned timestamps e.g between 1393516744591751 and 1393516772131811. This
 is not exactly timeseries but just selected between two. Please help on
 this?


 Data is inserted like this

 TEST_CF.insert('test_r_key',{'1234': {'key_name_1': 'taf_test_1'}})


 Data Fetch:

 TEST_CF.get('test_r_key', include_timestamp=True)


 OrderedDict([('1234', OrderedDict([('key_name_1', (u'taf_test_1',
 1393451990902345))])),

  ('1235', OrderedDict([('key_name_2', (u'taf_test_2',
 1393516744591751))])),

  ('1236', OrderedDict([('key_name_3', (u'taf_test_3',
 1393516772131782))]))

  ('1237', OrderedDict([('key_name_4', (u'taf_test_4',
 1393516772131799))]))

  ('1238', OrderedDict([('key_name_5', (u'taf_test_5',
 1393516772131811))]))

  ('1239', OrderedDict([('key_name_6', (u'taf_test_6',
 1393516772131854))]))

  ('1240', OrderedDict([('key_name_7', (u'taf_test_7',
 1393516772131899))]))

  ])

>>>
>>>
>>>
>>> --
>>> Tyler Hobbs
>>> DataStax 
>>>
>>
>>
>
>
> --
> Tyler Hobbs
> DataStax 
>


Resetting a counter in CQL

2014-02-28 Thread Clint Kelly
Folks,

What is the best known method for resetting a counter in CQL?  Is it best
to read the counter and then increment it by a negative amount?  Or to
delete the row and then increment it by zero?

These are the two methods I could come up with.  Both of these seem fine to
me; I'm just wondering if there is a standard way to do this.  Thanks!

Best regards,
Clint


Re: Resetting a counter in CQL

2014-02-28 Thread Tyler Hobbs
On Fri, Feb 28, 2014 at 6:32 PM, Clint Kelly  wrote:

>
>
> What is the best known method for resetting a counter in CQL?  Is it best
> to read the counter and then increment it by a negative amount?


Do this.
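
For example (a made-up table; note this is a read followed by a separate write,
so it is not atomic if something else increments the counter in between):

-- assuming: CREATE TABLE page_counts (page text PRIMARY KEY, hits counter);
SELECT hits FROM page_counts WHERE page = 'home';             -- suppose it returns 42
UPDATE page_counts SET hits = hits - 42 WHERE page = 'home';  -- counter is back to 0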


>   Or to delete the row and then increment it by zero?
>

Don't do this.  When you delete a counter, you are basically saying "I will
never use this counter again".  If you try to use it again, the behavior is
undefined. It's one of the documented limitations of counters.


-- 
Tyler Hobbs
DataStax 


Re: Resetting a counter in CQL

2014-02-28 Thread Clint Kelly
Great, thanks!


On Fri, Feb 28, 2014 at 4:38 PM, Tyler Hobbs  wrote:

>
> On Fri, Feb 28, 2014 at 6:32 PM, Clint Kelly wrote:
>
>>
>>
>> What is the best known method for resetting a counter in CQL?  Is it best
>> to read the counter and then increment it by a negative amount?
>
>
> Do this.
>
>
>>   Or to delete the row and then increment it by zero?
>>
>
> Don't do this.  When you delete a counter, you are basically saying "I
> will never use this counter again".  If you try to use it again, the
> behavior is undefined. It's one of the documented limitations of counters.
>
>
> --
> Tyler Hobbs
> DataStax 
>


Re: Any way to get a list of per-node token ranges using the DataStax Java driver?

2014-02-28 Thread Tupshin Harper
For the first question, try "select * from system.peers"

http://www.datastax.com/documentation/cql/cql_using/use_query_system_c.html?pagename=docs&version=1.2&file=cql_cli/using/query_system_tables
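
For example, something like this should list each peer's tokens (column names
as of the 1.2/2.0 system tables):

SELECT peer, data_center, rack, tokens FROM system.peers;

-- the local node's own tokens live in system.local rather than system.peers:
SELECT tokens FROM system.local;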

For the second, there is a JMX and nodetool command, but I'm not aware of
any way to get it directly through CQL.

http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsGetEndPoints.html

-Tupshin


On Fri, Feb 28, 2014 at 1:27 PM, Clint Kelly  wrote:

> Hi everyone,
>
> I've been working on a rewrite of the Cassandra InputFormat for Hadoop 2
> using the DataStax Java driver instead of the Thrift API.
>
> I have a prototype working now, but there is one bit of code that I have
> not been able to replace with code for the Java driver.  In the
> InputFormat#getSplits method, the old code has a call like the following:
>
>   map = client.describe_ring(ConfigHelper.getInputKeyspace(conf));
>
> This gets a list of the distinct token ranges for the Cassandra cluster.
>
> The rest of "getSplits" then takes these key ranges, breaks them up into
> subranges (to match the user-specified input split size), and then gets the
> replica nodes for the various token ranges (as the locations for the
> splits).
>
> Does anyone know how I can do the following with the native protocol?
>
>- Get the distinct token ranges for the C* cluster
>- Get the set of replica nodes for a given range of tokens?
>
> I tried looking around in Cluster and Metadata, among other places, in the
> API docs, but I didn't see anything that looked like it would do what I
> want.
>
> Thanks!
>
> Best regards,
> Clint
>


Re: Getting the most-recent version from time-series data

2014-02-28 Thread Tupshin Harper
You are correct that with that schema, all data for a given key would be in
a single partition, and hence on the same node(s). I missed that before.

-Tupshin




On Fri, Feb 28, 2014 at 12:47 PM, Clint Kelly  wrote:

> Hi Tupshin,
>
> Thanks for your help once again, I really appreciate it.  Quick question
> regarding the issue of token-aware routing, etc.  Let's say that I am using
> the table described earlier:
>
>
> CREATE TABLE time_series_stuff (
>   key text,
>   family text,
>   version int,
>   val text,
>   PRIMARY KEY (key, family, version)
> ) WITH CLUSTERING ORDER BY (family ASC, version DESC)
>
> I want to retrieve values for the most-recent version of every family for
> a given key, doing something like:
>
> SELECT * from time_series_stuff where key='mykey'
>
> but getting only one version per family.
>
> All of this data should live on the same node (or set of replica nodes),
> correct?  I am specifying the partition key here, and I thought that only
> the partition key determined on what physical nodes data exists.
> Therefore, I would think that all of the results from this query would come
> from a single replica node (or set of replica nodes, if the consistency
> level is greater than 1).
>
> Would you mind clarifying?  Thanks a lot!
>
> Best regards,
> Clint
>
>
>
>
>
>
> On Wed, Feb 26, 2014 at 4:56 AM, Tupshin Harper wrote:
>
>> And one last clarification. Where I said "stored procedure" earlier, I
>> meant "prepared statement". Sorry for the confusion. Too much typing while
>> tired.
>>
>> -Tupshin
>>
>>
>> On Tue, Feb 25, 2014 at 10:36 PM, Tupshin Harper wrote:
>>
>>> I failed to address the matter of not knowing the families in advance.
>>>
>>> I can't really recommend any solution to that other than storing the
>>> list of families in another structure that is readily queryable. I don't
>>> know how many families you are thinking, but if it is in the millions or
>>> more, You might consider constructing another table such as:
>>> CREATE TABLE families (
>>>   key int,
>>>   family text,
>>>   PRIMARY KEY (key, family)
>>> );
>>>
>>>
>>> store your families there, with a knowable set of keys (I suggest
>>> something like the last 3 digits of the md5 hash of the family). So then
>>> you could retrieve your families in nice sized batches
>>> SELECT family FROM id WHERE key=0;
>>> and then do the fan-out selects that I described previously.
>>>
>>> -Tupshin
>>>
>>>
>>> On Tue, Feb 25, 2014 at 10:15 PM, Tupshin Harper wrote:
>>>
 Hi Clint,

 What you are describing could actually be accomplished with the Thrift
 API and a multiget_slice with a slicerange having a count of 1. Initially I
 was thinking that this was an important feature gap between Thrift and CQL,
 and was going to suggest that it should be implemented (possible syntax is
 in https://issues.apache.org/jira/browse/CASSANDRA-6167 which is
 almost a superset of this feature).

 But then I was convinced by some colleagues, that with a modern CQL
 driver that is token aware, you are actually better off (in terms of
 latency, throughput, and reliability), by doing each query separately on
 the client.

 The reasoning is that if you did this with a single query, it would
 necessarily be sent to a coordinator that wouldn't own most of the data
 that you are looking for. That coordinator would then need to fan out the
 read to all the nodes owning the partitions you are looking for.

 Far better to just do it directly on the client. The token aware client
 will send each request for a row straight to a node that owns it. With a
 separate connection open to each node, this is done in parallel from the
 get-go. Fewer hops. Less load on the coordinator. No bottlenecks. And with
 a stored procedure, very very little additional overhead to the client,
 server, or network.

 -Tupshin


 On Tue, Feb 25, 2014 at 7:48 PM, Clint Kelly wrote:

> Hi everyone,
>
> Let's say that I have a table that looks like the following:
>
> CREATE TABLE time_series_stuff (
>   key text,
>   family text,
>   version int,
>   val text,
>   PRIMARY KEY (key, family, version)
> ) WITH CLUSTERING ORDER BY (family ASC, version DESC) AND
>   bloom_filter_fp_chance=0.01 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.00 AND
>   gc_grace_seconds=864000 AND
>   index_interval=128 AND
>   read_repair_chance=0.10 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   default_time_to_live=0 AND
>   speculative_retry='99.0PERCENTILE' AND
>   memtable_flush_period_in_ms=0 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
>
> cqlsh:fiddle> select * from time_series_stuff ;
>
>  key