[hadoop] Counters in ColumnFamilyOutputFormat?

2011-05-19 Thread Mck
I'd like to investigate using Counters in Hadoop with
ColumnFamilyOutputFormat.

But I see that this class uses outdated ..hadoop.avro classes.

Does it make sense to use counters for Hadoop output?

If I try rewriting ColumnFamilyOutputFormat and friends, should it be to
the normal ..avro classes, or to something else?

~mck 



Re: Inconsistent results using secondary indexes between two DC

2011-05-19 Thread Wojciech Pietrzok
Just checked. The rows seem to be present in the CF on all nodes (in both
datacenters), but are not indexed correctly.

On each node I've used sstablekeys on all the CF_NAME-f-XX-Data.db files.
In cassandra-cli (using a node that behaves correctly) I ran the query
get CF_NAME where foo = bar and got the correct number of results. I
checked with grep whether all the keys are present in the lists returned
by sstablekeys - none was missing, so it seems that the rows are present
on all nodes.
When doing the same query on the nodes in the second DC (using
ConsistencyLevel.ONE) the results are invalid. Sometimes I got 15 rows
(the expected, correct number), sometimes 3 rows or 10 rows. What's
interesting is that every time I get only 3 rows, it's the same list of
3 rows on both affected nodes.
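The check described above can be sketched as a set comparison between the keys the index query should return and the keys actually on disk. The file contents below are hypothetical stand-ins; in practice ondisk_keys.txt would be the concatenated output of sstablekeys over each CF_NAME-f-XX-Data.db file, and expected_keys.txt the keys returned by the query on a healthy node:

```shell
# Hypothetical inputs standing in for real query results and sstablekeys output.
printf 'key1\nkey2\nkey3\n' > expected_keys.txt   # keys the index query should return
printf 'key1\nkey3\n'       > ondisk_keys.txt     # keys found in the SSTables

# comm requires sorted input.
sort -u expected_keys.txt > expected.sorted
sort -u ondisk_keys.txt   > ondisk.sorted

# Lines unique to the first file: keys expected by the query but missing on disk.
missing=$(comm -23 expected.sorted ondisk.sorted)
echo "missing: $missing"
```

An empty result here (as Wojciech found) means the rows themselves are present and the problem is confined to the index.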


2011/5/17 Jonathan Ellis :
> Nothing comes to mind.
>
> I'd start by using sstable2json to see if the missing rows are in the
> main data CF -- i.e., are they just unindexed, or are they missing
> completely?
>
> On Sun, May 15, 2011 at 4:33 PM, Wojciech Pietrzok  wrote:
>> Hello,
>>
>> I've noticed strange behaviour of Cassandra when using secondary indexes.
>> There are 2 Data Centers, each with 2 nodes, RF=4, on all nodes
>> Cassandra 0.7.5 is installed.
>> When I connect to one of the nodes in DC1 and perform a query using
>> secondary indexes ("get ColumnFamily where column = 'foo'" in
>> cassandra-cli) I always get the correct number of rows returned, no matter
>> which ConsistencyLevel is set.
>> When I connect to one of the nodes in DC2 and perform the same query using
>> ConsistencyLevel LOCAL_QUORUM the results are correct. But using
>> ConsistencyLevel ONE, Cassandra doesn't return the correct number of rows
>> (it seems that most of the time some of the rows are missing).
>> Tried running nodetool repair and nodetool scrub, but this doesn't seem to
>> help.
>>
>> What might the cause of such behaviour?

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 KosciaK     mail: kosci...@gmail.com
                   www : http://kosciak.net/
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-


Re: [RELEASE] Apache Cassandra 0.7.6 released

2011-05-19 Thread Sylvain Lebresne
A small error made its way into my previous mail. The issue related to the
debian package problem is:
https://issues.apache.org/jira/browse/CASSANDRA-2481

--
Sylvain

On Wed, May 18, 2011 at 9:54 PM, Sylvain Lebresne  wrote:
> A small error in the debian setup script made its way into the debian
> package of 0.7.6
> (more details here: https://issues.apache.org/jira/browse/CASSANDRA-2641).
> We are working on fixing the problem, but we must follow the Apache
> process, and as a result this may take a little longer than we would hope.
>
> Note that if you are not using the debian package you can safely
> ignore this mail.
> Otherwise, you may want to wait a little longer before updating.
>
> We will keep you posted as to when this is resolved.
>
>
> PS: For the very impatient, you can also build the package from the source
> after applying the second patch attached to the issue
> (https://issues.apache.org/jira/browse/CASSANDRA-2641).
>
> --
> Sylvain
>
> On Wed, May 18, 2011 at 12:19 PM, Sylvain Lebresne  
> wrote:
>> The Cassandra team is pleased to announce the release of Apache Cassandra
>> version 0.7.6.
>>
>> Cassandra is a highly scalable second-generation distributed database,
>> bringing together Dynamo's fully distributed design and Bigtable's
>> ColumnFamily-based data model. You can read more here:
>>
>>  http://cassandra.apache.org/
>>
>> Downloads of source and binary distributions are listed in our download
>> section:
>>
>>  http://cassandra.apache.org/download/
>>
>> This version is a bug fix release[1,3] and upgrade is highly encouraged.
>>
>> Please always pay attention to the release notes[2] before upgrading,
>> especially if you upgrade from 0.7.2 or before. Upgrade from 0.7.3 or later
>> should be a snap.
>>
>> If you were to encounter any problem, please let us know[4].
>>
>> Have fun!
>>
>>
>> [1]: http://goo.gl/VYZ2e (CHANGES.txt)
>> [2]: http://goo.gl/jMRDE (NEWS.txt)
>> [3]: http://goo.gl/6ohkb (JIRA Release Notes)
>> [4]: https://issues.apache.org/jira/browse/CASSANDRA
>>
>


RE: Cassandra CMS

2011-05-19 Thread Vivek Mishra
Hi,
Additionally please take a look at Kundera.
Kundera is open source and currently supports ORM over Cassandra, HBase, and
MongoDB.
Support for Redis is planned for the future.

https://github.com/impetus-opensource/Kundera

Blogs for reference are:

http://xamry.wordpress.com/2011/05/02/working-with-mongodb-using-kundera/

http://mevivs.wordpress.com/2011/02/12/hector-kundera/


-Vivek



From: da...@daotown.com [mailto:da...@daotown.com] On Behalf Of David Boxenhorn
Sent: Thursday, May 05, 2011 4:48 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra CMS

I'm looking at Magnolia at the moment (as in, this second). At first glance, it 
looks like I should be able to use Cassandra as the database:

http://documentation.magnolia-cms.com/technical-guide/content-storage-and-structure.html#Persistent_storage

If it can use a filesystem as its database, it can use Cassandra, no?
On Thu, May 5, 2011 at 2:01 PM, aaron morton <aa...@thelastpickle.com> wrote:
Would you think of Django as a CMS ?
http://stackoverflow.com/questions/2369793/how-to-use-cassandra-in-django-framework

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 5 May 2011, at 22:54, Eric tamme wrote:


Does anyone know of a content management system that can be easily
customized to use Cassandra as its database?

(Even better, if it can use Cassandra without customization!)


I think your best bet will be to look for a CMS that uses an ORM for
the storage layer and write a specific ORM for Cassandra that you can
plugin to whatever frame work the CMS uses.

-Eric





Write to us for a Free Gold Pass to the Cloud Computing Expo, NYC to attend a 
live session by Head of Impetus Labs on 'Secrets of Building a Cloud Vendor 
Agnostic PetaByte Scale Real-time Secure Web Application on the Cloud '.

Looking to leverage the Cloud for your Big Data Strategy ? Attend Impetus 
webinar on May 27 by registering at http://www.impetus.com/webinar?eventid=42 .


NOTE: This message may contain information that is confidential, proprietary, 
privileged or otherwise protected by law. The message is intended solely for 
the named addressee. If received in error, please destroy and notify the 
sender. Any use of this email is prohibited when received in error. Impetus 
does not represent, warrant and/or guarantee, that the integrity of this 
communication has been maintained nor that the communication is free of errors, 
virus, interception or interference.


Support for IN clause

2011-05-19 Thread Vivek Mishra
Does CQL support IN clause?








Re: Questions about using MD5 encryption with SimpleAuthenticator

2011-05-19 Thread Ted Zlatanov
On Wed, 18 May 2011 17:16:28 -0700 Sameer Farooqui  
wrote: 

SF> But even SSL/TLS is subject to attacks from tools like SSLSNIFF:
SF> http://www.thoughtcrime.org/software/sslsniff

For perfect security, unplug the server and remove the hard drive.

Ted



Re: Support for IN clause

2011-05-19 Thread Yuki Morishita
Hi,

I think IN clause for SELECT and UPDATE will be supported in v0.8.1.
See https://issues.apache.org/jira/browse/CASSANDRA-2553
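For reference, a hedged sketch of what such a query might look like once supported — the column family and key values here are hypothetical, and the final syntax is defined by the ticket above:

```sql
SELECT first, last FROM users WHERE KEY IN ('jsmith', 'mjones');
```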

2011/5/19 Vivek Mishra :
> Does CQL support IN clause?



-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


java.io.IOError: java.io.EOFException with version 0.7.6

2011-05-19 Thread Rene Kochen
I have some severe problems on our production site.
I created the following test program to reproduce the issue with Cassandra 
0.7.6 (with empty data set).

I use the following data-model

column_metadata: []
name: Customers
column_type: Super
gc_grace_seconds: 60

I have a super-column-family with a single row.
Within this row I have a single super-column.
Within this super-column, I concurrently create, read and delete columns.

I have three threads:

- Do in a loop: add a column to the super-column.
- Do in a loop: delete a random column from the super-column.
- Do in a loop: read the super-column (with all columns).

After running the above threads concurrently, I always receive the following 
error:

ERROR 17:09:57,036 Fatal exception in thread Thread[ReadStage:81,5,main]
java.io.IOError: java.io.EOFException
at 
org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:252)
at 
org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:268)
at 
org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:227)
at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(Unknown 
Source)
at java.util.concurrent.ConcurrentSkipListMap.<init>(Unknown Source)
at 
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:379)
at 
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:362)
at 
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:322)
at 
org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:79)
at 
org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:40)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
at 
org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
at 
org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:283)
at 
org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
at 
org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
at 
org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:69)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
at 
org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:116)
at 
org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:130)
at 
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1390)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1267)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1195)
at org.apache.cassandra.db.Table.getRow(Table.java:324)
at 
org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:63)
at 
org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:451)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.EOFException
at java.io.RandomAccessFile.readByte(Unknown Source)
at 
org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:324)
at 
org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:335)
at 
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:71)
at 
org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:248)
... 30 more


Re: [hadoop] Counters in ColumnFamilyOutputFormat?

2011-05-19 Thread Jonathan Ellis
Avro is there because (1) a long time ago (it now seems) we thought we
were going to move the main RPC layer to Avro and (2) it simplifies
using Avro for Streaming, which also seems misguided now
(https://issues.apache.org/jira/browse/CASSANDRA-1497).

Using "native" Thrift mutations makes the most sense to me now, which
would keep it similar in structure but avoid the avroToThrift copy.

On Thu, May 19, 2011 at 2:30 AM, Mck  wrote:
> I'd like to investigate using Counters in Hadoop with
> ColumnFamilyOutputFormat.
>
> But I see that this class uses outdated ..hadoop.avro classes.
>
> Does it make sense to use counters for Hadoop output?
>
> If I try rewriting ColumnFamilyOutputFormat and friends, should it be to
> the normal ..avro classes, or to something else?
>
> ~mck
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: java.io.IOError: java.io.EOFException with version 0.7.6

2011-05-19 Thread Sylvain Lebresne
Would you have a simple script to reproduce the issue?
And could you open a JIRA ticket?

Sylvain

On Thu, May 19, 2011 at 4:22 PM, Rene Kochen
 wrote:
> I have some severe problems on our production site.
> I created the following test program to reproduce the issue with Cassandra
> 0.7.6 (with empty data set).
>
> I use the following data-model
>
> column_metadata: []
> name: Customers
> column_type: Super
> gc_grace_seconds: 60
>
> I have a super-column-family with a single row.
> Within this row I have a single super-column.
> Within this super-column, I concurrently create, read and delete columns.
>
> I have three threads:
>
> - Do in a loop: add a column to the super-column.
> - Do in a loop: delete a random column from the super-column.
> - Do in a loop: read the super-column (with all columns).
>
> After running the above threads concurrently, I always receive the following
> error:
>
> ERROR 17:09:57,036 Fatal exception in thread Thread[ReadStage:81,5,main]
> java.io.IOError: java.io.EOFException
>         [stack trace snipped]


Re: java.io.IOError: java.io.EOFException with version 0.7.6

2011-05-19 Thread Jonathan Ellis
It would be useful to post your program.

On Thu, May 19, 2011 at 9:22 AM, Rene Kochen
 wrote:
> I have some severe problems on our production site.
> I created the following test program to reproduce the issue with Cassandra
> 0.7.6 (with empty data set).
>
> I use the following data-model
>
> column_metadata: []
> name: Customers
> column_type: Super
> gc_grace_seconds: 60
>
> I have a super-column-family with a single row.
> Within this row I have a single super-column.
> Within this super-column, I concurrently create, read and delete columns.
>
> I have three threads:
>
> - Do in a loop: add a column to the super-column.
> - Do in a loop: delete a random column from the super-column.
> - Do in a loop: read the super-column (with all columns).
>
> After running the above threads concurrently, I always receive the following
> error:
>
> ERROR 17:09:57,036 Fatal exception in thread Thread[ReadStage:81,5,main]
> java.io.IOError: java.io.EOFException
>         [stack trace snipped]



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: [hadoop] Counters in ColumnFamilyOutputFormat?

2011-05-19 Thread Jeremy Hanna
FWIW, as I mentioned in the 1497 comments, the patch makes it abstract so that
you can have any rpc/marshalling format you want with a simple extension point.
So if we want to move to something besides Avro, or even, like I mentioned, do
something with Dumbo for streaming, it's easy to extend.

On May 19, 2011, at 9:23 AM, Jonathan Ellis wrote:

> Avro is there because (1) a long time ago (it now seems) we thought we
> were going to move the main RPC layer to Avro and (2) it simplifies
> using Avro for Streaming, which also seems misguided now
> (https://issues.apache.org/jira/browse/CASSANDRA-1497).
> 
> Using "native" Thrift mutations makes the most sense to me now, which
> would keep it similar in structure but avoid the avroToThrift copy.
> 
> On Thu, May 19, 2011 at 2:30 AM, Mck  wrote:
>> I'd like to investigate using Counters in Hadoop with
>> ColumnFamilyOutputFormat.
>> 
>> But I see that this class uses outdated ..hadoop.avro classes.
>> 
>> Does it make sense to use counters for Hadoop output?
>> 
>> If I try rewriting ColumnFamilyOutputFormat and friends, should it be to
>> the normal ..avro classes, or to something else?
>> 
>> ~mck
>> 
>> 
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com



selecting data

2011-05-19 Thread karim abbouh
I'm new to using the Cassandra database.
I want to get data as in a relational database:
select * from table where field="value";
I see that using the CLI we have just the following commands:
get <ksp>.<cf>['<key>']                  Get a slice of columns.
get <ksp>.<cf>['<key>']['<super>']       Get a slice of sub columns.
get <ksp>.<cf>['<key>']['<col>']         Get a column value.
get <ksp>.<cf>['<key>']['<super>']['<col>']  Get a sub column value.

Is there a way to do that?
I think it is possible using the Java API.
cassandra version : 6.0.12


thanks for help

RE: selecting data

2011-05-19 Thread Vivek Mishra
You need to use CQL.
Additionally, a JDBC driver on top of CQL is part of the beta release.
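As an illustration, the relational query above maps to CQL roughly as follows, assuming a hypothetical column family with a secondary index on the filtered column (treat this as a sketch, not exact syntax for every version):

```sql
SELECT * FROM my_table WHERE field = 'value';
```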

From: karim abbouh [karim_...@yahoo.fr]
Sent: 19 May 2011 21:41
To: user@cassandra.apache.org
Subject: selecting data

i'm new using cassandra database,
i want to get data as in relationnel database:
select * from table where field="value";
i see using CLI we have just the followings commands :
get <ksp>.<cf>['<key>']                  Get a slice of columns.
get <ksp>.<cf>['<key>']['<super>']       Get a slice of sub columns.
get <ksp>.<cf>['<key>']['<col>']         Get a column value.
get <ksp>.<cf>['<key>']['<super>']['<col>']  Get a sub column value.

is there a way for that.
i think using java API is possible.
cassandra version : 6.0.12


thanks for help








Re : selecting data

2011-05-19 Thread karim abbouh
Thanks,
but is there a way using just the CLI and storage-conf.xml?





From: Vivek Mishra
To: "user@cassandra.apache.org"
Sent: Thu, 19 May 2011, 17:23:15
Subject: RE: selecting data

You need to use CQL.
Additionally, a JDBC driver on top of CQL is part of the beta release.



 
From: karim abbouh [karim_...@yahoo.fr]
Sent: 19 May 2011 21:41
To: user@cassandra.apache.org
Subject: selecting data


i'm new using cassandra database,
i want to get data as in relationnel database:
select * from table where field="value";
i see using CLI we have just the followings commands :
get <ksp>.<cf>['<key>']                  Get a slice of columns.
get <ksp>.<cf>['<key>']['<super>']       Get a slice of sub columns.
get <ksp>.<cf>['<key>']['<col>']         Get a column value.
get <ksp>.<cf>['<key>']['<super>']['<col>']  Get a sub column value.

is there a way for that.
i think using java API is possible.
cassandra version : 6.0.12


thanks for help






Re: selecting data

2011-05-19 Thread Watanabe Maki
Cassandra is not an RDBMS. All you can do is search on a key; otherwise you
need a full scan.
You need to design your schema carefully around your application's needs.


On 2011/05/20, at 1:11, karim abbouh  wrote:

> i'm new using cassandra database,
> i want to get data as in relationnel database:
> select * from table where field="value";
> i see using CLI we have just the followings commands :
> get <ksp>.<cf>['<key>']                  Get a slice of columns.
> get <ksp>.<cf>['<key>']['<super>']       Get a slice of sub columns.
> get <ksp>.<cf>['<key>']['<col>']         Get a column value.
> get <ksp>.<cf>['<key>']['<super>']['<col>']  Get a sub column value.
> 
> is there a way for that.
> i think using java API is possible.
> cassandra version : 6.0.12
> 
> 
> thanks for help
> 
> 
> 


Re: Inconsistent results using secondary indexes between two DC

2011-05-19 Thread mcasandra
I am wondering if running nodetool repair will help in any way.

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Inconsistent-results-using-secondary-indexes-between-two-DC-tp632p6382819.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Commitlog Disk Full

2011-05-19 Thread Mike Malone
Just noticed this thread and figured I'd chime in since we've had similar
issues with the commit log growing too large on our clusters. Tuning down
the flush timeout wasn't really an acceptable solution for us since we
didn't want to be constantly flushing and generating extra SSTables for no
reason. So we wrote a small tool that we start in a static block in
CassandraServer that periodically checks the commit log size and flushes all
memtables if they're above some threshold.

I've attached that code. Any feedback / improvements are more than welcome.
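For readers without the attachment, the idea can be sketched as an external one-shot check: measure the commit log directory and flush when it exceeds a threshold. The directory path and threshold below are assumptions, and the real tool runs in-process and calls the flush API directly rather than shelling out:

```shell
# One-shot commit log size check; run it from cron or a loop on each node.
dir="${1:-.}"                 # commit log dir; pass the real path, e.g. /var/lib/cassandra/commitlog
threshold_kb="${2:-4194304}"  # flush once the directory exceeds ~4 GB (assumed threshold)

# Total size of the commit log directory, in KB.
used_kb=$(du -sk "$dir" | awk '{print $1}')

if [ "$used_kb" -gt "$threshold_kb" ]; then
  echo "commit log at ${used_kb} KB, exceeds ${threshold_kb} KB: flushing memtables"
  # nodetool -h localhost flush   # the real action; commented out in this sketch
else
  echo "commit log at ${used_kb} KB: under threshold"
fi
```

Flushing the memtables lets Cassandra mark the corresponding commit log segments as obsolete, which is what allows the disk space to be reclaimed.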

Mike

On Thu, May 12, 2011 at 11:30 AM, Sanjeev Kulkarni wrote:

> Hey guys,
> I have a ec2 debian cluster consisting of several nodes running 0.7.5 on
> ephimeral disks.
> These are fresh installs and not upgrades.
> The commitlog is set to the smaller of the disks which is around 10G in
> size and the datadir is set to the bigger disk.
> The config file is basically the same as the one supplied by the default
> installation.
> Our applications write to the cluster. After about a day of writing we
> started noticing the commitlog disk filling up. Soon we went over the disk
> limit and writes started failing. At this point we stopped the cluster.
> Over the course of the day we inserted around 25G of data. Our columns
> values are pretty small.
> I understand that cassandra periodically cleans up the commitlog
> directories by generating sstables in datadir. Is there any way to speed up
> this movement from commitlog to datadir?
> Thanks!
>
>


PeriodicMemtableFlusher.java
Description: Binary data


Re: selecting data

2011-05-19 Thread Jonathan Ellis
only 0.7+ has index support

On Thu, May 19, 2011 at 11:38 AM, Watanabe Maki  wrote:
> Cassandra is not an RDBMS. All you can do is search on a key, or you need a
> full scan.
> You need to design your schema carefully around your application's needs.
>
>
> On 2011/05/20, at 1:11, karim abbouh  wrote:
>
> i'm new to using the cassandra database,
> i want to get data as in a relational database:
> select * from table where field="value";
> i see using the CLI we have just the following commands:
> get <ksp>.<cf>['<key>']                          Get a slice of columns.
> get <ksp>.<cf>['<key>']['<super>']               Get a slice of sub columns.
> get <ksp>.<cf>['<key>']['<col>']                 Get a column value.
> get <ksp>.<cf>['<key>']['<super>']['<col>']      Get a sub column value.
>
> is there a way to do that?
> i think using the java API is possible.
> cassandra version : 6.0.12
>
>
> thanks for help
>
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Commitlog Disk Full

2011-05-19 Thread Jonathan Ellis
That's basically the approach I want to take in
https://issues.apache.org/jira/browse/CASSANDRA-2427.

On Thu, May 19, 2011 at 12:00 PM, Mike Malone  wrote:
> Just noticed this thread and figured I'd chime in since we've had similar
> issues with the commit log growing too large on our clusters. Tuning down
> the flush timeout wasn't really an acceptable solution for us since we
> didn't want to be constantly flushing and generating extra SSTables for no
> reason. So we wrote a small tool that we start in a static block in
> CassandraServer that periodically checks the commit log size and flushes all
> memtables if they're above some threshold.
> I've attached that code. Any feedback / improvements are more than welcome.
>
> Mike
>
> On Thu, May 12, 2011 at 11:30 AM, Sanjeev Kulkarni 
> wrote:
>>
>> Hey guys,
>> I have a ec2 debian cluster consisting of several nodes running 0.7.5 on
>> ephemeral disks.
>> These are fresh installs and not upgrades.
>> The commitlog is set to the smaller of the disks which is around 10G in
>> size and the datadir is set to the bigger disk.
>> The config file is basically the same as the one supplied by the default
>> installation.
>> Our applications write to the cluster. After about a day of writing we
>> started noticing the commitlog disk filling up. Soon we went over the disk
>> limit and writes started failing. At this point we stopped the cluster.
>> Over the course of the day we inserted around 25G of data. Our columns
>> values are pretty small.
>> I understand that cassandra periodically cleans up the commitlog
>> directories by generating sstables in datadir. Is there any way to speed up
>> this movement from commitlog to datadir?
>> Thanks!
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re : selecting data

2011-05-19 Thread karim abbouh
for example, in 0.7 how do we use an index?





De : Jonathan Ellis 
À : user@cassandra.apache.org
Envoyé le : Jeu 19 mai 2011, 18h 04min 48s
Objet : Re: selecting data

only 0.7+ has index support

On Thu, May 19, 2011 at 11:38 AM, Watanabe Maki  wrote:
> Cassandra is not an RDBMS. All you can do is search on a key, or you need a
> full scan.
> You need to design your schema carefully around your application's needs.
>
>
> On 2011/05/20, at 1:11, karim abbouh  wrote:
>
> i'm new to using the cassandra database,
> i want to get data as in a relational database:
> select * from table where field="value";
> i see using the CLI we have just the following commands:
> get <ksp>.<cf>['<key>']                          Get a slice of columns.
> get <ksp>.<cf>['<key>']['<super>']               Get a slice of sub columns.
> get <ksp>.<cf>['<key>']['<col>']                 Get a column value.
> get <ksp>.<cf>['<key>']['<super>']['<col>']      Get a sub column value.
>
> is there a way to do that?
> i think using the java API is possible.
> cassandra version : 6.0.12
>
>
> thanks for help
>
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Re : selecting data

2011-05-19 Thread Sasha Dolgy
http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes

On Thu, May 19, 2011 at 7:10 PM, karim abbouh  wrote:
> for example, in 0.7 how do we use an index?


Re: Cassandra CMS

2011-05-19 Thread Edward Capriolo
On Thu, May 19, 2011 at 8:11 AM, Vivek Mishra wrote:

>  Hi,
>
> Additionally please take a look at Kundera.
>
> Kundera is an open source and currently supporting ORM over CASSANDRA,
> Hbase, MongoDB.
>
> Support for REDIS will be there in the future.
>
>
>
> https://github.com/impetus-opensource/Kundera
>
>
>
> Blogs for reference are:
>
>
>
> http://xamry.wordpress.com/2011/05/02/working-with-mongodb-using-kundera/
>
>
>
> http://mevivs.wordpress.com/2011/02/12/hector-kundera/
>
>
>
>
>
> -Vivek
>
>
>
>
>
>
>
> *From:* da...@daotown.com [mailto:da...@daotown.com] *On Behalf Of *David
> Boxenhorn
> *Sent:* Thursday, May 05, 2011 4:48 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Cassandra CMS
>
>
>
> I'm looking at Magnolia at the moment (as in, this second). At first
> glance, it looks like I should be able to use Cassandra as the database:
>
>
> http://documentation.magnolia-cms.com/technical-guide/content-storage-and-structure.html#Persistent_storage
>
> If it can use a filesystem as its database, it can use Cassandra, no?
>
> On Thu, May 5, 2011 at 2:01 PM, aaron morton 
> wrote:
>
> Would you think of Django as a CMS ?
>
>
> http://stackoverflow.com/questions/2369793/how-to-use-cassandra-in-django-framework
>
>
>
> Cheers
>
>
>
> -
>
> Aaron Morton
>
> Freelance Cassandra Developer
>
> @aaronmorton
>
> http://www.thelastpickle.com
>
>
>
> On 5 May 2011, at 22:54, Eric tamme wrote:
>
>
>
>   Does anyone know of a content management system that can be easily
>
>  customized to use Cassandra as its database?
>
>
>
>  (Even better, if it can use Cassandra without customization!)
>
>
>
>
> I think your best bet will be to look for a CMS that uses an ORM for
> the storage layer and write a specific ORM for Cassandra that you can
> plug in to whatever framework the CMS uses.
>
> -Eric
>
>
>
>
>
> --
>
> Write to us for a Free Gold Pass to the Cloud Computing Expo, NYC to attend
> a live session by Head of Impetus Labs on ‘Secrets of Building a Cloud
> Vendor Agnostic PetaByte Scale Real-time Secure Web Application on the Cloud
> ‘.
>
> Looking to leverage the Cloud for your Big Data Strategy ? Attend Impetus
> webinar on May 27 by registering at
> http://www.impetus.com/webinar?eventid=42 .
>
>
> NOTE: This message may contain information that is confidential,
> proprietary, privileged or otherwise protected by law. The message is
> intended solely for the named addressee. If received in error, please
> destroy and notify the sender. Any use of this email is prohibited when
> received in error. Impetus does not represent, warrant and/or guarantee,
> that the integrity of this communication has been maintained nor that the
> communication is free of errors, virus, interception or interference.
>

What is the status of Kundera supporting newer versions of the Cassandra
API? Last I checked the code was still built around 0.6.X.

Edward


Re: [hadoop] Counters in ColumnFamilyOutputFormat?

2011-05-19 Thread Mick Semb Wever
On Thu, 2011-05-19 at 09:23 -0500, Jonathan Ellis wrote:
> 
> Using "native" Thrift mutations makes the most sense to me now, which
> would keep it similar in structure but avoid the avroToThrift copy. 

https://issues.apache.org/jira/browse/CASSANDRA-2667


-- 
"Everything you can imagine is real." Pablo Picasso 
| http://semb.wever.org | http://sesat.no
| http://tech.finn.no   | Java XSS Filter


signature.asc
Description: This is a digitally signed message part


Re: Exception when starting

2011-05-19 Thread Brandon Williams
On Thu, May 19, 2011 at 1:51 PM, Eranda Sooriyabandara
<0704...@gmail.com> wrote:
> Hi devs,
> I tried to start Apache Cassandra and got an exception. This is what the log
> says,
>
> INFO 00:18:12,226 Logging initialized
>  INFO 00:18:12,278 Heap size: 1029701632/1029701632
>  INFO 00:18:12,281 JNA not found. Native methods will be disabled.
>  INFO 00:18:12,294 Loading settings from
> file:/home/eranda/Desktop/cassendra/apache-cassandra-0.7.6/conf/cassandra.yaml
>  INFO 00:18:12,452 DiskAccessMode 'auto' determined to be standard,
> indexAccessMode is standard
>  INFO 00:18:12,621 reading saved cache
> /var/lib/cassandra/saved_caches/system-IndexInfo-KeyCache
> ERROR 00:18:12,622 Exception encountered during startup.
> java.lang.NegativeArraySizeException

Moving to user@.  Your keycache is corrupt, just rm
/var/lib/cassandra/saved_caches/* and try again.

-Brandon


Re: Unable to add columns to empty row in Column family: Cassandra

2011-05-19 Thread anuya joshi
Thank you Narendra/Aaron. Sorry for the late response. Please find below further
information on this.

1) How do you delete the data in the cli ? Is it a row delete e.g. del
MyCF['my-key'];
*[Anuya]:* Yeah. Exactly the same way.

2) What client are you using the insert the row the second time ? e.g.
custom thrift wrapper or pycassa
*[Anuya]:* I am using Thrift APIs in JAVA code directly. No high level
Cassandra client. I am using Cassandra client's INSERT and REMOVE methods
to insert and delete rows programmatically.

3) How is the second read done, via the cli ?
*[Anuya]:* Operations are as follows:
  1) INSERT #1: Programmatically
  2) DELETE #1: Command line
  3) INSERT #2: Programmatically

A READ operation, after each of the above steps, is done using the CLI.

4) Does the same test work when you only use your app ?
*[Anuya]:* Exactly. The INSERT -> DELETE -> INSERT scenario with the same
row key works well if executed only from the command line OR only
programmatically. Basically, over here, I am trying to reuse the row key.
So, I create a row with a specific key, delete that row (but Cassandra's
delete does not remove the row key, it only deletes all columns in the row) and
then I want to insert columns into the same row with the same row id. Correct
me if I'm wrong, but reusing the row key should work as per the Cassandra data
model definition/structure.

5) Cassandra-cli will be using the current time as it's time stamp for the
delete. If I had to guess what was happening it would be a problem with the
timestamps your app is creating.
*[Anuya]:* Well, regarding timestamps:
My Java code to generate timestamps is simply System.currentTimeMillis();
So, I also use the current time as a timestamp, similar to Cassandra, as you
mentioned. Then, irrespective of which client was used for insert/delete
operations, this INSERT -> DELETE -> INSERT scenario should work, because,
as per the sequence of these operations (INSERT -> DELETE -> INSERT), the
timestamp condition *TS[INSERT #2] > TS[DELETE #1] > TS[INSERT #1]* will be
obviously satisfied.
But, the fact is, this scenario does not work when switching between
clients for INSERT and DELETE operations as mentioned in point #3 above.

So, is this a clock synchronization issue? I mean, is the clock used by the
program to generate timestamps out of sync with the clock used by the CLI?

On this, FYI, I am running Linux-based VMs which in turn run the Cassandra
servers. The command line client is obviously on the VM and the Java program is
on the host machine running the VM. If the clocks of these two machines are in
sync then, I think, switching between clients should not matter?

Before I hit the send button :), I scrutinized the clocks on the VM and the
host machine. The clock on the VM is exactly 4 seconds behind the clock on the
host machine.
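The reconcile rule Cassandra applies here is pure last-write-wins on timestamps: after a row delete, a later insert is visible only if its timestamp is strictly greater than the tombstone's. A 4-second clock skew would only swallow inserts landing inside that window, but a *unit* mismatch would break every insert: if one client stamps in microseconds and the other in milliseconds (worth verifying -- this is an assumption, not something established in this thread), the delete's timestamp exceeds any later millisecond-stamped insert by roughly a factor of 1000. A toy illustration of the rule:

```java
public class LastWriteWins {
    /** A write is visible after a delete only if its timestamp is strictly newer. */
    static boolean survivesTombstone(long writeTimestamp, long deleteTimestamp) {
        return writeTimestamp > deleteTimestamp;
    }

    public static void main(String[] args) {
        long insert1 = 1_000_000L;   // first insert
        long delete1 = insert1 + 1;  // delete shortly after
        long insert2 = delete1 + 1;  // second insert, later clock
        // Same time units on both sides: the later insert wins.
        System.out.println(survivesTombstone(insert2, delete1)); // true

        // Hypothetical unit mismatch: the deleting client stamped in
        // microseconds while the inserting client used milliseconds, so the
        // tombstone shadows every later millisecond-stamped write.
        long deleteMicros = delete1 * 1000;
        System.out.println(survivesTombstone(insert2, deleteMicros)); // false
    }
}
```

If both clients demonstrably use the same units, then the 4-second VM skew is the next suspect, since any insert issued less than 4 seconds (by wall clock) after the CLI delete could still carry an older timestamp.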

I welcome your comments on above.

   Thanks,
   Anuya










On Thu, May 12, 2011 at 4:31 PM, Narendra Sharma
wrote:

> Can u share the code?
>
>
> On Mon, May 2, 2011 at 11:34 PM, anuya joshi  wrote:
>
>> Hello,
>>
>> I am using Cassandra for my application.My Cassandra client uses Thrift
>> APIs directly. The problem I am facing currently is as follows:
>>
>> 1) I added a row and columns in it dynamically via Thrift API Client
>> 2) Next, I used command line client to delete row which actually deleted
>> all the columns in it, leaving empty row with original row id.
>> 3) Now, I am trying to add columns dynamically using client program into
>> this empty row with same row key
>> However, columns are not being inserted.
>> But, when tried from command line client, it worked correctly.
>>
>> Any pointer on this would be of great use
>>
>> Thanks in  advance,
>>
>> Regards,
>> Anuya
>>
>
>
>
> --
> Narendra Sharma
> Solution Architect
> *http://www.persistentsys.com*
> *http://narendrasharma.blogspot.com/*
>
>
>


Re: java.io.IOError: java.io.EOFException with version 0.7.6

2011-05-19 Thread kochen
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/file/n6383644/CassandraIssue.zip
CassandraIssue.zip 

This is the test program (.NET 4)

I use Cassandra 0.7.6 with standard yaml and this super-column-family:

create column family Customers
with column_type = 'Super' 
and comparator = 'BytesType'
and gc_grace = 60;

In the program: set cassandra IP, keyspace and super-column-family and press
start.
Run the program on an empty database for about 30 minutes and the exception
should pop up in the Cassandra log.

I receive the following exceptions:

java.io.IOError:
org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid
column name length 0
at
org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:252)
at
org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:268)
at
org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:227)
at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(Unknown
Source)
at java.util.concurrent.ConcurrentSkipListMap.<init>(Unknown Source)
at
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:379)
at
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:362)
at
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:322)
at
org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:79)
at
org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:40)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
at
org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
at
org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:283)
at
org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
at
org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
at
org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:69)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
at
org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:116)
at
org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:130)
at
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1385)
at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1262)
at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1190)
at org.apache.cassandra.db.Table.getRow(Table.java:324)
at
org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:63)
at
org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:451)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException:
invalid column name length 0
at
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:73)
at
org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:248)
... 30 more

java.io.IOError: java.io.EOFException
at
org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:252)
at
org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:268)
at
org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:227)
at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(Unknown
Source)
at java.util.concurrent.ConcurrentSkipListMap.<init>(Unknown Source)
at
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:379)
at
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:362)
at
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:322)
at
org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:79)
at
org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:40)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
at
org.apache.cas

Re: How to configure internode encryption in 0.8.0?

2011-05-19 Thread Sameer Farooqui
Thanks, Jeremy!

Nirmal, any advice on how to generate the key/trust stores with the correct
cipher?

- Sameer


On Wed, May 18, 2011 at 8:10 AM, Jeremy Hanna wrote:

> I'll CC Nirmal Ranganathan, who implemented the internode encryption and
> might be able to give you some advice on this.
>
> On May 17, 2011, at 7:47 PM, Sameer Farooqui wrote:
>
> > Thanks for the link, Jeremy.
> >
> > I generated the keystore and truststore for inter-node communication
> using the link in the YAML file:
> >
> http://download.oracle.com/javase/6/docs/technotes/guides/security/jsse/JSSERefGuide.html#CreateKeystore
> >
> > Unfortunately, the default instructions in the above link used
> TLS_RSA_WITH_AES_256_CBC_SHA. So, when I start Cassandra now, I get this
> error:
> >
> > ERROR 00:10:38,734 Exception encountered during startup.
> > java.lang.IllegalArgumentException: Cannot support
> > TLS_RSA_WITH_AES_256_CBC_SHA with currently installed providers
> > at com.sun.net.ssl.internal.ssl.CipherSuiteList.<init>(CipherSuiteList.java:79)
> > at com.sun.net.ssl.internal.ssl.SSLServerSocketImpl.setEnabledCipherSuites(SSLServerSocketImpl.java:166)
> > at org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:55)
> >
> >
> > The YAML file states that the cipher suite for authentication should be:
> TLS_RSA_WITH_AES_128_CBC_SHA.
> >
> > This is my first time using keytool and I've searched the web to see how
> I can change the cipher from AES_256 to AES_128, but haven't found the
> answer.
> >
> > Anyone know how to change the cipher to AES_128?
> >
> > Here are the commands I used to generate the non-working keystore and
> truststore:
> >
> > 1) keytool -genkeypair -alias jdoe -keyalg RSA -validity 7 -keystore
> .keystore
> > 2) keytool -list -v -keystore .keystore
> > 3) keytool -export -alias jdoe -keystore .keystore -rfc -file jdoe.cer
> > 4) cat jdoe.cer
> > 5) keytool -import -alias jdoecert -file jdoe.cer -keystore .truststore
> > 6) keytool -list -v -keystore .truststore
> >
> >
> > - Sameer
> >
> > On Mon, May 16, 2011 at 5:35 PM, Jeremy Hanna <
> jeremy.hanna1...@gmail.com> wrote:
> > Take a look at cassandra.yaml in your 0.8 download at the very bottom.
>  There are docs and examples there.
> > e.g.
> http://svn.apache.org/repos/asf/cassandra/tags/cassandra-0.8.0-beta2/conf/cassandra.yaml
> >
> > On May 16, 2011, at 6:36 PM, Sameer Farooqui wrote:
> >
> > > I understand that 0.8.0 has configurable internode encryption
> (CASSANDRA-1567, 2152).
> > >
> > > I haven't been able to find any info on how to configure it though on
> this mailing list or the Datastax website.
> > >
> > > Can somebody point me towards how to set this up?
> > >
> > > - Sameer
> >
> >
>
>


Re: How to configure internode encryption in 0.8.0?

2011-05-19 Thread Christopher Deutsch
On Tue, May 17, 2011 at 5:47 PM, Sameer Farooqui
 wrote:
>
> Unfortunately, the default instructions in the above link 
> used TLS_RSA_WITH_AES_256_CBC_SHA. So, when I start Cassandra now, I get this 
> error:
> ERROR 00:10:38,734 Exception encountered during startup.
> java.lang.IllegalArgumentException: Cannot support
> TLS_RSA_WITH_AES_256_CBC_SHA with currently installed providers
>         at com.sun.net.ssl.internal.ssl.CipherSuiteList.<init>(CipherSuiteList.java:79)
>         at com.sun.net.ssl.internal.ssl.SSLServerSocketImpl.setEnabledCipherSuites(SSLServerSocketImpl.java:166)
>         at org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:55)

You might want to double check that you have  the Java policy files
installed that allow strong crypto-- I found I needed to install them
when working on our encrypted Cassandra setup. They're available for
download from

http://www.oracle.com/technetwork/java/javase/downloads/index.html

Look for the link to "Java Cryptography Extension (JCE) Unlimited
Strength Jurisdiction Policy Files 6".

--
Christopher Deutsch 


Can I use secondary index with any partitioner

2011-05-19 Thread Dave Rav
Can I use secondary index with any partitioner 
 
1) RandomPartitioner
2) ByteOrderedPartitioner

Re: Cassandra Vs. Oracle Coherence

2011-05-19 Thread aaron morton
I've not used Coherence; all I know is from reading the first paragraph here 
http://www.oracle.com/technetwork/middleware/coherence/overview/index.html and 
Wikipedia :) 

Cassandra is not a caching layer, it's a database. You can rely on it as your 
database, and some people find when they do they no longer need a caching 
layer. 

If you can provide some more details on your use case we can help you judge if 
cassandra is a good fit. Or if you can tell us which features of Coherence you 
think are a good fit for you we can see if there are Cassandra equivalents. 

And I should also say:

- cassandra is free
- if you have a good idea about how it can be improved, there is a chance it 
may get implemented. And if you can provide some dev time, there's an even better 
chance. 
- you can get free support from the community 
- there are multiple professional services companies that provide support 
http://wiki.apache.org/cassandra/ThirdPartySupport
 
Hope that helps. 

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 18 May 2011, at 10:44, Karamel, Raghu wrote:

> Hi,
>  
> I am new to Cassandra and very excited with the technology. I am evaluating 
> it and trying to understand the difference between Cassandra and Oracle 
> Coherence. Precisely, looking for reasons why someone would select Cassandra 
> over Oracle Coherence. Did anyone do the exercise of comparing them? 
> Appreciate if you can share some information on that.
>  
> Regards
> -RK



Re: Knowing when there is a *real* need to add nodes

2011-05-19 Thread aaron morton
Considering disk usage is a tricky one. Compacted SSTable files will remain on 
disk until either there is not enough space, or the JVM GC runs. To measure the 
live space use the "Space used (live)" from the CFStats. "Space used (total)" 
includes the space which has been compacted and not yet deleted from disk. 

The data in deleted columns *may* be purged from disk during a minor or major 
compaction. This can happen before GCGraceSeconds has expired. It is only the 
Tombstone that must be kept around for at least GCGraceSeconds. 

I agree that 50% utilisation on the data directories is a sensible soft limit 
that will help keep you out of trouble. The space needed by the compaction 
depends on which bucket of files it is compacting,  but it will always require 
at least as much free disk space as the files it is compacting. That should 
also leave headroom for adding new nodes, just in case. Ideally when adding new 
nodes existing nodes only stream data to the new nodes. If however you are 
increasing the node count by less than a factor of 2 you may need to make 
multiple moves and the nodes may need additional space.   

To gauge the throughput I would also look at the Latency trackers on the 
o.a.c.db.StorageProxy MBean. They track the latency of complete requests 
including talking to the rest of the cluster. The metrics on the individual 
column families are concerned with the local read. 

For the pending TP stats I would guess that for the read and write pools a 
pending value consistently higher than the number of threads assigned (in the 
config) would be something to investigate. Waiting on these stages will be 
reflected in the StorageProxy latency numbers.  HintedHandoff, StreamStage and 
AntiEntropyStage will have tasks that stay in the pending queue for a while. 
AFAIK the other pools should not have many (< 10) tasks in the pending queue 
and should be able to keep the pending queue clear.  
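The 30%/50% disk-utilisation bands proposed in this thread, combined with the rule that a compaction needs at least as much free space as the files it is compacting, reduce to a simple classification. A sketch (the thresholds are this thread's suggestion, not an official limit):

```java
public class DiskGauge {
    enum Level { GREEN, YELLOW, RED }

    /** Classify data-directory utilisation using the 30% / 50% bands from this thread. */
    static Level classify(long usedBytes, long capacityBytes) {
        double pct = 100.0 * usedBytes / capacityBytes;
        if (pct < 30.0) return Level.GREEN;
        if (pct <= 50.0) return Level.YELLOW;
        // Past 50%, a compaction needing as much free space as the files it is
        // compacting may no longer fit on the volume.
        return Level.RED;
    }

    public static void main(String[] args) {
        System.out.println(classify(200, 1000)); // GREEN
        System.out.println(classify(400, 1000)); // YELLOW
        System.out.println(classify(600, 1000)); // RED
    }
}
```

Feed it "Space used (live)" from cfstats rather than raw df numbers, so already-compacted-but-not-yet-deleted files don't trip the gauge early.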

Hope that helps. 
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 18 May 2011, at 19:50, Tomer B wrote:

> As for static disk usage i would add this:
> 
> test: df -kh
> description: run test after compaction (check GCGraceSeconds in 
> storage-conf.xml) as only then data is expunged permanently, run on data 
> disk, assuming here commitlog disk is separated from data dir.
> green gauge: used_space < 30% of disk capacity
> yellow gauge: used space 30% - 50% of disk capacity
> red gauge: used_space > 50% of disk capacity
> comments: Compactions can require up to 100% of in-use space temporarily in 
> the worst case (data file dir). When approaching 50% or more of disk capacity, 
> use RAID0 for the data dir disk; if you cannot, try increasing your disk; if 
> you cannot, consider adding nodes (or first consider adding nodes if that's 
> what you wish).
> 
> 2011/5/12 Watanabe Maki 
> It's interesting topic for me too.
> How about to add measurement on static disk utilization (% used) and memory 
> utilization ( rss, JVM heap, JVM GC )?
> 
> maki
> 
> From iPhone
> 
> 
> On 2011/05/12, at 0:49, Tomer B  wrote:
>  
> > Hi
> >
> > I'm trying to predict when my cluster would soon be needing new nodes
> > added, i want a continuous graph telling my of my cluster health so
> > that when i see my cluster becomes more and more busy (I want numbers
> > & measurments) i would be able to know i need to start purchasing more
> > machines and get them into my cluster, so i want to know of that
> > beforehand.
> > I'm writing here what I came with after doing some research over net.
> > I would highly appreciate any additional gauge measurements and ranges
> > in order to test my cluster health and to know beforehand when i'm
> > going to soon need more nodes.Although i'm writing down green
> > gauge,yellow gauge,red gauge, i'm also trying to find a continuous
> > graph where i can tell where our cluster stand (as much as
> > possible...)
> >
> > Also my recommendation is always before adding new nodes:
> >
> > 1. Make sure all nodes are balanced and if not balance them.
> > 2. Separate commit log drive from data (SSTables) drive
> > 3. use mmap index only in memory and not auto
> > 4. Increase disk IO if possible.
> > 5. Avoid swapping as much as possible.
> >
> >
> > As for my gauge tests for when to add new nodes:
> >
> > test: nodetool tpstats -h <host>
> > green gauge: No pending column with number higher
> > yellow gauge: pending columns 100-2000
> > red gauge:Larger than 3000
> >
> > test: iostat -x -n -p -z 5 10  and iostat -xcn 5
> > green gauge: kw/s + kr/s reaches is below 25% capacity of disk io
> > yellow gauge: 20%-50%
> > red gauge: 50%+
> >
> > test: iostat -x -n -p -z 5 10 and check %b column
> > green gauge: less than 10%
> > yellow gauge:  10%-80%
> > red gauge: 90%+
> >
> > test: nodetool cfstats --host localhost
> > green gauge: “SSTable count” item does not continually grow over time
> > yellow gauge:
> > red gauge: “SSTable count” item continually grows over time

Re: Recommandation on how to organize CF

2011-05-19 Thread aaron morton
I'm a bit confused by your examples. I think you are saying...

- Standard CF called Message using the UTF8Type for column comparisons used to 
store the individual messages. Row key is the message UUID. Not sure what the 
columns are. 
- Standard CF called MessageTime using TimeUUIDType for column comparison, used 
to store collections of messages. Row key is "messagelist:<messagelist_uuid>" 
for a message list, and "messagebox:<user>:<box_name>" for a message box. 
Not sure what the columns are. 

The best model is going to be the one that supports your read requests and the 
volume of data your are expecting. 

One way to go is to denormalise to support very fast read paths. You could 
store the entire message in one column using something like JSON to serialise 
it. Then

- MessageIndexes standard CF to store the full messages in context, there are 
three different types of rows:
* keys with <user> store all messages for a user, column name is 
the message TimeUUID and value is the message structure
* keys with <user>/<message_box> store the messages for a single 
message box. Columns same as above. 
* keys with <user>/<message_box>/<message_list> store the messages in 
a single message list. Columns as above. 

- MessageFolders CF to store the message box and message lists, two approaches:
1) <user> as key and each column is a message box; message lists 
are stored in a single column as JSON
2) <user> row for the top level message box, column for each 
message box. <user>/<message_box> for the next level. 

Or if space is a concern just store the UUID of the message in the index CF and 
add a CF to store the messages. 
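Whichever split is chosen, the denormalised row keys are just delimited strings, and keeping their construction in one place avoids drift between the code that writes and the code that reads. A hedged sketch (the "/" delimiter and names are illustrative, not Victor's actual schema):

```java
public class MessageKeys {
    // Illustrative row-key builders for the denormalised index rows sketched
    // above. Assumes user names and box names never contain the delimiter;
    // otherwise escape it or use a character that cannot appear in names.
    static String userKey(String user) {
        return user;
    }

    static String boxKey(String user, String box) {
        return user + "/" + box;
    }

    static String listKey(String user, String box, String list) {
        return user + "/" + box + "/" + list;
    }

    public static void main(String[] args) {
        System.out.println(userKey("openvictor"));                  // openvictor
        System.out.println(boxKey("openvictor", "inbox"));          // openvictor/inbox
        System.out.println(listKey("openvictor", "inbox", "a1b2")); // openvictor/inbox/a1b2
    }
}
```

Centralising the builders also makes a later rename of a box or list less error-prone, since every key touched by the move goes through the same three functions.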

It's also going to depend on the management features, e.g. can you rename a 
message box / list? Move messages around? If so the denormalised pattern may 
not be the best as those operations will take longer. 

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 19 May 2011, at 05:44, openvictor Open wrote:

> Hello all,
> 
> I know organization is a broad topic and everybody may have an idea on how to 
> do it, but I really want to have some advices and opinions and I think it 
> could be interesting to discuss this matter.
> 
> Here is my problem: I am designing a messaging system internal to a website. 
> There are 3 big structures which are Message, MessageList, MessageBox. A 
> message/messagelist is identified only by an UUID; a MessageBox is identified 
> by a name(utf8 string). A messagebox has a set of MessageList in it and a 
> messagelist has a set of message in it, all of them being UUIDs.
> Currently I have only two CF : message and message_time. Message is a 
> UTF8Type (cassandra 0.6.11, soon going for 0.8) and message_time is a 
> TimeUUIDType.
> 
> For example if I want to request all message in a certain messagelist I do : 
> message_time['messagelist:uuid(messagelist)']
> If I want information of a mesasge I do message['message:uuid(message)']
> If I want all messagelist for a certain messagebox ( called nameofbox for 
> user openvictor for this example) I do : 
> message_time['messagebox:openvictor:nameofbox']
> 
> My question to Cassandra users is : is it a good idea to regroup all those 
> things into two CF ? Is there some advantages / drawbacks of this two CFs and 
> for long term should I change my organization ?
> 
> Thank you,
> Victor



Re: Support for IN clause

2011-05-19 Thread aaron morton
Language spec is here
https://github.com/apache/cassandra/blob/trunk/doc/cql/CQL.textile
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 20 May 2011, at 01:42, Yuki Morishita wrote:

> Hi,
> 
> I think IN clause for SELECT and UPDATE will be supported in v0.8.1.
> See https://issues.apache.org/jira/browse/CASSANDRA-2553
> 
> 2011/5/19 Vivek Mishra :
>> Does CQL support IN clause?
>> 
>> 
> 
> 
> 
> -- 
> Yuki Morishita
>  t:yukim (http://twitter.com/yukim)



Re: Exception when starting

2011-05-19 Thread Eranda Sooriyabandara
Thanks Brandon,
All the data in /var/lib/cassandra was corrupted, and after removing it
Cassandra started normally.