[hadoop] Counters in ColumnFamilyOutputFormat?
I'd like to investigate using Counters in hadoop using ColumnFamilyOutputFormat. But I see that this class uses the outdated ..hadoop.avro classes. Does it make sense to use counters for hadoop output? If I try rewriting ColumnFamilyOutputFormat and friends, should it be to the normal ..avro classes, or to something else? ~mck
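(For context, the output side of a job is wired up roughly like this today. This is a sketch based on the word_count contrib example; the exact ConfigHelper method names vary between versions, so treat them as assumptions.)

    import org.apache.cassandra.hadoop.ColumnFamilyOutputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class CounterOutputJob {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "counter-output");
            job.setJarByClass(CounterOutputJob.class);
            // Reducers emit (ByteBuffer row key, List<Mutation>) pairs to this output format.
            job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
            // Keyspace/CF names are placeholders; input-side setup is omitted.
            ConfigHelper.setOutputColumnFamily(job.getConfiguration(), "Keyspace1", "Counters");
            ConfigHelper.setRpcPort(job.getConfiguration(), "9160");
            ConfigHelper.setInitialAddress(job.getConfiguration(), "localhost");
            ConfigHelper.setPartitioner(job.getConfiguration(),
                    "org.apache.cassandra.dht.RandomPartitioner");
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }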
Re: Inconsistent results using secondary indexes between two DC
Just checked. The rows seem to be present in the CF on all nodes (in both datacenters), but are not indexed correctly. On each node I've used sstablekeys on all CF_NAME-f-XX-Data.db files. In cassandra-cli (using a node that behaves correctly) I ran the query "get CF_NAME where foo = bar" and got the correct number of results. I then checked with grep whether all the keys are present in the lists returned by sstablekeys - none were missing, so it seems that the rows are present on all nodes. When running the same query on the nodes in the second DC (using ConsistencyLevel.ONE) the results are invalid. Sometimes I get 15 rows (the expected, correct number), sometimes 3 rows, sometimes 10 rows. What's interesting is that every time I get only 3 rows, it's the same list of 3 rows on both affected nodes. 2011/5/17 Jonathan Ellis : > Nothing comes to mind. > > I'd start by using sstable2json to see if the missing rows are in the > main data CF -- i.e., are they just unindexed, or are they missing > completely? > > On Sun, May 15, 2011 at 4:33 PM, Wojciech Pietrzok wrote: >> Hello, >> >> I've noticed strange behaviour of Cassandra when using secondary indexes. >> There are 2 Data Centers, each with 2 nodes, RF=4, on all nodes >> Cassandra 0.7.5 is installed. >> When I connect to one of the nodes in DC1 and perform a query using >> secondary indexes ("get ColumnFamily where column = 'foo'" in >> cassandra-cli) I always get the correct number of rows returned, no matter >> which ConsistencyLevel is set. >> When I connect to one of the nodes in DC2 and perform the same query using >> ConsistencyLevel LOCAL_QUORUM the results are correct. But using >> ConsistencyLevel ONE Cassandra doesn't return the correct number of rows >> (it seems that most of the time some of the rows are missing). >> Tried running nodetool repair and nodetool scrub, but this doesn't seem to >> help. >> >> What might be the cause of such behaviour? -- KosciaK mail: kosci...@gmail.com www: http://kosciak.net/
Re: [RELEASE] Apache Cassandra 0.7.6 released
A small error made its way into my previous mail. The issue related to the debian package problem is: https://issues.apache.org/jira/browse/CASSANDRA-2481 -- Sylvain On Wed, May 18, 2011 at 9:54 PM, Sylvain Lebresne wrote: > A small error in the debian setup script made its way into the debian > package of 0.7.6 > (more details here: https://issues.apache.org/jira/browse/CASSANDRA-2641). > We are working on fixing the problem but we must follow the apache > process and as > a result this may take a little longer than we would hope. > > Note that if you are not using the debian package you can safely > ignore this mail. > Otherwise, you may want to wait a little longer before updating. > > We will keep you posted as to when this is resolved. > > > PS: For the very impatient, you can also build the package from the > source after having > applied the second patch attached with the issue > (https://issues.apache.org/jira/browse/CASSANDRA-2641). > > -- > Sylvain > > On Wed, May 18, 2011 at 12:19 PM, Sylvain Lebresne > wrote: >> The Cassandra team is pleased to announce the release of Apache Cassandra >> version 0.7.6. >> >> Cassandra is a highly scalable second-generation distributed database, >> bringing together Dynamo's fully distributed design and Bigtable's >> ColumnFamily-based data model. You can read more here: >> >> http://cassandra.apache.org/ >> >> Downloads of source and binary distributions are listed in our download >> section: >> >> http://cassandra.apache.org/download/ >> >> This version is a bug fix release[1,3] and upgrading is highly encouraged. >> >> Please always pay attention to the release notes[2] before upgrading, >> especially if you upgrade from 0.7.2 or before. Upgrading from 0.7.3 or later >> should be a snap. >> >> If you encounter any problems, please let us know[4]. >> >> Have fun! >> >> >> [1]: http://goo.gl/VYZ2e (CHANGES.txt) >> [2]: http://goo.gl/jMRDE (NEWS.txt) >> [3]: http://goo.gl/6ohkb (JIRA Release Notes) >> [4]: https://issues.apache.org/jira/browse/CASSANDRA >>
RE: Cassandra CMS
Hi, Additionally please take a look at Kundera. Kundera is an open-source ORM currently supporting Cassandra, HBase and MongoDB. Support for Redis will come in the future. https://github.com/impetus-opensource/Kundera Blogs for reference are: http://xamry.wordpress.com/2011/05/02/working-with-mongodb-using-kundera/ http://mevivs.wordpress.com/2011/02/12/hector-kundera/ -Vivek From: da...@daotown.com [mailto:da...@daotown.com] On Behalf Of David Boxenhorn Sent: Thursday, May 05, 2011 4:48 PM To: user@cassandra.apache.org Subject: Re: Cassandra CMS I'm looking at Magnolia at the moment (as in, this second). At first glance, it looks like I should be able to use Cassandra as the database: http://documentation.magnolia-cms.com/technical-guide/content-storage-and-structure.html#Persistent_storage If it can use a filesystem as its database, it can use Cassandra, no? On Thu, May 5, 2011 at 2:01 PM, aaron morton wrote: Would you think of Django as a CMS? http://stackoverflow.com/questions/2369793/how-to-use-cassandra-in-django-framework Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 5 May 2011, at 22:54, Eric tamme wrote: Does anyone know of a content management system that can be easily customized to use Cassandra as its database? (Even better, if it can use Cassandra without customization!) I think your best bet will be to look for a CMS that uses an ORM for the storage layer and write a specific ORM for Cassandra that you can plug in to whatever framework the CMS uses. -Eric
Support for IN clause
Does CQL support IN clause?
Re: Questions about using MD5 encryption with SimpleAuthenticator
On Wed, 18 May 2011 17:16:28 -0700 Sameer Farooqui wrote: SF> But even SSL/TLS is subject to attacks from tools like SSLSNIFF: SF> http://www.thoughtcrime.org/software/sslsniff For perfect security, unplug the server and remove the hard drive. Ted
Re: Support for IN clause
Hi, I think IN clause for SELECT and UPDATE will be supported in v0.8.1. See https://issues.apache.org/jira/browse/CASSANDRA-2553 2011/5/19 Vivek Mishra : > Does CQL support IN clause? -- Yuki Morishita t:yukim (http://twitter.com/yukim)
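(For anyone finding this in the archives: the syntax proposed on that ticket looks roughly like the statement below. This is an assumption based on the ticket discussion; the final grammar may differ.)

    SELECT * FROM users WHERE KEY IN ('jsmith', 'tjones');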
java.io.IOError: java.io.EOFException with version 0.7.6
I have some severe problems on our production site. I created the following test program to reproduce the issue with Cassandra 0.7.6 (with an empty data set). I use the following data model: column_metadata: [] name: Customers column_type: Super gc_grace_seconds: 60 I have a super-column-family with a single row. Within this row I have a single super-column. Within this super-column, I concurrently create, read and delete columns. I have three threads: - Do in a loop: add a column to the super-column. - Do in a loop: delete a random column from the super-column. - Do in a loop: read the super-column (with all columns). After running the above threads concurrently, I always receive the following error: ERROR 17:09:57,036 Fatal exception in thread Thread[ReadStage:81,5,main] java.io.IOError: java.io.EOFException at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:252) at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:268) at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:227) at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(Unknown Source) at java.util.concurrent.ConcurrentSkipListMap.<init>(Unknown Source) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:379) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:362) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:322) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:79) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:40) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108) at org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:283) at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326) at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230) at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:69) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:116) at org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:130) at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1390) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1267) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1195) at org.apache.cassandra.db.Table.getRow(Table.java:324) at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:63) at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:451) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Caused by: java.io.EOFException at java.io.RandomAccessFile.readByte(Unknown Source) at 
org.apache.cassandra.utils.ByteBufferUtil.readShortLength(ByteBufferUtil.java:324) at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:335) at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:71) at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:248) ... 30 more
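(A rough Java/Thrift translation of the described test, for anyone who can't run the .NET original; keyspace, row key and super column names are placeholders. Each thread gets its own connection since Thrift clients are not thread-safe.)

    import java.nio.ByteBuffer;
    import java.util.List;
    import java.util.UUID;
    import org.apache.cassandra.thrift.*;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;

    public class ConcurrentSuperColumnTest {
        static final ByteBuffer KEY = bytes("row1");
        static final ByteBuffer SC = bytes("sc1");

        static ByteBuffer bytes(String s) { return ByteBuffer.wrap(s.getBytes()); }

        static Cassandra.Client connect() throws Exception {
            TFramedTransport tr = new TFramedTransport(new TSocket("localhost", 9160));
            tr.open();
            Cassandra.Client c = new Cassandra.Client(new TBinaryProtocol(tr));
            c.set_keyspace("Keyspace1");
            return c;
        }

        // Read the whole super column (all sub-columns).
        static List<ColumnOrSuperColumn> readAll(Cassandra.Client c) throws Exception {
            ColumnParent parent = new ColumnParent("Customers");
            parent.setSuper_column(SC);
            SlicePredicate pred = new SlicePredicate();
            pred.setSlice_range(new SliceRange(bytes(""), bytes(""), false, 10000));
            return c.get_slice(KEY, parent, pred, ConsistencyLevel.QUORUM);
        }

        public static void main(String[] args) {
            // Thread 1: keep adding columns to the super column.
            new Thread(() -> {
                try {
                    Cassandra.Client c = connect();
                    ColumnParent parent = new ColumnParent("Customers");
                    parent.setSuper_column(SC);
                    while (true) {
                        Column col = new Column();
                        col.setName(bytes(UUID.randomUUID().toString()));
                        col.setValue(bytes("v"));
                        col.setTimestamp(System.currentTimeMillis() * 1000);
                        c.insert(KEY, parent, col, ConsistencyLevel.QUORUM);
                    }
                } catch (Exception e) { e.printStackTrace(); }
            }).start();
            // Thread 2: keep deleting whatever columns are currently visible.
            new Thread(() -> {
                try {
                    Cassandra.Client c = connect();
                    while (true)
                        for (ColumnOrSuperColumn cosc : readAll(c)) {
                            ColumnPath path = new ColumnPath("Customers");
                            path.setSuper_column(SC);
                            path.setColumn(cosc.column.name);
                            c.remove(KEY, path, System.currentTimeMillis() * 1000,
                                     ConsistencyLevel.QUORUM);
                        }
                } catch (Exception e) { e.printStackTrace(); }
            }).start();
            // Thread 3: keep reading the whole super column.
            new Thread(() -> {
                try {
                    Cassandra.Client c = connect();
                    while (true) readAll(c);
                } catch (Exception e) { e.printStackTrace(); }
            }).start();
        }
    }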
Re: [hadoop] Counters in ColumnFamilyOutputFormat?
Avro is there because (1) a long time ago (it now seems) we thought we were going to move the main RPC layer to Avro and (2) it simplifies using Avro for Streaming, which also seems misguided now (https://issues.apache.org/jira/browse/CASSANDRA-1497). Using "native" Thrift mutations makes the most sense to me now, which would keep it similar in structure but avoid the avroToThrift copy. On Thu, May 19, 2011 at 2:30 AM, Mck wrote: > I'd like to investigate using Counters in hadoop using > ColumnFamilyOutputFormat. > > But I see that this class uses the outdated ..hadoop.avro classes. > > Does it make sense to use counters for hadoop output? > > If I try rewriting ColumnFamilyOutputFormat and friends should it be to > the normal ..avro classes, or to something else? > > ~mck > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
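(To make the "native Thrift" option concrete for counters: the Thrift structures added in 0.8 let you phrase a counter write as a Mutation, which is roughly what a Thrift-based output format would batch up. This sketches only the mutation building; the output-format plumbing is omitted, and it is not what ColumnFamilyOutputFormat does today.)

    import java.nio.ByteBuffer;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.CounterColumn;
    import org.apache.cassandra.thrift.Mutation;

    public class CounterMutations {
        // Build a Thrift Mutation that adds `delta` to a counter column.
        static Mutation counterMutation(ByteBuffer columnName, long delta) {
            CounterColumn counter = new CounterColumn();
            counter.setName(columnName);
            counter.setValue(delta);
            ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
            cosc.setCounter_column(counter);
            Mutation mutation = new Mutation();
            mutation.setColumn_or_supercolumn(cosc);
            return mutation;
        }

        public static void main(String[] args) {
            Mutation m = counterMutation(ByteBuffer.wrap("hits".getBytes()), 1L);
            System.out.println(m);
        }
    }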
Re: java.io.IOError: java.io.EOFException with version 0.7.6
Would you have a simple script to reproduce the issue? And could you open a JIRA ticket. Sylvain On Thu, May 19, 2011 at 4:22 PM, Rene Kochen wrote: > I have some severe problems on our production site. > I created the following test program to reproduce the issue with Cassandra > 0.7.6 (with an empty data set). > [snip: rest of the message, including the stack trace, quoted in full above]
Re: java.io.IOError: java.io.EOFException with version 0.7.6
It would be useful to post your program. On Thu, May 19, 2011 at 9:22 AM, Rene Kochen wrote: > I have some severe problems on our production site. > I created the following test program to reproduce the issue with Cassandra > 0.7.6 (with an empty data set). > [snip: rest of the message, including the stack trace, quoted in full above] -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: [hadoop] Counters in ColumnFamilyOutputFormat?
FWIW, as I mentioned in the 1497 comments, the patch makes it abstract so that you can have any rpc/marshalling format you want with a simple extension point. So if we want to move to something besides avro, or even, like I mentioned, do something with Dumbo for streaming, it's easy to extend. On May 19, 2011, at 9:23 AM, Jonathan Ellis wrote: > Avro is there because (1) a long time ago (it now seems) we thought we > were going to move the main RPC layer to Avro and (2) it simplifies > using Avro for Streaming, which also seems misguided now > (https://issues.apache.org/jira/browse/CASSANDRA-1497). > > Using "native" Thrift mutations makes the most sense to me now, which > would keep it similar in structure but avoid the avroToThrift copy. > > On Thu, May 19, 2011 at 2:30 AM, Mck wrote: >> I'd like to investigate using Counters in hadoop using >> ColumnFamilyOutputFormat. >> >> But I see that this class uses the outdated ..hadoop.avro classes. >> >> Does it make sense to use counters for hadoop output? >> >> If I try rewriting ColumnFamilyOutputFormat and friends should it be to >> the normal ..avro classes, or to something else? >> >> ~mck >> >> > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com
selecting data
I'm new to the Cassandra database. I want to get data as in a relational database: select * from table where field="value"; I see that using the CLI we have just the following commands: get <keyspace>.<column_family>['<key>'] Get a slice of columns. get <keyspace>.<column_family>['<key>']['<super>'] Get a slice of sub columns. get <keyspace>.<column_family>['<key>']['<column>'] Get a column value. get <keyspace>.<column_family>['<key>']['<super>']['<column>'] Get a sub column value. Is there a way to do that? I think using the Java API it is possible. Cassandra version: 0.6.12 Thanks for help
RE: selecting data
You need to use CQL. Additionally, a JDBC driver on top of CQL is part of the beta release. From: karim abbouh [karim_...@yahoo.fr] Sent: 19 May 2011 21:41 To: user@cassandra.apache.org Subject: selecting data I'm new to the Cassandra database. I want to get data as in a relational database: select * from table where field="value"; I see that using the CLI we have just the following commands: get <keyspace>.<column_family>['<key>'] Get a slice of columns. get <keyspace>.<column_family>['<key>']['<super>'] Get a slice of sub columns. get <keyspace>.<column_family>['<key>']['<column>'] Get a column value. get <keyspace>.<column_family>['<key>']['<super>']['<column>'] Get a sub column value. Is there a way to do that? I think using the Java API it is possible. Cassandra version: 0.6.12 Thanks for help
Re: selecting data
Thanks, but is there a way using just the CLI and storage-conf.xml? From: Vivek Mishra To: "user@cassandra.apache.org" Sent: Thu 19 May 2011, 17:23:15 Subject: RE: selecting data You need to use CQL. Additionally, a JDBC driver on top of CQL is part of the beta release. From: karim abbouh [karim_...@yahoo.fr] Sent: 19 May 2011 21:41 To: user@cassandra.apache.org Subject: selecting data [original question quoted in full above]
Re: selecting data
Cassandra is not an RDBMS. All you can do is search on a key; otherwise you need a full scan. You need to design your schema carefully around what your application needs. On 2011/05/20, at 1:11, karim abbouh wrote: > I'm new to the Cassandra database. > I want to get data as in a relational database: > select * from table where field="value"; > I see that using the CLI we have just the following commands: > get <keyspace>.<column_family>['<key>'] Get a slice of columns. > get <keyspace>.<column_family>['<key>']['<super>'] Get a slice of sub columns. > get <keyspace>.<column_family>['<key>']['<column>'] Get a column value. > get <keyspace>.<column_family>['<key>']['<super>']['<column>'] Get a sub column value. > > Is there a way to do that? > I think using the Java API it is possible. > Cassandra version: 0.6.12 > > Thanks for help
Re: Inconsistent results using secondary indexes between two DC
I am wondering if running nodetool repair will help in any way.
Re: Commitlog Disk Full
Just noticed this thread and figured I'd chime in since we've had similar issues with the commit log growing too large on our clusters. Tuning down the flush timeout wasn't really an acceptable solution for us since we didn't want to be constantly flushing and generating extra SSTables for no reason. So we wrote a small tool that we start in a static block in CassandraServer that periodically checks the commit log size and flushes all memtables if they're above some threshold. I've attached that code. Any feedback / improvements are more than welcome. Mike On Thu, May 12, 2011 at 11:30 AM, Sanjeev Kulkarni wrote: > Hey guys, > I have an ec2 debian cluster consisting of several nodes running 0.7.5 on > ephemeral disks. > These are fresh installs and not upgrades. > The commitlog is set to the smaller of the disks which is around 10G in > size and the datadir is set to the bigger disk. > The config file is basically the same as the one supplied by the default > installation. > Our applications write to the cluster. After about a day of writing we > started noticing the commitlog disk filling up. Soon we went over the disk > limit and writes started failing. At this point we stopped the cluster. > Over the course of the day we inserted around 25G of data. Our column > values are pretty small. > I understand that cassandra periodically cleans up the commitlog > directories by generating sstables in datadir. Is there any way to speed up > this movement from commitlog to datadir? > Thanks! > > [attachment: PeriodicMemtableFlusher.java]
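(Since attachments rarely survive the archives, the shape of the tool is roughly the following. This is a sketch, not the attached code; the flush callback is left abstract because the real thing calls Cassandra internals such as ColumnFamilyStore.forceFlush().)

    import java.io.File;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class PeriodicMemtableFlusher {
        // Sums file sizes under the commit log directory.
        static long directorySize(File dir) {
            long total = 0;
            File[] files = dir.listFiles();
            if (files != null)
                for (File f : files)
                    total += f.isDirectory() ? directorySize(f) : f.length();
            return total;
        }

        // Checks the commit log size every `intervalSec` seconds and runs
        // `flushAll` whenever it exceeds `thresholdBytes`.
        public static void start(final File commitLogDir, final long thresholdBytes,
                                 long intervalSec, final Runnable flushAll) {
            ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
            timer.scheduleWithFixedDelay(new Runnable() {
                public void run() {
                    if (directorySize(commitLogDir) > thresholdBytes)
                        flushAll.run(); // e.g. force a flush of every column family
                }
            }, intervalSec, intervalSec, TimeUnit.SECONDS);
        }
    }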
Re: selecting data
only 0.7+ has index support On Thu, May 19, 2011 at 11:38 AM, Watanabe Maki wrote: > Cassandra is not an RDBMS. All you can do is search on a key; otherwise you need > a full scan. > You need to design your schema carefully around what your application needs. > > On 2011/05/20, at 1:11, karim abbouh wrote: > > [original question quoted in full above] -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Commitlog Disk Full
That's basically the approach I want to take in https://issues.apache.org/jira/browse/CASSANDRA-2427. On Thu, May 19, 2011 at 12:00 PM, Mike Malone wrote: > Just noticed this thread and figured I'd chime in since we've had similar > issues with the commit log growing too large on our clusters. > [snip: rest of the message quoted in full above] -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: selecting data
For example, in 0.7 how do we use an index? From: Jonathan Ellis To: user@cassandra.apache.org Sent: Thu 19 May 2011, 18:04:48 Subject: Re: selecting data only 0.7+ has index support On Thu, May 19, 2011 at 11:38 AM, Watanabe Maki wrote: > Cassandra is not an RDBMS. All you can do is search on a key; otherwise you need > a full scan. > You need to design your schema carefully around what your application needs. > > [earlier messages snipped] -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Re : selecting data
http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes On Thu, May 19, 2011 at 7:10 PM, karim abbouh wrote: > For example, in 0.7 how do we use an index?
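(The short version from that post, using the blog's own example names, in 0.7 cassandra-cli syntax:)

    create column family users with comparator=UTF8Type
        and column_metadata=[{column_name: birth_date, validation_class: LongType, index_type: KEYS}];
    get users where birth_date = 1973;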
Re: Cassandra CMS
On Thu, May 19, 2011 at 8:11 AM, Vivek Mishra wrote: > Hi, > > Additionally please take a look at Kundera. > > Kundera is an open-source ORM currently supporting Cassandra, HBase and MongoDB. > > https://github.com/impetus-opensource/Kundera > > [snip: rest of the message quoted in full above] What is the status of Kundera supporting newer versions of the Cassandra API? Last I checked the code was still built around 0.6.X. Edward
Re: [hadoop] Counters in ColumnFamilyOutputFormat?
On Thu, 2011-05-19 at 09:23 -0500, Jonathan Ellis wrote: > > Using "native" Thrift mutations makes the most sense to me now, which > would keep it similar in structure but avoid the avroToThrift copy. https://issues.apache.org/jira/browse/CASSANDRA-2667 -- "Everything you can imagine is real." Pablo Picasso | http://semb.wever.org | http://sesat.no | http://tech.finn.no | Java XSS Filter
Re: Exception when starting
On Thu, May 19, 2011 at 1:51 PM, Eranda Sooriyabandara <0704...@gmail.com> wrote: > Hi devs, > I tried to start the Apache Cassandra and got an exception. This is what log > says, > > INFO 00:18:12,226 Logging initialized > INFO 00:18:12,278 Heap size: 1029701632/1029701632 > INFO 00:18:12,281 JNA not found. Native methods will be disabled. > INFO 00:18:12,294 Loading settings from > file:/home/eranda/Desktop/cassendra/apache-cassandra-0.7.6/conf/cassandra.yaml > INFO 00:18:12,452 DiskAccessMode 'auto' determined to be standard, > indexAccessMode is standard > INFO 00:18:12,621 reading saved cache > /var/lib/cassandra/saved_caches/system-IndexInfo-KeyCache > ERROR 00:18:12,622 Exception encountered during startup. > java.lang.NegativeArraySizeException Moving to user@. Your keycache is corrupt, just rm /var/lib/cassandra/saved_caches/* and try again. -Brandon
Re: Unable to add columns to empty row in Column family: Cassandra
Thank you Narendra/Aaron. Sorry for the late response. PFB further information on this. 1) How do you delete the data in the cli? Is it a row delete e.g. del MyCF['my-key']; [Anuya]: Yeah. Exactly the same way. 2) What client are you using to insert the row the second time? e.g. custom thrift wrapper or pycassa [Anuya]: I am using the Thrift APIs in JAVA code directly. No high-level Cassandra client. I am using the Cassandra client's INSERT and REMOVE methods to insert and delete rows programmatically. 3) How is the second read done, via the cli? [Anuya]: Operations are as follows: 1) INSERT #1: Programmatically 2) DELETE #1: Command line 3) INSERT #2: Programmatically A READ operation, after each of the above steps, is done using the CLI. 4) Does the same test work when you only use your app? [Anuya]: Exactly; the INSERT -> DELETE -> INSERT scenario with the same row key works well if executed only from the command line OR only programmatically. Basically, over here, I am trying to reuse the row key. So, I create a row with a specific key, delete that row (but a Cassandra delete does not delete the row key, it only deletes all columns in the row) and then I want to insert columns into the same row with the same row id. Correct me if I go wrong: reusing the row key should work as per the Cassandra data model definition/structure. 5) Cassandra-cli will be using the current time as its timestamp for the delete. If I had to guess what was happening it would be a problem with the timestamps your app is creating. [Anuya]: Well, regarding timestamps, my JAVA code to generate timestamps is simply: System.currentTimeMillis(); So, I also use the current time as a timestamp, similar to Cassandra as you mentioned. Then, irrespective of which client was used for insert/delete operations, this INSERT -> DELETE -> INSERT scenario should work, because, as per the sequence of these operations (INSERT -> DELETE -> INSERT), the timestamp condition TS[INSERT #2] > TS[DELETE #1] > TS[INSERT #1] will obviously be satisfied. But the fact is, this scenario does not work when switching between clients for INSERT and DELETE operations as mentioned in point #3 above. So, is this a clock synchronization issue? I mean, is the clock used by the program to generate timestamps out of sync with the clock used by the CLI? On this, FYI, I am running linux-based VMs which in turn run Cassandra servers. The command line client is obviously on the VM and the JAVA program is on the host machine running the VM. If the clocks of these two machines are in sync then, I think, switching between clients should not matter? Before I hit the send button, :), I scrutinized the clocks on the VM and the host m/c. The clock on the VM is exactly 4 seconds behind the clock on the host m/c. I welcome your comments on the above. Thanks, Anuya On Thu, May 12, 2011 at 4:31 PM, Narendra Sharma wrote: > Can u share the code? > > On Mon, May 2, 2011 at 11:34 PM, anuya joshi wrote: >> Hello, >> >> I am using Cassandra for my application. My Cassandra client uses Thrift >> APIs directly. The problem I am facing currently is as follows: >> >> 1) I added a row and columns in it dynamically via Thrift API Client >> 2) Next, I used command line client to delete row which actually deleted >> all the columns in it, leaving empty row with original row id. >> 3) Now, I am trying to add columns dynamically using client program into >> this empty row with same row key >> However, columns are not being inserted. >> But, when tried from command line client, it worked correctly. 
>> >> Any pointer on this would be of great use. >> >> Thanks in advance, >> >> Regards, >> Anuya >> > -- > Narendra Sharma > Solution Architect > http://www.persistentsys.com > http://narendrasharma.blogspot.com/
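(One more thing worth ruling out, an assumption on my part rather than something confirmed above: cassandra-cli stamps its writes in microseconds since the epoch, while System.currentTimeMillis() returns milliseconds. A CLI delete therefore carries a timestamp roughly 1000x larger than any later millisecond-stamped insert, so the insert looks older than the tombstone and is silently dropped, which matches the INSERT -> DELETE -> INSERT symptom exactly. Generating microsecond timestamps in the Java client avoids it:)

    public class Timestamps {
        // Match the cassandra-cli convention: microseconds since the epoch.
        static long microsNow() {
            return System.currentTimeMillis() * 1000;
        }

        public static void main(String[] args) {
            System.out.println("timestamp in micros: " + microsNow());
        }
    }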
Re: java.io.IOError: java.io.EOFException with version 0.7.6
This is the test program (.NET 4): http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/file/n6383644/CassandraIssue.zip I use Cassandra 0.7.6 with the standard yaml and this super-column-family: create column family Customers with column_type = 'Super' and comparator = 'BytesType' and gc_grace = 60; In the program: set the cassandra IP, keyspace and super-column-family and press start. Run the program against an empty database for about 30 minutes and the exception should pop up in the cassandra log. I receive the following exceptions: java.io.IOError: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0 at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:252) at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:268) at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:227) at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(Unknown Source) at java.util.concurrent.ConcurrentSkipListMap.<init>(Unknown Source) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:379) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:362) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:322) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:79) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:40) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108) at org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:283) at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326) at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230) at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:69) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:116) at org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:130) at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1385) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1262) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1190) at org.apache.cassandra.db.Table.getRow(Table.java:324) at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:63) at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:451) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Caused by: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0 at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:73) at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:248) ... 30 more java.io.IOError: java.io.EOFException at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:252) at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:268) at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:227) at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(Unknown Source) at java.util.concurrent.ConcurrentSkipListMap.<init>(Unknown Source) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:379) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:362) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:322) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:79) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:40) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) at org.apache.cas
Re: How to configure internode encryption in 0.8.0?
Thanks, Jeremy! Nirmal, any advice on how to generate the key/trust stores with the correct cipher? - Sameer On Wed, May 18, 2011 at 8:10 AM, Jeremy Hanna wrote: > I'll CC Nirmal Ranganathan who implemented the internode encryption who > might be able to give you some advice on this. > > On May 17, 2011, at 7:47 PM, Sameer Farooqui wrote: > > > Thanks for the link, Jeremy. > > > > I generated the keystore and truststore for inter-node communication > using the link in the YAML file: > > > http://download.oracle.com/javase/6/docs/technotes/guides/security/jsse/JSSERefGuide.html#CreateKeystore > > > > Unfortunately, the default instructions in the above link used > TLS_RSA_WITH_AES_256_CBC_SHA. So, when I start Cassandra now, I get this > error: > > > > ERROR 00:10:38,734 Exception encountered during startup. > > java.lang.IllegalArgumentException: Cannot support > TLS_RSA_WITH_AES_256_CBC_SHA with currently installed providers > > at > com.sun.net.ssl.internal.ssl.CipherSuiteList.<init>(CipherSuiteList.java:79) > > at > com.sun.net.ssl.internal.ssl.SSLServerSocketImpl.setEnabledCipherSuites(SSLServerSocketImpl.java:166) > > at > org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:55) > > > > > > The YAML file states that the cipher suite for authentication should be: > TLS_RSA_WITH_AES_128_CBC_SHA. > > > > This is my first time using keytool and I've searched the web to see how > I can change the cipher from AES_256 to AES_128, but haven't found the > answer. > > > > Anyone know how to change the cipher to AES_128? > > > > Here are the commands I used to generate the non-working keystore and > truststore: > > > > 1) keytool -genkeypair -alias jdoe -keyalg RSA -validity 7 -keystore .keystore > > 2) keytool -list -v -keystore .keystore > > 3) keytool -export -alias jdoe -keystore .keystore -rfc -file jdoe.cer > > 4) cat jdoe.cer > > 5) keytool -import -alias jdoecert -file jdoe.cer -keystore .truststore > > 6) keytool -list -v -keystore .truststore > > > > > > - Sameer > > > > On Mon, May 16, 2011 at 5:35 PM, Jeremy Hanna < > jeremy.hanna1...@gmail.com> wrote: > > Take a look at cassandra.yaml in your 0.8 download at the very bottom. > There are docs and examples there. > > e.g. > http://svn.apache.org/repos/asf/cassandra/tags/cassandra-0.8.0-beta2/conf/cassandra.yaml > > > > On May 16, 2011, at 6:36 PM, Sameer Farooqui wrote: > > > > > I understand that 0.8.0 has configurable internode encryption > (CASSANDRA-1567, 2152). > > > > > > I haven't been able to find any info on how to configure it though on > this mailing list or the Datastax website. > > > > > > Can somebody point me towards how to set this up? > > > > > > - Sameer > > > > > >
Re: How to configure internode encryption in 0.8.0?
On Tue, May 17, 2011 at 5:47 PM, Sameer Farooqui wrote: > > Unfortunately, the default instructions in the above link > used TLS_RSA_WITH_AES_256_CBC_SHA. So, when I start Cassandra now, I get this > error: > ERROR 00:10:38,734 Exception encountered during startup. > java.lang.IllegalArgumentException: Cannot support > TLS_RSA_WITH_AES_256_CBC_SHA with currently installed providers > at > com.sun.net.ssl.internal.ssl.CipherSuiteList.<init>(CipherSuiteList.java:79) > at > com.sun.net.ssl.internal.ssl.SSLServerSocketImpl.setEnabledCipherSuites(SSLServerSocketImpl.java:166) > at > org.apache.cassandra.security.SSLFactory.getServerSocket(SSLFactory.java:55) You might want to double check that you have the Java policy files installed that allow strong crypto -- I found I needed to install them when working on our encrypted Cassandra setup. They're available for download from http://www.oracle.com/technetwork/java/javase/downloads/index.html Look for the link to "Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files 6". -- Christopher Deutsch
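(A quick way to check whether the unlimited-strength policy files are in effect on a given JVM; this uses only the standard javax.crypto API:)

    import javax.crypto.Cipher;

    public class JcePolicyCheck {
        public static void main(String[] args) throws Exception {
            int max = Cipher.getMaxAllowedKeyLength("AES");
            // 128 means the default "strong" policy; Integer.MAX_VALUE means unlimited.
            System.out.println("Max AES key length: " + max);
            if (max < 256)
                System.out.println("Unlimited-strength JCE policy files are NOT installed.");
        }
    }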
Can I use secondary index with any partitioner
Can I use a secondary index with any partitioner? 1) RandomPartitioner 2) ByteOrderedPartitioner
Re: Cassandra Vs. Oracle Coherence
I've not used Coherence; all I know is from reading the first paragraph here http://www.oracle.com/technetwork/middleware/coherence/overview/index.html and Wikipedia :) Cassandra is not a caching layer, it's a database. You can rely on it as your database, and some people find when they do they no longer need a caching layer. If you can provide some more details on your use case we can help you judge if cassandra is a good fit. Or if you can tell us which features of Coherence you think are a good fit for you we can see if there are Cassandra equivalents. And I should also say: - cassandra is free - if you have a good idea about how it can be improved, there is a chance it may get implemented. And if you can provide some dev time, an even better chance. - you can get free support from the community - there are multiple professional services companies that provide support http://wiki.apache.org/cassandra/ThirdPartySupport Hope that helps. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 18 May 2011, at 10:44, Karamel, Raghu wrote: > Hi, > > I am new to Cassandra and very excited with the technology. I am evaluating > it and trying to understand the difference between Cassandra and Oracle > Coherence. Precisely, looking for reasons why someone would select Cassandra > over Oracle Coherence. Has anyone done the exercise of comparing them? > Appreciate if you can share some information on that. > > Regards > -RK
Re: Knowing when there is a *real* need to add nodes
Considering disk usage is a tricky one. Compacted SSTable files will remain on disk until either there is not enough space, or the JVM GC runs. To measure the live space use the "Space used (live)" from the CFStats. "Space used (total)" includes the space which has been compacted and not yet deleted from disk. The data in deleted columns *may* be purged from disk during a minor or major compaction. This can happen before GCGraceSeconds has expired. It is only the Tombstone that must be kept around for at least GCGraceSeconds. I agree that 50% utilisation on the data directories is a sensible soft limit that will help keep you out of trouble. The space needed by the compaction depends on which bucket of files it is compacting, but it will always require at least as much free disk space as the files it is compacting. That should also leave headroom for adding new nodes, just in case. Ideally when adding new nodes existing nodes only stream data to the new nodes. If however you are increasing the node count by less than a factor of 2 you may need to make multiple moves and the nodes may need additional space. To gauge the throughput I would also look at the Latency trackers on the o.a.c.db.StorageProxy MBean. They track the latency of complete requests including talking to the rest of the cluster. The metrics on the individual column families are concerned with the local read. For the pending TP stats I would guess that for the read and write pools a pending value consistently higher than the number of threads assigned (in the config) would be something to investigate. Waiting on these stages will be reflected in the StorageProxy latency numbers. HintedHandoff, StreamStage and AntiEntropyStage will have tasks that stay in the pending queue for a while. AFAIK the other pools should not have many (< 10) tasks in the pending queue and should be able to keep the pending queue clear. Hope that helps. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 18 May 2011, at 19:50, Tomer B wrote: > As for static disk usage I would add this: > > test: df -kh > description: run test after compaction (check GCGraceSeconds in > storage-conf.xml) as only then is data expunged permanently; run on the data > disk, assuming here the commitlog disk is separated from the data dir. > green gauge: used_space < 30% of disk capacity > yellow gauge: used space 30% - 50% of disk capacity > red gauge: used_space > 50% of disk capacity > comments: Compactions can require up to 100% of in-use space temporarily in > the worst case (data file dir). When approaching 50% or more of disk capacity, use > raid0 for the data dir disk; if you cannot, try increasing your disk; if you cannot, consider > adding nodes (or first consider adding nodes if that's what you wish). > > 2011/5/12 Watanabe Maki > It's an interesting topic for me too. > How about adding measurements of static disk utilization (% used) and memory > utilization (rss, JVM heap, JVM GC)? > > maki > > From iPhone > > > On 2011/05/12, at 0:49, Tomer B wrote: > > > Hi > > > > I'm trying to predict when my cluster will soon be needing new nodes > > added. I want a continuous graph telling me of my cluster's health so > > that when I see my cluster becoming more and more busy (I want numbers > > & measurements) I will be able to know I need to start purchasing more > > machines and get them into my cluster, so I want to know of that > > beforehand. > > I'm writing here what I came up with after doing some research over the net. 
> > I would highly appreciate any additional gauge measurements and ranges > > in order to test my cluster health and to know beforehand when I'm > > going to soon need more nodes. Although I'm writing down a green > > gauge, yellow gauge and red gauge, I'm also trying to find a continuous > > graph where I can tell where our cluster stands (as much as > > possible...) > > > > Also my recommendation is always before adding new nodes: > > > > 1. Make sure all nodes are balanced and if not balance them. > > 2. Separate commit log drive from data (SSTables) drive > > 3. use mmap index only in memory and not auto > > 4. Increase disk IO if possible. > > 5. Avoid swapping as much as possible. > > > > > > As for my gauge tests for when to add new nodes: > > > > test: nodetool tpstats -h > > green gauge: No pending column with number higher > > yellow gauge: pending columns 100-2000 > > red gauge: larger than 3000 > > > > test: iostat -x -n -p -z 5 10 and iostat -xcn 5 > > green gauge: kw/s + kr/s is below 25% of disk IO capacity > > yellow gauge: 20%-50% > > red gauge: 50%+ > > > > test: iostat -x -n -p -z 5 10 and check %b column > > green gauge: less than 10% > > yellow gauge: 10%-80% > > red gauge: 90%+ > > > > test: nodetool cfstats --host localhost > > green gauge: “SSTable count” item does not continually grow over time > > yellow gauge: > > red gauge: “SSTable count” item continually grows over time
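(To put numbers on the StorageProxy latency suggestion, the MBean can be polled over JMX along these lines. The attribute name is what the 0.7-era StorageProxyMBean exposes, but treat it as an assumption and confirm it in jconsole first.)

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class LatencyPoller {
        public static void main(String[] args) throws Exception {
            // Default Cassandra JMX port was 8080 in 0.7 (7199 in later versions).
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName proxy = new ObjectName("org.apache.cassandra.db:type=StorageProxy");
            // Attribute name assumed from StorageProxyMBean; verify before relying on it.
            Object readLatency = mbs.getAttribute(proxy, "RecentReadLatencyMicros");
            System.out.println("recent read latency (micros): " + readLatency);
            connector.close();
        }
    }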
Re: Recommendation on how to organize CF
I'm a bit confused by your examples. I think you are saying... - Standard CF called Message using the UTF8Type for column comparisons, used to store the individual messages. Row key is the message UUID. Not sure what the columns are. - Standard CF called MessageTime using TimeUUIDType for column comparison, used to store collections of messages. Row key is "messagelist:<uuid>" for a message list, and "messagebox:<user>:<name>" for a message box. Not sure what the columns are. The best model is going to be the one that supports your read requests and the volume of data you are expecting. One way to go is to de normalise to support very fast read paths. You could store the entire message in one column using something like JSON to serialise it. Then - MessageIndexes standard CF to store the full messages in context; there are three different types of rows: * keys with <user> store all messages for a user; column name is the message TimeUUID and value is the message structure * keys with <user>/<box> store the messages for a single message box. Columns same as above. * keys with <user>/<box>/<list> store the messages in a single message list. Columns as above. - MessageFolders CF to store the message boxes and message lists, two approaches: 1) <user> as key and each column is a message box; message lists are stored in a single column as JSON 2) <user> row for the top level message boxes, column for each message box; <user>/<box> for the next level. Or if space is a concern just store the UUID of the message in the index CF and add a CF to store the messages. It is also going to depend on the management features, e.g. can you rename a message box / list? Move messages around? If so the de normalised pattern may not be the best as those operations will take longer. Hope that helps. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 19 May 2011, at 05:44, openvictor Open wrote: > Hello all, > > I know organization is a broad topic and everybody may have an idea on how to > do it, but I really want to have some advice and opinions, and I think it > could be interesting to discuss this matter. > > Here is my problem: I am designing a messaging system internal to a website. > There are 3 big structures which are Message, MessageList, MessageBox. A > message/messagelist is identified only by a UUID; a MessageBox is identified > by a name (utf8 string). A messagebox has a set of MessageLists in it and a > messagelist has a set of messages in it, all of them being UUIDs. > Currently I have only two CFs: message and message_time. Message is a > UTF8Type (cassandra 0.6.11, soon going for 0.8) and message_time is a > TimeUUIDType. > > For example if I want to request all messages in a certain messagelist I do: > message_time['messagelist:uuid(messagelist)'] > If I want information on a message I do message['message:uuid(message)'] > If I want all messagelists for a certain messagebox (called nameofbox for > user openvictor for this example) I do: > message_time['messagebox:openvictor:nameofbox'] > > My question to Cassandra users is: is it a good idea to regroup all those > things into two CFs? Are there advantages / drawbacks to these two CFs, and > in the long term should I change my organization? > > Thank you, > Victor
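(If it helps to see the de-normalised layout spelled out, the row keys for the three MessageIndexes cases could be built like this; purely illustrative, every name here is an assumption:)

    public class MessageIndexKeys {
        // One row per user: every message they can see.
        static String userRow(String user) {
            return user;
        }
        // One row per message box.
        static String boxRow(String user, String box) {
            return user + "/" + box;
        }
        // One row per message list inside a box.
        static String listRow(String user, String box, String listUuid) {
            return user + "/" + box + "/" + listUuid;
        }

        public static void main(String[] args) {
            System.out.println(listRow("openvictor", "inbox", "5f2c..."));
        }
    }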
Re: Support for IN clause
Language spec is here https://github.com/apache/cassandra/blob/trunk/doc/cql/CQL.textile - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 20 May 2011, at 01:42, Yuki Morishita wrote: > Hi, > > I think IN clause for SELECT and UPDATE will be supported in v0.8.1. > See https://issues.apache.org/jira/browse/CASSANDRA-2553 > > 2011/5/19 Vivek Mishra : >> Does CQL support IN clause? > > -- > Yuki Morishita > t:yukim (http://twitter.com/yukim)
Re: Exception when starting
Thanks Brandon. All the data in /var/lib/cassandra was corrupted; after removing it, Cassandra started normally.