Re: time to live rows

2011-02-10 Thread Sylvain Lebresne
Kal, you may have to flush before compacting. If you insert then compact, then it's almost certain that the inserts are still in the memtable, and thus not compacted. On Tue, Feb 8, 2011 at 9:54 PM, Kallin Nagelberg wrote: > What's the secret recipe that I'm missing? I tried forcing compaction >
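Why the flush matters can be sketched with a toy model. This is not Cassandra code: the class and method names (ToyStore, insert, flush, compact) are hypothetical and chosen only to show that compaction works on flushed SSTables and never sees rows still sitting in the memtable.

```python
class ToyStore:
    """Toy model of one column family: an in-memory memtable plus flushed SSTables."""

    def __init__(self):
        self.memtable = {}   # recent writes, not yet on disk
        self.sstables = []   # each flushed SSTable is modelled as a dict

    def insert(self, key, value):
        # New writes only land in the memtable.
        self.memtable[key] = value

    def flush(self):
        # Turn the current memtable into a new SSTable.
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}

    def compact(self):
        # Merge all SSTables into one and return how many keys were processed.
        # The memtable is deliberately left untouched.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [merged] if merged else []
        return len(merged)


store = ToyStore()
store.insert("row1", "value")
compacted_before_flush = store.compact()  # nothing on disk yet
store.flush()
compacted_after_flush = store.compact()   # the flushed row is now visible to compaction
```

In this model, compacting straight after an insert processes zero rows, which matches the behaviour described in the reply.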

Re: Exceptions on 0.7.0

2011-02-10 Thread aaron morton
Shimi, You may be seeing the result of CASSANDRA-1992, are you able to test with the most recent 0.7 build ? https://hudson.apache.org/hudson/job/Cassandra-0.7/ Aaron On 10 Feb 2011, at 13:42, Dan Hendry wrote: > Out of curiosity, do you really have on the order of 1,986,622,313 elem

Re: Exceptions on 0.7.0

2011-02-10 Thread shimi
On 10 Feb 2011, at 13:42, Dan Hendry wrote: Out of curiosity, do you really have on the order of 1,986,622,313 elements (I believe elements=keys) in the cf? Dan No. I was too puzzled by the numbers On Thu, Feb 10, 2011 at 10:30 AM, aaron morton wrote: > Shimi, > You may be seeing the result

Re: regarding space taken by different column families in Cassandra

2011-02-10 Thread aaron morton
Total size includes SSTables that have been compacted, but have yet to be deleted from disk. Live space is only the space used by SSTables that are still in use. Your second set of numbers looks like minor compaction has done its job and the unused space has been reclaimed. Hope that helps.

Re: Implemeting a LRU in Cassandra

2011-02-10 Thread aaron morton
FWIW, and depending on the size of the data, I would consider using sorted sets in Redis http://redis.io/commands#sorted_set where the member is the page URL and the weight is the timestamp; use ZRANGE to get back the top 1,000 entries in the set. Would that work for you? Aaron On 9 Feb 2011, a
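The sorted-set idea can be sketched without a Redis server. In this minimal sketch a plain dict stands in for the sorted set (member is the page URL, score is the timestamp); the class name RecentPages and its methods are hypothetical. With Redis itself the corresponding operations would be ZADD for the update and a range query for the read.

```python
import heapq


class RecentPages:
    """In-memory stand-in for a sorted set keyed by page URL, scored by timestamp."""

    def __init__(self):
        self._scores = {}  # url -> last-seen timestamp

    def touch(self, url, timestamp):
        # Re-adding an existing member just updates its score,
        # which is how a sorted set behaves as well.
        self._scores[url] = timestamp

    def top(self, n):
        # The n most recently touched URLs, newest first.
        return heapq.nlargest(n, self._scores, key=self._scores.get)


pages = RecentPages()
pages.touch("http://example.com/a", 100)
pages.touch("http://example.com/b", 200)
pages.touch("http://example.com/a", 300)  # page a is visited again
newest_two = pages.top(2)
```

Because each URL appears once with its latest timestamp, the structure stays bounded by the number of distinct pages rather than the number of visits.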

Re: Exceptions on 0.7.0

2011-02-10 Thread aaron morton
I should be able to repair, install the new version and kick off nodetool repair . If you are uncertain search for cassandra-1992 on the list, there has been some discussion. You can also wait till some peeps in the states wake up if you want to be extra sure. The number if the number of co

Calculating the size of rows in KBs

2011-02-10 Thread Aditya Narayan
How can I get or calculate the size of rows/columns? What are the memory overheads for each column/row?

Data ends up in wrong Columnfamily

2011-02-10 Thread Roland Gude
Hi, I am experiencing a strange issue. I have two applications writing to Cassandra (in different column families in the same keyspace). The applications reside on different machines and know nothing about the existence of each other. They both produce data and write it in Cassandra with batch mu

Zurich user group

2011-02-10 Thread Sasha Dolgy
Hi there, are there any Cassandra users in and around the Zurich area interested in a get-together? Sometimes it's good to discuss usage and concepts over beverages... +sd

Re: Implemeting a LRU in Cassandra

2011-02-10 Thread Utku Can Topçu
Dear Aaron, Thank you for your suggestion. I'll be evaluating it. Since all my other use cases are implemented in Cassandra, now I had the question in my mind, if it was possible to implement the sorted set in Cassandra :) The problem here is, in a few hours I might be resolving more than 2M pag

Multiple inequality filters

2011-02-10 Thread Chema Molins
Hi, I have stumbled upon the limitation of Google AppEngine of not supporting inequality filters on more than one column (being able to filter values with <= and >=). Does Cassandra support them? Or is this a general "NoSql" limitation? Thanks a lot Chema

Re: Exceptions on 0.7.0

2011-02-10 Thread shimi
I upgraded the version on all the nodes but I still get the exceptions. I ran cleanup on one of the nodes but I don't think there is any cleanup going on. Another weird thing that I see is: INFO [CompactionExecutor:1] 2011-02-10 12:08:21,353 CompactionIterator.java (line 135) Compacting large row

Re: Exceptions on 0.7.0

2011-02-10 Thread Attila Babo
The same problem here, even with apache-cassandra-2011-02-10_06-30-00-bin.tar.gz from hudson. I'm happy to share the full log if needed or run tests to identify the core problem, which looks like an overflow to me. The database was upgraded from 0.6.8; there were no problems with it before. /Attila -

Possible application

2011-02-10 Thread Benson Margulies
Hello there, I'm trying to sort out whether Cassandra is a good pick as the data store for a problem I've got. The shape of the thing is a large number of hash tables. On a merely pretty big scale, it can all run on one pretty big machine. On a gigantic scale, which is an eventual goal, it will n

Super Slow Multi-gets

2011-02-10 Thread Bill Speirs
I have a 7 node setup with a replication factor of 1 and a read consistency of 1. I have two column families: Messages which stores millions of rows with a UUID for the row key, DateIndex which stores thousands of rows with a String as the row key. I perform 2 look-ups for my queries: 1) Fetch the

Re: Out of control memory consumption

2011-02-10 Thread Huy Le
We use Cassandra version 0.6.11. Our cache size is very small. 11 out of 12 servers have a used heap size of less than 500MB of the 3GB allocated. Just one server had memory usage run out of control. The issue is isolated. It turns out that one CF has a row with a compacted row size of 50MB. And this

Re: Super Slow Multi-gets

2011-02-10 Thread Bill Speirs
We attempted a compaction to see if that would improve read performance (BTW: write performance is as expected, fast!). Here is the result, an ArrayOutOfBounds exception: INFO 11:48:41,070 Compacting [org.apache.cassandra.io.sstable.SSTableReader(path='/test/cassandra/data/Logging/DateIndex-e-7-Da

Re: Super Slow Multi-gets

2011-02-10 Thread Utku Can Topçu
Dear Bill, How about the size of the row in the Messages CF. Is it too big? Might you be having an overhead of the bandwidth? Regards, Utku On Thu, Feb 10, 2011 at 5:00 PM, Bill Speirs wrote: > I have a 7 node setup with a replication factor of 1 and a read > consistency of 1. I have two colum

Re: Out of control memory consumption

2011-02-10 Thread Oleg Anastasyev
Huy Le writes: > Our CMS settings are: -XX:CMSInitiatingOccupancyFraction=35 \ -XX:+UseCMSInitiatingOccupancyOnly \ > Occupancy Fraction = 35 seems a very low value. You instructed the GC to start a collection as soon as memory usage is at 35%, i.e. about 1G. This see
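The "about 1G" figure is simple arithmetic, worked through here as a small sketch. It follows the message in taking the 3GB heap mentioned earlier in the thread as the base; strictly speaking the flag refers to old-generation occupancy, so the real trigger point depends on how the heap is split. The function name cms_trigger_bytes is hypothetical.

```python
GIB = 1024 ** 3


def cms_trigger_bytes(base_bytes, occupancy_percent):
    """Bytes of occupancy at which a CMS cycle would start for the given percentage."""
    return base_bytes * occupancy_percent // 100


heap = 3 * GIB  # heap size quoted in the thread, used as the base (an assumption)

# The three values tried in the thread: 75, then 50, then 35.
for percent in (75, 50, 35):
    trigger_gib = cms_trigger_bytes(heap, percent) / GIB
    print(f"{percent}% of 3 GiB is roughly {trigger_gib:.2f} GiB")
```

At 35 the collector starts once roughly 1.05 GiB is in use, which leaves most of the heap as headroom but also means collections run far more often.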

Re: Possible application

2011-02-10 Thread Rock, Paul
Well, you can make Cassandra work on a single box (or multiple instances on a single box if need be). My experimental/dev cluster that my team plays with to try things out is 6 nodes running on 6 rather small cloud VMs and it works fine. So I'd say yes, it works at the merely big scale. On Feb

Re: Out of control memory consumption

2011-02-10 Thread Huy Le
Yes, we had the setting at 75 but the JVM did not have enough time to do GC, so it aborted GC'ing. We lowered it to 50, but still had the issue, so we lowered it again to 35. On Thu, Feb 10, 2011 at 12:11 PM, Oleg Anastasyev wrote: > Huy Le writes: > > > Our CMS settings are: -XX:

Re: Super Slow Multi-gets

2011-02-10 Thread Bill Speirs
Each message row is well under 1K. So I don't think it is network... plus all boxes are on a fast LAN. Bill- On Feb 10, 2011 11:59 AM, "Utku Can Topçu" wrote: > Dear Bill, > > How about the size of the row in the Messages CF. Is it too big? Might you > be having an overhead of the bandwidth? > >

Re: Do supercolumns have a purpose?

2011-02-10 Thread Frank LoVecchio
I've found super column families quite useful when using RandomPartitioner on a low-maintenance cluster (as opposed to Byte/Ordered), e.g. returning ordered data from a TimeUUID comparator type; try doing that with one regular column family and secondary indexes (you could obviously sort on th

Cassandra documentation

2011-02-10 Thread mcasandra
I am unable to find a single source with complete documentation of Cassandra. I find some blogs here and there or short descriptions on the wiki page. Where can I get good documentation to start understanding Cassandra and its administration?

Re: unsubscribe

2011-02-10 Thread Eric Evans
On Wed, 2011-02-09 at 18:09 +0200, Onur AKTAS wrote: > unsubscribe http://goo.gl/nN7FG -- Eric Evans eev...@rackspace.com

Re: Cassandra documentation

2011-02-10 Thread Peter Schuller
> I am unable to find a single source where there is a complete documentation > of Cassandra. I find some blogs here and there or a short descriptions on > the wiki page. > > Where can I get a good documentation to start with understanding Cassandra > and it's administration? I believe the most co

Re: Out of control memory consumption

2011-02-10 Thread Jonathan Ellis
Sounds like you need to use a larger heap or put less stuff in it (memtables, caches). On Thu, Feb 10, 2011 at 11:17 AM, Huy Le wrote: > Yes, we had setting at 75 but JVM did not have enough time to do GC, so it > abort GC'ing.   We lowered it to 50, but still had issue, so we lowered it > again

ORM over Cassandra

2011-02-10 Thread Vivek Mishra
I understand that GORA is currently under development for release of ORM over Cassandra/HBase. I recently tried to develop some apps using Kundera. It may be worth looking into: http://code.google.com/p/kundera/. It is a JPA-compliant, annotation-based framework. For the same. recently it is

Re: ORM over Cassandra

2011-02-10 Thread Davide Palmisano
Hi, Ever heard of Apache Gora[1] [1] http://incubator.apache.org/gora/ On Thu, Feb 10, 2011 at 7:00 PM, Vivek Mishra wrote: > I understand that currently GORA is under development for release of ORM > over Cassandra/HBase. > > > > Recently tried to develop some apps using Kundera. > > It may b

Re: ORM over Cassandra

2011-02-10 Thread Davide Palmisano
Sorry, just realized you wrote it as the first line. :( Apologies. On Thu, Feb 10, 2011 at 7:30 PM, Davide Palmisano wrote: > Hi, > > Ever heard of Apache Gora[1] > > > [1] http://incubator.apache.org/gora/ > > On Thu, Feb 10, 2011 at 7:00 PM, Vivek Mishra > wrote: >> I understand that currentl

RE: ORM over Cassandra

2011-02-10 Thread Vivek Mishra
Hi, Kundera provides support for Lucene-based indexing as well as second-level cache support. Not sure if it is there in GORA. Vivek From: Davide Palmisano [dpalmis...@gmail.com] Sent: 11 February 2011 01:01 To: user@cassandra.apache.org Subject: Re:

Re: ORM over Cassandra

2011-02-10 Thread Davide Palmisano
> Kundera provides support for lucene based indexing and as well as second > level cache support. > Not sure if it is there in GORA. I don't know actually. But this[1] is intriguing me to try some tests with myBatis over Cassandra. [1] https://issues.apache.org/jira/browse/CASSANDRA-2124 > > Vi

Re: ORM over Cassandra

2011-02-10 Thread Jonathan Ellis
An object mapper implementing (a subset of) JPA is built into Hector now: https://github.com/rantav/hector/tree/master/object-mapper On Thu, Feb 10, 2011 at 1:00 PM, Vivek Mishra wrote: > I understand that currently GORA is under development for release of ORM > over Cassandra/HBase. > > > > Rece

Re: Calculating the size of rows in KBs

2011-02-10 Thread Aaron Morton
If you want to get the byte size of a particular row you will need to read it all back. If you connect with JConsole and look at your column families, there are attributes for the max, min and mean row sizes. In general the entire row only exists in memory when it is contained in the first Memta

Re: Data ends up in wrong Columnfamily

2011-02-10 Thread Aaron Morton
Not heard of that before, chances are it's a problem in your code. Does machine A even know the other CF name? Can you log the batch mutations you are sending? When it appears in the other CF is the data complete? There is also a Hector list, perhaps they can help. Aaron On 10/02/2011, at 11:5

Re: Calculating the size of rows in KBs

2011-02-10 Thread Aditya Narayan
Thank you Aaron!! But, If you are reading partial rows(that otherwise contain several thousands of **valueless** columns) then do the column indexes help in making the reads faster & more efficient than if they were not valueless? Perhaps, because they would only need to look up whether the asked

Re: Multiple inequality filters

2011-02-10 Thread Aaron Morton
Secondary indexes in Cassandra have similar restrictions http://www.datastax.com/docs/0.7/data_model/secondary_indexes Aaron On 11/02/2011, at 1:42 AM, Chema Molins wrote: > Hi, > > I have stumbled upon the limitation of Google AppEngine of not supporting > inequality filters on more than on

rename column family

2011-02-10 Thread Karl Hiramoto
Hi, In MySQL I use this pattern and wonder if I could do something similar with Cassandra. 1. Live/production queries always come into LiveTable 2. Build new data with BuildTable 3. RENAME TABLE LiveTable TO OldTable, BuildTable TO LiveTable 4. DROP TABLE OldTable, go to step #2 build

Re: Super Slow Multi-gets

2011-02-10 Thread Aaron Morton
The out of bounds error normally means you have column names that are not valid time UUIDs. Is that a possibility? Aaron On 11/02/2011, at 5:55 AM, Bill Speirs wrote: > We attempted a compaction to see if that would improve read > performance (BTW: write performance is as expected, fast!).

Re: Cassandra documentation

2011-02-10 Thread Aaron Morton
There is also a book http://oreilly.com/catalog/0636920010852 I've not read it yet, so cannot comment on its quality. Aaron On 11/02/2011, at 7:22 AM, Peter Schuller wrote: >> I am unable to find a single source where there is a complete documentation >> of Cassandra. I find some blogs here a

Re: Calculating the size of rows in KBs

2011-02-10 Thread Aaron Morton
If you are thinking about column_index_size_in_kb in cassandra.yaml then yes. Aaron On 11/02/2011, at 9:39 AM, Aditya Narayan wrote: > Thank you Aaron!! > > But, If you are reading partial rows(that otherwise contain several > thousands of **valueless** columns) then do the column indexes hel

Re: rename column family

2011-02-10 Thread Aaron Morton
With more information I'd say this is not a good idea. I would suggest looking at why you do the table switch in the MySql version and consider if it's still necessary in the Cassandra version. Could you use prefixes in your keys that the app knows about and switch those? Aaron On 11/02/2011,
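The key-prefix suggestion can be sketched as follows. This is a minimal in-memory illustration, not a Cassandra client: a dict stands in for a single column family, and the class name PrefixedStore, the ":" separator and the idea of keeping the live prefix in one application-level setting are all assumptions made for the example.

```python
class PrefixedStore:
    """One logical table whose row keys carry a generation prefix known to the app."""

    def __init__(self, live_prefix):
        self.rows = {}                  # stands in for a single column family
        self.live_prefix = live_prefix  # which generation readers should see

    def write(self, prefix, key, value):
        self.rows[prefix + ":" + key] = value

    def read_live(self, key):
        # Readers always go through the current live prefix.
        return self.rows.get(self.live_prefix + ":" + key)

    def switch_live(self, new_prefix):
        # The equivalent of the RENAME step: flip the prefix, return the old one.
        old_prefix = self.live_prefix
        self.live_prefix = new_prefix
        return old_prefix

    def drop(self, prefix):
        # The equivalent of DROP TABLE OldTable: remove rows of a retired generation.
        stale = [k for k in self.rows if k.startswith(prefix + ":")]
        for k in stale:
            del self.rows[k]
        return len(stale)


store = PrefixedStore(live_prefix="gen1")
store.write("gen1", "user42", "old data")
store.write("gen2", "user42", "new data")  # build the next generation alongside
retired = store.switch_live("gen2")        # readers now see gen2
store.drop(retired)
```

Whether this is preferable to separate column families depends on the reasons for the table switch, which is the question the reply raises.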

Re: rename column family

2011-02-10 Thread Aaron Morton
That should read "Without more information" A On 11 Feb 2011, at 10:15 AM, Aaron Morton wrote: With more information I'd say this is not a good idea. I would suggest looking at why you do the table switch in the MySql version and consider if it's still necessary in the Cassandra version. Could you

Re: Super Slow Multi-gets

2011-02-10 Thread Bill Speirs
I switched my implementation to use a thread pool of 10 threads each multi-getting 10 keys/rows. This reduces my time from 50s to 5s for fetching all 1,000 messages. I started looking through the Cassandra source to find where the parallel requests are actually made, and I believe it's in org.apac
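The batching approach described above can be sketched like this. The function fake_multiget is a hypothetical stand-in for whatever multiget call the client library provides; only the chunking and the thread pool are the point of the example, and the batch size of 10 and pool of 10 threads are the figures quoted in the message.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(keys, size):
    """Split a list of keys into consecutive batches of at most `size` keys."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]


def fake_multiget(keys):
    """Hypothetical stand-in for a client multiget: returns {key: row} for one batch."""
    return {key: "row-" + key for key in keys}


def parallel_multiget(keys, multiget=fake_multiget, batch_size=10, workers=10):
    """Fetch all keys by issuing one multiget per batch from a pool of threads."""
    rows = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() runs the batches concurrently and yields results in batch order.
        for partial in pool.map(multiget, chunked(keys, batch_size)):
            rows.update(partial)
    return rows


message_keys = [str(i) for i in range(1000)]
messages = parallel_multiget(message_keys)
```

With 1,000 keys this issues 100 small requests, at most 10 in flight at a time, instead of one large multiget.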

Re: Super Slow Multi-gets

2011-02-10 Thread Aaron Morton
Assuming Cassandra 0.7, in log4j-server.properties make it look like this... log4j.rootLogger=DEBUG,stdout,R A On 11 Feb 2011, at 10:30 AM, Bill Speirs wrote: I switched my implementation to use a thread pool of 10 threads each multi-getting 10 keys/rows. This reduces my time from 50s to 5s for fetchin

Re: Super Slow Multi-gets

2011-02-10 Thread Bill Speirs
Doesn't seem to help, I just get a bunch of messages that look like this: DEBUG - Transport open status true for client CassandraClient DEBUG - Status of releaseClient CassandraClient to queue: true DEBUG - Transport open status true for client CassandraClient And I got those before with my other

Limit on amount of CFs

2011-02-10 Thread Nick Santini
Hi, Reading the documentation (especially the tuning section), it is clear that the number of Column Families affects performance, in particular the amount of memory assigned to the heap. My question is: What's the hard limit on the number of CFs? Has anybody implemented an application with l

Re: Exceptions on 0.7.0

2011-02-10 Thread Aaron Morton
Can someone with a better understanding of CASSANDRA-1992 jump in? Aaron On 11 Feb 2011, at 02:51 AM, Attila Babo wrote: The same problem here, even with apache-cassandra-2011-02-10_06-30-00-bin.tar.gz from hudson. I'm happy to share the full log if needed or run tests to identify the core problem

Re: rename column family

2011-02-10 Thread Karl Hiramoto
On 02/10/11 22:19, Aaron Morton wrote: > That should read "Without more information" > > A > On 11 Feb, 2011,at 10:15 AM, Aaron Morton wrote: > >> With more information I'd say this is not a good idea. >> >> I would suggest looking at why you do the table switch in the MySql >> version and conside

Re: ORM over Cassandra

2011-02-10 Thread B. Todd Burruss
wiki page is here ... https://github.com/rantav/hector/wiki/Hector-Object-Mapper-(HOM) it does not handle relationships between objects yet, but does handle inheritance On 02/10/2011 12:21 PM, Jonathan Ellis wrote: An o

Re: Cassandra documentation

2011-02-10 Thread Sal Fuentes
I've read the book and overall I think it's a great book. It should be especially useful for those getting started. On Thu, Feb 10, 2011 at 1:06 PM, Aaron Morton wrote: > There is also a book http://oreilly.com/catalog/0636920010852 > > I've not read it yet, so cannot comment on it's quality. > > Aar

Basic Cassandra Architecture questions

2011-02-10 Thread mcasandra
I am reading an interesting white paper about Dynamo. I might have to read it again :) but I have a simple question: when a request comes in, which node handles the request first, and how does it determine which node has the key/value? Also, how does Cassandra ensure that reads/writes always have O(1) complexity?

Re: Basic Cassandra Architecture questions

2011-02-10 Thread Aaron Morton
Take a look at the introduction here http://thelastpickle.com/2011/02/07/Introduction-to-Cassandra/ I've tried to cover the basic "how the cluster works" questions. Let me know if you have any suggestions on how I can improve it. Short answer is a Gossip protocol is used, every node knows about every

Re: Basic Cassandra Architecture questions

2011-02-10 Thread Deming Shi
The client has to specify a node or a set of nodes in the cluster to connect to. These nodes/this node will handle the request first. In the cluster, nodes will gossip with each other about their information, so that it will know which node has the key/value. Stanley On Fri, Feb 11, 2011 at 9:36
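How the node that receives a request can work out which node holds a key can be sketched with a toy hash ring. This is a simplification under stated assumptions, not Cassandra's implementation: it hashes keys with MD5 in the spirit of the random partitioner, ignores replication entirely, and assumes every node already knows all tokens (which is what gossip provides). The names token_for and ToyRing are hypothetical.

```python
import bisect
import hashlib


def token_for(key):
    """Map a row key to a position on the ring using its MD5 hash."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ToyRing:
    """Toy token ring: each node owns the range ending at its own token."""

    def __init__(self, node_tokens):
        # node_tokens maps node name -> token; keep them sorted by token.
        self._ring = sorted((token, name) for name, token in node_tokens.items())
        self._tokens = [token for token, _ in self._ring]

    def owner(self, key):
        # First node whose token is >= the key's token, wrapping to the start.
        index = bisect.bisect_left(self._tokens, token_for(key))
        return self._ring[index % len(self._ring)][1]


# Two nodes splitting the 128-bit MD5 space in half (tokens are illustrative).
ring = ToyRing({"node-a": 0, "node-b": 2 ** 127})
owner_of_key = ring.owner("some-row-key")
```

Because the lookup is a local calculation over known tokens, any node can act as the first contact for a request and forward it to the owner without asking a central directory.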

RE: Basic Cassandra Architecture questions

2011-02-10 Thread Shu Zhang
"when a request comes in which node handles the request first" You (ie. cassandra client) always specifies the exact node to send requests to. While most higher level clients let's you specify configurations for a whole cluster, that's usually for their own basic load balancing. Each request any

Re: AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[CompactionExecutor:1,1,main]

2011-02-10 Thread Jonathan Ellis
sounds like CASSANDRA-1999, which is fixed for 0.7.1, although it's hard to say for sure w/o the full stack trace. You'll need to remove your HintsColumnFamily sstable files and upgrade. On Thu, Feb 10, 2011 at 8:02 PM, lztaomin wrote: > > Hi, > My two-node cluster, run the following error after

Re: Super Slow Multi-gets

2011-02-10 Thread Mark Guzman
I assume this should be set on all of the servers? Is there anything in particular one would look for in the log results? On Feb 10, 2011, at 4:37 PM, Aaron Morton wrote: > Assuming cassandra 0.7 in log4j-server.properties make it look like this... > > log4j.rootLogger=DEBUG,stdout,R > > > A

Re: Super Slow Multi-gets

2011-02-10 Thread Utku Can Topçu
Bill, It still sounds really strange. Can you reproduce it? And note down the steps; I'm sure people here would be pleased to repeat it. Regards, Utku On Fri, Feb 11, 2011 at 5:34 AM, Mark Guzman wrote: > I assume this should be set on all of the servers? Is there anything in > particular one