Re: rename column family

2011-02-13 Thread Aaron Morton
Forgot to put this on the end: you could take that approach, but it's not what CFs are designed for. Deletes are relatively cheap compared to MySQL etc. because most of the work is done during compaction. My first approach would be to use row keys with prefixes, switch at the application level,

Re: Confused about get_slice SliceRange behavior with bloom filter

2011-02-13 Thread aaron morton
AFAIK yes. Until your row is column_index_size_in_kb in size (and in some circumstances a compaction must have run) the code has to scan through all of the columns in the row to find the 150-200 you want. From the help in cassandra.yaml # Add column indexes to a row after its contents reach t

NFS instead of local storage

2011-02-13 Thread mcasandra
I just now watched some videos about performance tuning. And it looks like most of the bottleneck could be on reads. Also, it looks like it's advisable to put commit logs on a separate drive. I was wondering if it makes sense to use NFS (if we can) with a NetApp array which provides its own read an

Re: Confused about get_slice SliceRange behavior with bloom filter

2011-02-13 Thread Aditya Narayan
Jonathan, If I ask for around 150-200 columns (totally random, not sequential) from a very wide row that contains a million or more columns, is the read performance of the SliceQuery operation affected by, or dependent on, the length of the row? (For my use case, I would use the

Re: Question about seeds in tow node cluster.

2011-02-13 Thread Aaron Morton
They can be, but it's not necessary. Aaron On 13/02/2011, at 9:04 PM, Xiaobo Gu wrote: > Hi, > If the cluster only has two nodes, should they both be in the seeds list? > > Regards, > > Xiaobo Gu

Re: Basic Cassandra Architecture questions

2011-02-13 Thread Aaron Morton
You can get consistency by using Quorum, or write at All and read at One, or write at One and read at All. Start with Quorum. If you read at One, then read repair will work in the background to fix the data. But the result returned to your client may be inconsistent. Aaron On 12/02/2011, at 7:
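The three combinations above all satisfy the R + W > N rule discussed elsewhere in this archive. A minimal sketch of that check, assuming N is the replication factor and R and W are the number of replicas that must answer a read or a write; the function names are made up for illustration and are not part of any Cassandra client:

```python
def quorum(n: int) -> int:
    """Replicas needed for a quorum: a strict majority of n."""
    return n // 2 + 1


def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return r + w > n


if __name__ == "__main__":
    n = 3  # assumed replication factor, for illustration only
    combos = {
        "quorum/quorum": (quorum(n), quorum(n)),
        "read ONE, write ALL": (1, n),
        "read ALL, write ONE": (n, 1),
        "read ONE, write ONE": (1, 1),
    }
    for name, (r, w) in combos.items():
        print(name, reads_see_latest_write(n, r, w))
```

With these assumptions only the last combination fails the check, which matches the note that a read at One may return an inconsistent result.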

Re: Extra Large Memtables

2011-02-13 Thread Tyler Hobbs
I should note up front that the JVM simply does not handle heap sizes above 20G very well because the GC starts to become problematic. Do you read rows in a uniformly random way? If not, caching is your best bet for reducing read latencies. You should have enough space to cache all of your keys,

Re: Column name size

2011-02-13 Thread Aaron Morton
FWIW I would first try to reduce the number of columns, before reducing their name length. If you always pull back the same columns (e.g. User details) consider packing them in a JSON dict and storing them in one column. Aaron On 12/02/2011, at 5:22 AM, Chris Burroughs wrote: > On 02/11/2011 05
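A minimal sketch of the packing idea, assuming the user details are plain JSON-serialisable values. The field names and helper functions are hypothetical, and writing the resulting bytes to Cassandra is left to whichever client is in use:

```python
import json


def pack_details(details: dict) -> bytes:
    """Serialise several logical fields into one column value."""
    # sort_keys keeps the encoding stable for identical input
    return json.dumps(details, sort_keys=True).encode("utf-8")


def unpack_details(value: bytes) -> dict:
    """Recover the fields from a packed column value."""
    return json.loads(value.decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical user details that are always read together
    user = {"name": "Alice", "email": "alice@example.com", "age": 30}
    packed = pack_details(user)
    print(len(packed), unpack_details(packed) == user)
```

The trade-off is that a single field can no longer be read or updated on its own; the whole packed value has to be rewritten.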

Re: How to store news lists in optimal way?

2011-02-13 Thread Aaron Morton
The best way to store things depends on how you want to read them back. You could use a compound key such as user/listtype and then store the items in the lists as columns where the col name is a timestamp and the col value is a packed data structure like JSON. As Bill says, don't create a CF per
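A rough sketch of that layout, using a plain Python dict as a stand-in for a column family rather than a real Cassandra client. The helper names, the microsecond timestamps, and the item fields are all assumptions made for illustration:

```python
import json


def row_key(user_id: str, list_type: str) -> str:
    """Compound row key of the form user/listtype."""
    return f"{user_id}/{list_type}"


def add_item(store: dict, user_id: str, list_type: str,
             ts_micros: int, item: dict) -> None:
    """Store one list item as a column: name = timestamp, value = JSON."""
    columns = store.setdefault(row_key(user_id, list_type), {})
    columns[ts_micros] = json.dumps(item, sort_keys=True)


def latest_items(store: dict, user_id: str, list_type: str,
                 count: int) -> list:
    """Return the newest `count` items, newest first."""
    columns = store.get(row_key(user_id, list_type), {})
    newest = sorted(columns, reverse=True)[:count]
    return [json.loads(columns[ts]) for ts in newest]


if __name__ == "__main__":
    store = {}  # in-memory stand-in for the column family
    add_item(store, "user42", "news", 1297555200000000, {"title": "first"})
    add_item(store, "user42", "news", 1297555260000000, {"title": "second"})
    print(latest_items(store, "user42", "news", 1))
```

Because the column names are timestamps, reading the most recent items maps onto a reversed slice over one row, which is the read pattern this layout is chosen for.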

Re: rename column family

2011-02-13 Thread Aaron Morton
There are functions on the Cassandra API to rename and drop column families, see http://wiki.apache.org/cassandra/API. Dropping a CF does not immediately free up the disk space, see the docs. AFAIK the rename is not atomic across the cluster (that would require locks) so your best bet would be t

Re: Explaining the Replication Factor, N and W and R

2011-02-13 Thread Janne Jalkanen
> Excellent! How about adding Hinted Handoff enabled/disabled option? Sure, once I understand it ;-) /Janne

Re: Explaining the Replication Factor, N and W and R

2011-02-13 Thread Rustam Aliyev
On 13/02/2011 13:49, Janne Jalkanen wrote: Folks, as it seems that wrapping the brain around the R+W>N concept is a big hurdle for a lot of users, I made a simple web page that allows you to try out the different parameters and see how they affect the system. http://www.ecyrd.com/cassandracal

Re: Explaining the Replication Factor, N and W and R

2011-02-13 Thread Eric Evans
On Sun, 2011-02-13 at 15:49 +0200, Janne Jalkanen wrote: > as it seems that wrapping the brain around the R+W>N concept is a big > hurdle for a lot of users, I made a simple web page that allows you to > try out the different parameters and see how they affect the system. > > http://www.ecyrd.com/

Re: Does Cassandra support multiple listen_address and rpc_address?

2011-02-13 Thread Edward Capriolo
On Sun, Feb 13, 2011 at 1:39 AM, Xiaobo Gu wrote: > multiple network paths for inner-cluster communication will boost performance > > Thanks. > > Xiaobo Gu > No. Each node has a single IP. You can boost performance in a similar way with Ethernet bonding, or 10G

Re: Secondary index - keys only.

2011-02-13 Thread Jonathan Ellis
No. On Sun, Feb 13, 2011 at 8:48 AM, Shay Assulin wrote: > Hi, > > Is there a way to get only the keys of indexed rows (without getting > columns) using the get_indexed_slices method? > > I am using Hector to access Cassandra and I want to count rows with a > specific index - so I need to get only th

Re: delete key permanently

2011-02-13 Thread Jonathan Ellis
http://wiki.apache.org/cassandra/FAQ#range_ghosts On Sun, Feb 13, 2011 at 9:08 AM, Mark Zitnik wrote: > Hi, > > I would like to delete a key permanently in cassandra 0.7 and not receive it > in get range api. > Is it possible. > Thanks > > > -- Jonathan Ellis Project Chair, Apache Cassandra c

Re: Confused about get_slice SliceRange behavior with bloom filter

2011-02-13 Thread Jonathan Ellis
On Sun, Feb 13, 2011 at 12:37 AM, E S wrote: > I've gotten myself really confused by > http://wiki.apache.org/cassandra/ArchitectureInternals and am hoping someone > can > help me understand what the io behavior of this operation would be. > > When I do a get_slice for a column range, will it see

delete key permanently

2011-02-13 Thread Mark Zitnik
Hi, I would like to delete a key permanently in Cassandra 0.7 and not receive it in the get range API. Is it possible? Thanks

Secondary index - keys only.

2011-02-13 Thread Shay Assulin
Hi, Is there a way to get only the keys of indexed rows (without getting columns) using the get_indexed_slices method? I am using Hector to access Cassandra and I want to count rows with a specific index - so I need to get only the keys. I am doing the following: n = 0 while (true) { i

Explaining the Replication Factor, N and W and R

2011-02-13 Thread Janne Jalkanen
Folks, as it seems that wrapping the brain around the R+W>N concept is a big hurdle for a lot of users, I made a simple web page that allows you to try out the different parameters and see how they affect the system. http://www.ecyrd.com/cassandracalculator/ Let me know if you have any suggest

Re: Limit on amount of CFs

2011-02-13 Thread Peter Schuller
> But when modeling the application I understand so far that ColumnFamily is > sort of "table with objects". In typical application there are lot of tables > so why is the mindset set towards having more or less 10 ColumnFamilies? > Even in this trivial example there are already 7 CFs > http://www.

Re: Limit on amount of CFs

2011-02-13 Thread Filip Nguyen
On 13.2.2011 11:40, Peter Schuller wrote: Reading the documentation (especially the tuning section), it is clear that the number of Column Families affects the performance, in particular the amount of memory assigned to the heap. My question is: What's the hard limit on the number of CFs? Does a

Re: Partioning and Sorting is it CF Key or Column Key?

2011-02-13 Thread Peter Schuller
> Some questions I have: Answering two of them independently of your Java snippet; not sure what you intend to be read into it. > 1) Is partitioning based on CF.KEY or KEY of Column? From what I read it's > based on column keys and not the CF keys but want to confirm. Partitioning is based on ro

Re: Limit on amount of CFs

2011-02-13 Thread Peter Schuller
> Reading in the documentation (specially on the tuning section) is clear the > the number of Column Families affects the performance, in particular the > amount of memory assigned to the heap. > My question is: What's the hard limit on the number of CFs? > Does anybody implemented an application w

Re: Do supercolumns have a purpose?

2011-02-13 Thread David Boxenhorn
I agree, that is the way to go. Then each piece of new functionality will not have to be implemented twice. On Sat, Feb 12, 2011 at 9:41 AM, Stu Hood wrote: > I would like to continue to support super columns, but to slowly convert > them into "compound column names", since that is really all th

Question about seeds in tow node cluster.

2011-02-13 Thread Xiaobo Gu
Hi, If the cluster only has two nodes, should they both be in the seeds list? Regards, Xiaobo Gu