Re: Row caches

2012-09-20 Thread rohit reddy
Got it. Thanks for the replies. On Fri, Sep 21, 2012 at 6:30 AM, aaron morton wrote: > Set the caching attribute for the CF. It defaults to keys_only, other > values are both or rows_only. > > See http://www.datastax.com/dev/blog/caching-in-cassandra-1-1 > > Cheers > > - > Aaron Mo

Re: persistent compaction issue (1.1.4 and 1.1.5)

2012-09-20 Thread Michael Kjellman
Ended up switching the biggest offending column families back to size-tiered compaction, and pending compactions across the cluster dropped to 0 very quickly. On Sep 19, 2012, at 10:55 PM, "Michael Kjellman" wrote: > After changing my ss_table_size as recommended my pending compactions across

Re: Should row keys be inserted in ascending order?

2012-09-20 Thread Tyler Hobbs
Rows are actually stored on disk in the order of the hash of their keys when using RandomPartitioner. Furthermore, the rows are stored in SSTables, which are immutable, and are periodically compacted together. There's no shifting involved. This gives an overview: http://wiki.apache.org/cassandra
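To make "the order of the hash of their keys" concrete, here is a minimal Python sketch of how RandomPartitioner derives a token: the MD5 digest of the row key, read as a signed big-endian integer (like Java's `BigInteger(byte[])`), with the absolute value taken. This is an illustration of the idea, not Cassandra's code; `random_partitioner_token` is a hypothetical helper name.

```python
import hashlib

def random_partitioner_token(key: bytes) -> int:
    """Approximate the token RandomPartitioner assigns to a row key.

    MD5 digest -> signed big-endian integer -> absolute value,
    so tokens fall in the range 0..2**127.
    """
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))

# Keys inserted in ascending order...
keys = [("row%d" % i).encode() for i in range(5)]
for k in keys:
    print(k.decode(), random_partitioner_token(k))

# ...are laid out in token order on disk, which is unrelated to key order.
on_disk_order = sorted(keys, key=random_partitioner_token)
print([k.decode() for k in on_disk_order])
```

Because the hash scatters adjacent keys, inserting keys in ascending order buys nothing with RandomPartitioner.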

Re: Setting the default replication factor for Solandra cores

2012-09-20 Thread shubham srivastava
With Solandra as well, you can use the Cassandra CLI to do this. It is located in [~/Solandra/bin/]. Regards, Shubham On Fri, Sep 21, 2012 at 6:56 AM, aaron morton wrote: > I want to set the replication factor = 2, > > This is part of the CREATE KEYSPACE command, not sure where this

Re: Correct model

2012-09-20 Thread aaron morton
> I created the following model: a UserCF, whose key is a userID generated by > TimeUUID, and a RequestCF, whose key is composite: UserUUID + timestamp. For > each user, I will store basic data and, for each request, I will insert a lot > of columns. I would consider: # User CF * row_key: use
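Since the userID in this model is a TimeUUID (a version-1 UUID), it already carries its creation time. A minimal sketch, using only Python's standard `uuid` module and not taken from the thread, of generating one and reading the embedded timestamp back; `timeuuid_to_unix_seconds` is a hypothetical helper.

```python
import time
import uuid

# Offset between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01),
# measured in 100-nanosecond intervals.
UUID_EPOCH_OFFSET = 0x01B21DD213814000

def timeuuid_to_unix_seconds(u: uuid.UUID) -> float:
    """Recover the wall-clock time embedded in a version-1 (time-based) UUID."""
    if u.version != 1:
        raise ValueError("not a time-based UUID")
    # u.time is the 60-bit count of 100 ns intervals since the UUID epoch.
    return (u.time - UUID_EPOCH_OFFSET) / 1e7

user_id = uuid.uuid1()  # a TimeUUID, as used for the UserCF row key
created = timeuuid_to_unix_seconds(user_id)
print(user_id, created, abs(created - time.time()))
```

This is one reason a separate timestamp in a composite key can sometimes be redundant when the other component is already a TimeUUID; whether that holds depends on how the requests are queried.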

Re: Setting the default replication factor for Solandra cores

2012-09-20 Thread aaron morton
> I want to set the replication factor = 2, This is part of the CREATE KEYSPACE command, not sure where this is in solandra. I would recommend using RF 3 as a minimum. > , and the default replication strategy to be RackAwareStrategy. That's a very old strategy. The default is NetworkTopolog

Re: Solr Use Cases

2012-09-20 Thread aaron morton
> Also, Cassandra is great for writes but not as optimized for reads. From Cassandra 1.0, read throughput is on a par with writes: http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-performance Your mileage may vary depending on the workload. Cheers - Aaron Morton Freelanc

Re: Disk configuration in new cluster node

2012-09-20 Thread aaron morton
> Would it help if I partitioned the computing resources of my physical > machines into VMs? No. Just like cutting a cake into smaller pieces does not mean you can eat more without getting fat. In the general case, regular HDD and 1 GbE and 8 to 16 virtual cores and 8GB to 16GB RAM, you can e

Re: Row caches

2012-09-20 Thread aaron morton
Set the caching attribute for the CF. It defaults to keys_only; other values are both or rows_only. See http://www.datastax.com/dev/blog/caching-in-cassandra-1-1 Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 19/09/2012, at 1:34 PM, Jaso

Re: Is Cassandra right for me?

2012-09-20 Thread aaron morton
> Actually, if I use community edition for now, I wouldn't be able to use > hadoop against data stored in CFS? AFAIK DSC is a packaged deployment of Apache Cassandra. You should be able to use Hadoop against it, in the same way you can use Hadoop against Apache Cassandra. You "can do" anything

Re: Using the commit log for external synchronization

2012-09-20 Thread Brian O'Neill
Along those lines... We sought to use triggers for external synchronization. If you read through this issue: https://issues.apache.org/jira/browse/CASSANDRA-1311 You'll see the idea of leveraging a commit log for synchronization, via triggers. We went ahead and implemented this concept in:

Re: Using the commit log for external synchronization

2012-09-20 Thread Michael Kjellman
+1. Would be a pretty cool feature. Right now I write once to Cassandra and once to Kafka. On 9/20/12 4:13 PM, "Data Craftsman 木匠" wrote: >This will be a good new feature. I guess the development team don't >have time on this yet. ;) > > >On Thu, Sep 20, 2012 at 1:29 PM, Ben Hood <0x6e6...@gmai
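The "write once to Cassandra and once to Kafka" approach is a dual write. A minimal sketch of the pattern, assuming in-memory stand-ins: the `store` dict plays the column family and the `log` list plays an append-only topic. `DualWriter` and `changes_since` are hypothetical names, not a Cassandra or Kafka client API.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Mutation = Tuple[float, str, str, str]  # (timestamp, row_key, column, value)

@dataclass
class DualWriter:
    """Hypothetical stand-in for 'write once to Cassandra, once to Kafka'."""
    store: Dict[str, Dict[str, str]] = field(default_factory=dict)
    log: List[Mutation] = field(default_factory=list)

    def write(self, row_key: str, column: str, value: str) -> None:
        ts = time.time()
        # 1. Primary write (Cassandra in the real setup).
        self.store.setdefault(row_key, {})[column] = value
        # 2. Append the same mutation to the log (Kafka in the real setup).
        self.log.append((ts, row_key, column, value))

    def changes_since(self, offset: int) -> List[Mutation]:
        """Mutations appended after position `offset`, for an external consumer."""
        return self.log[offset:]

w = DualWriter()
w.write("user1", "name", "ben")
w.write("user2", "name", "aaron")
print(w.changes_since(1))  # only the user2 mutation
```

The catch with this design is that the two writes are not atomic: a crash between step 1 and step 2 leaves the external store behind, which is part of why reading from the commit log or using triggers is attractive.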

Re: Using the commit log for external synchronization

2012-09-20 Thread Data Craftsman 木匠
This will be a good new feature. I guess the development team doesn't have time for this yet. ;) On Thu, Sep 20, 2012 at 1:29 PM, Ben Hood <0x6e6...@gmail.com> wrote: > Hi, > > I'd like to incrementally synchronize data written to Cassandra into > an external store without having to maintain an ind

Re: Cassandra supercolumns with same name

2012-09-20 Thread Tyler Hobbs
If you're seeing that in cassandra-cli, it's possible that there are some non-printable characters in the name that the cli doesn't display, like the NUL char (ascii 0). I opened a ticket for that somewhere, but in the meantime, you may want to verify that they are identical with a real client. O
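Once you have the raw name bytes from a real client, making every non-printable byte visible settles the question quickly. A minimal Python sketch; `describe_name` is a hypothetical helper and the names are made-up examples.

```python
def describe_name(name: bytes) -> str:
    """Render a column name with each non-printable byte shown as \\xNN."""
    return "".join(
        chr(b) if 0x20 <= b < 0x7F else "\\x%02x" % b
        for b in name
    )

a = b"metadata"
b = b"metadata\x00"  # a trailing NUL that the CLI would not display
print(a == b)        # False: the names really are different
print(describe_name(a), describe_name(b))
```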

Re: [problem with OOM in nodes]

2012-09-20 Thread Tyler Hobbs
I'm not 100% sure that I understand your data model and read patterns correctly, but it sounds like you have large supercolumns and are requesting some of the subcolumns from individual super columns. If that's the case, the issue is that Cassandra must deserialize the entire supercolumn in memory when

Re: OOM when applying migrations

2012-09-20 Thread Tyler Hobbs
This should explain the schema issue in 1.0 that has been fixed in 1.1: http://www.datastax.com/dev/blog/the-schema-management-renaissance On Thu, Sep 20, 2012 at 10:17 AM, Jason Wee wrote: > Hi, when the heap is going more than 70% usage, you should be able to see > in the log, many flushing, o

Code example for CompositeType.Builder and SSTableSimpleUnsortedWriter

2012-09-20 Thread Edward Kibardin
Hi Everyone, I'm writing a conversion tool from CSV files to SSTable using SSTableSimpleUnsortedWriter and am unable to find a good example of using CompositeType.Builder with SSTableSimpleUnsortedWriter. It would also be great if someone had sample code for inserting/updating only a single value in com
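This does not show the Java `CompositeType.Builder` API; it is a Python sketch of the byte layout a composite column name ends up with, which can help verify what a conversion tool writes. The layout assumed here is, per component: a 2-byte big-endian length, the component bytes, then one end-of-component byte (0 for an ordinary column name; non-zero values are used for slice bounds). `pack_composite` and `unpack_composite` are hypothetical helpers.

```python
import struct
from typing import List

def pack_composite(components: List[bytes]) -> bytes:
    """Serialize components as <2-byte length><bytes><end-of-component byte>."""
    out = bytearray()
    for comp in components:
        if len(comp) > 0xFFFF:
            raise ValueError("component too long for a 2-byte length prefix")
        out += struct.pack(">H", len(comp))  # unsigned short, big-endian
        out += comp
        out += b"\x00"  # end-of-component byte for an exact column name
    return bytes(out)

def unpack_composite(data: bytes) -> List[bytes]:
    """Inverse of pack_composite: split a composite name into its components."""
    comps: List[bytes] = []
    pos = 0
    while pos < len(data):
        (length,) = struct.unpack_from(">H", data, pos)
        pos += 2
        comps.append(data[pos:pos + length])
        pos += length + 1  # skip the end-of-component byte
    return comps

# e.g. CompositeType(UTF8Type, LongType): a UTF-8 string and an 8-byte long
name = pack_composite([b"user1", struct.pack(">q", 42)])
print(name.hex(), unpack_composite(name))
```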

Using the commit log for external synchronization

2012-09-20 Thread Ben Hood
Hi, I'd like to incrementally synchronize data written to Cassandra into an external store without having to maintain an index to do this, so I was wondering whether anybody is using the commit log to establish what updates have taken place since a given point in time? Cheers, Ben

Re: any ways to have compaction use less disk space?

2012-09-20 Thread Aaron Turner
1. Use compression 2. Use Leveled Compaction Also, 1TB/node is a lot larger than the normal recommendation... generally speaking more in the 300-400GB range. On Thu, Sep 20, 2012 at 8:10 PM, Hiller, Dean wrote: > While diskspace is cheap, nodes are not that cheap, and usually systems have > a

any ways to have compaction use less disk space?

2012-09-20 Thread Hiller, Dean
While diskspace is cheap, nodes are not that cheap, and usually systems have a 1TB limit on each node, which means we would really love not to add more nodes until we hit 70% disk usage, instead of the usual 50% that we have read about due to compaction. Is there any way to use less disk space du
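A rough way to see where the 50% figure comes from: size-tiered compaction rewrites a whole bucket of similarly sized SSTables into one new SSTable, and the inputs are only deleted after the output is complete, so free space roughly equal to the bucket's total is needed. A Python sketch under simplifying assumptions: the bucketing rule is reduced to "within 0.5x to 1.5x of the bucket's average size", a bucket needs at least 4 SSTables, and the sizes are hypothetical.

```python
from typing import List

def stcs_buckets(sizes_gb: List[float]) -> List[List[float]]:
    """Simplified size-tiered bucketing: an SSTable joins the first bucket
    whose current average size is within a factor of 0.5x to 1.5x of it."""
    buckets: List[List[float]] = []
    for size in sorted(sizes_gb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if 0.5 * avg < size < 1.5 * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets

def headroom_needed_gb(sizes_gb: List[float], min_threshold: int = 4) -> float:
    """Free space needed to compact the largest eligible bucket: the output
    can be as large as the sum of its inputs, which are deleted only
    after the new SSTable has been written."""
    eligible = [b for b in stcs_buckets(sizes_gb) if len(b) >= min_threshold]
    return max((sum(b) for b in eligible), default=0.0)

# Hypothetical node: four ~100 GB SSTables plus a few small ones.
sizes = [100, 105, 98, 110, 5, 6, 5, 4]
print(stcs_buckets(sizes))
print(headroom_needed_gb(sizes), "GB free needed out of", sum(sizes), "GB stored")
```

Leveled compaction, suggested in the reply, works on small fixed-size SSTables, so the temporary space it needs is typically a small fraction of the data set rather than a large one.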

Re: sometimes get timeout while batch inserting. (using pycassa)

2012-09-20 Thread Tyler Hobbs
That's showing a client-side socket timeout. By default, the timeout for pycassa connections is fairly low, at 0.5 seconds. With the default batch insert size of 100 rows, you're probably hitting this timeout occasionally. I suggest lowering the batch size and using multiple threads for the highe
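A minimal sketch of the suggestion, assuming a caller-supplied `insert_batch` callable. In real code it might wrap something like pycassa's `ColumnFamily.batch_insert` on a shared connection pool; here a fake that just counts rows stands in, so nothing below talks to Cassandra. `chunks` and `parallel_batch_insert` are hypothetical helpers, and the batch size and thread count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator

Rows = Dict[str, Dict[str, str]]  # {row_key: {column: value}}

def chunks(rows: Rows, size: int) -> Iterator[Rows]:
    """Split a {row_key: columns} mapping into mappings of at most `size` rows."""
    items = list(rows.items())
    for start in range(0, len(items), size):
        yield dict(items[start:start + size])

def parallel_batch_insert(
    rows: Rows,
    insert_batch: Callable[[Rows], int],
    batch_size: int = 25,
    threads: int = 4,
) -> int:
    """Send small batches concurrently, so that each request finishes well
    within the client timeout; returns the number of rows sent."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(insert_batch, chunks(rows, batch_size)))

if __name__ == "__main__":
    data = {"benchmark_%d" % i: {"col": "abc" * 200} for i in range(1000)}
    # Stand-in for a real insert: returns the number of rows in the batch.
    sent = parallel_batch_insert(data, lambda batch: len(batch))
    print(sent)
```

Smaller batches keep each request short, and the threads recover the throughput lost by sending fewer rows per request.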

Re: Composite Column Types Storage

2012-09-20 Thread Sylvain Lebresne
> As I understand from the link below, burning column index-info onto the > sstable index files will not only eliminate sstables but also reduce disk > seeks from 3 to 2 for wide rows. Yes. > Shouldn't we be wary of the spike in heap usage by promoting column indexes > to index file? If you're t

Re: OOM when applying migrations

2012-09-20 Thread Jason Wee
Hi, when heap usage goes above 70%, you should see many flushes in the log, or the row cache size being reduced. Did you restart the Cassandra daemon on the node that threw the OOM? On Thu, Sep 20, 2012 at 9:11 PM, Vanger wrote: > Hello, > We are trying to add new nodes to

OOM when applying migrations

2012-09-20 Thread Vanger
Hello, We are trying to add new nodes to our *6-node* cassandra cluster with RF=3 cassandra version 1.0.11. We are *adding 18 new nodes* one-by-one. The first strange thing I've noticed is that the number of completed MigrationStage in nodetool tpstats grows for every new node, while schema is not c

Re: [problem with OOM in nodes]

2012-09-20 Thread Denis Gabaydulin
p.s. Cassandra 1.1.4 On Thu, Sep 20, 2012 at 3:27 PM, Denis Gabaydulin wrote: > Hi, all! > > We have a cluster of 7 virtual nodes (disk storage is connected to > nodes with iSCSI). The storage schema is: > > Reports:{ > 1:{ > 1:{"value1":"some val", "value2":"some val"}, > 2

Re: Losing keyspace on cassandra upgrade

2012-09-20 Thread Thomas Stets
A follow-up: Currently I'm back on version 1.1.1. I tried - unsuccessfully - the following things: 1. Create the missing keyspace on the 1.1.5 node, then copy the files back into the data directory. This failed, since the keyspace was already known on the other node in the cluster. 2. shut down

Re: Composite Column Types Storage

2012-09-20 Thread Ravikumar Govindarajan
As I understand from the link below, burning column index-info onto the sstable index files will not only eliminate sstables but also reduce disk seeks from 3 to 2 for wide rows. Our index files are always mmapped, so there is only one random seek for a named column query. I think that is a wonder
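The "one random seek for a named column query" relies on the row's column index: for a wide row, Cassandra records an index entry per block of columns (roughly every column_index_size_in_kb, 64KB by default) holding the block's first and last column names plus its position. A Python sketch of the lookup step under that assumption; `IndexBlock` and `find_block` are hypothetical names, not Cassandra's internal classes.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class IndexBlock:
    first_name: bytes  # first column name in the block
    last_name: bytes   # last column name in the block
    offset: int        # byte offset of the block within the row
    width: int         # size of the block in bytes

def find_block(index: List[IndexBlock], name: bytes) -> Optional[IndexBlock]:
    """Binary-search the column index for the one block that could hold `name`.

    Blocks are sorted by column name and do not overlap, so the candidate
    is the first block whose last_name >= name.
    """
    last_names = [b.last_name for b in index]
    i = bisect.bisect_left(last_names, name)
    if i == len(index) or name < index[i].first_name:
        return None  # falls outside (or between) blocks: column is absent
    return index[i]

index = [
    IndexBlock(b"a", b"f", 0, 65536),
    IndexBlock(b"g", b"m", 65536, 65536),
    IndexBlock(b"n", b"z", 131072, 40000),
]
block = find_block(index, b"h")
print(block.offset if block else None)  # only this block needs to be read
```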

[problem with OOM in nodes]

2012-09-20 Thread Denis Gabaydulin
Hi, all! We have a cluster of 7 virtual nodes (disk storage is connected to nodes with iSCSI). The storage schema is: Reports:{ 1:{ 1:{"value1":"some val", "value2":"some val"}, 2:{"value1":"some val", "value2":"some val"} ... }, 2:{ 1:{"value1":"some

Re: sometimes get timeout while batch inserting. (using pycassa)

2012-09-20 Thread Yan Chunlu
Forgot to mention: the RPC configuration in cassandra.yaml is rpc_timeout_in_ms: 2, the Cassandra version on the production server is 1.1.3, and the Cassandra version I am using on my MacBook is 1.0.10. On Thu, Sep 20, 2012 at 6:07 PM, Yan Chunlu wrote: > I am testing the performance of 1 cas

sometimes get timeout while batch inserting. (using pycassa)

2012-09-20 Thread Yan Chunlu
I am testing the performance of a single Cassandra node on a production server. I wrote a script to insert 1 million items into Cassandra. The data looks like this:
    prefix = "benchmark_"
    dct = {}
    for i in range(0,100):
        key = "%s%d" % (prefix,i)
        dct[key] = "abc"*200
and the inserting

Re: Data Modeling - JSON vs Composite columns

2012-09-20 Thread Sylvain Lebresne
On Wed, Sep 19, 2012 at 3:32 PM, Brian O'Neill wrote: > That said, I'm keeping a close watch on: > https://issues.apache.org/jira/browse/CASSANDRA-3647 > > But if this is CQL only, I'm not sure how much use it will be for us > since we're coming in from different clients. > Anyone know how/if coll

Re: Invalid Counter Shard errors?

2012-09-20 Thread Alain RODRIGUEZ
Oh, I just saw your first mail. "I don't see a negative number in your paste?" (03a227f0-a5c3-11e1--b7f5e49dceff, 1, -1) and (03a227f0-a5c3-11e1--b7f5e49dceff, 1, 1) (03a227f0-a5c3-11e1--b7f5e49dceff, 4, -5000) and (03a227f0-a5c3-11e1--b7f5e49dceff, 4, 2) (03a227f0-a5c3-11e1-00

Re: Data Modeling - JSON vs Composite columns

2012-09-20 Thread Sylvain Lebresne
On Wed, Sep 19, 2012 at 2:00 PM, Roshni Rajagopal wrote: > Hi, > > There was a conversation on this some time earlier, and to continue it > > Suppose I want to associate a user to an item, and I want to also store 3 > commonly used attributes without needing to go to an entity item column > famil
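A Python sketch contrasting the two layouts for a user-to-item row with 3 commonly used attributes, modelling a row as a dict of column name to value. The item and attribute names are made up, and the helpers are hypothetical, not a client API.

```python
import json
from typing import Dict, Tuple

Items = Dict[str, Dict[str, str]]  # {item_id: {attribute: value}}

def json_row(items: Items) -> Dict[str, str]:
    """Option 1: one column per item; the value is a JSON blob of attributes."""
    return {item_id: json.dumps(attrs, sort_keys=True)
            for item_id, attrs in items.items()}

def composite_row(items: Items) -> Dict[Tuple[str, str], str]:
    """Option 2: one column per (item_id, attribute) composite name."""
    return {(item_id, attr): value
            for item_id, attrs in items.items()
            for attr, value in attrs.items()}

items = {"item42": {"name": "Lamp", "price": "30", "color": "red"}}
j = json_row(items)
c = composite_row(items)

# Updating one attribute with JSON means read-modify-write of the whole blob...
blob = json.loads(j["item42"])
blob["price"] = "25"
j["item42"] = json.dumps(blob, sort_keys=True)

# ...while the composite layout overwrites a single column, no read needed.
c[("item42", "price")] = "25"

# Composite names sort by item first, so one item's attributes are a contiguous slice.
print([k for k in sorted(c) if k[0] == "item42"])
```

The JSON form is opaque to Cassandra and its read-modify-write can lose updates if two clients modify the same item concurrently; the composite form allows blind single-attribute writes at the cost of more columns per row.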

Re: CQL3 - collections

2012-09-20 Thread Sylvain Lebresne
I wrote an answer on the blog post (http://www.datastax.com/dev/blog/cql3_collections#comment-127093). -- Sylvain On Thu, Sep 20, 2012 at 7:13 AM, Roshni Rajagopal wrote: > Hi, > > CQL3, has collections support as described in this link > http://www.datastax.com/dev/blog/cql3_collections > > So

Re: Invalid Counter Shard errors?

2012-09-20 Thread Alain RODRIGUEZ
"I think that's inconsistent with the hypothesis that unclean shutdown is the sole cause of these problems" I agree: we never shut down any node, nor had any crash, and yet we have these bugs. About your side note: we know about it, but we couldn't find any other way to be able to prov

Re: Losing keyspace on cassandra upgrade

2012-09-20 Thread Thomas Stets
On Wed, Sep 19, 2012 at 5:12 PM, Michael Kjellman wrote: > Sounds like you are losing your system keyspace. When you say nothing > important changed between yaml files, do you mean with or without your > changes? > I compared the 1.1.1 cassandra.yaml (with my changes) to the cassandra.yaml distri