RE: multiple Datacenter values in PropertyFileSnitch

2013-04-11 Thread Matthias Zeilinger
I'm using a separate keyspace for each application. What I want is to split things up by load pattern, so that 2 apps with the same, very high load pattern do not clash. For other load patterns I want to use a different split. Is there any best practice or should I scale out, so that the co

RE: Does Memtable resides in Heap?

2013-04-11 Thread Viktor Jevdokimov
Memtables reside in the heap, so the write rate affects GC: more writes mean more frequent and longer ParNew GC pauses. From: Jay Svc [mailto:jaytechg...@gmail.com] Sent: Friday, April 12, 2013 01:03 To: user@cassandra.apache.org Subject: Does Memtable resides in Heap? Hi Team, I have got this 8GB of RAM o

Re: CorruptedBlockException

2013-04-11 Thread Lanny Ripple
Saw this in earlier versions. Our workaround was disable; drain; snap; shutdown; delete; link from snap; restart; -ljr On Apr 11, 2013, at 9:45, wrote: > I have formulated the following theory regarding C* 1.2.2 which may be > relevant: Whenever there is a disk error during compaction of an

Broken pipe when variating a lot number of connections

2013-04-11 Thread Rodrigo Felix
Hi, I've been changing a benchmarking tool (YCSB) to vary the number of clients throughout a workload execution and, for some reason, I believe Cassandra is having trouble handling the variation (both up and down) in the number of connections. Each client has a connection and clients ar

Does Memtable resides in Heap?

2013-04-11 Thread Jay Svc
Hi Team, I have 8GB of RAM, of which 4GB is allocated to the Java heap. My question is: does the size of the Memtable contribute to heap size, or are Memtables off-heap? Would a bigger Memtable have an impact on GC and overall memory management? I am using DSE 3.0 / Cassandra 1.1.9. Thanks

Re: running cassandra on 8 GB servers

2013-04-11 Thread Nikolay Mihaylov
I am using 1.2.3, used default heap - 2 GB without JNA installed, then modified heap to 4 GB / 400 MB young generation. + JNA installed. bloom filter on the CF's is lowered (more false positives, less disk space). WARN [ScheduledTasks:1] 2013-04-11 11:09:41,899 GCInspector.java (line 142) Heap is

Re: running cassandra on 8 GB servers

2013-04-11 Thread Edward Capriolo
With that much data per node you have to raise the IndexInterval and adjust the bloom filter settings. Although the bloom filters are off heap now, having that much data can put a strain on physical memory. On Thu, Apr 11, 2013 at 4:26 PM, aaron morton wrote: > > The data will be huge, I am estim

Re: running cassandra on 8 GB servers

2013-04-11 Thread aaron morton
> The data will be huge, I am estimating 4-6 TB per server. I know this is > best, but those are my resources. You will have a very unhappy time. The general rule of thumb / guideline for an HDD-based system with 1G networking is 300 GB to 500 GB per node. See previous discussions on this topic fo

Re: Two Cluster each with 12 nodes- Cassandra database

2013-04-11 Thread aaron morton
> I will be using `Pelops client` If you are starting out using Java I *strongly* suggest using this client https://github.com/Netflix/astyanax/ see the documentation here https://github.com/Netflix/astyanax/wiki > My understanding was to create the cluster with all the `24 nodes` as I will >

Re: CorruptedBlockException

2013-04-11 Thread aaron morton
> Whenever there is a disk error during compaction of an SS table (e.g., bad > block, out of disk space), that SStable’s files stick around forever after > Fixed in 1.1.1 https://issues.apache.org/jira/browse/CASSANDRA-2261 > We are using 1.1.5, besides that I have tried to run cleanup, with no

Re: is the select result grouped by the value of the partition key?

2013-04-11 Thread aaron morton
> Is it guaranteed that the rows are grouped by the value of the partition key? > That is, is it guaranteed that I'll get Your primary key (k1, k2) is considered in two parts (partition_key, grouping_columns). In your case the partition key is k1 and the grouping column is k2. Columns are order

Re: Compaction, truncate, cqlsh problems

2013-04-11 Thread aaron morton
> Can you please elaborate on the specials of truncate? I think ed was talking about this config setting in 1.2 https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L484 > It works, only sometimes it silently fails (1 in 400 runs of the truncate, > actually). The data is left in p

running cassandra on 8 GB servers

2013-04-11 Thread Nikolay Mihaylov
For one project I will need to run cassandra on the following dedicated servers: Single CPU XEON 4 cores no hyper-threading, 8 GB RAM, 12 TB locally attached HDDs in some kind of RAID, visible as a single HDD. I can do a cluster of 20-30 such servers, maybe even more. The data will be huge, I am estim

Re: (info) Abort the seek op in SSTableIdentityIterator class.

2013-04-11 Thread aaron morton
When created by the SSTableScanner the dataStart passed in is the existing file position so it may not be necessary. But it may be sane to do it and the seek() call may not result in disk reads. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton htt

Re: describe keyspace or column family query not working

2013-04-11 Thread aaron morton
tables created without COMPACT STORAGE are still visible in cassandra-cli. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 11/04/2013, at 5:40 AM, Tyler Hobbs wrote: > > On Wed, Apr 10, 2013 at 11:09 AM, Vivek Mish

Re: Two Cluster each with 12 nodes- Cassandra database

2013-04-11 Thread Jabbar Azam
Hello, I don't know what pelops is. I'm not sure why you want two clusters. I would have two clusters if I want to have data stored on totally separate servers for perhaps security reasons. If you are going to have the servers in one location then you might as well have one cluster. You'll have t

Re: multiple Datacenter values in PropertyFileSnitch

2013-04-11 Thread aaron morton
A node can only exist in one DC and one rack. Use different keyspaces as suggested. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 12/04/2013, at 1:47 AM, Jabbar Azam wrote: > Hello, > > I'm not an expert but I

Re: Two Cluster each with 12 nodes- Cassandra database

2013-04-11 Thread Raihan Jamal
Folks, Any thoughts on this? I am still in the learning process. So any guidance will be of great help. *Raihan Jamal* On Wed, Apr 10, 2013 at 10:39 PM, Raihan Jamal wrote: > I have started working on a project in which I am using `Cassandra > database`. > > Our production DBA's have setup

[RELEASE] Apache Cassandra 1.2.4 released

2013-04-11 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra version 1.2.4. Cassandra is a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. You can read more here: http://cassand

is the select result grouped by the value of the partition key?

2013-04-11 Thread Sorin Manolache
Hello, Let us consider that we have a table t created as follows: create table t(k1 varchar, k2 varchar, value varchar, primary key (k1, k2)); Its contents are the rows (a, m, x), (a, n, y), (z, 0, 9), (z, 1, 8) and I perform a select * from t where k1 in ('a', 'z'); Is it guaranteed that the rows are grouped by the val

Re: Compaction, truncate, cqlsh problems

2013-04-11 Thread Ondřej Černoš
Hi, I have JNA (cassandra only complains about obsolete version - Obsolete version of JNA present; unable to read errno. Upgrade to JNA 3.2.7 or later - I have stock centos version 3.2.4). Usage of separate CFs for each test run is difficult to set up. Can you please elaborate on the specials of

Re: Column index vs Row index vs Denormalizing

2013-04-11 Thread Coen Stevens
Thanks for the feedback! We will be going forward by implementing and deploying the proposed model, and test it out. Cheers, Coen On Thu, Apr 11, 2013 at 12:21 PM, aaron morton wrote: > Retrieving the latest 1000 tweets (of a given day) is trivial by > requesting the streamTweets columnFamily.

Re: Compaction, truncate, cqlsh problems

2013-04-11 Thread Edward Capriolo
If you do not have JNA, truncate has to fork an 'ln -s' command for the snapshots. I think that makes it unpredictable. Truncate has its own timeout value now (separate from the other timeouts). If possible I think it is better to make each test use its own CF and avoid truncate entirely. On T

RE: CorruptedBlockException

2013-04-11 Thread moshe.kranc
I have formulated the following theory regarding C* 1.2.2 which may be relevant: Whenever there is a disk error during compaction of an SS table (e.g., bad block, out of disk space), that SStable's files stick around forever after, and do not subsequently get deleted by normal compaction (minor

Re: CorruptedBlockException

2013-04-11 Thread Alexis Rodríguez
Aaron, It seems that we are in the same situation as Nury, we are storing a lot of files of ~5MB in a CF. This happens in a test cluster, with one node using cassandra 1.1.5, we have commitlog in a different partition than the data directory. Normally our tests use nearly 13 GB in data, but when

Re: Blobs in CQL?

2013-04-11 Thread Brian O'Neill
Bingo! Thanks to both of you. (the C* community rocks) A few hours worth of work, and I've got a working REST-based photo repository backed by C* using the CQL java driver. =) rock on, thanks again, -brian On Thu, Apr 11, 2013 at 9:33 AM, Sylvain Lebresne wrote: > > I assume I'm doing someth

Compaction, truncate, cqlsh problems

2013-04-11 Thread Ondřej Černoš
Hi, I use C* 1.2.3 and CQL3. I integrated cassandra into our testing environment. In order to make the tests repeatable I truncate all the tables that need to be empty before the test run via ssh session to the host cassandra runs on and by running cqlsh where I issue the truncate. It works, onl

(info) Abort the seek op in SSTableIdentityIterator class.

2013-04-11 Thread dong.yajun
Hello, I read the source code of SSTableIdentityIterator with v-1.0.9, and I thought the following code is not necessary, did I miss anything? RandomAccessReader file = (RandomAccessReader) input; file.seek(this.dataStart); here, the value of dataStart is assigned

Re: multiple Datacenter values in PropertyFileSnitch

2013-04-11 Thread Jabbar Azam
Hello, I'm not an expert but I don't think you can do what you want. The way to separate data for applications on the same cluster is to use different tables for different applications or use multiple keyspaces, a keyspace per application. The replication factor you specify for each keyspace speci

Re: Blobs in CQL?

2013-04-11 Thread Sylvain Lebresne
> I assume I'm doing something wrong in the select. Am I incorrectly using > the ResultSet? > You're incorrectly using the returned ByteBuffer. But you should not feel bad, that API kinda sucks. The short version is that .array() returns the backing array of the ByteBuffer. But there is no guara

Re: Blobs in CQL?

2013-04-11 Thread Gabriel Ciuloaica
That's right, there is some padding there... So, instead of calling array(), you have to do something like: ByteBuffer data = resultSet.one().getBytes("data"); int length = data.remaining(); byte[] blobBytes = new byte[length]; data.get(blobBytes, 0, length); Gabi On 4/11/13 4:09 PM, Brian O'Nei
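
The point made in this thread, that array() exposes the whole backing array while only the bytes between position and limit belong to the blob, can be illustrated with plain java.nio and no Cassandra driver at all. This is a minimal sketch: the class name BlobBytes, the helper toByteArray, and the sample frame are hypothetical and only simulate a blob that sits inside a larger buffer.

```java
import java.nio.ByteBuffer;

final class BlobBytes {
    /** Copies only the readable bytes (position..limit) into a new array. */
    static byte[] toByteArray(ByteBuffer buffer) {
        // duplicate() shares content but has its own position and limit,
        // so the caller's buffer is left untouched.
        ByteBuffer view = buffer.duplicate();
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        // Simulated frame: the blob is the 3 bytes in the middle.
        byte[] frame = {9, 9, 9, 1, 2, 3, 9};
        ByteBuffer blob = ByteBuffer.wrap(frame, 3, 3);

        System.out.println(blob.array().length);      // prints 7: whole backing array
        System.out.println(toByteArray(blob).length); // prints 3: only the blob
    }
}
```

Working on a duplicate means the original buffer can still be read again afterwards, which helps if the same row is inspected more than once.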

Re: Blobs in CQL?

2013-04-11 Thread Brian O'Neill
Sylvain, Interesting, when I look at the actual bytes returned, I see the byte array is prefixed with the keyspace and table name. I assume I'm doing something wrong in the select. Am I incorrectly using the ResultSet? -brian On Thu, Apr 11, 2013 at 9:09 AM, Brian O'Neill wrote: > Yep, it wor

Re: Blobs in CQL?

2013-04-11 Thread Brian O'Neill
Yep, it worked like a charm. (PreparedStatement avoided the hex conversion) But now, I'm seeing a few extra bytes come back in the select…. (I'll keep digging, but maybe you have some insight?) I see this: ERROR [2013-04-11 13:05:03,461] com.skookle.dao.RepositoryDao: repository.add() byte.lengt

multiple Datacenter values in PropertyFileSnitch

2013-04-11 Thread Matthias Zeilinger
Hi, I would like to create a big cluster for many applications. Within this cluster I would like to separate the data for each application, which can be easily done via different virtual datacenters and the correct replication strategy. What I would like to know is whether I can specify for 1 node multip

Re: Blobs in CQL?

2013-04-11 Thread Sylvain Lebresne
> Hopefully, the prepared statement doesn't do the conversion. > It does not. > (I'm not sure if it is a limitation of the CQL protocol itself) > > thanks again, > -brian > > > > --- > Brian O'Neill > Lead Architect, Software Development > Health Market Science > The Science of Better Results >

Re: Blobs in CQL?

2013-04-11 Thread Brian O'Neill
Cool. That might be it. I'll take a look at PreparedStatement. For query building, I took a look under the covers, and even when I was passing in a ByteBuffer, it runs through the following code in the java-driver: Utils.java: if (value instanceof ByteBuffer) { sb.append("0x"); s
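
The conversion described in this message, rendering a ByteBuffer as a 0x-prefixed hex string when a query is built as text, can be sketched roughly as follows. This is not the java-driver's code: the class BlobHex and the method toCqlBlobLiteral are hypothetical names, and the sketch only assumes that a blob literal is "0x" followed by two lowercase hex digits per byte.

```java
import java.nio.ByteBuffer;

final class BlobHex {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    /** Renders the readable bytes of a buffer as a 0x-prefixed hex literal. */
    static String toCqlBlobLiteral(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate(); // leave the caller's position alone
        StringBuilder sb = new StringBuilder(2 + 2 * view.remaining());
        sb.append("0x");
        while (view.hasRemaining()) {
            int b = view.get() & 0xff;         // treat the byte as unsigned
            sb.append(DIGITS[b >>> 4]);        // high nibble
            sb.append(DIGITS[b & 0x0f]);       // low nibble
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = {0x00, 0x0f, (byte) 0xca, (byte) 0xfe};
        System.out.println(toCqlBlobLiteral(ByteBuffer.wrap(bytes))); // prints 0x000fcafe
    }
}
```

The literal is twice the size of the blob plus the prefix, which is one reason binding the ByteBuffer through a prepared statement is attractive for large values such as photos.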

Re: Blobs in CQL?

2013-04-11 Thread Gabriel Ciuloaica
I'm not using the query builder but the PreparedStatement. Here is the sample code: https://gist.github.com/devsprint/5363023 Gabi On 4/11/13 3:27 PM, Brian O'Neill wrote: Great! Thanks Gabriel. Do you have an example? (are using QueryBuilder?) I couldn't find the part of the API that allowe

Re: Blobs in CQL?

2013-04-11 Thread Brian O'Neill
Great! Thanks Gabriel. Do you have an example? (are you using QueryBuilder?) I couldn't find the part of the API that allowed you to pass in the byte array. -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Drive • King o

Re: Blobs in CQL?

2013-04-11 Thread Gabriel Ciuloaica
Hi Brian, I'm using the blobs to store images in cassandra(1.2.3) using the java-driver version 1.0.0-beta1. There is no need to convert a byte array into hex. Br, Gabi On 4/11/13 3:21 PM, Brian O'Neill wrote: I started playing around with the CQL driver. Has anyone used blobs with it yet?

Blobs in CQL?

2013-04-11 Thread Brian O'Neill
I started playing around with the CQL driver. Has anyone used blobs with it yet? Are you forced to convert a byte[] to hex? (e.g. I have a photo that I want to store in C* using the java-driver API) -brian -- Brian ONeill Lead Architect, Health Market Science (http://healthmarketscience.com) mo

Re: Exception for version 1.1.0

2013-04-11 Thread aaron morton
Fixed in 1.1.11 due out soon https://issues.apache.org/jira/browse/CASSANDRA-5284 Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 10/04/2013, at 7:35 PM, Winsdom Chen wrote: > Hi, > I've lot of assertion error i

Re: CDH4 + Cassandra 1.2 Integration Issue

2013-04-11 Thread aaron morton
cqlsh in cassandra 1.2 defaults to cql 3. - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 10/04/2013, at 6:55 PM, Gurminder Gill wrote: > Ah ha. So, the client defaults to CQL 2. Any way of changing that? I tried > libthrif

Re: Cassandra 1.2.2 cluster + raspberry

2013-04-11 Thread aaron morton
> I've already tried to set internode_compression: none in my yaml files. What version are you on? Have you set internode_compression to none and restarted? Can you double-check? The code stack shows cassandra deciding that the connection should be compressed. Cheers - Aaron

Re: other questions about // RE: batch_mutate

2013-04-11 Thread aaron morton
> Is it true the coordinator node treats them as __independent__ > communications/requests to replicas (even if in that case, the replicas are > the same for every request) ? A row mutation is a request to store columns in one or more CF's using one row key. It is treated as indivisible by the c

Re: data modeling from batch_mutate point of view

2013-04-11 Thread aaron morton
> b) the "batch_mutate" advantages are better, for the communication > "client<=>coordinator node" __and__ for the communications "coordinator > node<=>replicas". Yes. A single row mutation can write to many CFs. > Is there any experience out there about such data modeling (option_a vs > optio

Re: Column index vs Row index vs Denormalizing

2013-04-11 Thread aaron morton
> Retrieving the latest 1000 tweets (of a given day) is trivial by requesting > the streamTweets columnFamily. If you normally want to get the most recent items use a reverse comparator on the column name see http://thelastpickle.com/2011/10/03/Reverse-Comparators/ > Getting the latest tweets
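
The effect of a reverse comparator on column names can be pictured with an in-memory analogy. This is only a sketch with a sorted map standing in for one row; it is not Cassandra code, and the class ReverseOrderedColumns, its methods, and the use of a timestamp as the column name are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

final class ReverseOrderedColumns {
    /** A "row" whose column names (timestamps) sort newest first. */
    static TreeMap<Long, String> newRow() {
        return new TreeMap<>(Comparator.reverseOrder());
    }

    /** Reads the first `limit` columns, which are the most recent ones. */
    static List<String> latest(TreeMap<Long, String> row, int limit) {
        List<String> out = new ArrayList<>();
        for (String value : row.values()) {
            if (out.size() == limit) {
                break;
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> row = newRow();
        row.put(1L, "oldest tweet");
        row.put(3L, "newest tweet");
        row.put(2L, "middle tweet");
        System.out.println(latest(row, 2)); // prints [newest tweet, middle tweet]
    }
}
```

Because the newest entries sit at the start of the sort order, reading the latest N items becomes a read from the head of the row rather than a reversed scan.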

Re: Added column does not sort as the last column at

2013-04-11 Thread aaron morton
To reduce possibilities, have you changed a super CF to a standard CF recently ? Can you isolate this to specific CF ? Have you changed the comparators / schema recently ? Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle