Re: is there a key to sstable index file?

2013-07-18 Thread Michał Michalski

Thanks! :-)

M.

On 18.07.2013 08:42, Jean-Armel Luce wrote:

@Michal : look at this for the improvement of read performance:
https://issues.apache.org/jira/browse/CASSANDRA-2498

Best regards.
Jean Armel


2013/7/18 Michał Michalski 


SSTables are immutable - once they're written to disk, they cannot be
changed.

On read C* checks *all* SSTables [1], but to make this faster it uses Bloom
Filters, which can tell you that a row is definitely *not* in a specific
SSTable, so you don't have to read that SSTable at all. When you *do* have to
read one, you don't read the whole SSTable - there's an in-memory Index
Sample, which is binary-searched to locate a (relatively) small block of the
real (full, on-disk) index, and that block is then scanned to find the
position of the data in the SSTable. Additionally there is a KeyCache to make
reads faster - it points directly to the location of the data in the SSTable,
so you don't have to touch the Index Sample or the Index at all.

Once C* retrieves all data "parts" (including the Memtable part),
timestamps are used to find the most recent version of data.
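
To make the lookup order concrete, here is a rough, self-contained Java
sketch of the per-SSTable read path described above. It is illustrative only:
the class, field and method names are invented for explanation, this is not
the actual Cassandra source, and it borrows Guava's BloomFilter just for the
example:

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class SSTableLookupSketch {
    // Answers "definitely not here" / "maybe here" for a row key.
    final BloomFilter<CharSequence> bloom =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 10000);
    // In-memory Index Sample: every Nth row key -> offset of its on-disk index block.
    final TreeMap<String, Long> indexSample = new TreeMap<String, Long>();
    // KeyCache: row key -> exact data offset, skipping the index entirely.
    final Map<String, Long> keyCache = new HashMap<String, Long>();

    Long findDataOffset(String rowKey) {
        if (!bloom.mightContain(rowKey))
            return null;                       // definitely not in this SSTable
        Long cached = keyCache.get(rowKey);
        if (cached != null)
            return cached;                     // KeyCache hit: no index work at all
        Map.Entry<String, Long> block = indexSample.floorEntry(rowKey);
        if (block == null)
            return null;
        // Scan the small on-disk index block starting at block.getValue()
        // to find the data position for rowKey (stubbed out here).
        return scanIndexBlock(block.getValue(), rowKey);
    }

    Long scanIndexBlock(long indexBlockOffset, String rowKey) {
        return null; // stub: scan the index block on disk for rowKey's data offset
    }
}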

[1] I believe this is not true in all cases: I saw a piece of code somewhere
in the source that checks SSTables in order from newest to oldest (in terms
of data timestamps - AFAIR the SSTable metadata stores the smallest and
largest timestamp in each SSTable), and once the newest data for all
requested columns has been retrieved (assuming the schema is defined), it
stops and older SSTables are not checked. If someone could confirm that it
works this way and it's not something I saw in a dream and now believe is
real, I'd be glad ;-)
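
To make the idea concrete, here is a small self-contained Java sketch of the
short-circuit described in [1]. This is only my reading of it, not actual
Cassandra code; cells are modeled as {timestamp, value} pairs:

import java.util.*;

class NewestFirstReadSketch {
    static class Table {
        long maxTimestamp;                           // from SSTable metadata
        Map<String, long[]> cells = new HashMap<String, long[]>(); // name -> {ts, value}
    }

    // Check tables newest-first; stop once every wanted column already has a
    // value newer than anything the remaining (older) tables could contain.
    static Map<String, long[]> read(List<Table> tables, Set<String> wanted) {
        Collections.sort(tables, new Comparator<Table>() {
            public int compare(Table a, Table b) {
                return Long.compare(b.maxTimestamp, a.maxTimestamp);
            }
        });
        Map<String, long[]> newest = new HashMap<String, long[]>();
        for (Table t : tables) {
            if (newest.keySet().containsAll(wanted)
                    && oldestTimestamp(newest) > t.maxTimestamp)
                break;                 // older tables cannot beat what we have
            for (String name : wanted) {
                long[] cell = t.cells.get(name);
                if (cell != null && (!newest.containsKey(name) || cell[0] > newest.get(name)[0]))
                    newest.put(name, cell);  // keep the newest cell per column
            }
        }
        return newest;
    }

    static long oldestTimestamp(Map<String, long[]> cells) {
        long min = Long.MAX_VALUE;
        for (long[] c : cells.values()) min = Math.min(min, c[0]);
        return min;
    }
}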

On 17.07.2013 22:58, S Ahmed wrote:

  Since SSTables are mutable, and they are ordered, does this mean that there
is an index of key ranges that each SSTable holds, and the value could be 1 or
more sstables that have to be scanned, and then the latest one is chosen?

e.g. Say I write a value "abc" to CF1.  This gets stored in a sstable.

Then I write "def" to CF1, this gets stored in another sstable eventually.

Now when I go to fetch the value, it has to scan 2 sstables and then figure
out which is the latest entry, correct?

So is there an index of keys to sstables, and there can be 1 or more
sstables per key?

(This is assuming compaction hasn't occurred yet.)










Re: InvalidRequestException(why:Not enough bytes to read value of component 0)

2013-07-18 Thread Vivek Mishra
+1 for Sylvain's answer.

This normally happens if the validation class for the column value(s) differs.

-Vivek


On Thu, Jul 18, 2013 at 12:08 PM, Sylvain Lebresne wrote:

> I don't know Hector very much really, but I highly suspect that
> "ts.toString()" is wrong, since composite column names are not strings. So
> again, not a Hector expert, but I can't really see how converting the
> composite into string could work.
>
> --
> Sylvain
>
>
> On Wed, Jul 17, 2013 at 11:14 PM, Rahul Gupta wrote:
>
>>  Getting error while trying to persist data in a Column Family having a
>> CompositeType comparator
>>
>> Using Cassandra ver 1.1.9.7
>> Hector Core ver 1.1.5 API (which uses Thrift 1.1.10)
>>
>> Created Column Family using cassandra-cli:
>>
>> create column family event_counts
>> with comparator = 'CompositeType(DateType,UTF8Type)'
>> and key_validation_class = 'UUIDType'
>> and default_validation_class = 'CounterColumnType';
>>
>> Persistence Code (sumLoad.java):
>>
>> import me.prettyprint.cassandra.serializers.StringSerializer;
>> import me.prettyprint.cassandra.serializers.DateSerializer;
>> import me.prettyprint.cassandra.service.CassandraHostConfigurator;
>> import me.prettyprint.hector.api.Cluster;
>> import me.prettyprint.hector.api.Keyspace;
>> import me.prettyprint.hector.api.beans.Composite;
>> import me.prettyprint.hector.api.beans.HCounterColumn;
>> import me.prettyprint.hector.api.factory.HFactory;
>> import me.prettyprint.hector.api.mutation.Mutator;
>> import java.sql.Date;
>> import java.util.logging.Level;
>>
>> public class sumLoad {
>>
>>     final static Cluster cluster = HFactory.getOrCreateCluster("Dev",
>>             new CassandraHostConfigurator("100.10.0.6:9160"));
>>     final static Keyspace keyspace = HFactory.createKeyspace("Events", cluster);
>>     final static StringSerializer ss = StringSerializer.get();
>>
>>     private boolean storeCounts(String vKey, String counterCF, Date dateStr,
>>             String vStr, long value)
>>     {
>>         try
>>         {
>>             Mutator m1 = HFactory.createMutator(keyspace, StringSerializer.get());
>>
>>             Composite ts = new Composite();
>>             ts.addComponent(dateStr, DateSerializer.get());
>>             ts.addComponent(vStr, StringSerializer.get());
>>             HCounterColumn hColumn_ts = HFactory.createCounterColumn(
>>                     ts.toString(), value, StringSerializer.get());
>>
>>             m1.insertCounter(vKey, counterCF, hColumn_ts);
>>             m1.execute();
>>             return true;
>>         }
>>         catch(Exception ex)
>>         {
>>             LOGGER.log(Level.WARNING, "Unable to store record", ex);
>>         }
>>         return false;
>>     }
>>
>>     public static void main(String[] args) {
>>
>>         Date vDate = new Date(0);
>>         sumLoad SumLoad = new sumLoad();
>>         SumLoad.storeCounts("b9874e3e-4a0e-4e60-ae23-c3f1e575af93",
>>                 "event_counts", vDate, "StoreThisString", 673);
>>     }
>>
>> }
>>
>> Error:
>>
>> [main] INFO me.prettyprint.cassandra.service.JmxMonitor - Registering JMX
>> me.prettyprint.cassandra.service_Dev:ServiceType=hector,MonitorType=hector
>>
>> Unable to store record
>> me.prettyprint.hector.api.exceptions.HInvalidRequestException:
>> InvalidRequestException(why:Not enough bytes to read value of component 0)
>>
>> Rahul Gupta
>> This e-mail and the information, including any attachments, it contains
>> are intended to be a confidential communication only to the person or
>> entity to whom it is addressed and may contain information that is
>> privileged. If the reader of this message is not the intended recipient,
>> you are hereby notified that any dissemination, distribution or copying of
>> this communication is strictly prohibited. If you have received this
>> communication in error, please immediately notify the sender and destroy
>> the ori

Re: Intresting issue with getting Order By to work...

2013-07-18 Thread aaron morton
Here are some posts about CQL and Thrift

 http://thelastpickle.com/2013/01/11/primary-keys-in-cql/
http://www.datastax.com/dev/blog/cql3-for-cassandra-experts
http://www.datastax.com/dev/blog/thrift-to-cql3

Hope that helps. 

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/07/2013, at 11:38 PM, Tony Anecito  wrote:

> Thanks for the answers.
>  
> The reason why I ask is it is stated that composite keys are not the same as 
> Primary Key. I found no examples for thrift where it specifically said the 
> composite key is a primary key required by order by. All the examples where 
> the words primary key were used were CQL examples, and I am seeing 
> postings where people had issues with Order By but no answers like what you 
> said.
>  
> If there was better documentation for Cassandra with working examples and 
> explanations about the differences between CQL and CLI I would not need to ask 
> questions on the user groups. I have also spotted major issues and tried to 
> help all users understand them.
>  
> -Tony
> 
> From: aaron morton 
> To: Cassandra User  
> Sent: Wednesday, July 17, 2013 4:06 AM
> Subject: Re: Intresting issue with getting Order By to work...
> 
> > The use of Order By requires Primary Key which appears to be only supported 
> > by using CQL and not Cassandra-cli. 
> Order By in CQL is also supported on the thrift interface. 
> 
> When using thrift the order you get the columns back is the order the 
> Comparator puts them in. If you want them reversed the thrift API supports 
> that. 
> 
> > I read that thrift clients will not work with CQL created tables due to 
> > extra things created by the CQL. If so how can I create Primary Keys and be 
> > supported by thrift based clients??
> No.
> Do not access CQL tables with the thrift API. 
> 
> > Seems like Cassandra-cli should support creation of compound primary keys or
> It does. 
> See help on the CompositeType
> 
> > Also CQL tables are not visible via the CLI, so I can not see details on what 
> > was created by CQL, and the cqlsh script has errors according to the latest 
> > Python windows program I tried.
> They are visible for read access. 
> 
> > I will post to Datastax the same question 
> Please ask questions to one group at a time so people do not waste their time 
> providing answers you already have. 
> 
> Cheers
> 
> 
> -
> Aaron Morton
> Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com/
> 
> On 17/07/2013, at 3:44 PM, Tony Anecito  wrote:
> 
> > Hi All,
> > 
> > Well I got most everything working I wanted using Cassandra then discovered 
> > I needed to use an Order By. I am using Cassandra 1.2.5.
> > The use of Order By requires Primary Key which appears to be only supported 
> > by using CQL and not Cassandra-cli. So I dropped my table created using 
> > CLI and used CQL and was able to create a "Table". But when I went to 
> > insert data that worked fine on the cli created table I now get an 
> > exception:
> > Error while inserting 
> > com.datastax.driver.core.exceptions.InvalidQueryException: Unknown 
> > identifier type.
> > 
> > I read that thrift clients will not work with CQL created tables due to 
> > extra things created by the CQL. If so how can I create Primary Keys and be 
> > supported by thrift based clients??
> > 
> > I will post to Datastax the same question but trying to understand how to 
> > resolve cli vs CQL issue like this. Seems like Cassandra-cli should support 
> > creation of compound primary keys or CQL should create tables readable by 
> > thrift based clients. Is there some meta column info people should add?
> > Also CQL tables are not visible via the CLI, so I can not see details on what 
> > was created by CQL, and the cqlsh script has errors according to the latest 
> > Python windows program I tried.
> > 
> > Thanks,
> > -Tony
> > 
> > 
> 
> 



Re: Minimum CPU and RAM for Cassandra and Hadoop Cluster

2013-07-18 Thread aaron morton
IMHO you will want 4 cores and 4 to 8 GB for each VM to run both Cassandra and 
Hadoop on the nodes. 

For comparison people often use an EC2 m1.xlarge which has 4 cores and 16GB. 

Also, I recommend anyone starting experiments with Cassandra and Hadoop use 
DataStax Enterprise, so you can focus on using Cassandra and Hadoop and not on 
installing Cassandra and Hadoop and HDFS and Hive etc. You can evaluate it in 
non-production environments free of charge: 
http://www.datastax.com/what-we-offer/products-services/datastax-enterprise 
Once you know if Cassandra and Hadoop is where you want to go, then dig into 
the installation side of things. 

Hope that helps. 

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/07/2013, at 3:42 AM, Martin Arrowsmith  
wrote:

> Even if we have one machine, will splitting it up into 2 nodes via a VM make 
> a difference ?
> Can it simulate 2 nodes of half the computing power ? Also, yes, this will 
> just be a test playground
> and not in production.
> 
> Thank you,
> 
> Martin
> 
> 
> On Mon, Jul 15, 2013 at 1:56 PM, Nate McCall  wrote:
> Good point. Just to be clear - my suggestions all assume this is a
> testing/playground/get a feel setup. This is a bad idea for
> performance testing (not to mention anywhere near production).
> 
> On Mon, Jul 15, 2013 at 3:02 PM, Tim Wintle  wrote:
> > I might be missing something, but if it is all on one machine then why use
> > Cassandra or hadoop?
> >
> > Sent from my phone
> >
> > On 13 Jul 2013 01:16, "Martin Arrowsmith" 
> > wrote:
> >>
> >> Dear Cassandra experts,
> >>
> >> I have an HP Proliant ML350 G8 server, and I want to put virtual
> >> servers on it. I would like to put the maximum number of nodes
> >> for a Cassandra + Hadoop cluster. I was wondering - what is the
> >> minimum RAM and memory per node that I need to have Cassandra + Hadoop
> >> before the performance decreases are not worth the extra nodes?
> >>
> >> Also, what is the suggested typical number of CPU cores / Node ? Would
> >> it make sense to have 1 core / node ? Less than that ?
> >>
> >> Any insight is appreciated! Thanks very much for your time!
> >>
> >> Martin
> 



Re: is there a key to sstable index file?

2013-07-18 Thread aaron morton
This webinar I did a few months ago goes through the read and write path 

http://www.youtube.com/watch?v=zFCjekgK7ZY

I get to that about 29 minutes in. 

slides 
http://www.slideshare.net/aaronmorton/cassandra-community-webinar-introduction-to-apache-cassandra-12-20353118

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/07/2013, at 9:56 AM, Robert Coli  wrote:

> On Wed, Jul 17, 2013 at 1:53 PM, S Ahmed  wrote:
> So is there an index of keys to sstables, and there can be 1 or more 
> sstables per key?
> 
> There are bloom filters, which answer the question "is my row key definitely 
> not in this SSTable"?
> 
> There is also the Key Cache, which is a list of SSTables a given row key is 
> known to be in, and at what offset.
> 
> =Rob



Re: AbstractCassandraDaemon.java (line 134) Exception in thread

2013-07-18 Thread aaron morton
> Double check the stack size is set to 100K, see 
> https://github.com/apache/cassandra/blob/cassandra-1.1/conf/cassandra-env.sh#L187
Sorry, that was a late night typo - it should have been 180K like in the link. 

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/07/2013, at 10:19 AM, Dave Brosius  wrote:

> 
> What is your -Xss set to? If it's below 256k, set it there, and see if you 
> still have the issues.
> 
> 
> 
> - Original Message -
> From: "Julio Quierati"  
> Sent: Tue, July 16, 2013 14:26
> Subject: AbstractCassandraDaemon.java (line 134) Exception in thread
> 
> Hello,
>  
> At least twice a day I'm getting hundreds of log entries for the exception 
> below; it seems like a network bottleneck. Does anyone know how to solve 
> this, or has anyone encountered this problem?
>  
> Using
>   C* version 1.1.4
>   and
> Java version "1.7.0_21"
> Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
> AWS m1.xlarge
>  
> ERROR [Thrift:967696] 2013-07-16 12:36:12,514 AbstractCassandraDaemon.java 
> (line 134) Exception in thread Thread[Thrift:967696,5,main]
> java.lang.StackOverflowError
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:150)
>   at java.net.SocketInputStream.read(SocketInputStream.java:121)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>   at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>   at 
> org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
>   at 
> org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
>   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>   at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
>   at 
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
>   at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22)
>   at 
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:722)
> ERROR [Thrift:966717] 2013-07-16 12:36:17,152 AbstractCassandraDaemon.java 
> (line 134) Exception in thread Thread[Thrift:966717,5,main]
> java.lang.StackOverflowError
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:150)
>   at java.net.SocketInputStream.read(SocketInputStream.java:121)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>   at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>   at 
> org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
>   at 
> org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
>   at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>   at 
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
>   at 
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
>   at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:22)
>   at 
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:722)
>  INFO [OptionalTasks:1] 2013-07-16 12:38:07,847 MeteredFlusher.java (line 62) 
> flushing high-traffic column family CFS(Keyspace='zm', 
> ColumnFamily='deviceConnectionLog') (estimated 200441950 bytes)
>  INFO [OptionalTasks:1] 2013-07-16 12:38:07,848 ColumnFamilyStore.java (line 
> 659) Enqueuing flush of 
> Memtable-deviceConnectionLog@64568169(20293642/200441950 serialized/live 
> bytes, 114490 ops)
>  INFO [FlushWriter:

Re: Fp chance for column level bloom filter

2013-07-18 Thread aaron morton
It's dropped in 1.2; if that's a concern, that's probably a good reason to 
upgrade. 

https://issues.apache.org/jira/browse/CASSANDRA-5492

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/07/2013, at 1:09 PM, Takenori Sato  wrote:

> Hi,
> 
> I thought memory consumption of the column-level bloom filter would become a big 
> concern when a row becomes very wide, like more than tens of millions of 
> columns. 
> 
> But I read in the source (1.0.7) that the fp chance for the column-level bloom 
> filter is hard-coded as 0.160, which is very high. So it seems not.
> 
> Is this correct?
> 
> Thanks,
> Takenori



Re: sstable size ?

2013-07-18 Thread aaron morton
Does this help ? 
http://www.mail-archive.com/user@cassandra.apache.org/msg30973.html

Can you pull the data off the node so you can test it somewhere safe ? 

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/07/2013, at 2:20 PM, "Langston, Jim"  wrote:

> Thanks, this does look like what I'm experiencing. Can someone 
> post a walkthrough ? The README and the sstablesplit script 
> don't seem to cover its use in any detail.
> 
> Jim
> 
> From: Colin Blower 
> Reply-To: 
> Date: Wed, 17 Jul 2013 16:49:59 -0700
> To: "user@cassandra.apache.org" 
> Subject: Re: sstable size ?
> 
> Take a look at the very recent thread called 'Alternate "major compaction"'. 
> There are some ideas in there about splitting up a large SSTable.
> 
> http://www.mail-archive.com/user@cassandra.apache.org/msg30956.html
> 
> 
> On 07/17/2013 04:17 PM, Langston, Jim wrote:
>> Hi all,
>> 
>> Is there a way to get an SSTable to a smaller size ? By this I mean that I 
>> currently have an SSTable that is nearly 1.2G, so that subsequent SSTables
>> when they compact are trying to grow to that size. The result is that when 
>> the min_compaction_threshold reaches its value and a compaction is needed, 
>> the compaction is taking a long time as the file grows (it is currently at 
>> 52MB and
>> takes ~22s to compact).
>> 
>> I'm not sure how the SSTable initially grew to its current size of 1.2G, 
>> since the
>> servers have been up for a couple of years. I hadn't noticed until I just 
>> upgraded to 1.2.6,
>> but now I see it affects everything. 
>> 
>> 
>> Jim
> 
> 
> -- 
> Colin Blower
> Software Engineer
> Barracuda Networks Inc.
> +1 408-342-5576 (o)



Re: sstable size ?

2013-07-18 Thread Langston, Jim
I saw that msg in the thread. I pulled the git files and it looks
like a suite of tools - do I install them on their own? Do I replace the
current ones? It's production data, but I can copy the data to where
I want and experiment.

Jim

From: aaron morton <aa...@thelastpickle.com>
Reply-To: <user@cassandra.apache.org>
Date: Thu, 18 Jul 2013 21:41:24 +1200
To: <user@cassandra.apache.org>
Subject: Re: sstable size ?

Does this help ? 
http://www.mail-archive.com/user@cassandra.apache.org/msg30973.html

Can you pull the data off the node so you can test it somewhere safe ?

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/07/2013, at 2:20 PM, "Langston, Jim" 
<jim.langs...@compuware.com> wrote:

Thanks, this does look like what I'm experiencing. Can someone
post a walkthrough ? The README and the sstablesplit script
don't seem to cover its use in any detail.

Jim

From: Colin Blower <cblo...@barracuda.com>
Reply-To: <user@cassandra.apache.org>
Date: Wed, 17 Jul 2013 16:49:59 -0700
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: sstable size ?

Take a look at the very recent thread called 'Alternate "major compaction"'. 
There are some ideas in there about splitting up a large SSTable.

http://www.mail-archive.com/user@cassandra.apache.org/msg30956.html


On 07/17/2013 04:17 PM, Langston, Jim wrote:
Hi all,

Is there a way to get an SSTable to a smaller size ? By this I mean that I
currently have an SSTable that is nearly 1.2G, so that subsequent SSTables
when they compact are trying to grow to that size. The result is that when
the min_compaction_threshold reaches its value and a compaction is needed,
the compaction is taking a long time as the file grows (it is currently at 52MB 
and
takes ~22s to compact).

I'm not sure how the SSTable initially grew to its current size of 1.2G, since 
the
servers have been up for a couple of years. I hadn't noticed until I just 
upgraded to 1.2.6,
but now I see it affects everything.


Jim


--
Colin Blower
Software Engineer
Barracuda Networks Inc.
+1 408-342-5576 (o)



Re: Intresting issue with getting Order By to work...

2013-07-18 Thread Vladimir Prudnikov
I'm not an expert, still learning C* but can tell something about your
questions.

1) You have to understand that a CQL row is not the same as the row that C*
uses to store data and which is accessible through the Thrift interface. A
primary key in terms of CQL is not the same as a row key.

2) You have to be clear about what you want to order: raw columns, rows or
CQL rows. If you want ordered slices of raw rows you have to use the Order
Preserving Partitioner (which is not recommended, depending on your schema);
if you want to order columns, it can be done easily; if you want to order CQL
rows you have to have a composite primary key with at least 2 columns, and
you can order only by the second column in the primary key.
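
As a concrete illustration of ordering raw columns (point 2), here is a small
Hector sketch that reads a slice back in reversed comparator order. It is
untested, and the column family name and row key are made up:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.QueryResult;
import me.prettyprint.hector.api.query.SliceQuery;

class ReversedSliceExample {
    static void printReversed(Keyspace keyspace) {
        // Columns come back in comparator order; reversed=true flips it.
        SliceQuery<String, String, String> q = HFactory.createSliceQuery(
                keyspace, StringSerializer.get(), StringSerializer.get(),
                StringSerializer.get());
        q.setColumnFamily("MyCF");          // made-up CF name
        q.setKey("someRowKey");             // made-up row key
        q.setRange(null, null, true, 100);  // start, finish, reversed, count
        QueryResult<ColumnSlice<String, String>> result = q.execute();
        for (HColumn<String, String> c : result.get().getColumns())
            System.out.println(c.getName() + " = " + c.getValue());
    }
}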

3) As far as I know column families created from CLI or Thrift will be
accessible from CQL, but not the opposite.

I hope experts will correct me if I'm wrong.


On 17/07/2013, at 3:44 PM, Tony Anecito  wrote:
>
> > Hi All,
> >
> > Well I got most everything working I wanted using Cassandra then
> discovered I needed to use an Order By. I am using Cassandra 1.2.5.
> > The use of Order By requires Primary Key which appears to be only
> supported by by using CQL and not Cassandra-cli. So I dropped my table
> created uisng CLI and used CQL and was able to create a "Table". But when I
> went to insert data that worked fine on the cli created table I now get an
> exception:
> > Error while inserting
> com.datastax.driver.core.exceptions.InvalidQueryException: Unknown
> identifier type.
> >
> > I read that thrift clients will not work with CQL created tables due to
> extra things created by the CQL. If so how can I create Primary Keys and be
> supported by thrift based clients??
> >
> > I will post to Datastax the same question but trying to understand how
> to resolve cli vs CQL issue like this. Seems like Cassandra-cli should
> support creation of compound primary keys or CQL should create tables
> readable by thrift based clients. Is there some meta column info people
> should add?
> > Also CQL tables are not visible via cli.so I can not see details on what
> was created by CQL and the cqlsh script has errors according to the latest
> Python windows program I tried.
> >
> > Thanks,
> > -Tony
> >
> >
>
>
>
>


-- 
Vladimir Prudnikov


Re: Pig load data with cassandrastorage and slice filter param

2013-07-18 Thread Miguel Angel Martin junquera
hi A:

Thank you for responding to my e-mail.

Sorry if i did not express my questions/doubts well.

I am trying to use the slice feature with the CassandraStorage LOAD, but I do
not know how to do it. I cannot find any detailed documentation about it.

I found only the references in my last mail.

I have not run any command yet, because I don't know whether a Composite
column Family is the best solution for loading data filtered by a range of
days, or whether to define supercolumns, and how either would work with
CassandraStorage slice LOADs in Pig, nor how to create the LOAD statement
with a slice in CassandraStorage. So I am working by *trial and error* on
this development.

I will appreciate any help or example of how I should define the Cassandra
data CF to filter by day (a composite CF with a timestamp or a string with
format YYYY-MM-dd, or a supercolumn, or any other feature), and a Pig example
that loads Cassandra data with the slice feature.

Thanks in advance





2013/7/17 aaron morton 

> Not sure I understand the question. What was the command that failed?
>
> Cheers
>
>
> -
> Aaron Morton
> Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 16/07/2013, at 11:03 PM, Miguel Angel Martin junquera <
> mianmarjun.mailingl...@gmail.com> wrote:
>
> > hi all
> >
> > I am trying to load data from cassandra with the slice params option, but
> there is not much info about how to use it. I found only a quick reference in
> readme.txt in the cassandra project .../examples/pig
> >
> > ...
> > Slices on columns can also be specified:
> > grunt> rows = LOAD
> 'cassandra://MyKeyspace/MyColumnFamily?slice_start=C2&slice_end=C4&limit=1&reversed=true'
> USING CassandraStorage();
> > Binary values for slice_start and slice_end can be escaped such as
> '\u0255'
> > ...
> >
> >
> > I want to filter the initial load data by day or a range of dates, and I only
> found this info about cassandra and pig
> >
> >   •
> http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/
> >   •
> http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra
> >
> >
> > I'm going to try a test with dummy data and a Composite column
> Family like anuniqueIDGenerate:timestamp, for example, or
>  anuniqueIDGenerate:stringdate where the date is a string with format YYYY-MM-dd,
> for example
> >
> > Another option is to use a Supercolumn family per day, for example, and try to
> use slice with this feature
> >
> >
> > Or another option is to create a custom Cassandra load, but perhaps it's
> more complex and I could avoid it with these features.
> >
> > I will appreciate any help or example of how I should define the cassandra
> data, and a Pig example of a load with slice.
> >
> > Thanks in advance and kind regards
> >
> >
> >
>
>


Re: Huge query Cassandra limits

2013-07-18 Thread cesare cugnasco
Thank you Aaron, your advice about a newer client is really
interesting. We will take it into account!

Here are some numbers from our tests: we found that, more or less, the
inflection point was at around 500k elements (rows times columns requested),
so asking for more can only decrease performance. The best-performing
combination was querying 500 rows at a time with 1000 columns each, while
other combinations, such as 125 rows with 4000 columns or 1000 rows with 500
columns, were about 15% slower. Other combinations show even bigger
differences.

It was a cluster of 16 nodes, with 24GB of RAM, SATA-2 SSDs and 8-core
CPUs @ 2.6 GHz.

The issue is that this limit can be reached with many combinations of rows
and columns. Broadly speaking, in using more rows or columns there is a
trade-off between better parallelization and higher overhead.
If you consider that it also depends on the number of nodes in the cluster,
the memory available, and the number of rows and columns the query needs, the
problem of how to optimally divide a request becomes quite complex.
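
For what it's worth, the splitting policy itself can be very simple once a
budget is chosen. A minimal Java sketch follows; the 500k budget and batch
sizes are just the numbers from our tests above, not recommendations:

import java.util.ArrayList;
import java.util.List;

class MultigetBatchPlanner {
    // Split rowKeys into batches so that rows * columnsPerRow stays under budget.
    static <K> List<List<K>> planBatches(List<K> rowKeys, int columnsPerRow, int budget) {
        int rowsPerBatch = Math.max(1, budget / columnsPerRow);
        List<List<K>> batches = new ArrayList<List<K>>();
        for (int i = 0; i < rowKeys.size(); i += rowsPerBatch)
            batches.add(rowKeys.subList(i, Math.min(i + rowsPerBatch, rowKeys.size())));
        return batches;
    }
}

// e.g. planBatches(keys, 1000, 500000) yields sub-requests of 500 rows each.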

Do these numbers make sense to you?

Cheers


2013/7/17 aaron morton 

> >  In ours tests,  we found there's a significant performance difference
> between various  configurations and we are studying a policy to optimize
> it. The doubt is that, if the needing of issuing multiple requests is
> caused only by a fixable implementation detail, would make pointless do
> this study.
> if you provide your numbers we can see if you are getting expected results.
>
> There are some limiting factors. Using the thrift API the max message size
> is 15 MB. And each row you ask for becomes (roughly) RF number of tasks in
> the thread pools on replicas. When you ask for 1000 rows it creates
> (roughly) 3,000 tasks in the replicas. If you have other clients trying to
> do reads at the same time this can cause delays to their reads.
>
> Like everything in computing, more is not always better. Run some tests to
> try multi gets with different sizes and see where improvements in the
> overall throughput begin to decline.
>
> Also consider using a newer client with token aware balancing and async
> networking. Again though, if you try to read everything at once you are
> going to have a bad day.
>
> Cheers
>
> -
> Aaron Morton
> Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 17/07/2013, at 8:24 PM, cesare cugnasco 
> wrote:
>
> > Hi Rob,
> > of course, we could issue multiple requests, but then we should
> consider which is the optimal way to split the query into smaller ones.
> Moreover, we should choose how many of the sub-queries run in parallel.
> >  In ours tests,  we found there's a significant performance difference
> between various  configurations and we are studying a policy to optimize
> it. The doubt is that, if the needing of issuing multiple requests is
> caused only by a fixable implementation detail, would make pointless do
> this study.
> >
> > Does anyone made similar analysis?
> >
> >
> > 2013/7/16 Robert Coli 
> >
> > On Tue, Jul 16, 2013 at 4:46 AM, cesare cugnasco <
> cesare.cugna...@gmail.com> wrote:
> > We  are working on porting some life science applications to Cassandra,
> but we have to deal with its limits managing huge queries. Our queries are
> usually multiget_slice ones: many rows with many columns each.
> >
> > You are not getting much "win" by increasing request size in Cassandra,
> and you expose yourself to "lose" such as you have experienced.
> >
> > Is there some reason you cannot just issue multiple requests?
> >
> > =Rob
> >
>
>


RE: InvalidRequestException(why:Not enough bytes to read value of component 0)

2013-07-18 Thread Rahul Gupta
Thank You. Your solution worked...!!

Changed this line:
HCounterColumn hColumn_ts = HFactory.createCounterColumn(ts.toString(), 
value, StringSerializer.get());
To this:

HCounterColumn hColumn_tsCG = HFactory.createCounterColumn(tsCG, 
value, CompositeSerializer.get());
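
In context, the mutation now looks roughly like this (a sketch only - it
reuses the variable names from my original code and assumes Hector's
CompositeSerializer):

import me.prettyprint.cassandra.serializers.CompositeSerializer;

Composite tsCG = new Composite();
tsCG.addComponent(dateStr, DateSerializer.get());
tsCG.addComponent(vStr, StringSerializer.get());

// The Composite itself is the column name; it is serialized by
// CompositeSerializer instead of being flattened via toString().
HCounterColumn<Composite> hColumn_tsCG =
        HFactory.createCounterColumn(tsCG, value, CompositeSerializer.get());

m1.insertCounter(vKey, counterCF, hColumn_tsCG);
m1.execute();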

Rahul Gupta
This e-mail and the information, including any attachments, it contains are 
intended to be a confidential communication only to the person or entity to 
whom it is addressed and may contain information that is privileged. If the 
reader of this message is not the intended recipient, you are hereby notified 
that any dissemination, distribution or copying of this communication is 
strictly prohibited. If you have received this communication in error, please 
immediately notify the sender and destroy the original message.

From: Vivek Mishra [mailto:mishra.v...@gmail.com]
Sent: Thursday, July 18, 2013 4:23 AM
To: user@cassandra.apache.org
Subject: Re: InvalidRequestException(why:Not enough bytes to read value of 
component 0)

+1 for Sylvain's answer.

This normally happens if the validation class for the column value(s) differs.

-Vivek

On Thu, Jul 18, 2013 at 12:08 PM, Sylvain Lebresne 
<sylv...@datastax.com> wrote:
I don't know Hector very much really, but I highly suspect that "ts.toString()" 
is wrong, since composite column names are not strings. So again, not a Hector 
expert, but I can't really see how converting the composite into string could 
work.

--
Sylvain

On Wed, Jul 17, 2013 at 11:14 PM, Rahul Gupta 
<rgu...@dekaresearch.com> wrote:
Getting error while trying to persist data in a Column Family having a 
CompositeType comparator

Using Cassandra ver 1.1.9.7
Hector Core ver 1.1.5 API ( which uses Thrift 1.1.10)

Created Column Family using cassandra-cli:

create column family event_counts
with comparator = 'CompositeType(DateType,UTF8Type)'
and key_validation_class = 'UUIDType'
and default_validation_class = 'CounterColumnType';

Persistence Code(sumLoad.java):

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.serializers.DateSerializer;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.Composite;
import me.prettyprint.hector.api.beans.HCounterColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import java.sql.Date;
import java.util.logging.Level;

public class sumLoad {

final static Cluster cluster = 
HFactory.getOrCreateCluster("Dev", new 
CassandraHostConfigurator("100.10.0.6:9160"));
final static Keyspace keyspace = 
HFactory.createKeyspace("Events", cluster);
final static StringSerializer ss = StringSerializer.get();

private boolean storeCounts(String vKey, String counterCF, Date 
dateStr, String vStr, long value)
{
try
{
   Mutator m1 = HFactory.createMutator(keyspace, 
StringSerializer.get());

   Composite ts = new Composite();
   ts.addComponent(dateStr, DateSerializer.get());
   ts.addComponent(vStr, StringSerializer.get());
   HCounterColumn hColumn_ts = 
HFactory.createCounterColumn(ts.toString(), value, StringSerializer.get());

   m1.insertCounter(vKey, counterCF, hColumn_ts);
   m1.execute();
   return true;
}
catch(Exception ex)
{
LOGGER.log(Level.WARNING, "Unable to 
store record", ex);
}
return false;
}

public static void main(String[] args) {

Date vDate = new Date(0);
sumLoad SumLoad = new sumLoad();
SumLoad.storeCounts("b9874e3e-4a0e-4e60-ae23-c3f1e575af93", "event_counts", 
vDate, "StoreThisString", 673);
}

}

Error:

[main] INFO me.prettyprint.cassandra.service.JmxMonitor - Registering JMX 
me.prettyprint.cassandra.service_Dev:ServiceType=hector,MonitorType=hector
Unable to store record
me.prettyprint.hector.api.exceptions.HInvalidRequestException: 
InvalidRequestException(why:Not enough bytes to read value of component 0)


Rahul Gupta
This e-mail and the information, including any attachments, it contains are 
intended to be a confidential communication only to the person or entity to 
whom it is addressed and may contain information that is privileged. If the 
reader of this message is not the intended recipient, you are hereby notified 
that any dissemination, distribution or copying of this communication is 
strictly prohib

Exception while writing compsite column names

2013-07-18 Thread ANAND_BALARAMAN
Hi

I have an issue while inserting a composite column name into one of the Cassandra 
column families. Below is a detailed description of what I did and where I am 
stuck.
Please let me know where I went wrong.

Requirement:
--
   Rowkey-> RowIdString
   Column name   -> TEXT1 : value1 : TEXT2 : value2 : TEXT3
   Column value -> value3

Column family definition:
---
create column family CompositeColumnNameTest
   WITH 
comparator='CompositeType(UTF8Type,UTF8Type,UTF8Type,UTF8Type,UTF8Type)'
   AND key_validation_class=UTF8Type
   WITH compression_options={sstable_compression:SnappyCompressor, 
chunk_length_kb:64};

Code:

String RowIdString = "1234";

   Composite composite = new Composite();
   composite.addComponent("TEXT1", StringSerializer.get());
   composite.addComponent("value1", StringSerializer.get());
   composite.addComponent("TEXT2", StringSerializer.get());
   composite.addComponent("value3", StringSerializer.get());
   composite.addComponent("TEXT3", StringSerializer.get());

   Column column = new Column(composite.serialize());
   column.setValue("value3".getBytes());
   column.setTimestamp(System.currentTimeMillis());

   // push data to cassandra
   batchMutate.addInsertion(RowIdString, "CompositeColumnNameTest", column);
   keyspaceServiceImpl.batchMutate(batchMutate);

Exception:
-
me.prettyprint.hector.api.exceptions.HInvalidRequestException: 
InvalidRequestException(why:Not enough bytes to read value of component 0)
   at 
me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:45)
   at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:97)
   at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:90)
   at 
me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
   at 
me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:258)
   at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)


Thanks in advance
-Anand



The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 
and conditions expressed in any applicable governing The Home Depot terms of 
business or client engagement letter. The Home Depot disclaims all 
responsibility and liability for the accuracy and content of this attachment 
and for any damages or losses arising from any inaccuracies, errors, viruses, 
e.g., worms, trojan horses, etc., or other items of a destructive nature, which 
may be contained in this attachment and shall not be liable for direct, 
indirect, consequential or special damages in connection with this e-mail 
message or its attachment.


Re: Intresting issue with getting Order By to work...

2013-07-18 Thread Tony Anecito
Many Thanks Aaron!!
 
As I work more with CQL and CLI, and from other postings I have seen regarding 
usage, I am thinking that CLI is best for keyspace and Column Family setup and 
maintenance, while CQL is best for queries/inserts etc. Mainly I am thinking this 
because of the better control over the schema using CLI.
 
I will look over your links carefully.
-Tony
  


 From: aaron morton 
To: Cassandra User  
Sent: Thursday, July 18, 2013 2:21 AM
Subject: Re: Intresting issue with getting Order By to work...
  


Here are some posts about CQL and Thrift

 http://thelastpickle.com/2013/01/11/primary-keys-in-cql/
http://www.datastax.com/dev/blog/cql3-for-cassandra-experts
http://www.datastax.com/dev/blog/thrift-to-cql3

Hope that helps. 

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com/  

On 17/07/2013, at 11:38 PM, Tony Anecito  wrote:

Thanks for the answers. 
>  
>The reason why I ask is it is stated that composite keys are not the same as 
>Primary Key. I found no examples for thrift where it specifically said the 
>composite key is a primary key required by order by. All the examples where 
>the words primary key were used were CQL examples, and I am seeing 
>postings where people had issues with Order By but no answers like what you 
>said. 
>  
>If there was better documentation for Cassandra with working examples and 
>explanations about the differences between CQL and CLI I would not need to ask 
>questions on the user groups. I have also spotted major issues and tried to 
>help all users understand them. 
>  
>-Tony 
>
> 
>
>
>From: aaron morton 
>To: Cassandra User  
>Sent: Wednesday, July 17, 2013 4:06 AM
>Subject: Re: Intresting issue with getting Order By to work...
> 
>
>> The use of Order By requires Primary Key which appears to be only supported 
>> by using CQL and not Cassandra-cli. 
>Order By in CQL is also supported on the thrift interface. 
>
>When using thrift the order you get the columns back is the order the 
>Comparator puts them in. If you want them reversed the thrift API supports 
>that. 
>
>> I read that thrift clients will not work with CQL created tables due to 
>> extra things created by the CQL. If so how can I create Primary Keys and be 
>> supported by thrift based clients??
>No.
>Do not access CQL tables with the thrift API. 
>
>> Seems like Cassandra-cli should support creation of compound primary keys or
>It does. 
>See help on the CompositeType
>
>> Also CQL tables are not visible via the CLI, so I can not see details on what was 
>> created by CQL, and the cqlsh script has errors according to the latest 
>> Python windows program I tried.
>They are visible for read access. 
>
>> I will post to Datastax the same question 
>Please ask questions to one group at a time so people do not waste their time 
>providing answers you already have. 
>
>Cheers
>
>
>-
>Aaron Morton
>Cassandra Consultant
>New Zealand
>
>@aaronmorton
>http://www.thelastpickle.com/
>
>On 17/07/2013, at 3:44 PM, Tony Anecito  wrote:
>
>> Hi All,
>> 
>> Well I got most everything working I wanted using Cassandra then discovered 
>> I needed to use an Order By. I am using Cassandra 1.2.5.
>> The use of Order By requires Primary Key which appears to be only supported 
>> by using CQL and not Cassandra-cli. So I dropped my table created using 
>> CLI and used CQL and was able to create a "Table". But when I went to 
>> insert data that worked fine on the cli created table I now get an 
>> exception:
>> Error while inserting 
>> com.datastax.driver.core.exceptions.InvalidQueryException: Unknown 
>> identifier type.
>> 
>> I read that thrift clients will not work with CQL created tables due to 
>> extra things created by the CQL. If so how can I create Primary Keys and be 
>> supported by thrift based clients??
>> 
>> I will post to Datastax the same question but trying to understand how to 
>> resolve cli vs CQL issue like this. Seems like Cassandra-cli should support 
>> creation of compound primary keys or CQL should create tables readable by 
>> thrift based clients. Is there some meta column info people should add?
>> Also CQL tables are not visible via the CLI, so I can not see details on what was 
>> created by CQL, and the cqlsh script has errors according to the latest 
>> Python windows program I tried.
>> 
>> Thanks,
>> -Tony
>> 
>> 
>
>
>

Re: Intresting issue with getting Order By to work...

2013-07-18 Thread Tony Anecito
Many thanks Vladimir, I am starting to see what you are talking about.
 
Yeah, all I want to do is a simple Order By via CQL, but having the column family 
set up using CLI to support that is a bit of a challenge for me at the moment, 
since the two are at different levels. I prefer CLI for column family setup, and 
just need an example of a CLI-level setup that translates to a working Order By 
at the CQL level.
 
Many Thanks!
-Tony
 


 From: Vladimir Prudnikov 
To: user@cassandra.apache.org 
Sent: Thursday, July 18, 2013 3:54 AM
Subject: Re: Intresting issue with getting Order By to work...
  


I'm not an expert, still learning C* but can tell something about your 
questions.

1) You have to understand that a CQL row is not the same as the row that C* uses to 
store data and which is accessible through the Thrift interface. A primary key in 
terms of CQL is not the same as a row key. 

2) You have to be clear about what you want to order: raw columns, rows or CQL rows. 
If you want ordered slices of raw rows you have to use the Order Preserving 
Partitioner (which is not recommended, depending on your schema); if you want to 
order columns, it can be done easily; if you want to order CQL rows you have to 
have a composite primary key with at least 2 columns, and you can order only by 
the second column in the primary key. 

3) As far as I know column families created from CLI or Thrift will be 
accessible from CQL, but not the opposite.

I hope experts will correct me if I'm wrong. 




On 17/07/2013, at 3:44 PM, Tony Anecito  wrote:
>>
>>> Hi All,
>>> 
>>> Well I got most everything working I wanted using Cassandra then discovered 
>>> I needed to use an Order By. I am using Cassandra 1.2.5.
>>> The use of Order By requires Primary Key which appears to be only supported 
>>> by using CQL and not Cassandra-cli. So I dropped my table created using 
>>> CLI and used CQL and was able to create a "Table". But when I went to 
>>> insert data that worked fine on the cli created table I now get an 
>>> exception:
>>> Error while inserting 
>>> com.datastax.driver.core.exceptions.InvalidQueryException: Unknown 
>>> identifier type.
>>> 
>>> I read that thrift clients will not work with CQL created tables due to 
>>> extra things created by the CQL. If so how can I create Primary Keys and be 
>>> supported by thrift based clients??
>>> 
>>> I will post to Datastax the same question but trying to understand how to 
>>> resolve cli vs CQL issue like this. Seems like Cassandra-cli should support 
>>> creation of compound primary keys or CQL should create tables readable by 
>>> thrift based clients. Is there some meta column info people should add?
>>> Also CQL tables are not visible via the CLI, so I can not see details on what 
>>> was created by CQL, and the cqlsh script has errors according to the latest 
>>> Python windows program I tried.
>>> 
>>> Thanks,
>>> -Tony
>>> 
>>> 
>>
>>
>>
>


-- 
Vladimir Prudnikov 

Re: Exception while writing compsite column names

2013-07-18 Thread Vivek Mishra
Looks like the validation class for the composite column value is different from
UTF8Type? Though the code suggests it is:
   composite.addComponent("TEXT1", StringSerializer.get());

Please validate.

-Vivek


On Thu, Jul 18, 2013 at 7:41 PM,  wrote:

>  Hi
>
>
>
> I have an issue while inserting a composite column name into one of the
> Cassandra column families. Below is a detailed description of what I did
> and where I am stuck.
>
> Please let me know where I went wrong.
>
>
>
> Requirement:
>
> --
>
>Rowkey-> RowIdString
>
>Column name   -> TEXT1 : value1 : TEXT2 : value2 : TEXT3
>
>Column value -> value3
>
>
>
> Column family definition:
>
> ---
>
> create column family CompositeColumnNameTest
>
>WITH
> comparator='CompositeType(UTF8Type,UTF8Type,UTF8Type,UTF8Type,UTF8Type)'
>
>AND key_validation_class=UTF8Type
>
>WITH compression_options={sstable_compression:SnappyCompressor,
> chunk_length_kb:64};
>
>
>
> Code:
>
> 
>
> String RowIdString = "1234";
>
>
>
>Composite composite = new Composite();
>
>composite.addComponent("TEXT1", StringSerializer.get());
>
>composite.addComponent("value1", StringSerializer.get());
>
>composite.addComponent("TEXT2", StringSerializer.get());
>
>composite.addComponent("value3", StringSerializer.get());
>
>composite.addComponent("TEXT3", StringSerializer.get());
>
>
>
>Column column = new Column(composite.serialize());
>
>column.setValue("value3".getBytes());
>
>column.setTimestamp(System.currentTimeMillis());
>
>
>
>// push data to cassandra
>
>batchMutate.addInsertion(RowIdString, "CompositeColumnNameTest",
> column);
>
>keyspaceServiceImpl.batchMutate(batchMutate);
>
>
>
> Exception:
>
> -
>
> me.prettyprint.hector.api.exceptions.HInvalidRequestException:
> InvalidRequestException(why:Not enough bytes to read value of component 0)
>
>at
> me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:45)
>
>at
> me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:97)
>
>at
> me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:90)
>
>at
> me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
>
>at
> me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:258)
>
>at
> me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
>
>
>
>
>
> Thanks in advance
>
> -Anand
>
> --
>
> The information in this Internet Email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this Email
> by anyone else is unauthorized. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be
> taken in reliance on it, is prohibited and may be unlawful. When addressed
> to our clients any opinions or advice contained in this Email are subject
> to the terms and conditions expressed in any applicable governing The Home
> Depot terms of business or client engagement letter. The Home Depot
> disclaims all responsibility and liability for the accuracy and content of
> this attachment and for any damages or losses arising from any
> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
> items of a destructive nature, which may be contained in this attachment
> and shall not be liable for direct, indirect, consequential or special
> damages in connection with this e-mail message or its attachment.
>


RE: Exception while writing compsite column names

2013-07-18 Thread ANAND_BALARAMAN
I had been using StringSerializer.get() for all UTF8Type fields so far, so I do 
not think I need to check the code.
Do you suspect the column family definition?

-Anand

From: Vivek Mishra [mailto:mishra.v...@gmail.com]
Sent: Thursday, July 18, 2013 11:29 AM
To: user@cassandra.apache.org
Subject: Re: Exception while writing compsite column names

Looks like the validation class for the composite column value is different from 
UTF8Type? Though the code suggests it is:
   composite.addComponent("TEXT1", StringSerializer.get());

Please validate.

-Vivek

On Thu, Jul 18, 2013 at 7:41 PM, 
<anand_balara...@homedepot.com> wrote:
Hi

I have an issue while inserting a composite column name into one of the Cassandra 
column families. Below is a detailed description of what I did and where I am 
stuck.
Please let me know where I went wrong.

Requirement:
--
   Rowkey-> RowIdString
   Column name   -> TEXT1 : value1 : TEXT2 : value2 : TEXT3
   Column value -> value3

Column family definition:
---
create column family CompositeColumnNameTest
   WITH 
comparator='CompositeType(UTF8Type,UTF8Type,UTF8Type,UTF8Type,UTF8Type)'
   AND key_validation_class=UTF8Type
   WITH compression_options={sstable_compression:SnappyCompressor, 
chunk_length_kb:64};

Code:

String RowIdString = "1234";

   Composite composite = new Composite();
   composite.addComponent("TEXT1", StringSerializer.get());
   composite.addComponent("value1", StringSerializer.get());
   composite.addComponent("TEXT2", StringSerializer.get());
   composite.addComponent("value3", StringSerializer.get());
   composite.addComponent("TEXT3", StringSerializer.get());

   Column column = new Column(composite.serialize());
   column.setValue("value3".getBytes());
   column.setTimestamp(System.currentTimeMillis());

   // push data to cassandra
   batchMutate.addInsertion(RowIdString, "CompositeColumnNameTest", column);
   keyspaceServiceImpl.batchMutate(batchMutate);

Exception:
-
me.prettyprint.hector.api.exceptions.HInvalidRequestException: 
InvalidRequestException(why:Not enough bytes to read value of component 0)
   at 
me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:45)
   at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:97)
   at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:90)
   at 
me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
   at 
me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:258)
   at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)


Thanks in advance
-Anand



The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 
and conditions expressed in any applicable governing The Home Depot terms of 
business or client engagement letter. The Home Depot disclaims all 
responsibility and liability for the accuracy and content of this attachment 
and for any damages or losses arising from any inaccuracies, errors, viruses, 
e.g., worms, trojan horses, etc., or other items of a destructive nature, which 
may be contained in this attachment and shall not be liable for direct, 
indirect, consequential or special damages in connection with this e-mail 
message or its attachment.




The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 
and conditions expressed in any applicable governing The Home Depot terms of 
business or client engagement letter. The Home Depot disclaims all 
responsibility and liability for the accuracy and content of this attachment 
and for any damages or losses arising from any inaccuracies, errors, viruses, 
e.g., worms, trojan horses, etc., or other items of a destructive nature, which 
may be contained in this attachment and shall not be liable for direct, 
indirect, consequential or special 

Corrupted sstable and sstableloader

2013-07-18 Thread Jan Kesten

Hello together,

today I experienced a problem while loading a snapshot from our 
cassandra cluster into a test cluster. The cluster has six nodes; I took 
a snapshot on all nodes concurrently and tried to import them into the 
other cluster.


From 5 out of 6 nodes importing went well with no errors. But the 
snapshot of one node cannot be imported - I tried several times. I got 
the following while running sstableloader:


ERROR 09:13:06,084 Error in ThreadPoolExecutor
java.lang.RuntimeException: java.io.IOException: Datenübergabe 
unterbrochen (broken pipe)

at com.google.common.base.Throwables.propagate(Throwables.java:160)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: Datenübergabe unterbrochen (broken pipe)
at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
at 
sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:420)

at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:552)
at 
org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:93)
at 
org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)

... 3 more
Exception in thread "Streaming to /172.17.2.216:1" 
java.lang.RuntimeException: java.io.IOException: Datenübergabe 
unterbrochen (broken pipe)

at com.google.common.base.Throwables.propagate(Throwables.java:160)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: Datenübergabe unterbrochen (broken pipe)
at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
at 
sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:420)

at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:552)
at 
org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:93)
at 
org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)

... 3 more

I suspect that the sstable on the node is corrupted in some way - and a 
scrub and repair should fix that, I suppose.


Since the original cluster has a replication factor of 3 - shouldn't the 
import from 5 of 6 snapshots contain all data? Or is the sstableloader 
tool too clever and avoids importing duplicate data?


Thanks for hints,
Jan

--
Jan Kesten, mailto:j.kes...@enercast.de
Tel.: +49 561/4739664-0 FAX: -9
enercast GmbH Friedrich-Ebert-Str. 104 D-34119 Kassel   HRB15471
http://www.enercast.de Online-Prognosen für erneuerbare Energien
Geschäftsführung: Dipl. Ing. Thomas Landgraf, Bernd Kratz




Re: Corrupted sstable and sstableloader

2013-07-18 Thread sankalp kohli
sstable might be corrupted due to bad disk. In that case, replication does
not matter.


On Thu, Jul 18, 2013 at 8:52 AM, Jan Kesten  wrote:

> Hello everyone,
>
> today I experienced a problem while loading a snapshot from our cassandra
> cluster to a test cluster. The cluster has six nodes and I took a snapshot
> from all nodes concurrently and tried to import them into the other cluster.
>
> From 5 out of 6 nodes importing went well with no errors. But the snapshot
> of one node cannot be imported - I tried several times. I got the
> following while running sstableloader:
>
> ERROR 09:13:06,084 Error in ThreadPoolExecutor
> java.lang.RuntimeException: java.io.IOException: Datenübergabe
> unterbrochen (broken pipe)
> at com.google.common.base.Throwables.propagate(Throwables.java:160)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
> Caused by: java.io.IOException: Datenübergabe unterbrochen (broken pipe)
> at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
> at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:420)
> at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:552)
> at org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:93)
> at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> ... 3 more
> Exception in thread "Streaming to /172.17.2.216:1"
> java.lang.RuntimeException: java.io.IOException: Datenübergabe unterbrochen (broken pipe)
> at com.google.common.base.Throwables.propagate(Throwables.java:160)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
> Caused by: java.io.IOException: Datenübergabe unterbrochen (broken pipe)
> at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
> at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:420)
> at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:552)
> at org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:93)
> at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> ... 3 more
>
> I suspect that the sstable on the node is corrupted in some way - and a
> scrub and repair should fix that I suppose.
>
> Since the original cluster has a replication factor of 3 - shouldn't the
> import from 5 of 6 snapshots contain all data? Or is the sstableloader tool
> too clever and avoids importing duplicate data?
>
> Thanks for hints,
> Jan
>
> --
> Jan Kesten, mailto:j.kes...@enercast.de
> Tel.: +49 561/4739664-0 FAX: -9
> enercast GmbH Friedrich-Ebert-Str. 104 D-34119 Kassel   HRB15471
> http://www.enercast.de Online-Prognosen für erneuerbare Energien
> Geschäftsführung: Dipl. Ing. Thomas Landgraf, Bernd Kratz
>


Re: alter column family ?

2013-07-18 Thread Robert Coli
On Wed, Jul 17, 2013 at 7:23 PM, Langston, Jim
wrote:

>  As a follow up – I did upgrade the cluster to 1.2.6 and that
> did take care of the issue. The upgrade went very smoothly,
> the longest part was being thorough on the configuration
> files, but I was able to quickly update the schemas
> after restarting the cluster.
>

Glad to hear it!

=Rob


Re: Interesting issue with getting Order By to work...

2013-07-18 Thread Robert Coli
On Thu, Jul 18, 2013 at 8:12 AM, Tony Anecito  wrote:

> As I work more with CQL and CLI, as in some other postings I have seen
> regarding usage, I am thinking that CLI is best for keyspace and Column
> Family setup and maintenance,
>  while CQL is best for queries/inserts etc. Mainly I am thinking this
> because of better control over the schema using CLI.
>

The question is not really CQL vs CLI, it's COMPACT STORAGE vs.
(NON-COMPACT) CQL storage. Picking one or the other strongly informs
whether you want to also use Thrift or CQL (respectively) as an interface.

=Rob


Re: Corrupted sstable and sstableloader

2013-07-18 Thread Jan Kesten
Hi,

I think it might be corrupted due to a power outage. Apart from this issue, 
reading the data with consistency level quorum (I have three replicas) did not 
raise an error - only the import to a different cluster did.

So, if I import all nodes except the one with the corrupted sstable - shouldn't 
I import two of the three replicas, so that the data is complete?


Sent from my iPhone

On 18.07.2013 at 19:06, sankalp kohli wrote:

> sstable might be corrupted due to bad disk. In that case, replication does 
> not matter.
> 
> 
> On Thu, Jul 18, 2013 at 8:52 AM, Jan Kesten  wrote:
>> Hello everyone,
>> 
>> today I experienced a problem while loading a snapshot from our cassandra 
>> cluster to a test cluster. The cluster has six nodes and I took a snapshot 
>> from all nodes concurrently and tried to import them into the other cluster.
>> 
>> From 5 out of 6 nodes importing went well with no errors. But the snapshot 
>> of one node cannot be imported - I tried several times. I got the following 
>> while running sstableloader:
>> 
>> ERROR 09:13:06,084 Error in ThreadPoolExecutor
>> java.lang.RuntimeException: java.io.IOException: Datenübergabe unterbrochen 
>> (broken pipe)
>> at com.google.common.base.Throwables.propagate(Throwables.java:160)
>> at 
>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:724)
>> Caused by: java.io.IOException: Datenübergabe unterbrochen (broken pipe)
>> at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
>> at 
>> sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:420)
>> at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:552)
>> at 
>> org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:93)
>> at 
>> org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
>> at 
>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>> ... 3 more
>> Exception in thread "Streaming to /172.17.2.216:1" 
>> java.lang.RuntimeException: java.io.IOException: Datenübergabe unterbrochen 
>> (broken pipe)
>> at com.google.common.base.Throwables.propagate(Throwables.java:160)
>> at 
>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:724)
>> Caused by: java.io.IOException: Datenübergabe unterbrochen (broken pipe)
>> at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
>> at 
>> sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:420)
>> at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:552)
>> at 
>> org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:93)
>> at 
>> org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
>> at 
>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>> ... 3 more
>> 
>> I suspect that the sstable on the node is corrupted in some way - and a 
>> scrub and repair should fix that I suppose.
>> 
>> Since the original cluster has a replication factor of 3 - shouldn't the 
>> import from 5 of 6 snapshots contain all data? Or is the sstableloader tool 
>> too clever and avoids importing duplicate data?
>> 
>> Thanks for hints,
>> Jan
>> 
>> -- 
>> Jan Kesten, mailto:j.kes...@enercast.de
>> Tel.: +49 561/4739664-0 FAX: -9
>> enercast GmbH Friedrich-Ebert-Str. 104 D-34119 Kassel   HRB15471
>> http://www.enercast.de Online-Prognosen für erneuerbare Energien
>> Geschäftsführung: Dipl. Ing. Thomas Landgraf, Bernd Kratz
>> 
> 


Re: Corrupted sstable and sstableloader

2013-07-18 Thread Robert Coli
On Thu, Jul 18, 2013 at 10:17 AM, Jan Kesten  wrote:

> I think it might be corrupted due to a power outage. Apart from this issue,
> reading the data with consistency level quorum (I have three replicas) did
> not raise an error - only the import to a different cluster did.
>
> So, if I import all nodes except the one with the corrupted sstable -
> shouldn't I import two of the three replicas, so that the data is complete?
>

Why not just determine which SSTable is corrupt, remove it from the restore
set, then run a repair when you're done to be totally sure all data is on
all nodes?

=Rob


Re: Exception while writing composite column names

2013-07-18 Thread Vivek Mishra
Yes. Can you please share the output of describe for the keyspace which contains
"CompositeColumnNameTest"?
What is the datatype for the column values?

-Vivek


On Thu, Jul 18, 2013 at 9:17 PM,  wrote:

>  I had been using the StringSerializer.get() for all UTF8Type fields so
> far. Do not think I need to check the code.
>
> Do you suspect the column family definition?
>
>
>
> -Anand
>
>
>
> From: Vivek Mishra [mailto:mishra.v...@gmail.com]
> Sent: Thursday, July 18, 2013 11:29 AM
> To: user@cassandra.apache.org
> Subject: Re: Exception while writing composite column names
>
>
>
> Looks like validation class for composite column value is different than
> UTF8Type? Though code suggests it is:
>
>composite.addComponent("TEXT1", StringSerializer.get());
>
>
>
> Please validate.
>
>
>
> -Vivek
>
>
>
> On Thu, Jul 18, 2013 at 7:41 PM,  wrote:
>
> Hi
>
>
>
> I have an issue while inserting a composite column name into one of the
> Cassandra column families. Below is a detailed description of what I did
> and where I got stuck.
>
> Please let me know where I went wrong.
>
>
>
> Requirement:
>
> --
>
>Rowkey-> RowIdString
>
>Column name   -> TEXT1 : value1 : TEXT2 : value2 : TEXT3
>
>Column value -> value3
>
>
>
> Column family definition:
>
> ---
>
> create column family CompositeColumnNameTest
>
>WITH
> comparator='CompositeType(UTF8Type,UTF8Type,UTF8Type,UTF8Type,UTF8Type)'
>
>AND key_validation_class=UTF8Type
>
>AND compression_options={sstable_compression:SnappyCompressor,
> chunk_length_kb:64};
>
>
>
> Code:
>
> 
>
> String RowIdString = "1234";
>
>
>
>Composite composite = new Composite();
>
>composite.addComponent("TEXT1", StringSerializer.get());
>
>composite.addComponent("value1", StringSerializer.get());
>
>composite.addComponent("TEXT2", StringSerializer.get());
>
>composite.addComponent("value3", StringSerializer.get());
>
>composite.addComponent("TEXT3", StringSerializer.get());
>
>
>
>Column column = new Column(composite.serialize());
>
>column.setValue("value3".getBytes());
>
>column.setTimestamp(System.currentTimeMillis());
>
>
>
>// push data to cassandra
>
>batchMutate.addInsertion(RowIdString, "CompositeColumnNameTest",
> column);
>
>keyspaceServiceImpl.batchMutate(batchMutate);
>
>
>
> Exception:
>
> -
>
> me.prettyprint.hector.api.exceptions.HInvalidRequestException:
> InvalidRequestException(why:Not enough bytes to read value of component 0)
>
>at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:45)
>
>at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:97)
>
>at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:90)
>
>at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
>
>at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:258)
>
>at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
>
>
>
>
>
> Thanks in advance
>
> -Anand
>
>

RE: Exception while writing composite column names

2013-07-18 Thread ANAND_BALARAMAN
Output of describe command is:

[default@Test] describe CompositeColumnNameTest;
ColumnFamily: CompositeColumnNameTest
  Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
  Columns sorted by: org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 0.1
  DC Local Read repair chance: 0.0
  Replicate on write: true
  Caching: KEYS_ONLY
  Bloom Filter FP chance: default
  Built indexes: []
  Compaction Strategy: 
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
  Compression Options:
chunk_length_kb: 64
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor

Vivek. The problem was not with this composite column.
I was loading yet another column (not composite) which was getting data in the 
below format:
a40be0d3-5c49-446d-a835-ef6af29c016e
bc79d3ef-18fd-4891-8b2c-1fdae404318c
bf15fa45-176c-4749-8481-2c8a2f9dd70f

It is a key (String) which I need to store along with the other data. If I 
leave out this column alone, my loader program works well.
Until it reaches my program it is displayed as a String, but during insert I 
get the exception.
Could you please suggest what data type I am supposed to use for such data?

Thanks and Regards
Anand

From: Vivek Mishra [mailto:mishra.v...@gmail.com]
Sent: Thursday, July 18, 2013 1:24 PM
To: user@cassandra.apache.org
Subject: Re: Exception while writing composite column names

Yes. Can you please share the output of describe for the keyspace which contains 
"CompositeColumnNameTest"?
What is the datatype for the column values?

-Vivek

On Thu, Jul 18, 2013 at 9:17 PM, 
mailto:anand_balara...@homedepot.com>> wrote:
I had been using the StringSerializer.get() for all UTF8Type fields so far. Do 
not think I need to check the code.
Do you suspect the column family definition?

-Anand

From: Vivek Mishra [mailto:mishra.v...@gmail.com]
Sent: Thursday, July 18, 2013 11:29 AM
To: user@cassandra.apache.org
Subject: Re: Exception while writing composite column names

Looks like validation class for composite column value is different than 
UTF8Type? Though code suggests it is:
   composite.addComponent("TEXT1", StringSerializer.get());

Please validate.

-Vivek

On Thu, Jul 18, 2013 at 7:41 PM, 
mailto:anand_balara...@homedepot.com>> wrote:
Hi

I have an issue while inserting a composite column name into one of the Cassandra 
column families. Below is a detailed description of what I did and where I got 
stuck.
Please let me know where I went wrong.

Requirement:
--
   Rowkey-> RowIdString
   Column name   -> TEXT1 : value1 : TEXT2 : value2 : TEXT3
   Column value -> value3

Column family definition:
---
create column family CompositeColumnNameTest
   WITH 
comparator='CompositeType(UTF8Type,UTF8Type,UTF8Type,UTF8Type,UTF8Type)'
   AND key_validation_class=UTF8Type
   AND compression_options={sstable_compression:SnappyCompressor, 
chunk_length_kb:64};

Code:

String RowIdString = "1234";

   Composite composite = new Composite();
   composite.addComponent("TEXT1", StringSerializer.get());
   composite.addComponent("value1", StringSerializer.get());
   composite.addComponent("TEXT2", StringSerializer.get());
   composite.addComponent("value3", StringSerializer.get());
   composite.addComponent("TEXT3", StringSerializer.get());

   Column column = new Column(composite.serialize());
   column.setValue("value3".getBytes());
   column.setTimestamp(System.currentTimeMillis());

   // push data to cassandra
   batchMutate.addInsertion(RowIdString, "CompositeColumnNameTest", column);
   keyspaceServiceImpl.batchMutate(batchMutate);

Exception:
-
me.prettyprint.hector.api.exceptions.HInvalidRequestException: 
InvalidRequestException(why:Not enough bytes to read value of component 0)
   at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:45)
   at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:97)
   at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:90)
   at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
   at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:258)
   at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
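
A minimal Hector sketch of one way around this, assuming the UUID strings 
really must live in this same column family (the UUID value and the reuse of 
batchMutate/keyspaceServiceImpl from the code above are illustrative, not 
tested). Under a CompositeType comparator every column name must serialize as 
a composite: the first two bytes are read as a component length, so a bare 
string like "a40be0d3-..." is misread and fails with "Not enough bytes to 
read value of component 0". CompositeType does accept a prefix of its 
declared components, so wrapping the string as a single-component composite 
should be enough:

    Composite singleName = new Composite();
    // one component only - CompositeType(UTF8 x5) accepts a prefix of its
    // declared components, but never an unwrapped byte string
    singleName.addComponent("a40be0d3-5c49-446d-a835-ef6af29c016e",
            StringSerializer.get());

    Column keyColumn = new Column(singleName.serialize());
    keyColumn.setValue("some-value".getBytes());   // placeholder value
    keyColumn.setTimestamp(System.currentTimeMillis());

    batchMutate.addInsertion(RowIdString, "CompositeColumnNameTest", keyColumn);
    keyspaceServiceImpl.batchMutate(batchMutate);

If these keys need no composite ordering at all, the simpler fix is a 
separate column family with a plain UTF8Type comparator.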

Incorrect row data size

2013-07-18 Thread Paul Ingalls
I'm seeing quite a few of these on pretty much all of the nodes of my 1.2.6 
cluster.  Is this something I should be worried about?  If so, do I need to run 
upgradesstables or run a scrub?

ERROR [CompactionExecutor:4] 2013-07-18 18:49:02,609 CassandraDaemon.java (line 
192) Exception in thread Thread[CompactionExecutor:4,1,main]
java.lang.AssertionError: incorrect row data size 72128792 written to 
/mnt/datadrive/lib/cassandra/data/fanzo/tweets_by_affiliation/fanzo-tweets_by_affiliation-tmp-ic-918-Data.db; 
correct is 72148465
 at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:162)
 at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
 at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
 at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
 at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
 at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:211)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)

Thanks!

Paul

Re: sstable size ?

2013-07-18 Thread Langston, Jim
I have been looking at the stuff in the zip file, and also the
sstablesplit command script. This script is looking for a Java
class, StandaloneSplitter, located in the package org.apache.cassandra.tools.

Where is this package located ? I looked in the lib directory but nothing 
contains
the class. Is this something I need to get as well ?

Thanks,

Jim

From: "Langston, Jim" 
mailto:jim.langs...@compuware.com>>
Reply-To: mailto:user@cassandra.apache.org>>
Date: Thu, 18 Jul 2013 10:28:39 +
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Re: sstable size ?

I saw that msg in the thread, I pulled the git files and it looks
like a suite of tools. Do I install them on their own ? Do I replace the
current ones ? It's production data but I can copy the data to where
I want and experiment.

Jim

From: aaron morton mailto:aa...@thelastpickle.com>>
Reply-To: mailto:user@cassandra.apache.org>>
Date: Thu, 18 Jul 2013 21:41:24 +1200
To: mailto:user@cassandra.apache.org>>
Subject: Re: sstable size ?

Does this help ? 
http://www.mail-archive.com/user@cassandra.apache.org/msg30973.html

Can you pull the data off the node so you can test it somewhere safe ?

Cheers

-
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/07/2013, at 2:20 PM, "Langston, Jim" 
mailto:jim.langs...@compuware.com>> wrote:

Thanks, this does look like what I'm experiencing. Can someone
post a walkthrough ? The README and the sstablesplit script
don't seem to cover its use in any detail.

Jim

From: Colin Blower mailto:cblo...@barracuda.com>>
Reply-To: mailto:user@cassandra.apache.org>>
Date: Wed, 17 Jul 2013 16:49:59 -0700
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Re: sstable size ?

Take a look at the very recent thread called 'Alternate "major compaction"'. 
There are some ideas in there about splitting up a large SSTable.

http://www.mail-archive.com/user@cassandra.apache.org/msg30956.html


On 07/17/2013 04:17 PM, Langston, Jim wrote:
Hi all,

Is there a way to get an SSTable to a smaller size ? By this I mean that I
currently have an SSTable that is nearly 1.2G, so that subsequent SSTables
when they compact are trying to grow to that size. The result is that when
the min_compaction_threshold reaches its value and a compaction is needed,
the compaction is taking a long time as the file grows (it is currently at 52MB 
and
takes ~22s to compact).

I'm not sure how the SSTable initially grew to its current size of 1.2G, since 
the
servers have been up for a couple of years. I hadn't noticed until I just 
upgraded to 1.2.6,
but now I see it affects everything.


Jim


--
Colin Blower
Software Engineer
Barracuda Networks Inc.
+1 408-342-5576 (o)



Re: sstable size ?

2013-07-18 Thread Nate McCall
https://github.com/pcmanus/cassandra/tree/sstable_split/src/java/org/apache/cassandra/tools

You'll have to clone Sylvain's 'sstable_split' branch and build from there.

(Committer folks: this is helpful. @Sylvain - can you commit a patch
under this ticket (or wherever):
https://issues.apache.org/jira/browse/CASSANDRA-4766 - happy to
review).

On Thu, Jul 18, 2013 at 1:59 PM, Langston, Jim
 wrote:
> I have been looking at the stuff in the zip file, and also the
> sstablesplit command script. This script is looking for a java
> class StandaloneSplitter located in the package org.apache.cassandra.tools.
>
> Where is this package located ? I looked in the lib directory but nothing
> contains
> the class. Is this something I need to get as well ?
>
> Thanks,
>
> Jim
>
> From: "Langston, Jim" 
> Reply-To: 
> Date: Thu, 18 Jul 2013 10:28:39 +
>
> To: "user@cassandra.apache.org" 
> Subject: Re: sstable size ?
>
> I saw that msg in the thread, I pulled the git files and it looks
> like a suite of tools, do I install them on their own ? do I replace the
> current ones ? its production data but I can copy the data to where
> I want and experiment.
>
> Jim
>
> From: aaron morton 
> Reply-To: 
> Date: Thu, 18 Jul 2013 21:41:24 +1200
> To: 
> Subject: Re: sstable size ?
>
> Does this help ?
> http://www.mail-archive.com/user@cassandra.apache.org/msg30973.html
>
> Can you pull the data off the node so you can test it somewhere safe ?
>
> Cheers
>
> -
> Aaron Morton
> Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/07/2013, at 2:20 PM, "Langston, Jim" 
> wrote:
>
> Thanks, this does look like what I'm experiencing. Can someone
> post a walkthrough ? The README and the sstablesplit script
> don't seem to cover its use in any detail.
>
> Jim
>
> From: Colin Blower 
> Reply-To: 
> Date: Wed, 17 Jul 2013 16:49:59 -0700
> To: "user@cassandra.apache.org" 
> Subject: Re: sstable size ?
>
> Take a look at the very recent thread called 'Alternate "major compaction"'.
> There are some ideas in there about splitting up a large SSTable.
>
> http://www.mail-archive.com/user@cassandra.apache.org/msg30956.html
>
>
> On 07/17/2013 04:17 PM, Langston, Jim wrote:
>
> Hi all,
>
> Is there a way to get an SSTable to a smaller size ? By this I mean that I
> currently have an SSTable that is nearly 1.2G, so that subsequent SSTables
> when they compact are trying to grow to that size. The result is that when
> the min_compaction_threshold reaches its value and a compaction is needed,
> the compaction is taking a long time as the file grows (it is currently at
> 52MB and
> takes ~22s to compact).
>
> I'm not sure how the SSTable initially grew to its current size of 1.2G,
> since the
> servers have been up for a couple of years. I hadn't noticed until I just
> upgraded to 1.2.6,
> but now I see it affects everything.
>
>
> Jim
>
>
>
> --
> Colin Blower
> Software Engineer
> Barracuda Networks Inc.
> +1 408-342-5576 (o)
>
>


Re: sstable size ?

2013-07-18 Thread Langston, Jim
Thanks, was heading down that path .. after the build it
creates a 1.1.6 cassandra snapshot, I'm currently on 1.2.6 - will I
be able to use the tool ?

Jim

On 7/18/13 3:45 PM, "Nate McCall"  wrote:

>https://github.com/pcmanus/cassandra/tree/sstable_split/src/java/org/apach
>e/cassandra/tools
>
>You'll have to clone Sylvain's 'sstable_split' branch and build from
>there.
>
>(Committer folks: this is helpful. @Sylvain - can you commit a patch
>under this ticket (or wherever):
>https://issues.apache.org/jira/browse/CASSANDRA-4766 - happy to
>review).
>
>On Thu, Jul 18, 2013 at 1:59 PM, Langston, Jim
> wrote:
>> I have been looking at the stuff in the zip file, and also the
>> sstablesplit command script. This script is looking for a java
>> class StandaloneSplitter located in the package
>>org.apache.cassandra.tools.
>>
>> Where is this package located ? I looked in the lib directory but
>>nothing
>> contains
>> the class. Is this something I need to get as well ?
>>
>> Thanks,
>>
>> Jim
>>
>> From: "Langston, Jim" 
>> Reply-To: 
>> Date: Thu, 18 Jul 2013 10:28:39 +
>>
>> To: "user@cassandra.apache.org" 
>> Subject: Re: sstable size ?
>>
>> I saw that msg in the thread, I pulled the git files and it looks
>> like a suite of tools, do I install them on their own ? do I replace the
>> current ones ? its production data but I can copy the data to where
>> I want and experiment.
>>
>> Jim
>>
>> From: aaron morton 
>> Reply-To: 
>> Date: Thu, 18 Jul 2013 21:41:24 +1200
>> To: 
>> Subject: Re: sstable size ?
>>
>> Does this help ?
>> http://www.mail-archive.com/user@cassandra.apache.org/msg30973.html
>>
>> Can you pull the data off the node so you can test it somewhere safe ?
>>
>> Cheers
>>
>> -
>> Aaron Morton
>> Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 18/07/2013, at 2:20 PM, "Langston, Jim" 
>> wrote:
>>
>> Thanks, this does look like what I'm experiencing. Can someone
>> post a walkthrough ? The README and the sstablesplit script
>> don't seem to cover its use in any detail.
>>
>> Jim
>>
>> From: Colin Blower 
>> Reply-To: 
>> Date: Wed, 17 Jul 2013 16:49:59 -0700
>> To: "user@cassandra.apache.org" 
>> Subject: Re: sstable size ?
>>
>> Take a look at the very recent thread called 'Alternate "major
>>compaction"'.
>> There are some ideas in there about splitting up a large SSTable.
>>
>> http://www.mail-archive.com/user@cassandra.apache.org/msg30956.html
>>
>>
>> On 07/17/2013 04:17 PM, Langston, Jim wrote:
>>
>> Hi all,
>>
>> Is there a way to get an SSTable to a smaller size ? By this I mean
>>that I
>> currently have an SSTable that is nearly 1.2G, so that subsequent
>>SSTables
>> when they compact are trying to grow to that size. The result is that
>>when
>> the min_compaction_threshold reaches its value and a compaction is
>>needed,
>> the compaction is taking a long time as the file grows (it is currently
>>at
>> 52MB and
>> takes ~22s to compact).
>>
>> I'm not sure how the SSTable initially grew to its current size of 1.2G,
>> since the
>> servers have been up for a couple of years. I hadn't noticed until I
>>just
>> upgraded to 1.2.6,
>> but now I see it affects everything.
>>
>>
>> Jim
>>
>>
>>
>> --
>> Colin Blower
>> Software Engineer
>> Barracuda Networks Inc.
>> +1 408-342-5576 (o)
>>
>>
>




Recommended data size for Reads/Writes in Cassandra

2013-07-18 Thread hajjat
Hi,

Is there a recommended data size for Reads/Writes in Cassandra? I tried
inserting 10 MB objects and the latency I got was pretty high. Also, I was
never able to insert larger objects (say 50 MB) since Cassandra kept
crashing when I tried that.

Here is my experiment setup: 
I used two Large VMs in EC2 within the same data-center. Inserts have ALL
consistency (strong consistency).  The latencies were as follows:
Data size:  10 MB    1 MB    100 Bytes
Latency:    250 ms   50 ms   8 ms

I've also done the same for two Large VMs across two data-centers. The
latencies were around:
Data size:  10 MB     1 MB     100 Bytes
Latency:    1200 ms   800 ms   80 ms

1) Ain't the 10 MB latency extremely high? 
2) Is there a recommended data size to use with Cassandra (e.g., a few bytes
up to 1 MB)?
3) Also, I tried inserting 50 MB data but Cassandra kept crashing. Does
anybody know why? I thought the max data size should be up to 2 GB?

Thanks,
Mohammad

PS. Here is my python code I use to insert into Cassandra. I put my
stopwatch timers around the insert statement:
fh = open(TEST_FILE, 'r')
data = str(fh.read())

POOL = ConnectionPool(keyspace, server_list=['localhost:9160'], timeout=None)
USER = ColumnFamily(POOL, 'User')
USER.insert('Ali', {'data': data},
            write_consistency_level=pycassa.cassandra.ttypes.ConsistencyLevel.ALL)




--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Recommended-data-size-for-Reads-Writes-in-Cassandra-tp7589141.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


MailBox Impl

2013-07-18 Thread Kanwar Sangha
Hi  - We are planning on using Cassandra for an IMAP-based implementation. 
There are some questions that we are stuck with -


1)  Each user will have a pre-defined mailbox size (say 10 MB). We need to 
maintain a field to check if the mail-box size exceeds the predefined size. 
Will using the counter family be appropriate ?

2)  Also, we need to have retention for only 10 days. After 10 days, the 
previous days' data will be removed. We plan to have TTL defined per message. 
But if we do that, how does the counter in question 1 get updated with the 
space cleaned due to deletion ?

3)  Do we NOT have TTL and manage the deletions within the application 
itself ?

Thanks,
Kanwar



Re: Recommended data size for Reads/Writes in Cassandra

2013-07-18 Thread Andrey Ilinykh
there is a limit on thrift message size (thrift_max_message_length_in_mb); by
default it is 64 MB if I'm not mistaken. This is your limit.


On Thu, Jul 18, 2013 at 2:03 PM, hajjat  wrote:

> Hi,
>
> Is there a recommended data size for Reads/Writes in Cassandra? I tried
> inserting 10 MB objects and the latency I got was pretty high. Also, I was
> never able to insert larger objects (say 50 MB) since Cassandra kept
> crashing when I tried that.
>
> Here is my experiment setup:
> I used two Large VMs in EC2 within the same data-center. Inserts have ALL
> consistency (strong consistency).  The latencies were as follows:
> Data size:  10 MB    1 MB    100 Bytes
> Latency:    250 ms   50 ms   8 ms
>
> I've also done the same for two Large VMs across two data-centers. The
> latencies were around:
> Data size:  10 MB     1 MB     100 Bytes
> Latency:    1200 ms   800 ms   80 ms
>
> 1) Ain't the 10 MB latency extremely high?
> 2) Is there a recommended data size to use with Cassandra (e.g., a few
> bytes
> up to 1 MB)?
> 3) Also, I tried inserting 50 MB data but Cassandra kept crashing. Does
> anybody know why? I thought the max data size should be up to 2 GB?
>
> Thanks,
> Mohammad
>
> PS. Here is my python code I use to insert into Cassandra. I put my
> stopwatch timers around the insert statement:
> fh = open(TEST_FILE,'r')
> data = str(fh.read())
>
> POOL = ConnectionPool(keyspace, server_list=['localhost:9160'],
> timeout=None)
> USER = ColumnFamily(POOL, 'User')
> USER.insert('Ali', {'data':
>
> data},write_consistency_level=pycassa.cassandra.ttypes.ConsistencyLevel.ALL)
>
>
>
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Recommended-data-size-for-Reads-Writes-in-Cassandra-tp7589141.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>


Re: sstable size ?

2013-07-18 Thread Nate McCall
Without digging I'd say no - the SSTable versions will be pretty different.

You could test this pretty easily in isolation though just on a local
instance - I think the issue will be 1.1.6 reading the 1.2.6 SSTable
as the other way should be backwards compatible. Someone jump in if
I'm wrong?

On Thu, Jul 18, 2013 at 3:46 PM, Langston, Jim
 wrote:
> Thanks, was heading down that path .. after the build it
> creates a 1.1.6 cassandra snapshot, I'm currently on 1.2.6 - will I
> be able to use the tool ?
>
> Jim
>
> On 7/18/13 3:45 PM, "Nate McCall"  wrote:
>
>>https://github.com/pcmanus/cassandra/tree/sstable_split/src/java/org/apach
>>e/cassandra/tools
>>
>>You'll have to clone Sylvain's 'sstable_split' branch and build from
>>there.
>>
>>(Committer folks: this is helpful. @Sylvain - can you commit a patch
>>under this ticket (or wherever):
>>https://issues.apache.org/jira/browse/CASSANDRA-4766 - happy to
>>review).
>>
>>On Thu, Jul 18, 2013 at 1:59 PM, Langston, Jim
>> wrote:
>>> I have been looking at the stuff in the zip file, and also the
>>> sstablesplit command script. This script is looking for a java
>>> class StandaloneSplitter located in the package
>>>org.apache.cassandra.tools.
>>>
>>> Where is this package located ? I looked in the lib directory but
>>>nothing
>>> contains
>>> the class. Is this something I need to get as well ?
>>>
>>> Thanks,
>>>
>>> Jim
>>>
>>> From: "Langston, Jim" 
>>> Reply-To: 
>>> Date: Thu, 18 Jul 2013 10:28:39 +
>>>
>>> To: "user@cassandra.apache.org" 
>>> Subject: Re: sstable size ?
>>>
>>> I saw that msg in the thread, I pulled the git files and it looks
>>> like a suite of tools, do I install them on their own ? do I replace the
>>> current ones ? its production data but I can copy the data to where
>>> I want and experiment.
>>>
>>> Jim
>>>
>>> From: aaron morton 
>>> Reply-To: 
>>> Date: Thu, 18 Jul 2013 21:41:24 +1200
>>> To: 
>>> Subject: Re: sstable size ?
>>>
>>> Does this help ?
>>> http://www.mail-archive.com/user@cassandra.apache.org/msg30973.html
>>>
>>> Can you pull the data off the node so you can test it somewhere safe ?
>>>
>>> Cheers
>>>
>>> -
>>> Aaron Morton
>>> Cassandra Consultant
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 18/07/2013, at 2:20 PM, "Langston, Jim" 
>>> wrote:
>>>
>>> Thanks, this does look like what I'm experiencing. Can someone
>>> post a walkthrough ? The README and the sstablesplit script
>>> don't seem to cover its use in any detail.
>>>
>>> Jim
>>>
>>> From: Colin Blower 
>>> Reply-To: 
>>> Date: Wed, 17 Jul 2013 16:49:59 -0700
>>> To: "user@cassandra.apache.org" 
>>> Subject: Re: sstable size ?
>>>
>>> Take a look at the very recent thread called 'Alternate "major
>>>compaction"'.
>>> There are some ideas in there about splitting up a large SSTable.
>>>
>>> http://www.mail-archive.com/user@cassandra.apache.org/msg30956.html
>>>
>>>
>>> On 07/17/2013 04:17 PM, Langston, Jim wrote:
>>>
>>> Hi all,
>>>
>>> Is there a way to get an SSTable to a smaller size ? By this I mean
>>>that I
>>> currently have an SSTable that is nearly 1.2G, so that subsequent
>>>SSTables
>>> when they compact are trying to grow to that size. The result is that
>>>when
>>> the min_compaction_threshold reaches its value and a compaction is
>>>needed,
>>> the compaction is taking a long time as the file grows (it is currently
>>>at
>>> 52MB and
>>> takes ~22s to compact).
>>>
>>> I'm not sure how the SSTable initially grew to its current size of 1.2G,
>>> since the
>>> servers have been up for a couple of years. I hadn't noticed until I
>>>just
>>> upgraded to 1.2.6,
>>> but now I see it affects everything.
>>>
>>>
>>> Jim
>>>
>>>
>>>
>>> --
>>> Colin Blower
>>> Software Engineer
>>> Barracuda Networks Inc.
>>> +1 408-342-5576 (o)
>>>
>>>
>>
>
>


Re: Recommended data size for Reads/Writes in Cassandra

2013-07-18 Thread Tyler Hobbs
The default limit is 16 MB, but realistically you should try to keep writes
under 10 MB, breaking up large values into multiple columns/rows if
necessary.


On Thu, Jul 18, 2013 at 4:31 PM, Andrey Ilinykh  wrote:

> there is a limit of thrift message ( thrift_max_message_length_in_mb), by
> default it is 64m if I'm not mistaken. This is your limit.
>
>
> On Thu, Jul 18, 2013 at 2:03 PM, hajjat  wrote:
>
>> Hi,
>>
>> Is there a recommended data size for Reads/Writes in Cassandra? I tried
>> inserting 10 MB objects and the latency I got was pretty high. Also, I was
>> never able to insert larger objects (say 50 MB) since Cassandra kept
>> crashing when I tried that.
>>
>> Here is my experiment setup:
>> I used two Large VMs in EC2 within the same data-center. Inserts have ALL
>> consistency (strong consistency).  The latencies were as follows:
>> Data size:  10 MB    1 MB    100 Bytes
>> Latency:    250 ms   50 ms   8 ms
>>
>> I've also done the same for two Large VMs across two data-centers. The
>> latencies were around:
>> Data size:  10 MB     1 MB     100 Bytes
>> Latency:    1200 ms   800 ms   80 ms
>>
>> 1) Ain't the 10 MB latency extremely high?
>> 2) Is there a recommended data size to use with Cassandra (e.g., a few
>> bytes
>> up to 1 MB)?
>> 3) Also, I tried inserting 50 MB data but Cassandra kept crashing. Does
>> anybody know why? I thought the max data size should be up to 2 GB?
>>
>> Thanks,
>> Mohammad
>>
>> PS. Here is my python code I use to insert into Cassandra. I put my
>> stopwatch timers around the insert statement:
>> fh = open(TEST_FILE,'r')
>> data = str(fh.read())
>>
>> POOL = ConnectionPool(keyspace, server_list=['localhost:9160'],
>> timeout=None)
>> USER = ColumnFamily(POOL, 'User')
>> USER.insert('Ali', {'data':
>>
>> data},write_consistency_level=pycassa.cassandra.ttypes.ConsistencyLevel.ALL)
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Recommended-data-size-for-Reads-Writes-in-Cassandra-tp7589141.html
>> Sent from the cassandra-u...@incubator.apache.org mailing list archive
>> at Nabble.com.
>>
>
>


-- 
Tyler Hobbs
DataStax 
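
To make the chunking concrete, here is a rough sketch of the usual pattern 
(Hector/Java for concreteness - the original post uses pycassa, where the 
same shape is a dict of chunk columns in one insert). The "Blobs" column 
family, the keyspace handle and the chunk-name format are illustrative 
assumptions, not anything established in this thread:

    import java.util.Arrays;
    import me.prettyprint.cassandra.serializers.BytesArraySerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    // Split one large value into ~1 MB columns under a single row, so no
    // single column carries the whole blob.
    void writeChunked(Keyspace keyspace, String rowKey, byte[] blob) {
        Mutator<String> m = HFactory.createMutator(keyspace, StringSerializer.get());
        int chunkSize = 1024 * 1024;   // well under the thrift frame limit
        for (int off = 0, n = 0; off < blob.length; off += chunkSize, n++) {
            byte[] chunk = Arrays.copyOfRange(blob, off,
                    Math.min(off + chunkSize, blob.length));
            // zero-padded names keep the chunks in comparator order
            m.addInsertion(rowKey, "Blobs",
                    HFactory.createColumn(String.format("data-%05d", n), chunk,
                            StringSerializer.get(), BytesArraySerializer.get()));
        }
        m.execute();   // one batch here; split the batch too if it gets large
    }

Reading it back is then a column slice over the row, concatenating the 
chunks in name order.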


Re: MailBox Impl

2013-07-18 Thread sankalp kohli
The counter won't be updated when the old data is TTLed. I am not sure whether
you can use triggers: https://issues.apache.org/jira/browse/CASSANDRA-1311


On Thu, Jul 18, 2013 at 2:30 PM, Kanwar Sangha  wrote:

>  Hi  - We are planning on using Cassandra for an IMAP-based
> implementation. There are some questions that we are stuck with –
>
> 1)  Each user will have a pre-defined mailbox size (say 10 MB).
> We need to maintain a field to check if the mail-box size exceeds the
> predefined size. Will using the counter family be appropriate ?
>
> 2)  Also, we need to have retention for only 10 days. After 10
> days, the previous days' data will be removed. We plan to have TTL defined
> per message. But if we do that, how does the counter in question 1 get
> updated with the space cleaned due to deletion ?
>
> 3)  Do we NOT have TTL and manage the deletions within the
> application itself ?
>
> Thanks,
>
> Kanwar
>
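
For questions 1 and 2 together, a rough Hector sketch of the bookkeeping 
this implies (the "Messages" and "MailboxSize" column families and all ids 
are hypothetical; the point is only that the counter moves on application 
actions, never on TTL expiry, so it drifts high unless deletes go through 
the application as in question 3):

    import me.prettyprint.cassandra.serializers.BytesArraySerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    void storeMessage(Keyspace keyspace, String userId, String messageId, byte[] body) {
        Mutator<String> m = HFactory.createMutator(keyspace, StringSerializer.get());
        int tenDays = 10 * 24 * 60 * 60;   // TTL in seconds

        // the message column expires on its own after 10 days...
        m.addInsertion(userId, "Messages",
                HFactory.createColumn(messageId, body, tenDays,
                        StringSerializer.get(), BytesArraySerializer.get()));
        m.execute();

        // ...but the size counter does not follow the TTL, so count the bytes
        // in here and count them out only on an explicit delete:
        m.incrementCounter(userId, "MailboxSize", "bytesUsed", body.length);
        // on an application-driven delete:
        //   m.addDeletion(userId, "Messages", messageId, StringSerializer.get());
        //   m.execute();
        //   m.incrementCounter(userId, "MailboxSize", "bytesUsed", -body.length);
    }

If the quota check must stay exact, skipping TTL and doing both halves in 
the application (question 3) keeps the counter honest, at the cost of a 
periodic delete sweep.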


Re: Recommended data size for Reads/Writes in Cassandra

2013-07-18 Thread Mohammad Hajjat
Thanks Andrey and Tyler! That was useful :)

Do you guys have any idea why the 10 MB writes took a lot of time in my
case although I'm using Large VMs which have plenty of resources? Or do you
think this latency is expected?
I'm trying to see how much time is spent in the network versus processing
CPU cycles of the nodes; any suggestion for a good profiling tool?



On Thu, Jul 18, 2013 at 5:50 PM, Tyler Hobbs  wrote:

> The default limit is 16mb, but realistically you should try to keep writes
> under 10mb, breaking up large values into multiple columns/rows if
> necessary.
>
>
> On Thu, Jul 18, 2013 at 4:31 PM, Andrey Ilinykh wrote:
>
>> there is a limit of thrift message ( thrift_max_message_length_in_mb), by
>> default it is 64m if I'm not mistaken. This is your limit.
>>
>>
>> On Thu, Jul 18, 2013 at 2:03 PM, hajjat  wrote:
>>
>>> Hi,
>>>
>>> Is there a recommended data size for Reads/Writes in Cassandra? I tried
>>> inserting 10 MB objects and the latency I got was pretty high. Also, I
>>> was
>>> never able to insert larger objects (say 50 MB) since Cassandra kept
>>> crashing when I tried that.
>>>
>>> Here is my experiment setup:
>>> I used two Large VMs in EC2 within the same data-center. Inserts have ALL
>>> consistency (strong consistency).  The latencies were as follows:
>>> Data size:  10 MB    1 MB    100 Bytes
>>> Latency:    250 ms   50 ms   8 ms
>>>
>>> I've also done the same for two Large VMs across two data-centers. The
>>> latencies were around:
>>> Data size:  10 MB     1 MB     100 Bytes
>>> Latency:    1200 ms   800 ms   80 ms
>>>
>>> 1) Ain't the 10 MB latency extremely high?
>>> 2) Is there a recommended data size to use with Cassandra (e.g., a few
>>> bytes
>>> up to 1 MB)?
>>> 3) Also, I tried inserting 50 MB data but Cassandra kept crashing. Does
>>> anybody know why? I thought the max data size should be up to 2 GB?
>>>
>>> Thanks,
>>> Mohammad
>>>
>>> PS. Here is my python code I use to insert into Cassandra. I put my
>>> stopwatch timers around the insert statement:
>>> fh = open(TEST_FILE,'r')
>>> data = str(fh.read())
>>>
>>> POOL = ConnectionPool(keyspace, server_list=['localhost:9160'],
>>> timeout=None)
>>> USER = ColumnFamily(POOL, 'User')
>>> USER.insert('Ali', {'data':
>>>
>>> data},write_consistency_level=pycassa.cassandra.ttypes.ConsistencyLevel.ALL)
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Recommended-data-size-for-Reads-Writes-in-Cassandra-tp7589141.html
>>> Sent from the cassandra-u...@incubator.apache.org mailing list archive
>>> at Nabble.com.
>>>
>>
>>
>
>
> --
> Tyler Hobbs
> DataStax 
>



-- 
*Mohammad Hajjat*
*Ph.D. Student*
*Electrical and Computer Engineering*
*Purdue University*


Re: Recommended data size for Reads/Writes in Cassandra

2013-07-18 Thread Tyler Hobbs
Large writes can sometimes put a lot of heap/GC pressure on the node, which
can be an additional source of latency.  Use the query tracing in Cassandra
1.2+ to get a better picture of where the latency is.


On Thu, Jul 18, 2013 at 6:18 PM, Mohammad Hajjat  wrote:

> Thanks Andrey and Tyler! That was useful :)
>
> Do you guys have any idea why the 10 MB writes took a lot of time in my
> case although I'm using Large VMs which have plenty of resources? Or do you
> think this latency is expected?
> I'm trying to see how much time is spent in the network versus processing
> CPU cycles of the nodes; any suggestion for a good profiling tool?
>
>
>
> On Thu, Jul 18, 2013 at 5:50 PM, Tyler Hobbs  wrote:
>
>> The default limit is 16mb, but realistically you should try to keep
>> writes under 10mb, breaking up large values into multiple columns/rows if
>> necessary.
>>
>>
>> On Thu, Jul 18, 2013 at 4:31 PM, Andrey Ilinykh wrote:
>>
>>> there is a limit of thrift message ( thrift_max_message_length_in_mb),
>>> by default it is 64m if I'm not mistaken. This is your limit.
>>>
>>>
>>> On Thu, Jul 18, 2013 at 2:03 PM, hajjat  wrote:
>>>
 Hi,

 Is there a recommended data size for Reads/Writes in Cassandra? I tried
 inserting 10 MB objects and the latency I got was pretty high. Also, I
 was
 never able to insert larger objects (say 50 MB) since Cassandra kept
 crashing when I tried that.

 Here is my experiment setup:
 I used two Large VMs in EC2 within the same data-center. Inserts have
 ALL
 consistency (strong consistency).  The latencies were as follows:
 Data size:  10 MB    1 MB    100 Bytes
 Latency:    250 ms   50 ms   8 ms

 I've also done the same for two Large VMs across two data-centers. The
 latencies were around:
 Data size:  10 MB     1 MB     100 Bytes
 Latency:    1200 ms   800 ms   80 ms

 1) Ain't the 10 MB latency extremely high?
 2) Is there a recommended data size to use with Cassandra (e.g., a few
 bytes
 up to 1 MB)?
 3) Also, I tried inserting 50 MB data but Cassandra kept crashing. Does
 anybody know why? I thought the max data size should be up to 2 GB?

 Thanks,
 Mohammad

 PS. Here is my python code I use to insert into Cassandra. I put my
 stopwatch timers around the insert statement:
 fh = open(TEST_FILE,'r')
 data = str(fh.read())

 POOL = ConnectionPool(keyspace, server_list=['localhost:9160'],
 timeout=None)
 USER = ColumnFamily(POOL, 'User')
 USER.insert('Ali', {'data':

 data},write_consistency_level=pycassa.cassandra.ttypes.ConsistencyLevel.ALL)




 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Recommended-data-size-for-Reads-Writes-in-Cassandra-tp7589141.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive
 at Nabble.com.

>>>
>>>
>>
>>
>> --
>> Tyler Hobbs
>> DataStax 
>>
>
>
>
> --
> *Mohammad Hajjat*
> *Ph.D. Student*
> *Electrical and Computer Engineering*
> *Purdue University*
>



-- 
Tyler Hobbs
DataStax 


Re: Interesting issue with getting Order By to work...

2013-07-18 Thread Tony Anecito
Hi Rob,
 
Thanks for the feedback. I had heard about this in regard to CQL-created tables 
not being visible to the CLI, but have not seen any examples of setting up a CQL 
"table" to be visible to the CLI.
 
Best Regards,
-Tony
 
 


 From: Robert Coli 
To: "user@cassandra.apache.org" ; Tony Anecito 
 
Sent: Thursday, July 18, 2013 10:16 AM
Subject: Re: Intresting issue with getting Order By to work...
  


On Thu, Jul 18, 2013 at 8:12 AM, Tony Anecito  wrote: 

As I work more with CQL and CLI, as in some other postings I have seen regarding 
usage, I am thinking that CLI is best for keyspace and Column Family setup and 
maintenance,
> while CQL is best for queries/inserts etc. Mainly I am thinking this because 
>of better control over the schema using CLI. 

The question is not really CQL vs CLI, it's COMPACT STORAGE vs. (NON-COMPACT) 
CQL storage. Picking one or the other strongly informs whether you want to also 
use Thrift or CQL (respectively) as an interface. 

=Rob

CL1 and CLQ with a 5-node cluster and 3 alive nodes

2013-07-18 Thread cbert...@libero.it
Hi all,
I'm experiencing some problems after 3 years of cassandra in production (from 
0.6 to 1.0.6) -- twice in 3 weeks, 2 nodes crashed with an OutOfMemory 
exception.
In the log I can read the warning about low available heap ... now I'm 
increasing my RAM a little, my Java heap (1/4 of the RAM), and reducing the 
row sizes and memtable thresholds. Other tips?

Now a question -- why, with 2 nodes offline, did all my applications stop 
providing the service, even when a Consistency Level One read is invoked?
I'd expected this behaviour:

CL1 operations keep working
more than 80% of CLQ operations working (the offline nodes were 2 and 5; in a 
clockwise key distribution only writes to the fifth node should also impact 
node 2)
most CLALL operations (which I don't use) failing

Instead, ALL services stopped responding, throwing a TTransportException ...

Thanks in advance

Carlo