Re: Reduce Cassandra GC

2013-04-17 Thread Joel Samuelsson
You're right, it's probably hard. I should have provided more data.

I'm running Ubuntu 10.04 LTS with JNA installed. I believe this line in the
log indicates that JNA is working, please correct me if I'm wrong:
CLibrary.java (line 111) JNA mlockall successful

Total amount of RAM is 4GB.

My description of data size was very bad. Sorry about that. Data set size
is 12.3 GB per node, compressed.

Heap size is 998.44MB according to nodetool info.
Key cache is 49MB according to nodetool info.
Row cache size is 0 bytes according to nodetool info.
Max new heap is 205MB according to Memory Pool "Par Eden Space" max
in jconsole.
Memtable is left at default which should give it 333MB according to
documentation (uncertain where I can verify this).
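
For what it's worth, the usual place to check is cassandra.yaml (option name
as of 1.x; leaving it blank means one third of the heap, which matches the
333MB figure above):

# cassandra.yaml -- left blank/commented out, memtables get 1/3 of the heap
# memtable_total_space_in_mb: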

Our production cluster seems similar to your dev cluster so possibly
increasing the heap to 2GB might help our issues.

I am still interested in getting rough estimates of how much heap will be
needed as data grows. Other than empirical studies how would I go about
getting such estimates?


2013/4/16 Viktor Jevdokimov 

>  How one could provide any help without any knowledge about your cluster,
> node and environment settings?
>
> 40GB was calculated from 2 nodes with RF=2 (each has 100% data range),
> 2.4-2.5M rows * 6 cols * 3kB as a minimum without compression and any
> overhead (sstable, bloom filters and indexes).
>
> With ParNew GC time such as yours even if it is a swapping issue I could
> say only that heap size is too small.
>
> Check Heap, New Heap sizes, memtable and cache sizes. Are you on Linux? Is
> JNA installed and used? What is total amount of RAM?
>
> Just for a DEV environment we use 3 virtual machines with 4GB RAM and use
> 2GB heap without any GC issue with amount of data from 0 to 16GB compressed
> on each node. Memtable space sized to 100MB, New Heap 400MB.
> Best regards / Pagarbiai
> *Viktor Jevdokimov*
> Senior Developer
>
>   *From:* Joel Samuelsson [mailto:samuelsson.j...@gmail.com]
> *Sent:* Tuesday, April 16, 2013 12:52
> *To:* user@cassandra.apache.org
> *Subject:* Re: Reduce Cassandra GC
>
>
> How do you calculate the heap / data size ratio? Is this a linear ratio?
>
> Each node has slightly more than 12 GB right now though.
>
> 2013/4/16 Viktor Jevdokimov 
>
> For a >40GB of data 1GB of heap is too low.
>
>
> Best regards / Pagarbiai
>
> *Viktor Jevdokimov*
>
> Senior Developer
>
> *From:* Joel Samuelsson [mailto:samuelsson.j...@gmail.com]
> *Sent:* Tuesday, April 16, 2013 10:47
> *To:* user@cassandra.apache.org
> *Subject:* Reduce Cassandra GC
>
> Hi,
>
> We have a small production cluster with two nodes. The load on the nodes
> is very small, around 20 reads / sec and about the same for writes. There
> are around 2.5 million keys in the cluster and a RF of 2.

Key-Token mapping in cassandra

2013-04-17 Thread Ravikumar Govindarajan
We would like to map multiple keys to a single token in cassandra. I
believe this should be possible now with CASSANDRA-1034

Ex:

Key1 --> 123/IMAGE
Key2 --> 123/DOCUMENTS
Key3 --> 123/MULTIMEDIA

I would like all keys with "123" as prefix to be mapped to a single token.

Is this possible? What should be the Partitioner that I should most likely
extend and write my own to achieve the desired result?

--
Ravi


InvalidRequestException: Start key's token sorts after end token

2013-04-17 Thread Andre Tavares
Hi,

I am getting an exception when I run Hadoop with Cassandra that follows:

WARN org.apache.hadoop.mapred.Child (main): Error running child
java.lang.RuntimeException: InvalidRequestException(why:Start key's token
sorts after end token)
at
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.maybeInit(ColumnFamilyRecordReader.java:453)

I don't know what exactly this message means and how to solve the problem
... I am using Priam to manage my Cassandra cluster over Elastic
Map/Reduce on Amazon ...

Any hint helps ...

Thanks,

Andre


Re: InvalidRequestException: Start key's token sorts after end token

2013-04-17 Thread Hiller, Dean
I literally just replied to your stackoverflow comment then saw this email.  I 
need the whole stack trace.  My guess is the ColFamily is configured for one 
sort method where map/reduce is using another or something when querying but 
that's just a guess.

Dean

From: Andre Tavares <andre...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, April 17, 2013 6:47 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: InvalidRequestException: Start key's token sorts after end token

know what exactly this message means a


Re: InvalidRequestException: Start key's token sorts after end token

2013-04-17 Thread Andre Tavares
Dean,

sorry,  but I saw your comments on Stackoverflow (
http://stackoverflow.com/questions/16041727/operationtimeoutexception-cassandra-cluster-aws-emr)
just after I sent this message ...

and I think you may be right about the sort method, but Priam sets the
Cassandra partitioner to "RandomPartitioner", and maybe the correct one
would be "Murmur3Partitioner" when we use Hadoop (I am not sure either) ... if
that is true I have a problem, because I can't change the partitioner with
Priam (I think it only works with RandomPartitioner) ...

Andre

2013/4/17 Hiller, Dean 

> I literally just replied to your stackoverflow comment then saw this email.
>  I need the whole stack trace.  My guess is the ColFamily is configured for
> one sort method where map/reduce is using another or something when
> querying but that's just a guess.
>
> Dean
>
> From: Andre Tavares <andre...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Wednesday, April 17, 2013 6:47 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: InvalidRequestException: Start key's token sorts after end token
>
> know what exactly this message means a
>


Getting error while inserting data in cassandra table using Java with JDBC

2013-04-17 Thread himanshu.joshi

Hi,


When I am trying to insert the data into a table using Java with JDBC, I 
am getting the error


InvalidRequestException(why:cannot parse 'Jo' as hex bytes)

My insert query is:
insert into temp(id,name,value,url_id) VALUES(108, 'Aa','Jo',10);

This insert query is running successfully from the CQLSH command prompt but 
not from the code


The query I have used to create the table in CQLSH is:

CREATE TABLE temp (
  id bigint PRIMARY KEY,
  dt_stamp timestamp,
  name text,
  url_id bigint,
  value text
) WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};



I guess the problem may be because of undefined 
key_validation_class, default_validation_class, comparator etc.

Is there any way to define these attributes using CQLSH ?
I have already tried the ASSUME command but it has not resolved the 
problem.


I am a beginner in cassandra and need your guidance.

--
Thanks & Regards,
Himanshu Joshi



Re: InvalidRequestException: Start key's token sorts after end token

2013-04-17 Thread Hiller, Dean
What's the stack trace you see?  At the time, I was thinking column scan, not 
row scan, as perhaps your code or Priam's code was doing a column slice within a 
row set and the columns are sorted by Integer while Priam is passing in UTF8 or 
vice-versa.  I.e., do we know if this is a column sorting issue or a row one?

Dean

From: Andre Tavares <andre...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, April 17, 2013 7:09 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: InvalidRequestException: Start key's token sorts after end token

Dean,

sorry,  but I saw your comments on Stackoverflow 
(http://stackoverflow.com/questions/16041727/operationtimeoutexception-cassandra-cluster-aws-emr
 ) just after I sent this message ...

and I think you may be right about the sort method, but Priam sets the Cassandra 
partitioner to "RandomPartitioner", and maybe the correct one would be 
"Murmur3Partitioner" when we use Hadoop (I am not sure either) ... if that is true 
I have a problem, because I can't change the partitioner with Priam (I think it 
only works with RandomPartitioner) ...

Andre

2013/4/17 Hiller, Dean <dean.hil...@nrel.gov>
I literally just replied to your stackoverflow comment then saw this email.  I 
need the whole stack trace.  My guess is the ColFamily is configured for one 
sort method where map/reduce is using another or something when querying but 
that's just a guess.

Dean

From: Andre Tavares <andre...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, April 17, 2013 6:47 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: InvalidRequestException: Start key's token sorts after end token

know what exactly this message means a



looking at making astyanax asynchronous but cassandra-thrift-1.1.1 doesn't look right

2013-04-17 Thread Hiller, Dean
Is cassandra-thrift-1.1.1.jar the generated code?  I see a send() and recv() 
but I don't see a send(Callback cb) that is typical of true asynchronous 
platforms.  I.e. I don't know when to call recv myself obviously if I am trying 
to make astyanax truly asynchronous.

The reason I ask is we have a 100k row upload that with synchronous 20 threads 
takes around 30 seconds and with simulation, we predict this would be done in 3 
seconds with an asynch api as our threads would not get held up like they do 
now.  I guess we can try to crank it up to 100 threads to get it running a bit 
faster for now :( :(.

Thanks,
Dean


Re: How to stop Cassandra and then restart it in windows?

2013-04-17 Thread Raihan Jamal
Hello,

Can anyone provide any help on this?

Thanks in advance.






*Raihan Jamal*


On Tue, Apr 16, 2013 at 6:50 PM, Raihan Jamal  wrote:

> Hello,
>
> I installed single node cluster in my local dev box which is running
> Windows 7 and it was working fine. Due to some reason, I need to restart my
> desktop and then after that whenever I am doing like this on the command
> prompt, it always gives me the below exception-
>
> S:\Apache Cassandra\apache-cassandra-1.2.3\bin>cassandra -f
> Starting Cassandra Server
> Error: Exception thrown by the agent : java.rmi.server.ExportException:
> Port already in use: 7199; nested exception is:
> java.net.BindException: Address already in use: JVM_Bind
>
>
> Meaning port being used somewhere. I have made some changes in *cassandra.yaml
> *file so I need to shutdown the Cassandra server and then restart it
> again.
>
> Can anybody help me with this?
>
> Thanks for the help.
>
>
>


Re: Added extra column as composite key while creation counter column family

2013-04-17 Thread Robert Coli
On Tue, Apr 16, 2013 at 10:29 PM, Kuldeep Mishra
wrote:

> cassandra 1.2.0
>
> Is it a bug in  1.2.0 ?
>

While I can't speak to this specific issue, 1.2.0 has meaningful known
issues. I suggest upgrading to 1.2.3(/4) ASAP.

=Rob


Re: Thrift message length exceeded

2013-04-17 Thread Lanny Ripple
That was our first thought.  Using maven's dependency tree info we verified
that we're using the expected (cass 1.2.3) jars

$ mvn dependency:tree | grep thrift
[INFO] |  +- org.apache.thrift:libthrift:jar:0.7.0:compile
[INFO] |  \- org.apache.cassandra:cassandra-thrift:jar:1.2.3:compile

I've also dumped the final command run by the hadoop we use (CDH3u5) and
verified it's not sneaking thrift in on us.


On Tue, Apr 16, 2013 at 4:36 PM, aaron morton wrote:

> Can you confirm that you are using the same thrift version that ships with
> 1.2.3 ?
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 16/04/2013, at 10:17 AM, Lanny Ripple  wrote:
>
> A bump to say I found this
>
>
> http://stackoverflow.com/questions/15487540/pig-cassandra-message-length-exceeded
>
> so others are seeing similar behavior.
>
> From what I can see of org.apache.cassandra.hadoop nothing has changed
> since 1.1.5 when we didn't see such things but sure looks like there's a
> bug that's slipped in (or been uncovered) somewhere.  I'll try to narrow
> down to a dataset and code that can reproduce.
>
> On Apr 10, 2013, at 6:29 PM, Lanny Ripple  wrote:
>
> We are using Astyanax in production but I cut back to just Hadoop and
> Cassandra to confirm it's a Cassandra (or our use of Cassandra) problem.
>
> We do have some extremely large rows but we went from everything working
> with 1.1.5 to almost everything carping with 1.2.3.  Something has changed.
>  Perhaps we were doing something wrong earlier that 1.2.3 exposed but
> surprises are never welcome in production.
>
> On Apr 10, 2013, at 8:10 AM,  wrote:
>
> I also saw this when upgrading from C* 1.0 to 1.2.2, and from hector 0.6
> to 0.8
> Turns out the Thrift message really was too long.
> The mystery to me: Why no complaints in previous versions? Were some
> checks added in Thrift or Hector?
>
> -Original Message-
> From: Lanny Ripple [mailto:la...@spotright.com]
> Sent: Tuesday, April 09, 2013 6:17 PM
> To: user@cassandra.apache.org
> Subject: Thrift message length exceeded
>
> Hello,
>
> We have recently upgraded to Cass 1.2.3 from Cass 1.1.5.  We ran
> sstableupgrades and got the ring on its feet and we are now seeing a new
> issue.
>
> When we run MapReduce jobs against practically any table we find the
> following errors:
>
> 2013-04-09 09:58:47,746 INFO org.apache.hadoop.util.NativeCodeLoader:
> Loaded the native-hadoop library
> 2013-04-09 09:58:47,899 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=MAP, sessionId=
> 2013-04-09 09:58:48,021 INFO org.apache.hadoop.util.ProcessTree: setsid
> exited with exit code 0
> 2013-04-09 09:58:48,024 INFO org.apache.hadoop.mapred.Task:  Using
> ResourceCalculatorPlugin :
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a48edb5
> 2013-04-09 09:58:50,475 INFO org.apache.hadoop.mapred.TaskLogsTruncater:
> Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2013-04-09 09:58:50,477 WARN org.apache.hadoop.mapred.Child: Error running
> child
> java.lang.RuntimeException: org.apache.thrift.TException: Message length
> exceeded: 106
> at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
> at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
> at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
> at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
> at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:444)
> at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:460)
> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
> at org.apache.hadoop.mapred.Child.main(Child.java:260)
> Caused by: org.apache.thrift.TException: Message length exceeded: 106
> at
> org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
> at org.apache.cassandra.thrift.Column.read(Column.java:528)
> at
> org.apache.cassa

Re: MySQL Cluster performing faster than Cassandra cluster on single table

2013-04-17 Thread aaron morton
How many threads / processes do you have performing the writes? 
How big are the mutations ? 
Where are you measuring the latency ? 

Look at the nodetool cfhistograms to see the time it takes for a single node to 
perform a write. 
Look at the nodetool proxyhistograms to see the end to end request latency from 
the coordinator. 
^ the number on the left is microseconds for both. 
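
For example (nodetool syntax for 1.x; substitute your host, keyspace and column 
family):

nodetool -h 127.0.0.1 cfhistograms MyKeyspace MyColumnFamily
nodetool -h 127.0.0.1 proxyhistograms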

Generally cassandra does well with more clients. 

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/04/2013, at 2:56 PM, Jabbar Azam  wrote:

> MySQL cluster also has the index in ram.  So with lots of rows the ram 
> becomes a limiting factor.
> 
> That's what my colleague found and hence why we're sticking with Cassandra.
> 
> On 16 Apr 2013 21:05, "horschi"  wrote:
> 
> 
> Ah, I see, that makes sense. Have you got a source for the storing of 
> hundreds of gigabytes? And does Cassandra not store anything in memory?
> It stores bloom filters and index-samples in memory. But they are much 
> smaller than the actual data and they can be configured.
>  
> 
> Yeah, my dataset is small at the moment - perhaps I should have chosen 
> something larger for the work I'm doing (University dissertation), however, 
> it is far too late to change now!
> On paper mysql-cluster looks great. But in daily use its not as nice as 
> Cassandra (where you have machines dying, networks splitting, etc.).
> 
> cheers,
> Christian



Re: differences between DataStax Community Edition and Cassandra package

2013-04-17 Thread aaron morton
It's the same as the Apache version, but DSC comes with samples and the free 
version of Ops Centre. 

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/04/2013, at 6:36 PM, Francisco Trujillo  wrote:

> Hi everyone
>  
> Probably this question has been formulated for someone in the past. We are 
> using apache Cassandra 1.6 now and we are planning to update the version. 
> Datastax provides their own Cassandra package called “Datastax Community 
> Edition”. I know that the Datastax package have some tools to manage the 
> cluster like visual interfaces, but
>  
> is there some important difference in the database itself if we compare it with 
> the same apache Cassandra that we can download from 
> http://cassandra.apache.org/?
>  
> Thanks for your help in advance



Re: Cassandra Client Recommendation

2013-04-17 Thread aaron morton
One node on the native binary protocol, AFAIK it's still considered beta in 1.2

Also +1 for Astyanax
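
For anyone evaluating it, here is a minimal sketch of an Astyanax CQL read. The 
method names are as I recall them from the Astyanax wiki and the table/columns 
are illustrative, so treat the details as assumptions:

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.OperationResult;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.CqlResult;
import com.netflix.astyanax.model.Row;
import com.netflix.astyanax.serializers.LongSerializer;
import com.netflix.astyanax.serializers.StringSerializer;

public class CqlReadSketch {
    // Hypothetical column family; the serializers must match your schema.
    static final ColumnFamily<Long, String> CF_TEMP =
        new ColumnFamily<Long, String>("temp", LongSerializer.get(), StringSerializer.get());

    static void readByCql(Keyspace keyspace) throws Exception {
        // prepareQuery(...).withCql(...) is the CQL path discussed below.
        OperationResult<CqlResult<Long, String>> result = keyspace
            .prepareQuery(CF_TEMP)
            .withCql("SELECT id, name FROM temp WHERE id = 108")
            .execute();
        for (Row<Long, String> row : result.getResult().getRows()) {
            System.out.println(row.getColumns().getStringValue("name", ""));
        }
    }
}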

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/04/2013, at 6:50 PM, Francisco Trujillo  wrote:

> Hi
>  
> We are using Cassandra 1.6 at this moment. We start to work with Hector, 
> because it is the first recommendation that you can find in a simple google 
> search for java clients Cassandra.
>  
> We started using Hector, but when we started to have non-dynamic column 
> families that can be managed using CQL, we started to use Astyanax because:
> -  It is easy to understand the code even for people who have never 
> worked with Cassandra.
> -  The cql implementation offers more capabilities
> -  Astyanax is prepared to use Cql 3, and with hector we experienced 
> some problems (probably our fault, but with Astyanax everything works from 
> the beginning).
> -  Astyanax allows the use of compound primary keys.
>  
> In next months we are going to substitute Hector by Astyanax totally but at 
> this moment we are using both:
>  
> -  Astyanax for cql.
> -  Hector for dynamic column families.
>  
>  
> From: Techy Teck [mailto:comptechge...@gmail.com] 
> Sent: woensdag 17 april 2013 8:14
> To: user
> Subject: Re: Cassandra Client Recommendation
>  
> Thanks Everton for the suggestion. Couple of questions-
>  
> 1) Does Astyanax client have any problem with previous version of Cassandra?
> 2) You said one problem, that it will consume more memory? Can you elaborate 
> that slightly? What do you mean by that?
> 3) Does Astyanax supports asynch capabilities?
>  
> 
> On Tue, Apr 16, 2013 at 11:05 PM, Everton Lima  
> wrote:
> Hi Techy,
> 
> We are using Astyanax with cassandra 1.2.4. 
> 
> benefits:
>  * It is so easy to configure and use.
>  * Good wiki
>  * Maintained by Netflix
>  * Solution to manage the store of big files (more than 15mb)
>  * Solution to read all rows efficiently
> 
> problems:
>  * It consumes more memory
>  
> 
> 2013/4/16 Techy Teck 
> Hello,
> 
> I have recently started working with Cassandra Database. Now I am in the 
> process of evaluating which Cassandra client I should go forward with.
> I am mainly interested in these three-
> 
> --1)  Astyanax client
> 
> 2--)  New Datastax client that uses Binary protocol.
> 
> --3)  Pelops client
> 
>  
> 
> Can anyone provide some thoughts on this? Some advantages and disadvantages 
> for these three will be great start for me.
> 
>  
> 
> Keeping in mind, we are running Cassandra 1.2.2 in production environment.
>  
> 
> Thanks for the help.
> 
> 
> 
> -- 
> Everton Lima Aleixo
> Bacharel em Ciência da Computação pela UFG
> Mestrando em Ciência da Computação pela UFG
> Programador no LUPA
>  



Re: Reduce Cassandra GC

2013-04-17 Thread aaron morton
> INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line 122) 
> GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is 1046937600
This does not say that the heap is full. 
ParNew is GC activity for the new heap, which is typically a smaller part of 
the overall heap. 

It sounds like you are running with defaults for the memory config, which is 
generally a good idea. But 4GB total memory for a node is on the small size.

Try some changes, edit the cassandra-env.sh file and change

MAX_HEAP_SIZE="2G"
HEAP_NEWSIZE="400M"

You may also want to try:

MAX_HEAP_SIZE="2G"
HEAP_NEWSIZE="800M"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4" 
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=2"

The size of the new heap generally depends on the number of cores available, 
see the comments in the cassandra-env.sh file. 

An older discussion about memory use; note that in 1.2 the bloom filters (and 
compression data) are off heap now.
http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html  

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/04/2013, at 11:06 PM, Joel Samuelsson  wrote:

> You're right, it's probably hard. I should have provided more data.
> 
> I'm running Ubuntu 10.04 LTS with JNA installed. I believe this line in the 
> log indicates that JNA is working, please correct me if I'm wrong:
> CLibrary.java (line 111) JNA mlockall successful
> 
> Total amount of RAM is 4GB.
> 
> My description of data size was very bad. Sorry about that. Data set size is 
> 12.3 GB per node, compressed.
> 
> Heap size is 998.44MB according to nodetool info. 
> Key cache is 49MB according to nodetool info.
> Row cache size is 0 bytes according to nodetool info. 
> Max new heap is 205MB according to Memory Pool "Par Eden Space" max in 
> jconsole.
> Memtable is left at default which should give it 333MB according to 
> documentation (uncertain where I can verify this).
> 
> Our production cluster seems similar to your dev cluster so possibly 
> increasing the heap to 2GB might help our issues.
> 
> I am still interested in getting rough estimates of how much heap will be 
> needed as data grows. Other than empirical studies how would I go about 
> getting such estimates?
> 
> 
> 2013/4/16 Viktor Jevdokimov 
> How one could provide any help without any knowledge about your cluster, node 
> and environment settings?
> 
>  
> 
> 40GB was calculated from 2 nodes with RF=2 (each has 100% data range), 
> 2.4-2.5M rows * 6 cols * 3kB as a minimum without compression and any 
> overhead (sstable, bloom filters and indexes).
> 
>  
> 
> With ParNew GC time such as yours even if it is a swapping issue I could say 
> only that heap size is too small.
> 
>  
> 
> Check Heap, New Heap sizes, memtable and cache sizes. Are you on Linux? Is 
> JNA installed and used? What is total amount of RAM?
> 
>  
> 
> Just for a DEV environment we use 3 virtual machines with 4GB RAM and use 2GB 
> heap without any GC issue with amount of data from 0 to 16GB compressed on 
> each node. Memtable space sized to 100MB, New Heap 400MB.
> 
>  
> 
> Best regards / Pagarbiai
> Viktor Jevdokimov
> Senior Developer
> 
> 
> From: Joel Samuelsson [mailto:samuelsson.j...@gmail.com] 
> Sent: Tuesday, April 16, 2013 12:52
> To: user@cassandra.apache.org
> Subject: Re: Reduce Cassandra GC
> 
>  
> 
> How do you calculate the heap / data size ratio? Is this a linear ratio?
> 
>  
> 
> Each node has slightly more than 12 GB right now though.
> 
>  
> 
> 2013/4/16 Viktor Jevdokimov 
> 
> For a >40GB of data 1GB of heap is too low.
> 
>  
> 
> Best regards / Pagarbiai
> 
> Viktor Jevdokimov
> 
> Senior Developer
> 

Re: Key-Token mapping in cassandra

2013-04-17 Thread aaron morton
> CASSANDRA-1034
That ticket is about removing an assumption which was not correct. 

> I would like all keys with "123" as prefix to be mapped to a single token.
Why? 
it's not possible nor desirable IMHO. Tokens are used to identify a single row 
internally. 
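
A quick sketch of why, assuming RandomPartitioner: the token is abs(MD5(key)), 
computed over the whole key, so keys sharing the "123" prefix still hash to 
unrelated tokens (key strings below are from Ravi's example):

import java.math.BigInteger;
import java.security.MessageDigest;

public class TokenDemo {
    public static void main(String[] args) throws Exception {
        // Mirrors RandomPartitioner's MD5-based token derivation.
        for (String key : new String[] {"123/IMAGE", "123/DOCUMENTS", "123/MULTIMEDIA"}) {
            byte[] digest = MessageDigest.getInstance("MD5").digest(key.getBytes("UTF-8"));
            System.out.println(key + " -> " + new BigInteger(digest).abs());
        }
    }
}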
 
Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/04/2013, at 11:25 PM, Ravikumar Govindarajan 
 wrote:

> We would like to map multiple keys to a single token in cassandra. I believe 
> this should be possible now with CASSANDRA-1034
> 
> Ex:
> 
> Key1 --> 123/IMAGE
> Key2 --> 123/DOCUMENTS
> Key3 --> 123/MULTIMEDIA
> 
> I would like all keys with "123" as prefix to be mapped to a single token.
> 
> Is this possible? What should be the Partitioner that I should most likely 
> extend and write my own to achieve the desired result?
> 
> --
> Ravi



Re: Getting error while inserting data in cassandra table using Java with JDBC

2013-04-17 Thread aaron morton
What version are you using ?
And what JDBC driver ? 

Sounds like the driver is not converting the value to bytes for you. 
 
> I guess the problem may because of undefined 
> key_validation_class,default_validation_class and comparator etc.
If you are using CQL these are not relevant. 
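
A parameterized insert usually sidesteps this, since the driver binds 'Jo' as 
text instead of the server parsing it as hex. A minimal sketch, assuming the 
cassandra-jdbc driver (the driver class name and URL format are from that 
project and may differ for your driver):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class InsertSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
        Connection con = DriverManager.getConnection(
            "jdbc:cassandra://localhost:9160/MyKeyspace");
        PreparedStatement ps = con.prepareStatement(
            "INSERT INTO temp (id, name, value, url_id) VALUES (?, ?, ?, ?)");
        ps.setLong(1, 108L);
        ps.setString(2, "Aa");
        ps.setString(3, "Jo");   // bound as text rather than parsed as hex bytes
        ps.setLong(4, 10L);
        ps.execute();
        con.close();
    }
}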

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/04/2013, at 1:31 AM, himanshu.joshi  wrote:

> Hi,
> 
> 
> When I am trying to insert the data into a table using Java with JDBC, I am 
> getting the error
> 
> InvalidRequestException(why:cannot parse 'Jo' as hex bytes)
> 
> My insert query is:
> insert into temp(id,name,value,url_id) VALUES(108, 'Aa','Jo',10);
> 
> This insert query is running successfully from the CQLSH command prompt but not 
> from the code
> 
> The query I have used to create the table in CQLSH is:
> 
> CREATE TABLE temp (
>  id bigint PRIMARY KEY,
>  dt_stamp timestamp,
>  name text,
>  url_id bigint,
>  value text
> ) WITH
>  bloom_filter_fp_chance=0.01 AND
>  caching='KEYS_ONLY' AND
>  comment='' AND
>  dclocal_read_repair_chance=0.00 AND
>  gc_grace_seconds=864000 AND
>  read_repair_chance=0.10 AND
>  replicate_on_write='true' AND
>  populate_io_cache_on_flush='false' AND
>  compaction={'class': 'SizeTieredCompactionStrategy'} AND
>  compression={'sstable_compression': 'SnappyCompressor'};
> 
> 
> 
> I guess the problem may be because of undefined 
> key_validation_class, default_validation_class, comparator etc.
> Is there any way to define these attributes using CQLSH ?
> I have already tried the ASSUME command but it has not resolved the problem.
> 
> I am a beginner in cassandra and need your guidance.
> 
> -- 
> Thanks & Regards,
> Himanshu Joshi
> 



Multi datacenter setup question

2013-04-17 Thread More, Sandeep R
Hello,
My test setup consist of two datacenters DC1 and DC2.
DC2 has an offset of 10 as you can see in the following ring command.

I have two questions:

1)  Let's say I insert a key at DC2 and its token is, let's 
say 85070591730234615865843651857942052874; in this case will it be owned by 
DC2 and then replicated on DC1? i.e. who owns it?

2)  Notice that the Owns distribution is not even, is this something I 
should be worrying about ?

I am using Cassandra 1.0.12.

Following is the ring command output:

Address DC  RackStatus State   LoadOwns
Token


85070591730234615865843651857942052874
10.0.0.1   DC1 RAC-1   Up Normal  101.73 KB   50.00%  0
10.0.0.2   DC2 RAC-1   Up Normal  92.55 KB0.00%   10
10.0.0.3   DC1 RAC-1   Up Normal  115.09 KB   50.00%  
85070591730234615865843651857942052864
10.0.0.4   DC2 RAC-1   Up Normal  101.62 KB   0.00%   
85070591730234615865843651857942052874




Using an EC2 cluster from the outside.

2013-04-17 Thread maillists0
I have a working 3 node cluster in a single ec2 region and I need to hit it
from our datacenter. As you'd expect, the client gets the internal
addresses of the nodes back.

Someone on irc mentioned using the public IP for rpc and binding that
address to the box. I see that mentioned in an old list mail but I don't
get exactly how this is supposed to work. I could really use either a link
to something with explicit directions or a detailed explanation.

Should cassandra use the public IPs for everything -- listen, b'cast, and
rpc? What should cassandra.yaml look like? Is the idea to use the public
addresses for cassandra but route the requests between nodes over the lan
using nat?

Any help or suggestion is appreciated.


Re: Cassandra Client Recommendation

2013-04-17 Thread Techy Teck
Thanks Aaron for the suggestion. I am not sure I was able to understand the
"one node" thing you mentioned on the native binary protocol. Can
you please elaborate on that?



On Wed, Apr 17, 2013 at 11:21 AM, aaron morton wrote:

> One node on the native binary protocol, AFAIK it's still considered beta
> in 1.2
>
> Also +1 for Astyanax
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 17/04/2013, at 6:50 PM, Francisco Trujillo 
> wrote:
>
> > Hi
> >
> > We are using Cassandra 1.6 at this moment. We start to work with Hector,
> because it is the first recommendation that you can find in a simple google
> search for java clients Cassandra.
> >
> > We started using Hector, but when we started to have non-dynamic column
> > families that can be managed using CQL, we started to use Astyanax because:
> > -  It is easy to understand the code even for people who have
> > never worked with Cassandra.
> > -  The cql implementation offers more capabilities
> > -  Astyanax is prepared to use Cql 3, and with hector we
> > experienced some problems (probably our fault, but with Astyanax everything
> > works from the beginning).
> > -  Astyanax allows the use of compound primary keys.
> >
> > In next months we are going to substitute Hector by Astyanax totally but
> at this moment we are using both:
> >
> > -  Astyanax for cql.
> > -  Hector for dynamic column families.
> >
> >
> > From: Techy Teck [mailto:comptechge...@gmail.com]
> > Sent: woensdag 17 april 2013 8:14
> > To: user
> > Subject: Re: Cassandra Client Recommendation
> >
> > Thanks Everton for the suggestion. Couple of questions-
> >
> > 1) Does Astyanax client have any problem with previous version of
> Cassandra?
> > 2) You said one problem, that it will consume more memory? Can you
> elaborate that slightly? What do you mean by that?
> > 3) Does Astyanax supports asynch capabilities?
> >
> >
> > On Tue, Apr 16, 2013 at 11:05 PM, Everton Lima 
> wrote:
> > Hi Techy,
> >
> > We are using Astyanax with cassandra 1.2.4.
> >
> > benefits:
> >  * It is so easy to configure and use.
> >  * Good wiki
> >  * Maintained by Netflix
> >  * Solution to manage the store of big files (more than 15mb)
> >  * Solution to read all rows efficiently
> >
> > problems:
> >  * It consumes more memory
> >
> >
> > 2013/4/16 Techy Teck 
> > Hello,
> >
> > I have recently started working with Cassandra Database. Now I am in the
> process of evaluating which Cassandra client I should go forward with.
> > I am mainly interested in these three-
> >
> > --1)  Astyanax client
> >
> > 2--)  New Datastax client that uses Binary protocol.
> >
> > --3)  Pelops client
> >
> >
> >
> > Can anyone provide some thoughts on this? Some advantages and
> disadvantages for these three will be great start for me.
> >
> >
> >
> > Keeping in mind, we are running Cassandra 1.2.2 in production
> environment.
> >
> >
> > Thanks for the help.
> >
> >
> >
> > --
> > Everton Lima Aleixo
> > Bacharel em Ciência da Computação pela UFG
> > Mestrando em Ciência da Computação pela UFG
> > Programador no LUPA
> >
>
>


Re: InvalidRequestException: Start key's token sorts after end token

2013-04-17 Thread aaron morton
Is your Hadoop task supplying both a start and finish key for the slice ? You 
probably only want the start. 

Provide the full call stack and the code in your hadoop task. 

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/04/2013, at 1:34 AM, "Hiller, Dean"  wrote:

> What's the stack trace you see?  At the time, I was thinking column scan, not 
> row scan, as perhaps your code or Priam's code was doing a column slice within 
> a row set and the columns are sorted by Integer while Priam is passing in 
> UTF8 or vice-versa.  I.e., do we know if this is a column sorting issue or a 
> row one?
> 
> Dean
> 
> From: Andre Tavares <andre...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Wednesday, April 17, 2013 7:09 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: InvalidRequestException: Start key's token sorts after end token
> 
> Dean,
> 
> sorry,  but I saw your comments on Stackoverflow 
> (http://stackoverflow.com/questions/16041727/operationtimeoutexception-cassandra-cluster-aws-emr
>  ) just after I sent this message ...
> 
> and I think you may be right about the sort method, but Priam sets the 
> Cassandra partitioner to "RandomPartitioner", and maybe the correct one would 
> be "Murmur3Partitioner" when we use Hadoop (I am not sure either) ... if that is 
> true I have a problem, because I can't change the partitioner with Priam (I 
> think it only works with RandomPartitioner) ...
> 
> Andre
> 
> 2013/4/17 Hiller, Dean <dean.hil...@nrel.gov>
> I literally just replied to your stackoverflow comment then saw this email.  I 
> need the whole stack trace.  My guess is the ColFamily is configured for one 
> sort method where map/reduce is using another or something when querying but 
> that's just a guess.
> 
> Dean
> 
> From: Andre Tavares <andre...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Wednesday, April 17, 2013 6:47 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: InvalidRequestException: Start key's token sorts after end token
> 
> know what exactly this message means a
> 



Re: looking at making astyanax asynchronous but cassandra-thrift-1.1.1 doesn't look right

2013-04-17 Thread aaron morton
Here's an example I did in python a long time ago 
http://www.mail-archive.com/user@cassandra.apache.org/msg04775.html

Call send() then select on the file handle, when it's ready to read call 
recv(). 

Or just add more threads on your side :)
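
Roughly, in Java against the thrift-generated client (the send_/recv_ pairs Dean 
mentions; a sketch only, with the mutation building stubbed out as a hypothetical 
helper):

import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.Mutation;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class SendRecvSketch {
    public static void main(String[] args) throws Exception {
        TSocket socket = new TSocket("localhost", 9160);
        TFramedTransport transport = new TFramedTransport(socket);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("MyKeyspace");

        // send_* writes the request frame and returns immediately...
        client.send_batch_mutate(buildMutations(), ConsistencyLevel.ONE);

        // ...so you can do other work, or select() on socket.getSocket()
        // until it is readable, before collecting the reply:
        client.recv_batch_mutate();

        transport.close();
    }

    // Hypothetical helper; fill in real row keys / column families.
    static Map<ByteBuffer, Map<String, List<Mutation>>> buildMutations() {
        return new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
    }
}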

Cheers


-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/04/2013, at 2:50 AM, "Hiller, Dean"  wrote:

> Is cassandra-thrift-1.1.1.jar the generated code?  I see a send() and recv() 
> but I don't see a send(Callback cb) that is typical of true asynchronous 
> platforms.  I.e. I don't know when to call recv myself obviously if I am 
> trying to make astyanax truly asynchronous.
> 
> The reason I ask is we have a 100k row upload that with synchronous 20 
> threads takes around 30 seconds and with simulation, we predict this would be 
> done in 3 seconds with an asynch api as our threads would not get held up 
> like they do now.  I guess we can try to crank it up to 100 threads to get it 
> running a bit faster for now :( :(.
> 
> Thanks,
> Dean



Re: differences between DataStax Community Edition and Cassandra package

2013-04-17 Thread Robert Coli
On Wed, Apr 17, 2013 at 11:19 AM, aaron morton wrote:

> It's the same as the Apache version, but DSC comes with samples and the
> free version of Ops Centre.
>
>
DSE also comes with Solr special sauce and CDFS.

=Rob


Re: How to stop Cassandra and then restart it in windows?

2013-04-17 Thread aaron morton
> Error: Exception thrown by the agent : java.rmi.server.ExportException: Port 
> already in use: 7199; nested exception is:
> java.net.BindException: Address already in use: JVM_Bind
The process is already running, is it installed as a service and was it 
automatically started when the system started ?

either shut it down using the service management or find the process (however 
you do that in windows) and kill it. 
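
On Windows that can look like this from a command prompt (standard 
netstat/taskkill usage; the PID comes from the first command's output):

netstat -ano | findstr :7199
taskkill /PID <pid> /F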

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/04/2013, at 4:26 AM, Raihan Jamal  wrote:

> Hello,
> 
> Can anyone provide any help on this?
> 
> Thanks in advance.
> 
> 
> 
> 
> 
> 
> Raihan Jamal
> 
> 
> On Tue, Apr 16, 2013 at 6:50 PM, Raihan Jamal  wrote:
> Hello,
> 
> I installed single node cluster in my local dev box which is running Windows 
> 7 and it was working fine. Due to some reason, I need to restart my desktop 
> and then after that whenever I am doing like this on the command prompt, it 
> always gives me the below exception-
> 
> S:\Apache Cassandra\apache-cassandra-1.2.3\bin>cassandra -f
> Starting Cassandra Server
> Error: Exception thrown by the agent : java.rmi.server.ExportException: Port 
> already in use: 7199; nested exception is:
> java.net.BindException: Address already in use: JVM_Bind
> 
> 
> Meaning port being used somewhere. I have made some changes in cassandra.yaml 
> file so I need to shutdown the Cassandra server and then restart it again.
> 
> Can anybody help me with this?
> 
> Thanks for the help.
> 
> 
> 



Re: Using an EC2 cluster from the outside.

2013-04-17 Thread Robert Coli
On Wed, Apr 17, 2013 at 12:07 PM,  wrote:

> I have a working 3 node cluster in a single ec2 region and I need to hit
> it from our datacenter. As you'd expect, the client gets the internal
> addresses of the nodes back.
>
> Someone on irc mentioned using the public IP for rpc and binding that
> address to the box. I see that mentioned in an old list mail but I don't
> get exactly how this is supposed to work. I could really use either a link
> to something with explicit directions or a detailed explanation.
>
> Should cassandra use the public IPs for everything -- listen, b'cast, and
> rpc? What should cassandra.yaml look like? Is the idea to use the public
> addresses for cassandra but route the requests between nodes over the lan
> using nat?
>
> Any help or suggestion is appreciated.
>

Google "EC2MultiRegionSnitch".

=Rob


Re: Thrift message length exceeded

2013-04-17 Thread aaron morton
Can you reproduce this in a simple way ? 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/04/2013, at 5:50 AM, Lanny Ripple  wrote:

> That was our first thought.  Using maven's dependency tree info we verified 
> that we're using the expected (cass 1.2.3) jars
> 
> $ mvn dependency:tree | grep thrift
> [INFO] |  +- org.apache.thrift:libthrift:jar:0.7.0:compile
> [INFO] |  \- org.apache.cassandra:cassandra-thrift:jar:1.2.3:compile
> 
> I've also dumped the final command run by the hadoop we use (CDH3u5) and 
> verified it's not sneaking thrift in on us.
> 
> 
> On Tue, Apr 16, 2013 at 4:36 PM, aaron morton  wrote:
> Can you confirm that you are using the same thrift version that ships with 1.2.3 ? 
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 16/04/2013, at 10:17 AM, Lanny Ripple  wrote:
> 
>> A bump to say I found this
>> 
>>  
>> http://stackoverflow.com/questions/15487540/pig-cassandra-message-length-exceeded
>> 
>> so others are seeing similar behavior.
>> 
>> From what I can see of org.apache.cassandra.hadoop nothing has changed since 
>> 1.1.5 when we didn't see such things but sure looks like there's a bug 
>> that's slipped in (or been uncovered) somewhere.  I'll try to narrow down to 
>> a dataset and code that can reproduce.
>> 
>> On Apr 10, 2013, at 6:29 PM, Lanny Ripple  wrote:
>> 
>>> We are using Astyanax in production but I cut back to just Hadoop and 
>>> Cassandra to confirm it's a Cassandra (or our use of Cassandra) problem.
>>> 
>>> We do have some extremely large rows but we went from everything working 
>>> with 1.1.5 to almost everything carping with 1.2.3.  Something has changed. 
>>>  Perhaps we were doing something wrong earlier that 1.2.3 exposed but 
>>> surprises are never welcome in production.
>>> 
>>> On Apr 10, 2013, at 8:10 AM,  wrote:
>>> 
 I also saw this when upgrading from C* 1.0 to 1.2.2, and from hector 0.6 
 to 0.8
 Turns out the Thrift message really was too long.
 The mystery to me: Why no complaints in previous versions? Were some 
 checks added in Thrift or Hector?
 
 -Original Message-
 From: Lanny Ripple [mailto:la...@spotright.com] 
 Sent: Tuesday, April 09, 2013 6:17 PM
 To: user@cassandra.apache.org
 Subject: Thrift message length exceeded
 
 Hello,
 
 We have recently upgraded to Cass 1.2.3 from Cass 1.1.5.  We ran 
 sstableupgrades and got the ring on its feet and we are now seeing a new 
 issue.
 
 When we run MapReduce jobs against practically any table we find the 
 following errors:
 
 2013-04-09 09:58:47,746 INFO org.apache.hadoop.util.NativeCodeLoader: 
 Loaded the native-hadoop library
 2013-04-09 09:58:47,899 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
 Initializing JVM Metrics with processName=MAP, sessionId=
 2013-04-09 09:58:48,021 INFO org.apache.hadoop.util.ProcessTree: setsid 
 exited with exit code 0
 2013-04-09 09:58:48,024 INFO org.apache.hadoop.mapred.Task:  Using 
 ResourceCalculatorPlugin : 
 org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a48edb5
 2013-04-09 09:58:50,475 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
 Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
 2013-04-09 09:58:50,477 WARN org.apache.hadoop.mapred.Child: Error running 
 child
 java.lang.RuntimeException: org.apache.thrift.TException: Message length 
 exceeded: 106
at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
at 
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at 
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:444)
at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:460)
at 
 org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native M

Re: Multi datacenter setup question

2013-04-17 Thread aaron morton
> 1)  Let’s say in this case I insert a key at DC2 and its token is, let’s 
> say 85070591730234615865843651857942052874, in this case will it be owned by 
> DC2 ? and then replicated on DC1 ? i.e. who owns it.
We don't think in terms of owning the token. 
The token range in the local DC that contains the token is used to find the 
first replica for the row. The same process is used to find the replicas in the 
remote DC's. 

> 2)  Notice that the Owns distribution is not even, is this something I 
> should be worrying about ?
No. I think that's changed in the newer versions. 
 
> I am using Cassandra 1.0.12.

Please use version 1.1 or 1.2. 

Cheers


-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/04/2013, at 7:03 AM, "More, Sandeep R"  wrote:

> Hello,
> My test setup consist of two datacenters DC1 and DC2.
> DC2 has an offset of 10 as you can see in the following ring command.
>  
> I have two questions:
> 1)  Let’s say in this case I insert a key at DC2 and its token is, let’s 
> say 85070591730234615865843651857942052874, in this case will it be owned by 
> DC2 ? and then replicated on DC1 ? i.e. who owns it.
> 2)  Notice that the Owns distribution is not even, is this something I 
> should be worrying about ?
>  
> I am using Cassandra 1.0.12.
>  
> Following is the ring command output:
>  
> Address DC  RackStatus State   LoadOwns   
>  Token
>   
>   
> 85070591730234615865843651857942052874
> 10.0.0.1   DC1 RAC-1   Up Normal  101.73 KB   50.00%  0
> 10.0.0.2   DC2 RAC-1   Up Normal  92.55 KB0.00%   10
> 10.0.0.3   DC1 RAC-1   Up Normal  115.09 KB   50.00%  
> 85070591730234615865843651857942052864
> 10.0.0.4   DC2 RAC-1   Up Normal  101.62 KB   0.00%   
> 85070591730234615865843651857942052874
>  
>  



Re: Cassandra Client Recommendation

2013-04-17 Thread aaron morton
Was a typo, should have been "One note on"

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/04/2013, at 7:23 AM, Techy Teck  wrote:

> Thanks Aaron for the suggestion. I am not sure I was able to understand the 
> "one node" thing you mentioned on the native binary protocol. Can you 
> please elaborate on that?
> 
> 
> 
> On Wed, Apr 17, 2013 at 11:21 AM, aaron morton  
> wrote:
> One node on the native binary protocol, AFAIK it's still considered beta in 
> 1.2
> 
> Also +1 for Astyanax
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 17/04/2013, at 6:50 PM, Francisco Trujillo  
> wrote:
> 
> > Hi
> >
> > We are using Cassandra 1.6 at this moment. We start to work with Hector, 
> > because it is the first recommendation that you can find in a simple google 
> > search for java clients Cassandra.
> >
> > We started using Hector, but when we started to have non-dynamic column 
> > families that can be managed using CQL, we started to use Astyanax because:
> > -  It is easy to understand the code even for people who have never 
> > worked with Cassandra.
> > -  The cql implementation offers more capabilities
> > -  Astyanax is prepared to use Cql 3, and with hector we experienced 
> > some problems (probably our fault, but with Astyanax everything works from 
> > the beginning).
> > -  Astyanax allows the use of compound primary keys.
> >
> > In next months we are going to substitute Hector by Astyanax totally but at 
> > this moment we are using both:
> >
> > -  Astyanax for cql.
> > -  Hector for dynamic column families.
> >
> >
> > From: Techy Teck [mailto:comptechge...@gmail.com]
> > Sent: woensdag 17 april 2013 8:14
> > To: user
> > Subject: Re: Cassandra Client Recommendation
> >
> > Thanks Everton for the suggestion. Couple of questions-
> >
> > 1) Does Astyanax client have any problem with previous version of Cassandra?
> > 2) You said one problem, that it will consume more memory? Can you 
> > elaborate that slightly? What do you mean by that?
> > 3) Does Astyanax supports asynch capabilities?
> >
> >
> > On Tue, Apr 16, 2013 at 11:05 PM, Everton Lima  
> > wrote:
> > Hi Techy,
> >
> > We are using Astyanax with cassandra 1.2.4.
> >
> > benefits:
> >  * It is so easy to configure and use.
> >  * Good wiki
> >  * Maintained by Netflix
> >  * Solution to manage the store of big files (more than 15mb)
> >  * Solution to read all rows efficiently
> >
> > problems:
> >  * It consumes more memory
> >
> >
> > 2013/4/16 Techy Teck 
> > Hello,
> >
> > I have recently started working with Cassandra Database. Now I am in the 
> > process of evaluating which Cassandra client I should go forward with.
> > I am mainly interested in these three-
> >
> > --1)  Astyanax client
> >
> > 2--)  New Datastax client that uses Binary protocol.
> >
> > --3)  Pelops client
> >
> >
> >
> > Can anyone provide some thoughts on this? Some advantages and disadvantages 
> > for these three will be great start for me.
> >
> >
> >
> > Keeping in mind, we are running Cassandra 1.2.2 in production environment.
> >
> >
> > Thanks for the help.
> >
> >
> >
> > --
> > Everton Lima Aleixo
> > Bacharel em Ciência da Computação pela UFG
> > Mestrando em Ciência da Computação pela UFG
> > Programador no LUPA
> >
> 
> 



How to make compaction run faster?

2013-04-17 Thread Jay Svc
Hi Team,



I have high write traffic to my Cassandra cluster. I experience a very
high number of pending compactions. As I expect higher writes, the pending
compactions keep increasing. Even when I stop my writes it takes several
hours to finish the pending compactions.


My CF is configured with LCS, with sstable_size_mb=20M. My CPU is below
20%, JVM memory usage is between 45%-55%. I am using Cassandra 1.1.9.


How can I increase the compaction rate so it will run a bit faster to match
my write speed?


Your inputs are appreciated.


Thanks,

Jay


Re: How to make compaction run faster?

2013-04-17 Thread Edward Capriolo
three things:
1) compaction throughput is fairly low (yaml nodetool)
2) concurrent compactions is fairly low (yaml)
3) multithreaded compaction might be off in your version

Try raising these things. Otherwise consider option 4.

4) $$$ (RAID, RAM)

On Wed, Apr 17, 2013 at 4:01 PM, Jay Svc  wrote:

> Hi Team,
>
>
>
> I have high write traffic to my Cassandra cluster. I experience a very
> high number of pending compactions. As I expect higher writes, the pending
> compactions keep increasing. Even when I stop my writes it takes several
> hours to finish the pending compactions.
>
>
> My CF is configured with LCS, with sstable_size_mb=20M. My CPU is below
> 20%, JVM memory usage is between 45%-55%. I am using Cassandra 1.1.9.
>
>
> How can I increase the compaction rate so it will run a bit faster to match
> my write speed?
>
>
> Your inputs are appreciated.
>
>
> Thanks,
>
> Jay
>
>


Re: How to make compaction run faster?

2013-04-17 Thread Alexis Rodríguez
:D

Jay, check if your disk(s) utilization allows you to change the
configuration the way Edward suggests. iostat -xkcd 1 will show you how much
of your disk(s) are in use.
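
For reference, the knobs Edward lists map to these settings (names as of 1.1; 
the values here are illustrative, and the throughput one can also be changed 
live):

# cassandra.yaml
compaction_throughput_mb_per_sec: 64   # default 16; 0 disables throttling
concurrent_compactors: 4               # defaults to the number of cores
multithreaded_compaction: true         # off by default

nodetool -h 127.0.0.1 setcompactionthroughput 64   # same knob, no restart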




On Wed, Apr 17, 2013 at 5:26 PM, Edward Capriolo wrote:

> three things:
> 1) compaction throughput is fairly low (yaml nodetool)
> 2) concurrent compactions is fairly low (yaml)
> 3) multithreaded compaction might be off in your version
>
> Try raising these things. Otherwise consider option 4.
>
> 4)$$$ RAID,RAM
>
> On Wed, Apr 17, 2013 at 4:01 PM, Jay Svc  wrote:
>
>> Hi Team,
>>
>>
>>
>> I have high write traffic to my Cassandra cluster. I experience a very
>> high number of pending compactions. As I expect higher writes, the pending
>> compactions keep increasing. Even when I stop my writes it takes several
>> hours to finish the pending compactions.
>>
>>
>> My CF is configured with LCS, with sstable_size_mb=20M. My CPU is below
>> 20%, JVM memory usage is between 45%-55%. I am using Cassandra 1.1.9.
>>
>>
>> How can I increase the compaction rate so it will run a bit faster to match
>> my write speed?
>>
>>
>> Your inputs are appreciated.
>>
>>
>> Thanks,
>>
>> Jay
>>
>>
>


Re: Thrift message length exceeded

2013-04-17 Thread Lanny Ripple
It's slow going finding the time to do so but I'm working on that.

We do have another table that has one or sometimes two columns per row.  We can 
run jobs on it without issue.  I looked through org.apache.cassandra.hadoop 
code and don't see anything that's really changed since 1.1.5 (which was also 
using thrift-0.7) so something of a puzzler about what's going on.
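
One thing worth ruling out while narrowing it down: the message size limits in 
cassandra.yaml (option names as of 1.2; the defaults quoted here are from 
memory, so verify against your file):

# cassandra.yaml
thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16    # must be larger than the frame size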


On Apr 17, 2013, at 2:47 PM, aaron morton  wrote:

> Can you reproduce this in a simple way ? 
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 18/04/2013, at 5:50 AM, Lanny Ripple  wrote:
> 
>> That was our first thought.  Using maven's dependency tree info we verified 
>> that we're using the expected (cass 1.2.3) jars
>> 
>> $ mvn dependency:tree | grep thrift
>> [INFO] |  +- org.apache.thrift:libthrift:jar:0.7.0:compile
>> [INFO] |  \- org.apache.cassandra:cassandra-thrift:jar:1.2.3:compile
>> 
>> I've also dumped the final command run by the hadoop we use (CDH3u5) and 
>> verified it's not sneaking thrift in on us.
>> 
>> 
>> On Tue, Apr 16, 2013 at 4:36 PM, aaron morton wrote:
>> Can you confirm the you are using the same thrift version that ships 1.2.3 ? 
>> 
>> Cheers
>> 
>> -
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 16/04/2013, at 10:17 AM, Lanny Ripple  wrote:
>> 
>>> A bump to say I found this
>>> 
>>>  
>>> http://stackoverflow.com/questions/15487540/pig-cassandra-message-length-exceeded
>>> 
>>> so others are seeing similar behavior.
>>> 
>>> From what I can see of org.apache.cassandra.hadoop, nothing has changed 
>>> since 1.1.5, when we didn't see such things, but it sure looks like a bug 
>>> has slipped in (or been uncovered) somewhere.  I'll try to narrow things 
>>> down to a dataset and code that can reproduce it.
>>> 
>>> On Apr 10, 2013, at 6:29 PM, Lanny Ripple  wrote:
>>> 
 We are using Astyanax in production but I cut back to just Hadoop and 
 Cassandra to confirm it's a Cassandra (or our use of Cassandra) problem.
 
 We do have some extremely large rows but we went from everything working 
 with 1.1.5 to almost everything carping with 1.2.3.  Something has 
 changed.  Perhaps we were doing something wrong earlier that 1.2.3 exposed 
 but surprises are never welcome in production.
 
 On Apr 10, 2013, at 8:10 AM,  wrote:
 
> I also saw this when upgrading from C* 1.0 to 1.2.2, and from Hector 0.6
> to 0.8.
> Turns out the Thrift message really was too long.
> The mystery to me: Why no complaints in previous versions? Were some 
> checks added in Thrift or Hector?
> 
> -Original Message-
> From: Lanny Ripple [mailto:la...@spotright.com] 
> Sent: Tuesday, April 09, 2013 6:17 PM
> To: user@cassandra.apache.org
> Subject: Thrift message length exceeded
> 
> Hello,
> 
> We have recently upgraded to Cass 1.2.3 from Cass 1.1.5.  We ran 
> sstableupgrades and got the ring on its feet and we are now seeing a new 
> issue.
> 
> When we run MapReduce jobs against practically any table we find the 
> following errors:
> 
> 2013-04-09 09:58:47,746 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
> 2013-04-09 09:58:47,899 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
> 2013-04-09 09:58:48,021 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
> 2013-04-09 09:58:48,024 INFO org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a48edb5
> 2013-04-09 09:58:50,475 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2013-04-09 09:58:50,477 WARN org.apache.hadoop.mapred.Child: Error running child
> java.lang.RuntimeException: org.apache.thrift.TException: Message length exceeded: 106
>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
>   at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>   at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>   at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
>   at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgre

Re: How to make compaction run faster?

2013-04-17 Thread Jay Svc
Hi Edward,

Thank you for the response. I have tried the following:

1. I have tried various compaction throughput settings ranging from 16M to
1G. CPU continued to be low and memory between 40% and 50%. I still see
compactions backing up.
2. Does concurrent_compactors take effect with leveled compaction? (The yaml
says this parameter has no effect with LCS.)
3. I have tried multithreaded compaction as well; I see no change in how
compactions are handled.
4. I have a 24-core CPU, RAID 10 data disks, the commitlog on SSD, and 48GB
of RAM with an 8GB heap.

Thanks,
Jay


On Wed, Apr 17, 2013 at 3:26 PM, Edward Capriolo wrote:

> three things:
> 1) compaction throughput is fairly low (yaml nodetool)
> 2) concurrent compactions is fairly low (yaml)
> 3) multithreaded compaction might be off in your version
>
> Try raising these things. Otherwise consider option 4.
>
> 4) $$$ RAID, RAM
>
> On Wed, Apr 17, 2013 at 4:01 PM, Jay Svc  wrote:
>
>> Hi Team,
>>
>>
>>
>> I have high write traffic to my Cassandra cluster. I experience a very
>> high number of pending compactions. As I expect higher writes, the pending
>> compactions keep increasing. Even when I stop my writes it takes several
>> hours to finish pending compactions.
>>
>>
>> My CF is configured with LCS, with sstable_size_mb=20M. My CPU is below
>> 20%, JVM memory usage is between 45%-55%. I am using Cassandra 1.1.9.
>>
>>
>> How can I increase the compaction rate so it will run a bit faster to match
>> my write speed?
>>
>>
>> Your inputs are appreciated.
>>
>>
>> Thanks,
>>
>> Jay
>>
>>
>


Re: How to make compaction run faster?

2013-04-17 Thread Jay Svc
Hi Alexis,

Thank you for your response.

My commit log is on SSD, which shows 30 to 40 ms of disk latency.

When I ran iostat, I saw an "await" of 26 to 30 ms for my commit log disk. My
CPU is less than 18% used.

How do I reduce the disk latency for my commit log disk, given that they are
SSDs?

Thank you in advance,
Jay


On Wed, Apr 17, 2013 at 3:58 PM, Alexis Rodríguez <
arodrig...@inconcertcc.com> wrote:

> :D
>
> Jay, check whether your disk utilization leaves room to change the
> configuration the way Edward suggests. iostat -xkcd 1 will show you how
> heavily your disks are being used.
>
>
>
>
> On Wed, Apr 17, 2013 at 5:26 PM, Edward Capriolo wrote:
>
>> three things:
>> 1) compaction throughput is fairly low (yaml nodetool)
>> 2) concurrent compactions is fairly low (yaml)
>> 3) multithreaded compaction might be off in your version
>>
>> Try raising these things. Otherwise consider option 4.
>>
>> 4) $$$ RAID, RAM
>>
>> On Wed, Apr 17, 2013 at 4:01 PM, Jay Svc  wrote:
>>
>>> Hi Team,
>>>
>>>
>>>
>>> I have high write traffic to my Cassandra cluster. I experience a very
>>> high number of pending compactions. As I expect higher writes, the
>>> pending compactions keep increasing. Even when I stop my writes it takes
>>> several hours to finish pending compactions.
>>>
>>>
>>> My CF is configured with LCS, with sstable_size_mb=20M. My CPU is below
>>> 20%, JVM memory usage is between 45%-55%. I am using Cassandra 1.1.9.
>>>
>>>
>>> How can I increase the compaction rate so it will run a bit faster to
>>> match my write speed?
>>>
>>>
>>> Your inputs are appreciated.
>>>
>>>
>>> Thanks,
>>>
>>> Jay
>>>
>>>
>>
>


Re: How to make compaction run faster?

2013-04-17 Thread Alexis Rodríguez
Jay,

I believe that compaction occurs on the data directories and not in the
commitlog.

http://wiki.apache.org/cassandra/MemtableSSTable
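
The two locations are configured independently in cassandra.yaml, so commit
log latency and compaction I/O can be isolated from each other (a sketch; the
paths shown are the usual defaults):

data_file_directories:
    - /var/lib/cassandra/data                      # sstables; compaction I/O lands here
commitlog_directory: /var/lib/cassandra/commitlog  # sequential appends only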




On Wed, Apr 17, 2013 at 7:58 PM, Jay Svc  wrote:

> Hi Alexis,
>
> Thank you for your response.
>
> My commit log is on SSD, which shows 30 to 40 ms of disk latency.
>
> When I ran iostat, I saw an "await" of 26 to 30 ms for my commit log disk. My
> CPU is less than 18% used.
>
> How do I reduce the disk latency for my commit log disk, given that they are
> SSDs?
>
> Thank you in advance,
> Jay
>
>
> On Wed, Apr 17, 2013 at 3:58 PM, Alexis Rodríguez <
> arodrig...@inconcertcc.com> wrote:
>
>> :D
>>
>> Jay, check whether your disk utilization leaves room to change the
>> configuration the way Edward suggests. iostat -xkcd 1 will show you how
>> heavily your disks are being used.
>>
>>
>>
>>
>> On Wed, Apr 17, 2013 at 5:26 PM, Edward Capriolo wrote:
>>
>>> three things:
>>> 1) compaction throughput is fairly low (yaml nodetool)
>>> 2) concurrent compactions is fairly low (yaml)
>>> 3) multithreaded compaction might be off in your version
>>>
>>> Try raising these things. Otherwise consider option 4.
>>>
>>> 4) $$$ RAID, RAM
>>>
>>> On Wed, Apr 17, 2013 at 4:01 PM, Jay Svc  wrote:
>>>
 Hi Team,



 I have high write traffic to my Cassandra cluster. I experience a
 very high number of pending compactions. As I expect higher writes, the
 pending compactions keep increasing. Even when I stop my writes it takes
 several hours to finish pending compactions.


 My CF is configured with LCS, with sstable_size_mb=20M. My CPU is below
 20%, JVM memory usage is between 45%-55%. I am using Cassandra 1.1.9.


 How can I increase the compaction rate so it will run a bit faster to
 match my write speed?


 Your inputs are appreciated.


 Thanks,

 Jay


>>>
>>
>


Re: How to stop Cassandra and then restart it in windows?

2013-04-17 Thread Raihan Jamal
When I first started Cassandra, I started it as:

cassandra -f

So I believe that's why it is getting started as a service. Whenever I reboot
my machine, Cassandra is always up. I am not able to find the process in
Windows to shut it down. I tried looking for port 7199 but was not able to
find it.

In service management, too, I am not able to figure out which service I need
to stop. I cannot find any service related to Cassandra.

Any thoughts?
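
A quick way to locate and kill whatever is bound to the JMX port (7199) with
standard Windows tools is roughly:

C:\> netstat -ano | findstr :7199
C:\> tasklist /FI "PID eq <pid-from-netstat>"
C:\> taskkill /PID <pid-from-netstat> /F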






*Raihan Jamal*


On Wed, Apr 17, 2013 at 12:43 PM, aaron morton wrote:

>  Error: Exception thrown by the agent : java.rmi.server.ExportException:
>> Port already in use: 7199; nested exception is:
>> java.net.BindException: Address already in use: JVM_Bind
>>
> The process is already running, is it installed as a service and was it
> automatically started when the system started ?
>
> Either shut it down using the service management or find the process
> (however you do that in Windows) and kill it.
>
> Cheers
>
>-
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/04/2013, at 4:26 AM, Raihan Jamal  wrote:
>
> Hello,
>
> Can anyone provide any help on this?
>
> Thanks in advance.
>
>
>
>
>
>
> *Raihan Jamal*
>
>
> On Tue, Apr 16, 2013 at 6:50 PM, Raihan Jamal wrote:
>
>> Hello,
>>
>> I installed a single-node cluster on my local dev box, which is running
>> Windows 7, and it was working fine. For some reason I needed to restart my
>> desktop, and after that, whenever I run the following at the command
>> prompt, it always gives me the below exception:
>>
>> S:\Apache Cassandra\apache-cassandra-1.2.3\bin>cassandra -f
>> Starting Cassandra Server
>> Error: Exception thrown by the agent : java.rmi.server.ExportException:
>> Port already in use: 7199; nested exception is:
>> java.net.BindException: Address already in use: JVM_Bind
>>
>>
>> Meaning the port is being used somewhere. I have made some changes in the
>> *cassandra.yaml* file, so I need to shut down the Cassandra server and then
>> restart it again.
>> again.
>>
>> Can anybody help me with this?
>>
>> Thanks for the help.
>>
>>
>>
>
>


Re: Key-Token mapping in cassandra

2013-04-17 Thread Ravikumar Govindarajan
Thanks Aaron.
We are looking at co-locating all keys for a given user on one Cassandra
node. Are there any other ways to achieve this?

--
Ravi
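
For reference, the usual way to get that co-location in CQL is to make the
shared prefix the partition key and the suffix a clustering column, so every
row for one user hashes to the same token (a sketch; the table and column
names are illustrative):

CREATE TABLE user_assets (
    user_id  text,    -- partition key: determines the token, hence the node
    category text,    -- e.g. IMAGE / DOCUMENTS / MULTIMEDIA
    payload  blob,
    PRIMARY KEY (user_id, category)
);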

On Thursday, April 18, 2013, aaron morton wrote:

> CASSANDRA-1034
>
> That ticket is about removing an assumption which was not correct.
>
> I would like all keys with "123" as prefix to be mapped to a single token.
>
> Why?
> It's not possible, nor desirable IMHO. Tokens are used to identify a single
> row internally.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 17/04/2013, at 11:25 PM, Ravikumar Govindarajan <
> ravikumar.govindara...@gmail.com> wrote:
>
> We would like to map multiple keys to a single token in cassandra. I
> believe this should be possible now with CASSANDRA-1034
>
> Ex:
>
> Key1 --> 123/IMAGE
> Key2 --> 123/DOCUMENTS
> Key3 --> 123/MULTIMEDIA
>
> I would like all keys with "123" as prefix to be mapped to a single token.
>
> Is this possible? What should be the Partitioner that I should most likely
> extend and write my own to achieve the desired result?
>
> --
> Ravi
>
>
>


Re: Using an EC2 cluster from the outside.

2013-04-17 Thread Ben Bromhead
Depending on your client, disable automatic client discovery and just specify a 
list of all your nodes in your client configuration.

For more details check out 
http://xzheng.net/blogs/problem-when-connecting-to-cassandra-with-ruby/ , 
obviously this deals specifically with a ruby client but it should be 
applicable to others.

Cheers

Ben
Instaclustr | www.instaclustr.com | @instaclustr
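
As a concrete illustration with a Java client, Astyanax (used elsewhere in
these threads) can be pinned to an explicit seed list with discovery turned
off; a sketch from memory, so verify the API against your client version
(cluster, keyspace and addresses are illustrative):

import com.netflix.astyanax.AstyanaxContext;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
import com.netflix.astyanax.thrift.ThriftFamilyFactory;

public class FixedHostList {
    public static Keyspace connect() {
        AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
            .forCluster("MyCluster")
            .forKeyspace("my_keyspace")
            .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                .setDiscoveryType(NodeDiscoveryType.NONE))  // no ring discovery
            .withConnectionPoolConfiguration(
                new ConnectionPoolConfigurationImpl("pool")
                    .setPort(9160)
                    .setSeeds("203.0.113.10:9160,203.0.113.11:9160,203.0.113.12:9160"))
            .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
            .buildKeyspace(ThriftFamilyFactory.getInstance());
        context.start();
        return context.getClient();  // getEntity() on older Astyanax versions
    }
}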



On 18/04/2013, at 5:43 AM, Robert Coli  wrote:

> On Wed, Apr 17, 2013 at 12:07 PM,  wrote:
> I have a working 3 node cluster in a single ec2 region and I need to hit it 
> from our datacenter. As you'd expect, the client gets the internal addresses 
> of the nodes back. 
> 
> Someone on irc mentioned using the public IP for rpc and binding that address 
> to the box. I see that mentioned in an old list mail but I don't get exactly 
> how this is supposed to work. I could really use either a link to something 
> with explicit directions or a detailed explanation. 
> 
> Should cassandra use the public IPs for everything -- listen, b'cast, and 
> rpc? What should cassandra.yaml look like? Is the idea to use the public 
> addresses for cassandra but route the requests between nodes over the lan 
> using nat? 
> 
> Any help or suggestion is appreciated. 
> 
> Google "EC2MultiRegionSnitch".
> 
> =Rob
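
For reference, the pattern that snitch expects in cassandra.yaml is private
IPs for intra-region traffic and the node's public IP broadcast to other
regions and outside clients (a sketch; all addresses are illustrative):

endpoint_snitch: Ec2MultiRegionSnitch
listen_address: 10.0.0.5          # this node's private EC2 address
broadcast_address: 203.0.113.5    # this node's public/elastic IP
rpc_address: 0.0.0.0              # accept client connections on all interfaces
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "203.0.113.5,203.0.113.6"   # seeds listed by public IP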



[no subject]

2013-04-17 Thread Ertio Lew
I run Cassandra on a single Windows 8 machine for development needs. Everything
has been working fine for several months, but just today I saw this error
message in the Cassandra logs and all host pools were marked down.



ERROR 08:40:42,684 Error occurred during processing of message.
java.lang.StringIndexOutOfBoundsException: String index out of range: -2147418111
        at java.lang.String.checkBounds(String.java:397)
        at java.lang.String.<init>(String.java:442)
        at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:339)
        at org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18958)
        at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)
        at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)


After restarting the server, everything worked fine again.
I am curious to know what this is related to. Could it be caused by my
application writing corrupted data?
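
A negative length like this usually means the server tried to decode a
malformed or oversized frame as a thrift string. For reference, the
message-size limits live in cassandra.yaml (names and defaults as shipped in
the 1.x series):

thrift_framed_transport_size_in_mb: 15   # max frame size for the framed transport
thrift_max_message_length_in_mb: 16      # max total message length; must exceed the frame size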


Failed shuffle

2013-04-17 Thread David McNelis
I had a situation earlier where my shuffle failed after a hard disk drive
filled up. I went through and disabled shuffle on the machines while trying
to get the situation resolved. Now, while I can re-enable shuffle on the
machines, when I try to do an ls I get a timeout.

Looking at the cassandra-shuffle code, it is trying to execute this query:

SELECT token_bytes,requested_at FROM system.range_xfers

which is throwing the following error in my logs:

java.lang.AssertionError: [min(-1),max(-219851097003960625)]
        at org.apache.cassandra.dht.Bounds.<init>(Bounds.java:41)
        at org.apache.cassandra.dht.Bounds.<init>(Bounds.java:34)
        at org.apache.cassandra.dht.Bounds.withNewRight(Bounds.java:121)
        at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:1172)
        at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:132)
        at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:62)
        at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:132)
        at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:143)
        at org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1726)
        at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4074)
        at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4062)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:679)


So this causes me two major issues. First, I can't restart my dead node
because it ends up with a concurrency exception while trying to find
relocating tokens during StorageService initialization. Second, I can't
clear the moves because nothing is able to read what is in that range_xfers
table (I was also not able to read it through cqlsh).

I thought I could recreate the table, but system is a restricted keyspace
and it looks like I can't drop and recreate that table, and CQL requires a
key for delete... and since you can't get the key without getting an
error...

Is there something simple I can do that I'm just missing right now? Right
now I can't restart nodes because of this, nor successfully add new nodes to
my ring.


Re: Getting error while inserting data in cassandra table using Java with JDBC

2013-04-17 Thread himanshu.joshi


On 04/18/2013 12:06 AM, aaron morton wrote:

What version are you using?
And what JDBC driver?

Sounds like the driver is not converting the value to bytes for you.

> I guess the problem may be because of undefined
> key_validation_class, default_validation_class and comparator etc.

If you are using CQL these are not relevant.

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/04/2013, at 1:31 AM, himanshu.joshi wrote:



Hi,


When I am trying to insert data into a table using Java with JDBC, I am
getting the error:

InvalidRequestException(why:cannot parse 'Jo' as hex bytes)

My insert query is:
insert into temp(id,name,value,url_id) VALUES(108, 'Aa','Jo',10);

This insert query runs successfully from the CQLSH command prompt but not
from the code.


The query I have used to create the table in CQLSH is:

CREATE TABLE temp (
 id bigint PRIMARY KEY,
 dt_stamp timestamp,
 name text,
 url_id bigint,
 value text
) WITH
 bloom_filter_fp_chance=0.01 AND
 caching='KEYS_ONLY' AND
 comment='' AND
 dclocal_read_repair_chance=0.00 AND
 gc_grace_seconds=864000 AND
 read_repair_chance=0.10 AND
 replicate_on_write='true' AND
 populate_io_cache_on_flush='false' AND
 compaction={'class': 'SizeTieredCompactionStrategy'} AND
 compression={'sstable_compression': 'SnappyCompressor'};



I guess the problem may be because of undefined key_validation_class,
default_validation_class, comparator, etc.

Is there any way to define these attributes using CQLSH?
I have already tried the ASSUME command, but it has not resolved the
problem either.


I am a beginner in cassandra and need your guidance.

--
Thanks & Regards,
Himanshu Joshi




Hi Aaron,

The problem is resolved now, as I upgraded the JDBC driver to version 1.2.2.
Earlier I was using JDBC driver version 1.1.6 with Cassandra 1.2.2.

Thanks for your guidance.

--
Thanks & Regards,
Himanshu Joshi
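
For anyone hitting the same thing, here is a minimal sketch of that insert
through the cassandra-jdbc driver, using a parameterized statement so the
driver handles the type conversion (driver class and URL per the
apache-extras cassandra-jdbc project; host, port and keyspace are
illustrative, so verify against your driver version):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class TempInsert {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
        Connection con = DriverManager.getConnection(
                "jdbc:cassandra://localhost:9160/mykeyspace?version=3.0.0");
        PreparedStatement ps = con.prepareStatement(
                "INSERT INTO temp (id, name, value, url_id) VALUES (?, ?, ?, ?)");
        ps.setLong(1, 108L);    // id bigint
        ps.setString(2, "Aa");  // name text
        ps.setString(3, "Jo");  // value text, no 'hex bytes' parsing needed
        ps.setLong(4, 10L);     // url_id bigint
        ps.execute();
        con.close();
    }
}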



Re: Repair hanges on 1.1.4

2013-04-17 Thread adeel . akbar

Hi Aaron,

Thank you for your feedback. I have also installed DataStax OpsCenter, and
it shows no repair progress. Previously, every repair's progress was shown
in OpsCenter, and once it reached 100% the repair was also complete on the
nodes; but now a repair is in progress on the node and OpsCenter shows
nothing. Secondly, please find the netstats and compactionstats results
below:


# /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost netstats
Mode: NORMAL
Not sending any streams.
Not receiving any streams.
Pool Name       Active   Pending   Completed
Commands        n/a      0         5327870
Responses       n/a      0         163271943

# /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost compactionstats
pending tasks: 0
Active compaction remaining time :n/a

Regards,

Adeel Akbar

Quoting aaron morton :

The errors from Hints are not concerned with repair. Increasing the
rpc_timeout may help with those. If it's logging about 0 hints you may be
seeing this: https://issues.apache.org/jira/browse/CASSANDRA-5068

How did the repair hang? Check for progress with nodetool compactionstats
and nodetool netstats.


Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/04/2013, at 3:01 AM, Alexis Rodríguez wrote:



Adeel,

It may be a problem in the remote node, could you check the system.log?

Also, you might want to check rpc_timeout_in_ms on both nodes; maybe an
increase in this parameter will help.






On Fri, Apr 12, 2013 at 9:17 AM,  wrote:
Hi,

I have started a repair on the newly added node with -pr; this node is in
another data center. I have a 5MB internet connection and have configured
setstreamthroughput 1. After some time the repair hangs, and the following
message appears in the logs:


# /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost ring
Address      DC    Rack   Status  State    Load       Effective-Ownership  Token
                                                                           169417178424467235000914166253263322299
10.0.0.3     DC1   RAC1   Up      Normal   93.26 GB   66.67%               0
10.0.0.4     DC1   RAC1   Up      Normal   89.1 GB    66.67%               56713727820156410577229101238628035242
10.0.0.15    DC1   RAC1   Up      Normal   72.87 GB   66.67%               113427455640312821154458202477256070484
10.40.1.103  DC2   RAC1   Up      Normal   48.59 GB   100.00%              169417178424467235000914166253263322299



 INFO [HintedHandoff:1] 2013-04-12 17:05:49,411 HintedHandOffManager.java (line 372) Timed out replaying hints to /10.40.1.103; aborting further deliveries
 INFO [HintedHandoff:1] 2013-04-12 17:05:49,411 HintedHandOffManager.java (line 390) Finished hinted handoff of 0 rows to endpoint /10.40.1.103


Why are we getting this message, and how do I prevent the repair from
hitting this error?

Regards,

Adeel Akbar