Re: New web client & future API
How to download it? Your "Download war-file" link opens just a blank page :( On 14/06/2011, Markus Wiesenbacher | Codefreun.de wrote: > I just released an early version of my web client > (http://www.codefreun.de/apollo) which is Thrift-based, and therefore I > would like to know what the future is ...
Re: New web client & future API
Should work now ... Sent from my iPhone On 20.06.2011 at 09:28, "Andrey V. Panov" wrote: > How to download it? > Your "Download war-file" link opens just a blank page :( > > On 14/06/2011, Markus Wiesenbacher | Codefreun.de wrote: > >> I just released an early version of my web client >> (http://www.codefreun.de/apollo) which is Thrift-based, and therefore I >> would like to know what the future is ...
Re: New web client & future API
I just took a look at the demo. This is really great stuff! I will try this on our cluster as soon as possible. I like this because it gives people not too familiar with the Cassandra CLI or Thrift a way to query Cassandra data. On Jun 20, 2011, at 10:56 AM, Markus Wiesenbacher | Codefreun.de wrote: > Should work now ... > > Sent from my iPhone > > On 20.06.2011 at 09:28, "Andrey V. Panov" wrote: > >> How to download it? >> Your "Download war-file" link opens just a blank page :( >> >> On 14/06/2011, Markus Wiesenbacher | Codefreun.de wrote: >> >>> I just released an early version of my web client >>> (http://www.codefreun.de/apollo) which is Thrift-based, and therefore I >>> would like to know what the future is ...
RE: New web client & future API
There is one more web client available for Cassandra: http://code.google.com/p/cassui/ . cassui.war is available for download. Vivek -----Original Message----- From: Jonathan Colby [mailto:jonathan.co...@gmail.com] Sent: Monday, June 20, 2011 3:50 PM To: user@cassandra.apache.org Subject: Re: New web client & future API I just took a look at the demo. This is really great stuff! I will try this on our cluster as soon as possible. I like this because it gives people not too familiar with the Cassandra CLI or Thrift a way to query Cassandra data. On Jun 20, 2011, at 10:56 AM, Markus Wiesenbacher | Codefreun.de wrote: > Should work now ... > > Sent from my iPhone > > On 20.06.2011 at 09:28, "Andrey V. Panov" wrote: > >> How to download it? >> Your "Download war-file" link opens just a blank page :( >> >> On 14/06/2011, Markus Wiesenbacher | Codefreun.de wrote: >> >>> I just released an early version of my web client >>> (http://www.codefreun.de/apollo) which is Thrift-based, and >>> therefore I would like to know what the future is ...
[ANN] Mojo's Cassandra Maven Plugin 0.8.0-1 released
Hi,

The Mojo team is pleased to announce the release of Mojo's Cassandra Maven Plugin version 0.8.0-1.

Mojo's Cassandra Plugin is used when you want to install and control a test instance of Apache Cassandra from within your Apache Maven build. The plugin has the following goals:

* cassandra:start - Starts up a test instance of Cassandra in the background.
* cassandra:stop - Stops the test instance of Cassandra that was started using cassandra:start.
* cassandra:start-cluster - Starts up a test cluster of Cassandra in the background, bound to the local loopback IP addresses 127.0.0.1, 127.0.0.2, etc.
* cassandra:stop-cluster - Stops the test cluster of Cassandra that was started using cassandra:start-cluster.
* cassandra:run - Starts up a test instance of Cassandra in the foreground.
* cassandra:load - Runs a cassandra-cli script against the test instance of Cassandra.
* cassandra:repair - Runs nodetool repair against the test instance of Cassandra.
* cassandra:flush - Runs nodetool flush against the test instance of Cassandra.
* cassandra:compact - Runs nodetool compact against the test instance of Cassandra.
* cassandra:cleanup - Runs nodetool cleanup against the test instance of Cassandra.
* cassandra:delete - Deletes the test instance of Cassandra.

http://mojo.codehaus.org/cassandra-maven-plugin/

To use this version, simply specify the version in your project's plugin configuration:

  <plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>cassandra-maven-plugin</artifactId>
    <version>0.8.0-1</version>
  </plugin>

Release Notes - Mojo's Cassandra Maven Plugin - Version 0.8.0-1

** Improvement
* [MCASSANDRA-8] - Document the availability of stopKey/stopPort configuration parameters (and possibly allow just stopPort to be defined)

** New Feature
* [MCASSANDRA-9] - Patch to support Cassandra 0.8.0-rc1
* [MCASSANDRA-10] - Add support for local clusters
* [MCASSANDRA-11] - Upgrade to Cassandra 0.8.0

Enjoy,

The Mojo team.

Apache, Apache Maven, Apache Cassandra, Maven and Cassandra are trademarks of The Apache Software Foundation.
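As a quick usage sketch (assuming the plugin is declared in your pom.xml as above; the cassandra: goal prefix resolves because org.codehaus.mojo is in Maven's default plugin groups):

  mvn cassandra:start     # boot a single test node in the background
  mvn cassandra:load      # optionally run a cassandra-cli script against it
  mvn verify              # run your integration tests against the node
  mvn cassandra:stop      # shut the test node down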
Re: Cassandra Clients for Java
Thank you all for your answers!

It's hard to tell the good projects from the not-so-good projects, and it seems that the choice is really between Hector and Pelops. On average, most people are using Hector or Pelops, and most of the names start with Da (sorry, couldn't help :)).

Dan Washusen asked me to comment more on Hector's and Pelops' APIs.

Let's start with what Cassandra gives us: get, insert, remove for simple operations; get_slice to read an entire row (or something like that); multiget_slice and batch_mutate for reading or writing multiple data with a single call. So we could start from this and we would be speaking Cassandra-nease. One CassandraClient class exporting all those methods, with the necessary model classes in the package. All of those methods have small details we don't want to have to work out every time, like timestamp (now is a perfect default value), consistency level (the client could have default values) and string encoding (UTF8 could be a default, but the CassandraClient and the model classes should handle it themselves). So, at this point we would have an API similar to Thrift's but a little less boring to use. With this API we could do failover, load balancing, and auto node discovery, and still be speaking Cassandra-nease.

But then writing methods to read from Cassandra to create models and to persist models would be a pretty time-wasting task. So then we would move to a JPA implementation doing this automagically, like Hector did (thanks for pointing this out). The JPA implementation (I have no idea how to write something like this, although I would like to learn) would use our simplified Thrift-like interface.

In the end: no third semantics, you still have direct access to Cassandra to deal with special cases, for most of the cases you have JPA, and you still have failover and load balancing. How many dependencies do we need besides what Cassandra already requires: zero. Simple, powerful. I agree that I'm not showing a proof-of-concept or anything, but it is a good starting vision for a client. Of course there are corner cases and details to work out, but what do you think?

With Hector, from their Getting Started page, first you initialize a Cluster, then you create a Keyspace from that cluster, then, wait, you have to create a template for your column family. With that template you can create an updater to do an insert (in Cassandra-nease), or you can query columns to do a get_slice (in Cassandra-nease), or delete a column to do a remove (in Cassandra-nease). You can clearly see a third semantics here.

With Pelops, the documentation on their main website (at GitHub) seems very lacking, and I couldn't work out whether you have to create a mutator for every write or not, or why they assigned strings and not objects to organize connections. They have this pool thing related to the fact that they use strings to organize connections.

In the end I think Hector should be the better choice because of their JPA implementation and because Pelops doesn't seem to have documentation. Thanks again for pointing the JPA thing out, I really couldn't find anything linking to it. Not even their User guide.

Thank you all very much for the answers.

Best, Dani

On Sat, Jun 18, 2011 at 4:04 PM, Rajesh Koilpillai < rajesh.koilpil...@gmail.com> wrote: > +1 to Hector (especially with the changes made in the latest version of > their API) > > > On Sun, Jun 19, 2011 at 12:01 AM, Steve Willcox wrote: >> I'm using Hector. 
>> >> The main contributor Nate McCall is very active and responsive to any >> issues. The Hector community is very active. >> >> I've been using Java for a long time and I disagree that the client is >> more complex than the underlying Thrift client. The latest version of Hector >> has made large gains in simplifying the API. It has connection caching, load >> balancing and failover build into its client. >> >> I found it easy to use and stable. My code has been in production since >> April 2011 and we've not had one Hector issue yet. >> >> Hope that helps >> >> Steve W. >> >> On Fri, Jun 17, 2011 at 4:02 PM, Daniel Colchete wrote: >> >>> Good day everyone! >>> >>> I'm getting started with a new project and I'm thinking about using >>> Cassandra because of its distributed quality and because of its performance. >>> >>> I'm using Java on the back-end. There are many many things being said >>> about the Java high level clients for Cassandra on the web. To be frank, I >>> see problems with all of the java clients. For example, Hector and >>> Scale7-pelops have new semantics on them that are neither Java's or >>> Cassandra's, and I don't see much gain from it apart from the fact that it >>> is more complex. Also, I was hoping to go with something that was annotation >>> based so that it wouldn't be necessary to write boilerplate code (again, no >>> gain). >>> >>> Demoiselle Cassandra seems to be one option but I couldn't find a >>> download for it. I'm new to Java in the back-end and
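To make the thin-wrapper idea above concrete, here is a rough, hypothetical sketch in Java. All names are illustrative (nothing here is an existing library); the value types are the Thrift-generated ones, and keys stay as strings, with UTF-8 encoding, timestamps, and consistency level handled by client-side defaults:

import java.util.List;
import java.util.Map;
import org.apache.cassandra.thrift.*; // Thrift-generated model classes

// Hypothetical thin wrapper: the server's own vocabulary, fewer knobs.
public interface SimpleCassandraClient {

    // Simple operations; timestamp defaults to "now",
    // consistency level to a client-wide default.
    Column get(String key, ColumnPath path) throws Exception;
    void insert(String key, ColumnParent parent, Column column) throws Exception;
    void remove(String key, ColumnPath path) throws Exception;

    // Multi-row operations, still speaking "Cassandra-nease".
    List<ColumnOrSuperColumn> getSlice(String key, ColumnParent parent,
            SlicePredicate predicate) throws Exception;
    Map<String, List<ColumnOrSuperColumn>> multigetSlice(List<String> keys,
            ColumnParent parent, SlicePredicate predicate) throws Exception;
    void batchMutate(Map<String, Map<String, List<Mutation>>> mutations) throws Exception;
}

A JPA-style layer could then be built on top of an interface like this without inventing a third semantics in between.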
RE: MemoryMeter uninitialized (jamm not specified as java agent)
Thanks, the java agent option is missing in the Cassandra.bat file: https://issues.apache.org/jira/browse/CASSANDRA-2787

From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Sunday, June 19, 2011 21:33 To: user@cassandra.apache.org Subject: Re: MemoryMeter uninitialized (jamm not specified as java agent)

What do you get for $ java -version

java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07-334-10M3326)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02-334, mixed mode)

Also you can check if the wrapper has correctly detected things with ps aux | grep javaagent. The args to the java process should include -javaagent:bin/../lib/jamm-0.2.2.jar

Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 17 Jun 2011, at 22:18, Rene Kochen wrote: Since using Cassandra 0.8, I see the following warning:

WARN 12:05:59,807 MemoryMeter uninitialized (jamm not specified as java agent); assuming liveRatio of 10.0. Usually this means cassandra-env.sh disabled jamm because you are using a buggy JRE; upgrade to the Sun JRE instead

I'm using the Sun JRE. What can I do to resolve this? What are the consequences of this warning? Thanx, Rene
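Until that ticket is resolved, one hedged workaround sketch is to pass the agent by hand in bin\cassandra.bat (the variable the script builds its JVM options into may differ by version, and the jamm jar version must match the one shipped in lib\):

rem hypothetical addition near the other JVM options in bin\cassandra.bat
set JAVA_OPTS=%JAVA_OPTS% -javaagent:"%CASSANDRA_HOME%\lib\jamm-0.2.2.jar"

The warning itself is fairly benign: the memtable liveRatio is assumed to be a fixed 10.0 instead of being measured, which only affects memtable sizing estimates.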
pig integration & NoClassDefFoundError TypeParser
Been trying for the past little bit to get the Pig integration working with Cassandra 0.8.0:

1. Downloaded the src for 0.8.0 and ran ant build
2. Went into contrib/pig and ran ant ... gives me /usr/local/src/apache-cassandra-0.8.0-src/contrib/pig/build/cassandra_storage.jar, and it is copied into the lib/ directory
3. Downloaded pig-0.8.1, modified the ivy/libraries.properties so that it uses Jackson 1.8.2 .. and ran ant. It compiles and gives me two jars: pig-0.8.1-SNAPSHOT-core.jar and pig-0.8.1-SNAPSHOT.jar
   - I did try to run it with Jackson 1.4 as the contrib/pig/README.txt suggested, but that failed... The referenced JIRA ticket (PIG-1863) suggests 1.6.0 (still produces the same results)

Environment variables are set:
java version "1.6.0_24"
PIG_INITIAL_ADDRESS=localhost
PIG_HOME=/usr/local/src/pig-0.8.1
PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
PIG_RPC_PORT=9160
CASSANDRA_HOME=/usr/local/src/apache-cassandra-0.8.0-src

I then start up cassandra ... no issues. I connect and create a new keyspace called foo with a column family called bar and a CF called foo. Inside the CF bar, I create a few rows with random columns. 4 rows.

From contrib/pig I run: bin/pig_cassandra -x local ... immediately get the error:

[: 45: /usr/local/src/pig-0.8.1/pig-0.8.1-core.jar: unexpected operator

-- this is a reference to this line: if [ ! -e $PIG_JAR ]; then

*** Problem here is that $PIG_JAR is a reference to two files ... pig-0.8.1-core.jar & pig.jar ...

Changing line 44 to PIG_JAR=$PIG_HOME/pig*core*.jar fixes this ... (or even referencing $PIG_HOME/build/pig*core*.jar or just pig.jar)

Try again to run: bin/pig_cassandra -x local and everything loads up nicely:

2011-06-21 02:07:23,671 [main] INFO org.apache.pig.Main - Logging error messages to: /usr/local/src/apache-cassandra-0.8.0-src/contrib/pig/pig_1308593243668.log
2011-06-21 02:07:23,778 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
grunt> register /usr/local/src/pig-0.8.1/pig-0.8.1-core.jar; register /usr/local/src/pig-0.8.1/pig.jar; register /usr/local/src/apache-cassandra-0.8.0-src/lib/avro-1.4.0-fixes.jar; register /usr/local/src/apache-cassandra-0.8.0-src/lib/avro-1.4.0-sources-fixes.jar; register /usr/local/src/apache-cassandra-0.8.0-src/lib/libthrift-0.6.jar;
grunt>
grunt> rows = LOAD 'cassandra://foo/bar' USING CassandraStorage();
grunt> STORE rows into 'cassandra://foo/foo' USING CassandraStorage();
2011-06-21 02:04:53,271 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2011-06-21 02:04:53,271 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
2011-06-21 02:04:53,324 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2011-06-21 02:04:53,447 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: rows: Store(cassandra://foo/foo:CassandraStorage) - scope-1 Operator Key: scope-1)
2011-06-21 02:04:53,458 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2011-06-21 02:04:53,477 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2011-06-21 02:04:53,477 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2011-06-21 02:04:53,480 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:53,494 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:53,494 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2011-06-21 02:04:53,556 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-06-21 02:04:59,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2011-06-21 02:04:59,718 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:59,719 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2011-06-21 02:04:59,948 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:59,960 [Thread-5] INFO org.apache.hadoop.metrics.jvm.J
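A sketch of that bin/pig_cassandra workaround as shell (assuming a source build of Pig under $PIG_HOME; picking exactly one jar also keeps the glob from expanding to two files, which is what broke the -e test):

# pick exactly one core jar so the test below sees a single path
PIG_JAR=$(ls "$PIG_HOME"/pig*core*.jar 2>/dev/null | head -n 1)
if [ ! -e "$PIG_JAR" ]; then
    echo "Unable to locate a pig core jar under $PIG_HOME" >&2
    exit 1
fi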
Re: pig integration & NoClassDefFoundError TypeParser
Try running with cdh3u0 version of pig and see if it has the same problem. They backported the patch (to pig 0.9 which should be out in time for the hadoop summit next week) that adds the updated jackson dependency for avro. The download URL for that is - http://archive.cloudera.com/cdh/3/pig-0.8.0-cdh3u0.tar.gz Alternatively, I believe today brisk beta 2 will be out which has pig integrated. Not sure if that would work for your current environment though. See if that works. On Jun 20, 2011, at 1:09 PM, Sasha Dolgy wrote: > Been trying for the past little bit to try and get the PIG integration > working with Cassandra 0.8.0 > > 1. Downloaded the src for 0.8.0 and ran ant build > 2. went into contrib/pig and ran ant ... gives me: > /usr/local/src/apache-cassandra-0.8.0-src/contrib/pig/build/cassandra_storage.jar > and is copied into the lib/ directory > 3. Downloaded pig-0.8.1, modified the ivy/libraries.properties so > that it uses Jackson 1.8.2 .. and ran ant. it compiles and gives me > two jars: pig-0.8.1-SNAPSHOT-core.jar and pig-0.8.1-SNAPSHOT.jar > - I did try to run it with Jackson 1.4 as the > contrib/pig/README.txt suggested, but that failed... The referenced > JIRA ticket (PIG-1863) suggests 1.6.0 (still produces the same > results) > > Environment variables are set: > java version "1.6.0_24" > > PIG_INITIAL_ADDRESS=localhost > PIG_HOME=/usr/local/src/pig-0.8.1 > PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner > PIG_RPC_PORT=9160 > CASSANDRA_HOME=/usr/local/src/apache-cassandra-0.8.0-src > > I then start up cassandra ... no issues. I connect and create a new > keyspace called foo with a column family called bar and a CF called > foo...Inside the CF bar, I create a few rows, with random columns > 4 Rows. > > From contrib/pig I run: bin/pig_cassandra -x local ... immediately > get the error: > > [: 45: /usr/local/src/pig-0.8.1/pig-0.8.1-core.jar: unexpected operator > > -- this is a reference to this line: if [ ! -e $PIG_JAR ]; then > > *** Problem here is that $PIG_JAR is a reference to two files ... > pig-0.8.1-core.jar & pig.jar ... > > Changing line 44 to PIG_JAR=$PIG_HOME/pig*core*.jar fixes this ... (or > even referencing $PIG_HOME/build/pig*core*.jar or just pig.jar > > Try again to run: bin/pig_cassandra -x local and everything loads up nicely: > > 2011-06-21 02:07:23,671 [main] INFO org.apache.pig.Main - Logging > error messages to: > /usr/local/src/apache-cassandra-0.8.0-src/contrib/pig/pig_1308593243668.log > 2011-06-21 02:07:23,778 [main] INFO > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - > Connecting to hadoop file system at: file:/// > grunt> register /usr/local/src/pig-0.8.1/pig-0.8.1-core.jar; register > /usr/local/src/pig-0.8.1/pig.jar; register > /usr/local/src/apache-cassandra-0.8.0-src/lib/avro-1.4.0-fixes.jar; > register > /usr/local/src/apache-cassandra-0.8.0-src/lib/avro-1.4.0-sources-fixes.jar; > register /usr/local/src/apache-cassandra-0.8.0-src/lib/libthrift-0.6.jar; > grunt> > grunt> rows = LOAD 'cassandra://foo/bar' USING CassandraStorage(); > grunt> STORE rows into 'cassandra://foo/foo' USING CassandraStorage(); > 2011-06-21 02:04:53,271 [main] INFO > org.apache.pig.tools.pigstats.ScriptState - Pig features used in the > script: UNKNOWN > 2011-06-21 02:04:53,271 [main] INFO > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - > pig.usenewlogicalplan is set to true. New logical plan will be used. 
> 2011-06-21 02:04:53,324 [main] INFO > org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics > with processName=JobTracker, sessionId= > 2011-06-21 02:04:53,447 [main] INFO > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - > (Name: rows: Store(cassandra://foo/foo:CassandraStorage) - scope-1 > Operator Key: scope-1) > 2011-06-21 02:04:53,458 [main] INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler > - File concatenation threshold: 100 optimistic? false > 2011-06-21 02:04:53,477 [main] INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer > - MR plan size before optimization: 1 > 2011-06-21 02:04:53,477 [main] INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer > - MR plan size after optimization: 1 > 2011-06-21 02:04:53,480 [main] INFO > org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM > Metrics with processName=JobTracker, sessionId= - already initialized > 2011-06-21 02:04:53,494 [main] INFO > org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM > Metrics with processName=JobTracker, sessionId= - already initialized > 2011-06-21 02:04:53,494 [main] INFO > org.apache.pig.tools.pigstats.ScriptState - Pig script settings are > added to the job > 2011-06-21 02:04:53,556 [main] INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler > - mapred.job.reduce.markreset.buffer.percent is not set, set to > default 0.
Re: pig integration & NoClassDefFoundError TypeParser
Hi ... I still have the same problem with pig-0.8.0-cdh3u0... Maybe I'm doing something wrong. Where does org/apache/cassandra/db/marshal/TypeParser exist, or where should it exist? It's not in $CASSANDRA_HOME/libs or /usr/local/src/pig-0.8.0-cdh3u0/lib or /usr/local/src/apache-cassandra-0.8.0-src/build/lib/jars.

for jar in `ls *.jar`
do
  jar -tf $jar | grep TypeParser
  if [ $? -eq 0 ]; then
    echo $jar
  fi
done

shows me nothing in all the lib dirs.

On Mon, Jun 20, 2011 at 8:44 PM, Jeremy Hanna wrote: > Try running with cdh3u0 version of pig and see if it has the same problem. > They backported the patch (to pig 0.9 which should be out in time for the > hadoop summit next week) that adds the updated jackson dependency for avro. > The download URL for that is - > http://archive.cloudera.com/cdh/3/pig-0.8.0-cdh3u0.tar.gz > > Alternatively, I believe today brisk beta 2 will be out which has pig > integrated. Not sure if that would work for your current environment though. > > See if that works. > On Jun 20, 2011, at 1:09 PM, Sasha Dolgy wrote: > >> Been trying for the past little bit to try and get the PIG integration >> working with Cassandra 0.8.0 >> >> 1. Downloaded the src for 0.8.0 and ran ant build >> 2. went into contrib/pig and ran ant ... gives me: >> /usr/local/src/apache-cassandra-0.8.0-src/contrib/pig/build/cassandra_storage.jar >> and is copied into the lib/ directory >> 3. Downloaded pig-0.8.1, modified the ivy/libraries.properties so >> that it uses Jackson 1.8.2 .. and ran ant. it compiles and gives me >> two jars: pig-0.8.1-SNAPSHOT-core.jar and pig-0.8.1-SNAPSHOT.jar >> - I did try to run it with Jackson 1.4 as the >> contrib/pig/README.txt suggested, but that failed... The referenced >> JIRA ticket (PIG-1863) suggests 1.6.0 (still produces the same >> results) >> >> Environment variables are set: >> java version "1.6.0_24" >> >> PIG_INITIAL_ADDRESS=localhost >> PIG_HOME=/usr/local/src/pig-0.8.1 >> PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner >> PIG_RPC_PORT=9160 >> CASSANDRA_HOME=/usr/local/src/apache-cassandra-0.8.0-src >> >> I then start up cassandra ... no issues. I connect and create a new >> keyspace called foo with a column family called bar and a CF called >> foo...Inside the CF bar, I create a few rows, with random columns >> 4 Rows. >> >> From contrib/pig I run: bin/pig_cassandra -x local ... immediately >> get the error: >> >> [: 45: /usr/local/src/pig-0.8.1/pig-0.8.1-core.jar: unexpected operator >> >> -- this is a reference to this line: if [ ! -e $PIG_JAR ]; then >> >> *** Problem here is that $PIG_JAR is a reference to two files ... >> pig-0.8.1-core.jar & pig.jar ... >> >> Changing line 44 to PIG_JAR=$PIG_HOME/pig*core*.jar fixes this ... 
(or >> even referencing $PIG_HOME/build/pig*core*.jar or just pig.jar >> >> Try again to run: bin/pig_cassandra -x local and everything loads up nicely: >> >> 2011-06-21 02:07:23,671 [main] INFO org.apache.pig.Main - Logging >> error messages to: >> /usr/local/src/apache-cassandra-0.8.0-src/contrib/pig/pig_1308593243668.log >> 2011-06-21 02:07:23,778 [main] INFO >> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - >> Connecting to hadoop file system at: file:/// >> grunt> register /usr/local/src/pig-0.8.1/pig-0.8.1-core.jar; register >> /usr/local/src/pig-0.8.1/pig.jar; register >> /usr/local/src/apache-cassandra-0.8.0-src/lib/avro-1.4.0-fixes.jar; >> register >> /usr/local/src/apache-cassandra-0.8.0-src/lib/avro-1.4.0-sources-fixes.jar; >> register /usr/local/src/apache-cassandra-0.8.0-src/lib/libthrift-0.6.jar; >> grunt> >> grunt> rows = LOAD 'cassandra://foo/bar' USING CassandraStorage(); >> grunt> STORE rows into 'cassandra://foo/foo' USING CassandraStorage(); >> 2011-06-21 02:04:53,271 [main] INFO >> org.apache.pig.tools.pigstats.ScriptState - Pig features used in the >> script: UNKNOWN >> 2011-06-21 02:04:53,271 [main] INFO >> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - >> pig.usenewlogicalplan is set to true. New logical plan will be used. >> 2011-06-21 02:04:53,324 [main] INFO >> org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics >> with processName=JobTracker, sessionId= >> 2011-06-21 02:04:53,447 [main] INFO >> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - >> (Name: rows: Store(cassandra://foo/foo:CassandraStorage) - scope-1 >> Operator Key: scope-1) >> 2011-06-21 02:04:53,458 [main] INFO >> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler >> - File concatenation threshold: 100 optimistic? false >> 2011-06-21 02:04:53,477 [main] INFO >> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer >> - MR plan size before optimization: 1 >> 2011-06-21 02:04:53,477 [main] INFO >> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer >> - MR plan size after optimization: 1 >> 2011-06-21 02:04:53,480 [main] INFO >> org.apache.hadoop.metr
Re: pig integration & NoClassDefFoundError TypeParser
hmmm, did you build the cassandra src in the root of your cassandra directory with ant? sounds like it can't find that cassandra class. That's required. On Jun 20, 2011, at 2:05 PM, Sasha Dolgy wrote: > Hi ... I still have the same problem with pig-0.8.0-cdh3u0... > > Maybe I'm doing something wrong. Where does > org/apache/cassandra/db/marshal/TypeParser exist, or should exist? > > It's not in the $CASSANDRA_HOME/libs or > /usr/local/src/pig-0.8.0-cdh3u0/lib or > /usr/local/src/apache-cassandra-0.8.0-src/build/lib/jars > > > for jar in `ls *.jar` > do > jar -tf $jar | grep TypeParser > if [ $? -eq 0 ]; then > echo $jar > fi > done > > Shows me nothing in all the lib dirs > > > > On Mon, Jun 20, 2011 at 8:44 PM, Jeremy Hanna > wrote: >> Try running with cdh3u0 version of pig and see if it has the same problem. >> They backported the patch (to pig 0.9 which should be out in time for the >> hadoop summit next week) that adds the updated jackson dependency for avro. >> The download URL for that is - >> http://archive.cloudera.com/cdh/3/pig-0.8.0-cdh3u0.tar.gz >> >> Alternatively, I believe today brisk beta 2 will be out which has pig >> integrated. Not sure if that would work for your current environment though. >> >> See if that works. >> On Jun 20, 2011, at 1:09 PM, Sasha Dolgy wrote: >> >>> Been trying for the past little bit to try and get the PIG integration >>> working with Cassandra 0.8.0 >>> >>> 1. Downloaded the src for 0.8.0 and ran ant build >>> 2. went into contrib/pig and ran ant ... gives me: >>> /usr/local/src/apache-cassandra-0.8.0-src/contrib/pig/build/cassandra_storage.jar >>> and is copied into the lib/ directory >>> 3. Downloaded pig-0.8.1, modified the ivy/libraries.properties so >>> that it uses Jackson 1.8.2 .. and ran ant. it compiles and gives me >>> two jars: pig-0.8.1-SNAPSHOT-core.jar and pig-0.8.1-SNAPSHOT.jar >>> - I did try to run it with Jackson 1.4 as the >>> contrib/pig/README.txt suggested, but that failed... The referenced >>> JIRA ticket (PIG-1863) suggests 1.6.0 (still produces the same >>> results) >>> >>> Environment variables are set: >>> java version "1.6.0_24" >>> >>> PIG_INITIAL_ADDRESS=localhost >>> PIG_HOME=/usr/local/src/pig-0.8.1 >>> PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner >>> PIG_RPC_PORT=9160 >>> CASSANDRA_HOME=/usr/local/src/apache-cassandra-0.8.0-src >>> >>> I then start up cassandra ... no issues. I connect and create a new >>> keyspace called foo with a column family called bar and a CF called >>> foo...Inside the CF bar, I create a few rows, with random columns >>> 4 Rows. >>> >>> From contrib/pig I run: bin/pig_cassandra -x local ... immediately >>> get the error: >>> >>> [: 45: /usr/local/src/pig-0.8.1/pig-0.8.1-core.jar: unexpected operator >>> >>> -- this is a reference to this line: if [ ! -e $PIG_JAR ]; then >>> >>> *** Problem here is that $PIG_JAR is a reference to two files ... >>> pig-0.8.1-core.jar & pig.jar ... >>> >>> Changing line 44 to PIG_JAR=$PIG_HOME/pig*core*.jar fixes this ... 
(or >>> even referencing $PIG_HOME/build/pig*core*.jar or just pig.jar >>> >>> Try again to run: bin/pig_cassandra -x local and everything loads up >>> nicely: >>> >>> 2011-06-21 02:07:23,671 [main] INFO org.apache.pig.Main - Logging >>> error messages to: >>> /usr/local/src/apache-cassandra-0.8.0-src/contrib/pig/pig_1308593243668.log >>> 2011-06-21 02:07:23,778 [main] INFO >>> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - >>> Connecting to hadoop file system at: file:/// >>> grunt> register /usr/local/src/pig-0.8.1/pig-0.8.1-core.jar; register >>> /usr/local/src/pig-0.8.1/pig.jar; register >>> /usr/local/src/apache-cassandra-0.8.0-src/lib/avro-1.4.0-fixes.jar; >>> register >>> /usr/local/src/apache-cassandra-0.8.0-src/lib/avro-1.4.0-sources-fixes.jar; >>> register /usr/local/src/apache-cassandra-0.8.0-src/lib/libthrift-0.6.jar; >>> grunt> >>> grunt> rows = LOAD 'cassandra://foo/bar' USING CassandraStorage(); >>> grunt> STORE rows into 'cassandra://foo/foo' USING CassandraStorage(); >>> 2011-06-21 02:04:53,271 [main] INFO >>> org.apache.pig.tools.pigstats.ScriptState - Pig features used in the >>> script: UNKNOWN >>> 2011-06-21 02:04:53,271 [main] INFO >>> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - >>> pig.usenewlogicalplan is set to true. New logical plan will be used. >>> 2011-06-21 02:04:53,324 [main] INFO >>> org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics >>> with processName=JobTracker, sessionId= >>> 2011-06-21 02:04:53,447 [main] INFO >>> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - >>> (Name: rows: Store(cassandra://foo/foo:CassandraStorage) - scope-1 >>> Operator Key: scope-1) >>> 2011-06-21 02:04:53,458 [main] INFO >>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler >>> - File concatenation threshold: 100 optimistic? false >>> 2011-06-21 02:04:53,477 [main] IN
Re: pig integration & NoClassDefFoundError TypeParser
Yes ... I ran an "ant" in the root directory on a fresh download of 0.8.0 src:

/usr/local/src/apache-cassandra-0.8.0-src# ls /usr/local/src/apache-cassandra-0.8.0-src/build/classes/main/org/apache/cassandra/db/marshal/
AbstractCommutativeType.class  AbstractType.class       LexicalUUIDType.class       UTF8Type.class
AbstractType$1.class           AbstractUUIDType.class   LocalByPartionerType.class  UTF8Type$UTF8Validator.class
AbstractType$2.class           AsciiType.class          LongType.class              UTF8Type$UTF8Validator$State.class
AbstractType$3.class           BytesType.class          MarshalException.class      UUIDType.class
AbstractType$4.class           CounterColumnType.class  TimeUUIDType.class
AbstractType$5.class           IntegerType.class        UTF8Type$1.class

/usr/local/src/apache-cassandra-0.8.0-src# find . | grep TypeParser
/usr/local/src/apache-cassandra-0.8.0-src# echo $?
1
/usr/local/src/apache-cassandra-0.8.0-src# grep -Ri TypeError .
/usr/local/src/apache-cassandra-0.8.0-src# echo $?
1
/usr/local/src/apache-cassandra-0.8.0-src#

TypeParser does not exist...?

On Mon, Jun 20, 2011 at 9:11 PM, Jeremy Hanna wrote: > hmmm, did you build the cassandra src in the root of your cassandra directory with ant? sounds like it can't find that cassandra class. That's required.
Re: pig integration & NoClassDefFoundError TypeParser
cassandra-0.8.0/src/java/org/apache/cassandra/db/marshal/TypeParser.java : doesn't exist
cassandra-0.8.1/src/java/org/apache/cassandra/db/marshal/TypeParser.java : exists...

PIG integration with 0.8.0 is no longer working / doesn't work with the 0.8.0 release, but will with 0.8.1 .. fair assumption?

On Mon, Jun 20, 2011 at 9:18 PM, Sasha Dolgy wrote: >> Yes ... I ran an "ant" in the root directory on a fresh download of 0.8.0 src: >> >> /usr/local/src/apache-cassandra-0.8.0-src# ls >> /usr/local/src/apache-cassandra-0.8.0-src/build/classes/main/org/apache/cassandra/db/marshal/ >> AbstractCommutativeType.class AbstractType.class >> LexicalUUIDType.class UTF8Type.class >> AbstractType$1.class AbstractUUIDType.class >> LocalByPartionerType.class UTF8Type$UTF8Validator.class >> AbstractType$2.class AsciiType.class >> LongType.class >> UTF8Type$UTF8Validator$State.class >> AbstractType$3.class BytesType.class >> MarshalException.class UUIDType.class >> AbstractType$4.class CounterColumnType.class >> TimeUUIDType.class >> AbstractType$5.class IntegerType.class >> UTF8Type$1.class >> >> /usr/local/src/apache-cassandra-0.8.0-src# find . | grep TypeParser >> /usr/local/src/apache-cassandra-0.8.0-src# echo $? >> 1 >> /usr/local/src/apache-cassandra-0.8.0-src# >> >> /usr/local/src/apache-cassandra-0.8.0-src# grep -Ri TypeError . >> /usr/local/src/apache-cassandra-0.8.0-src# echo $? >> 1 >> /usr/local/src/apache-cassandra-0.8.0-src# >> >> TypeParser does not exist...? >> >> >> On Mon, Jun 20, 2011 at 9:11 PM, Jeremy Hanna >> wrote: >>> hmmm, did you build the cassandra src in the root of your cassandra >>> directory with ant? sounds like it can't find that cassandra class. That's >>> required. > -- Sasha Dolgy sasha.do...@gmail.com
Re: pig integration & NoClassDefFoundError TypeParser
I seem to recall a last minute issue with 0.8.0 before release that the TypeParser wasn't in there (for the pig support). However, I'm pretty sure that got fixed before release. I'll test it out in a few minutes - stay tuned :). Jeremy On Jun 20, 2011, at 2:23 PM, Sasha Dolgy wrote: > cassandra-0.8.0/src/java/org/apache/cassandra/db/marshal/TypeParser.java > : doesn't exist > cassandra-0.8.1/src/java/org/apache/cassandra/db/marshal/TypeParser.java > : exists... > > PIG integration with 0.8.0 is no longer working / doesn't work with > 0.8.0 release, but will with 0.8.1 .. fair assumption? > > On Mon, Jun 20, 2011 at 9:18 PM, Sasha Dolgy wrote: >> Yes ... I ran an "ant" in the root directory on a fresh download of 0.8.0 >> src: >> >> /usr/local/src/apache-cassandra-0.8.0-src# ls >> /usr/local/src/apache-cassandra-0.8.0-src/build/classes/main/org/apache/cassandra/db/marshal/ >> AbstractCommutativeType.class AbstractType.class >> LexicalUUIDType.class UTF8Type.class >> AbstractType$1.classAbstractUUIDType.class >> LocalByPartionerType.class UTF8Type$UTF8Validator.class >> AbstractType$2.classAsciiType.class >> LongType.class >> UTF8Type$UTF8Validator$State.class >> AbstractType$3.classBytesType.class >> MarshalException.class UUIDType.class >> AbstractType$4.classCounterColumnType.class >> TimeUUIDType.class >> AbstractType$5.classIntegerType.class >> UTF8Type$1.class >> >> /usr/local/src/apache-cassandra-0.8.0-src# find . | grep TypeParser >> /usr/local/src/apache-cassandra-0.8.0-src# echo $? >> 1 >> /usr/local/src/apache-cassandra-0.8.0-src# >> >> /usr/local/src/apache-cassandra-0.8.0-src# grep -Ri TypeError . >> /usr/local/src/apache-cassandra-0.8.0-src# echo $? >> 1 >> /usr/local/src/apache-cassandra-0.8.0-src# >> >> TypeParser does not exist...? >> >> >> On Mon, Jun 20, 2011 at 9:11 PM, Jeremy Hanna >> wrote: >>> hmmm, did you build the cassandra src in the root of your cassandra >>> directory with ant? sounds like it can't find that cassandra class. >>> That's required. >> > > > > -- > Sasha Dolgy > sasha.do...@gmail.com
Read performance vs. vmstat + your experience with read optimizations
Hi all, I am having trouble reconciling various metrics regarding reads, so I'm hoping someone here can help me understand what's going on. I am running tests on a single-node cluster with 16GB of RAM. I'm testing on the following column family:

Column Family: PUBLIC_MONTHLY
SSTable count: 1
Space used (live): 28468417160
Space used (total): 28468417160
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 2669019991
Read Latency: 0.846 ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Key cache capacity: 2
Key cache size: 2
Key cache hit rate: 0.33393368358762754
Row cache capacity: 5
Row cache size: 5
Row cache hit rate: 0.15195090894076155
Compacted row minimum size: 216
Compacted row maximum size: 88148
Compacted row mean size: 483

The keys represent grid cells (65 million); columns store monthly increments (total & sum, to produce averages); super columns tag the data source. The mean row length is 483 bytes. The key cache & row cache are enabled but kept very low just to test going through the disk, since I expect very random reads in production.

I've done everything I can to optimize reads:
- Cassandra is set up to use only 4GB because my dataset is 28GB
- I've compacted the data to a single file
- I'm hitting Cassandra with only 1 read request at a time & no writes. The request is a multislice across hundreds or thousands of keys.

The problem: vmstat shows that Cassandra is doing about 200MB/s of IO, and since there are no writes on the system, I know it can only be reading (RAID-0 SSD drives). I know that Cassandra is reading about 1/3 of the super columns. To be safe, let's assume Cassandra is deserializing 1/2 the row, and for simplicity that the row size is 512 bytes. So it looks to me as if Cassandra is deserializing 200MB/s / (512 bytes / 2) = 200MB/s / 256 bytes = 800K rows per second. That's 800 keys per millisecond. And yet, my app is being throttled by Cassandra during its MultigetSuperSliceCounterQuery: measuring the time spent in Hector shows that I'm getting at most 20-30 rows per ms and sometimes I get

My questions:
1) Any idea where the discrepancy can come from? I'd like to believe there is some magic setting that will x10 my read performance...
2) How do you recommend allocating memory? Should I give the OS cache as much as possible or should I max out Cassandra's cache?
3) Does anyone have numbers regarding the performance of range queries when compared to multiget queries? I can probably take SimpleGeo's idea of a Z-order code to map the 2D grid to 1D ranges, but I wonder if I will get the x10 performance I'm looking for.

PS: Nodetool indicates that the read latency is 0.846ms, so that's 1.12 keys/ms?! Let's just leave this aside; the process has been running for 12 hours and maybe the numbers are very different from what we're seeing here. 
Thanks
PG

vmstat (SSD not maxed out in this but it does at other times):

 r  b   swpd   free   buff     cache  si  so      bi   bo    in    cs us sy id wa
 0  0  78184  89252  10764  11254784   0   0  186448   18  8002  2352  7  4 50 39
 0  9  78184      0  10764  11249900   0   0  176602   78  8046  2957  7  3 64 26
 0 16  78184  88260  10764  11246824   0   0  195726    0  9090  2718  8  4 52 36
 0 14  78184  89376  10764  11242496   0   0  227858    0  9533  2444  7  4 45 44
 0  0  78184  88260  10764  11254336   0   0  203374    1  9144  2567  7  4 59 30
 0  4  78184  90368  10764  11251856   0   0  235394    0  9732  1827  6  4 52 38
 0 23  78184  92352  10756  11238000   0   0  203140   98  9007  2835  7  4 59 29
 0  0  78184  91608  10756  11250952   0   0  176348    0  8354  3535  7  3 64 26
 1  0  78184  92352  10756  11250228   0   0  163952    0  7475  3243  9  3 57 31

iostat -dmx 2 (filtered):

Device:  rrqm/s  wrqm/s       r/s   w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb       80.00    0.00   4061.50  0.00   94.34   0.00     47.57     80.18  19.49    19.49     0.00   0.16  63.00
sda       78.50    0.00   3934.50  0.00   94.72   0.00     49.31     76.87  19.27    19.27     0.00   0.16  62.80
dm-0       0.00    0.00   8310.50  0.00  192.47   0.00     47.43    169.89  20.15    20.15     0.00   0.08  63.80

Device:  rrqm/s  wrqm/s       r/s   w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sdb      101.50    0.00   5141.00  0.00  121.16   0.00     48.27    103.29  20.03    20.03     0.00   0.16  80.60
sda      100.00    0.00   5190.50  0.00  121.59   0.00     47.97    100.74  19.24    19.24     0.00   0.15  79.80
dm-0       0.00    0.00  10552.50  0.00  242.85   0.00     47.13    219.09  20.57    20.57     0.00   0.08
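For what it's worth, a back-of-envelope check of the arithmetic above, as a small Java sketch (the 200 MB/s and the ~256 bytes deserialized per row are the assumptions from the post, not measured values):

// Rough sanity check of the rows/ms estimate; inputs are from the post above.
public class ReadThroughputEstimate {
    public static void main(String[] args) {
        double ioBytesPerSec = 200e6;     // ~200 MB/s of reads, per vmstat
        double bytesPerRow = 512.0 / 2;   // half of a ~512-byte row
        double rowsPerMs = ioBytesPerSec / bytesPerRow / 1000.0;
        // prints roughly 781 rows/ms, versus the 20-30 rows/ms seen via Hector
        System.out.printf("~%.0f rows/ms from raw IO alone%n", rowsPerMs);
    }
}

The size of that gap suggests the time is going somewhere other than raw IO (deserialization, read-ahead amplification, Thrift marshalling) rather than to a single tunable setting.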
Re: pig integration & NoClassDefFoundError TypeParser
I think you might be having environment/classpath issues with an RC of cassandra 0.8 or something. I just downloaded 0.8 and did the following:

- Ran the examples/hadoop_word_count/bin/word_count_setup to create some data
- Ran contrib/pig/bin/pig_cassandra -x local example_script.pig (with the keyspace/columnfamily as wordcount/input_words) - that worked

Then I added the pygmalion data with a slight change for 0.8 (key_validation_class) (listed below) and ran the from_to_cassandra_bag_example.pig with bin/pig_cassandra -x local from_to_cassandra_bag_example.pig. That inputs from one column family and writes out to another column family from filtered data. The script is here (you just need to build pygmalion and point the register statement to your built pygmalion jar) - https://github.com/jeromatron/pygmalion/blob/master/scripts/from_to_cassandra_bag_example.pig

That worked as well and output to cassandra. So I suspect that for some reason your environment is messed up somehow - the CassandraStorage class (for pig integration) doesn't point to TypeParser in 0.8.0.

create keyspace pygmalion;
use pygmalion;

create column family account
    with comparator = UTF8Type
    and default_validation_class = UTF8Type
    and key_validation_class = UTF8Type
    and column_metadata = [ {column_name: num_heads, validation_class: LongType}, ];

create column family betelgeuse
    with comparator = UTF8Type
    and default_validation_class = UTF8Type;

set account['hipcat']['first_name'] = 'Zaphod';
set account['hipcat']['last_name'] = 'Beeblebrox';
set account['hipcat']['birth_place'] = 'Betelgeuse Five';
set account['hipcat']['num_heads'] = '2';
set account['hoopyfrood']['first_name'] = 'Ford';
set account['hoopyfrood']['last_name'] = 'Prefect';
set account['hoopyfrood']['birth_place'] = 'Betelgeuse Five';
set account['hoopyfrood']['num_heads'] = '1';
set account['earthman']['first_name'] = 'Arthur';
set account['earthman']['last_name'] = 'Dent';
set account['earthman']['birth_place'] = 'Earth';
set account['earthman']['num_heads'] = '1';

On Jun 20, 2011, at 2:23 PM, Sasha Dolgy wrote: > cassandra-0.8.0/src/java/org/apache/cassandra/db/marshal/TypeParser.java > : doesn't exist > cassandra-0.8.1/src/java/org/apache/cassandra/db/marshal/TypeParser.java > : exists... > > PIG integration with 0.8.0 is no longer working / doesn't work with > 0.8.0 release, but will with 0.8.1 .. fair assumption? > > On Mon, Jun 20, 2011 at 9:18 PM, Sasha Dolgy wrote: >> Yes ... I ran an "ant" in the root directory on a fresh download of 0.8.0 >> src: >> >> /usr/local/src/apache-cassandra-0.8.0-src# ls >> /usr/local/src/apache-cassandra-0.8.0-src/build/classes/main/org/apache/cassandra/db/marshal/ >> AbstractCommutativeType.class AbstractType.class >> LexicalUUIDType.class UTF8Type.class >> AbstractType$1.class AbstractUUIDType.class >> LocalByPartionerType.class UTF8Type$UTF8Validator.class >> AbstractType$2.class AsciiType.class >> LongType.class >> UTF8Type$UTF8Validator$State.class >> AbstractType$3.class BytesType.class >> MarshalException.class UUIDType.class >> AbstractType$4.class CounterColumnType.class >> TimeUUIDType.class >> AbstractType$5.class IntegerType.class >> UTF8Type$1.class >> >> /usr/local/src/apache-cassandra-0.8.0-src# find . | grep TypeParser >> /usr/local/src/apache-cassandra-0.8.0-src# echo $? >> 1 >> /usr/local/src/apache-cassandra-0.8.0-src# >> >> /usr/local/src/apache-cassandra-0.8.0-src# grep -Ri TypeError . >> /usr/local/src/apache-cassandra-0.8.0-src# echo $? 
>> 1 >> /usr/local/src/apache-cassandra-0.8.0-src# >> >> TypeParser does not exist...? >> >> >> On Mon, Jun 20, 2011 at 9:11 PM, Jeremy Hanna >> wrote: >>> hmmm, did you build the cassandra src in the root of your cassandra >>> directory with ant? sounds like it can't find that cassandra class. >>> That's required. >> > > > > -- > Sasha Dolgy > sasha.do...@gmail.com
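As a quick smoke test before running the pygmalion script, a minimal hedged grunt session against the account column family created above should show data coming back:

rows = LOAD 'cassandra://pygmalion/account' USING CassandraStorage();
few = LIMIT rows 2;
DUMP few; -- expect (key, {(name, value), ...}) style tuples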
Re: framed transport and buffered transport
From changes.txt (https://github.com/apache/cassandra/blob/cassandra-0.8.0/CHANGES.txt#L687):

"make framed transport the default so malformed requests can't OOM the server (CASSANDRA-475)"

btw, you *really* should upgrade.

Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com

On 20 Jun 2011, at 15:07, Donna Li wrote: > > My cassandra version is 0.6.3, what is the advantage of framed transport? > > -----Original Message----- > From: Jonathan Ellis [mailto:jbel...@gmail.com] > Sent: June 20, 2011 10:56 > To: user@cassandra.apache.org > Subject: Re: framed transport and buffered transport > > The most important difference is that only framed is supported in 0.8+ > > On Sun, Jun 19, 2011 at 9:27 PM, Donna Li wrote: >> All: >> >> What is the difference of framed transport and buffered transport? And what >> is the advantage and disadvantage of the two different transports? >> >> >> >> >> >> Thanks >> >> Donna li > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com
Re: framed transport and buffered transport
On Mon, Jun 20, 2011 at 4:51 PM, aaron morton wrote: > From changes.txt: > https://github.com/apache/cassandra/blob/cassandra-0.8.0/CHANGES.txt#L687 > " > make framed transport the default so malformed requests can't OOM the > server (CASSANDRA-475) > " > > btw, you *really* should upgrade. > > Cheers > - > Aaron Morton > Freelance Cassandra Developer > @aaronmorton > http://www.thelastpickle.com > > On 20 Jun 2011, at 15:07, Donna Li wrote: > > > My cassandra version is 0.6.3, what is the advantage of framed transport? > > -----Original Message----- > From: Jonathan Ellis [mailto:jbel...@gmail.com] > Sent: June 20, 2011 10:56 > To: user@cassandra.apache.org > Subject: Re: framed transport and buffered transport > > The most important difference is that only framed is supported in 0.8+ > > On Sun, Jun 19, 2011 at 9:27 PM, Donna Li wrote: > > All: > > > What is the difference of framed transport and buffered transport? And > what > > is the advantage and disadvantage of the two different transports? > > > > > > > Thanks > > > Donna li > > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com > > > My first outage with Cassandra was when I ran nodetool against the thrift port, not the JMX port, and crashed the Cassandra service. Thrift is more hardy now, but you really do not want to use that other transport :)
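For anyone still on 0.6 who wants to move clients over to framed before upgrading, a hedged sketch of opening a framed connection with raw Thrift (class names are from libthrift and the generated Cassandra bindings; the server must have framed transport enabled, which is the default from 0.7 on):

import org.apache.cassandra.thrift.Cassandra;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class FramedClient {
    public static void main(String[] args) throws Exception {
        // Wrap the socket in a framed transport instead of using it raw.
        TTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        System.out.println(client.describe_version());
        transport.close();
    }
}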
Secondary indexes performance
Hello, I've noticed that queries using secondary indexes seem to be getting rather slow. Right now I've got a column family with 4 indexed columns (plus 5-6 non-indexed columns; column values are small), and around 1.5-2 million rows. I'm using the pycassa client, and a query using the get_indexed_slices method that returns over 10k rows (in batches of 1024 rows) can take up to 30 seconds. Is that normal? It seems too long to me. Maybe there's a way to tune the Cassandra config for better secondary index performance? Using Cassandra 0.7.6 -- KosciaK
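For context, a hedged sketch of the query pattern being described, using pycassa's index helpers (the keyspace, column family, and 'indexed_col' names are placeholders):

import pycassa
from pycassa.index import create_index_expression, create_index_clause

pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'MyCF')

# equality match on one indexed column; rows stream back in buffer_size batches
expr = create_index_expression('indexed_col', 'some_value')
clause = create_index_clause([expr], count=100000)
for key, columns in cf.get_indexed_slices(clause, buffer_size=1024):
    pass  # ~10k matching rows reportedly takes up to 30s here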
Problem with PropertyFileSnitch in Amazon EC2
Hi, I'm setting up a 3-node test cluster in multiple Amazon Availability Zones to test cross-zone internode communication (and eventually cross-region communication). But I wanted to start with a cross-zone setup and am having trouble getting the nodes to connect to each other and join one 3-node ring. All nodes just seem to join their own ring and claim 100% of that space.

I'm using this Beta2 distribution of Brisk: http://debian.datastax.com/maverick/pool/brisk_1.0~beta1.2.tar.gz

I had to manually recreate the $BRISK_HOME/lib/ folder because it didn't exist in the binary for some reason, and I also added the jna and mx4j jar files to the lib directory.

The cluster is geographically located like this:

Node 1 (seed): East-A
Node 2: East-A
Node 3: East-B

The cassandra-topology.properties file on all three nodes contains this:

# Cassandra Node IP=Data Center:Rack
10.68.x.x=DC1:RAC1
10.198.x.x=DC1:RAC2
10.204.x.x=DC2:RAC1
default=DC1:RAC1

and finally, here is what the relevant sections of the YAML file look like for each node:

++ Node 1 ++
cluster_name: 'Test Cluster'
initial_token: 0
auto_bootstrap: false
partitioner: org.apache.cassandra.dht.RandomPartitioner
- seeds: 50.17.x.x    # This is the elastic IP for Node 1
listen_address: 10.68.x.x
rpc_address: 0.0.0.0
endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch
encryption_options:
    internode_encryption: none

++ Node 2 ++
cluster_name: 'Test Cluster'
initial_token: 56713727820156410577229101238628035242
auto_bootstrap: true
partitioner: org.apache.cassandra.dht.RandomPartitioner
- seeds: 50.17.x.x    # This is the elastic IP for Node 1
listen_address: 10.198.x.x
rpc_address: 0.0.0.0
endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch
encryption_options:
    internode_encryption: none

++ Node 3 ++
cluster_name: 'Test Cluster'
initial_token: 113427455640312821154458202477256070485
auto_bootstrap: true
partitioner: org.apache.cassandra.dht.RandomPartitioner
- seeds: 50.17.x.x    # This is the elastic IP for Node 1
listen_address: 10.204.x.x
rpc_address: 0.0.0.0
endpoint_snitch: org.apache.cassandra.locator.PropertyFileSnitch
encryption_options:
    internode_encryption: none

When I start Cassandra on all three nodes using "sudo bin/brisk cassandra", the startup log doesn't show any warnings or errors. The end of the start log on Node 1 says:

INFO [main] 2011-06-20 21:06:57,702 MessagingService.java (line 201) Starting Messaging Service on port 7000
INFO [main] 2011-06-20 21:06:57,723 StorageService.java (line 482) Using saved token 0
INFO [main] 2011-06-20 21:06:57,724 ColumnFamilyStore.java (line 1011) Enqueuing flush of Memtable-LocationInfo@1260987126(38/47 serialized/live bytes, 2 ops)
INFO [FlushWriter:1] 2011-06-20 21:06:57,724 Memtable.java (line 237) Writing Memtable-LocationInfo@1260987126(38/47 serialized/live bytes, 2 ops)
INFO [FlushWriter:1] 2011-06-20 21:06:57,809 Memtable.java (line 254) Completed flushing /raiddrive/data/system/LocationInfo-g-12-Data.db (148 bytes)
INFO [CompactionExecutor:2] 2011-06-20 21:06:57,812 CompactionManager.java (line 539) Compacting Major: [SSTableReader(path='/raiddrive/data/system/LocationInfo-g-9-Data.db'), SSTableReader(path='/raiddrive/data/system/LocationInfo-g-11-Data.db'), SSTableReader(path='/raiddrive/data/system/LocationInfo-g-10-Data.db'), SSTableReader(path='/raiddrive/data/system/LocationInfo-g-12-Data.db')]
INFO [CompactionExecutor:2] 2011-06-20 21:06:57,828 CompactionIterator.java (line 186) Major@1110828771(system, LocationInfo, 429/808) now compacting at 16777 bytes/ms.
INFO [main] 2011-06-20 21:06:57,881 Mx4jTool.java (line 67) mx4j successfuly loaded
INFO [CompactionExecutor:2] 2011-06-20 21:06:57,909 CompactionManager.java (line 603) Compacted to /raiddrive/data/system/LocationInfo-tmp-g-13-Data.db. 808 to 432 (~53% of original) bytes for 3 keys. Time: 97ms.
INFO [main] 2011-06-20 21:06:57,953 BriskDaemon.java (line 146) Binding thrift service to /0.0.0.0:9160
INFO [main] 2011-06-20 21:06:57,955 BriskDaemon.java (line 160) Using TFastFramedTransport with a max frame size of 15728640 bytes.
INFO [Thread-4] 2011-06-20 21:06:57,958 BriskDaemon.java (line 187) Listening for thrift clients...

And the end of the log on node 2 says:

INFO [main] 2011-06-20 21:06:57,899 StorageService.java (line 368) Cassandra version: 0.8.0-beta2-SNAPSHOT
INFO [main] 2011-06-20 21:06:57,901 StorageService.java (line 369) Thrift API version: 19.10.0
INFO [main] 2011-06-20 21:06:57,901 StorageService.java (line 382) Loading persisted ring state
INFO [main] 2011-06-20 21:06:57,904 StorageService.java (line 418) Starting up server gossip
INFO [main] 2011-06-20 21:06:57,915 ColumnFamilyStore.java (line 1011) Enqueuing flush of Memtable-LocationInfo@885597447(29/36 serialized/live bytes, 1 ops)
INFO [FlushWriter:1] 2011-06-20 21:06:57,916 Memtable.java (line 237) Writing Memtable-LocationInfo@885597447(29/36 serialized/live bytes, 1 ops)
INFO [FlushWriter:1] 2011-06-
OOM during restart
Hi, Cassandra: 0.7.6-2

I was restarting a node and ran into an OOM while replaying the commit log. I am not able to bring the node up again.

DEBUG 15:11:43,501 forceFlush requested but everything is clean  <-- For this I don't know what to do.

java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:123)
        at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.<init>(SSTableWriter.java:395)
        at org.apache.cassandra.io.sstable.SSTableWriter.<init>(SSTableWriter.java:76)
        at org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2238)
        at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:166)
        at org.apache.cassandra.db.Memtable.access$000(Memtable.java:49)
        at org.apache.cassandra.db.Memtable$1.runMayThrow(Memtable.java:189)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

Any help will be appreciated. If I update the schema while a node is down, the new schema is loaded before the flushing when the node is brought up again, correct?

Thanks, -gabe
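If the node simply lacks the heap to replay a large commit log, one hedged workaround sketch is to raise the heap for the restart in conf/cassandra-env.sh (variable names as in the 0.7 scripts; the sizes below are illustrative, not a recommendation):

# conf/cassandra-env.sh, temporarily override the computed defaults
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="400M"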
Cassandra Summit SF 2011
Registration for this year's Cassandra SF so far is double that of last year. Even though we're taking over the entire Mission Bay Conference Center, this event will sell out. Details and RSVP are at: http://www.datastax.com/events/cassandrasf2011

We're offering a 20% discount to Cassandra SF for everyone on the Cassandra mailing list. Just use the code "mailing-list", or use the following URL: http://cassandrasf2011.eventbrite.com/?discount=mailing-list

We have great talks on indexing, CQL, time series data, Solandra, counters, and use cases from Netflix, Twitter, Urban Airship, Pantheon, and the Dachis group. Details on the "Speakers" tab of http://www.datastax.com/events/cassandrasf2011. (Somehow we've completely failed in making that link-able, sorry.)

For one talk we're trying something new: Matt Dennis is taking data modeling questions and will explain "how to think in Cassandra" for the most popular in his talk. Please submit your ideas and votes at http://goo.gl/mod/NUMp.

We still have a couple speaking slots open. If you'd like to present at Cassandra SF, send a brief proposal to lynnben...@datastax.com.

This event will sell out. Reserve your ticket now at http://cassandrasf2011.eventbrite.com/?discount=mailing-list

See you there!

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Cassandra Summit SF 2011
On Mon, Jun 20, 2011 at 5:54 PM, Jonathan Ellis wrote: > We have great talks on indexing, CQL, time series data, Solandra, > counters, and use cases from Netflix, Twitter, Urban Airship, > Pantheon, and the Dachis group. Details on the "Speakers" tab of > http://www.datastax.com/events/cassandrasf2011. (Somehow we've > completely failed in making that link-able, sorry.) Fixed! http://www.datastax.com/events/cassandrasf2011#Speakers -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Problem with PropertyFileSnitch in Amazon EC2
Quick update... I'm trying to get a 3-node cluster defined the following way in the topology.properties file to work first:

10.68.x.x=DC1:RAC1
10.198.x.x=DC1:RAC2
10.204.x.x=DC1:RAC3

I'll split up the 3rd node into a separate data center later. Also, ignore that comment I made about the $BRISK_HOME/lib/ folder not existing. When you run ant, I believe it populates correctly, but I'll have to confirm/test later.

Based on Joaquin @ DataStax's suggestion, I tried changing the seed IP in all 3 nodes' YAML file to the Amazon private IP, instead of the Elastic IP. After this change, all three nodes joined the ring correctly:

ubuntu@ip-10-68-x-x:~/brisk-1.0~beta1.2/resources/cassandra/conf$ ../bin/nodetool -h localhost ring
Address       Status State   Load      Owns    Token
                                               113427455640312821154458202477256070485
10.68.x.x     Up     Normal  10.9 KB   33.33%  0
10.198.x.x    Up     Normal  15.21 KB  33.33%  56713727820156410577229101238628035242
10.204.x.x    Up     Normal  6.55 KB   33.33%  113427455640312821154458202477256070485

PasteBin is down and is showing me a diligent cat typing on a keyboard, so I uploaded some relevant DEBUG level log files here:

http://blueplastic.com/accenture/N1-system-seed_is_ElasticIP.log (problem exists)
http://blueplastic.com/accenture/N2-system-seed_is_ElasticIP.log (problem exists)
http://blueplastic.com/accenture/N1-system-seed_is_privateIP.log (everything works)
http://blueplastic.com/accenture/N2-system-seed_is_privateIP.log (everything works)

But if I want to set up the Brisk cluster across Amazon regions, I have to be able to use the Elastic IP for the seed. Also, using v0.7.4 of Cassandra in Amazon, we successfully set up a 30+ node cluster using 3 seed nodes which were declared in the YAML file using Elastic IPs. All 30 nodes were in the same region and availability zone. So, in an older version of Cassandra, providing the seeds as Elastic IPs used to work. In my current setup, even though nodes 1 & 2 are in the same region & availability zone, I can't seem to get them to join the same ring correctly.

Here is what the system log file shows when I declare the seed using the Elastic IP:

 INFO [Thread-4] 2011-06-21 00:10:30,849 BriskDaemon.java (line 187) Listening for thrift clients...
DEBUG [GossipTasks:1] 2011-06-21 00:10:31,608 Gossiper.java (line 201) Assuming current protocol version for /50.17.x.x
DEBUG [WRITE-/50.17.212.84] 2011-06-21 00:10:31,610 OutboundTcpConnection.java (line 161) attempting to connect to /50.17.x.x
DEBUG [GossipTasks:1] 2011-06-21 00:10:32,610 Gossiper.java (line 201) Assuming current protocol version for /50.17.x.x
DEBUG [ScheduledTasks:1] 2011-06-21 00:10:32,613 StorageLoadBalancer.java (line 334) Disseminating load info ...
DEBUG [GossipTasks:1] 2011-06-21 00:10:33,611 Gossiper.java (line 201) Assuming current protocol version for /50.17.x.x
DEBUG [GossipTasks:1] 2011-06-21 00:10:34,612 Gossiper.java (line 201) Assuming current protocol version for /50.17.x.x

But when I use the private IP, the log shows:

 INFO [Thread-4] 2011-06-21 00:19:47,993 BriskDaemon.java (line 187) Listening for thrift clients...
DEBUG [ScheduledTasks:1] 2011-06-21 00:19:49,769 StorageLoadBalancer.java (line 334) Disseminating load info ...
DEBUG [WRITE-/10.198.126.193] 2011-06-21 00:20:09,658 OutboundTcpConnection.java (line 161) attempting to connect to /10.198.x.x
 INFO [GossipStage:1] 2011-06-21 00:20:09,690 Gossiper.java (line 637) Node /10.198.x.x is now part of the cluster
DEBUG [GossipStage:1] 2011-06-21 00:20:09,691 MessagingService.java (line 158) Resetting pool for /10.198.x.x
 INFO [GossipStage:1] 2011-06-21 00:20:09,691 Gossiper.java (line 605) InetAddress /10.198.x.x is now UP
DEBUG [HintedHandoff:1] 2011-06-21 00:20:09,692 HintedHandOffManager.java (line 282) Checking remote schema before delivering hints
DEBUG [HintedHandoff:1] 2011-06-21 00:20:09,692 HintedHandOffManager.java (line 274) schema for /10.198.x.x matches local schema
DEBUG [HintedHandoff:1] 2011-06-21 00:20:09,692 HintedHandOffManager.java (line 288) Sleeping 11662ms to stagger hint delivery

- Sameer

On Mon, Jun 20, 2011 at 2:28 PM, Sameer Farooqui wrote:
> Hi,
>
> I'm setting up a 3 node test cluster in multiple Amazon Availability Zones
> to test cross-zone internode communication (and eventually cross-region
> communications).
>
> But I wanted to start with a cross-zone setup and am having trouble getting
> the nodes to connect to each other and join one 3-node ring. All nodes just
> seem to join their own ring and claim 100% of that space.
>
> I'm using this Beta2 distribution of Brisk:
> http://debian.datastax.com/maverick/pool/brisk_1.0~beta1.2.tar.gz
>
> I had to manually recreate the $BRISK_HOME/lib/ folder because it didn't
> exist in the binary for some reason and I also added jna and mx4j jar files
> to the lib directory.
>
> The cluster is geographically located like this:
>
> Node 1 (seed): East-A
> Node 2: East-A
> Node 3: East-B
>
> The c
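For reference, a minimal sketch of the seed settings in play, assuming the 0.8-era cassandra.yaml layout that the Brisk 1.0 beta is based on; the addresses below are placeholders standing in for the masked IPs above, not values confirmed by this thread:

# cassandra.yaml (0.8-era layout) -- seeds must be reachable from every
# node; within one EC2 region, private IPs work because each node
# gossips the address it actually binds to.
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.68.x.x"    # seed node's private IP

listen_address: 10.68.x.x         # this node's own private IP
rpc_address: 0.0.0.0              # accept Thrift clients on any interface

The symptom described above is consistent with the seed address mattering for gossip bootstrap: if a node cannot reach the configured seed (as with an Elastic IP that NATs back into the VPC differently), it never learns about the other nodes and forms a ring of one.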
Re: Cassandra Clients for Java
+2 Hector.. We have been using Hector 0.7 for a while now.. have not had any issues with it so far... also the Hector community is very active... We have not tried the new API yet..

Sent from my iPhone

On Jun 18, 2011, at 3:04 PM, Rajesh Koilpillai wrote:

> +1 to Hector (especially with the changes made in the latest version of
> their API)
>
> On Sun, Jun 19, 2011 at 12:01 AM, Steve Willcox wrote:
> I'm using Hector.
>
> The main contributor Nate McCall is very active and responsive to any
> issues. The Hector community is very active.
>
> I've been using Java for a long time and I disagree that the client is
> more complex than the underlying Thrift client. The latest version of
> Hector has made large gains in simplifying the API. It has connection
> caching, load balancing and failover built into its client.
>
> I found it easy to use and stable. My code has been in production since
> April 2011 and we've not had one Hector issue yet.
>
> Hope that helps
>
> Steve W.
>
> On Fri, Jun 17, 2011 at 4:02 PM, Daniel Colchete wrote:
> Good day everyone!
>
> I'm getting started with a new project and I'm thinking about using
> Cassandra because of its distributed quality and because of its
> performance.
>
> I'm using Java on the back-end. There are many, many things being said
> about the high-level Java clients for Cassandra on the web. To be frank,
> I see problems with all of the Java clients. For example, Hector and
> Scale7-pelops introduce semantics that are neither Java's nor
> Cassandra's, and I don't see much gain from that apart from the added
> complexity. Also, I was hoping to go with something annotation based so
> that it wouldn't be necessary to write boilerplate code (again, no gain).
>
> Demoiselle Cassandra seems to be one option but I couldn't find a
> download for it. I'm new to Java on the back-end and I find that Maven
> is too much to learn just for a client library. It also seems hard to
> integrate with the other things I use on my project (GWT, GWT-platform,
> Google Eclipse Plugin).
>
> Kundera looks great but besides not having a download link (the Google
> site links to GitHub, which links back to the Google site, but no
> download) its information is scattered across many blog posts, some of
> them saying things I couldn't find on its website. One says it uses
> Lucandra for indexes, but that is the only place mentioning it; there is
> no documentation on using it. It also doesn't seem to support Cassandra
> 0.8. Does it?
>
> I would like to hear from the users here what worked for you. Some real
> world project in production that was good to write in Java, where the
> client was stable and is maintained. What are the success stories of
> using Cassandra with Java? What would you recommend?
>
> Thank you very much!
>
> Best,
> --
> Dani
> Cloud3 Tech - http://cloud3.tc/
> Twitter: @DaniCloud3 @Cloud3Tech

--
Thanks,
- Rajesh Koilpillai
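To make the comparison concrete, here is a minimal write/read sketch against the 0.7/0.8-era Hector API; the cluster name, host, keyspace, and column family are hypothetical, not taken from this thread:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.ColumnQuery;
import me.prettyprint.hector.api.query.QueryResult;

public class HectorSketch {
    public static void main(String[] args) {
        // Hector pools connections and handles failover across the listed hosts.
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
        Keyspace ksp = HFactory.createKeyspace("Keyspace1", cluster);

        // Write one string column.
        Mutator<String> mutator = HFactory.createMutator(ksp, StringSerializer.get());
        mutator.insert("1", "users", HFactory.createStringColumn("name", "test"));

        // Read it back.
        ColumnQuery<String, String, String> query = HFactory.createStringColumnQuery(ksp);
        query.setColumnFamily("users").setKey("1").setName("name");
        QueryResult<HColumn<String, String>> result = query.execute();
        System.out.println(result.get().getValue());
    }
}

Compared with raw Thrift (ColumnPath, ByteBuffer, ConsistencyLevel plumbing), the serializers and query objects are the "new semantics" being debated in this thread; whether they are a gain or added complexity is exactly the disagreement above.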
issue with querying SuperColumn
I am facing an issue querying a super column using the client.get() API, although it works when I try it for a regular column family (rather than a super column family).

It works for:

ColumnFamily: users
  Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Default column value validator: org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period in seconds: 0.0/0
  Key cache size / save period in seconds: 20.0/14400
  Memtable thresholds: 0.2953125/63/1440 (millions of ops/MB/minutes)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: false
  Built indexes: []

Listing users (using cassandra-cli):

[default@key1] list users;
Using default limit of 100
---
RowKey: 1
=> (column=name, value=74657374, timestamp=1308637325517000)

Java code:

String key = "1";
ColumnPath columnPath = new ColumnPath("users");
columnPath.setColumn("name".getBytes());
ColumnOrSuperColumn colName = cassandraClient.get(java.nio.ByteBuffer.wrap(key.getBytes()), columnPath, ConsistencyLevel.ONE);
Column col = colName.getColumn();
System.out.println(new String(col.getValue(), "UTF-8"));

RESULT: I get "test" printed.

BUT when I tried it for the super column family "SuperCli":

ColumnFamily: SuperCli (Super)
  Key Validation Class: org.apache.cassandra.db.marshal.BytesType
  Default column value validator: org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type/org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period in seconds: 0.0/0
  Key cache size / save period in seconds: 20.0/14400
  Memtable thresholds: 0.2953125/63/1440 (millions of ops/MB/minutes)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: false
  Built indexes: []

[default@key1] list SuperCli;
Using default limit of 100
---
RowKey: 31
=> (super_column=address,
     (column=city, value=6e6f696461, timestamp=1308296234977000))
=> (super_column=address1,
     (column=city, value=476e6f696461, timestamp=1308296283221000))
=> (super_column=address2,
     (column=city, value=476e6f696461, timestamp=1308296401951000))

1 Row Returned.

Java code:

ColumnPath columnPath = new ColumnPath("SuperCli");
columnPath.setSuper_column("address".getBytes());
String key = "31";
cassandraClient.get(java.nio.ByteBuffer.wrap(key.getBytes()), columnPath, ConsistencyLevel.ONE);

I am getting an exception:

NotFoundException()
    at org.apache.cassandra.thrift.Cassandra$get_result.read(Cassandra.java:6418)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_get(Cassandra.java:519)
    at org.apache.cassandra.thrift.Cassandra$Client.get(Cassandra.java:492)
    at CasQuery.main(CasQuery.java:112)

Any idea about this issue?

--Vivek
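One hedged guess, inferred from the cli output above rather than confirmed by this thread: SuperCli's key validation class is BytesType, so cassandra-cli prints row keys in hex, and "RowKey: 31" is the single byte 0x31, i.e. the ASCII string "1". The Java code sends the two bytes of the string "31", which would look up a row that does not exist and explain the NotFoundException. A sketch of the lookup under that assumption (same client variable as above):

// The cli shows the row key in hex: 0x31 == ASCII '1',
// so the Thrift key must be the one-byte string "1", not "31".
String key = "1";
ColumnPath columnPath = new ColumnPath("SuperCli");
columnPath.setSuper_column("address".getBytes());
ColumnOrSuperColumn cosc = cassandraClient.get(
        java.nio.ByteBuffer.wrap(key.getBytes()), columnPath, ConsistencyLevel.ONE);
// With only super_column set in the path, get() returns the whole super column.
SuperColumn sc = cosc.getSuper_column();
for (Column c : sc.getColumns()) {
    System.out.println(new String(c.getName(), "UTF-8") + " = "
            + new String(c.getValue(), "UTF-8"));
}

If the rows really were written with the literal key "31", the opposite fix applies: use whatever bytes the writer actually sent.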