Re: Replace Live Node

2011-09-12 Thread Jeremy Hanna
I believe you'd need 2^127 - 1, which is 170141183460469231731687303715884105727 On Sep 12, 2011, at 2:30 PM, Kyle Gibson wrote: > What could you do if the initial_token is 0? > > On Mon, Sep 12, 2011 at 1:09 PM, Jeremy Hanna > wrote: >> Yeah - I would bootstrap at
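The arithmetic behind that figure can be checked with a short sketch. It assumes the RandomPartitioner token space behaves as a ring of size 2^127, so the token just before 0 wraps around to 2^127 - 1; the names RING_SIZE and token_before are illustrative and not part of Cassandra.

```python
# Assumption: RandomPartitioner tokens sit on a ring of size 2**127.
# RING_SIZE and token_before are hypothetical names used for illustration.
RING_SIZE = 2 ** 127


def token_before(token: int) -> int:
    """Return the token immediately preceding `token`, wrapping around at 0."""
    return (token - 1) % RING_SIZE


# The token just before 0 is 2**127 - 1, the value quoted in the message.
print(token_before(0))
```

Python integers are arbitrary precision, so no special big-number type is needed for the 39-digit value.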

Re: Replace Live Node

2011-09-12 Thread Jeremy Hanna
> So to move data from node with token 0, the new node needs to have > initial token set to 170141183460469231731687303715884105727 ? I would do this route. > Another idea: could I move token to 1, and then use token 0 on the new node? nodetool move prior to 0.8 is a very heavy operation.

Re: Tool for SQL -> Cassandra data movement

2011-09-22 Thread Jeremy Hanna
Take a look at http://www.datastax.com/dev/blog/bulk-loading I'm sure there is a way to make it more seamless for what you want to do and it could be built on, but the recent bulk loading additions will provide the best foundation. On Sep 22, 2011, at 12:25 PM, Nehal Mehta wrote: > We are tryi

Hadoop settings if running into blacklisted task trackers with Cassandra

2011-09-24 Thread Jeremy Hanna
I thought I would share something valuable that Jacob Perkins (who recently started with us) shared. We were seeing blacklisted task trackers and occasionally failed jobs. These were almost always based on TimedOutExceptions from Cassandra. We've been fixing underlying reasons for those excep

Re: pig_cassandra problem - "Incompatible field schema" error

2011-10-11 Thread Jeremy Hanna
Just for informational purposes, Pete and I tried to troubleshoot it via twitter. I was able to do the following with Cassandra 0.8.1 and Pig 0.9.1. He's going to dig in to see if there's something else going on. // Cassandra-cli stuff // bin/cassandra-cli -h localhost -p 9160 create keyspace

Re: Massive writes when only reading from Cassandra

2011-10-17 Thread Jeremy Hanna
cable rock in our backpack and hopefully clears up where that setting is actually used. I'll update the storage configuration wiki to include that caveat as well. On Sep 10, 2011, at 5:14 PM, Jeremy Hanna wrote: > Thanks for the insights. I may first try disabling hinted handoff for

Re: Efficient map reduce over ranges of Cassandra data

2011-11-11 Thread Jeremy Hanna
Nice! Thanks Ed. On Nov 10, 2011, at 11:20 PM, Edward Capriolo wrote: > Hey all, > > I know there are several tickets in the pipe that should make it possible to > use secondary indexes to run map reduce jobs that do not have to ingest the > entire dataset such as: > > https://issues.apache.

secondary indexes streaming building - when there are none

2011-11-11 Thread Jeremy Hanna
We're using 0.8.4 in our cluster and two nodes needed rebuilding. When building and streaming data to the nodes, there were multiple instances of building secondary indexes. We haven't had secondary indexes in that keyspace since like mid-August. Is that a bug? Thanks, Jeremy

Re: secondary indexes streaming building - when there are none

2011-11-12 Thread Jeremy Hanna
> On Fri, Nov 11, 2011 at 9:10 PM, Jeremy Hanna > wrote: >> We're using 0.8.4 in our cluster and two nodes needed rebuilding. When >> building and streaming data to the nodes, there were multiple instances of >> building secondary indexes. We haven't had seco

Re: secondary indexes streaming building - when there are none

2011-11-13 Thread Jeremy Hanna
https://issues.apache.org/jira/browse/CASSANDRA-3488 On Nov 12, 2011, at 9:52 AM, Jeremy Hanna wrote: > It sounds like that's just a message in compactionstats that's a no-op. This > is reporting for about an hour that it's building a secondary index on a > specific

Re: Help with Pig Script

2011-11-17 Thread Jeremy Hanna
If you are only interested in loading one row, why do you need to use Pig? Is it an extremely wide row? Unless you are using an ordered partitioner, you can't limit the rows you mapreduce over currently - you have to mapreduce over the whole column family. That will change probably in 1.1. H

Re: Help with Pig Script

2011-11-17 Thread Jeremy Hanna
On Nov 17, 2011, at 1:44 PM, Aaron Griffith wrote: > Jeremy Hanna gmail.com> writes: > >> >> If you are only interested in loading one row, why do you need to use Pig? >> Is > it an extremely wide row? >> >> Unless you are using an ordered

Re: User Survey

2011-11-29 Thread Jeremy Hanna
On Nov 29, 2011, at 12:25 PM, Don Smith wrote: > cli's "show keyspaces" command shows way too much information by default. > > I think by default it should show just one line per keyspace. A "-v" option > could show more info. If you are using 1.x, there is a describe command for specific ke

Cassandra_Jobs on Twitter

2011-11-30 Thread Jeremy Hanna
For those interested in Apache Cassandra related jobs - either hiring or in search of - there is now a @Cassandra_Jobs account on Twitter. You can either send posts to that account on twitter or send them to me at this email address with a public link to the job posting and I will tweet them. Che

Re: Cassandra not suitable?

2011-12-07 Thread Jeremy Hanna
If you're getting lots of timeout exceptions with mapreduce, you might take a look at http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting We saw that and tweaked a variety of things - all of which are listed there. Ultimately, we also boosted hadoop's tolerance for them as well and it

Re: Using Cassandra in Rails App

2011-12-16 Thread Jeremy Hanna
Traditionally there are two places to go. Twitter's ruby client at https://github.com/twitter/cassandra or the newer cql driver at http://code.google.com/a/apache-extras.org/p/cassandra-ruby/. The latter might be nice for green field applications but CQL is still gaining features. Some peopl

Re: cassandra data to hadoop.

2011-12-23 Thread Jeremy Hanna
We do this all the time. Take a look at http://wiki.apache.org/cassandra/HadoopSupport for some details - you can use mapreduce or pig to get data out of cassandra. If it's going to a separate hadoop cluster, I don't think you'd need to co-locate task trackers or data nodes on your cassandra

Re: cassandra data to hadoop.

2011-12-23 Thread Jeremy Hanna
33 AM, Praveen Sadhu wrote: > Have you tried Brisk? > > > > On Dec 23, 2011, at 9:30 AM, "Jeremy Hanna" > wrote: > >> We do this all the time. Take a look at >> http://wiki.apache.org/cassandra/HadoopSupport for some details - you can >> u

Re: Best way to determine how a Cassandra cluster is doing

2011-12-23 Thread Jeremy Hanna
One way to get a good bird's eye view of the cluster would be to install DataStax Opscenter - the community edition is free. You can do a lot of checks from a web interface that are based on the jmx hooks that are in Cassandra. We use it and it's helped us a lot. Hope it helps for what you're

Re: cassandra data to hadoop.

2011-12-24 Thread Jeremy Hanna
to achieve this. > > -R > > On Fri, Dec 23, 2011 at 9:28 AM, Jeremy Hanna > wrote: > We do this all the time. Take a look at > http://wiki.apache.org/cassandra/HadoopSupport for some details - you can use > mapreduce or pig to get data out of cassandra. If it

Re: Cassandra performance question

2011-12-30 Thread Jeremy Hanna
This might be helpful: http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html On Dec 30, 2011, at 1:59 PM, Dom Wong wrote: > Hi, could anyone tell me whether this is possible with Cassandra using an > appropriately sized EC2 cluster. > > 100,000 clients writing 50k each

Re: Hadoop + Cassandra

2012-01-06 Thread Jeremy Hanna
I would first look at http://wiki.apache.org/cassandra/HadoopSupport - you'll want to look in the section on cluster configuration. DataStax also has a product that makes it pretty simple to use Hadoop with Cassandra if you don't mind paying for it - http://www.datastax.com/products/enterprise

Re: Installing C* on EC2

2012-01-13 Thread Jeremy Hanna
On Jan 12, 2012, at 6:36 PM, Mohit Anchlia wrote: > What's the best way to install C*? Any good links? http://www.slideshare.net/mattdennis/cassandra-on-ec2 has some interesting points that aren't immediately obvious - it's mdennis in the cassandra irc channel if you had any questions about th

Re: Hive + Cassandra tutorial

2012-01-23 Thread Jeremy Hanna
Take a look at http://wiki.apache.org/cassandra/HadoopSupport and in the source download of cassandra there's a contrib/pig section that has a wordcount example. On Jan 23, 2012, at 1:16 PM, Tharindu Mathew wrote: > Hi, > > I'm trying to experiment with Hive using Data in Cassandra. Brisk look

Re: General questions about Cassandra

2012-02-17 Thread Jeremy Hanna
MapReduce and Hadoop generally are pluggable so you can do queries over HDFS, over HBase, or over Cassandra. Cassandra has good Hadoop support as outlined here: http://wiki.apache.org/cassandra/HadoopSupport. If you're looking for a simpler solution, there is DataStax's enterprise product whic

Re: newer Cassandra + Hadoop = TimedOutException()

2012-02-24 Thread Jeremy Hanna
Check out the troubleshooting section of the hadoop support - we ran into the same thing and tried to update that with some info on how to get around it: http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting On Feb 24, 2012, at 7:20 AM, Patrik Modesto wrote: > Hi, > > I can see some st

Re: newer Cassandra + Hadoop = TimedOutException()

2012-02-24 Thread Jeremy Hanna
By chance are you in EC2? On Feb 24, 2012, at 8:33 AM, Patrik Modesto wrote: > Hi Jeremy, > > I've seen the page and tried the values but to no help. > > Here goes tcpdump of one failed TCP connection: > > 15:06:20.231421 IP 10.0.18.87.9160 > 10.0.18.87.39396: Flags [P.], seq > 137891735:13790

Re: hadoop map join with ColumnFamilyInputFormat

2012-03-01 Thread Jeremy Hanna
I haven't used that in particular, but it's pretty trivial to do that with Pig and I would imagine it would just do the right thing under the covers. It's a simple join with Pig. We use pygmalion to get data from the Cassandra bag. A simple example would be: DEFINE FromCassandraBag org.pygmal

Re: newer Cassandra + Hadoop = TimedOutException()

2012-03-06 Thread Jeremy Hanna
you may be running into this - https://issues.apache.org/jira/browse/CASSANDRA-3942 - I'm not sure if it really affects the execution of the job itself though. On Mar 6, 2012, at 2:32 AM, Patrik Modesto wrote: > Hi, > > I was recently trying Hadoop job + cassandra-all 0.8.10 again and the > Ti

cassandra_jobs on twitter

2012-04-10 Thread Jeremy Hanna
some time back, I created the account cassandra_jobs on twitter. if you email the user list or better yet just cc cassandra_jobs on twitter, I'll retweet it there so that the information can get out to more people. https://twitter.com/#!/cassandra_jobs cheers, Jeremy

Re: cassandra 0.8.7 + hector 0.8.3: All Quorum reads result in writes?

2012-04-11 Thread Jeremy Hanna
fwiw - we had a similar problem reading at quorum with 0.8.4 when reading with hadoop. The symptom we see is when reading a column family with hadoop using quorum using 0.8.4, we have lots of minor compactions as a result of heavy writes. When we read at CL.ONE or move to 1.0.8 the problem is

Re: cassandra 0.8.7 + hector 0.8.3: All Quorum reads result in writes?

2012-04-11 Thread Jeremy Hanna
I backported this to 0.8.4 and it didn't fix the problem we were seeing (as I outlined in my parallel post) but if it fixes it for you, then beautiful. Just wanted to let you know our experience with similar symptoms. On Apr 11, 2012, at 11:56 AM, Thibaut Britz wrote: > Fixed in https://issue

Re: Source for Cassandra Pig and Hive

2012-05-02 Thread Jeremy Hanna
The hive support is going to be integrated into the main source tree with this ticket: https://issues.apache.org/jira/browse/CASSANDRA-4131 You can go to https://github.com/riptano/hive to find the CassandraStorageHandler right now though. For 1.0.8, the CassandraStorage class for the Pig suppor

Re: Exception when truncate

2012-05-17 Thread Jeremy Hanna
when doing a truncate, it has to talk to all of the nodes in the ring to perform the operation. by the error, it looks like one of the nodes was unreachable for some reason. you might do a nodetool ring, or in the cli do a 'describe cluster;', and see if your ring is okay. So I think the operation

Re: Matthew Dennis's "Cassandra On EC2"

2012-05-17 Thread Jeremy Hanna
Sorry - it was at the austin cassandra meetup and we didn't record the presentation. I wonder if this would be a popular topic to have at the upcoming Cassandra SF event which would be recorded... On May 17, 2012, at 6:51 AM, Tamar Fraenkel wrote: > Hi! > > I found the slides of the lecture

Re: Dynamic CF

2012-07-06 Thread Jeremy Hanna
you can use the cqlsh help but it will eventually refer you to a cql reference such as this one that says what the options are. Looks like you need just 'default_validation'. http://www.datastax.com/docs/1.0/references/cql/index#cql-column-family-storage-parameters On Jul 6, 2012, at 2:13 PM,

Re: Dynamic CF

2012-07-06 Thread Jeremy Hanna
rote: > Thanks Jeremy, but this doesn't work for me. I am using cql3, because I need > new features like composite keys. The manual you pointed to is for 2.0. > I have suspicion that cql3 does not support dynamic tables at all. Is there a > manual for cql3? > > -----Orig

Re: chunk lenght

2013-03-09 Thread Jeremy Hanna
These pages may have some helpful background for you: http://www.datastax.com/docs/1.1/configuration/storage_configuration#compression-options http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression Cheers, Jeremy On Mar 9, 2013, at 9:27 PM, Kanwar Sangha wrote: > Hi – Can some

Re: Cassandra as storage for cache data

2013-06-25 Thread Jeremy Hanna
If you have rapidly expiring data, then tombstones are probably filling your disk and your heap (depending on how you order the data on disk). To check to see if your queries are affected by tombstones, you might try using the query tracing that's built-in to 1.2. See: http://www.datastax.com/d

Re: Too many open files and stopped compaction with many pending compaction tasks

2013-06-27 Thread Jeremy Hanna
Are you on SSDs? On 27 Jun 2013, at 14:24, "Desimpel, Ignace" wrote: > On a test with 3 cassandra servers version 1.2.5 with replication factor 1 > and leveled compaction, I did a store last night and I did not see any > problem with Cassandra. On all 3 machine the compaction is stopped alread

Re: [RELEASE] Apache Cassandra 1.2.8

2013-07-29 Thread Jeremy Hanna
The CHANGES and NEWS links pointed to the 1.2.8-tentative. The 1.2.8 links are: CHANGES.txt: https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/1.2.8 NEWS.txt: https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags

Re: C* 1.0.6 to 1.1.12: upgradesstables or scrub?

2013-08-13 Thread Jeremy Hanna
If you were using leveled compaction on any column families in 1.0, you'll need to run offline scrub on those column families. On 13 Aug 2013, at 15:38, Romain HARDOUIN wrote: > Hi all, > > We are migrating from C* 1.0.6 to 1.1.12 and after reading DataStax > documentation (http://www.datast

Re: bug in Pig LOAD with cqlStorage and param columns? - cassandra 1.2.8 - pig 0.11.1

2013-08-21 Thread Jeremy Hanna
In order to narrow down the problem, I would start without the request parameters and see if that works. Then I would add the request parameters one at a time to see what breaks things. Often pig is not very helpful with its error messages, so I've had to use this method a lot. On 21 Aug 2013

Re: Security?

2013-09-05 Thread Jeremy Hanna
For open-source Cassandra, there is a framework for security (see the security book-thing in the sidebar): http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html For those wanting additional things like auditing and other features, there's DataStax Enterprise: http://www.datasta

Re: Security?

2013-09-05 Thread Jeremy Hanna
1/security/security_features On 5 Sep 2013, at 17:51, "Hartzman, Leslie" wrote: > Thanks for the info. > > So open-source Cassandra does not provide for auditing? > > -Original Message- > From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com] > Sent: Thursd

Cassandra Summit EU 2013

2013-09-30 Thread Jeremy Hanna
For those in the Europe area, there will be a Cassandra Summit EU 2013 in London in the month of October. On 17 October, there will be the main conference sessions and the 16th and 18th there will be Cassandra workshops. http://www.datastax.com/cassandraeurope2013 The speakers have been announ

Re: how can i ran the word count example on windows?

2010-12-01 Thread Jeremy Hanna
There isn't currently, but perhaps you could contribute one :). If you take a look at the sh script in the bin directory of the word count example, it shouldn't be terribly difficult to mimic the behavior. It's mostly just setting up the classpath and executing the Java class with some argumen

Re: Cassandra Map-Reduce

2010-12-04 Thread Jeremy Hanna
I created to explore doing that - it would seem like a reasonable thing to do with a batch/analytic/MR operation. You might chime in to explain your use case on the ticket. https://issues.apache.org/jira/browse/CASSANDRA-1821 On Dec 3, 2010, at 2:33 PM, Sanjay Acharya wrote: > We are in the p

Re: question about 0.7 RC2 CLI

2010-12-13 Thread Jeremy Hanna
I think you need to load the schema from your yaml through the jmx call. See http://wiki.apache.org/cassandra/FAQ#no_keyspaces On Dec 13, 2010, at 9:02 AM, Peter Lin wrote: > I downloaded the latest RC2 to play with. > > I was able to convert my 0.6 storage-conf.xml using the conversion > tool

Re: Create CF in Mapper's setup

2010-12-21 Thread Jeremy Hanna
Download the source version of the latest 0.7 from http://cassandra.apache.org/download/ and take a look at the contrib/word_count example. Specifically, in the contrib/word_count/src/WordCountSetup.java file, there are examples of how to create a column family using thrift. On Dec 21, 2010, a

Re: [RELEASE] 0.7.0 rc3

2010-12-24 Thread Jeremy Hanna
> You should still exercise caution > upgrading anything that matters, but now is the time to test. Please. For those interested in a distributed test harness, several in the Bay Area Cassandra community have started one: https://issues.apache.org/jira/browse/CASSANDRA-1859 On Dec 24, 2010, at

Re: Cassandra gotchas ...

2011-01-08 Thread Jeremy Hanna
> I know that Cassandra is a work in progress and there are many > limitations I can live with, but it would be nice to know what the > roadmap is for the next 12-24 months so we can get an idea of what major > directions Cassandra is going in so we can plan accordingly. Take a look at Jira - htt

Re: can't create a jira ticket for cassandra

2011-01-09 Thread Jeremy Hanna
Hmmm, I've never seen that when creating Jira tickets. You might try to just fill out the basic info first - Summary/Description. Then go in and edit the ticket that was created - that way you can at least create the ticket and bypass whatever error you're seeing. Weird though. On Jan 9, 201

Re: Timeout Errors while running Hadoop over Cassandra

2011-01-13 Thread Jeremy Hanna
On Jan 12, 2011, at 12:40 PM, Jairam Chandar wrote: > Hi folks, > > We have a Cassandra 0.6.6 cluster running in production. We want to run > Hadoop (version 0.20.2) jobs over this cluster in order to generate reports. > I modified the word_count example in the contrib folder of the cassandra

Re: Multi-tenancy, and authentication and authorization

2011-01-18 Thread Jeremy Hanna
Feel free to use that wiki page or another wiki page to collaborate on more pressing multi tenant issues. The wiki is editable by all. The MultiTenant page was meant as a launching point for tracking progress on things we could think of wrt MT. Obviously the memtable problem is the largest co

Re: about the hector client

2011-01-18 Thread Jeremy Hanna
Definitely get involved with that google group, but some examples are found here: https://github.com/zznate/hector-examples/blob/master/src/main/java/com/riptano/cassandra/hector/example/SchemaManipulation.java On Jan 18, 2011, at 10:17 PM, Aaron Morton wrote: > Try the hector user group for hel

current chef recipes for cassandra

2011-02-01 Thread Jeremy Hanna
Is there anyone working with current chef recipes for Cassandra?

Fwd: CFP Fourth IEEE International Scalable Computing Challenge (SCALE 2011) - Deadline 28 Feb 2011

2011-02-01 Thread Jeremy Hanna
Begin forwarded message: > From: Viraj Bhat > Date: February 1, 2011 1:02:23 PM PST > To: "pig-u...@hadoop.apache.org" , > "mapreduce-u...@hadoop.apache.org" , > "mapreduce-...@hadoop.apache.org" , > "hdfs-...@hadoop.apache.org" , > "d...@hive.apache.org" , > "mapreduce-...@hadoop.apache.o

Re: plugins/triggers/coprocessors

2011-02-16 Thread Jeremy Hanna
/CASSANDRA-1311 On Feb 11, 2011, at 1:31 PM, Jeremy Hanna wrote: > So from here I guess it's a matter of working out the comments/concerns > presented on 1311 and any future discussion sounds like it belongs there. > > Like I said, I just wanted to initiate discussion since it had be

Re: Rows and deletion

2011-02-21 Thread Jeremy Hanna
On Feb 21, 2011, at 4:33 PM, Ásgeir Halldórsson wrote: > Thanks for the fast response but that would be quite difficult on paging > results, do you know if there is a fix in the works? I don't think the range ghosts behavior is going away. Is it possible to buffer results and return them once

Re: New Chain for : Does Cassandra use vector clocks

2011-02-25 Thread Jeremy Hanna
Yeah - no worries - I don't think anyone was thinking you were trying to drink kool-aid or selling anything. Jonathan was just pointing out thoughtful replies to his claims. This past year, Michael Stonebraker with voltdb and other things seems to have tried to take advantage of momentum behin

Re: New Chain for : Does Cassandra use vector clocks

2011-02-25 Thread Jeremy Hanna
And everyone has a bias - and I think most people working with any of these solutions realizes that. I think it's interesting how many organizations use multiple data storage solutions versus just using one as they have different capabilities - like the recent Netflix news about using different

Re: counters & v0.8

2011-02-26 Thread Jeremy Hanna
It's in http://svn.apache.org/repos/asf/cassandra/trunk/ if you'd like to try it though that's pretty bleeding edge. Also I'm not sure if the wiki page documents all of the changes that have been made to counters. So the source is the best available docs :). You're welcome to ask any specific

Re: ColumnFamilyRecordWriter

2011-02-28 Thread Jeremy Hanna
There certainly could be a thrift based record writer. However, (if I remember correctly) to enable Hadoop output streaming, it was easier to go with Avro for doing the records as the schema is included. There could also have been a thrift version of the record writer, but it's simpler to just

Re: ColumnFamilyRecordWriter

2011-02-28 Thread Jeremy Hanna
t 10:19 AM, Jeremy Hanna wrote: > There certainly could be a thrift based record writer. However, (if I > remember correctly) to enable Hadoop output streaming, it was easier to go > with Avro for doing the records as the schema is included. There could also > have been a thrift

Integrating Cassandra with other projects/products

2011-03-02 Thread Jeremy Hanna
I started a wiki page for those wishing to let people in the community know about projects/products that integrate with Cassandra. http://wiki.apache.org/cassandra/IntegrationPoints So far listed there are projects like Hadoop (including Pig and hive), Solr/Lucene, Flume, and Scribe. If you wo

Re: Advice on a design

2011-03-03 Thread Jeremy Hanna
Have you considered using Solandra (Solr/Lucene + Cassandra) - https://github.com/tjake/Lucandra#readme ? There is a #solandra channel on freenode if you had any questions as well. On Mar 3, 2011, at 8:00 AM, Vodnok wrote: > Ok seems that i'll use Solr (with dedicated Cassandra) for search >

Re: Aamzon EC2 & Cassandra to ebs or not..

2011-03-09 Thread Jeremy Hanna
I've seen both sides but Cassandra does handle replication and bringing data back is a matter of bootstrapping a node to replace the downed node. One thing to consider is availability zones and regions though. What happens if your entire cluster goes down in the case of a single datacenter go

Re: Cassandra startup port problem, apache-cassandra-0.7.3 on Snow Leopard.

2011-03-10 Thread Jeremy Hanna
Comments in-line. On Mar 10, 2011, at 8:10 PM, Bob Futrelle wrote: > After a reboot, cassandra spits out many lines on startup but then appears to > stall. > > Worse, trying to run cassandra a second time stops immediately because of a > port problem: > > apache-cassandra-0.7.3: sudo ./bin/c

Re: Pig output to Cassandra

2011-03-11 Thread Jeremy Hanna
Yep - it's usable and separate so you should be able to download 0.7-branch and build the jar and use it against a 0.7.3 cluster. I've been using it against a 0.7.2 cluster actually. http://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7/ To use it, check out the readme in the contri

Re: Cassandra still won't start - in-use ports block it

2011-03-11 Thread Jeremy Hanna
I don't know if others have asked this but do you have a firewall running that would prevent access to those ports or something like that? On Mar 11, 2011, at 10:40 PM, Bob Futrelle wrote: > My frustration continues, especially exasperating because so many people just > seem to download Cassand

Re: Map-Reduce on top of cassandra

2011-03-14 Thread Jeremy Hanna
Can you go into the #cassandra channel and ask your question? See if jeromatron or driftx are around. That way there can be a back and forth about settings and things. http://webchat.freenode.net/?channels=#cassandra On Mar 14, 2011, at 10:06 AM, Or Yanay wrote: > Hi All, > > I am trying t

Re: Map-Reduce on top of cassandra

2011-03-14 Thread Jeremy Hanna
Just for the sake of updating this thread - Orr didn't yet have task trackers on the Cassandra nodes so most of the time was likely due to copying the ~100G of data to the hadoop cluster prior to processing. They're going to try after installing task trackers on the nodes. On Mar 14, 2011, at

Re: where to find the stress testing programs?

2011-03-15 Thread Jeremy Hanna
contrib is only in the source download of cassandra On Mar 15, 2011, at 11:23 AM, Jonathan Colby wrote: > According to the Cassandra Wiki and OReilly book supposedly there is a > "contrib" directory within the cassandra download containing the > Python Stress Test script stress.py. It's not in t

Re: Upgrade to a different version?

2011-03-16 Thread Jeremy Hanna
Paul, Don't feel like you have to hold back when it comes to feedback. There is a place to vote on releases. If you have something that could potentially be critical that you can isolate, by all means chime in. Even if your vote isn't binding if you are not a committer, votes with something

Re: hadoop cassandra

2011-03-17 Thread Jeremy Hanna
You can start with a word count example that's only for hdfs. Then you can replace the reducer in that with the ReducerToCassandra that's in the cassandra word_count example. You need to match up your Mapper's output to the Reducer's input and set a couple of configuration variables to tell it

Re: hadoop streaming input

2011-03-17 Thread Jeremy Hanna
I started it and added the tentative patch at the end of October. It needs to be rebased with the current 0.7-branch and completed - it's mostly there. I just tried to abstract some things in the process. I have changed jobs since then and I just haven't had time with the things I've been doi

Re: hadoop streaming input

2011-03-17 Thread Jeremy Hanna
ed possible > problems. > > I may well need to take a crack at this sometime in the next few weeks, but > if somebody beats me to it, I certainly won't complain. > > On Thu, Mar 17, 2011 at 2:06 PM, Jeremy Hanna > wrote: > I started it and added the tentative patc

Re: EC2 - 2 regions

2011-03-21 Thread Jeremy Hanna
I talked to Matt Dennis in the channel about it and I think everyone would like to make sure that cassandra works great across multiple regions. He sounded like he didn't know why it wouldn't work after having looked at the patches. I would like to try it both ways - with and without the patch

Re: EC2 - 2 regions

2011-03-21 Thread Jeremy Hanna
at 10:41 PM, Jeremy Hanna wrote: > Sorry if I was presumptuous earlier. I created a ticket so that the patch > could be submitted and reviewed - that is if it can be generalized so that it > works across regions and doesn't adversely affect the common case. > https://issues.

Re: EC2 - 2 regions

2011-03-22 Thread Jeremy Hanna
s a part of larger patch. I will explain in the > limitation sections about why it is not a general solution; as I find time. > > Regards > Milind > > On Mon, Mar 21, 2011 at 11:42 PM, Jeremy Hanna > wrote: > Sorry if I was presumptuous earlier. I created a ticket s

Re: EC2 - 2 regions

2011-03-22 Thread Jeremy Hanna
't provide decent information between regions, something like this workaround patch is required. Anyway - thanks for the work on this. On Mar 22, 2011, at 8:33 AM, Jeremy Hanna wrote: > Milind, > > Thank you for attaching the patch here, but it would be really nice if you > cou

Re: pig counting question

2011-03-24 Thread Jeremy Hanna
The limit defaults to 1024 but you can set it when you use CassandraStorage in pig, like so: rows = LOAD 'cassandra://Keyspace/ColumnFamily' USING CassandraStorage(4096); or whatever value you wish. Give that a try and see if it gives you more of what you're looking for. On Mar 24, 2011, at 1:16

Re: pig counting question

2011-03-24 Thread Jeremy Hanna
r. Are there plans to make this streaming/paged? > > -Jeffrey > > -Original Message- > From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com] > Sent: Thursday, March 24, 2011 11:34 AM > To: user@cassandra.apache.org > Subject: Re: pig counting question > > Th

Re: pig counting question

2011-03-24 Thread Jeremy Hanna
And if you download the 0.7 branch and build the cassandra_storage.jar in the contrib/pig section with that update, you should be able to use it with your 0.7.3 cluster. Those changes are typically independent of the Cassandra version. On Mar 24, 2011, at 5:49 PM, Jeremy Hanna wrote: > H

Re: pig counting question

2011-03-25 Thread Jeremy Hanna
p the limit up very high (e.g. 1M columns), my Cassandra > starts eating up huge amounts of memory, maxing out my 16GB heap size. I > suspect this is because of the get_range_slices() call from > ColumnFamilyRecordReader. Are there plans to make this streaming/paged? > > -Jeff

Re: pig counting question

2011-03-25 Thread Jeremy Hanna
's my understanding of it; if there's something I'm missing, please let > me know. > > -Jeffrey > > -Original Message- > From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com] > Sent: Friday, March 25, 2011 11:06 AM > To: user@cassandra.apache.org > Subject: Re: p

Re: Ditching Cassandra

2011-04-01 Thread Jeremy Hanna
On Apr 1, 2011, at 10:13 AM, Eric Evans wrote: > On Fri, 2011-04-01 at 09:52 -0500, Jeremiah Jordan wrote: >> Quick comment on libraries for different languages. >> The libraries for different languages should almost ALWAYS look >> different. They should look like what someone using that languag

Re: Ditching Cassandra

2011-04-01 Thread Jeremy Hanna
Speaking of jdbc - there's already a jdbc driver that's been written :) http://svn.apache.org/repos/asf/cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/ On Apr 1, 2011, at 11:21 AM, Moaz Reyad wrote: > See: > > https://svn.apache.org/viewvc/cassandra/trunk/doc/cql/CQL.html?view=co

CQL, 0.8, and need for language drivers

2011-04-12 Thread Jeremy Hanna
As some may have heard, CQL is going to be in 0.8. It's a level of abstraction that will hopefully make the lives of client developers substantially easier. The ideal is to make it so client devs only need to do work to make a client idiomatic to a language or even a framework within a languag

Re: pig + hadoop

2011-04-19 Thread Jeremy Hanna
oh yeah - that's what's going on. what I do is on the machine that I run the pig script from, I set the PIG_CONF variable to my HADOOP_HOME/conf directory and in my mapred-site.xml file found there, I set the three variables. I don't use environment variables when I run against a cluster. On A

Re: pig + hadoop

2011-04-19 Thread Jeremy Hanna
Just as an example: cassandra.thrift.address 10.12.34.56 cassandra.thrift.port 9160 cassandra.partitioner.class org.apache.cassandra.dht.RandomPartitioner On Apr 19, 2011, at 10:28 PM, Jeremy Hanna wrote: > oh yeah - that's what's going o
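The three settings in that example read like name/value pairs whose surrounding markup was lost in the archive. As a rough sketch, assuming they belong in the usual Hadoop property layout of a mapred-site.xml file, the following renders them in that shape. The helper hadoop_properties is hypothetical; the names and values are the ones quoted in the message.

```python
from xml.sax.saxutils import escape


def hadoop_properties(settings):
    """Render name/value pairs as Hadoop-style <property> elements.

    Hypothetical helper for illustration; the layout assumes the
    conventional <configuration>/<property> structure of mapred-site.xml.
    """
    lines = ["<configuration>"]
    for name, value in settings.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(str(value))}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


# Values copied from the message; the address is the poster's example host.
settings = {
    "cassandra.thrift.address": "10.12.34.56",
    "cassandra.thrift.port": 9160,
    "cassandra.partitioner.class": "org.apache.cassandra.dht.RandomPartitioner",
}

xml_text = hadoop_properties(settings)
print(xml_text)
```

Keeping the settings in the site file rather than in environment variables matches the approach described in the preceding message of the thread.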

Pygmalion - a github project for pig + cassandra

2011-04-27 Thread Jeremy Hanna
tuple (name, value)}) - the column names are extracted from the variable names in the Pig script. Both contributed by Jacob Perkins with slight revisions by Jeremy Hanna StringConcat: probably something everyone implements but instead of CONCAT that only does two strings, it does any number of st

Re: Apt repositories

2011-04-27 Thread Jeremy Hanna
Thanks Eric! On Apr 26, 2011, at 7:03 PM, Eric Evans wrote: > On Sat, 2011-04-23 at 16:49 -0700, David Strauss wrote: >> I just noticed that, following the Cassandra 0.8 beta release, the Apt >> repository is encouraging servers in my clusters to upgrade. Beta >> releases should probably be on di

Re: best way to backup

2011-04-28 Thread Jeremy Hanna
one thing we're looking at doing is watching the cassandra data directory and backing up the sstables to s3 when they are created. Some guys at simplegeo started tablesnap that does this: https://github.com/simplegeo/tablesnap What it does is for every sstable that is pushed to s3, it also copi

Re: best way to backup

2011-04-29 Thread Jeremy Hanna
SSTables as they are created, and drop > them in S3. > > Whatever you do, make sure you have a regular process to restore the > data and verify that it contains what you think it should... > > Adrian > > On Thu, Apr 28, 2011 at 1:35 PM, Jeremy Hanna > wrote: >>

Re: Experiences with Map&Reduce Stress Tests

2011-04-29 Thread Jeremy Hanna
It sounds like there might be some tuning you can do to your jobs - take a look at the wiki's HadoopSupport page, specifically the Troubleshooting section: http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting On Apr 29, 2011, at 11:45 AM, Subscriber wrote: > Hi all, > > We want to sh

Re: Experiences with Map&Reduce Stress Tests

2011-05-02 Thread Jeremy Hanna
ay 2, 2011, at 6:25 AM, Subscriber wrote: > Hi Jeremy, > > thanks for the link. > I doubled the rpc_timeout (20 seconds) and reduced the range-batch-size to > 2048, but I still get timeouts... > > Udo > > Am 29.04.2011 um 18:53 schrieb Jeremy Hanna: > >>

Re: Experiences with Map&Reduce Stress Tests

2011-05-03 Thread Jeremy Hanna
found in the system.logs that the ConcurrentMarkSweeps take quite long (up > to 8 seconds). The heap size didn't grow much about 3GB so there was still > "enough air to breath". > > So the question remains: can I recommend this setup? > > Thanks again and best re

Re: Migrating all rows from 0.6.13 to 0.7.5 over thrift?

2011-05-06 Thread Jeremy Hanna
If you're able, go into the #cassandra channel on freenode (IRC) and talk to driftx or jbellis or aaron_morton about your problem. It could be that you don't have to do all of this based on a conversation there. On May 6, 2011, at 5:04 AM, Henrik Schröder wrote: > I'll see if I can make some e
