I believe you'd need 2^127 - 1, which is 170141183460469231731687303715884105727
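(For reference: RandomPartitioner tokens run from 0 up to 2^127 - 1, so that value is the last point on the ring, immediately "before" token 0 as the ring wraps. For a balanced N-node ring you'd give node i the token i * 2^127 / N - e.g. 0 and 2^126 = 85070591730234615865843651857942052864 for two nodes.)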
On Sep 12, 2011, at 2:30 PM, Kyle Gibson wrote:
> What could you do if the initial_token is 0?
>
> On Mon, Sep 12, 2011 at 1:09 PM, Jeremy Hanna
> wrote:
>> Yeah - I would bootstrap at
> So to move data from node with token 0, the new node needs to have
> initial token set to 170141183460469231731687303715884105727 ?
I would go this route.
> Another idea: could I move token to 1, and then use token 0 on the new node?
nodetool move prior to 0.8 is a very heavy operation.
Take a look at http://www.datastax.com/dev/blog/bulk-loading
I'm sure there is a way to make it more seamless for what you want to do and it
could be built on, but the recent bulk loading additions will provide the best
foundation.
On Sep 22, 2011, at 12:25 PM, Nehal Mehta wrote:
> We are tryi
I thought I would share something valuable that Jacob Perkins (who recently
started with us) shared. We were seeing blacklisted task trackers and
occasionally failed jobs. These were almost always based on TimedOutExceptions
from Cassandra. We've been fixing underlying reasons for those excep
Just for informational purposes, Pete and I tried to troubleshoot it via
twitter. I was able to do the following with Cassandra 0.8.1 and Pig 0.9.1.
He's going to dig in to see if there's something else going on.
// Cassandra-cli stuff
// bin/cassandra-cli -h localhost -p 9160
create keyspace
cable rock in our backpack and hopefully clears up where that setting is
actually used. I'll update the storage configuration wiki to include that
caveat as well.
On Sep 10, 2011, at 5:14 PM, Jeremy Hanna wrote:
> Thanks for the insights. I may first try disabling hinted handoff for
Nice! Thanks Ed.
On Nov 10, 2011, at 11:20 PM, Edward Capriolo wrote:
> Hey all,
>
> I know there are several tickets in the pipe that should make it possible to
> use secondary indexes to run map reduce jobs that do not have to ingest the
> entire dataset such as:
>
> https://issues.apache.
We're using 0.8.4 in our cluster and two nodes needed rebuilding. When
building and streaming data to the nodes, there were multiple instances of
building secondary indexes. We haven't had secondary indexes in that keyspace
since like mid-August. Is that a bug?
Thanks,
Jeremy
> On Fri, Nov 11, 2011 at 9:10 PM, Jeremy Hanna
> wrote:
>> We're using 0.8.4 in our cluster and two nodes needed rebuilding. When
>> building and streaming data to the nodes, there were multiple instances of
>> building secondary indexes. We haven't had seco
https://issues.apache.org/jira/browse/CASSANDRA-3488
On Nov 12, 2011, at 9:52 AM, Jeremy Hanna wrote:
> It sounds like that's just a message in compactionstats that's a no-op. This
> is reporting for about an hour that it's building a secondary index on a
> specific
If you are only interested in loading one row, why do you need to use Pig? Is
it an extremely wide row?
Unless you are using an ordered partitioner, you can't limit the rows you
mapreduce over currently - you have to mapreduce over the whole column family.
That will change probably in 1.1. H
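As a rough Pig sketch of what that means today (keyspace, column family, and key are made-up placeholders): you load the whole column family and filter down to the row you want -
rows = LOAD 'cassandra://MyKeyspace/MyColumnFamily' USING CassandraStorage();
one_row = FILTER rows BY (chararray) $0 == 'the-row-key';
DUMP one_row;
The FILTER runs after the full scan, so it trims the output but doesn't avoid reading the whole column family.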
On Nov 17, 2011, at 1:44 PM, Aaron Griffith wrote:
> Jeremy Hanna gmail.com> writes:
>
>>
>> If you are only interested in loading one row, why do you need to use Pig?
>> Is
> it an extremely wide row?
>>
>> Unless you are using an ordered
On Nov 29, 2011, at 12:25 PM, Don Smith wrote:
> cli's "show keyspaces" command shows way too much information by default.
>
> I think by default it should show just one line per keyspace. A "-v" option
> could show more info.
If you are using 1.x, there is a describe command for specific ke
For those interested in Apache Cassandra related jobs - either hiring or in
search of - there is now a @Cassandra_Jobs account on Twitter. You can
either send posts to that account on twitter or send them to me at this
email address with a public link to the job posting and I will tweet them.
Che
If you're getting lots of timeout exceptions with mapreduce, you might take a
look at http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
We saw that and tweaked a variety of things - all of which are listed there.
Ultimately, we also boosted hadoop's tolerance for them and it
Traditionally there are two places to go. Twitter's ruby client at
https://github.com/twitter/cassandra or the newer cql driver at
http://code.google.com/a/apache-extras.org/p/cassandra-ruby/. The latter might
be nice for green field applications but CQL is still gaining features. Some
peopl
We do this all the time. Take a look at
http://wiki.apache.org/cassandra/HadoopSupport for some details - you can use
mapreduce or pig to get data out of cassandra. If it's going to a separate
hadoop cluster, I don't think you'd need to co-locate task trackers or data
nodes on your cassandra
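As a minimal Pig sketch of getting data out (keyspace, column family, and path are made-up placeholders):
rows = LOAD 'cassandra://MyKeyspace/MyColumnFamily' USING CassandraStorage();
STORE rows INTO '/user/hadoop/my_cf_dump' USING PigStorage();
CassandraStorage is the loader in cassandra's contrib/pig - REGISTER its jar before running.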
33 AM, Praveen Sadhu wrote:
> Have you tried Brisk?
>
>
>
> On Dec 23, 2011, at 9:30 AM, "Jeremy Hanna"
> wrote:
>
>> We do this all the time. Take a look at
>> http://wiki.apache.org/cassandra/HadoopSupport for some details - you can
>> u
One way to get a good bird's eye view of the cluster would be to install
DataStax Opscenter - the community edition is free. You can do a lot of checks
from a web interface that are based on the jmx hooks that are in Cassandra. We
use it and it's helped us a lot. Hope it helps for what you're
to achieve this.
>
> -R
>
> On Fri, Dec 23, 2011 at 9:28 AM, Jeremy Hanna
> wrote:
> We do this all the time. Take a look at
> http://wiki.apache.org/cassandra/HadoopSupport for some details - you can use
> mapreduce or pig to get data out of cassandra. If it
This might be helpful:
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
On Dec 30, 2011, at 1:59 PM, Dom Wong wrote:
> Hi, could anyone tell me whether this is possible with Cassandra using an
> appropriately sized EC2 cluster.
>
> 100,000 clients writing 50k each
I would first look at http://wiki.apache.org/cassandra/HadoopSupport - you'll
want to look in the section on cluster configuration. DataStax also has a
product that makes it pretty simple to use Hadoop with Cassandra if you don't
mind paying for it - http://www.datastax.com/products/enterprise
On Jan 12, 2012, at 6:36 PM, Mohit Anchlia wrote:
> What's the best way to install C*? Any good links?
http://www.slideshare.net/mattdennis/cassandra-on-ec2 has some interesting
points that aren't immediately obvious - it's mdennis in the cassandra irc
channel if you had any questions about th
Take a look at http://wiki.apache.org/cassandra/HadoopSupport and in the source
download of cassandra there's a contrib/pig section that has a wordcount
example.
On Jan 23, 2012, at 1:16 PM, Tharindu Mathew wrote:
> Hi,
>
> I'm trying to experiment with Hive using Data in Cassandra. Brisk look
MapReduce and Hadoop generally are pluggable so you can do queries over HDFS,
over HBase, or over Cassandra. Cassandra has good Hadoop support as outlined
here: http://wiki.apache.org/cassandra/HadoopSupport. If you're looking for a
simpler solution, there is DataStax's enterprise product whic
Check out the troubleshooting section of the hadoop support - we ran into the
same thing and tried to update that with some info on how to get around it:
http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
On Feb 24, 2012, at 7:20 AM, Patrik Modesto wrote:
> Hi,
>
> I can see some st
By chance are you in EC2?
On Feb 24, 2012, at 8:33 AM, Patrik Modesto wrote:
> Hi Jeremy,
>
> I've seen the page and tried the values but to no help.
>
> Here goes tcpdump of one failed TCP connection:
>
> 15:06:20.231421 IP 10.0.18.87.9160 > 10.0.18.87.39396: Flags [P.], seq
> 137891735:13790
I haven't used that in particular, but it's pretty trivial to do that with Pig
and I would imagine it would just do the right thing under the covers. It's a
simple join with Pig. We use pygmalion to get data from the Cassandra bag. A
simple example would be:
DEFINE FromCassandraBag org.pygmal
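A minimal sketch of such a join with made-up names - CassandraStorage yields (key, columns) tuples, so two column families can be joined on the row key:
users = LOAD 'cassandra://MyKeyspace/Users' USING CassandraStorage();
orders = LOAD 'cassandra://MyKeyspace/Orders' USING CassandraStorage();
joined = JOIN users BY $0, orders BY $0;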
you may be running into this -
https://issues.apache.org/jira/browse/CASSANDRA-3942 - I'm not sure if it
really affects the execution of the job itself though.
On Mar 6, 2012, at 2:32 AM, Patrik Modesto wrote:
> Hi,
>
> I was recently trying Hadoop job + cassandra-all 0.8.10 again and the
> Ti
some time back, I created the account cassandra_jobs on twitter. if you email
the user list or better yet just cc cassandra_jobs on twitter, I'll retweet it
there so that the information can get out to more people.
https://twitter.com/#!/cassandra_jobs
cheers,
Jeremy
fwiw - we had a similar problem reading at quorum with 0.8.4 when reading with
hadoop. The symptom we saw when reading a column family with hadoop at quorum
on 0.8.4 was lots of minor compactions as a result of heavy writes. When we
read at CL.ONE or move to 1.0.8 the problem is
I backported this to 0.8.4 and it didn't fix the problem we were seeing (as I
outlined in my parallel post) but if it fixes it for you, then beautiful. Just
wanted to let you know our experience with similar symptoms.
On Apr 11, 2012, at 11:56 AM, Thibaut Britz wrote:
> Fixed in https://issue
The hive support is going to be integrated into the main source tree with this
ticket:
https://issues.apache.org/jira/browse/CASSANDRA-4131
You can go to https://github.com/riptano/hive to find the
CassandraStorageHandler right now though.
For 1.0.8, the CassandraStorage class for the Pig suppor
when doing a truncate, it has to talk to all of the nodes in the ring to
perform the operation. by the error, it looks like one of the nodes was
unreachable for some reason. you might do a nodetool ring, and in the cli do a
'describe cluster;', to see if your ring is okay.
So I think the operation
Sorry - it was at the austin cassandra meetup and we didn't record the
presentation. I wonder if this would be a popular topic to have at the
upcoming Cassandra SF event which would be recorded...
On May 17, 2012, at 6:51 AM, Tamar Fraenkel wrote:
> Hi!
>
> I found the slides of the lecture
you can use the cqlsh help but it will eventually refer you to a cql reference
such as this one that says what the options are. Looks like you need just
'default_validation'.
http://www.datastax.com/docs/1.0/references/cql/index#cql-column-family-storage-parameters
On Jul 6, 2012, at 2:13 PM,
rote:
> Thanks Jeremy, but this doesn't work for me. I am using cql3, because I need
> new features like composite keys. The manual you pointed to is for 2.0.
> I have suspicion that cql3 does not support dynamic tables at all. Is there a
> manual for cql3?
>
> -----Orig
These pages may have some helpful background for you:
http://www.datastax.com/docs/1.1/configuration/storage_configuration#compression-options
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression
Cheers,
Jeremy
On Mar 9, 2013, at 9:27 PM, Kanwar Sangha wrote:
> Hi – Can some
If you have rapidly expiring data, then tombstones are probably filling your
disk and your heap (depending on how you order the data on disk). To check to
see if your queries are affected by tombstones, you might try using the query
tracing that's built-in to 1.2.
See:
http://www.datastax.com/d
Are you on SSDs?
On 27 Jun 2013, at 14:24, "Desimpel, Ignace" wrote:
> On a test with 3 cassandra servers version 1.2.5 with replication factor 1
> and leveled compaction, I did a store last night and I did not see any
> problem with Cassandra. On all 3 machine the compaction is stopped alread
The CHANGES and NEWS links pointed to the 1.2.8-tentative tag.
The 1.2.8 links are:
CHANGES.txt:
https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/1.2.8
NEWS.txt:
https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags
If you were using leveled compaction on any column families in 1.0, you'll need
to run offline scrub on those column families.
On 13 Aug 2013, at 15:38, Romain HARDOUIN wrote:
> Hi all,
>
> We are migrating from C* 1.0.6 to 1.1.12 and after reading DataStax
> documentation (http://www.datast
In order to narrow down the problem, I would start without the request
parameters and see if that works. Then I would add the request parameters one
at a time to see what breaks things. Often pig is not very helpful with its
error messages, so I've had to use this method a lot.
On 21 Aug 2013
For open-source Cassandra, there is a framework for security (see the security
book-thing in the sidebar):
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html
For those wanting additional things like auditing and other features, there's
DataStax Enterprise:
http://www.datasta
1/security/security_features
On 5 Sep 2013, at 17:51, "Hartzman, Leslie"
wrote:
> Thanks for the info.
>
> So open-source Cassandra does not provide for auditing?
>
> -Original Message-
> From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com]
> Sent: Thursd
For those in the Europe area, there will be a Cassandra Summit EU 2013 in
London in the month of October. On 17 October there will be the main
conference sessions, and on the 16th and 18th there will be Cassandra workshops.
http://www.datastax.com/cassandraeurope2013
The speakers have been announ
There isn't currently, but perhaps you could contribute one :). If you take a
look at the sh script in the bin directory of the word count example, it
shouldn't be terribly difficult to mimic the behavior. It's mostly just
setting up the classpath and executing the Java class with some argumen
There's a ticket I created to explore doing that - it would seem like a reasonable thing to do
with a batch/analytic/MR operation. You might chime in to explain your use
case on the ticket.
https://issues.apache.org/jira/browse/CASSANDRA-1821
On Dec 3, 2010, at 2:33 PM, Sanjay Acharya wrote:
> We are in the p
I think you need to load the schema from your yaml through the jmx call. See
http://wiki.apache.org/cassandra/FAQ#no_keyspaces
On Dec 13, 2010, at 9:02 AM, Peter Lin wrote:
> I downloaded the latest RC2 to play with.
>
> I was able to convert my 0.6 storage-conf.xml using the conversion
> tool
Download the source version of the latest 0.7 from
http://cassandra.apache.org/download/ and take a look at the contrib/word_count
example. Specifically, in the contrib/word_count/src/WordCountSetup.java file,
there are examples of how to create a column family using thrift.
On Dec 21, 2010, a
> You should still exercise caution
> upgrading anything that matters, but now is the time to test. Please.
For those interested in a distributed test harness, several in the Bay Area
Cassandra community have started one:
https://issues.apache.org/jira/browse/CASSANDRA-1859
On Dec 24, 2010, at
> I know that Cassandra is a work in progress and there are many
> limitations I can live with, but it would be nice to know what the
> roadmap is for the next 12-24 months so we can get an idea of what major
> directions Cassandra is going in so we can plan accordingly.
Take a look at Jira - htt
Hmmm, I've never seen that when creating Jira tickets. You might try to just
fill out the basic info first - Summary/Description. Then go in and edit the
ticket that was created - that way you can at least create the ticket and
bypass whatever error you're seeing. Weird though.
On Jan 9, 201
On Jan 12, 2011, at 12:40 PM, Jairam Chandar wrote:
> Hi folks,
>
> We have a Cassandra 0.6.6 cluster running in production. We want to run
> Hadoop (version 0.20.2) jobs over this cluster in order to generate reports.
> I modified the word_count example in the contrib folder of the cassandra
Feel free to use that wiki page or another wiki page to collaborate on more
pressing multi tenant issues. The wiki is editable by all. The MultiTenant
page was meant as a launching point for tracking progress on things we could
think of wrt MT.
Obviously the memtable problem is the largest co
Definitely get involved with that google group, but some examples are found
here:
https://github.com/zznate/hector-examples/blob/master/src/main/java/com/riptano/cassandra/hector/example/SchemaManipulation.java
On Jan 18, 2011, at 10:17 PM, Aaron Morton wrote:
> Try the hector user group for hel
Is there anyone working with current Chef recipes for Cassandra?
Begin forwarded message:
> From: Viraj Bhat
> Date: February 1, 2011 1:02:23 PM PST
> To: "pig-u...@hadoop.apache.org" ,
> "mapreduce-u...@hadoop.apache.org" ,
> "mapreduce-...@hadoop.apache.org" ,
> "hdfs-...@hadoop.apache.org" ,
> "d...@hive.apache.org" ,
> "mapreduce-...@hadoop.apache.o
/CASSANDRA-1311
On Feb 11, 2011, at 1:31 PM, Jeremy Hanna wrote:
> So from here I guess it's a matter of working out the comments/concerns
> presented on 1311 and any future discussion sounds like it belongs there.
>
> Like I said, I just wanted to initiate discussion since it had be
On Feb 21, 2011, at 4:33 PM, Ásgeir Halldórsson wrote:
> Thanks for the fast response but that would be quite difficult on paging
> results, do you know if there is a fix in the works?
I don't think the range ghosts behavior is going away. Is it possible to
buffer results and return them once
Yeah - no worries - I don't think anyone was thinking you were trying to drink
the kool-aid or sell anything. Jonathan was just pointing out thoughtful
replies to his claims.
This past year, Michael Stonebraker with voltdb and other things seems to have
tried to take advantage of momentum behin
And everyone has a bias - and I think most people working with any of these
solutions realize that.
I think it's interesting how many organizations use multiple data storage
solutions versus just using one as they have different capabilities - like the
recent Netflix news about using different
It's in http://svn.apache.org/repos/asf/cassandra/trunk/ if you'd like to try
it, though that's pretty bleeding edge. Also I'm not sure if the wiki page
documents all of the changes that have been made to counters. So the source is
the best available docs :). You're welcome to ask any specific
There certainly could be a thrift based record writer. However, (if I remember
correctly) to enable Hadoop output streaming, it was easier to go with Avro for
doing the records as the schema is included. There could also have been a
thrift version of the record writer, but it's simpler to just
t 10:19 AM, Jeremy Hanna wrote:
> There certainly could be a thrift based record writer. However, (if I
> remember correctly) to enable Hadoop output streaming, it was easier to go
> with Avro for doing the records as the schema is included. There could also
> have been a thrift
I started a wiki page for those wishing to let people in the community know
about projects/products that integrate with Cassandra.
http://wiki.apache.org/cassandra/IntegrationPoints
So far listed there are projects like Hadoop (including Pig and Hive),
Solr/Lucene, Flume, and Scribe.
If you wo
Have you considered using Solandra (Solr/Lucene + Cassandra) -
https://github.com/tjake/Lucandra#readme ? There is a #solandra channel on
freenode if you had any questions as well.
On Mar 3, 2011, at 8:00 AM, Vodnok wrote:
> Ok seems that i'll use Solr (with dedicated Cassandra) for search
>
I've seen both sides, but Cassandra does handle replication, and bringing data
back is a matter of bootstrapping a node to replace the downed node.
One thing to consider is availability zones and regions though. What happens
if your entire cluster goes down in the case of a single datacenter go
Comments in-line.
On Mar 10, 2011, at 8:10 PM, Bob Futrelle wrote:
> After a reboot, cassandra spits out many lines on startup but then appears to
> stall.
>
> Worse, trying to run cassandra a second time stops immediately because of a
> port problem:
>
> apache-cassandra-0.7.3: sudo ./bin/c
Yep - it's usable and separate so you should be able to download 0.7-branch and
build the jar and use it against a 0.7.3 cluster. I've been using it against a
0.7.2 cluster actually.
http://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7/
To use it, check out the readme in the contri
I don't know if others have asked this but do you have a firewall running that
would prevent access to those ports or something like that?
On Mar 11, 2011, at 10:40 PM, Bob Futrelle wrote:
> My frustration continues, especially exasperating because so many people just
> seem to download Cassand
Can you go into the #cassandra channel and ask your question? See if
jeromatron or driftx are around. That way there can be a back and forth about
settings and things.
http://webchat.freenode.net/?channels=#cassandra
On Mar 14, 2011, at 10:06 AM, Or Yanay wrote:
> Hi All,
>
> I am trying t
Just for the sake of updating this thread - Or didn't yet have task trackers
on the Cassandra nodes so most of the time was likely due to copying the ~100G
of data to the hadoop cluster prior to processing. They're going to try after
installing task trackers on the nodes.
On Mar 14, 2011, at
contrib is only in the source download of cassandra
On Mar 15, 2011, at 11:23 AM, Jonathan Colby wrote:
> According to the Cassandra Wiki and OReilly book supposedly there is a
> "contrib" directory within the cassandra download containing the
> Python Stress Test script stress.py. It's not in t
Paul,
Don't feel like you have to hold back when it comes to feedback. There is a
place to vote on releases. If you have something that could potentially be
critical that you can isolate, by all means chime in. Even if your vote isn't
binding because you are not a committer, votes with something
You can start with a word count example that's only for hdfs. Then you can
replace the reducer in that with the ReducerToCassandra that's in the cassandra
word_count example. You need to match up your Mapper's output to the Reducer's
input and set a couple of configuration variables to tell it
I started it and added the tentative patch at the end of October. It needs to
be rebased with the current 0.7-branch and completed - it's mostly there. I
just tried to abstract some things in the process.
I have changed jobs since then and I just haven't had time with the things I've
been doi
ed possible
> problems.
>
> I may well need to take a crack at this sometime in the next few weeks, but
> if somebody beats me to it, I certainly won't complain.
>
> On Thu, Mar 17, 2011 at 2:06 PM, Jeremy Hanna
> wrote:
> I started it and added the tentative patc
I talked to Matt Dennis in the channel about it and I think everyone would like
to make sure that cassandra works great across multiple regions. He sounded
like he didn't know why it wouldn't work after having looked at the patches. I
would like to try it both ways - with and without the patch
at 10:41 PM, Jeremy Hanna wrote:
> Sorry if I was presumptuous earlier. I created a ticket so that the patch
> could be submitted and reviewed - that is if it can be generalized so that it
> works across regions and doesn't adversely affect the common case.
> https://issues.
s a part of larger patch. I will explain in the
> limitation sections about why it is not a general solution; as I find time.
>
> Regards
> Milind
>
> On Mon, Mar 21, 2011 at 11:42 PM, Jeremy Hanna
> wrote:
> Sorry if I was presumptuous earlier. I created a ticket s
't provide decent information
between regions, something like this workaround patch is required.
Anyway - thanks for the work on this.
On Mar 22, 2011, at 8:33 AM, Jeremy Hanna wrote:
> Milind,
>
> Thank you for attaching the patch here, but it would be really nice if you
> cou
The limit defaults to 1024 but you can set it when you use CassandraStorage in
pig, like so:
rows = LOAD 'cassandra://Keyspace/ColumnFamily' USING CassandraStorage(4096);
or whatever value you wish.
Give that a try and see if it gives you more of what you're looking for.
On Mar 24, 2011, at 1:16
r. Are there plans to make this streaming/paged?
>
> -Jeffrey
>
> -Original Message-
> From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com]
> Sent: Thursday, March 24, 2011 11:34 AM
> To: user@cassandra.apache.org
> Subject: Re: pig counting question
>
> Th
And if you download the 0.7 branch and build the cassandra_storage.jar in the
contrib/pig section with that update, you should be able to use it with your
0.7.3 cluster. Those changes are typically independent of the Cassandra
version.
On Mar 24, 2011, at 5:49 PM, Jeremy Hanna wrote:
> H
p the limit up very high (e.g. 1M columns), my Cassandra
> starts eating up huge amounts of memory, maxing out my 16GB heap size. I
> suspect this is because of the get_range_slices() call from
> ColumnFamilyRecordReader. Are there plans to make this streaming/paged?
>
> -Jeff
> that's my understanding of it; if there's something I'm missing, please let
> me know.
>
> -Jeffrey
>
> -Original Message-
> From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com]
> Sent: Friday, March 25, 2011 11:06 AM
> To: user@cassandra.apache.org
> Subject: Re: p
On Apr 1, 2011, at 10:13 AM, Eric Evans wrote:
> On Fri, 2011-04-01 at 09:52 -0500, Jeremiah Jordan wrote:
>> Quick comment on libraries for different languages.
>> The libraries for different languages should almost ALWAYS look
>> different. They should look like what someone using that languag
Speaking of jdbc - there's already a jdbc driver that's been written :)
http://svn.apache.org/repos/asf/cassandra/trunk/drivers/java/src/org/apache/cassandra/cql/jdbc/
On Apr 1, 2011, at 11:21 AM, Moaz Reyad wrote:
> See:
>
> https://svn.apache.org/viewvc/cassandra/trunk/doc/cql/CQL.html?view=co
As some may have heard, CQL is going to be in 0.8. It's a level of abstraction
that will hopefully make the lives of client developers substantially easier.
The ideal is to make it so client devs only need to do work to make a client
idiomatic to a language or even a framework within a languag
oh yeah - that's what's going on. what I do is, on the machine that I run the
pig script from, I set the PIG_CONF variable to my HADOOP_HOME/conf directory,
and in my mapred-site.xml file found there, I set the three variables.
I don't use environment variables when I run against a cluster.
On A
Just as an example:
<property>
  <name>cassandra.thrift.address</name>
  <value>10.12.34.56</value>
</property>
<property>
  <name>cassandra.thrift.port</name>
  <value>9160</value>
</property>
<property>
  <name>cassandra.partitioner.class</name>
  <value>org.apache.cassandra.dht.RandomPartitioner</value>
</property>
On Apr 19, 2011, at 10:28 PM, Jeremy Hanna wrote:
> oh yeah - that's what's going o
tuple
(name, value)}) - the column names are extracted from the variable names in the
Pig script.
Both contributed by Jacob Perkins with slight revisions by Jeremy Hanna
StringConcat: probably something everyone implements, but unlike CONCAT, which
only does two strings, it does any number of st
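From memory - treat the exact signature as an assumption and check the pygmalion README - FromCassandraBag usage looks roughly like this, with made-up column names:
DEFINE FromCassandraBag org.pygmalion.udf.FromCassandraBag();
rows = LOAD 'cassandra://MyKeyspace/Users' USING CassandraStorage()
       AS (key, columns: bag {column: tuple (name, value)});
people = FOREACH rows GENERATE key,
         FLATTEN(FromCassandraBag('first_name,last_name', columns))
         AS (first_name, last_name);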
Thanks Eric!
On Apr 26, 2011, at 7:03 PM, Eric Evans wrote:
> On Sat, 2011-04-23 at 16:49 -0700, David Strauss wrote:
>> I just noticed that, following the Cassandra 0.8 beta release, the Apt
>> repository is encouraging servers in my clusters to upgrade. Beta
>> releases should probably be on di
one thing we're looking at doing is watching the cassandra data directory and
backing up the sstables to s3 when they are created. Some guys at simplegeo
started tablesnap, which does this:
https://github.com/simplegeo/tablesnap
What it does is for every sstable that is pushed to s3, it also copi
SSTables as they are created, and drop
> them in S3.
>
> Whatever you do, make sure you have a regular process to restore the
> data and verify that it contains what you think it should...
>
> Adrian
>
> On Thu, Apr 28, 2011 at 1:35 PM, Jeremy Hanna
> wrote:
>>
It sounds like there might be some tuning you can do to your jobs - take a look
at the wiki's HadoopSupport page, specifically the Troubleshooting section:
http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting
On Apr 29, 2011, at 11:45 AM, Subscriber wrote:
> Hi all,
>
> We want to sh
ay 2, 2011, at 6:25 AM, Subscriber wrote:
> Hi Jeremy,
>
> thanks for the link.
> I doubled the rpc_timeout (20 seconds) and reduced the range-batch-size to
> 2048, but I still get timeouts...
>
> Udo
>
> Am 29.04.2011 um 18:53 schrieb Jeremy Hanna:
>
>>
found in the system.logs that the ConcurrentMarkSweeps take quite long (up
> to 8 seconds). The heap size didn't grow much above 3GB so there was still
> "enough air to breath".
>
> So the question remains: can I recommend this setup?
>
> Thanks again and best re
If you're able, go into the #cassandra channel on freenode (IRC) and talk to
driftx or jbellis or aaron_morton about your problem. It could be that you
don't have to do all of this based on a conversation there.
On May 6, 2011, at 5:04 AM, Henrik Schröder wrote:
> I'll see if I can make some e