Theory aside, I switched from a RAID of ephemerals for data plus the root
volume for the commit log to a single EBS-backed SSD, with no noticeable
impact on performance.
will
On Thu, Sep 4, 2014 at 9:35 PM, Steve Robenalt
wrote:
> Yes, I am aware there are no heads on an SSD. I also have seen plenty of
IPs) or private only.
>
> Any insight regarding snitches? What snitch do you guys use?
>
>
> 2014-06-05 15:06 GMT+02:00 William Oberman >:
>
>> I don't think traffic will flow between "classic" ec2 and vpc directly.
>> There is some kind of gateway
I don't think traffic will flow between "classic" ec2 and vpc directly.
There is some kind of gateway bridge instance that sits between, acting as
a NAT. I would think that would cause new challenges for:
-transitions
-clients
Sorry this response isn't heavy on content! I'm curious how this thr
I'm concerned about the bad reports of using shuffle to do a vnode upgrade
(and I did a "smoke test" trying shuffle a test cluster, and had out of
disk space issues).
I then started to plan out the "dual DC" upgrade path, but I wonder if this
option is easier:
Starting point: N node cluster, no v
I found this:
http://mail-archives.apache.org/mod_mbox/cassandra-user/201404.mbox/%3ccaeduwd1erq-1m-kfj6ubzsbeser8dwh+g-kgdpstnbgqsqc...@mail.gmail.com%3E
I read the three referenced cases. In addition, case 4123 references:
http://www.mail-archive.com/dev@cassandra.apache.org/msg03844.html
And
the source for NTS), as NTS does
skip over the same rack (though, it will allow multiple in the same rack if
you "fill up"... I guess if someone did DC:4 with 3 racks they'll always
get one rack with two copies of the data, for example).
will
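The rack-placement rule described above can be sketched in a few lines. This is a simplified model, not Cassandra's actual NetworkTopologyStrategy source: walk the ring in token order, prefer nodes on racks that don't yet hold a replica, and reuse a rack only once every rack is represented.

```python
# Simplified model (not Cassandra's actual source) of rack-aware replica
# placement within one DC: walk the ring in token order, prefer racks that
# don't yet hold a replica, and only reuse a rack once all racks are used.

def pick_replicas(ring, rf):
    """ring: list of (node, rack) pairs in token order; returns rf nodes."""
    replicas, seen_racks, skipped = [], set(), []
    for node, rack in ring:
        if len(replicas) == rf:
            break
        if rack in seen_racks:
            skipped.append(node)  # held back for the "fill up" pass
        else:
            replicas.append(node)
            seen_racks.add(rack)
    # fewer racks than rf: fill up from the skipped nodes, in ring order
    replicas.extend(skipped[: rf - len(replicas)])
    return replicas

# DC:4 over 3 racks -- one rack necessarily ends up with two copies
ring = [("n1", "r1"), ("n2", "r1"), ("n3", "r2"), ("n4", "r3"), ("n5", "r1")]
print(pick_replicas(ring, 4))  # n1, n3, n4 cover the racks; n2 fills up
```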
On Tue, May 13, 2014 at 1:41 PM, Will
4, Ruchir Jha wrote:
> I tried to do this, however the doubling in disk space is not "temporary"
> as you state in your note. What am I missing?
>
>
> On Fri, Apr 11, 2014 at 10:44 AM, William Oberman <
> ober...@civicscience.com
> > wrote:
>
> So, if I
operly. If not -
>> that's the reason.
>>
>> Kind regards,
>> Michał Michalski,
>> michal.michal...@boxever.com
>>
>>
>> On 14 April 2014 15:04, William Oberman wrote:
>>
>>> I didn't cross link my thread, but the basic idea is
lski,
> michal.michal...@boxever.com
>
>
> On 14 April 2014 14:44, William Oberman wrote:
>
>> I had a thread on this forum about clearing junk from a CF. In my case,
>> it's ~90% of ~1 billion rows.
>>
>> One side effect I had hoped for was a reduction i
I had a thread on this forum about clearing junk from a CF. In my case,
it's ~90% of ~1 billion rows.
One side effect I had hoped for was a reduction in the size of the bloom
filter. But, according to nodetool cfstats, it's still fairly large
(~1.5GB of RAM).
Do bloom filters ever resize themse
you
> decide to take and the results that would be great.
>
>
> Mark
>
>
> On Fri, Apr 11, 2014 at 5:53 PM, William Oberman > wrote:
>
>> I've learned a *lot* from this thread. My thanks to all of the
>> contributors!
>>
>> Paulo: Good luck w
o compact these smaller SSTables. For all these
>> reasons it is generally advised to stay away from running compactions
>> manually.
>>
>> Assuming that this is a production environment and you want to keep
>> everything running as smoothly as possible I would reduce the gc_grace o
ant to keep
> everything running as smoothly as possible I would reduce the gc_grace on
> the CF, allow automatic minor compactions to kick in and then increase the
> gc_grace once again after the tombstones have been removed.
>
>
> On Fri, Apr 11, 2014 at 3:44 PM, William Oberman >
impact to minor compactions).
I'm hesitant to write the offending sentence again :-)
On Fri, Apr 11, 2014 at 10:44 AM, William Oberman
wrote:
> So, if I was impatient and just "wanted to make this happen now", I could:
>
> 1.) Change GCGraceSeconds of the CF to 0
> 2.) r
allows a great deal of
> time for consistency to be achieved prior to deletion. If you are
> operationally confident that you can achieve consistency via anti-entropy
> repairs within a shorter period you can always reduce that 10 day interval.
>
>
> Mark
>
>
> On Fri,
it never worked so I run
> nodetool compaction on every node; that does it.
>
>
> 2014-04-11 16:05 GMT+02:00 William Oberman :
>
> I'm wondering what will clear tombstoned rows? nodetool cleanup, nodetool
>> repair, or time (as in just wait)?
>>
>> I had a CF
I'm wondering what will clear tombstoned rows? nodetool cleanup, nodetool
repair, or time (as in just wait)?
I had a CF that was more or less storing session information. After some
time, we decided that one piece of this information was pointless to track
(and was 90%+ of the columns, and in 99
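As a hedged sketch of the rule that answers the question above: tombstones are purged by compaction, and only after gc_grace_seconds have elapsed (nodetool cleanup drops data a node no longer owns; repair propagates tombstones rather than purging them).

```python
import time

# Sketch of the tombstone purge rule (simplified): a compaction may drop a
# tombstone only once gc_grace_seconds have passed since the deletion, so
# every replica has had a chance to learn about it via repair.

def purgeable(deletion_time, gc_grace_seconds, now=None):
    now = time.time() if now is None else now
    return now >= deletion_time + gc_grace_seconds

deleted_at = 1_000_000
print(purgeable(deleted_at, 864000, now=deleted_at + 3600))    # within grace: False
print(purgeable(deleted_at, 864000, now=deleted_at + 864001))  # grace passed: True
print(purgeable(deleted_at, 0, now=deleted_at))                # gc_grace=0: True
```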
the row
}
} catch (Exception $e) {
fwrite(STDERR, $e);
}
====
On Fri, Apr 4, 2014 at 1:40 PM, William Oberman wrote:
> Looking at the code, cassandra.input.split.size==Pig URL split_size,
> right? But, in cassandra 1.2.15 I'm wondering if there is a bug that would
&g
w exactly why,
> probably because it hits the minimum number of rows per token.
>
> Another suggestion is to decrease the number of simultaneous mappers of
> your job, so it doesn't hit cassandra too hard, and you'll get less
> TimedOutExceptions, but your job will take long
Hi,
I have some history with cassandra + hadoop:
1.) Single DC + integrated hadoop = Was "ok" until I needed steady
performance (the single DC was used in a production environment)
2.) Two DC's + integrated hadoop on 1 of 2 DCs = Was "ok" until my data
grew and in AWS compute is expensive compared
Same region, cross zone transfer is $0.01 / GB (see
http://aws.amazon.com/ec2/pricing/, Data Transfer section).
On Wed, Feb 12, 2014 at 3:04 PM, Russell Bradberry wrote:
> Cross zone data transfer does not cost any extra money.
>
> LOCAL_QUORUM = QUORUM if all 6 servers are located in the same
I'm using AWS's EMR (hadoop as a service), and one step copies some data
from EMR -> my cassandra cluster. I used to patch EMR with pig 0.11, but
now AWS officially supports 0.11, so I thought I'd give it a try. I was
having issues. The AWS forum on it is here:
https://forums.aws.amazon.com/thre
I've been running cassandra a while, and have used the PHP api and
cassandra-cli, but never gave cqlsh a shot.
I'm not quite getting it. My most simple CF is a dumping ground for
testing things created as:
create column family stats;
I was putting random stats I was computing in it. All keys, co
I get this:
Running rpm_check_debug
ERROR with rpm_check_debug vs depsolve:
apache-cassandra11 conflicts with apache-cassandra11-1.1.11-1.noarch
I'm using CentOS. Problem with my OS, or problem with the package? (And
how can it conflict with itself??)
will
ndTcpConnectionPool constructor checking
> to see if this could be the source of the leak.
>
> Cheers
>
>-
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 1/05/2013, at 2:18 AM,
> if your app is leaking connection you should probably deal with that first.
>
> Cheers
>
>-
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 30/04/2013, at 3:07 AM, William Ober
Hi,
I'm having some issues. I keep getting:
ERROR [GossipStage:1] 2013-04-28 07:48:48,876 AbstractCassandraDaemon.java
(line 135) Exception in thread Thread[GossipStage:1,5,main]
java.lang.OutOfMemoryError: unable to create new native thread
--
after a day or two of runti
99% sure it's in bytes.
On Mon, Apr 15, 2013 at 11:25 AM, William Oberman
wrote:
> Mainly the:
> "ColumnFamilyMemtable ops,data"
> section.
>
> Is data in bytes/kb/mb/etc?
>
> Example line:
> StatusLogger.java (line 116) civicscience.sessions4963,1799916
>
> Thanks!
>
>
>
Mainly the:
"ColumnFamilyMemtable ops,data"
section.
Is data in bytes/kb/mb/etc?
Example line:
StatusLogger.java (line 116) civicscience.sessions4963,1799916
Thanks!
pears I can't set min > 32
>
> Why did you want to set it so high ?
> If you want to disable compaction set it to 0.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thel
just leave my compaction killers
running instead (not that killing compactions constantly isn't messing with
things as well).
will
On Tue, Apr 2, 2013 at 10:43 AM, William Oberman
wrote:
> Edward, you make a good point, and I do think am getting closer to having
> to increase my clust
ions are "out of
> control" it usually means one of these things,
> 1) you have a corrupt table that the compaction never finishes on,
> sstables count keep growing
> 2) you do not have enough hardware to handle your write load
>
>
> On Tue, Apr 2, 2013 at 7:50 AM, Wil
tcompactionthreshold
> - Set the min and max
> compaction thresholds for a given column family
>
>
>
> On Mon, Apr 1, 2013 at 12:38 PM, William Oberman wrote:
>
>> I'll skip the prelude, but I worked myse
I'll skip the prelude, but I worked myself into a bit of a jam. I'm
recovering now, but I want to double check if I'm thinking about things
correct.
Basically, I was in a state where a majority of my servers wanted to do
compactions, and rather large ones. This was impacting my site
performance.
I happened to notice some bizarre timestamps coming out of the
cassandra-cli. Example:
[default@XXX] get CF['e2b753aa33b13e74e5e803d787b06000'];
=> (column=c35ef420-c37a-11e0-ac88-09b2f4397c6a, value=XXX,
timestamp=2013042719)
=> (column=c3845ea0-c37a-11e0-8f6f-09b2f4397c6a, value=XXX,
timestamp=2
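For reference, the column names above are version-1 (time) UUIDs, which embed a 60-bit count of 100 ns intervals since 1582-10-15, so the write time can be recovered from the name even when the application-supplied column timestamp looks odd. A small illustration (pure stdlib, no Cassandra involved):

```python
import datetime
import uuid

# A version-1 UUID embeds a 60-bit timestamp: 100 ns ticks since 1582-10-15.
GREGORIAN_OFFSET = 0x01B21DD213814000  # ticks from 1582-10-15 to the Unix epoch

def timeuuid_to_datetime(u):
    unix_100ns = u.time - GREGORIAN_OFFSET
    return datetime.datetime.fromtimestamp(unix_100ns / 1e7, tz=datetime.timezone.utc)

col = uuid.UUID("c35ef420-c37a-11e0-ac88-09b2f4397c6a")  # column name from above
print(timeuuid_to_datetime(col))  # an August 2011 write time
```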
d Index files. The problem goes away when I have all another
> files (Compression, Filter...)
>
>
> On Mon, Jan 21, 2013 at 11:36 AM, William Oberman <
> ober...@civicscience.com> wrote:
>
>> I'm running 1.1.6 from the datastax repo.
>>
>> I
I'm running 1.1.6 from the datastax repo.
I ran sstable2json and got the following error:
Exception in thread "main" java.io.IOError: java.io.IOException: dataSize
of 7020023552240793698 starting at 993981393 would be larger than file
/var/lib/cassandra/data/X-Data.db length 7502161255
I have a "peer EBS disk" to the ephemeral disk. Then I do nodetool
snapshot -> rsync from ephemeral to EBS -> take snapshot of EBS. Syncing
nodetool snapshot directly to S3 would involve fewer steps and be cheaper
(EBS costs more than S3), but I do post processing on the snapshot for EMR,
and it
n
> the cluster. This way, you would be able to correctly set the vars you need.
> Out of curiosity, could you share what are you using for cassandra
> storage? I am currently using EC2 local disks, but I am looking for an
> alternative.
>
> Best regards,
> Marcelo.
>
>
oner).
I still want to know why the old easy way (of setting the 3 system
variables on the pig starter box, and having the config flow into the task
trackers) doesn't work!
will
On Fri, Jan 4, 2013 at 9:04 AM, William Oberman wrote:
> On all tasktrackers, I see:
> java.io.IOException
ess
> isn't set (on the slave, the master is ok).
>
> Can you post the full error ?
>
> Cheers
>-
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 4/01/2013, at 11:15 AM
Anyone ever try to read or write directly between EMR <-> Cassandra?
I'm running various Cassandra resources in Ec2, so the "physical
connection" part is pretty easy using security groups. But, I'm having
some configuration issues. I have managed to get Cassandra + Hadoop
working in the past usi
rote data directly to DC2, then you are correct you
> don't need to run repair.
>
> You should just need to update the schema, and then decommission the node.
>
> -Jeremiah
>
> On Nov 12, 2012, at 2:25 PM, William Oberman
> wrote:
>
> There is a great
There is a great guide here on how to add resources:
http://www.datastax.com/docs/1.1/operations/cluster_management#adding-capacity
What about deleting resources? I'm thinking of removing a data center.
Clearly I'd need to change strategy options, which is currently something
like this:
{DC1:3,DC
A recent thread made it sound like Brisk was no longer a datastax supported
thing (it's DataStax Enterpise, or DSE, now):
http://www.mail-archive.com/user@cassandra.apache.org/msg24921.html
In particular this response:
http://www.mail-archive.com/user@cassandra.apache.org/msg25061.html
On Thu, Oc
through them. I
> don't recall if we did paging in pig or mapreduce but you should be able to
> do that in both since pig allows you to specify the slice start.
>
> On Oct 11, 2012, at 11:28 AM, William Oberman
> wrote:
>
> > If you don't mind me asking, how are you
ort, it sounds like there are
> some rough edges like you say. But issues that are reproducible on tickets
> for any problems are much appreciated and they will get addressed.
>
> On Oct 11, 2012, at 10:43 AM, William Oberman
> wrote:
>
> > I'm wondering how many peop
I'm wondering how many people are using cassandra + pig out there? I
recently went through the effort of validating things at a much higher
level than I previously did(*), and found a few issues:
https://issues.apache.org/jira/browse/CASSANDRA-4748
https://issues.apache.org/jira/browse/CASSANDRA-4
going on in terms of the integration between
cassandra/pig/hadoop.
will
On Thu, Sep 27, 2012 at 3:26 PM, William Oberman
wrote:
> The next painful lesson for me was figuring out how to get logging working
> for a distributed hadoop process. In my test environment, I have a single
>
oop cluster
I'm going to try to undo all of my other hacks to get logging/printing
working to confirm if those were actually the only two changes I had to
make.
will
On Thu, Sep 27, 2012 at 1:43 PM, William Oberman
wrote:
> Ok, this is painful. The first problem I found is in sto
ging messages),
make sure it's appears first on the pig classpath (use pig -secretDebugCmd
to see the fully qualified command line).
The next thing I'm trying to figure out is why when widerows == true I'm
STILL not seeing more than 1024 columns :-(
will
On Wed, Sep 26, 2012 at 3:42 PM,
Hi,
I'm trying to figure out what's going on with my cassandra/hadoop/pig
system. I created a "mini" copy of my main cassandra data by randomly
subsampling to get ~50,000 keys. I was then writing pig scripts but also
the equivalent operation using simple single threaded code to double check
pig.
from the range being
> considered, not the last node that was chosen as a replica).
>
> To fix this, you'll either need to make the 1d node a 1c node, or make
> 42535295865117307932921825928971026432 a 1d node so that you're alternating
> racks within that DC.
>
>
Hi,
I recently upgraded from 0.8.x to 1.1.x (through 1.0 briefly) and nodetool
-ring seems to have changed from "owns" to "effectively owns".
"Effectively owns" seems to account for replication factor (RF). I'm ok
with all of this, yet I still can't figure out what's up with my cluster.
I have
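The "owns" vs "effectively owns" distinction can be sketched as follows (simplified, assuming SimpleStrategy and a balanced ring): raw ownership is each node's slice of the token space, while effective ownership multiplies by RF, since each range is stored on RF nodes.

```python
# Sketch of "owns" vs "effectively owns": raw ownership is a node's share
# of the token ring; effective ownership multiplies by RF because every
# range is replicated onto RF nodes.

def balanced_tokens(n, ring_size=2**127):
    # evenly spaced initial tokens for a RandomPartitioner ring
    return [i * ring_size // n for i in range(n)]

n_nodes, rf = 4, 3
owns = 100.0 / n_nodes
effectively_owns = min(100.0, owns * rf)
print(balanced_tokens(n_nodes))
print(f"each node owns {owns}%, effectively owns {effectively_owns}%")
```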
I also have used datastax with great success (same disclaimer).
A specific example:
-I setup a one-on-one call to talk through an issue, in my case a server
reconfiguration. It took 2 days to find a time to meet, though that was my
fault as I believe they could have worked me in within a day. I
cluster and we're in the process of moving
> to production. We're currently using pig from cdhu0. All we did was
> replace the 0.8.4 jars after installing the debian packages for 0.8.4.
>
> Not sure if that helps anyone, but thought I would share what we've seen.
>
> Btw,
I've had some troubles, so I thought I'd pass on my various bug fixes:
-Cass 0.8.4 has troubles with pig/hadoop (you get NPE's when trying to connect
to cassandra in the pig logs). You need this patch:
http://svn.apache.org/viewvc?revision=1158940&view=revision
And maybe this:
http://svn.apache.
>
>
> create keyspace civicscience with replication_factor=3 and
> strategy_options = [{us-east:3}] and
> placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy';
>
> FYI the replication_factor property with the NTS is incorrect, the next(?)
> revision of 0.8 will raise an error
I was hoping to transition my "simple" cassandra cluster (where each node is a
cassandra + hadoop tasktracker) to a cluster with two virtual datacenters
(vanilla cassandra vs. cassandra + hadoop tasktracker), based on this:
http://wiki.apache.org/cassandra/HadoopSupport#ClusterConfig
The problem
I finally upgraded to 0.7.4 -> 0.8.0 (using riptano packages) 2 days ago.
Before, my resident memory (for the java process) would slowly grow without
bound and the OS would kill the process. But, over the last 2 days, I
_think_ it's been stable. I'll let you know in a week :-)
My other stats:
AW
a Java class.
> On Fri, Jul 8, 2011 at 11:13 AM, William Oberman > wrote:
>
>> I use a language specific wrapper around thrift as my "client", but yes, I
>> guess I fundamentally mean thrift == client, and the cassandra server ==
>> server.
>>
I use a language specific wrapper around thrift as my "client", but yes, I
guess I fundamentally mean thrift == client, and the cassandra server ==
server.
will
On Fri, Jul 8, 2011 at 11:08 AM, Jeffrey Kesselman wrote:
> I am confused by what you mean by "Cassandra client code." Is this part o
dra is using the database definition.
will
On Fri, Jul 8, 2011 at 10:35 AM, William Oberman
wrote:
> I think you need to look into Zookeeper, or other distributed coordinator,
> as you have little/no guarantees from cassandra between 1-3 (in terms of the
> guarantees you want and need).
&g
lidation check, see?
>
> If Cassandra does not guard against this then one possible
> solution would be to make my own key-to-mutex map in memory, lock the mutex
> for A's key as a precursor to (1) and release it in a post-update function.
> But I am always very nervous
else will confirm if I'm
wrong yet again.
For me, if I need two pieces of data to be consistently related to each
other and stored in cassandra, I encode them (usually JSON) and store them
in one column.
will
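The single-column approach mentioned above can be as simple as this sketch (field names are invented for illustration): serialize the related fields into one value so they are always written and read together.

```python
import json

# Encode related fields as one JSON blob stored in a single column, so the
# pieces can never be read in a half-updated state. Field names here are
# invented for illustration.

profile = {"session_id": "abc123", "last_seen": 1310000000}
value = json.dumps(profile, sort_keys=True)  # this string is the column value
restored = json.loads(value)
print(value)
assert restored == profile  # round-trips intact
```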
On Fri, Jul 8, 2011 at 8:30 AM, William Oberman wrote:
> Questions like this see
Questions like this seem to come up a lot:
http://stackoverflow.com/questions/6033888/cassandra-atomicity-isolation-of-column-updates-on-a-single-row-on-on-single-no
http://stackoverflow.com/questions/2055037/cassandra-atomic-reads-writes-within-a-single-columnfamily
http://www.mail-archive.com/use
I think I had (and have) a similar problem:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/OOM-or-what-settings-to-use-on-AWS-large-td6504060.html
My memory usage grew slowly until I ran out of mem and the OS killed my
process (due to no swap).
I'm still on 0.7.4, but I'm rolling
>
>
> On Wed, Jul 6, 2011 at 2:48 PM, William Oberman
> wrote:
>
>> I have a few cassandra/hadoop/pig questions. I currently have things set
>> up in a test environment, and for the most part everything works. But,
>> before I start to roll things out to produc
I have a few cassandra/hadoop/pig questions. I currently have things set up
in a test environment, and for the most part everything works. But, before
I start to roll things out to production, I wanted to check on/confirm some
things.
When I originally set things up, I used:
http://wiki.apache.o
tally works.
Sounds like you are hacking (or at least looking) at the source, so all the
power to you if/when you try these kind of changes.
will
On Sun, Jul 3, 2011 at 8:45 PM, AJ wrote:
> On 7/3/2011 6:32 PM, William Oberman wrote:
>
> Was just going off of: "Send the
Was just going off of: "Send the value to the primary replica and send
placeholder values to the other replicas". Sounded like you wanted to write
the value to one, and write the placeholder to N-1 to me. But, C* will
propagate the value to N-1 eventually anyways, 'cause that's just what it
does
Ok, I see the "you happen to choose the 'right' node" idea, but it sounds
like you want to solve "C* problems" in the client, and they already wrote
that complicated code to make clients simple. You're talking about
reimplementing key<->node mappings, network topology (with failures), etc...
Plu
that is the current metric to use.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 30 Jun 2011, at 06:35, William Oberman wrote:
>
> > I'll start with my question: given
I'll start with my question: given a CF with comparator TimeUUIDType, what
is the most efficient way to get the greatest column's value?
Context: I've been running cassandra for a couple of months now, so
obviously it's time to start layering more on top :-) In my test
environment, I managed to g
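Since TimeUUIDType sorts columns by the UUID's embedded timestamp, the greatest column is always the most recent write, which is why a reversed slice with a count of 1 is the usual answer. A pure-Python sketch of the ordering (no Cassandra client involved):

```python
import uuid

# TimeUUIDType orders column names by the version-1 UUID's embedded 60-bit
# timestamp, so "greatest column" == "newest write". Picking the max by
# that timestamp models what a reversed slice with count=1 returns.

def newest(columns):
    """columns: dict mapping version-1 UUIDs to values; returns newest value."""
    latest = max(columns, key=lambda u: u.time)
    return columns[latest]

cols = {uuid.uuid1(): f"event-{i}" for i in range(5)}
print(newest(cols))  # the last write, "event-4"
```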
I think you have to do:
assume counters comparator as bytes;
del counters['EU'][0];
will
On Fri, Jun 24, 2011 at 6:51 AM, Sasha Dolgy wrote:
> I have implemented counters in a limited capacity to record the number
> of 'hits' that are received from a given ISO country code. CH for
> example,
I've been doing EBS snapshots for mysql for some time now, and was using a
similar pattern as Josep (XFS with freeze, snap, unfreeze), with the extra
complication that I was actually using 8 EBS's in RAID-0 (and the extra
extra complication that I had to lock the MyISAM tables... glad to be moving
> Doesn't matter. auto_bootstrap only applies to first start ever.
>
> On Wed, Jun 22, 2011 at 10:48 AM, William Oberman
> wrote:
> > I have a question about auto_bootstrap. When I originally brought up the
> > cluser, I did:
> > -seed with auto_boot = false
>
so that new clusters don't bootstrap immediately. You
should turn this on when you start adding new nodes to a cluster that
already has data on it.
I'm not adding new nodes, but the cluster does have data on it...
will
On Wed, Jun 22, 2011 at 11:39 AM, William Oberman
wrote:
> I jus
the version of nodetool
will
On Wed, Jun 22, 2011 at 10:15 AM, William Oberman
wrote:
> I'm running 0.7.4 from rpm (riptano). If I do a yum upgrade, it's trying
> to do 0.7.6. To get 0.8.x I have to do "install apache-cassandra08". But
> that is going to insta
I'm running 0.7.4 from rpm (riptano). If I do a yum upgrade, it's trying to
do 0.7.6. To get 0.8.x I have to do "install apache-cassandra08". But that
is going to install two copies.
Is there a semi-official way of properly upgrading to 0.8 via rpm?
--
Will Oberman
Civic Science, Inc.
3030 Pe
unning on the
> boxes?
>
>
>
> On Wed, Jun 22, 2011 at 9:06 AM, William Oberman > wrote:
>
>> I was wondering/I figured that /var/log/kern indicated the OS was killing
>> java (versus an internal OOM).
>>
>> The nodetool repair is interesting. My applica
100% for days and days
> before it was finally killed because 'apt' was fighting for resource.
> At least, that's as far as I got in my investigation before giving up,
> moving to 0.8.0 and implementing 24hr nodetool repair on each node via
> cronjob. So far ... no p
isn't super obvious to me at the moment...
> >
> > On Tue, May 31, 2011 at 8:21 PM, Jonathan Ellis
> wrote:
> >> The place to start is with the statistics Cassandra logs after each GC.
>
> look for GCInspector
>
> I found this in the logs on all my se
I woke up this morning to all 4 of 4 of my cassandra instances reporting
they were down in my cluster. I quickly started them all, and everything
seems fine. I'm doing a postmortem now, but it appears they all OOM'd at
roughly the same time, which was not reported in any cassandra log, but I
disc
I haven't done it yet, but when I researched how to make
geo-diverse/failover DCs, I figured I'd have to do something like RF=6,
strategy = {DC1=3, DC2=3}, and LOCAL_QUORUM for reads/writes. This gives
you an "ack" after 2 local nodes do the read/write, but the data eventually
gets distributed to
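The arithmetic behind that setup, as a sketch: a quorum of n replicas is floor(n/2) + 1, so with {DC1:3, DC2:3} a LOCAL_QUORUM read or write acks after 2 of the 3 local replicas, while a full QUORUM would need 4 of 6 and therefore always crosses data centers.

```python
# Quorum arithmetic for RF=6 split across two DCs: LOCAL_QUORUM needs a
# majority of the local replicas only, QUORUM a majority of all of them.

def quorum(n_replicas):
    return n_replicas // 2 + 1

strategy = {"DC1": 3, "DC2": 3}
print("LOCAL_QUORUM acks:", quorum(strategy["DC1"]))   # 2 of 3, local
print("QUORUM acks:", quorum(sum(strategy.values())))  # 4 of 6, cross-DC
```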
I'll do a reply all, to keep this more consistent (sorry!).
Rather than staying stuck, I wrote a custom function: TupleToBagOfTuple. I'm
curious if I could have avoided it with proper pig scripting though.
On Wed, Jun 15, 2011 at 3:08 PM, William Oberman
wrote:
> My problem is the
ndraBag from
> pygmalion - it does the work for you to get it back into a form that
> cassandra understands.
>
> Others may know better how to massage the data into that form using just
> pig, but if all else fails, you could write a udf to do that.
>
> Jeremy
>
> On Jun 1
I think I'm stuck on typing issues trying to store data in cassandra. To
verify, cassandra wants (key, {tuples})
My pig script is fairly brief:
raw = LOAD 'cassandra://test_in/test_cf' USING CassandraStorage() AS
(key:chararray, columns:bag {column:tuple (name, value)});
--colums == timeUUID -> J
I decided to try out hadoop/pig + cassandra. I had my ups and downs to get
the script I wanted to run to work. I'm sure everyone who tries will have
their own experiences/problems, but mine were:
-Everything I need to know was in
http://hadoop.apache.org/common/docs/r0.20.2/cluster_setup.html an
out the BigTable and original Facebook papers,
> linked from the wiki
>
> <http://wiki.apache.org/cassandra/ArchitectureOverview>Aaron
>
> On 29 Apr 2011, at 23:43, William Oberman wrote:
>
> Dumb question, but referenced twice now: which files are the SSTables and
>
system from there without impacting the main data raid.
>
> But the main reason to do this is to have an 'omg we screwed up big time
> and deleted / corrupted data' recovery.
>
> On Apr 28, 2011, at 9:53 PM, William Oberman wrote:
>
> Even with N-nodes for redundancy, I
so copies a
> json file with the current files in the directory, so you can know what to
> restore in that event (as far as I understand).
>
> On Apr 28, 2011, at 2:53 PM, William Oberman wrote:
>
> > Even with N-nodes for redundancy, I still want to have backups. I'm an
&
at seems pointless
anyways.
will
On Thu, Apr 28, 2011 at 3:57 PM, Sasha Dolgy wrote:
> You could take a snapshot to an EBS volume. then, take a snapshot of that
> via AWS. of course, this is ok.when they -arent- having outages and issues
> ...
> On Apr 28, 2011 9:54 PM, "William Ob
Even with N-nodes for redundancy, I still want to have backups. I'm an
amazon person, so naturally I'm thinking S3. Reading over the docs, and
messing with nodeutil, it looks like each new snapshot contains the previous
snapshot as a subset (and I've read how cassandra uses hard links to avoid
ex
I've figured this out, but to help those out there who don't want to waste
an hour like me debugging a hung "nodetool ring" command: JMX opens a
second random port, so you either have to disable any firewalls between the
machine running nodetool and the cassandra instance (or there are
complicated
> vaguely remember Ellis saying it's not a good idea to switch
> NetworkTopologyStrategy ...
>
> On Wed, Apr 27, 2011 at 3:29 PM, William Oberman
> wrote:
> > Thanks Sasha. Fortunately/unfortunately I did realize the default &
> current
> > behavior of th
Route53 is already
in the works (to route EC2 traffic to the closest region).
will
On Wed, Apr 27, 2011 at 9:33 AM, William Oberman
wrote:
> I don't think of it as migrating an instance, it's more of a destroy/start
> with EC2. But, I still think it would be very useful to spin up
the ring, do your work, bootstrap
> it back to the ring .. i think this could be avoided if cassandra
> maintained hostname references and not just IP references for nodes.
>
> -sasha
>
> On Wed, Apr 27, 2011 at 2:56 PM, William Oberman
> wrote:
> > While I haven't
> We leverage cassandra instances in APAC, US & Europe ... so it's
> important for us to know that we have one data center in each 'region'
> and multiple racks per DC ...
>
> -sasha
>
> On Wed, Apr 27, 2011 at 3:06 PM, William Oberman
> wrote:
> &