The cron script doesn't do much. It pulls new IPNs (usually only 1 in
a given 5 minute period), inserts a row, and then sends an email.
As for failure handling in the script itself, I rely on Python exception
handling, and whenever an exception occurs I do get an email with the
exception details.
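A minimal sketch of that flow (the helper names and addresses are
placeholders; the real script isn't shown here):

import smtplib
import traceback
from email.mime.text import MIMEText

def send_mail(subject, body):
    msg = MIMEText(body)
    msg["Subject"] = subject
    msg["From"] = "cron@example.com"   # placeholder addresses
    msg["To"] = "me@example.com"
    server = smtplib.SMTP("localhost")
    server.sendmail(msg["From"], [msg["To"]], msg.as_string())
    server.quit()

def fetch_new_ipns():
    # Placeholder: pull IPNs received since the last run (usually 0 or 1).
    return []

def insert_row(ipn):
    # Placeholder: insert one row for this IPN.
    pass

def main():
    try:
        for ipn in fetch_new_ipns():
            insert_row(ipn)
            send_mail("IPN processed", repr(ipn))
    except Exception:
        # Any uncaught failure lands in my inbox with the traceback.
        send_mail("IPN cron failed", traceback.format_exc())

if __name__ == "__main__":
    main()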
The problem with naive last-write-wins is that writes don't always
arrive at each replica in the same order. So no, that's a
non-starter.
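A toy illustration (not Cassandra code): the same two writes arriving at
two replicas in opposite orders, with naive apply-on-arrival semantics:

# Two unordered writes to the same key.
writes = [("k", "v1"), ("k", "v2")]

replica_a = {}
for key, value in writes:            # arrives as v1, then v2
    replica_a[key] = value

replica_b = {}
for key, value in reversed(writes):  # arrives as v2, then v1
    replica_b[key] = value

print(replica_a["k"], replica_b["k"])  # v2 v1: permanently divergent

Cassandra sidesteps this by comparing per-write timestamps rather than
arrival order, so every replica converges on the same winner.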
Vector clocks are a series of (client id, clock) entries, and usually
a timestamp so you can prune old entries. Obviously implementations
can vary, but to pic
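As a rough sketch of that shape (details vary by implementation, as
noted):

import time

class VectorClock(object):
    """(client id, counter) entries plus a wall-clock timestamp that
    is only used to prune entries from long-silent clients."""

    def __init__(self):
        self.entries = {}             # client_id -> logical counter
        self.timestamp = time.time()  # for pruning, not for ordering

    def increment(self, client_id):
        self.entries[client_id] = self.entries.get(client_id, 0) + 1
        self.timestamp = time.time()

    def descends_from(self, other):
        # True if this clock has seen at least everything `other` has.
        return all(self.entries.get(cid, 0) >= count
                   for cid, count in other.entries.items())

If neither a.descends_from(b) nor b.descends_from(a), the two writes
were concurrent and the conflict has to be resolved above the store.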
I have a few questions which I can't seem to find answers to...
I know that the memory overhead of timestamps is 8 bytes per row/column.
What is the memory overhead of vector clocks?
Is it possible (at least in theory) to run without timestamps on your
values? I'm fine with last writer wins se
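On the overhead question, a back-of-the-envelope comparison under
assumed sizes (a 16-byte client id and an 8-byte counter per entry,
plus one 8-byte pruning timestamp; none of these numbers come from a
real implementation):

def vector_clock_bytes(num_writers, id_bytes=16, counter_bytes=8):
    return num_writers * (id_bytes + counter_bytes) + 8

print(vector_clock_bytes(1))   # 32 bytes, vs. 8 for a plain timestamp
print(vector_clock_bytes(10))  # 248 bytes: it grows with writer count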
> Is there any chance that the entire file from source node got streamed to
> destination node even though only a small amount of data in the file from
> source node is supposed to be streamed to the destination node?
Yes, but the thing that's annoying me is that even if so - you should
not be seeing a 40
I'm on 0.8.4
I have removed a dead node from the cluster using nodetool removetoken command,
and moved one of the remaining nodes to rebalance the tokens. Everything looks
fine when I run nodetool ring now, as it only lists the remaining 2 nodes and
they both look fine, owning 50% of the token
Symptom is that when we populate data into the non-prod cluster, after a while,
we start seeing this warning message from the prod cluster:
"WARN [GossipStage:1] 2011-08-19 19:47:35,730 GossipDigestSynVerbHandler.java
(line 63) ClusterName mismatch from non-prod-node-ip
non-prod-Cluster!=prod-C
I think this is what you want:
https://github.com/stuhood/cassandra/tree/file-format-and-promotion
On Fri, Aug 19, 2011 at 1:28 PM, Peter Schuller
wrote:
>> https://issues.apache.org/jira/browse/CASSANDRA-674
>> But when I downloaded the patch file I can't find the correct trunk to
>> patch...
>
>
> To confirm - are you saying the data directory size is huge, but the
> live size as reported by nodetool ring and nodetool info does NOT
> reflect this inflated size?
>
That's correct.
> What files *do* you have in the data directory? Any left-over *tmp*
> files for example?
>
>
The files th
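For what it's worth, a quick way to get the on-disk number to compare
against the live load nodetool reports (just a sketch; the path is an
assumption):

import os

def dir_bytes(path):
    # Total bytes under the Cassandra data directory, to compare with
    # the "Load" that nodetool ring / nodetool info report.
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

print("%.1f GB" % (dir_bytes("/var/lib/cassandra/data") / 1024.0 ** 3))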
> There were a few Compacted files. I thought that might have been the cause,
> but that wasn't it. We have a CF that is 23GB, and while repair is running,
> there are multiple instances of that CF created along with other CFs.
To confirm - are you saying the data directory size is huge, but the
liv
ok I will go with the IP change strategy and keep you posted. Not going to
manually copy any data, just bring up the node and let it bootstrap.
Thanks
On Fri, Aug 19, 2011 at 11:46 AM, Peter Schuller <
peter.schul...@infidyne.com> wrote:
> (Yes, this should definitely be easier. Maybe the most generally
> useful fix would be for Cassandra to support a node joining the ring
> in "write-only" mode. This would be useful in other cases, such as
> when you're trying to temporarily off-load a node by disabling
> gossip).
I knew I had
> From what I understand, Peter's recommendation should work for you. They
> have both worked for me. No need to copy anything by hand on the new node.
> Bootstrap/repair does that for you. From the Wiki:
Right - it's just that the complication comes from the fact that he's
using the same machine,
> I am running read/write at quorum. At this point I have stopped my
> clients from talking to this node. So if that is the case I can potentially
> just nodetool repair (without changing IP). But would it be better if I
No, other nodes in the cluster will still be sending reads to the node.
>
There were a few Compacted files. I thought that might have been the cause,
but that wasn't it. We have a CF that is 23GB, and while repair is running,
there are multiple instances of that CF created along with other CFs.
I checked the stream directory across cluster of four nodes, but it was
empty.
> Somewhere I remember discussions about issues with the merkle tree range
> splitting or some such that resulted in repair always thinking a little bit
> of data was out of sync.
https://issues.apache.org/jira/browse/CASSANDRA-2324 - fixed for early 0.8.
I don't *think* there's a known open bug t
Hi -
From what I understand, Peter's recommendation should work for you. They
have both worked for me. No need to copy anything by hand on the new node.
Bootstrap/repair does that for you. From the Wiki:
If a node goes down entirely, then you have two options:
(Recommended approach) Bring
> I've now run 7 repairs in a row on this keyspace and every single one has
> finished successfully but performed streams between all nodes. This keyspace
> was written to over the course of several weeks, sometimes with
How much data is streamed, do you know? Mainly interesting is if there
is a
I wasn't clear on that. What I meant was: could scrub have put the data in a
state that caused the repair to consume a lot of disk space?
On Thu, Aug 18, 2011 at 6:44 PM, aaron morton wrote:
> No scrub is a local operation only.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassa
Let me be specific about "lost data" -> we lost one replica; the other 2
nodes still have replicas.
I am running read/write at quorum. At this point I have stopped my
clients from talking to this node. So if that is the case I can potentially
just run nodetool repair (without changing IP). But would it be better
> Is it possible for instance that sometimes your cron job takes longer
> than five minutes?
Or just a lack of failure handling in the cron job for that matter.
Are you *SURE* the "processed" flag truly got set? Do you have a
log statement (written *AFTER* successful write to Cassandra) that
i
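The pattern being asked about, sketched (the client call is
hypothetical and stands in for whatever driver the script really uses;
the point is the ordering):

import logging

log = logging.getLogger("ipn")

def mark_processed(client, ipn_id):
    # Hypothetical insert call: set the processed flag in Cassandra.
    client.insert("ipns", ipn_id, {"processed": "true"})
    # This line runs only if the insert above returned without raising,
    # so seeing it in the log is evidence the flag write succeeded.
    log.info("marked IPN %s as processed", ipn_id)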
> https://issues.apache.org/jira/browse/CASSANDRA-674
> But when I downloaded the patch file I can't find the correct trunk to
> patch...
Check it out from git (or svn) and apply to trunk. I'm not sure
whether it still applies cleanly; given the size of the patch I
wouldn't be surprised if some re
> After upgrading to cass 0.8.4 from cass 0.6.11. I ran scrub. That worked
> fine. Then I ran nodetool repair on one of the nodes. The disk usage on
> data directory increased from 40GB to 480GB, and it's still growing.
If you check your data directory, does it contain a lot of
"*Compacted" fi
> ok, so we just lost the data on that node. We are rebuilding the RAID on it,
> but once it is up, what is the best way to bring it back into the cluster?
You're saying the RAID failed and the data is gone?
> just let it come up and run nodetool repair
> copy data from another node and then run nodetool repa
> Is it normal that the repair takes 4+ hours for every node, with only about
> 10G data? If this is not expected, do we have any hint what could be causing
> this?
It does not seem entirely crazy, depending on the nature of your data
and how CPU-intensive it is "per byte" to compact.
Assuming
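To put rough numbers on it (the throughput figures below are purely
illustrative assumptions):

data_gb = 10
for mb_per_sec in (1, 5, 20):   # assumed effective validation throughput
    hours = data_gb * 1024.0 / mb_per_sec / 3600
    print("%2d MB/s -> %4.1f hours" % (mb_per_sec, hours))

At ~1 MB/s the validation pass alone is ~2.8 hours, before any
streaming or rebuilding, so 4+ hours per node is not obviously broken.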
ok, so we just lost the data on that node. We are rebuilding the RAID on it,
but once it is up, what is the best way to bring it back into the cluster?
- just let it come up and run nodetool repair
- copy data from another node and then run nodetool repair,
- do I still need to run repair imme
> The compaction settings do not affect repair. (Thinking out loud, or does it
> ? Validation compactions and table builds.)
It does.
--
/ Peter Schuller (@scode on twitter)
(a) this really isn't the right forum to review patches; I've pointed
out the relevant jira ticket
(b) ignoring unavailable ranges is a misfeature, imo
On Fri, Aug 19, 2011 at 8:11 AM, Patrik Modesto
wrote:
> Is there really no interest in the patch?
>
> P.
>
> On Thu, Aug 18, 2011 at 08:54, Pat
Hello All
I have let a node run for a period of 2 hours, untouched, with something
like 10 column families and just 30 columns in total.
I see a continually increasing memory trend, even though there are no
operations against that node.
I started the node at 14:05, at 15:05 I did a manual GC.
The log file shows the following; I'm not sure what 'Couldn't find cfId=1000'
means (Google just returned useless results):
INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453) Found
table data in data directories. Consider using JMX to call
org.apache.cassandra.service.StorageServ
Is there really no interest in the patch?
P.
On Thu, Aug 18, 2011 at 08:54, Patrik Modesto wrote:
> On Wed, Aug 17, 2011 at 17:08, Jonathan Ellis wrote:
>> See https://issues.apache.org/jira/browse/CASSANDRA-2388
>
> Ok, thanks for the JIRA ticket. I've found that very same problem
> during my
Hi,
we were using apache-cassandra-2011-06-28_08-04-46.jar so far in
production and wanted to upgrade to 0.8.4.
Our cluster was well balanced and we only saved keys with a lower-case
md5 prefix (OrderPreservingPartitioner).
Each node owned 20% of the tokens, which was also displayed on each
nod
Nice one thanks.
We're now up to 500k a second on one box, which is pretty good (well, good
enough until our data grows 5-fold). So maybe disabling durable_writes may
speed us up some more!
Cheers,
Paul.
On Thu, Aug 18, 2011 at 11:40 PM, aaron morton wrote:
> couple of thoughts, 400 row mutations in
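For reference, the kind of client-side batching that goes with numbers
like that, assuming the pycassa client (keyspace and CF names are made
up; the 400-mutation queue echoes the advice above):

from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool("MyKeyspace", ["localhost:9160"])  # assumed names
cf = ColumnFamily(pool, "MyColumnFamily")

# Queue ~400 row mutations per batch_mutate round trip instead of
# making one RPC per insert.
batch = cf.batch(queue_size=400)
for i in range(100000):
    batch.insert("row-%d" % i, {"col": "val"})
batch.send()  # flush anything still queued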