If I continuously read from the node that I am rebooting, the request made
to that node hangs until the client times out, and subsequent requests receive
a "Failed to connect" error.
I am using curl for my tests.
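For reference, a single request with explicit timeouts makes the two failure
modes easy to tell apart (the host name is a placeholder):

  # gives up after 2s if the connection can't be made, after 5s if the request hangs
  curl -v --connect-timeout 2 --max-time 5 http://node-a:8098/riak/test/1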
Thanks,
Dan
Daniel Reverri
Developer Advocate
Basho Technologies, Inc.
d...@basho.com
You may have mentioned which client you are using (the thread is deep
already), but I would think that this is a client implementation
problem, as in some sort of connection pooling issue. Try calling curl
from a sleep loop in a shell script and see what happens.
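Something like this, for example (the URL is just a placeholder):

  while true; do
    date
    curl -s -o /dev/null -w "%{http_code}\n" --max-time 5 http://node-a:8098/riak/test/1
    sleep 1
  done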
-Alexander
On Mon, Nov 29, 2010 at
Hm, that's curious. Are you rebooting the physical machine? When you
reboot one of the nodes, what happens to HTTP calls to that node? Do they
immediately error, or do they hang indefinitely?
In the meantime, I'll add some logging so I can see whether I'm timing out
on the writes as well, and
Hi Jay,
I'm not able to reproduce the behavior you are seeing. Here is what I am
doing to try to reproduce the issue:
1. Setup a 4 node cluster
2. Continuously write a new object to Riak every 0.5 seconds
3. Continuously read a known object (GET riak/test/1) from Riak every 0.5
seconds (both loops are sketched below)
4. Reboot one
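Roughly, the two loops look like this ($RANDOM assumes bash, and the node
address is a placeholder):

  # writer: a new object every 0.5 seconds
  while true; do
    curl -s -X PUT -H "Content-Type: text/plain" -d "value" \
      http://node-a:8098/riak/test/$RANDOM
    sleep 0.5
  done

  # reader: the known object every 0.5 seconds
  while true; do
    curl -s -o /dev/null -w "%{http_code}\n" http://node-a:8098/riak/test/1
    sleep 0.5
  done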
Hey Dan/Sean,
Thanks for the response. sasl-error.log on node A is completely empty, and
I see this pattern in erlang.log:
===== ALIVE Tue Nov 23 12:46:57 PST 2010
===== Tue Nov 23 12:57:36 PST 2010
=ERROR REPORT==== 23-Nov-2010::12:57:36 ===
** Node 'riak@' not responding **
** Removing (time
On Tue, Nov 23, 2010 at 3:33 PM, Jay Adkisson wrote:
> (many profuse apologies to Dan - hit "reply" instead of "reply all")
> Alrighty, I've done a little more digging. When I throttle the writes
> heavily (2/sec) and set R and W to 1 all around, the cluster works just fine
> after I restart the
1) Riak detects a node outage the same way any Erlang system does: when a
message fails to deliver, or the heartbeat maintained by epmd fails. The
default timeout in epmd is 1 minute, which is probably why you're seeing it
take 1 minute for the outage to be detected.
2) If it takes too long (the vnode is overl
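If the roughly one-minute detection window is the concern, my understanding
(an assumption, not something verified here) is that the relevant knob is the
Erlang kernel's net_ticktime, which can be lowered via vm.args, e.g.:

  ## detect unreachable nodes faster than the 60-second default; smaller
  ## values are more sensitive to transient network hiccups
  -kernel net_ticktime 20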
Neville, I'm not sure what you mean. The network gear is all functional,
otherwise I wouldn't be able to interact with the machines at all (they're
at our colo). But as far as I understand, if I hard reboot a box (or, in a
real-world scenario, the PDU fails), the switch will happily continue
forwa
Just a thought ... have you verified your switch, cables, NICs, etc
On 24 November 2010 09:33, Jay Adkisson wrote:
> (many profuse apologies to Dan - hit "reply" instead of "reply all")
>
> Alrighty, I've done a little more digging. When I throttle the writes
> heavily (2/sec) and set R and W t
(many profuse apologies to Dan - hit "reply" instead of "reply all")
Alrighty, I've done a little more digging. When I throttle the writes
heavily (2/sec) and set R and W to 1 all around, the cluster works just fine
for about 15-20 seconds after I restart the node. Then the read request
hangs fo
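One way to set R and W per request is through query parameters on the HTTP
interface, something like this (bucket/key and host are placeholders):

  # read with r=1
  curl -s "http://node-a:8098/riak/test/1?r=1"

  # write with w=1
  curl -s -X PUT -H "Content-Type: text/plain" -d "value" \
    "http://node-a:8098/riak/test/1?w=1"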
Hey Dan,
Thanks for the response! I tried it again while watching `riak-admin
status` - basically, it takes about 30 seconds of node C being down before
riak realizes it's gone. During that time, if I'm writing to the cluster at
all (I throttled it to 2 writes per second for testing), both write
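A loop like this is enough to watch for the change (assuming connected_nodes
and ring_members are the relevant fields in the riak-admin status output):

  while true; do
    date
    riak-admin status | grep -E 'connected_nodes|ring_members'
    sleep 1
  done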
Your HTTP calls should not be timing out. Are you sending requests
directly to the Riak node or are you using a load balancer? How much load
are you placing on node A? Is it a write only load or are there reads as
well? Can you confirm "all" requests time out or is it a large subset of the
reque
Hey all,
Here's what I'm seeing: I have four nodes A, B, C, and D. I'm loading lots
of data into node A, which is being distributed evenly across the nodes. If
I physically reboot node D, all my HTTP calls time out, and `riak-admin
ringready` complains that not all nodes are up. Is this intende
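For reference, the kind of check being discussed, run from one of the
surviving nodes after rebooting D (the ring_members field is an assumption):

  riak-admin ringready
  riak-admin status | grep ring_members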