It sounds like you understood perfectly.

Basically we are running a cluster of machines that are busy doing
lots of stuff.  We wanted to use Riak to keep configuration
information about those machines and the stuff they were doing.  So
Riak would be running on machines whose primary job is something else.
A critical use case for us is to figure out what needs to be done on
which other machines after one of the machines goes down.  Therefore,
a system whose data can become unavailable during a failover, precisely
because of that failover, loses the very benefit we wanted from a high
availability system.

We've chosen to go with the simple approach of a relational database
on external hardware in a high availability setup.  We didn't want
that dependency, but we've done enough now that we're committed to it.
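To make the quoted point about node counts concrete, here is a toy model. It is only a sketch and not Riak's actual claim algorithm, which also has to respect replica placement. It assumes a hypothetical 64-partition ring that starts out assigned round-robin, and a joining node that takes one partition at a time from the most heavily loaded existing node until it holds its floor share. Each reassigned partition stands in for data that would have to be handed off. The function names are illustrative only.

```python
from collections import Counter

# Toy ring model for illustration only: NOT Riak's real claim algorithm.
# Function names (initial_claim, join_node, moved_on_join) are hypothetical.


def initial_claim(num_partitions: int, num_nodes: int) -> list[int]:
    """Toy starting ring: partition p is owned by node p % num_nodes."""
    return [p % num_nodes for p in range(num_partitions)]


def join_node(ring: list[int], new_node: int) -> list[int]:
    """Return a copy of `ring` after `new_node` (not yet in the ring) joins.

    The new node repeatedly takes one partition from whichever existing
    node currently owns the most, until it holds its floor share of the
    ring.  Every reassigned partition would need a data handoff.
    """
    ring = list(ring)
    num_nodes = len(set(ring)) + 1
    target = len(ring) // num_nodes
    for _ in range(target):
        # Count ownership among the existing nodes only.
        counts = Counter(owner for owner in ring if owner != new_node)
        donor = max(counts, key=counts.__getitem__)
        # Hand the donor's first partition to the new node.
        ring[ring.index(donor)] = new_node
    return ring


def moved_on_join(num_partitions: int, num_nodes: int) -> int:
    """Number of partitions that change owner when one node joins."""
    before = initial_claim(num_partitions, num_nodes)
    after = join_node(before, new_node=num_nodes)
    return sum(1 for old, new in zip(before, after) if old != new)


if __name__ == "__main__":
    ring_size = 64  # example ring size, chosen for illustration
    for nodes in (3, 4, 10, 20):
        moved = moved_on_join(ring_size, nodes)
        print(f"{nodes} -> {nodes + 1} nodes: {moved}/{ring_size} "
              f"partitions ({moved / ring_size:.0%}) change owner")
```

In this toy model the share of partitions that changes hands on a single join shrinks as the cluster grows (16 of 64 going from 3 to 4 nodes, 5 of 64 going from 10 to 11), which is one rough way to see why the node counts matter while handoff is in progress.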

On Thu, Jun 9, 2011 at 7:33 PM, Ryan Zezeski <rzeze...@basho.com> wrote:
> Ben,
> I hate non-obvious behavior too, and it's something we constantly try to
> fight at Basho.  That said, I don't think Riak is in as bad a position as
> you think.  Let's see if I can convince you :)
> If I'm understanding you correctly you are making two points here:
> 1) When performing a join/leave under load most GETs return 404 until data
> transfer has completed.
> 2) A node in the cluster has failed and that is causing data to become
> unavailable.
> Assuming these are indeed your claims, I counter...
> 1) Yes, performing a join/leave **can** cause reads to return 404s.  Just
> ask Greg Nelson and he can tell you all about it.  However, I want to
> emphasize the **can** qualifier here.  It depends on the number of nodes you
> are going from and to.  This matters because those numbers affect how the
> claim algorithm behaves and how much data actually shifts around.
> Now I can hear you saying "Yeah, but that's still brittle/broken!"  Yes, I
> agree 100% with the words I just put in your mouth.  My point is simply that
> there are shades of grey here and depending on how many nodes you have you
> might never hit this case (note that 3-5 nodes **will** hit this case).  We
> are actively working on a solution to this problem as we recognize its
> seriousness and very much want to see it fixed.
> 2) This should absolutely not be happening.  This is Riak's bread and butter
> use case, i.e. high availability.  My guess is I'm misunderstanding what you
> are saying.
> -Ryan
>
>
>
> On Thu, Jun 9, 2011 at 8:00 PM, Ben Tilly <bti...@gmail.com> wrote:
>>
>> I am not a developer advocate.  But my top hate is that when machines
>> leave/rejoin your data can be inaccessible for some time.
>>
>> We had a great case where we wanted to use Riak, but that was a
>> complete showstopper and we won't be using it because of that.  (We
>> wanted to store information which needed to be read in the event of a
>> machine failing.  But the machine that could fail would be on the same
>> cluster that was running Riak, so we'd be potentially trying to do
>> reads exactly when data was unavailable.)
>>
>> On Thu, Jun 9, 2011 at 10:25 AM, Srdjan Pejic <spe...@gmail.com> wrote:
>> > What do you guys hate about Riak right now?
>> > _______________________________________________
>> > riak-users mailing list
>> > riak-users@lists.basho.com
>> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> >
>> >
>>
>
>
