Newbie question 1/3

2012-01-08 Thread John DeTreville
(An earlier post seems not to have gone through. My apologies in the eventual 
case of a duplicate.)

I'm thinking of using Riak to replace a large Oracle system, and I'm trying to 
understand its guarantees. I have a few introductory questions; this is the 
first of three.

I'm trying to understand the reliability of stored data. Imagine (for example) 
that I have 5 Riak hosts, and an n_val of 3. Imagine that each host is down 1% 
of the time (I bought the disks at a flood sale), and imagine that host 
failures are uncorrelated, and imagine that when hosts come back up, they stay 
up long enough to fully rejoin the service, and imagine that I haven't done any 
writes for a long while.

Given these assumptions, I might naïvely assume that my data are available with 
a probability of about 99.9999% (all three copies are down at once with 
probability 0.01^3), or down about 32 seconds a year. This would be great 
(perhaps). Of course, this ignores the possibility that some of my data 
may not be replicated at all, perhaps even with all three copies on the same 
host. If all I know is that some data may not be replicated, then all I know is 
that (some of) my data may be unavailable as much as 3.65 days a year, which 
would not be nearly as great. I understand things probably won't be this bad, 
but "probably" isn't a probability.
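
The arithmetic behind these figures can be sketched as follows. This is only a back-of-the-envelope model under the assumptions stated above (independent failures, each host down 1% of the time), plus one added assumption: a read succeeds as long as at least one host holding a copy is up. The function names are illustrative.

```python
# Back-of-the-envelope availability: each host is independently down 1% of the time,
# and (assumption) data is readable while at least one host holding a copy is up.
SECONDS_PER_YEAR = 365 * 24 * 3600


def unavailability(p_down, distinct_hosts):
    """Probability that every host holding a copy is down at the same time."""
    return p_down ** distinct_hosts


def downtime_seconds_per_year(p_down, distinct_hosts):
    """Expected seconds per year during which no copy is reachable."""
    return unavailability(p_down, distinct_hosts) * SECONDS_PER_YEAR


for hosts in (3, 2, 1):
    seconds = downtime_seconds_per_year(0.01, hosts)
    print(f"copies on {hosts} distinct host(s): {seconds:,.1f} s/year unavailable")
```

The gap between the three-host and one-host rows is the gap between the optimistic and pessimistic cases in the question: what matters is the number of distinct physical hosts holding a copy, not n_val by itself.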

Is this right? Is there anything I can do to guarantee higher reliability, 
short of setting n_val to 5?

Cheers,
John
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Newbie question 2/3

2012-01-08 Thread John DeTreville
(An earlier post seems not to have gone through. My apologies in the eventual 
case of a duplicate.)

I'm thinking of using Riak to replace a large Oracle system, and I'm trying to 
understand its guarantees. I have a few introductory questions; this is the 
second of three.

Imagine I do a write, and the write fails because it could not contact enough 
hosts. Am I right to imagine that the write may actually have persisted, and 
that the data might later be available for reading? Am I also right to imagine 
that the data, once read, might later vanish due to host failure, because it 
was persisted to fewer hosts than expected?

Cheers,
John


Newbie question 3/3

2012-01-08 Thread John DeTreville
(An earlier post seems not to have gone through. My apologies in the eventual 
case of a duplicate.)

I'm thinking of using Riak to replace a large Oracle system, and I'm trying to 
understand its guarantees. I have a few introductory questions; this is the 
third of three.

I would like to do two updates atomically, but of course I cannot. I imagine I 
could construct my own redo log, and perform a sequence of operations something 
like:

   write redo log entry (timestamp, A's update, B's update) to redo log
   update A
   update B
   delete redo log entry from redo log

Asynchronously, I could read dangling entries from the redo log and repeat 
them, deleting them upon success. (Let's imagine for simplicity that the 
updates are idempotent and commutative.) This seems doable, but it's not 
pretty. Is this the best I can do? Or should I think about the problem 
differently?
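
A minimal sketch of the redo-log sequence above, for illustration only. The `Store` class is a hypothetical in-memory stand-in and not a Riak client API; the `crash_after` parameter exists only to simulate a client dying partway through. Updates are modelled as "add an element to a set", which is idempotent and commutative as assumed above.

```python
import time
import uuid


class Store:
    """Hypothetical in-memory key-value store; NOT a real Riak client."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

    def delete(self, key):
        self.data.pop(key, None)

    def keys(self, prefix):
        return [k for k in self.data if k.startswith(prefix)]


def apply_update(store, key, element):
    """Idempotent, commutative update: add one element to the set stored at key."""
    current = store.get(key) or frozenset()
    store.put(key, current | {element})


def update_both(store, a_update, b_update, crash_after=None):
    """Write the redo entry, update A, update B, then delete the redo entry."""
    entry_key = "redo/" + uuid.uuid4().hex
    store.put(entry_key, (time.time(), a_update, b_update))
    if crash_after == "log":
        return entry_key  # simulated crash: entry is left dangling
    apply_update(store, "A", a_update)
    if crash_after == "A":
        return entry_key  # simulated crash: A updated, B not yet
    apply_update(store, "B", b_update)
    store.delete(entry_key)
    return entry_key


def replay_dangling(store):
    """Asynchronous sweeper: repeat every logged entry, deleting it on success."""
    for entry_key in store.keys("redo/"):
        _, a_update, b_update = store.get(entry_key)
        apply_update(store, "A", a_update)
        apply_update(store, "B", b_update)
        store.delete(entry_key)


store = Store()
update_both(store, "x", "y", crash_after="A")  # B is missing at this point
replay_dangling(store)                          # B is brought up to date
print(sorted(store.get("A")), sorted(store.get("B")), store.keys("redo/"))
```

Note what the sketch does not give: between the crash and the replay, a reader can still observe A updated without B. A real sweeper would presumably also use the timestamp to skip entries that are young enough to belong to a writer still in progress.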

(BTW, I believe that secondary indexes won't help me.)

Cheers,
John


Re: Newbie question 2/3

2012-01-09 Thread John DeTreville
Thanks for the reply, which confirms what I expected.

Let me explain why I asked. I have an application that my intuition says would 
be a good match to Riak, but I don't trust my intuition since I've never used 
Riak and I'm not sure I understand all of its failure modes. One thing I'm 
trying is to work through a mental model-checking exercise—which I might 
eventually turn over to a real model checker—which is making me wonder about 
all the things that can go wrong. A failed write that is visible anyway, either 
permanently or just for a while, is just one example.
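
As a starting point for that kind of model, here is a toy simulation of the N=3/W=2 case described in Ryan's reply quoted below. It is a deliberately simplified sketch, not Riak's implementation: replicas are plain in-memory objects, versions are integers rather than vector clocks, and the `order` argument (hypothetical) just fixes which replicas answer a read first.

```python
class Replica:
    """Hypothetical in-memory replica standing in for one partition."""

    def __init__(self):
        self.up = True
        self.value = None  # (version, data), or None if nothing is stored


def write(replicas, version, data, w, failing=()):
    """Send the write to every replica; report success only if at least w acknowledge."""
    acks = 0
    for index, replica in enumerate(replicas):
        if replica.up and index not in failing:
            replica.value = (version, data)
            acks += 1
    return acks >= w


def read(replicas, r, order=None):
    """Answer from the first r responders, then read-repair every reachable replica."""
    order = list(order) if order is not None else list(range(len(replicas)))
    reachable = [i for i in order if replicas[i].up]
    if len(reachable) < r:
        raise RuntimeError("fewer than r replicas reachable")
    first = [replicas[i].value for i in reachable[:r] if replicas[i].value]
    answer = max(first, key=lambda v: v[0])[1] if first else None  # None ~ not_found
    stored = [replicas[i].value for i in reachable if replicas[i].value]
    if stored:
        newest = max(stored, key=lambda v: v[0])
        for i in reachable:
            replicas[i].value = newest  # read repair
    return answer


def fresh_cluster():
    """Three replicas that all hold version 1 ("old")."""
    replicas = [Replica() for _ in range(3)]
    write(replicas, 1, "old", w=2)
    return replicas


# Case 1: the "failed" write is repaired by a later read.
cluster = fresh_cluster()
ok = write(cluster, 2, "new", w=2, failing={1, 2})  # only replica 0 stores it
first_read = read(cluster, r=1, order=[1, 2, 0])    # replica 1 answers first
second_read = read(cluster, r=1, order=[1, 2, 0])   # after read repair
print(ok, first_read, second_read)

# Case 2: no read happens before the only updated replica goes down.
cluster = fresh_cluster()
write(cluster, 2, "new", w=2, failing={1, 2})
cluster[0].up = False
print(read(cluster, r=1))
```

In the first case the write is reported as failed, the first read still returns the old value, and the next read returns the new one. In the second case the new value is never seen at all. Both outcomes follow from the same "failed" write.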

In the long run, it would be great if Riak were documented perfectly and 
completely (and every other piece of software in the world too!), but in the 
meanwhile I'm just trying to build my own mental model. I'd prefer, of course, 
a mental model that does not depend on a detailed knowledge of Riak's internal 
workings, enumerating only the preconditions and postconditions of each 
operation. We'll see how far I can get.

Cheers,
John

On Jan 9, 2012, at 2:38 PM, John DeTreville wrote:

> Thank you very much for your reply. Longer response to follow.
> 
> Cheers,
> John
> 
> On Jan 9, 2012, at 2:33 PM, Ryan Zezeski wrote:
> 
>> John,
>> 
>> To your first question, yes, it is possible that the client may receive a 
>> failure response from Riak but the data could have persisted on some of the 
>> nodes.  This is because a single write to Riak is actually N writes to N 
>> different partitions inside of Riak.  These N writes are not atomic in 
>> relation to each other.
>> 
>> As for your second question, it depends on what happens between the time of 
>> the "failed" write and the time the node(s) with the replicas go down.  If 
>> some form of anti-entropy is employed before the node failure then the 
>> replicas should have been repaired and N copies should exist.  Riak's main 
>> form of anti-entropy is read repair that occurs at read time (we also have a 
>> form of active anti-entropy between Riak clusters in our enterprise 
>> offering).  If the object is read before node failure then read-repair will 
>> occur and repair all N replicas.
>> 
>> An example might help.  If N=3/W=2 and two partitions fail to write then the 
>> overall request will fail but the remaining write is successful.  If you 
>> perform a read after this "failed" write then you may or may not see the new 
>> value depending on the R value and which partitions respond to the 
>> coordinator first.  However, regardless of what is returned by that read the 
>> coordinator will stay alive a while longer in an attempt to perform 
>> read-repair.  If 
>> read-repair is successful then you should have N copies and it will be like 
>> the write failure never occurred.  If you hadn't performed that read and the 
>> replicas hadn't been repaired and the node containing the only replica went 
>> down and you did a read then you would get the old value or a not_found 
>> (depending on if a value existed for that key before the write).
>> 
>> -Ryan


Re: Newbie question 3/3

2012-01-09 Thread John DeTreville
Right, I certainly don't want distributed transactions, which I agree would 
destroy availability. (I should add that my system is geographically 
distributed, making everything much worse.)

Still, that leaves open the question of doing what my application needs without 
transactions. Let's consider two situations involving updates.

The first situation is when I can reduce an update to a single write, such as 
by using Riak's secondary indexes. Unfortunately, I don't have a great 
understanding of the performance of secondary indexes, and I don't have a great 
understanding of their failure modes. Can you offer any guidance?

The second situation is when I really need to do multiple writes, in which case 
I must model (some subset of) transactional semantics at the application level. 
One example is implementing my own redo log, as mentioned earlier. Have other 
users ever had such problems? What are the good ways to solve them? Heck, what 
are the bad ways (just so I'll know what to avoid)?

Cheers,
John

On Jan 9, 2012, at 2:54 PM, Ryan Zezeski wrote:

> John,
> 
> As you already seem to understand, Riak doesn't provide a way to make 
> multiple ops atomic.  Part of the reason is because Riak's main focus thus 
> far has been availability.  Distributed transactions would work, but at the 
> cost of availability.  I think a flaw with the redo log approach is that you 
> need to serialize all operations to A & B through _one_ client to keep from 
> reading an inconsistent state.
> 
> A much simpler option, if you can bend your data, is to combine A and B into 
> one object.
> 
> -Ryan


Re: Newbie question 2/3

2012-01-10 Thread John DeTreville
Let me elaborate a tiny bit more.

Consider the write(2) syscall on Unix and lookalikes. If it succeeds, it 
returns the number of bytes written. If it fails, it returns -1. One must 
sometimes learn the hard way that some bytes may have been written even in the 
case of failure, but that there is no way to know how many. Interesting!
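
For comparison, a small sketch of how a caller usually copes with this at the syscall level. Python's `os.write` wraps write(2): it returns the number of bytes accepted, which may be fewer than were offered, and raises `OSError` on failure. The `write_all` helper name is illustrative. If an error is raised partway through the loop, the caller knows only about the bytes confirmed by earlier calls.

```python
import os


def write_all(fd, data):
    """Keep calling os.write until every byte is accepted or an OSError is raised."""
    view = memoryview(data)
    written = 0
    while written < len(view):
        # os.write may accept fewer bytes than offered (a "short write"),
        # so advance by whatever it reports and try again with the rest.
        written += os.write(fd, view[written:])
    return written


read_fd, write_fd = os.pipe()
count = write_all(write_fd, b"hello, riak")
os.close(write_fd)
print(count, os.read(read_fd, 64))
os.close(read_fd)
```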

I'm just trying to accelerate my learning process with Riak.

Cheers,
John


Re: Newbie question 1/3

2012-01-10 Thread John DeTreville
Excellent answer; thank you.

I imagine the unavailability I see will depend strongly on the speed of read 
repairs. Since I have quite a lot of data, I imagine that they might be quite 
slow, but I probably can't say more than that without real measurements.

A related question. You say that if my n_val is 3, some data may reside only on 
2 physical nodes. Ignoring failures, might some of it reside on just one 
node?

Cheers,
John


Re: Newbie question 1/3

2012-01-10 Thread John DeTreville
That's good to know; thanks. I imagine I may have to vary my physical node 
count as time goes by, and I'm wondering how much planning ahead that might take.

Going by your example, if my n_val is 4 and an object hashes to partition 6, 
then my object will be stored only on two physical nodes, right? In my system 
(as in many), some objects are much more important than others, although I 
unfortunately don't know which are which until after the fact. Having two 
server failures in rapid succession is not all that uncommon, so I might have 
to use an n_val of 5 to guarantee storage on 3 physical nodes, right?
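
The counting can be checked with a small sketch. The example being referred to is not part of this archive, so the setup here is an assumption: a ring of 8 partitions claimed round-robin by 3 physical nodes, with an object stored on n_val consecutive partitions starting at the one it hashes to. That setup happens to reproduce the numbers in this message, but Riak's actual claim algorithm may assign partitions differently.

```python
def claim_round_robin(ring_size, node_count):
    """Assumed ownership: partition p belongs to physical node p % node_count."""
    return [p % node_count for p in range(ring_size)]


def distinct_nodes(owners, first_partition, n_val):
    """Physical nodes covering n_val consecutive partitions, wrapping around the ring."""
    ring_size = len(owners)
    return {owners[(first_partition + i) % ring_size] for i in range(n_val)}


owners = claim_round_robin(ring_size=8, node_count=3)  # assumed example values
for n_val in (3, 4, 5):
    at_six = len(distinct_nodes(owners, 6, n_val))
    worst = min(len(distinct_nodes(owners, p, n_val)) for p in range(len(owners)))
    print(f"n_val={n_val}: partition 6 -> {at_six} node(s), worst case {worst}")
```

Under these assumptions the wrap-around at the end of the ring is what costs a physical node: partitions 6, 7, 0 and 1 are owned by only two nodes, so n_val values of 3 and 4 both bottom out at two nodes and n_val of 5 is the first to reach three.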

Cheers,
John


Re: Non-standard ring size

2012-01-19 Thread John DeTreville
That's a pity, as, if the number of physical hosts is not also a power of two, 
some number of records will reside on fewer than n_val physical hosts.

Cheers,
John

On Jan 19, 2012, at 8:22 AM, Sean Cribbs wrote:

> The ring size must be a power of two because it must evenly divide 2^160 (the 
> size of our consistent hashing space), which is not divisible by 3. Using a 
> non-power-of-two ring size will have unknown or unpredictable effects.
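
The divisibility argument is easy to check directly. The sketch below tests only the arithmetic stated above (whether a ring size divides 2^160 evenly); it says nothing about what Riak does with a ring size that fails the test. The function names are illustrative.

```python
HASH_SPACE = 2 ** 160  # size of the consistent hashing space, as stated above


def divides_evenly(ring_size):
    """True if the hash space splits into ring_size equal partitions."""
    return HASH_SPACE % ring_size == 0


def is_power_of_two(n):
    """True for 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0


for ring_size in (3, 12, 64, 100, 1024):
    print(ring_size, divides_evenly(ring_size), is_power_of_two(ring_size))
```

Because 2 is the only prime factor of 2^160, the two predicates agree for every ring size up to 2^160.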




Re: Delta updates using pre-commit hooks

2012-01-19 Thread John DeTreville
I for one would find the availability of CRDTs to be very interesting. Good 
luck!

Cheers,
John

On Jan 18, 2012, at 5:36 AM, Marek Zawirski wrote:

> thanks for your answer with a bunch of useful info. I should have introduced 
> ourselves better given your replies. In fact we authored the paper on CRDTs 
> that Ryan mentioned and we continue to work in this area. I am putting some 
> relevant people in CC. What we try now is to experiment with CRDTs on top of 
> Riak.



OutOfMemoryError in Java client

2012-02-14 Thread John DeTreville
I have a simple single-threaded Java client for Riak that consistently runs out 
of memory creating threads.

java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:658)
    at java.util.concurrent.ThreadPoolExecutor.addIfUnderCorePoolSize(ThreadPoolExecutor.java:703)
    at java.util.concurrent.ThreadPoolExecutor.prestartCoreThread(ThreadPoolExecutor.java:1381)
    at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:222)
    at java.util.concurrent.ScheduledThreadPoolExecutor.scheduleWithFixedDelay(ScheduledThreadPoolExecutor.java:443)
    at com.basho.riak.pbc.RiakConnectionPool.doStart(RiakConnectionPool.java:232)
    at com.basho.riak.pbc.RiakConnectionPool.access$100(RiakConnectionPool.java:41)
    at com.basho.riak.pbc.RiakConnectionPool$State$1.start(RiakConnectionPool.java:58)
    at com.basho.riak.pbc.RiakConnectionPool.start(RiakConnectionPool.java:227)
    at com.basho.riak.pbc.RiakClient.<init>(RiakClient.java:90)
    at com.basho.riak.pbc.RiakClient.<init>(RiakClient.java:81)
    at com.basho.riak.client.raw.pbc.PBClientAdapter.<init>(PBClientAdapter.java:91)
    at com.basho.riak.client.RiakFactory.pbcClient(RiakFactory.java:107)

The client is a JUnit test for some data structures I'm storing in Riak. When I 
run it, my Java client process starts about 2028 native threads before it 
collapses.

This JUnit test creates a moderately large number of IRiakClient objects, but 
only one at a time. It does not close them, as there is no method for doing so.

This happens with Riak 1.0.2 and with Riak 1.1.0RC2. As I've said, the client 
is single-threaded.

Any ideas?

Cheers,
John



Re: OutOfMemoryError in Java client

2012-02-14 Thread John DeTreville
Excellent! I had imagined it was something like this, but it's nice to see it 
confirmed.

My real code is not so profligate with IRiakClient objects, of course, but it 
was a surprise to see this pop up in JUnit tests.

Thanks very much for the quick answer.

Cheers,
John
