On 6/24/2011 9:28 AM, Yang wrote:
Without a clear description of your pseudo-code, it's difficult to say
whether it will work.
But I think it can work fine as an election/agreement protocol, which
you can use as a lock to some degree. This requires all the potential
lock contenders to participate, though: you can't grab the lock before
everyone has voiced their vote.
I agree with this statement. I think the issue is that the timestamps
are generated by the clients and their clocks may not be in sync, so
write A from client A might arrive with timestamp T, while write B from
client B may reach the node later in time and yet carry an earlier
timestamp T' (where T' < T). Client A may perform a read immediately
after its write and notice that it was the only client to request the
lock, so it will assume it has acquired the lock. After its own lock
request, Client B will perform a read and observe that it wrote the
request with the earliest timestamp, so it will also assume it has
acquired the lock, which results in a failure of the locking scheme. If
each client is required to wait for all other clients to "vote", then
this issue goes away.
I'm interested in the objectives of this thread as well, as I was
contemplating something similar. Any other ideas? :)
Jim
Let's say the code is like:
lock(UUID) {
    write_column( "token_" + my_hostname() + "_" + UUID, getTimeStamp() );
    for all possible nodes N:
        block until ( token = read_column("token_" + N + "_" + UUID) ) != null
    return "LOCKED" if my timestamp is the lowest among all nodes
}
You have to wait for all nodes to voice their timestamp because
timestamps in Cassandra are client-generated: a later node can create a
column with an earlier timestamp.
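For what it's worth, here is that pseudo-code rendered as Python.
write_column and read_column are hypothetical wrappers around whatever
Cassandra client you use (not a real API); they are assumed to write and
read at QUORUM, with read_column returning the stored timestamp or None if
the column doesn't exist yet:

import time

def lock(lock_id, my_hostname, all_contenders, write_column, read_column,
         poll_interval=0.1):
    """Vote with our own timestamp, wait for every contender to vote,
    then claim the lock only if our timestamp is the lowest."""
    my_ts = int(time.time() * 1e6)                  # client-generated timestamp
    write_column("token_" + my_hostname + "_" + lock_id, my_ts)

    votes = {}
    for node in all_contenders:                     # includes my_hostname
        column = "token_" + node + "_" + lock_id
        ts = read_column(column)
        while ts is None:                           # block until this node has voted
            time.sleep(poll_interval)
            ts = read_column(column)
        votes[node] = ts

    # "LOCKED" only if our timestamp is strictly the lowest; equal timestamps
    # would need a deterministic tie-break (e.g. on hostname).
    return all(ts > my_ts or node == my_hostname for node, ts in votes.items())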
On Fri, Jun 24, 2011 at 11:33 AM, AJ <a...@dude.podzone.net> wrote:
Sorry, I know this is long-winded, but I just want to make sure
before I go through the trouble of implementing this, since it's not
something that can be reliably tested and it requires in-depth
knowledge of C* internals. But this ultimately deals with
concurrency control, so anyone interested in that subject may want
to try and answer this. Thanks!
I would like to know how to do a series of writes and reads such
that I can tell definitively what process out of many was the
first to create a unique flag column.
IOW, I would like to have multiple processes (clients) compete to
see who is first to write a token column. The tokens start with a
known prefix, such as "Token_" followed by the name of the process
that created it and a UUID so that all columns are guaranteed
unique and don't get overwritten. For example, Process A could
create:
Token_ProcA_<UUID>
and process B would create:
Token_ProcB_<UUID>
These writes/reads are asynchronous between the two or more
processes. After the two processes write their respective tokens,
each will read back all columns named "Token_*" that exist (a
slice). They each do this in order to find out who "won". The
process that wrote the column with the lowest timestamp wins. The
purpose is to implement a lock.
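A minimal sketch of that competition, assuming two hypothetical helpers
(not real client calls): insert_column(row, name, timestamp) does a QUORUM
write, and slice_read(row, prefix) does a QUORUM slice and returns a dict
of {column_name: timestamp} for every column starting with the prefix:

import time
import uuid

def compete_for_lock(process_name, row_key, insert_column, slice_read):
    """Write our token, read back every "Token_*" column, and report
    whether our token carries the lowest timestamp."""
    my_token = "Token_" + process_name + "_" + str(uuid.uuid4())
    my_ts = int(time.time() * 1e6)                 # client-supplied timestamp
    insert_column(row_key, my_token, my_ts)        # QUORUM write assumed

    tokens = slice_read(row_key, prefix="Token_")  # QUORUM slice read assumed
    winner = min(tokens, key=lambda col: (tokens[col], col))  # ties broken by name
    return winner == my_token

# Note: as Jim points out above, client-generated timestamps mean a process
# with a slow clock can still "win" after this check unless every contender
# is required to write its token before anyone declares victory.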
I think all that is required is for the processes to use QUORUM
reads/writes, so that the final read is consistent, each process can
rely on what that read returns, and there isn't an earlier write
floating around somewhere. This is where the "happened-before"
question comes in. Is it possible that Process A writes its token
with a lower timestamp (and so should be the winner), yet that write
is not seen by Process B when it does its read (which happens after
its own token write and after Process A wrote its token), so that
Process B incorrectly concludes that it is the winner because it
never sees Process A's token? I'm 99% sure that using QUORUM
reads/writes will make this work, because that's their whole
purpose, but I just wanted to double-check in case there's another
detail I'm forgetting about C* that would defeat this.
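For the happened-before part, the relevant arithmetic (illustrated below
with a few example replication factors) is simply that a QUORUM write set
and a QUORUM read set must share at least one replica:

# QUORUM = floor(RF / 2) + 1 replicas.  Two quorums out of RF replicas must
# overlap in at least one node, so a QUORUM read sees any earlier QUORUM write.
for rf in (1, 2, 3, 5):
    quorum = rf // 2 + 1
    min_overlap = 2 * quorum - rf      # worst-case number of shared replicas
    print("RF=%d  QUORUM=%d  overlap>=%d" % (rf, quorum, min_overlap))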
Thanks!
P.S. I realize this will cost me in performance, but this is only
meant to be used on occasion.