Hi, yes, adding more bits to the id is certainly undesirable; however, it isn't uncommon. For example, LinkedIn just published a paper on Ambry, a distributed datastore which uses a total of *40 bytes* to identify a blob (8 bytes for the partition + 32 bytes for the UUID, see: http://dprg.cs.uiuc.edu/docs/SIGMOD2016-a/ambry.pdf).
To answer Brian on the "potential" problem of clock drift, I would recommend having a look at https://aphyr.com/posts/299-the-trouble-with-timestamps. Besides the hardware problems, you have to account for things like the ntpd daemon, which is meant to synchronize clocks. To keep them in sync it accelerates or decelerates the clock speed on your machine; however, if the clock falls too far behind, ntpd will do a hard reset, so what you might see is your System/currentTimeMillis calls jumping back and forward (non-monotonic). With VMs (such as cloud environments) this problem is far worse and more frequent.

The process ID on many platforms is just 16 bits, although some platforms use 32 bits (http://unix.stackexchange.com/questions/16883/what-is-the-maximum-value-of-the-pid-of-a-process), and the thread ID in Java is a long (https://docs.oracle.com/javase/7/docs/api/java/lang/Thread.html#getId()), which is unfortunate. It would be nice if there were a way to read which physical CPU core is executing a particular thread (while the thread is running); that way you could replace the process ID and the thread ID with just a CPU core ID. The number of physical threads (cores + hyperthreads) is typically much smaller and fits in 16 bits. However, to my knowledge there is no way in Java to retrieve such a value.

The other consideration is that even if, running your benchmarks, you can get over 1 million IDs per second on a single thread, the coordination required by the CAS and the retry formula sent earlier will greatly reduce performance as you scale the number of threads. This falls under Amdahl's law (https://en.wikipedia.org/wiki/Amdahl%27s_law) and, more precisely, under the USL, the Universal Scalability Law (http://www.perfdynamics.com/Manifesto/USLscalability.html and https://www.infoq.com/articles/top-10-performance-mistakes), for which, as you increase the number of processors, performance hits a plateau (Amdahl) or even degrades (USL). If you run these tests I'm pretty sure you'll see similar effects.

Since it is impossible to generate TOTALLY unique IDs without coordination, the real question is what the price of that coordination is for a given risk of collisions. I guess each project should decide whether monotonicity or performance is more important.

@Max In your latest version there is a subtle issue here: https://github.com/maxcountryman/flake/blob/master/src/flake/utils.clj#L43 and https://github.com/maxcountryman/flake/blob/master/src/flake/utils.clj#L56. System/nanoTime has NO RELATION to the wall clock and must not be used in that way. System/nanoTime can also return negative values (this is platform dependent); it is guaranteed to be monotonic, but not with respect to System/currentTimeMillis.

The change I was suggesting is the following (see the sketch below):

* at init! you initialize your base timestamp with start-timestamp = (System/currentTimeMillis) and, in addition, you create a new field called timer = (System/nanoTime)
* when you create a new flake, instead of reading a new value of System/currentTimeMillis, you make a new call to (System/nanoTime) and compute the time elapsed since timer, so new-timestamp = start-timestamp + ((System/nanoTime) - timer) / 1000000

There is another potential issue with generate! in case of a clock hard reset: if a reset happens, an exception will be thrown in a tight loop, potentially causing a DoS.
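Here is a minimal sketch of what I mean (the names, such as monotonic-millis, are purely illustrative and not the actual flake API):

```clojure
;; Minimal sketch of the suggested approach (illustrative names only,
;; not the actual flake API). The wall clock is read once at init!,
;; and every subsequent timestamp is derived from the monotonic nano timer.
(defonce ^:private time-base (atom nil))

(defn init!
  "Capture the wall-clock base and the monotonic timer, once."
  []
  (reset! time-base {:start-timestamp (System/currentTimeMillis)
                     :timer           (System/nanoTime)}))

(defn monotonic-millis
  "Millisecond timestamp derived from System/nanoTime, so it keeps
   increasing even if the wall clock is reset after init!."
  []
  (let [{:keys [start-timestamp timer]} @time-base]
    (+ start-timestamp
       (quot (- (System/nanoTime) timer) 1000000))))
```

In generate! you would then call (monotonic-millis) instead of (System/currentTimeMillis), so a wall-clock reset after init! cannot move the flake timestamps backwards.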
Bruno

On Wed, Jun 22, 2016 at 12:29 AM, Max Countryman <m...@me.com> wrote:

> Brian,
>
> I think you make good points here, especially with regard to the size of IDs.
>
> I’d also like to point out that while adding the process and thread IDs helps, it doesn’t eliminate the possibility of duplicate IDs: this is why it’s necessary to write out the last used timestamp in a separate thread.
>
> Just a clarification with regard to disk persistence: we aren’t writing out the epoch, we’re writing out the last used timestamp periodically, in its own thread. Yes, the `init!` API is cumbersome, but it’s an important safety valve which helps protect against duplicate IDs.
>
> My understanding from reading the documentation and various StackOverflow answers is that System/nanoTime is monotonic, but I don’t know what guarantees it makes across threads.
>
> Max
>
> On Jun 21, 2016, at 10:00, Brian Platz <brian.platz@place.works> wrote:
>
> Bruno,
>
> I think the more you can reduce the chance of collision the better, and the thread-local capability is a good idea, but in the process you've almost doubled the bits.
>
> For me anyhow, an ID needs to be producible at a reasonable rate (1 million a second per machine is good for me), have near-zero probability of collision and take up the least amount of space possible.
>
> Under those criteria, I think 128 bits is a reasonable target, and I would expect the thread-safe atom to handle such volume (although I haven't tested).
>
> If you need a billion per second and don't want 100 machines producing them, then I think you are at the point of needing thread independence and probably have to increase the bit-count, and your ideas provide a good path towards such a solution.
>
> Your comment on the file persistence is a good one; I wonder if the potential problems are real enough to warrant the risks.
>
> My other curiosity is whether System/nanoTime is guaranteed to increment across threads. I know at least a while ago that this guarantee did not exist.
>
> -Brian
>
> On Tuesday, June 21, 2016 at 8:38:58 AM UTC-4, Bruno Bonacci wrote:
>>
>> Hi, this change is actually easier than it sounds. Looking at the code, I came across a couple of things which I think might be better.
>>
>> 1) use of filesystem persistence.
>>
>> Not too sure that the file-based persistence is a good idea. Maybe this is a good idiomatic design for Erlang, but it definitely doesn't look nice in Clojure.
>>
>> In particular I'm not too sure that by storing the init time epoch we actually accomplish anything at all.
>> I would argue that there are a number of problems there: race conditions on the data, tmp files being purged, and it still doesn't protect against the case where the clock drifts during use.
>>
>> 2) use of CAS (atom) for storing the VM state.
>> If it is truly decentralized then you shouldn't need an atom at all. The disadvantage of the CAS is that, when many threads race to make the same change, only one will succeed and all the other ones will fail and retry. Which means that if you have 100 threads (for example) only 1 will succeed and the other 99 will fail and retry. Again, at the second round only 1 will succeed and 98 will retry, and so on.
>> Therefore, for N racing threads the total number of attempts will be N + (N-1) + (N-2) + ... + 1 = N(N+1)/2.
>>
>> If you want to develop a real "*decentralized*" id generator, I think, you need to drop the atom in favour of a thread-local store.
>> Now, to do so and make collisions impossible we need to add more bits:
>>
>> - 64 bits - ts (i.e. a timestamp)
>> - 48 bits - worker-id/node (i.e. MAC address)
>> - 32 bits - worker-id/process (pid)
>> - 64 bits - worker-id/thread (thread num)
>> - 32 bits - seq-no (i.e. a counter)
>>
>> By adding the process id (pid) and the thread id there is no possibility of two systems running and creating the same id at the same time.
>> Finally, by using thread-local storage there is no need for process-level coordination (atom) and no risk of retries, because no thread is stepping on another's toes.
>>
>> With such a setup, 100 threads will be able to increment their own thread-local counters independently (given that you have 100 execution cores).
>>
>> What do you think?
>> Bruno
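PS: a rough sketch of what the thread-local scheme described above could look like (the names are purely illustrative, not part of the flake library, and the MAC-address/bit-packing part is left out):

```clojure
;; Rough sketch of the thread-local scheme from the quoted proposal
;; (illustrative names only). Each thread owns its own counter, so no
;; atom/CAS coordination is needed between threads.
(def ^:private thread-seq-no
  (proxy [ThreadLocal] []
    (initialValue [] (long-array 1))))

(defn- current-pid
  "PID of this JVM; the RuntimeMXBean name looks like \"12345@hostname\"."
  []
  (-> (java.lang.management.ManagementFactory/getRuntimeMXBean)
      (.getName)
      (.split "@")
      first
      Long/parseLong))

(defn next-id-components
  "Returns the raw components [timestamp pid thread-id seq-no];
   packing them (plus the 48-bit MAC address) into a single id is omitted."
  []
  (let [^longs counter (.get ^ThreadLocal thread-seq-no)
        n (aget counter 0)]
    (aset counter 0 (unchecked-inc n))
    [(System/currentTimeMillis)
     (current-pid)
     (.getId (Thread/currentThread))
     n]))
```

Each call pays only a ThreadLocal lookup and a plain array write, so there is no retry loop no matter how many threads generate ids concurrently.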