Re: [ANN] Flake 0.4.0: Decentralized, k-ordered unique ID generator

Max Countryman Tue, 21 Jun 2016 16:30:07 -0700

Brian,

I think you make good points here, especially with regard to the size of IDs.


I’d also like to point out that while adding the process and thread IDs helps, 
it doesn’t eliminate the possibility of duplicate IDs (for example, if the 
clock moves backwards between restarts): this is why it’s necessary to write 
out the last used timestamp in a separate thread.

Just a clarification with regard to disk persistence: we aren’t writing out the 
epoch, we’re writing out the last used timestamp periodically, in its own 
thread. Yes, the `init!` API is cumbersome, but it’s an important safety valve 
which helps protect against duplicate IDs.
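Below is a minimal Java sketch of the mechanism described above. It uses Java rather than Clojure because the pieces involved are plain JVM classes. A background thread periodically writes the last used timestamp to a file, and a startup check refuses to generate IDs if the clock reads earlier than that value. The class `TimestampPersister`, its methods, and the one-number text file format are hypothetical illustrations, not Flake's actual `init!` implementation. Because the value is only written periodically, the check can miss a backwards jump smaller than the write interval.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not Flake's API: persists the last used timestamp
// from a background thread and checks it once at startup.
final class TimestampPersister implements AutoCloseable {
    private static final long UNSET = Long.MIN_VALUE;

    private final Path file;
    private final AtomicLong lastUsed = new AtomicLong(UNSET);
    private final ScheduledExecutorService writer =
            Executors.newSingleThreadScheduledExecutor();

    TimestampPersister(Path file) {
        this.file = file;
    }

    // Startup safety valve: refuse to run if the clock is behind the value
    // persisted by a previous run (e.g. the clock was set backwards).
    void init(long nowMillis) throws IOException {
        if (Files.exists(file)) {
            long persisted = Long.parseLong(
                    Files.readString(file, StandardCharsets.UTF_8).trim());
            if (nowMillis < persisted) {
                throw new IllegalStateException("clock is behind persisted timestamp: "
                        + nowMillis + " < " + persisted);
            }
        }
        lastUsed.set(nowMillis);
    }

    // Called by the ID generator; the stored value never moves backwards.
    void recordUse(long timestampMillis) {
        lastUsed.accumulateAndGet(timestampMillis, Math::max);
    }

    // Write the last used timestamp in its own thread every periodMillis ms.
    void startPeriodicWrites(long periodMillis) {
        writer.scheduleAtFixedRate(this::writeNow, periodMillis, periodMillis,
                TimeUnit.MILLISECONDS);
    }

    void writeNow() {
        long value = lastUsed.get();
        if (value == UNSET) {
            return; // init() never succeeded: nothing trustworthy to persist
        }
        try {
            Files.writeString(file, Long.toString(value), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stop the background writer, then flush the final value.
    @Override
    public void close() {
        writer.shutdown();
        try {
            writer.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        writeNow();
    }
}
```

Usage:

- Call `init(System.currentTimeMillis())` once before issuing IDs.
- Call `recordUse` from the generator.
- Call `startPeriodicWrites(1000)` to persist once per second.
- Call `close()` on shutdown.

Writing the file in place is not atomic. Writing to a temporary file and moving it over the old one would remove one source of corrupted state.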

My understanding from reading the documentation and various StackOverflow 
answers is that System/nanoTime is monotonic, but I don’t know what guarantees 
it makes across threads.
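One defensive option is to clamp readings so that no caller ever sees a value equal to or lower than one already handed out. This Java sketch assumes you cannot rely on cross-thread monotonicity at all. `MonotonicClock` is a hypothetical name, not part of Flake or the JDK. Two caveats apply:

- `System.nanoTime()` counts from an arbitrary origin, so it is only useful for ordering and elapsed time, not as an epoch timestamp.
- The shared `AtomicLong` reintroduces the compare-and-set contention discussed further down this thread.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper (not part of the JDK or Flake): hands out strictly
// increasing readings from any thread, whatever System.nanoTime() guarantees.
final class MonotonicClock {
    private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);

    long tick() {
        long now = System.nanoTime();
        // If the raw reading is not ahead of the last value handed out
        // (another thread got there first, or the source went backwards),
        // return last + 1 instead.
        return last.accumulateAndGet(now, (prev, n) -> Math.max(prev + 1, n));
    }
}
```

Each call to `tick()` returns a strictly larger value than any earlier call on the same instance, from any thread.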


Max


> On Jun 21, 2016, at 10:00, Brian Platz <[email protected]> wrote:
> 
> Bruno,
> 
> I think the more you can reduce the chance of collision, the better, and the 
> thread-local capability is a good idea, but in the process you've almost 
> doubled the bits (240 versus 128).
> 
> For me, anyhow, an ID needs to be producible at a reasonable rate (1 million a 
> second per machine is good for me), have a near-zero probability of collision, 
> and take up as little space as possible.
> 
> Under those criteria, I think 128 bits is a reasonable target, and I would 
> expect the thread-safe atom to handle such volume (although I haven't 
> tested it).
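Brian's design (128 bits, one shared atom) might look roughly like this Java sketch. The split of a 64-bit millisecond timestamp, a 48-bit node ID and a 16-bit sequence is an assumption for illustration. `AtomicReference.compareAndSet` plays the role of a Clojure atom's `swap!`. `AtomFlake` is a hypothetical name, not Flake's API.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical 128-bit generator: 64-bit ms timestamp | 48-bit node id | 16-bit sequence.
// The layout is an assumption for illustration, not Flake's documented format.
final class AtomFlake {
    private static final class State {
        final long millis;
        final int seq;
        State(long millis, int seq) { this.millis = millis; this.seq = seq; }
    }

    private final long nodeId; // e.g. derived from a MAC address; only the low 48 bits are kept
    private final AtomicReference<State> state = new AtomicReference<>(new State(0L, 0));

    AtomFlake(long nodeId) {
        this.nodeId = nodeId & 0xFFFF_FFFF_FFFFL;
    }

    /** Returns the ID as {high, low} 64-bit words. */
    long[] next() {
        while (true) {
            State prev = state.get();
            long now = System.currentTimeMillis();
            State next;
            if (now > prev.millis) {
                next = new State(now, 0);
            } else if (prev.seq < 0xFFFF) {
                // Same (or earlier) millisecond: keep the old timestamp, bump the sequence.
                next = new State(prev.millis, prev.seq + 1);
            } else {
                Thread.onSpinWait(); // 65,536 IDs already issued this ms: wait for the clock
                continue;
            }
            // The atom's CAS: if another thread changed the state first, loop and retry.
            if (state.compareAndSet(prev, next)) {
                return new long[] { next.millis, (nodeId << 16) | next.seq };
            }
        }
    }
}
```

Under this assumed split, the 16-bit sequence caps output at 65,536 IDs per millisecond per node. That is about 65.5 million per second, well above the 1 million per second target. Whether the single CAS keeps up under heavy thread contention is the open question.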
> 
> If you need a billion per second and don't want 100 machines producing them, 
> then I think you are at the point of needing to have thread independence and 
> probably have to increase the bit-count, and your ideas provide a good path 
> towards such a solution.
> 
> Your comment on the file persistence is a good one; I wonder whether the 
> potential problems are real enough to warrant the risks.
> 
> My other question is whether System/nanoTime is guaranteed to increase 
> monotonically across threads. I know that, at least a while ago, this 
> guarantee did not exist.
> 
> -Brian
> 
> 
> On Tuesday, June 21, 2016 at 8:38:58 AM UTC-4, Bruno Bonacci wrote:
> 
> Hi, this change is actually easier than it sounds. Looking at the code, I 
> came across a couple of things which I think could be done better.
> 
> 1) Use of filesystem persistence.
> 
> I'm not sure that file-based persistence is a good idea. It may be an 
> idiomatic design in Erlang, but it definitely doesn't look nice in 
> Clojure.
>  
> In particular, I'm not sure that by storing the init-time epoch we actually 
> accomplish anything at all.
> I would argue that there are a number of problems there: race conditions on 
> the data, the tmp file being purged, and it still doesn't protect against the 
> clock drifting while the generator is in use.
> 
> 2) Use of CAS (atom) for storing the VM state.
> If it is truly decentralized, then you shouldn't need an atom at all. The 
> disadvantage of CAS is that, when many threads race to make the same change, 
> only one will succeed and all the others will fail and retry. This means 
> that if you have 100 threads (for example), only 1 will succeed and the other 
> 99 will fail and retry. In the second round, again only 1 will succeed and 98 
> will retry, and so on.
> Therefore, in the worst case, the total number of attempts for n threads 
> will be
> 
>     $\sum_{k=1}^{n} k = \frac{n(n+1)}{2}$
> 
> which for n = 100 is 5050 attempts to perform 100 updates.
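Bruno's retry argument can be checked empirically with a Java sketch in which an `AtomicLong` stands in for the atom; Clojure's `swap!` is also a compare-and-set retry loop. Every attempt is counted, including the ones that lose the race. `CasContention` and its methods are hypothetical names. The measured attempt count depends on core count and scheduling, so treat it as an observation, not a benchmark.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Counts how many compareAndSet attempts it takes for many threads to
// increment one shared AtomicLong (standing in for an atom).
final class CasContention {
    /** Returns {final counter value, total CAS attempts}. */
    static long[] run(int threads, int incrementsPerThread) throws InterruptedException {
        AtomicLong shared = new AtomicLong();
        LongAdder attempts = new LongAdder();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int k = 0; k < incrementsPerThread; k++) {
                    while (true) {
                        attempts.increment();
                        long cur = shared.get();
                        if (shared.compareAndSet(cur, cur + 1)) {
                            break; // won the race
                        }
                        // lost the race: another thread changed the value, retry
                    }
                }
            });
            ts[i].start();
        }
        start.countDown(); // release all threads at once to maximise contention
        for (Thread t : ts) {
            t.join();
        }
        return new long[] { shared.get(), attempts.sum() };
    }

    // Worst-case model from the text: n + (n-1) + ... + 1 attempts for n racing threads.
    static long worstCaseAttempts(long n) {
        return n * (n + 1) / 2;
    }
}
```

For example, `CasContention.run(100, 1)` reports how many attempts 100 simultaneously released threads needed for 100 increments. You can compare that with `worstCaseAttempts(100)`, which is 5050.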
> 
> If you want to develop a truly "decentralized" ID generator, I think you need 
> to drop the atom in favour of a thread-local store.
> To do so and still make collisions impossible, we need to add more bits:
> 
>     64 bits - ts (i.e. a timestamp )
>     48 bits - worker-id/node (i.e. MAC address)
>     32 bits - worker-id/process (pid) 
>     64 bits - worker-id/thread (thread num)
>     32 bits - seq-no (i.e. a counter)
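One way to lay out these five fields is to pack them big-endian into a 30-byte (240-bit) array, sketched here in Java. Big-endian order puts the timestamp first, so a byte-by-byte comparison orders IDs by time. `WideId` and `encode` are hypothetical names. Obtaining the MAC address, pid and thread ID is left to the caller.

```java
import java.nio.ByteBuffer;

// Hypothetical packing of the proposed layout into a 30-byte, big-endian ID:
// 64-bit ts | 48-bit node (MAC) | 32-bit pid | 64-bit thread | 32-bit seq.
final class WideId {
    static final int BYTES = (64 + 48 + 32 + 64 + 32) / 8; // 30 bytes = 240 bits

    static byte[] encode(long ts, byte[] mac, int pid, long threadId, int seq) {
        if (mac.length != 6) {
            throw new IllegalArgumentException("MAC address must be 6 bytes (48 bits)");
        }
        // ByteBuffer defaults to big-endian, so the timestamp lands in the leading bytes.
        return ByteBuffer.allocate(BYTES)
                .putLong(ts)
                .put(mac)
                .putInt(pid)
                .putLong(threadId)
                .putInt(seq)
                .array();
    }
}
```

Compared with a 128-bit ID, this costs 112 extra bits per ID.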
> By adding the process id (pid) and the thread id, there is no possibility of 
> two systems running and creating the same ID at the same time.
> Finally, by using thread-local storage there is no need for process-level 
> coordination (atom), and no risk of retries caused by threads stepping on 
> each other's toes.
> 
> With such a setup, 100 threads will be able to increment their own 
> thread-local counters independently (given that you have 100 execution cores).
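Here is a per-thread version of the sequence counter, sketched in Java with `ThreadLocal`. Each thread keeps its own last timestamp and counter, so no compare-and-set and no retries are involved. The thread ID keeps the results of different live threads apart. `ThreadLocalFlake` is a hypothetical name. Combining the result with the node ID and pid into the full ID is omitted.

```java
// Hypothetical per-thread generator: each thread owns its own timestamp/sequence
// state, so there is no shared atom and no CAS retry loop.
final class ThreadLocalFlake {
    private static final class Local {
        long lastMillis = -1;
        int seq;
    }

    private static final ThreadLocal<Local> LOCAL = ThreadLocal.withInitial(Local::new);

    /** Returns {millis, threadId, seq}; node id and pid would be added to form the full ID. */
    static long[] next() {
        Local s = LOCAL.get(); // only the current thread ever reads or writes s
        long now = System.currentTimeMillis();
        if (now > s.lastMillis) {
            s.lastMillis = now;
            s.seq = 0;
        } else {
            s.seq++; // same (or earlier) millisecond: bump this thread's private counter
        }
        // getId() is used for broad JDK compatibility (newer JDKs also offer threadId()).
        return new long[] { s.lastMillis, Thread.currentThread().getId(), s.seq };
    }
}
```

One caveat: the Java 8 documentation for `Thread.getId()` states that the ID of a terminated thread may be reused. A new thread could therefore, in principle, repeat a dead thread's (timestamp, thread ID, sequence) triple within the same millisecond.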
> 
> What do you think?
> Bruno
> 
>  
> 
> 
> -- 
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to [email protected]
> Note that posts from new members are moderated - please be patient with your 
> first post.
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en 
> <http://groups.google.com/group/clojure?hl=en>
> --- 
> You received this message because you are subscribed to the Google Groups 
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected] 
> <mailto:[email protected]>.
> For more options, visit https://groups.google.com/d/optout 
> <https://groups.google.com/d/optout>.
