Re: [ANN] Flake 0.4.0: Decentralized, k-ordered unique ID generator

2016-06-21 Thread Bruno Bonacci




>
> Another thing I've noticed is that you are using (System/currentTimeMillis
> ) to get the wall clock on every generation.
>
> (System/currentTimeMillis) causes a low level system call which in turn 
> causes a context switch.
>
> Maybe one way to improve could be use a initial (System/currentTimeMillis) 
> on the first init! and then
> use System/nanoTime to calculate the time elapsed from the init.
> The advantage would be that System/nanoTime runs in the UserSpace (not 
> Kernel Space) and it doesn't require
> a system call (so no context switch).
>
> This could really help the case of a bulk production of IDs and any other 
> burst situation.
>
>
> I really like this idea. I’m certainly open to pull requests if you wanted 
> to take a stab at it otherwise I may try my hand at making this 
> improvement. :)
>
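
A minimal sketch of the idea quoted above, assuming the wall clock is read once and later timestamps are derived from System/nanoTime. `make-clock` and `now-millis` are hypothetical names, not part of Flake. Whether this really avoids a system call depends on the JVM and the OS, so it is worth measuring before relying on it.

```clojure
;; Hypothetical helper, not part of Flake: anchor the wall clock once,
;; then derive later timestamps from System/nanoTime.
(defn make-clock
  "Returns a zero-argument function yielding approximate epoch milliseconds."
  []
  (let [start-millis (System/currentTimeMillis) ; wall clock, read once
        start-nanos  (System/nanoTime)]         ; elapsed-time reference point
    (fn []
      ;; elapsed nanoseconds converted to whole milliseconds
      (+ start-millis
         (quot (- (System/nanoTime) start-nanos) 1000000)))))

(def now-millis (make-clock))

(now-millis) ; epoch millis, close to (System/currentTimeMillis)
```

Note that a clock built this way no longer follows later adjustments of the system clock (NTP corrections, for example), which may or may not be what an ID generator wants.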

Hi, this change is actually easier than it sounds. Looking at the code, I 
came across a couple of things which I think could be better.

1) use of filesystem persistence.

I'm not too sure that the file-based persistence is a good idea. Maybe this
is good idiomatic design for Erlang, but it definitely doesn't look nice in
Clojure.

In particular, I'm not too sure that storing the init time epoch actually
accomplishes anything at all.
I would argue that there are a number of problems there: race conditions on
the data, the tmp file being purged, and it still doesn't protect against
the clock drifting during use.

2) use of CAS (atom) for storing the VM state.
If it is truly decentralized then you shouldn't need an atom at all. The
disadvantage of CAS is that, when many threads race to make the same change,
only one will succeed and all the others will fail and retry. This means
that if you have 100 threads (for example), only 1 will succeed and the
other 99 will fail and retry. In the second round, again only 1 will succeed
and 98 will retry, and so on.
Therefore the total number of attempts will be:

    100 + 99 + 98 + ... + 1 = n(n+1)/2 = 5050   (for n = 100)

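As a quick check of the worst case described above (every round exactly one thread wins and all the others retry), the attempts can be summed directly. This is only a sketch of that arithmetic. `worst-case-attempts` is a made-up name, and real contention on an atom will usually behave better than this bound.

```clojure
;; Worst case: round 1 has n attempts, round 2 has n-1, ... the last has 1.
(defn worst-case-attempts
  [n]
  (reduce + (range 1 (inc n))))

(worst-case-attempts 100)   ;=> 5050
(quot (* 100 (inc 100)) 2)  ;=> 5050, the closed form n(n+1)/2
```
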
If you want to develop a real "*decentralized*" id generator, I think you 
need to drop the atom in favour of a thread-local store.
Now, to do so and make collisions impossible, we need to add more bits:


   - 64 bits - ts (i.e. a timestamp )
   - 48 bits - worker-id/node (i.e. MAC address)
   - 32 bits - worker-id/process (pid) 
   - 64 bits - worker-id/thread (thread num)
   - 32 bits - seq-no (i.e. a counter)
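
The layout above adds up to 64 + 48 + 32 + 64 + 32 = 240 bits. A rough sketch of how such fields could be packed into one number follows. `pack-id` and the map keys are hypothetical names, and it assumes every field is non-negative and already fits in its width.

```clojure
;; Hypothetical packing of the proposed layout into a single BigInteger.
;; Fields are appended most-significant first: ts, node, pid, thread, seq-no.
(defn pack-id
  [{:keys [ts node pid thread seq-no]}]
  (reduce (fn [^BigInteger acc [width value]]
            ;; make room for the next field, then OR it into the low bits
            (.or (.shiftLeft acc (int width)) (biginteger value)))
          BigInteger/ZERO
          [[64 ts] [48 node] [32 pid] [64 thread] [32 seq-no]]))

(pack-id {:ts 1 :node 0 :pid 0 :thread 0 :seq-no 7})
;; ts ends up above the lower 176 bits, seq-no in the lowest 32 bits
```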
   
By adding the process id (pid) and the thread id there is no possibility of 
two systems running at the same time and creating the same id.
Finally, by using thread-local storage there is no need for process-level 
coordination (atom) and no risk of retries caused by threads stepping on 
each other's toes.

With such a setup 100 threads will be able to increment their own 
thread-local counters independently (given that you have 100 execution 
cores).
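
A minimal sketch of a per-thread counter, assuming Java 8 or later for ThreadLocal/withInitial. `seq-counter` and `next-seq!` are hypothetical names and not Flake's API. The point is only that no atom, and therefore no CAS retry, is involved.

```clojure
;; Each thread lazily gets its own one-element long array as a counter cell.
(def ^ThreadLocal seq-counter
  (ThreadLocal/withInitial
    (reify java.util.function.Supplier
      (get [_] (long-array 1)))))

(defn next-seq!
  "Returns the calling thread's current sequence number and increments it.
  No coordination with other threads is needed."
  []
  (let [^longs cell (.get seq-counter)
        n           (aget cell 0)]
    (aset cell 0 (inc n))
    n))
```

Each thread starts counting from 0, so uniqueness has to come from combining this value with the thread id and the other fields of the id.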

What do you think?
Bruno

 

>
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [ANN] Flake 0.4.0: Decentralized, k-ordered unique ID generator

2016-06-21 Thread Bruno Bonacci
Sorry, it looks like images are only visible in the google groups

https://groups.google.com/forum/#!topic/clojure/fRYCowf6VUg

Bruno


tips on writing modern idiomatic code

2016-06-21 Thread Sergey Didenko
Hi,

What would you advise for writing or rewriting your Clojure code in a MODERN
idiomatic way?

Using Kibit?

Pasting your code samples on some review site?

Asking for help in an IRC channel?

Asking here?

Reading some notable open source projects?

Reading some new Clojure book?

I ask about the latest Clojure specifically.

I have not given very focused attention to Clojure since version 1.4 and
would like to grasp the WHOLE PICTURE of "good" modern Clojure. Currently
it feels like a lot of the latest knowledge is scattered in pieces all
over the internet. Or maybe I just don't know where to look.



Re: tips on writing modern idiomatic code

2016-06-21 Thread Leon Grapenthin
https://github.com/bbatsov/clojure-style-guide is a good place to start. 



Re: [ANN] Flake 0.4.0: Decentralized, k-ordered unique ID generator

2016-06-21 Thread Brian Platz
Bruno,

I think the more you can reduce the chance of collision the better and the 
thread-local capability is a good idea, but in the process you've almost 
doubled the bits.

For me anyhow, an ID needs to be producible at a reasonable rate (1 million 
a second per machine is good for me), have a near-zero probability of 
collision, and take up the least amount of space possible.

Under those criteria, I think 128 bits is a reasonable target, and I would 
expect the thread-safe atom to handle such volume (although I haven't 
tested).

If you need a billion per second and don't want 100 machines producing 
them, then I think you are at the point of needing to have thread 
independence and probably have to increase the bit-count, and your ideas 
provide a good path towards such a solution.

Your comment on the file persistence is a good one, I wonder if the 
potential problems are real enough to warrant the risks.

My other curiosity is if System/nanoTime is guaranteed to increment across 
threads. I know at least a while ago that this guarantee did not exist.

-Brian




Re: tips on writing modern idiomatic code

2016-06-21 Thread Sean Corfield
On 6/21/16, 5:46 AM, "Sergey Didenko" wrote:
> What would you advise for writing or rewriting your Clojure code in a MODERN 
> idiomatic way?

It’s a good question and I get the impression that a) it’s constantly evolving 
as we all gain more experience building large systems with Clojure(Script) and 
b) different people / companies have inherently different approaches to what is 
“modern and idiomatic”.

Leon suggested the clojure-style-guide and that’s good for the basics but I 
don’t think it really goes deep enough into idiom “in the large” (and I doubt 
it can, because of the two points above). It’s the basis for our in-house 
coding guidelines, which add company-specific guidelines too, as they pertain 
to our code base.

I’d also recommend Zach Tellman’s work-in-progress Elements of Clojure:

http://elementsofclojure.com

I think this is an interesting blog post to read, for idiom:

https://rasterize.io/blog/clojure-the-good-parts.html

I’m not sure I agree with all the details of it but mostly it seems like 
reasonable advice (and we opened a number of tickets at work to review where 
we’re out of line with it).

> Using Kibit?

Haven’t used it so I can’t comment. We used Eastwood for a while but found too 
many “false positives” so it slowly fell out of favor at work.

> Pasting your code samples on some review site?

Maybe, but I personally don’t know of good sites for that. I’ll be interested 
to hear recommendations.

> Asking for help in an IRC channel?

Or the clojurians.net Slack channel. That has a great #beginners channel with 
lots of helpful people, and the main #clojure channel also fields a lot of 
style / review questions for more advanced stuff.

> Asking here?

Folks sometimes post links to code and ask for feedback so that seems like a 
good avenue too.

> Reading some notable open source projects?

This question comes up fairly regularly but I don’t recall there being much 
consensus on what constitutes good, idiomatic, modern Clojure in OSS projects – 
I think that’s partly because many OSS projects are libraries that have to 
either do something gnarly (and non-idiomatic) or have to pander to performance 
concerns (in non-idiomatic ways)?

> Reading some new Clojure book?

Read all of them! ☺ Books take a while to write and usually by the time they 
come out, Clojure has moved on a bit. I’ve tended to stick with recommending 
Clojure Programming (Emerick, Carper, Grand; O’Reilly) even tho’ it’s now four 
years old and was written for Clojure 1.3 (but tested against 1.4) but there 
are certainly several more modern books that target more recent versions of 
Clojure.

As an example of the problems that authors face, Clojure in Action (Ed 1) was 
hampered by covering Clojure 1.2 yet wasn’t released until after Clojure 1.3 
was released (with massive changes in contrib that made it hard to follow all 
the examples in the book). Ed 2 appeared late last year and targets Clojure 1.6 
but we’re already in the alpha builds of Clojure 1.9. The authors discussed the 
book in the Java Ranch forums and were sad that they hadn’t been able to 
include transducers from Clojure 1.7 (which also brought us reader 
conditionals).

Last year also saw the release of Clojure Applied and Living Clojure. Of the 
former, the authors say:

“While we believe the style and forms we describe are in wide use, it’s 
difficult to say what is and isn’t idiomatic, especially among members of an 
innovative and opinionated community.”

I think that really sums up the issue.

Since 1.4, we’ve had:

• Reducers
• Reader literal enhancements
• New threading macros
• EDN reader
• Destructuring with namespaced keys
• “some” functions
• Transducers
• Reader conditionals
• More string functions
• Socket server/REPL
• clojure.spec and a raft of new predicate functions

…as well as many other minor enhancements along the way. Some of those have 
_definitely_ changed the way we write code at World Singles to varying degrees 
(primarily the threading macros, EDN reader, “some” functions, and string 
functions).
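
To make a few of the items above concrete, here is a small sketch using only core functions that exist in recent Clojure releases. The function names and data shapes (`city-upper`, `build-query`, `flag-or-default`, `xf`) are invented for illustration and are not taken from any code base mentioned in this thread.

```clojure
(require '[clojure.string :as str])

;; Threading macros (1.5): some-> stops at the first nil instead of throwing.
(defn city-upper [user]
  (some-> user :address :city str/upper-case))

;; cond-> (1.5) threads a value through only the steps whose test is truthy.
(defn build-query [{:keys [limit order]}]
  (cond-> {:table :users}
    limit (assoc :limit limit)
    order (assoc :order-by order)))

;; "some" functions (1.6): if-some tests for non-nil, so false is kept.
(defn flag-or-default [m]
  (if-some [v (:flag m)] v :unset))

;; Transducers (1.7): one transformation reused with different reducing contexts.
(def xf (comp (filter odd?) (map inc)))

(into [] xf (range 10))        ;=> [2 4 6 8 10]
(transduce xf + 0 (range 10))  ;=> 30

;; More string functions (1.8).
(str/includes? "clojure.spec" "spec")  ;=> true
```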

We have written one transducer – but haven’t yet adopted the built-in 
transducers wholesale.

We’re reviewing clojure.spec (after having invested quite a bit of time in the 
past working with both Schema and core.typed).

We recently switched from Leiningen to Boot (which had a massive impact on our 
build/test processes and how we think about tooling).

We also recently adopted Stuart Sierra’s Component library and are working 
toward his “Reloaded” workflow (legacy global state prevents full adoption – 
and we’re working aggressively to eliminate this).

We’re just starting back down the path of a serious investigation of core.async 
(after using it a year or two back for a proof of concept Clojure(Script) 
application with Reagent and Sente).

I think the big shifts for us, in terms of idiom, over the next year will be:

• More transducer usage, maybe more reducer usage too (parallel fold)
• Namespaced keywords and clojur

clojure.string unexpected behaviors

2016-06-21 Thread Elena Machkasova
Greetings,

I was looking at clojure.string functions, and noticed that some have 
unexpected (especially for less experienced programmers) behavior on 
non-string arguments. For instance, 'capitalize' applies toString to its 
argument, effectively making it possible to pass any type, but with 
unexpected results. Here are some examples that may be really confusing to 
novices, especially since it's not immediately obvious that the argument is 
returned as a string when it's printed back:

(str/capitalize [\a \B \c]) ; returns "[\a \b \c]"
(str/capitalize (char-array "aBc")) ; returns the address, as a string

Interestingly, 'reverse' doesn't allow non-string arguments since it uses a 
StringBuilder, and not toString, to create a string, and there are a few 
other clojure.string functions that behave like 'reverse' in this regard. At 
a minimum, this is inconsistent with 'capitalize'.

As a separate issue, blank? returns 'true' when passed 'false' (since the 
check is for false, not specifically for nil), but (blank? true) is a type 
error. 
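
The blank? behavior described above can be reproduced at a REPL. The results below reflect the implementation discussed in this thread and are not a documented contract. `strictly-blank?` is a hypothetical helper showing one way to reject non-string input, not something clojure.string provides.

```clojure
(require '[clojure.string :as str])

(str/blank? nil)    ;=> true
(str/blank? "")     ;=> true
(str/blank? " \t")  ;=> true
(str/blank? false)  ;=> true, because the check is for falsey, not for nil
;; (str/blank? true) fails with a type error, as noted above

;; Hypothetical stricter variant: only nil or a CharSequence can be blank.
(defn strictly-blank? [s]
  (and (or (nil? s) (instance? CharSequence s))
       (str/blank? s)))

(strictly-blank? false)  ;=> false
(strictly-blank? "  ")   ;=> true
```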

It is fairly easy for experienced programmers to understand what's going on 
by reading the source code, but none of these behaviors are documented, and 
would confuse beginners.  

Is there anything that I am overlooking in these design decisions, or 
should this implementation be changed? 

Thanks!

Elena 



Re: clojure.string unexpected behaviors

2016-06-21 Thread Alex Miller
There are some comments at the top of clojure.string 
(http://clojure.github.io/clojure/#clojure.string) about expected usage. In 
particular, you should expect all clojure.string functions to accept 
CharSequence (a parent interface of String, StringBuffer, and StringBuilder) 
and return the same (usually a concrete String). All other input types are 
not supported. The calls to toString() are generally a way to convert any 
of the CharSequence impls to a concrete String so they can be further acted 
upon. The results of calling any of these functions with something other 
than a CharSequence are unspecified.
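
Until such checking exists, a caller who wants unsupported input rejected can add the check at the call site. The following is only a sketch using a plain precondition rather than clojure.spec. `checked-capitalize` is a hypothetical wrapper, not part of clojure.string.

```clojure
(require '[clojure.string :as str])

;; Hypothetical wrapper: reject anything that is not a CharSequence
;; instead of letting toString produce a surprising result.
(defn checked-capitalize
  [s]
  {:pre [(instance? CharSequence s)]}
  (str/capitalize s))

(checked-capitalize "aBc")                   ;=> "Abc"
(checked-capitalize (StringBuilder. "aBc"))  ;=> "Abc"
;; (checked-capitalize [\a \B \c]) throws an AssertionError
```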

I expect that specs on clojure.string functions will provide an additional 
layer of checking around this in the future.




Re: clojure.string unexpected behaviors

2016-06-21 Thread Sean Corfield
Alex gave you a correct (but fairly short) answer. I’d like to expand on it a 
bit, partly in light of a certain recent blog post, partly because of a 
personal “hot button”…

This is going to be along the same lines as clojure.set functions producing 
“garbage” output if you give them “garbage” input (not sets). This is the 
classic computer science “undefined” behavior that we see in many other 
languages, where the behavior is only defined for the specific types of inputs.

I would _love_ all the clojure.string functions to be defined for nil as an 
input – since we have (str nil) => “” – but adding that nil check, or even 
calling str directly in clojure.string functions, adds quite an overhead for 
all uses (so that proposal would never be accepted).

Looking at the clojure.string namespace docstring, it is explicit about nil 
arguments:

“passing nil will result in a NullPointerException unless documented otherwise.”

This makes me sad ☺ It’s almost my only source of NPEs in my Clojure code and I 
pretty much always want the empty string behavior in case of nil. Oh well.
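
For code that does want the empty-string behavior, one option is to opt in explicitly at the call site. A minimal sketch follows. `nil->empty` is a hypothetical helper, not part of clojure.string, and it pays the extra nil check only where it is used.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: wrap a clojure.string function so that a nil first
;; argument is treated as the empty string. Other arguments pass through.
(defn nil->empty
  [f]
  (fn [s & args]
    (apply f (if (nil? s) "" s) args)))

((nil->empty str/upper-case) nil)      ;=> ""
((nil->empty str/trim) "  hi ")        ;=> "hi"
((nil->empty str/replace) nil "a" "b") ;=> ""
```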

The docstring also has this to say about argument types:

“When a function is documented to accept a string argument, it will take any 
implementation of the correct *interface* on the host platform. In Java, this 
is CharSequence, which is more general than String. In ordinary usage you will 
almost always pass concrete strings.”

That explains why it calls toString() – to convert CharSequence (or any of its 
implementations) to String. Unfortunately, much as with clojure.set functions, 
if you pass any argument that is not an implementation of CharSequence but 
happens to support toString() – which is nearly anything – then you get 
“garbage” out because you passed “garbage” in.

I think everyone passes a non-string value to a clojure.string function at 
least once and then scratches their head at the bizarre result (which often 
pops up a long way down the chain of string manipulation and therefore some 
distance from the bug).

It’s the price we pay for improved performance in the “correct” use cases ☹

As for blank? Yes, that seems like the docstring needs correcting since it 
returns “True if s is falsey (nil or false), empty, or contains only 
whitespace.”

Sean Corfield -- (904) 302-SEAN
An Architect's View -- http://corfield.org/

"If you're not annoying somebody, you're not really alive."
-- Margaret Atwood



Re: clojure.string unexpected behaviors

2016-06-21 Thread Alex Miller
On Tuesday, June 21, 2016 at 2:45:45 PM UTC-5, Sean Corfield wrote:
>
> Alex gave you a correct (but fairly short) answer. I’d like to expand on 
> it a bit, partly in light of a certain recent blog post, partly because of 
> a personal “hot button”… 
>
> This is going to be along the same lines as clojure.set functions 
> producing “garbage” output if you give them “garbage” input (not sets). 
> This is the classic computer science “undefined” behavior that we see in 
> many other languages, where the behavior is only defined for the specific 
> types of inputs. 
>

I'm not a fan of the word "garbage" in this case or "garbage in / garbage 
out". I think that implies a value judgement about both input and output 
that is incorrect here.

"specified"/"unspecified" is much better. The clojure.string functions 
specify the behavior for a class of inputs and leave other cases 
unspecified. Things that are unspecified may still be useful (but you 
should not rely on that behavior as it is not specified) and things that 
are unspecified now may become specified in the future.

 

> I would _love_ all the clojure.string functions to be defined for nil as 
> an input – since we have (str nil) => “” – but adding that nil check, or 
> even calling str directly in clojure.string functions, adds quite an 
> overhead for all uses (so that proposal would never be accepted). 
>
> Looking at the clojure.string namespace docstring, it is explicit about 
> nil arguments: 
>
> “passing nil will result in a NullPointerException unless documented 
> otherwise.” 
>
> This makes me sad ☺ It’s almost my only source of NPEs in my Clojure code 
> and I pretty much always want the empty string behavior in case of nil. Oh 
> well. 
>
> The docstring also has this to say about argument types: 
>
> “When a function is documented to accept a string argument, it will take 
> any implementation of the correct *interface* on the host platform. In 
> Java, this is CharSequence, which is more general than String. In ordinary 
> usage you will almost always pass concrete strings.” 
>
> That explains why it calls toString() – to convert CharSequence (or any of 
> its implementations) to String. Unfortunately, much as with clojure.set 
> functions, if you pass any argument that is not an implementation of 
> CharSequence but happens to support toString() – which is nearly anything – 
> then you get “garbage” out because you passed “garbage” in. 
>
> I think everyone passes a non-string value to a clojure.string function at 
> least once and then scratches their head at the bizarre result (which often 
> pops up a long way down the chain of string manipulation and therefore some 
> distance from the bug). 
>
> It’s the price we pay for improved performance in the “correct” use cases 
> ☹ 
>

Performance is not the only consideration here (or maybe even the prime 
one). A nil is not a string and should be distinguishable from an empty 
string in many cases.
 

> As for blank? Yes, that seems like the docstring needs correcting since it 
> returns “True if s is falsey (nil or false), empty, or contains only 
> whitespace.” 
>

I do not think this needs updating. blank? follows the rules of 
clojure.string you stated above (other than its stated extension to also 
cover nil). In other words: (blank? false) is unspecified because you have 
passed a boolean, not a CharSequence. I would spec this as something like

(s/fdef clojure.string/blank?
  :args (s/cat :s (s/nilable #(instance? CharSequence %)))
  :ret boolean?)
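
To make the boundary concrete, here is a minimal sketch that mirrors the :args predicate from the spec above without depending on clojure.spec. The helper name valid-blank-arg? is hypothetical; only clojure.string/blank? is real API.

```clojure
(require '[clojure.string :as str])

(defn valid-blank-arg?
  "Hypothetical helper mirroring the :args spec above:
  accepts nil or any CharSequence, rejects everything else."
  [s]
  (or (nil? s) (instance? CharSequence s)))

;; Inputs inside the specified domain
(valid-blank-arg? nil)                   ;=> true
(valid-blank-arg? "  ")                  ;=> true
(valid-blank-arg? (StringBuilder. "x"))  ;=> true (StringBuilder is a CharSequence)

;; Outside the specified domain: whatever blank? returns here is unspecified
(valid-blank-arg? false)                 ;=> false

;; Specified behavior of blank?
(str/blank? nil)     ;=> true
(str/blank? " \t")   ;=> true
(str/blank? "a")     ;=> false
```

A caller that wants a hard failure rather than unspecified behavior could check valid-blank-arg? before calling blank?.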

 

>
> Sean Corfield -- (904) 302-SEAN 
> An Architect's View -- http://corfield.org/ 
>
> "If you're not annoying somebody, you're not really alive." 
> -- Margaret Atwood 
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: clojure.string unexpected behaviors

2016-06-21 Thread Sean Corfield
On 6/21/16, 1:28 PM, "Alex Miller" wrote:
> I'm not a fan of the word "garbage" in this case or "garbage in / garbage 
> out".

For me there’s no judgment involved in the “GIGO” principle but fair enough.

> "specified"/"unspecified" is much better.

It’s inaccurate. Unspecified behavior is where you get well-defined behavior 
but the language does not specify what particular well-defined behavior you get 
(and the implementation is not required to document it). Undefined behavior is 
where you may get _any_ behavior (and, again, the implementation is not 
required to document it). There is also implementation-defined behavior where 
the language does not specify the behavior but the implementation _is_ required 
to document it (and it represents correct code).

This is undefined behavior.

(Sorry, that’s what nearly a decade of ANSI Standards Committee work does to 
someone!)

> A nil is not a string and should be distinguishable from an empty string in 
> many cases.

We disagree on the degree of punning here ☺ I’m not asking for the change. I 
understand why it is the way it is (I just don’t like it ☺).
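
For readers who do want the empty-string behavior, one option is to opt in at the call site instead of changing clojure.string. This is only a sketch; the helper name nil->empty is hypothetical.

```clojure
(require '[clojure.string :as str])

(defn nil->empty
  "Hypothetical helper: replaces nil with the empty string and
  returns every other value unchanged."
  [s]
  (if (nil? s) "" s))

(str nil)                           ;=> ""
(str/upper-case (nil->empty nil))   ;=> ""
(str/trim (nil->empty "  hi "))     ;=> "hi"
```

Unlike wrapping every argument in str, this only puns nil, so other non-string values are passed through untouched.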
 
> blank? follows the rules of clojure.string you stated above (other than its 
> stated extension to also cover nil).

Good point. Yes, I’m persuaded.

Sean Corfield -- (904) 302-SEAN 
An Architect's View -- http://corfield.org/ 

"If you're not annoying somebody, you're not really alive." 
-- Margaret Atwood 






New issue of Clojure Gazette is out

2016-06-21 Thread Alan Thompson
If you haven't seen the Clojure Gazette, please give it a try!

http://us4.campaign-archive2.com/?u=a33b5228d1b5bf2e0c68a83f4&id=70c69d167d&e=c39662b4e4

Alan



Re: [ANN] Flake 0.4.0: Decentralized, k-ordered unique ID generator

2016-06-21 Thread Max Countryman
Brian,

I think you make good points here, especially with regard to the size of IDs.

I’d also like to point out that while adding the process and thread IDs helps, 
it doesn’t eliminate the possibility of duplicate IDs: this is why it’s 
necessary to write out the last used timestamp in a separate thread.

Just a clarification with regard to disk persistence: we aren’t writing out the 
epoch, we’re writing out the last used timestamp periodically, in its own 
thread. Yes, the `init!` API is cumbersome, but it’s an important safety valve 
which helps protect against duplicate IDs.

My understanding from reading the documentation and various StackOverflow 
answers is that System/nanoTime is monotonic, but I don’t know what guarantees 
it makes across threads.
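
For reference, the approach suggested earlier in the thread (read the wall clock once, then measure elapsed time with System/nanoTime) could be sketched roughly as below. This is an illustration under assumptions, not the code that ships in Flake; make-clock is a hypothetical name, and the sketch makes no claim about cross-thread guarantees.

```clojure
(defn make-clock
  "Hypothetical sketch: captures the wall clock once, then derives later
  timestamps from the time elapsed according to System/nanoTime.
  Returns a no-arg fn yielding approximate epoch milliseconds."
  []
  (let [start-ms (System/currentTimeMillis)
        start-ns (System/nanoTime)]
    (fn []
      ;; elapsed nanoseconds converted to whole milliseconds
      (+ start-ms (quot (- (System/nanoTime) start-ns) 1000000)))))

(def now-ms (make-clock))
(now-ms) ; approximate epoch milliseconds
```

One consequence of this design is that later wall-clock adjustments are not observed by the derived clock, which only tracks elapsed time since it was created.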


Max


> On Jun 21, 2016, at 10:00, Brian Platz wrote:
> 
> Bruno,
> 
> I think the more you can reduce the chance of collision the better and the 
> thread-local capability is a good idea, but in the process you've almost 
> doubled the bits.
> 
> For me anyhow, an ID needs to be producible at a reasonable rate (1 million a 
> second per machine is good for me), have near-zero probability of collision 
> and take up the least amount of space possible.
> 
> Under those criteria, I think 128 bits is a reasonable target, and I would 
> expect the thread-safe atom to handle such volume (although I haven't 
> tested).
> 
> If you need a billion per second and don't want 100 machines producing them, 
> then I think you are at the point of needing to have thread independence and 
> probably have to increase the bit-count, and your ideas provide a good path 
> towards such a solution.
> 
> Your comment on the file persistence is a good one, I wonder if the potential 
> problems are real enough to warrant the risks.
> 
> My other curiosity is if System/nanoTime is guaranteed to increment across 
> threads. I know at least a while ago that this guarantee did not exist.
> 
> -Brian
> 
> 
> On Tuesday, June 21, 2016 at 8:38:58 AM UTC-4, Bruno Bonacci wrote:
> 
> Hi, this change is actually easier than it sounds. Looking at the code, I 
> came across a couple of things which I think could be improved.
> 
> 1) Use of filesystem persistence.
> 
> I'm not sure that file-based persistence is a good idea. Maybe this is 
> idiomatic design for Erlang, but it definitely doesn't look nice in 
> Clojure.
>  
> In particular, I'm not sure that storing the init time epoch actually 
> accomplishes anything at all.
> I would argue that there are a number of problems there: race conditions on 
> the data, the tmp file being purged, and it still doesn't protect against 
> clock drift during use.
> 
> 2) Use of CAS (atom) for storing the VM state.
> If it is truly decentralized then you shouldn't need an atom at all. The 
> disadvantage of CAS is that, when many threads race to make the same change, 
> only one will succeed and all the others will fail and retry. This means 
> that if you have 100 threads (for example), only 1 will succeed and the other 
> 99 will fail and retry. In the second round, again only 1 will succeed and 98 
> will retry, and so on.
> Therefore the total number of attempts will be 
> 
>   \sum_{i=1}^{n} i = n(n+1)/2, i.e. 100 + 99 + ... + 1 = 5050 for n = 100
> 
> If you want to develop a truly "decentralized" ID generator, I think you need 
> to drop the atom in favour of a thread-local store.
> To do so and make collisions impossible, we need to add more bits:
> 
> 64 bits - ts (i.e. a timestamp)
> 48 bits - worker-id/node (i.e. MAC address)
> 32 bits - worker-id/process (pid)
> 64 bits - worker-id/thread (thread num)
> 32 bits - seq-no (i.e. a counter)
> 
> By adding the process id (pid) and the thread id, there is no possibility of 
> two systems running and creating the same ID at the same time.
> Finally, by using thread-local storage there is no need for process-level 
> coordination (atom) and no risk of retries caused by threads stepping 
> on each other's toes.
> 
> With such a setup, 100 threads will be able to increment their own thread-local 
> counters independently (given that you have 100 execution cores).
> 
> What do you think?
> Bruno
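
As a rough illustration of the thread-local idea quoted above (not Flake's actual implementation), each thread can keep its own counter in a java.lang.ThreadLocal, so incrementing it never contends with other threads. The names local-seq, next-seq!, next-id! and in-new-thread are hypothetical, and the timestamp, node and pid fields are left out for brevity.

```clojure
;; Hypothetical sketch: one mutable counter cell per thread, no shared atom.
(def ^ThreadLocal local-seq
  (ThreadLocal/withInitial
   (reify java.util.function.Supplier
     (get [_] (long-array 1)))))

(defn next-seq!
  "Returns the calling thread's next sequence number, starting at 0."
  []
  (let [^longs cell (.get local-seq)
        n (aget cell 0)]
    (aset cell 0 (inc n))
    n))

(defn next-id!
  "Returns [thread-id seq-no]. A fuller ID would also carry the
  timestamp, node and pid fields from the list above."
  []
  [(.getId (Thread/currentThread)) (next-seq!)])

(defn in-new-thread
  "Runs f on a fresh thread, waits for it, and returns its result."
  [f]
  (let [result (promise)
        t (Thread. (fn [] (deliver result (f))))]
    (.start t)
    (.join t)
    @result))

;; Every fresh thread counts from 0 on its own, without retries.
(in-new-thread #(vec (repeatedly 3 next-seq!))) ;=> [0 1 2]
(in-new-thread #(vec (repeatedly 3 next-seq!))) ;=> [0 1 2]
```

Because every thread starts at 0, the sequence number alone is not unique within a process; the thread id has to be part of the ID, which is where the extra bits in the proposal come from.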
> 
>  
> 
> 

Re: [ANN] Flake 0.4.0: Decentralized, k-ordered unique ID generator

2016-06-21 Thread Max Countryman
I also released Flake 0.4.2 today. It includes an important bugfix for a case 
where two competing threads could cause duplicate IDs in certain circumstances, 
as well as a new method for deriving timestamps.

