On Thursday, 17 December 2015 14:59:37 UTC, Herwig Hochleitner wrote:
>
> On 17.12.2015 at 02:35, "Mikera" <mike.r.an...@gmail.com> wrote:
>
> > What's the plan with Tuples more broadly?
>
> Speaking as a kibitzer to the process: Suppose somebody was to carry this 
> along, I'd like to see these points addressed:
>
> IIRC, the breaking factor to the proposal were slow-downs in real-world 
> programs, likely due to pollution of jvm's polymorphic inline caches. It 
> seems necessary to have a benchmark, exercising the data-structure part of 
> clojure.core with real-world degrees of polymorphism, replicating the 
> slow-downs Rich saw for the proposal. When we have such a realistic basis, 
> to which we can add expected best and worst cases, it's much easier to 
> have a conversation about expected benefits and drawbacks, performance-wise.
>

I don't actually recall seeing any benchmarks showing slow-downs in 
real-world programs. Rich made an apparently unsubstantiated assertion that 
these exist but didn't provide his analysis (see CLJ-1517). 

On the other hand Zach ran some benchmarks on JSON decoding and found a 
roughly 2x speedup. That's a pretty big deal for code implementing JSON 
APIs (which is probably a reasonable example of real world, 
nested-data-structure heavy code).

Does anyone have any actual evidence of this supposed slowdown? i.e. is 
there a standard benchmark that is considered acceptable for general 
purpose / real world performance in Clojure applications? If so I'm happy 
to run it and figure out why any slowdown with Tuples is happening. My 
strong suspicion is that the following is true:
1) The Tuples generally provide a noticeable speedup (as demonstrated by 
the various micro-benchmarks)
2) There are a few hotspots where Tuples *don't* make sense because of PIC 
pressure / megamorphic call sites (repeated conj on vectors might be an 
example....). These cases can be revealed by more macro-level benchmarking.
3) We should be able to identify these cases of 2) and revert to generating 
regular PersistentVectors (or switching to Transients....). In that case 
the Tuple patches may develop from being a debatable patch with some 
problematic trade-offs to a pretty clear all-round improvement (in both 
micro and macro benchmarks).
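
To make point 2) concrete, here is a minimal Java sketch of why one class per arity can put pressure on a call site. All names (Vec, Tuple1..Tuple3, GeneralVec) are hypothetical stand-ins, not the classes from the CLJ-1517 patches or from clojure.lang. It measures nothing; it only shows that the same call site sees one receiver class without tuples and several with them. As far as I understand it, HotSpot optimises best when a call site sees one or two receiver classes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CallSiteShapes {
    // Hypothetical stand-in for a vector interface; not Clojure's real one.
    public interface Vec {
        int count();
    }

    // Specialised small vectors, one class per arity (illustrative only).
    public static final class Tuple1 implements Vec {
        public int count() { return 1; }
    }
    public static final class Tuple2 implements Vec {
        public int count() { return 2; }
    }
    public static final class Tuple3 implements Vec {
        public int count() { return 3; }
    }

    // General-purpose vector of any size.
    public static final class GeneralVec implements Vec {
        private final int n;
        public GeneralVec(int n) { this.n = n; }
        public int count() { return n; }
    }

    // A single call site: v.count() dispatches on whichever class shows up.
    public static int totalCount(List<Vec> vs) {
        int total = 0;
        for (Vec v : vs) {
            total += v.count();
        }
        return total;
    }

    // Counts how many distinct receiver classes that call site would see.
    public static int receiverClasses(List<Vec> vs) {
        Set<Class<?>> seen = new HashSet<>();
        for (Vec v : vs) {
            seen.add(v.getClass());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<Vec> withoutTuples = new ArrayList<>();
        for (int n = 1; n <= 3; n++) {
            withoutTuples.add(new GeneralVec(n));
        }
        List<Vec> withTuples = new ArrayList<>();
        withTuples.add(new Tuple1());
        withTuples.add(new Tuple2());
        withTuples.add(new Tuple3());

        System.out.println("receiver classes without tuples: "
                + receiverClasses(withoutTuples));
        System.out.println("receiver classes with tuples: "
                + receiverClasses(withTuples));
    }
}
```

Both lists give the same totalCount, so the functional result is identical; the only difference is how many classes the JIT has to handle at v.count().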

The key point regarding 3): code that is performance sensitive (certainly 
in core, maybe in some libs) should consider whether a Tuple is a good idea 
or not (for any given call-site). These may need addressing individually, 
but this is incremental to the inclusion of Tuples themselves. The 
performance comparison isn't as simple as "current vs. tuples patch", it 
should be "current vs. tuples patch + related downstream optimisation" 
because that is what you are going to see in the released version.
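
As an illustration of the kind of downstream optimisation I mean, here is a rough Java sketch in which conj on a small tuple promotes straight to a general vector instead of producing the next tuple arity. The names (Vec, Tuple2, GeneralVec) are hypothetical and the array-copying GeneralVec is only a stand-in for PersistentVector; none of this is taken from the actual patches.

```java
import java.util.Arrays;

public class ConjFallback {
    // Hypothetical minimal vector interface for this sketch.
    public interface Vec {
        int count();
        Object nth(int i);
        Vec conj(Object x);
    }

    // Stand-in for a general-purpose vector (copies on conj; not a trie).
    public static final class GeneralVec implements Vec {
        private final Object[] items;

        public GeneralVec(Object... items) {
            this.items = items.clone();
        }

        public int count() { return items.length; }

        public Object nth(int i) { return items[i]; }

        public Vec conj(Object x) {
            Object[] next = Arrays.copyOf(items, items.length + 1);
            next[items.length] = x;
            return new GeneralVec(next);
        }
    }

    // A specialised 2-element vector.
    public static final class Tuple2 implements Vec {
        private final Object a;
        private final Object b;

        public Tuple2(Object a, Object b) {
            this.a = a;
            this.b = b;
        }

        public int count() { return 2; }

        public Object nth(int i) {
            switch (i) {
                case 0: return a;
                case 1: return b;
                default: throw new IndexOutOfBoundsException();
            }
        }

        // Assumption for this sketch: growing a tuple reverts to the
        // general vector, so a loop of repeated conj calls sees only
        // GeneralVec receivers after its first step.
        public Vec conj(Object x) {
            return new GeneralVec(a, b, x);
        }
    }

    public static void main(String[] args) {
        Vec v = new Tuple2("a", "b");
        for (int i = 0; i < 3; i++) {
            v = v.conj(i);
        }
        System.out.println(v.getClass().getSimpleName() + " of size " + v.count());
    }
}
```

Whether promoting immediately is the right trade-off for any given call site is exactly what macro-level benchmarks would need to show.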

Also it should be remembered that JVMs are getting smarter (escape analysis 
allowing allocation of small objects on the stack etc.) and the Clojure 
compiler is also getting smarter (direct linking etc.). Tuples could 
potentially give further upside in these cases, so there is a broader 
context to be considered. My view is that the balance will shift more in 
favour of Tuples over time as the respective runtime components get smarter 
at taking advantage of type specialisation (happy to hear other views, of 
course).

 

> The second thing, bothering me about the proposal: To me (as a 
> non-authority on the matter), checking in generated files is borderline 
> unacceptable. I'd much rather see such classes generated as part of the 
> build process, e.g. by:
> - using ant or maven plugins to generate java source, or,
> - using macros to generate byte code as part of AOT compilation
>

I agree checking in generated files is a bad idea; that is why I actually 
created hand-coded variants of Zach's original Tuple code as part of 
CLJ-1517. My reasoning for this was as follows:
1) You do in fact want some hand-coded differences (e.g. making the 2-Tuple 
work as a MapEntry, having a single immutable instance of Tuple0 etc.). It 
is annoying to handle these special cases in a code generator.
2) Class generation at compile time is fiddly and would complicate the 
build / development process (definitely not a good thing!)
3) It is simpler to maintain a small, fixed number of concrete Java source 
files than it is to maintain a code-generator for the same (which may be 
fewer lines of code, but has much higher conceptual overhead).
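
A minimal sketch of the two special cases from point 1), in plain Java. The class shapes are hypothetical and much simplified (no Clojure interfaces, and equals/hashCode are omitted); this is not the code attached to CLJ-1517, only an illustration of why these cases are easier to write by hand than to generate.

```java
import java.util.Map;

public class HandCodedTuples {
    // Special case 1: the empty tuple is immutable, so a single shared
    // instance is enough.
    public static final class Tuple0 {
        public static final Tuple0 EMPTY = new Tuple0();

        private Tuple0() {}

        public int count() { return 0; }
    }

    // Special case 2: the 2-tuple doubles as a map entry.
    public static final class Tuple2 implements Map.Entry<Object, Object> {
        private final Object a;
        private final Object b;

        public Tuple2(Object a, Object b) {
            this.a = a;
            this.b = b;
        }

        public int count() { return 2; }

        public Object nth(int i) {
            switch (i) {
                case 0: return a;
                case 1: return b;
                default: throw new IndexOutOfBoundsException();
            }
        }

        @Override
        public Object getKey() { return a; }

        @Override
        public Object getValue() { return b; }

        // The tuple is immutable, so the optional setValue is unsupported.
        @Override
        public Object setValue(Object value) {
            throw new UnsupportedOperationException();
        }
    }

    public static void main(String[] args) {
        Map.Entry<Object, Object> e = new Tuple2("answer", 42);
        System.out.println(e.getKey() + " -> " + e.getValue());
    }
}
```

Neither special case applies to the other arities, which is what makes a uniform code generator awkward here.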
 

> ====
>
> So, while the second point certainly would make a proposal more appealing, 
> the first one is mandatory due diligence. I'm really glad, that cognitect 
> acted as a gate-keeper there and saved us from microbenchmark-hell. 
>

Really? I think this CLJ-1517 issue is an example of how *not* to do OSS 
development.
a) Substantial potential improvements (demonstrated with numerous 
benchmarks) sitting unresolved for well over a year with limited / very 
slow feedback
b) Motivated, skilled contributors initially being encouraged to work on 
this but then finding themselves ignored / annoyed with the process / 
confused by lack of communication (certainly myself, and I suspect I also 
speak for Zach here) 
c) Rich commits his own patch, to the surprise of contributors. I provided 
some (admittedly imperfect, but hopefully directionally correct) evidence 
that Zach's approach is better. Rich's patch subsequently gets reverted, 
but we are just back to square one.
d) Lack of clarity on process / requirements for ultimately getting a patch 
accepted. What benchmark of "real world usage" is actually wanted? I've 
seen little / no communication on this despite multiple requests.

This is all meant as honest constructive criticism, I hope Cognitect can 
learn from it. If anyone from Cognitect wants more detailed feedback on how 
I think the process could be improved, happy to provide. To be clear I'm 
not angry about this, nor am I the kind of person to demand that my patches 
get accepted, I am just a little sad that my favourite language appears to 
be held back by the lack of a fully collaborative, open development process.

I also have a related philosophical point about the "burden of proof" for 
accepting patches that may cause regressions. For functional / API changes 
the right standard is "beyond reasonable doubt" because any regression is a 
breaking change to user code and therefore usually unacceptable. For 
performance-related patches the standard should be "on the balance of 
probabilities" because regressions in less common cases are acceptable 
providing the overall performance impact (for the average real world user) 
is expected to be positive.
 

> I'd really love to write some more, about my ideas and alternatives to 
> generating tuple arities 1-8, but I also think we ought to have that 
> benchmark before discussing this point any further.
>
> kind regards
>

Interested to hear your views, Herwig. It's always worth discussing ideas 
and alternatives; this can help inform the ultimate solution. FWIW I think 
most of the wins for Tuples are for the very small arities (0-4); larger 
sizes than that are probably much more marginal in value.

I agree macro-level benchmarks would be great to inform the debate, but 
just to repeat my point d) above - different contributors asked multiple 
times what sort of real world benchmark would be considered informative but 
these requests seem to have been ignored so far. Would be great if the core 
team could provide some guidance here (Alex? Rich?)

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en