Re: [ANN] Clojure 1.7.0-RC1 now available

Alex Miller Tue, 26 May 2015 20:30:07 -0700

On Tuesday, May 26, 2015 at 8:45:25 PM UTC-5, Marshall Bockrath-Vandegrift 
wrote:
>
> Ugh -- looks like the iterator value re-use behavior for EnumMap entrySet 
> was "fixed" post Java 1.6 (my example was under Java 1.6, which I believe 
> is still a Clojure-supported version of Java?). 
>


Java 1.6 is still supported in Clojure 1.7 (but will likely be removed in 
the next release or two).
 

> I can throw together a synthetic example, but I think the people following 
> this thread get what's happening.  The point isn't whether this pattern is 
> a "good idea" or not (it certainly isn't) but whether existing Java APIs 
> people want to interop with use it (they certainly do).
>

The point I was getting at is really whether you should consider this to be 
broken with the old behavior too. 
 

> I presently depend on no less than 3 separate Java library APIs I 
> currently know for a fact depend on this behavior:
> - Hadoop ReduceContextImpl$ValueIterator
> - Mahout DenseVector$AllIterator/NonDefaultIterator
> - LensKit FastIterators
>

Can you point to code for "the original behavior allowed room to transform 
the mutated object into an object which *could* be safely cached in a 
'downstream' seq"? By what means does this transformation occur? It sounds 
to me like you are starting with an Iterator, creating a seq, then walking 
the seq exactly once, one element at a time, and producing a new 
transformed seq or other output. 

If you did reuse that IteratorSeq, all of the elements of the sequence 
would point to the same object which would be in the "last" state like the 
Java 1.6 example you gave. Thus, the "caching" capability of the seq can't 
possibly be something you're using. And if that's true, then why are you 
paying the allocation and synchronization costs of making the seq at all? 
Why not just use the iterator directly, thus skipping all the extra 
allocation that these object-reusing high-performance iterators are working 
so hard to avoid in the first place? In 1.7, transducers would give you 
exactly the capability to walk the source iterator, apply a transducer 
version of your transformation, and output to a collection (via into), a 
value (via transduce), or a lazy sequence (via sequence). I think you would 
find this faster as well due to reduced allocation (possibly greatly 
reduced depending on the transformation).
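
A minimal sketch of the transducer approach described above, under stated assumptions: `reusing-iterable` and `words` are hypothetical names invented for illustration, standing in for an object-reusing source such as the library iterators mentioned earlier. The source is exposed as an Iterable so that into and transduce reduce over its iterator one element at a time, which lets the transformation copy each value out before the next `next()` call mutates it.

```clojure
;; Hypothetical source, for illustration only: an Iterable whose iterator
;; mutates and re-yields a single StringBuilder on every next() call.
(defn reusing-iterable [words]
  (reify Iterable
    (iterator [_]
      (let [it  (.iterator ^Iterable words)
            buf (StringBuilder.)]
        (reify java.util.Iterator
          (hasNext [_] (.hasNext it))
          (next [_]
            (.setLength buf 0)               ; wipe the previous value
            (.append buf (str (.next it)))   ; overwrite in place
            buf)                             ; same object every time
          (remove [_] (throw (UnsupportedOperationException.))))))))

(def words ["alpha" "beta" "gamma"])

;; Output to a collection: str snapshots each value before the iterator advances.
(into [] (map str) (reusing-iterable words))
;; => ["alpha" "beta" "gamma"]

;; Output to a single value: total character count.
(transduce (map count) + 0 (reusing-iterable words))
;; => 14

;; Output to a lazy sequence of the already-transformed, immutable strings.
(sequence (map str) (reusing-iterable words))
;; => ("alpha" "beta" "gamma")
```

In each case the transformation has to produce an immutable value (here via str or count); passing the mutable object through unchanged would still leave every element pointing at the same instance.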
 

> It is an option to explicitly construct `IteratorSeq` instances (I 
> actually had verified that approach this afternoon for the Hadoop API in 
> Parkour), but I'm not happy about it. That approach places a definite 
> burden on integration library maintainers to implement the change, and on 
> end-users to realize they need to upgrade for Clojure 1.7 compatibility. 
> The `Iterator` interface just fundamentally in no way guarantees that the 
> `next()` yielded values are functional-safe in the sense necessary to 
> support chunking. I understand the desire to increase performance, but I 
> don't think it's worth the potential silent and bewildering breakage in 
> interop.
>

It would be possible to restore the non-chunking IteratorSeq behavior but 
retain the chunked version for eduction and the other places where we first 
made the swap. I'm not quite ready yet to concede that that's necessary. It 
seems like the only constraints under which this usage makes sense are 
exactly those where it's unnecessary and even harmful.

 

> On Tue, May 26, 2015 at 9:18 PM Alex Miller <a...@puredanger.com> wrote:
>
>> That's not what I see with 1.7.0-RC1 (or any of the betas). I tried with 
>> both Java 1.7.0_25 and 1.8.0-b132.
>>
>> user=> *clojure-version*
>> {:major 1, :minor 7, :incremental 0, :qualifier "RC1"}
>> user=> (->> (map vector (java.util.EnumSet/allOf 
>> java.util.concurrent.TimeUnit) (range)) (into {}) (java.util.EnumMap.) 
>> (.entrySet) (map str) (into []))
>> ["NANOSECONDS=0" "MICROSECONDS=1" "MILLISECONDS=2" "SECONDS=3" 
>> "MINUTES=4" "HOURS=5" "DAYS=6"]
>>
>> Re "implementing the not-uncommon Java pattern of mutating and 
>> re-yielding the same object on each `next()` invocation". I'm assuming that 
>> you're somehow expecting to traverse one seq node, then having an 
>> opportunity to mutate something (the source, the iterator, the return 
>> object) in between each new advancement of the seq node? That seems a) not 
>> common at all, b) a bad idea even in Java and c) dangerous even before this 
>> change. In either case you end up with a seq that points to a succession of 
>> the same repeated (mutable and mutating) object - this violates most 
>> expectations we as Clojure users have of sequences. Any sort of chunking 
>> (map, filter, etc) over the top of that seq would force realization up to 
>> 32 elements beyond the head causing the same issue. 
>>
>> The original one-at-a-time IteratorSeq is still there (for now) and you 
>> can still make one if you want via (clojure.lang.IteratorSeq/create iter) 
>> but I would consider it deprecated. I think a custom lazy-seq or a 
>> loop-recur would be a better way to handle this case, which in my opinion 
>> is highly unusual. That said, my ears are open if this is an issue for a 
>> large number of people.
>>
>>
>> On Tuesday, May 26, 2015 at 6:24:54 PM UTC-5, Marshall 
>> Bockrath-Vandegrift wrote:
>>>
>>> The difference is that the original behavior allowed room to transform 
>>> the mutated object into an object which *could* be safely cached in a 
>>> "downstream" seq, while the new behavior pumps the iterator through 32 
>>> mutations before user-level code has a chance to see it.  Contrived example 
>>> using the Java standard library:
>>>
>>> Clojure 1.6.0:
>>> (->> (map vector (java.util.EnumSet/allOf java.util.concurrent.TimeUnit) 
>>> (range)) (into {}) (java.util.EnumMap.) (.entrySet) (map str) (into []))
>>> #=> ["NANOSECONDS=0" "MICROSECONDS=1" "MILLISECONDS=2" "SECONDS=3" 
>>> "MINUTES=4" "HOURS=5" "DAYS=6"]
>>>
>>> Clojure 1.7.0-RC1:
>>> (->> (map vector (java.util.EnumSet/allOf java.util.concurrent.TimeUnit) 
>>> (range)) (into {}) (java.util.EnumMap.) (.entrySet) (map str) (into []))
>>> #=> ["DAYS=6" "DAYS=6" "DAYS=6" "DAYS=6" "DAYS=6" "DAYS=6" "DAYS=6"]
>>>
>>> IMHO the latter behavior demonstrates a mismatch where chunked seqs and 
>>> iterators are simply incompatible.
>>>
>>> On Tue, May 26, 2015 at 5:33 PM Alex Miller <a...@puredanger.com> wrote:
>>>
>>>> In what way is it broken? Both before and after, a mutable iterator was 
>>>> wrapped in a caching seq. The new one is different in that it chunks, so 
>>>> it reads 32 at a time instead of 1. However, combining either with other 
>>>> chunking sequence operations would have the same effect, which is to say 
>>>> that using that mutable iterator with anything else, or having 
>>>> expectations about its rate of consumption, was as dubious before as it 
>>>> is now.
>>>>
>>>> Unless of course I misunderstand your intent, which is possible because I 
>>>> am on a phone without easy access to look further at the commit and am 
>>>> going by memory.
>>>>
>>>>
>>>>
>>>> On May 26, 2015, at 2:17 PM, Marshall Bockrath-Vandegrift <
>>>> llas...@gmail.com> wrote:
>>>>
>>>> Some of my code is broken by 
>>>> commit c47e1bbcfa227723df28d1c9e0a6df2bcb0fecc1, which landed in 
>>>> 1.7.0-alpha6 (I last tested with -alpha5 and have been unfortunately 
>>>> busy since).  The culprit is the switch to producing seqs over iterators as 
>>>> chunked iterators.  This would appear to break seq-based traversal of any 
>>>> iterator implementing the not-uncommon Java pattern of mutating and 
>>>> re-yielding the same object on each `next()` invocation.
>>>>
>>>> I'm unable to find an existing ticket for this apparent-regression.  
>>>> Should I create one, or did I miss the existing ticket, or is there some 
>>>> mitigating issue which makes this a non-problem?
>>>>
>>>> Thanks.
>>>>
>>>> -Marshall
>>>>
>>>> On Thu, May 21, 2015 at 12:31 PM Alex Miller <a...@puredanger.com> 
>>>> wrote:
>>>>
>>>>> Clojure 1.7.0-RC1 is now available.
>>>>>
>>>>>
>>>>>  
>>  

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
