I had the same question - are you running independent thread-isolated lazy-seqs on different sources in different threads? Or are you creating one lazy-seq and then *using* it to do different things in multiple threads?
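To make the distinction concrete, here is a rough sketch of the two cases as I understand them. The function names (`count-lines`, `count-all-isolated`, `count-shared`) are made up for illustration and are not taken from your code; the file handling just mirrors the snippet in your message.

```clojure
(require '[clojure.java.io :as io])
(import 'java.util.zip.GZIPInputStream)

;; Hypothetical helper: open one gzipped file and count its lines.
(defn count-lines [path]
  (with-open [r (-> path io/input-stream GZIPInputStream. io/reader)]
    (count (line-seq r))))

;; Case 1: every thread builds and consumes its own lazy seq on its own
;; source. No LazySeq instance is ever visible to more than one thread.
(defn count-all-isolated [paths]
  (->> paths
       (mapv #(future (count-lines %))) ; mapv starts all futures eagerly
       (mapv deref)))

;; Case 2: a single lazy seq is created once and then walked by several
;; threads, so they all synchronize on the same LazySeq instances.
;; Note that holding `lines` here also retains the whole seq in memory.
(defn count-shared [path n-threads]
  (with-open [r (-> path io/input-stream GZIPInputStream. io/reader)]
    (let [lines (line-seq r)]
      (->> (range n-threads)
           (mapv (fn [_] (future (count lines))))
           (mapv deref)))))
```

Something like `(count-all-isolated ["a.gz" "b.gz"])` would be the first case and `(count-shared "a.gz" 4)` the second (the file names are placeholders).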
In the first case, the synchronization in lazy-seq only happens in a thread-isolated context (each instance is independent) and the JVM can optimize this in lots of ways. In the latter case, there may be better options.

I was a bit confused by the CPU usage you reported - it was unclear to me what workload you were running in the Clojure and Java cases. Presumably, on the same workload, you'd want higher CPU usage, indicating that you are keeping more cores busy (assuming they're not busy doing extra work). If your problem were lock contention, though, I would expect to see much lower CPU usage.

You might try taking a few thread dumps while your program is running to see whether something else in your logic is slow (rather than contention). Tools like YourKit can help identify lock contention hotspots too.

On Tuesday, September 15, 2015 at 7:50:25 AM UTC-5, Alan Thompson wrote:
>
> Do you have a corresponding example of the parallel code? I'm not sure
> which part(s) are being delegated to other threads.
>
> Often it is just the I/O cost of reading the file that is the dominant
> cost, so parallelism doesn't buy you much.
>
> Alan
>
> On Mon, Sep 14, 2015 at 9:10 PM, Andy L <core....@gmail.com> wrote:
>
>> Hi,
>>
>> I would like ask for some advise with regards to kind of unusual
>> interaction between lazy-seq and threads. I have a code opening some big
>> compressed text files and processing them line by line. The code reduced to
>> a viable example would look like that:
>>
>> (with-open [i (-> "mybigfile.gz" clojure.java.io/input-stream
>>                   java.util.zip.GZIPInputStream. clojure.java.io/reader)]
>>   (count (line-seq i)))
>>
>> where for the sake of visualization, the processing is replaced by a
>> simple counting.
>>
>> In a single thread situation, everything works very well, with
>> performance numbers close to Java (or even equal with
>> "-XX:MaxInlineLevel=16").
>> However, once I run it in threads, either native
>> Java Thread or future, instead of nice effect parallel processing, things
>> are even slower from as they would be run sequentially. Interestingly
>> enough, JVM pegs at 500-600% of CPU (I have 8 cores). I was not sure what
>> was the reason, and in order to rule out some basics assumptions, I created
>> a Java equivalent. It runs at 200% CPU and scales above 4 cores - which is
>> exactly what I want, and matches gzip behavior. (I can run almost 6 "gunzip
>> -c mybigfile.gz | wc -l" which all taking 100% CPU each).
>>
>> Next logical step was to look into Clojure sources. What I am finding
>> out, is that lazy-seq is synchronized:
>> https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/LazySeq.java
>> From what I understand, JIT optimizes the single thread case and removes
>> "synchronized" guards, however as soon as other threads come into play I am
>> forced to pay price for synchronization, which causes the performance
>> degradation*.
>>
>> Interestingly enough, JIT optimizes a version without GZIPInputStream and
>> am getting same results as with Java with multiple threads. I have to run
>> it with "-XX:MaxInlineLevel=16" though. With a default
>> "-XX:MaxInlineLevel=9", JIT does not kick in and performance is not there.
>> There is probably another switch in JVM which would help hinting JIT
>> better, however I am not convinces that this is a right direction.
>>
>> I really like semantics of line-seq, however without that "synchronized"
>> part, as in my context there is no way that two threads touch same seq.
>>
>> I would like ask for some advise, what would be my options here. The last
>> resort is to write handling code in Java, but I really want to avoid this.
>>
>> Best,
>> Andy
>>
>> *My analysis might be wrong of course.
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.