Re: Question about Clojure parallelization

2017-09-18 Thread darren92 . kim
Thanks for all of your advice.
1. I used the iota library for convenience.
2. Even though I updated the Clojure hash-map using the transient function, it was very slow, because the final HashMap size is about 300,000,000 entries. So I had no choice but to use a mutable Java data structure, HashMap.
3. I cou

Question about Clojure parallelization

2017-09-12 Thread Na eim
I guess this is the cause (+ solution): https://hackernoon.com/file-processing-in-clojure-can-easily-become-cpu-bound-3c1c38669daf

Re: Question about Clojure parallelization

2017-09-12 Thread Ghadi Shayban
One thing slowing you down is that your function "parallel-process" is calling fold on a line-seq, which is not a foldable source, so you won't get any parallelism. It devolves to a sequential reduce. As an alternative, consider partitioning the lines into batches of a few thousand, then pipel
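To make the batching idea concrete, here is a rough Java analogue. It is not Clojure and not Ghadi's code. The lines are split into batches of a few thousand, each batch is counted into its own HashMap on a parallel stream, and the partial maps are merged. This mirrors the reduce and combine steps of `fold`. The class and method names are hypothetical, and so is the assumption that the task is counting whitespace-separated tokens. The "pipeline" part of the suggestion is cut off above and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchedParallelCount {
    // Split the lines into batches of batchSize (e.g. a few thousand) lines each.
    static List<List<String>> partition(List<String> lines, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += batchSize) {
            batches.add(lines.subList(i, Math.min(i + batchSize, lines.size())));
        }
        return batches;
    }

    // Hypothetical task: count whitespace-separated tokens of one batch
    // into a fresh HashMap that only this task touches.
    static Map<String, Long> countBatch(List<String> batch) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : batch) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) { // an empty line splits to [""]
                    counts.merge(token, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    // Combine step: fold the smaller partial map into the larger one.
    // Mutating in place is safe here because each partial map is
    // produced by exactly one task and consumed exactly once.
    static Map<String, Long> combine(Map<String, Long> a, Map<String, Long> b) {
        Map<String, Long> big = a.size() >= b.size() ? a : b;
        Map<String, Long> small = (big == a) ? b : a;
        small.forEach((token, n) -> big.merge(token, n, Long::sum));
        return big;
    }

    // Batches are counted in parallel, then the partial results are combined.
    static Map<String, Long> parallelCount(List<String> lines, int batchSize) {
        return partition(lines, batchSize).parallelStream()
                .map(BatchedParallelCount::countBatch)
                .reduce(BatchedParallelCount::combine)
                .orElseGet(HashMap::new);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the cat sat", "the dog sat", "a cat", "the end");
        System.out.println(parallelCount(lines, 2).get("the")); // prints 3
    }
}
```

Because each batch writes only to its own map, no locking is needed while counting. The only shared work is the merge, and with batches of a few thousand lines there are few enough partial maps that merging stays cheap relative to counting.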

Re: Question about Clojure parallelization

2017-09-12 Thread James Reeves
As you suspect, your Clojure code isn't very performant. First, you're doing lots of updates to an immutable map, which isn't going to be efficient. Clojure allows immutable data structures to be changed, temporarily, into mutable ones using the transient function. Alternatively, sometimes it's wo

Re: Question about Clojure parallelization

2017-09-12 Thread aufc
https://github.com/thebusby/iota could be useful in your case.

Iota is a Clojure library for handling large text files in memory, and offers the following benefits:
* Tuned for Clojure's reducers, letting you reduce over large files quickly.
* Uses Java NIO's mmap() for rapid IO and handling fil
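For readers unfamiliar with the mmap point, here is a minimal Java NIO sketch. It memory-maps a file with `FileChannel.map` and scans it, in this case just counting newline bytes. It is not iota's implementation, and the class and method names are hypothetical. It also assumes a file under 2 GB, because a single `MappedByteBuffer` cannot exceed `Integer.MAX_VALUE` bytes. A larger corpus would need several mappings.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapLineCount {
    // Memory-map a file read-only and count its '\n' bytes.
    // Assumes the file is < 2 GB: one MappedByteBuffer holds at most Integer.MAX_VALUE bytes.
    static long countNewlines(Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long count = 0;
            while (buf.hasRemaining()) {
                if (buf.get() == '\n') {
                    count++;
                }
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("corpus", ".txt");
        tmp.toFile().deleteOnExit();
        Files.writeString(tmp, "first sentence\nsecond sentence\n");
        System.out.println(countNewlines(tmp)); // prints 2
    }
}
```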

Re: Question about Clojure parallelization

2017-09-12 Thread Didier
If performance is the concern, you won't be able to beat Java with Clojure. That said, it should be possible to match Java's performance.

On Tuesday, 12 September 2017 09:43:58 UTC-7, darren...@gmail.com wrote:
> Hi,
>
> I am a researcher of Natural Language Processing.
> My team want to know

Question about Clojure parallelization

2017-09-12 Thread darren92 . kim
Hi, I am a researcher in Natural Language Processing. My team wants to know how well Clojure parallelizes and how much time it saves compared to a single-threaded Java version. The problem we want to solve is this: there is a big corpus file (currently 500MB). Reading sentences line by line, find a
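The question is cut off at "find a", so the exact task is unknown. As a hedged illustration, here is a sketch of the kind of single-threaded Java baseline being compared against. It reads a corpus line by line and counts whitespace-separated tokens in a mutable `HashMap`. The token-counting task and all names are assumptions, not the poster's actual code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class SequentialCount {
    // Hypothetical baseline: one sentence per line, count whitespace-separated tokens.
    static Map<String, Long> countTokens(Reader source) {
        Map<String, Long> counts = new HashMap<>(); // mutable, used by one thread only
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String token : line.trim().split("\\s+")) {
                    if (!token.isEmpty()) { // an empty line splits to [""]
                        counts.merge(token, 1L, Long::sum);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countTokens(new StringReader("the cat sat\nthe dog sat\n"));
        System.out.println(counts.get("the")); // prints 2
    }
}
```

For the real corpus you would pass something like `Files.newBufferedReader(path)` instead of a `StringReader`. Timing this version gives the single-thread figure that a parallel version can be measured against.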