Re: Question about Clojure parallelization

2017-09-18 Thread darren92 . kim
Thanks for all of your advice. 1. I used the iota library for convenience. 2. Even after I updated the Clojure hash-map using the transient function, it was very slow, because the final HashMap size is about 300,000,000. So I had no choice but to use a mutable Java data structure, HashMap. 3. I cou
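
For readers following point 2, here is a minimal sketch of counting into a mutable java.util.HashMap from Clojure. The function name count-into-hashmap and the word-count task are assumptions made for illustration; the poster's actual code is not shown in this preview.

```clojure
(import 'java.util.HashMap)

;; Hypothetical helper (not from the thread): counts occurrences of each
;; word by mutating a single java.util.HashMap in place.
(defn count-into-hashmap
  ^HashMap [words]
  (let [m (HashMap.)]
    (doseq [w words]
      ;; getOrDefault requires Java 8+; it returns an Object, so coerce to long.
      (.put m w (inc (long (.getOrDefault m w 0)))))
    m))

(count-into-hashmap ["a" "b" "a"])
```

A plain HashMap is not thread-safe, so a sketch like this only suits a single writer thread or per-thread maps that are merged afterwards.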

Re: Question about Clojure parallelization

2017-09-12 Thread Ghadi Shayban
One thing slowing you down is that your function "parallel-process" is calling fold on a line-seq, which is not a foldable source, so you won't get any parallelism. It devolves to a sequential reduce. As an alternative, consider partitioning the lines into batches of a few thousand, then pipel
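
To illustrate the point about foldable sources, here is a small sketch that folds over a vector, which clojure.core.reducers can split across threads, instead of a lazy line-seq. The names merge-counts, add-line, and length-histogram are hypothetical, and the task (a histogram of line lengths) is an assumption chosen only to keep the example short. It is not the batching approach the message goes on to describe.

```clojure
(require '[clojure.core.reducers :as r])

;; Combining function: called with no arguments for the identity value,
;; and with two partial results to merge them.
(defn merge-counts
  ([] {})
  ([a b] (merge-with + a b)))

;; Reducing function: adds one line to a partial histogram
;; keyed by line length.
(defn add-line [counts line]
  (update counts (count line) (fnil inc 0)))

;; vec realizes the lines into a vector, which is a foldable source;
;; 512 is the partition size below which fold reduces sequentially.
(defn length-histogram [lines]
  (r/fold 512 merge-counts add-line (vec lines)))

(length-histogram ["a" "b" "abc"])
```

Realizing every line into a vector needs enough memory to hold the whole input, so this only suits inputs that fit in the heap.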

Re: Question about Clojure parallelization

2017-09-12 Thread James Reeves
As you suspect, your Clojure code isn't very performant. First, you're doing lots of updates to an immutable map, which isn't going to be efficient. Clojure allows immutable data structures to be changed, temporarily, into mutable ones using the transient function. Alternatively, sometimes it's wo
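
As a rough sketch of the transient approach mentioned here, the following counts words using a transient map that is frozen back into an immutable map at the end. The function name count-words and the whitespace tokenization are assumptions for illustration, not the original poster's code.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper (not from the thread): word frequencies built with
;; a transient map. assoc! may return a new object, so its return value
;; is always threaded through the reduce.
(defn count-words [lines]
  (persistent!
    (reduce (fn [counts line]
              (reduce (fn [m w] (assoc! m w (inc (get m w 0))))
                      counts
                      ;; drop the empty string produced by leading whitespace
                      (remove str/blank? (str/split line #"\s+"))))
            (transient {})
            lines)))

(count-words ["a b a" "b c"])
```

The transient stays local to the function, so callers still see an ordinary immutable map.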

Re: Question about Clojure parallelization

2017-09-12 Thread aufc
https://github.com/thebusby/iota could be useful in your case. Iota is a Clojure library for handling large text files in memory, and offers the following benefits: * Tuned for Clojure's reducers, letting you reduce over large files quickly. * Uses Java NIO's mmap() for rapid IO and handling fil

Re: Question about Clojure parallelization

2017-09-12 Thread Didier
If performance is the concern, you won't be able to beat Java with Clojure. That said, it should be possible to match Java's performance. On Tuesday, 12 September 2017 09:43:58 UTC-7, darren...@gmail.com wrote: > > Hi, > > I am a researcher of Natural Language Processing. > My team want to know