Thanks for all of your advice.
1. I used the iota library for convenience.
2. Even though I updated the Clojure hash-map using the transient function,
it was very slow, because the final HashMap size is about 300,000,000. So
I had no choice but to use a mutable Java data structure, HashMap.
3. I cou
One thing slowing you down is that your function "parallel-process" is
calling fold on a line-seq, which is not a foldable source, so you won't
get any parallelism. It devolves to a sequential reduce.
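To make the point about foldable sources concrete, here is a minimal sketch of a word count written with fold. The names line-freqs and count-words are made up for illustration and are not taken from your code; I am also assuming whitespace tokenization. The same function runs on any collection, but fold can only split the work when it is given a foldable source such as a vector; given a lazy seq like the one line-seq returns, it falls back to a plain reduce.

```clojure
(require '[clojure.core.reducers :as r]
         '[clojure.string :as str])

;; Hypothetical helper: word counts for a single line,
;; splitting on whitespace and dropping empty tokens.
(defn line-freqs [line]
  (frequencies (remove str/blank? (str/split line #"\s+"))))

;; Hypothetical name: word counts over a collection of lines.
(defn count-words [lines]
  (r/fold
    ;; combinef: merges two partial maps; called with no args it returns {}
    (r/monoid (partial merge-with +) hash-map)
    ;; reducef: folds one line into the partial map for the current chunk
    (fn [acc line] (merge-with + acc (line-freqs line)))
    lines))

;; A vector is foldable, so large inputs can be split across threads.
(count-words ["a b a" "b c"])
```

Calling (count-words (vec (line-seq rdr))) would make the input foldable, at the cost of realizing every line in memory first, which may not be acceptable for a very large file.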
As an alternative, consider partitioning the lines into batches of a few
thousand, then pipel
As you suspect, your Clojure code isn't very performant.
First, you're doing lots of updates to an immutable map, which isn't going
to be efficient. Clojure allows immutable data structures to be changed,
temporarily, into mutable ones using the transient function. Alternatively,
sometimes it's wo
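As a rough illustration of the transient approach mentioned above, here is a minimal sketch that counts tokens into a transient map and freezes it at the end. The function name count-tokens and the idea of passing a flat sequence of tokens are assumptions made for the example, not something taken from the original code.

```clojure
;; Hypothetical name: builds a frequency map from a sequence of tokens.
(defn count-tokens [tokens]
  (persistent!                        ; turn the result back into an immutable map
    (reduce (fn [m tok]
              ;; assoc! updates the transient in place; always use its return value
              (assoc! m tok (inc (get m tok 0))))
            (transient {})            ; mutable only inside this reduce
            tokens)))

(count-tokens ["a" "b" "a"])
```

The transient never escapes the function, so callers still see an ordinary immutable map.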
https://github.com/thebusby/iota could be useful in your case.
Iota is a Clojure library for handling large text files in memory, and offers
the following benefits:
* Tuned for Clojure's reducers, letting you reduce over large files quickly.
* Uses Java NIO's mmap() for rapid IO and handling fil
If performance is the concern, you won't be able to beat Java with Clojure.
That said, it should be possible to match Java's performance.
On Tuesday, 12 September 2017 09:43:58 UTC-7, darren...@gmail.com wrote:
>
> Hi,
>
> I am a researcher in Natural Language Processing.
> My team wants to know