Thank you for all of your advice.
1. I used the iota library for convenience.
2. Even though I updated the Clojure hash-map using the transient function,
it was still very slow, because the final HashMap holds about 300,000,000
entries. So I had no choice but to use a mutable Java data structure, HashMap.
3. I cou
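
For readers following along, here is a minimal sketch of counting into a
mutable java.util.HashMap from Clojure, as point 2 describes. The function
name count-tokens and the idea of counting string tokens are assumptions made
for illustration; the actual code from this thread is not shown here.

```clojure
(import 'java.util.HashMap)

(defn count-tokens
  "Counts occurrences of each token in a mutable java.util.HashMap.
  Hypothetical helper, not the code discussed in the thread."
  [tokens]
  (let [^HashMap counts (HashMap.)]
    (doseq [t tokens]
      ;; getOrDefault returns the current count, or 0 for an unseen key.
      (.put counts t (inc (long (.getOrDefault counts t 0)))))
    counts))

(count-tokens ["a" "b" "a"])
```

The map is mutated in place, so it should stay confined to a single thread
unless access is synchronized.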
I guess this is the cause (+ solution):
https://hackernoon.com/file-processing-in-clojure-can-easily-become-cpu-bound-3c1c38669daf
One thing slowing you down is that your function "parallel-process" is
calling fold on a line-seq, which is not a foldable source, so you won't
get any parallelism. It devolves to a sequential reduce.
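
To illustrate the point about foldable sources, here is a small sketch in which
the lines are first put into a vector, which clojure.core.reducers can fold in
parallel. The function name count-lines-matching and the sample data are
assumptions; holding all lines in a vector also assumes the input fits in
memory.

```clojure
(require '[clojure.core.reducers :as r]
         '[clojure.string :as str])

(defn count-lines-matching
  "Counts the lines that satisfy pred. Pass a vector: fold can split a
  vector into chunks, while a lazy seq such as line-seq falls back to a
  sequential reduce."
  [pred lines]
  ;; + serves as both the reducing and the combining function; (+) is 0.
  (r/fold + (r/map (fn [line] (if (pred line) 1 0)) lines)))

;; Hypothetical in-memory data standing in for the corpus.
(def sample-lines (vec (repeat 10000 "the quick brown fox")))

(count-lines-matching #(str/includes? % "fox") sample-lines)
```

With a real file, something like (vec (line-seq rdr)) inside with-open would
produce a foldable vector, at the cost of reading every line up front.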
As an alternative, consider partitioning the lines into batches of a few
thousand, then pipel
As you suspect, your Clojure code isn't very performant.
First, you're doing lots of updates to an immutable map, which isn't going
to be efficient. Clojure allows immutable data structures to be changed,
temporarily, into mutable ones using the transient function. Alternatively,
sometimes it's wo
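
As a rough sketch of the transient approach mentioned above: the map is made
transient once, updated with assoc! inside a reduce, and turned back into an
immutable map at the end. The name word-frequencies and the word-counting task
are assumptions chosen for illustration.

```clojure
(defn word-frequencies
  "Builds a frequency map using a transient map for the updates.
  Hypothetical example, not the code from the original post."
  [words]
  (persistent!
    (reduce (fn [counts w]
              ;; assoc! returns the transient to use for the next step.
              (assoc! counts w (inc (get counts w 0))))
            (transient {})
            words)))

(word-frequencies ["a" "b" "a"])
```

The transient never leaves the function, so callers still see an ordinary
immutable map.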
https://github.com/thebusby/iota could be useful in your case.
Iota is a Clojure library for handling large text files in memory, and offers
the following benefits:
* Tuned for Clojure's reducers, letting you reduce over large files quickly.
* Uses Java NIO's mmap() for rapid IO and handling fil
If performance is the concern, you won't be able to beat Java with Clojure.
That said, it should be possible to match Java's performance.
On Tuesday, 12 September 2017 09:43:58 UTC-7, darren...@gmail.com wrote:
>
> Hi,
>
> I am a researcher in Natural Language Processing.
> My team wants to know
Hi,
I am a researcher in Natural Language Processing.
My team wants to know how well Clojure parallelizes and how much time it
saves compared with a single-threaded Java version.
The problem we want to solve is this:
there is a big corpus file (currently 500MB).
Reading sentences line by line, find a