I need to process large binary files, e.g. to remove ^M (carriage return) characters.

Let's assume files are about 50MB - small enough to be processed in
memory (but not with a naive implementation).

The following code works, except that it throws OutOfMemoryError for files
as small as 6MB:

(defn read-bin-file [file]
  (to-byte-array (as-file file)))

(defn remove-cr-from-file [file]
  (let [dirty-bytes (read-bin-file file)
        clean-bytes (filter #(not (= 13 %)) dirty-bytes)
        changed?    (< (count clean-bytes) (alength dirty-bytes))]    ; OutOfMemoryError
    (if changed?
      (write-bin-file file clean-bytes)   ; writing works fine
      nil)))

How can I make 'filter' efficient, i.e. have it create another array
instead of a memory-blowing list?

How should I approach processing large binary data in Clojure?
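
To make the first question concrete, here is roughly the shape of thing
I am after. It is only a sketch: the name remove-cr is made up for this
post, and it assumes the whole file already sits in a byte array. My
guess is that the lazy seq built by 'filter' needs a separate cell per
byte, so the idea is to copy the wanted bytes straight into a second
primitive array and trim it at the end:

```clojure
(defn remove-cr
  "Hypothetical helper: returns a new byte array holding the bytes of src
  with every CR (13) dropped. Works on primitive arrays only, no seqs."
  [^bytes src]
  (let [n          (alength src)
        ^bytes dst (byte-array n)]          ; worst case: nothing is removed
    (loop [i 0, j 0]
      (if (< i n)
        (let [b (aget src i)]
          (if (== b 13)
            (recur (inc i) j)               ; skip the CR, keep write index
            (do (aset dst j b)              ; copy the byte across
                (recur (inc i) (inc j)))))
        ;; trim the destination to the number of bytes actually written
        (java.util.Arrays/copyOf dst (int j))))))
```

With something like this, changed? could compare (alength clean-bytes)
with (alength dirty-bytes) instead of calling count on a seq. It still
keeps two arrays of the file's size in memory at once.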
