On Sat, Jul 6, 2013 at 11:42 AM, Denis Papathanasiou <denis.papathanas...@gmail.com> wrote:
> (def my-text (slurp "mytext.txt"))
> (def my-sentences (partition-by ispunc? my-text))
>
> Unfortunately, this returns a sequence of 1, whose first and only element
> contains the entire text, since ispunc? depends on looking at a single
> character.
>
> So I tried producing a list of chars from the string and passing it to
> partition-by with ispunc? like this:
>
> (def my-text-chars (partition (count my-text) my-text))
> (def my-sentences (partition-by ispunc? (first my-text-chars)))
>
> That worked, in that it's logically "correct", but when I try to access any
> of the elements in my-sentences I get a java.lang.OutOfMemoryError (the
> source text file, "mytext.txt" is 1.3 mb in size).
>
> So is there a simpler and more idiomatic way of doing this without using up
> all the heap space?

If that kind of splitting is really all you require,

(clojure.string/split my-text #"[.!?;]")

or

(re-seq #"[^.!?;]+" my-text)

For fancier stuff look into an opennlp wrapper or something like it.

https://github.com/dakrone/clojure-opennlp

Lars Nilsson
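
To make the difference between the two suggestions concrete, here is a minimal sketch run against a short in-memory string. The names sample-text, by-split, by-re-seq and trimmed are made up for illustration, and the sample string merely stands in for the contents of "mytext.txt"; the regexes are the ones given above.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sample standing in for (slurp "mytext.txt").
(def sample-text "Hello there. How are you? Fine; thanks!")

;; Option 1: split on the punctuation characters; returns a vector of strings.
(def by-split (str/split sample-text #"[.!?;]"))

;; Option 2: lazily collect every run of non-punctuation characters.
(def by-re-seq (re-seq #"[^.!?;]+" sample-text))

;; Either way the space after each delimiter stays attached to the
;; following piece, so trim if that matters.
(def trimmed (map str/trim by-re-seq))

(println by-split)
(println trimmed)
```

The two forms agree on this input. They can differ when punctuation repeats: on a string like "Wait... what?" the split version yields empty strings between the consecutive dots, while the re-seq version skips them because its pattern only matches non-empty runs.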