On Sunday, July 7, 2013 6:06:06 AM UTC-4, Jim foo.bar wrote:
>
> I'm not sure I follow what you mean... both regexes posted here preserve
> the punctuation... here is mine (ignore the names - it is in fact the same
> regex):
>
You're right; I was actually referring to the suggestions Lars had made.

>
> [snip]
>
> Similar thing happens with Lars's simpler regex... just use 're-seq'
> instead of 'split'
>

That wasn't my experience:

#'user/sentences
user=> (nth sentences 0)
" THE country of the ancient Mexicans, or Aztecs as they were called, formed but a very small part of the extensive territories comprehended in the modern republic of Mexico"
user=> (nth sentences 1)
" Its boundaries cannot be defined with certainty"
user=> (nth sentences 2)
" They were much enlarged in the latter days of the empire, when they may be considered as reaching from about the eighteenth degree north to the twenty-first on the Atlantic"

Actually, I also thought of a way to do it with the simple example suggested by Lars without using the nlp package (this only works because there are no pipe characters in the text file I'm processing):

user=> (def sentences (clojure.string/split (clojure.string/replace my-text #"([.?!;])\s{1}" "$1|||") #"\|\|\|"))
#'user/sentences
user=> (nth sentences 0)
" THE country of the ancient Mexicans, or Aztecs as they were called, formed but a very small part of the extensive territories comprehended in the modern republic of Mexico."
user=> (nth sentences 1)
"Its boundaries cannot be defined with certainty."
user=> (nth sentences 2)
"They were much enlarged in the latter days of the empire, when they may be considered as reaching from about the eighteenth degree north to the twenty-first on the Atlantic;"
user=> (nth sentences 3)
"and from the fourteenth to the nineteenth, including a very narrow strip, on the Pacific."

-- 
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
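A minimal sketch of the sentinel-then-split idea from the post above, wrapped as reusable functions. The function names are hypothetical. The second function is a different technique, not one proposed in the thread: it splits on a zero-width lookbehind, so the punctuation stays attached to each sentence without inserting a sentinel, and the result no longer depends on the text being free of pipe characters. Both sketches assume a sentence ends in ".", "?", "!" or ";" followed by whitespace, so abbreviations such as "Mr." will still be split.

```clojure
(require '[clojure.string :as str])

;; Approach from the post: tag each boundary (terminator + one whitespace char)
;; with the sentinel "|||", then split on the sentinel.
;; Note: \s{1} in the original regex is the same as plain \s.
;; Assumption: the input never contains the sequence "|||".
(defn split-sentences-sentinel
  "Hypothetical helper: split text into sentences, keeping the punctuation."
  [text]
  (str/split (str/replace text #"([.?!;])\s" "$1|||")
             #"\|\|\|"))

;; Alternative (not from the thread): split on whitespace that is preceded by
;; a terminator. The lookbehind (?<=...) is zero-width, so the punctuation is
;; not consumed and stays at the end of each sentence.
(defn split-sentences-lookbehind
  "Hypothetical helper: same result without a sentinel string."
  [text]
  (str/split text #"(?<=[.?!;])\s+"))
```

For example, both functions turn "One. Two? Three; four!" into ["One." "Two?" "Three;" "four!"]. Strictly speaking, the sentinel version is only thrown off by the three-character sequence "|||", not by a single pipe, whereas the lookbehind version has no such restriction.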