On Sunday, July 7, 2013 6:06:06 AM UTC-4, Jim foo.bar wrote:
>
>  I'm not sure I follow what you mean...both regexes posted here preserve 
> the punctuation...here is mine (ignore the names - it is in fact the same 
> regex):
>

You're right; I was actually referring to the suggestions Lars had made. 

>
> [snip]
>
> Similar thing happens with Lars's simpler regex...just use 're-seq' 
> instead of 'split'
>

That wasn't my experience:

#'user/sentences
user=> (nth sentences 0)
"    THE country of the ancient Mexicans, or Aztecs as they were called, 
formed but a very small part of the extensive territories comprehended in 
the modern republic of Mexico"
user=> (nth sentences 1)
" Its boundaries cannot be defined with certainty"
user=> (nth sentences 2)
" They were much enlarged in the latter days of the empire, when they may 
be considered as reaching from about the eighteenth degree north to the 
twenty-first on the Atlantic"
 
Actually, I also thought of a way to do it with the simple regex suggested 
by Lars, without using the nlp package (this only works because there are 
no pipe characters in the text file I'm processing):

user=> (def sentences
         (clojure.string/split
           (clojure.string/replace my-text #"([.?!;])\s" "$1|||")
           #"\|\|\|"))
#'user/sentences
user=> (nth sentences 0)
"    THE country of the ancient Mexicans, or Aztecs as they were called, 
formed but a very small part of the extensive territories comprehended in 
the modern republic of Mexico."
user=> (nth sentences 1)
"Its boundaries cannot be defined with certainty."
user=> (nth sentences 2)
"They were much enlarged in the latter days of the empire, when they may be 
considered as reaching from about the eighteenth degree north to the 
twenty-first on the Atlantic;"
user=> (nth sentences 3)
"and from the fourteenth to the nineteenth, including a very narrow strip, 
on the Pacific."
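
For comparison, here is a minimal sketch of a variant that should not depend 
on the text being free of pipe characters: it splits on a single whitespace 
character that is preceded by sentence punctuation, using a regex 
lookbehind. The names sample-text, split-with-sentinel and 
split-with-lookbehind are made up for illustration, and sample-text is a 
short stand-in, not the file being processed above.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for my-text; not the original text file.
(def sample-text
  (str "Its boundaries cannot be defined with certainty. "
       "They were much enlarged; "
       "and from the fourteenth to the nineteenth. "
       "Was it so? Yes!"))

;; Sentinel approach from the message above: append a marker after each
;; punctuation+whitespace pair, then split on the marker.
;; Assumes the text never contains "|||".
(defn split-with-sentinel [text]
  (str/split (str/replace text #"([.?!;])\s" "$1|||") #"\|\|\|"))

;; Lookbehind variant: split on one whitespace character that directly
;; follows . ? ! or ; so the punctuation stays with its sentence and no
;; marker string is needed.
(defn split-with-lookbehind [text]
  (str/split text #"(?<=[.?!;])\s"))

(split-with-lookbehind sample-text)
```

On this sample both functions return the same five pieces, each keeping its 
trailing punctuation. Neither one handles abbreviations such as "Mr." or 
runs of several spaces after a period, so it is still only a rough splitter.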

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en