Look in the Clojure source, file LispReader.java, classes RegexReader and
StringReader for the code that reads strings and regular expressions.

The main difference for regular expressions is that sequences like \d (match
a single decimal digit) and \s (match a single whitespace character) are so
common in regexes that the regex reader in Clojure helps the developer out
by not requiring the backslashes to be escaped, which would be necessary if
the regex were written as a normal string.  For example, these two are
equivalent:

#"\d+\s+"
(re-pattern "\\d+\\s+")

The first is much easier to read, and the improvement is even more
noticeable for longer regexes.
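
Here is a small sketch to check that equivalence at a REPL.  The names
from-literal and from-string are made up for illustration.  Pattern objects
do not compare equal with =, so the sketch compares the pattern source text
instead.

```clojure
;; Two ways of building the same java.util.regex.Pattern.
(def from-literal #"\d+\s+")                 ; regex literal, single backslashes
(def from-string  (re-pattern "\\d+\\s+"))   ; normal string, escaped backslashes

;; Compare the source text of the two patterns.
(println (= (.pattern from-literal) (.pattern from-string)))

;; Both find one or more digits followed by one or more whitespace characters.
(println (pr-str (re-find from-literal "abc 42  x")))
(println (pr-str (re-find from-string  "abc 42  x")))
```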


Starting in core.clj with the function print leads you to the function
pr-on, which calls the multimethod print-method if *print-dup* has its
default value of false.  Search core_print.clj for "regex" and you will find
the print-method implementation for printing regex patterns.
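
A sketch of what that means in practice, based on my reading of that
print-method and not on any documented guarantee: the pattern's source text
is written out as-is, so pr and print give the same output for a regex,
while for a string pr escapes the newline.  The names p, via-print and
via-pr are made up for illustration.

```clojure
;; A pattern whose source text contains a literal newline character.
(def p (re-pattern "a\nb"))

;; Capture what print and pr write for the pattern.
(def via-print (with-out-str (print p)))
(def via-pr    (with-out-str (pr p)))

;; Both outputs are the same: #"a, a newline, b".
(println (= via-print via-pr))

;; For a plain string, pr escapes the newline as backslash-n
;; and print writes the newline itself.
(println (pr-str "a\nb"))
(println (with-out-str (print "a\nb")))
```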


You say your first and last examples print as:

#"a
b"

but I see this with a Clojure 1.5.1 REPL:

user=> (print (re-pattern "a\nb"))
#"a
b"nil
user=> (print (re-pattern "a\\\nb"))
#"a\
b"nil

These are not exactly the same, and the way they differ is not too
surprising to me.

The patterns (re-pattern "a\nb") and (re-pattern "a\\nb") both match the
same strings, because each one matches exactly the three characters
a,newline,b.  The first one matches because the regex itself contains a
literal newline character to match.  The second one matches because it
contains a backslash-n, which in Java regexes denotes that a newline
character should be matched.
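
A short sketch that checks this.  The names target, literal-newline and
escaped-newline are made up for illustration.

```clojure
(def target "a\nb")                          ; the 3 characters a, newline, b

(def literal-newline (re-pattern "a\nb"))    ; pattern text: a, newline, b
(def escaped-newline (re-pattern "a\\nb"))   ; pattern text: a, backslash, n, b

;; The pattern source texts differ in length ...
(println (count (.pattern literal-newline)))   ; 3
(println (count (.pattern escaped-newline)))   ; 4

;; ... but both patterns match the whole target string.
(println (= target (re-matches literal-newline target)))
(println (= target (re-matches escaped-newline target)))
```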

I don't understand why (re-pattern "a\\\nb") would match the same thing.  I
would have guessed that it wouldn't, but it does indeed do so.  For all I
know that could be a bug or a weird dark corner case in the Java regex
library.  I would have expected such a regex to match only the
4-character sequence a,backslash,newline,b.
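
One possible explanation, from my reading of the java.util.regex.Pattern
documentation and not confirmed by anyone in this thread: a backslash in
front of a non-alphabetic character quotes that character, so backslash
followed by a newline simply matches a newline.  The sketch below checks the
behavior.  The names odd and four-chars are made up for illustration.

```clojure
;; Pattern text: a, backslash, newline, b (4 characters).
(def odd (re-pattern "a\\\nb"))
(println (count (.pattern odd)))                      ; 4

;; It matches the 3-character string a, newline, b ...
(println (= "a\nb" (re-matches odd "a\nb")))

;; ... and does not match the 4-character string a, backslash, newline, b.
(println (nil? (re-matches odd "a\\\nb")))

;; To match that 4-character string, the backslash itself must be escaped
;; in the regex.  Pattern text: a, backslash, backslash, newline, b.
(def four-chars (re-pattern "a\\\\\nb"))
(println (= "a\\\nb" (re-matches four-chars "a\\\nb")))
```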

Andy


On Thu, Mar 28, 2013 at 5:08 PM, Mark Engelberg <mark.engelb...@gmail.com> wrote:

> I'm in reader hell right now, trying to puzzle out how escape sequences
> and printing work for strings and regular expressions.
>
> I notice that:
> (re-pattern "a\nb")
> (re-pattern "a\\nb")
> (re-pattern "a\\\nb")
>
> all produce semantically equivalent regular expressions that match "a\nb"
>
> The middle one prints the way I'd expect, as #"a\nb"
>
> However, the first and last example print as:
> #"a
> b"
>
> Even weirder, printing it with pr has no effect, and it still prints as:
> #"a
> b"
>
> I can sort of imagine why the middle one (re-pattern "a\\nb") might be
> stored internally in a somewhat different format than the other two, but I
> really can't figure out why the "machine-oriented print" of pr would still
> print the blank line rather than \n in this context.
>
> Bug or feature?
>
> Can anyone point me to the relevant code where I can get a better
> understanding of how the reading and printing of regexps differs from
> strings?
>
> --Mark
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>
