I love it!

On Tue, Oct 7, 2008 at 4:37 PM, Chouser <[EMAIL PROTECTED]> wrote:
> Ok, I know we've been over this before, but nothing was actually done.
>
> For the record:
>
> http://groups.google.com/group/clojure/browse_thread/thread/81b361a4e82602b7/0313c224a480a161
>
> So here is my attempt to formalize a simple proposal.
>
> The reader should take the literal contents of #"..." and pass them to
> Pattern.compile as a raw string, making no changes to the contents.
> That means all backslashes (\) and double quotes (") would be passed
> right in.  The only other thing the reader need concern itself with
> is that when it sees a \" it should not treat that double quote as the
> end of the pattern, but rather keep on going until it sees a
> double quote that is not escaped by a backslash.  Nevertheless, it
> would pass both the quoting \ and the following " to Pattern.compile.
>
> That's it.  Simple.  It works because Java's Pattern itself understands
> backslash quoting, including literal chars like backslash and double
> quote, hex and octal patterns, as well as other regex patterns.
>
> Some examples:
>
> 1. Simple text
>   (re-find #"foo" "foo")  -->  "foo"
>
> 2. Pre-defined character class
>   (re-find #"\w*" "[EMAIL PROTECTED]")  -->  "foo"
>
> 3. Special character (regex and string)
>   (re-find #"\t" "\t")  -->  "\t"
>
> 4. Scary special character (regex only)
> Note that the escape sequences available inside #"" are Java Pattern
> escape sequences, and therefore by definition different from Clojure
> String escape sequences.  Of course this is what you need for \w and
> such to work:
>   (re-find #"\a" "\u0007")  -->  beep ""
>
> 5. Special character (string only)
> The reverse of the previous example -- Clojure strings understand "\b"
> as (str \backspace), but Java patterns do not, so this example uses
> hex instead:
>   (re-find #"\x08" "\b")  -->  "\b"
>
> 6. Hex
>   (re-find #"\x31" "1")  -->  "1"
>
> 7. Octal
>   (re-find #"\061" "1")  -->  "1"
>
> 8. Word boundary
>   (re-find #"\bfoo" "foo")  -->  "foo"
>
> 9. Quoting fun -- double quote, a single character:
>   (re-find #"\"" "\"")  -->  "\""
>
> 10. Quoting fun -- backslash, a single character:
>   (re-find #"\\" "\\")  -->  "\\"
>
> 11. Open paren
>   (re-find #"\(" "(")  -->  "("
>
> I think this demonstrates you can create any pattern you might need.
> For reference, here are the above patterns expressed in the current
> (not the proposed) reader syntax:
>
> 1. #"foo"
> 2. #"\\w*"
> 3. #"\t" or #"\\t"
> 4. #"\\a" (but #"\a" makes the reader blow up)
> 5. #"\\x08"
> 6. #"\\x31"
> 7. #"\061" or #"\\061"
> 8. #"\\bfoo" (note #"\bfoo" is legal, but doesn't do what you want)
> 9. #"\"" or #"\\\"" (but #"\\"" blows up the reader)
> 10. #"\\\\" (but #"\\" is illegal)
> 11. #"\\(" (but #"\(" is illegal)
>
> Somehow I'm not sure that communicates how much I dislike the current
> syntax.  Oh well, maybe others can chime in on that point.  I
> implemented this to provide the examples above, not because I think
> this is a done deal or anything.  Please comment!
>
> Here is a new print method to match the attached patch to LispReader:
>
> (defmethod print-method java.util.regex.Pattern [p w]
>   (.write w "#\"")
>   (.write w (.pattern p))
>   (.write w "\""))
>
> That print method will take a bit more work to properly quote some
> Patterns that could be created by means other than the Clojure
> literal.
>
> --Chouser
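
For anyone who wants to poke at the scanning rule outside the reader, here is a rough Java sketch of it. This is not the attached LispReader patch: the class RawRegexLiteral and the helpers readRawPattern and find are names made up for illustration, and the sketch assumes the caller has already consumed the opening #" and passes the index of the first pattern character. It copies every character through unchanged, treats a backslash plus the following character as a pair, and stops at the first double quote that is not part of such a pair.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RawRegexLiteral {

    /**
     * Hypothetical helper (not the real LispReader code): returns the raw
     * pattern text starting at index 'start' (just past the opening quote)
     * up to, but not including, the closing double quote.
     */
    public static String readRawPattern(String src, int start) {
        StringBuilder sb = new StringBuilder();
        int i = start;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '"') {
                // An unescaped double quote ends the literal.
                return sb.toString();
            }
            sb.append(c);
            if (c == '\\') {
                // Keep the backslash AND the character it quotes, unchanged.
                i++;
                if (i >= src.length()) {
                    break;
                }
                sb.append(src.charAt(i));
            }
            i++;
        }
        throw new IllegalArgumentException("EOF while reading regex");
    }

    /** Rough stand-in for re-find: first match as a string, or null. */
    public static String find(String rawPattern, String input) {
        Matcher m = Pattern.compile(rawPattern).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Source text is: #"\""  (example 9 in the proposal)
        String quote = readRawPattern("#\"\\\"\"", 2);
        System.out.println(quote.length());            // backslash + quote
        System.out.println(find(quote, "\""));

        // Source text is: #"\\"  (example 10 in the proposal)
        String backslash = readRawPattern("#\"\\\\\"", 2);
        System.out.println(find(backslash, "\\"));

        // Example 5: hex escape matching a backspace character
        System.out.println("\b".equals(find("\\x08", "\b")));
    }
}
```

The only state the scanner needs is "was the previous character an unconsumed backslash", which is why #"\\" terminates correctly even though its closing quote directly follows a backslash.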