On Saturday, May 24, 2014 5:01:42 PM UTC-5, Gregg Reynolds wrote:
>
> On Sat, May 24, 2014 at 3:14 PM, Benjamin R. Haskell <clo...@benizi.com> wrote:
> > On Sat, May 24, 2014 at 3:09 PM, Gregg Reynolds <d...@mobileink.com> wrote:
> >>
> >> Hi,
> >>
> >> In working on an ANTLR grammar for Clojure I came across this regex in
> >> clojure.lang.LispReader which is used in matchSymbol:
> >>
> >> symbolPat == [:]?([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*)
> >>
> >> Look at the first part of the second group:
> >>
> >> /|[\\D&&[^/]]
> >>
> >> Am I missing something or is that equal to \\D?
> >
> > That would be equal to \\D, but you're missing that /|[\\D&&[^/]][^/]* is
> > the alternation of / and [\\D&&[^/]][^/]* rather than ( the alternation of /
> > and [\\D&&[^/]] ) concatenated with [^/]*
>
> Aha. So concatenation of [] binds more tightly than '|'? Or maybe it
> follows from greedy matching on the second alternative. In any case I
> made the opposite assumption.
>
> > '/' is special-cased as a symbol. It can only be used (with an optional
> > namespace) if it's the only character in the name.
>
> Something doesn't look right.
>
> user=> :a/b/c/d
> :a/b/c/d
>
Per the reader page (http://clojure.org/reader): In symbols, "'/' has special meaning, it can be used *once* in the middle of a symbol to separate the namespace from the name" (with special casing for the name itself being "/") and "Keywords are like symbols". Thus I would conclude that :a/b/c/d is an invalid keyword and all bets are off as to how it is parsed.

> user=> (namespace :a/b/c/d)
> "a"
> user=> (name :a/b/c/d)
> "b/c/d"
> user=> (symbol "x/y/z" "foo")
> x/y/z/foo
> user=> (type 'x/y/z/foo)
> clojure.lang.Symbol
> user=> (namespace (symbol "x/y/z" "foo"))
> "x/y/z"
> user=> (namespace 'x/y/z/foo)
> "x"
> user=> (name 'x/y/z/foo)
> "y/z/foo"
> user=> (name (symbol "x/y/z" "foo"))
> "foo"
> ...etc...
>
> The sym regex gets it right - '/' chars are part of the ns string:
>
> user> (def longrgx (re-pattern "[:]?([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*)"))
> user> (re-find longrgx "x/y/z/foo")
> ["x/y/z/foo" "x/y/z/" "foo"]
>
> But it doesn't match a final '/' preceded by a namespace string:
> user> (re-find longrgx "x/y/z/")
> ["x/y/z" "x/y/" "z"]
>
> unless it's doubled:
> user> (re-find longrgx "x/y/z//")
> ["x/y/z//" "x/y/z/" "/"]
>
> So if the name portion of a symbol cannot contain '/' then why
> user=> (name 'x/y/z/foo)
> "y/z/foo"
> user=> (name (symbol "x" "y/z/foo"))
> "y/z/foo"
>
> Ok, clojure.core/name calls clojure.lang.Named/getName, implemented by
> clojure.lang.Symbol.
>
> Conclusion: it looks like there is an inconsistency between the Symbol
> regex and matchSymbol processing, on the one hand, and Symbol.intern,
> which is called by matchSymbol and analyzes the passed string to set
> its name and namespace fields:
>
> /* in clojure.lang.Symbol */
> static public Symbol intern(String nsname){
>     int i = nsname.indexOf('/');
>     if(i == -1 || nsname.equals("/"))
>         return new Symbol(null, nsname.intern());
>     else
>         return new Symbol(nsname.substring(0, i).intern(),
>                           nsname.substring(i + 1).intern());
> }

I think this code is fine for valid symbols.
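
To make the mismatch concrete, here is a rough standalone Java sketch. It is not the Clojure source: the class name SymbolSplitDemo and both helper methods are made up for illustration, and using matches() for a full-string match is my assumption about how the reader applies the pattern. One helper splits a string the way the regex groups do (greedy, so on the last '/'), the other mirrors the first-'/' split in the intern code quoted above.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Clojure.
class SymbolSplitDemo {
    // Pattern text copied from the thread (symbolPat in clojure.lang.LispReader).
    static final Pattern SYMBOL_PAT =
        Pattern.compile("[:]?([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*)");

    // Returns {group 1, group 2} for a full-string match, or null if there is none.
    // Group 1 is greedy, so it extends to the LAST '/' in the input.
    static String[] splitByRegex(String s) {
        Matcher m = SYMBOL_PAT.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    // Mirrors the quoted Symbol.intern(String) logic: split on the FIRST '/'.
    // Returns {namespace, name}; namespace is null when there is no '/'.
    static String[] splitLikeIntern(String nsname) {
        int i = nsname.indexOf('/');
        if (i == -1 || nsname.equals("/")) {
            return new String[] { null, nsname };
        }
        return new String[] { nsname.substring(0, i), nsname.substring(i + 1) };
    }

    public static void main(String[] args) {
        String input = "x/y/z/foo";
        System.out.println(Arrays.toString(splitByRegex(input)));    // [x/y/z/, foo]
        System.out.println(Arrays.toString(splitLikeIntern(input))); // [x, y/z/foo]
    }
}
```

For a valid symbol with at most one '/', both helpers agree on where the namespace ends; they only diverge on inputs like x/y/z/foo, which the reader page does not consider valid in the first place.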
> I guess this isn't a big problem, since programming continues apace,
> but it is confusing and sure looks like a bug from here. By now I
> would guess lots of code depends on (name foo) returning a string with
> embedded '/'. Or maybe not.

As these are invalid symbol names (per the reader), I would expect that no code depends on this behavior. If you're going outside those bounds (which is perfectly fine if you're not using the reader), then you should probably specify the namespace and name separately and explicitly, or you're likely to encounter something weird.

If you'd like to file a ticket on that intern method, it could probably be made more robust, but I'm not sure it would be worth changing. We would also need to be very careful about introducing performance issues in this path.

And on symbolPat, there *is* a bug in it - the first group should be using a possessive quantifier. It allows more than it should there (see http://dev.clojure.org/jira/browse/CLJ-1252). We tried to fix that in 1.6 but found that people rely on symbols starting with digits, so we rolled it back. I have a todo ticket to do some cleanup for that - http://dev.clojure.org/jira/browse/CLJ-1286.

> Thanks,
>
> Gregg