On Saturday, May 24, 2014 5:01:42 PM UTC-5, Gregg Reynolds wrote:
>
> On Sat, May 24, 2014 at 3:14 PM, Benjamin R. Haskell 
> <clo...@benizi.com> wrote: 
> > On Sat, May 24, 2014 at 3:09 PM, Gregg Reynolds 
> > <d...@mobileink.com> wrote: 
> >> 
> >> Hi, 
> >> 
> >> In working on an ANTLR grammar for Clojure I came across this regex in 
> >> clojure.lang.LispReader which is used in matchSymbol: 
> >> 
> >> symbolPat == [:]?([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*) 
> >> 
> >> Look at the first part of the second group: 
> >> 
> >> /|[\\D&&[^/]] 
> >> 
> >> Am I missing something or is that equal to \\D? 
> > 
> > 
> > That would be equal to \\D, but you're missing that /|[\\D&&[^/]][^/]* 
> > is the alternation of / and [\\D&&[^/]][^/]* rather than ( the 
> > alternation of / and [\\D&&[^/]] ) concatenated with [^/]* 
>
> Aha.  So concatenation of [] binds more tightly than '|'?  Or maybe it 
> follows from greedy matching on the second alternative.  In any case I 
> made the opposite assumption. 
>
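
To make the precedence point concrete, here is a minimal Java sketch. The
class name AlternationDemo and the pattern OTHER_READING are made up for
illustration; only the AS_WRITTEN text comes from symbolPat. In
java.util.regex, '|' binds more loosely than concatenation, so the two
readings disagree on an input like "/foo":

```java
import java.util.regex.Pattern;

final class AlternationDemo {
    // Second group of symbolPat as written. '|' has the lowest precedence,
    // so this reads as: "/" OR ([\D&&[^/]] followed by [^/]*).
    static final Pattern AS_WRITTEN =
            Pattern.compile("/|[\\D&&[^/]][^/]*");

    // Hypothetical other reading: ("/" OR [\D&&[^/]]) followed by [^/]*.
    static final Pattern OTHER_READING =
            Pattern.compile("(?:/|[\\D&&[^/]])[^/]*");

    // True only if the whole string matches the pattern.
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"/", "foo", "/foo"}) {
            System.out.println(s
                    + "  as-written=" + matches(AS_WRITTEN, s)
                    + "  other-reading=" + matches(OTHER_READING, s));
        }
    }
}
```

Both patterns accept "/" and "foo"; only the hypothetical grouped version
accepts "/foo".
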
> > 
> > '/' is special-cased as a symbol.  It can only be used (with an optional 
> > namespace) if it's the only character in the name. 
>
> Something doesn't look right. 
>
> user=> :a/b/c/d 
> :a/b/c/d 
>

Per the reader page (http://clojure.org/reader):

In symbols, "'/' has special meaning, it can be used *once* in the middle 
of a symbol to separate the namespace from the name" (with special casing 
for the name itself being "/") and "Keywords are like symbols". Thus I 
would conclude that :a/b/c/d is an invalid keyword and all bets are off as 
to how it is parsed.
 

> user=> (namespace :a/b/c/d) 
> "a" 
> user=> (name :a/b/c/d) 
> "b/c/d" 
> user=> (symbol "x/y/z" "foo") 
> x/y/z/foo 
> user=> (type 'x/y/z/foo) 
> clojure.lang.Symbol 
> user=> (namespace (symbol "x/y/z" "foo")) 
> "x/y/z" 
> user=> (namespace 'x/y/z/foo) 
> "x" 
> user=> (name 'x/y/z/foo) 
> "y/z/foo" 
> user=> (name (symbol "x/y/z" "foo")) 
> "foo" 
> ...etc... 
>
> The sym regex gets it right - '/' chars are part of the ns string: 
>
> user> (def longrgx (re-pattern 
> "[:]?([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*)")) 
> user> (re-find longrgx "x/y/z/foo") 
> ["x/y/z/foo" "x/y/z/" "foo"] 
>
> But it doesn't match a final '/' preceded by a namespace string: 
> user> (re-find longrgx "x/y/z/") 
> ["x/y/z" "x/y/" "z"] 
>
> unless it's doubled: 
> user> (re-find longrgx "x/y/z//") 
> ["x/y/z//" "x/y/z/" "/"] 
>
> So if the name portion of a symbol cannot contain '/' then why 
> user=> (name 'x/y/z/foo) 
> "y/z/foo" 
> user=> (name (symbol "x" "y/z/foo")) 
> "y/z/foo" 
>
> Ok, clojure.core/name calls clojure.lang.Named/getName, implemented by 
> clojure.lang.Symbol. 
>
> Conclusion: it looks like there is an inconsistency between the Symbol 
> regex and matchSymbol processing, on the one hand, and Symbol.intern, 
> which is called by matchSymbol and analyzes the passed string to set 
> its name and namespace fields: 
>
> /* in clojure.lang.Symbol */ 
> static public Symbol intern(String nsname){ 
>     int i = nsname.indexOf('/'); 
>     if(i == -1 || nsname.equals("/")) 
>         return new Symbol(null, nsname.intern()); 
>     else 
>         return new Symbol(nsname.substring(0, i).intern(), 
>                           nsname.substring(i + 1).intern()); 
> } 
>
>
I think this code is fine for valid symbols.
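
For anyone following along, the two behaviors can be compared side by side
in a small standalone sketch. SlashSplitDemo and its method names are
hypothetical; splitAtFirstSlash only mirrors the indexOf logic of the quoted
intern method and does not call into Clojure, and splitByPattern assumes a
whole-string match against the pattern text quoted in this thread:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SlashSplitDemo {
    // Pattern text copied from the thread.
    static final Pattern SYMBOL_PAT =
            Pattern.compile("[:]?([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*)");

    // Mirrors the quoted intern logic: split at the FIRST '/'.
    // Returns {namespace, name}; namespace is null when there is none.
    static String[] splitAtFirstSlash(String s) {
        int i = s.indexOf('/');
        if (i == -1 || s.equals("/")) {
            return new String[] {null, s};
        }
        return new String[] {s.substring(0, i), s.substring(i + 1)};
    }

    // Regex view: the greedy .* makes group 1 run up to the LAST '/'.
    // Returns {group 1, group 2}, or null if the whole string does not match.
    static String[] splitByPattern(String s) {
        Matcher m = SYMBOL_PAT.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        String s = "x/y/z/foo";
        String[] a = splitAtFirstSlash(s);
        String[] b = splitByPattern(s);
        System.out.println("first slash: ns=" + a[0] + " name=" + a[1]);
        System.out.println("pattern:     g1=" + b[0] + " g2=" + b[1]);
    }
}
```

For a valid symbol with a single '/', such as "a/b", both views agree on
where the namespace ends; they only diverge once there is more than one '/'.
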
 

> I guess this isn't a big problem, since programming continues apace, 
> but it is confusing and sure looks like a bug from here.  By now I 
> would guess lots of code depends on (name foo) returning a string with 
> embedded '/'.  Or maybe not. 
>

As these are invalid symbol names (per the reader), I would expect that no 
code depends on this behavior. If you're going outside those bounds (which is 
perfectly fine if you're not using the reader), then you should probably 
specify the namespace and name separately, or you're likely to 
encounter something weird.

If you'd like to file a ticket on that intern method, it could probably be 
made more robust, but I'm not sure it would be worth changing. We would also 
need to be very careful about introducing performance issues in this path.

And on symbolPat, there *is* a bug in it: the first group should be using a 
possessive quantifier. As written, it allows more than it should (see 
http://dev.clojure.org/jira/browse/CLJ-1252). We tried to fix that in 1.6 but 
found people rely on symbols starting with digits, so we rolled back. I have 
a todo ticket to do some cleanup for that: 
http://dev.clojure.org/jira/browse/CLJ-1286.
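
As a rough illustration of the backtracking involved, here is a sketch in
plain Java. It assumes the possessive quantifier would go on the leading
[:]? (the exact patch is not shown in this thread), and the class and field
names are made up. Because [:]? is allowed to give the colon back, ":" can
be re-matched by [\D&&[^/]] and the digit then slips through [^/]*:

```java
import java.util.regex.Pattern;

final class PossessiveDemo {
    // Pattern text as quoted in the thread.
    static final Pattern CURRENT =
            Pattern.compile("[:]?([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*)");

    // Assumed variant: "?+" is possessive, so once the leading colon is
    // consumed it is never handed back to the rest of the pattern.
    static final Pattern POSSESSIVE =
            Pattern.compile("[:]?+([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*)");

    // True only if the whole string matches the pattern.
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {":1", ":a", "a", "1"}) {
            System.out.println(s
                    + "  current=" + matches(CURRENT, s)
                    + "  possessive=" + matches(POSSESSIVE, s));
        }
    }
}
```

With the pattern as quoted, ":1" matches as a whole; with the assumed
possessive variant it does not, while ":a" and "a" match under both.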


> Thanks, 
>
> Gregg 
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en