On Sat, May 24, 2014 at 3:14 PM, Benjamin R. Haskell <cloj...@benizi.com> wrote:
> On Sat, May 24, 2014 at 3:09 PM, Gregg Reynolds <d...@mobileink.com> wrote:
>>
>> Hi,
>>
>> In working on an ANTLR grammar for Clojure I came across this regex in
>> clojure.lang.LispReader which is used in matchSymbol:
>>
>> symbolPat == [:]?([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*)
>>
>> Look at the first part of the second group:
>>
>> /|[\\D&&[^/]]
>>
>> Am I missing something or is that equal to \\D?
>
>
> That would be equal to \\D, but you're missing that /|[\\D&&[^/]][^/]* is
> the alternation of / and [\\D&&[^/]][^/]* rather than ( the alternation of /
> and [\\D&&[^/]] ) concatenated with [^/]*

Aha.  So concatenation of [] binds more tightly than '|'?  Or maybe it
follows from greedy matching on the second alternative.  In any case I
made the opposite assumption.
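
A quick check with java.util.regex suggests it is precedence rather than greediness: '|' binds more loosely than concatenation, so the whole of [\\D&&[^/]][^/]* is the second alternative. Below is a minimal sketch; the class and method names are made up for illustration, and GROUPED is a hypothetical variant that encodes the reading I had assumed.

```java
import java.util.regex.Pattern;

public class AlternationPrecedence {
    // Second group of symbolPat as written:
    // '/' OR (one non-digit, non-slash char followed by any non-slash chars).
    static final Pattern AS_WRITTEN = Pattern.compile("/|[\\D&&[^/]][^/]*");

    // Hypothetical variant: the alternation is grouped first, then followed by [^/]*.
    static final Pattern GROUPED = Pattern.compile("(?:/|[\\D&&[^/]])[^/]*");

    static boolean matchesAsWritten(String s) {
        return AS_WRITTEN.matcher(s).matches();
    }

    static boolean matchesGrouped(String s) {
        return GROUPED.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"/", "foo", "/foo"}) {
            System.out.println(s + "  as-written=" + matchesAsWritten(s)
                    + "  grouped=" + matchesGrouped(s));
        }
    }
}
```

Only "/foo" separates the two readings: the grouped variant accepts it, the pattern as written does not.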

>
> '/' is special-cased as a symbol.  It can only be used (with an optional
> namespace) if it's the only character in the name.

Something doesn't look right.

user=> :a/b/c/d
:a/b/c/d
user=> (namespace :a/b/c/d)
"a"
user=> (name :a/b/c/d)
"b/c/d"
user=> (symbol "x/y/z" "foo")
x/y/z/foo
user=> (type 'x/y/z/foo)
clojure.lang.Symbol
user=> (namespace (symbol "x/y/z" "foo"))
"x/y/z"
user=> (namespace 'x/y/z/foo)
"x"
user=> (name 'x/y/z/foo)
"y/z/foo"
user=> (name (symbol "x/y/z" "foo"))
"foo"
...etc...

The sym regex gets it right - '/' chars are part of the ns string:

user> (def longrgx (re-pattern "[:]?([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*)"))
user> (re-find longrgx "x/y/z/foo")
["x/y/z/foo" "x/y/z/" "foo"]

But it doesn't match a final '/' preceded by a namespace string:
user> (re-find longrgx "x/y/z/")
["x/y/z" "x/y/" "z"]

unless it's doubled:
user> (re-find longrgx "x/y/z//")
["x/y/z//" "x/y/z/" "/"]
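
One caveat about these experiments: re-find goes through Matcher.find(), which is satisfied by a match on part of the input, so the "x/y/z" result above is a prefix match. Here is a sketch in plain Java that runs the same pattern with both find() and whole-string matches(). The class and helper names are mine, and which of the two the reader relies on should be checked against the LispReader source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SymbolPatDemo {
    // Same pattern string as in the re-pattern call above.
    static final Pattern SYMBOL_PAT =
            Pattern.compile("[:]?([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*)");

    // Like re-find: the first match anywhere in the input.
    // Returns {whole match, group 1, group 2}, or null if nothing matches.
    static String[] find(String s) {
        Matcher m = SYMBOL_PAT.matcher(s);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(), m.group(1), m.group(2)};
    }

    // True only if the pattern consumes the entire input.
    static boolean matchesWhole(String s) {
        return SYMBOL_PAT.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"x/y/z/foo", "x/y/z/", "x/y/z//"}) {
            String[] groups = find(s);
            System.out.println(s + "  find=" + String.join(" | ", groups)
                    + "  matches=" + matchesWhole(s));
        }
    }
}
```

With matches(), "x/y/z/" is rejected outright, while "x/y/z/foo" and "x/y/z//" are accepted with the same groups that re-find reports.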

So if the name portion of a symbol cannot contain '/', then why do we get this?
user=> (name 'x/y/z/foo)
"y/z/foo"
user=> (name (symbol "x" "y/z/foo"))
"y/z/foo"

Ok, clojure.core/name calls clojure.lang.Named/getName, implemented by
clojure.lang.Symbol.

Conclusion: it looks like there is an inconsistency between the symbol
regex and matchSymbol processing, on the one hand, and Symbol.intern
on the other.  Symbol.intern is called by matchSymbol and analyzes the
passed string to set its name and namespace fields:

/* in clojure.lang.Symbol */
static public Symbol intern(String nsname){
    int i = nsname.indexOf('/');
    if(i == -1 || nsname.equals("/"))
        return new Symbol(null, nsname.intern());
    else
        return new Symbol(nsname.substring(0, i).intern(),
                          nsname.substring(i + 1).intern());
}
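
To make the mismatch concrete, here is a standalone sketch that puts the two splitting rules side by side. It does not call into clojure.lang. splitLikeIntern copies only the indexOf logic quoted above, splitLikeRegex reads the groups of the symbol pattern, and both names (and the class name) are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitComparison {
    static final Pattern SYMBOL_PAT =
            Pattern.compile("[:]?([\\D&&[^/]].*/)?(/|[\\D&&[^/]][^/]*)");

    // Same rule as the quoted intern(): split at the FIRST '/'.
    // Returns {namespace, name}; namespace is null when there is none.
    static String[] splitLikeIntern(String nsname) {
        int i = nsname.indexOf('/');
        if (i == -1 || nsname.equals("/")) {
            return new String[] {null, nsname};
        }
        return new String[] {nsname.substring(0, i), nsname.substring(i + 1)};
    }

    // Split according to the regex groups: greedy group 1 runs up to the
    // last '/' that still leaves a valid name for group 2.
    // Returns null if the whole input does not match.
    static String[] splitLikeRegex(String s) {
        Matcher m = SYMBOL_PAT.matcher(s);
        if (!m.matches()) {
            return null;
        }
        String ns = m.group(1);
        if (ns != null) {
            ns = ns.substring(0, ns.length() - 1); // drop the separating '/'
        }
        return new String[] {ns, m.group(2)};
    }

    public static void main(String[] args) {
        String s = "x/y/z/foo";
        System.out.println("intern-style: " + Arrays.toString(splitLikeIntern(s)));
        System.out.println("regex-style:  " + Arrays.toString(splitLikeRegex(s)));
    }
}
```

For "x/y/z/foo" the intern-style split yields namespace "x" and name "y/z/foo", while the regex groups yield namespace "x/y/z" and name "foo". The two agree on inputs with at most one '/'.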

I guess this isn't a big problem, since programming continues apace,
but it is confusing and sure looks like a bug from here.  By now I
would guess lots of code depends on (name foo) returning a string with
embedded '/'.  Or maybe not.

Thanks,

Gregg

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en