Hi John,

If you really need to know the boundaries of character references, you
should enable the 'notify-char-refs' [1] feature. Note that this only
applies to the content of elements (i.e. not attribute values).
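For illustration, here is a minimal SAX sketch that has not been tested against a specific Xerces release. It turns the feature on through JAXP and registers a LexicalHandler. It assumes that Xerces reports each character reference through the startEntity/endEntity callbacks, and that the Apache Xerces2-J parser is the one JAXP picks up. The copy of Xerces bundled inside the JDK may reject the feature or not fire the callbacks. The class and field names (CharRefDemo, Recorder, events) are hypothetical. I believe the name reported for &#65; is "#65", but the code simply records whatever name it is given.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class CharRefDemo {

    // Xerces2-J scanner feature (see [1]) and the standard SAX2 lexical-handler property.
    static final String NOTIFY_CHAR_REFS =
            "http://apache.org/xml/features/scanner/notify-char-refs";
    static final String LEXICAL_HANDLER =
            "http://xml.org/sax/properties/lexical-handler";

    /** Hypothetical handler: tags each run of element content with the entity it came from. */
    public static class Recorder extends DefaultHandler2 {
        public final List<String> events = new ArrayList<>();
        // Names passed to startEntity; push/peek work on the head, so the innermost is first.
        private final Deque<String> open = new ArrayDeque<>();

        @Override
        public void startEntity(String name) {
            open.push(name);
        }

        @Override
        public void endEntity(String name) {
            open.poll(); // poll() rather than pop() so an unbalanced end can't throw
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            String text = new String(ch, start, length);
            String ref = open.peek(); // null when not inside any reported entity
            events.add(ref == null ? "text:" + text : "ref(" + ref + "):" + text);
        }
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Throws SAXNotRecognizedException if the parser JAXP found doesn't know the feature.
        factory.setFeature(NOTIFY_CHAR_REFS, true);
        SAXParser parser = factory.newSAXParser();

        Recorder recorder = new Recorder();
        // Entity boundaries go to the LexicalHandler, element content to the ContentHandler.
        parser.setProperty(LEXICAL_HANDLER, recorder);
        parser.parse(new InputSource(new StringReader("<doc>A&#65;</doc>")), recorder);

        recorder.events.forEach(System.out::println);
    }
}
```

If the feature behaves as described, the literal A and the referenced A in <doc>A&#65;</doc> should end up in separate, differently tagged events. With the feature off, both simply arrive as characters. SAX may split character data into arbitrary runs, so only the entity callbacks, not the run boundaries, should be treated as meaningful.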

Thanks.

[1] http://xerces.apache.org/xerces2-j/features.html#scanner.notify-char-refs

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]

John Byrne <[EMAIL PROTECTED]> wrote on 04/22/2008 04:58:15 PM:

> "The distinction is syntactic, not semantic. Nothing that's looking at
> the semantic content of XML documents should care about it... and
> nothing should be looking at the purely syntactic details of XML except
> the parser. "
>
> True. And from that point of view, what I am working on is, in fact, a
> kind of parser, albeit a very specialized one. One of the things my
> parser needs to do is detect the presence of these character references.
> I need to distinguish between &#65; and a literal A character. Now I
> could write the code to do this myself, but since Xerces must already
> have a way of doing it, I thought I'd use that instead.
>
> I imagine that there is a callback method somewhere in the XNI API that
> handles the translation of these references into their "normative"
> representation.
>
> As regards the correctness of my design, all I can say is that I have
> given it quite a lot of thought, and I'm confident that my solution is
> the best option available to me. Unfortunately I'm not in a position to
> go into a lot of detail. While I do appreciate any and all advice, be it
> theoretical or otherwise, what I really need is a practical solution!
>
>
> [EMAIL PROTECTED] wrote:
> >
> > > &amp; might be treated as being the same as &#38;, but these are both
> > > distinct from ordinary text
> >
> > As far as XML is concerned, neither is "distinct from ordinary text"
> > -- they're just representations of the & character.
> >
> > For comparison, consider &#65;. XML doesn't distinguish between this
> > and a simple capital-A character.
> >
> > The distinction is syntactic, not semantic. Nothing that's looking at
> > the semantic content of XML documents should care about it... and
> > nothing should be looking at the purely syntactic details of XML
> > except the parser.
> >
> > ______________________________________
> > "... Three things see no end: A loop with exit code done wrong,
> > A semaphore untested, And the change that comes along. ..."
> >  -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish
> > (http://www.ovff.org/pegasus/songs/threes-rev-11.html)
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

