Jason wrote:
> Hi all,
> I have come across what I think is a curious but insidious bug with
> the java lucene hit highlighter. [...]
> when I search for -> Acquisition Plan <-
> in my search results I get:
> <summary>(ancilliary stuff deleted)....
> attached to the <em>Acquisition</em>
> < em>Plan</em>and signed</summary>
>
> notice the space between the < and e in the second < em>
Sorry, Jason, I don't have a solution for you, but in case there's any question about whether "< em>" is well-formed XML/XHTML/HTML:

1. It is not well-formed XML (and thus cannot be well-formed XHTML) - from <http://www.w3.org/TR/xml/#sec-starttags>:

   [40] STag ::= '<' Name (S Attribute)* S? '>'
   [5]  Name ::= (Letter | '_' | ':') (NameChar)*

   ("Letter" & "NameChar" declarations omitted - suffice to say whitespace is excluded.)

2. AFAICT (IANASG), SGML (and hence the [pre-XHTML] HTML profiles of it) disallows space characters between the '<' and the element name (a.k.a. "generic identifier") - from <http://www.oasis-open.org/cover/sgmlsyn/sgmlsyn.htm#C7.4>:

   [14]  start-tag = ( stago , document type specification [28] , generic identifier specification [29] , attribute specification list [31] , s [5] *, tagc ) | minimized start-tag [15]
   [29]  generic identifier specification = generic identifier [30] | rank stem [120]
   [30]  generic identifier = name [55]
   [120] rank stem = name [55]
   [55]  name = name start character [53] , name character [52] *

   (Note 1: "name" & "name start character" declarations omitted - suffice to say whitespace is excluded.)

   (Note 2: "document type specification" declaration omitted, because all HTML profiles include the "CONCUR NO" option, thus excluding this syntax.)

   (Note 3: "minimized start-tag" declaration omitted, because although all HTML profiles include the "SHORTTAG YES" option, the element-minimization aspects of this option [as distinct from attribute minimization, e.g. omitted and unquoted attribute values] are not supported by mainstream browsers; in any case, whitespace is disallowed before the generic identifier in all of the minimized start-tag forms.)

3. Firefox 2.0.0.1 and IE 7.0 on WinXP both render "< em>...</em>" as the literal text "< em>..." - the (malformed) start tag is displayed as plain text rather than markup, and the close tag is not displayed.
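Point 1 can be checked mechanically with any conforming XML parser. The sketch below uses the JAXP DocumentBuilder that ships with the JDK (whichever implementation is configured) to parse a corrected and an uncorrected version of the highlighter output quoted above. The class name WellFormedCheck and the sample strings are illustrative only, and this tests XML well-formedness, not SGML/HTML parsing or browser behavior.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string parses as well-formed XML.
public class WellFormedCheck {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors but prints nothing,
            // so the default "[Fatal Error]" message on stderr is suppressed.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            // A space after '<' violates STag ::= '<' Name ..., which is a fatal well-formedness error.
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<summary>attached to the <em>Acquisition</em> <em>Plan</em> and signed</summary>";
        String bad  = "<summary>attached to the <em>Acquisition</em> < em>Plan</em>and signed</summary>";
        System.out.println("good: " + isWellFormed(good));
        System.out.println("bad:  " + isWellFormed(bad));
    }
}
```

Run as-is, this prints "good: true" followed by "bad:  false": the parser rejects "< em>" as soon as it sees whitespace where an element name must start.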
Steve