Hello David,

Prior to DM’s fix, “&x20;” would be interpreted as “ ” even though it is invalid syntax. Also, “&#x20;” would incorrectly be treated as an error.

After DM’s fix, “&x20;” is properly treated as an error, with the ampersand taken as a literal ampersand in the text, and “&#x20;” is properly treated as “ ”. You can compare this behavior to that of a web browser, which uses similar technology and behaves the same way, by checking this jsfiddle: https://jsfiddle.net/binki/mwnLv49f/ .
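For illustration, here is a minimal C++ sketch of the distinction described above. It is not the osis2mod code; the names parseNumericEntity and isXmlChar are hypothetical. It accepts only the XML forms &#DDD; and &#xHHH; (leading zeros allowed), rejects &x20;, and rejects code points that XML does not allow as characters.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper: true if cp is allowed by the XML 1.0 Char production.
bool isXmlChar(char32_t cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Hypothetical helper: parses a complete numeric character reference such as
// "&#32;" or "&#x20;". Returns std::nullopt for anything else, including
// "&x20;" (missing '#'). Leading zeros are accepted ("&#x0020;").
std::optional<char32_t> parseNumericEntity(std::string_view s) {
    if (s.size() < 4 || s.substr(0, 2) != "&#" || s.back() != ';')
        return std::nullopt;
    std::string_view digits = s.substr(2, s.size() - 3);  // between "&#" and ";"
    unsigned base = 10;
    if (!digits.empty() && digits.front() == 'x') {  // XML allows lowercase 'x' only
        base = 16;
        digits.remove_prefix(1);
    }
    if (digits.empty()) return std::nullopt;
    std::uint32_t value = 0;
    for (char c : digits) {
        unsigned d;
        if (c >= '0' && c <= '9') d = c - '0';
        else if (base == 16 && c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (base == 16 && c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return std::nullopt;                    // not a digit in this base
        value = value * base + d;
        if (value > 0x10FFFF) return std::nullopt;  // beyond Unicode; also prevents overflow
    }
    if (!isXmlChar(value)) return std::nullopt;     // e.g. surrogates or NUL
    return static_cast<char32_t>(value);
}
```

With this sketch, parseNumericEntity("&#x20;") and parseNumericEntity("&#32;") both yield U+0020, while parseNumericEntity("&x20;") yields no value, so a caller would leave that text, ampersand included, as literal text.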

Sorry, I am just reading the Jira issue and the code; I don’t have a build environment set up, so I can’t actually test it. But it looks to me like DM’s changes do indeed fix something here.

Thanks.


On 8/7/2025 12:24 AM, David Haslam wrote:
Hi DM,

I’m puzzled.

You seem to have thought there was a bug where there actually wasn’t one.

Please refer to https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references

The # was not a bug!

Regards

David



On Wed, Aug 6, 2025 at 22:20, DM Smith <dmsm...@crosswire.org> wrote:
I’ve just checked in a change for osis2mod.

MODTOOLS-17 To osis2mod, added conversion of hex and decimal numeric entities to UTF-8, with special handling of <, >, &, ', and ".

Also:
* Fixed a bug in hex numeric entities which recognized &xHHHH; rather than &#xHHHH;.
* Added entity sanity check of maximum length of 32.
* Refactored entity handling into handleEntities and comment handling into handleComments.
* Changed t_entitytype and t_commentstate into the enum classes EntityType and CommentState.
* Added -d 1024 for entity and comment parsing.
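The length sanity check could look something like the following C++ sketch. The helper name entityLength is hypothetical, and two details are assumptions rather than taken from the commit: that the limit of 32 includes the '&' and ';', and that a new '&' or '<' ends a candidate entity early.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Hypothetical constant mirroring the "maximum length of 32" above.
// Assumption: the limit counts both the leading '&' and the trailing ';'.
constexpr std::size_t kMaxEntityLength = 32;

// Given the index of an '&' in text, returns the length of the candidate
// entity "&...;" (delimiters included), or 0 if no ';' appears within
// kMaxEntityLength characters. A result of 0 means the '&' should be kept
// as a literal ampersand rather than parsed as an entity.
std::size_t entityLength(std::string_view text, std::size_t amp) {
    if (amp >= text.size() || text[amp] != '&') return 0;
    std::size_t limit = std::min(text.size(), amp + kMaxEntityLength);
    for (std::size_t i = amp + 1; i < limit; ++i) {
        if (text[i] == ';') return i - amp + 1;
        // Assumption: the start of new markup ends the candidate.
        if (text[i] == '&' || text[i] == '<') return 0;
    }
    return 0;
}
```

Bounding the scan this way keeps a stray '&' in running text (for example "AT&T") from swallowing the rest of the line while the parser searches for a ';'.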

Note: The code allows for zero-padding of the numeric entities.
Note: The following 5 need to be treated specially:
&#38; or &#x26; → &amp;
&#60; or &#x3C; → &lt;
&#62; or &#x3E; → &gt;
&#34; or &#x22; → &quot; or "
&#39; or &#x27; → &apos; or '
When converted to these forms, &quot; should be transformed into " except in attribute values delimited by ", and likewise &apos; into ' except in attribute values delimited by '.
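A minimal C++ sketch of this final step, turning an already decoded code point into output text: the five special characters become entity references (with " and ' left literal unless they would end the enclosing attribute value), and everything else is encoded as UTF-8. The names appendUtf8 and replacementFor are hypothetical, not the actual osis2mod functions, and the code assumes the code point was already validated.

```cpp
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of cp to out.
// Assumes cp is a valid Unicode scalar value (<= U+10FFFF, not a surrogate).
void appendUtf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: text that replaces a decoded numeric entity.
// attrDelim is the quote delimiting the enclosing attribute value
// ('"' or '\''), or '\0' when the entity appears in element content.
std::string replacementFor(char32_t cp, char attrDelim = '\0') {
    switch (cp) {
        case U'&':  return "&amp;";
        case U'<':  return "&lt;";
        case U'>':  return "&gt;";
        case U'"':  return attrDelim == '"'  ? "&quot;" : "\"";
        case U'\'': return attrDelim == '\'' ? "&apos;" : "'";
        default: {
            std::string out;
            appendUtf8(out, cp);
            return out;
        }
    }
}
```

For example, &#1488; (Hebrew alef, U+05D0) would become the two bytes D7 90, while &#34; becomes a literal " in element content but stays &quot; inside an attribute value delimited by ".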

I need to update the wiki to match.

In Him,
DM Smith

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
