Re: [O] Orgmode → ODT: Certain chars break export

Tory S. Anderson Fri, 13 Feb 2015 07:19:23 -0800

From a user perspective just stripping the characters seems best to me, but
finding out which characters are illegal seems obnoxious. Neither a quick
search nor skimming the ODT doc specification[1][2] gives any insight into
the set of illegal characters. Does elisp have anything similar to Java's
"isWhitespace"[3] that could be used to check character features?
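
One possible starting point for the "set of illegal characters", assuming the
ODT content is held to plain XML 1.0 rules: the XML 1.0 "Char" production
allows only #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF, so
a form feed (#xC) falls outside it. The sketch below is in Python purely for
illustration (it is not ox-odt code, and the function names are made up); the
same ranges could presumably be checked from elisp.

```python
import re

# Complement of the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Assumption: the exported ODT XML is validated against these XML 1.0 rules.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text):
    """Return text with every character XML 1.0 forbids removed."""
    return _ILLEGAL_XML_CHARS.sub("", text)


def find_illegal_xml_chars(text):
    """Return (offset, code point) pairs for each forbidden character."""
    return [
        (match.start(), "U+%04X" % ord(match.group()))
        for match in _ILLEGAL_XML_CHARS.finditer(text)
    ]
```

Reporting the offsets before stripping may be friendlier than silently
dropping characters, since the user can then see what was in the source file.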


Rasmus <ras...@gmx.us> writes:

> torys.ander...@gmail.com (Tory S. Anderson) writes:
>
>> While we're on the topic of ODT export problems: I was in the process
>> of converting PDF to Text to Org to ODT/DocX and discovered that
>> certain characters seem to break exported odt documents, which fail
>> with a line and col number. So far the only one I know for sure is the
>> "^L" (Char: C-l (12, #o14, #xc)). Hopefully a single fix can handle
>> all such cases.
>>
>> You probably don't need it, but I verified with the following file:
>> http://toryanderson.com/files/breakorg.org
>
> The export is fine, but the produced XML is invalid since it contains an
> illegal character.  But how to resolve this?  Should ox strip illegal
> characters (if so what are they)?  If so, could they be used for entities?
>
> —Rasmus

Footnotes: 
[1]  https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office
[2]  http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#__RefHeading__1415196_253892949
[3]  http://www.fileformat.info/info/unicode/char/000c/index.htm
