Hi,

it seems the document does not make it through the list for some reason. Can you report an issue at https://bz.apache.org/bugzilla/ and attach it there? That way we also have a better record of the work on the problem.
Dominik.

On Mon, Oct 7, 2019 at 6:33 AM Teresa Kim <teresa....@linguamatics.com.invalid> wrote:

> Hi Dominik,
>
> Sure, I attached the symbol_test.doc document in the previous email.
> Maybe I cannot attach documents to emails?
> Is there any way I can share the document?
>
> Thanks
>
> T.
>
> On 06/10/2019 16:29, Dominik Stadler wrote:
> > Hi,
> >
> > can you share an example document which shows the behavior?
> >
> > Thanks... Dominik.
> >
> > On Sun, Oct 6, 2019 at 6:48 AM Teresa Kim
> > <teresa....@linguamatics.com.invalid> wrote:
> >
> >> Hi
> >>
> >> I have documents (either .doc or .docx) that contain a special character
> >> for 'greater than or equal to' (≥). When I convert them with
> >> WordToHtmlConverter, those characters come out as '('.
> >>
> >> I tried with the latest Apache POI release, 4.1.0.
> >>
> >> My Java code is:
> >>
> >> import java.io.ByteArrayOutputStream;
> >> import java.io.FileInputStream;
> >> import java.io.InputStream;
> >> import java.nio.charset.StandardCharsets;
> >>
> >> import javax.xml.parsers.DocumentBuilderFactory;
> >> import javax.xml.transform.OutputKeys;
> >> import javax.xml.transform.Transformer;
> >> import javax.xml.transform.TransformerFactory;
> >> import javax.xml.transform.dom.DOMSource;
> >> import javax.xml.transform.stream.StreamResult;
> >>
> >> import org.apache.poi.hwpf.HWPFDocumentCore;
> >> import org.apache.poi.hwpf.converter.WordToHtmlConverter;
> >> import org.apache.poi.hwpf.converter.WordToHtmlUtils;
> >> import org.w3c.dom.Document;
> >>
> >> public class TestWordtoHtmlConverter {
> >>
> >>     public static void main(String[] args) {
> >>         // Load the Word document given on the command line
> >>         try (InputStream in = new FileInputStream(args[0])) {
> >>             HWPFDocumentCore wordDocument = WordToHtmlUtils.loadDoc(in);
> >>
> >>             // Convert the Word document into an HTML DOM
> >>             WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
> >>                     DocumentBuilderFactory.newInstance().newDocumentBuilder()
> >>                             .newDocument());
> >>             wordToHtmlConverter.processDocument(wordDocument);
> >>             Document htmlDocument = wordToHtmlConverter.getDocument();
> >>
> >>             // Serialize the DOM as UTF-8 HTML
> >>             ByteArrayOutputStream out = new ByteArrayOutputStream();
> >>             DOMSource domSource = new DOMSource(htmlDocument);
> >>             StreamResult streamResult = new StreamResult(out);
> >>
> >>             TransformerFactory tf = TransformerFactory.newInstance();
> >>             Transformer serializer = tf.newTransformer();
> >>             serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
> >>             serializer.setOutputProperty(OutputKeys.INDENT, "yes");
> >>             serializer.setOutputProperty(OutputKeys.METHOD, "html");
> >>             serializer.transform(domSource, streamResult);
> >>
> >>             // Decode with the same charset the serializer wrote
> >>             String result = new String(out.toByteArray(), StandardCharsets.UTF_8);
> >>             System.out.println(result);
> >>         } catch (Exception e) {
> >>             // Report failures instead of swallowing them silently
> >>             e.printStackTrace();
> >>         }
> >>     }
> >> }
> >>
> >> Is there any way I can correctly identify these symbols?
> >>
> >> In the sample document, I am interested in getting 'bad one'.
> >>
> >> Thanks
> >>
> >> T.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: user-unsubscr...@poi.apache.org
> >> For additional commands, e-mail: user-h...@poi.apache.org