I’ve encountered that error when the input text file contains control
characters, for example ^M.

The fix I used was to remove all control characters from the input text files 
ahead of time with Python.
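
For reference, a minimal sketch of that kind of cleanup might look like the following. It is not the exact script I ran: the function names are made up for illustration, and it assumes the input files are UTF-8. It strips only the characters that XML 1.0 disallows, so 0x1c is removed, while tab, line feed and carriage return (^M) are left alone because XML 1.0 permits them.

```python
import re

# Characters outside the XML 1.0 "Char" production: control characters
# below 0x20 other than tab (0x09), LF (0x0A) and CR (0x0D), plus
# surrogates and the non-characters 0xFFFE / 0xFFFF.
INVALID_XML_CHARS = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]"
)


def strip_invalid_xml_chars(text, replacement=""):
    """Return text with characters that XML 1.0 cannot serialize replaced."""
    return INVALID_XML_CHARS.sub(replacement, text)


def clean_file(src_path, dst_path, replacement=""):
    """Hypothetical helper: write a cleaned copy of one input document.

    Assumes UTF-8 input; undecodable bytes become U+FFFD instead of
    raising. newline="" keeps the original line endings untouched.
    """
    with open(src_path, encoding="utf-8", errors="replace", newline="") as f:
        text = f.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as f:
        f.write(strip_invalid_xml_chars(text, replacement))
```

Passing replacement=" " instead of the default keeps the character offsets of the original text unchanged, which may matter if annotations need to be mapped back to the source documents.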

Best,
John Caskey
UW-Madison
jrcas...@wisc.edu
________________________________
From: Greg Silverman <g...@umn.edu.INVALID>
Sent: Sunday, March 6, 2022 12:40:00 PM
To: dev@ctakes.apache.org <dev@ctakes.apache.org>
Subject: Issue with serializable XML

I got the following error about midway through processing a large set of
documents:
org.xml.sax.SAXParseException: Trying to serialize non-XML 1.0 character: ,
0x1c

I encountered this once before, but I don't remember what the fix was.
Running apache-ctakes-4.0.1-SNAPSHOT.

Thanks!

Greg--

--
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
g...@umn.edu
