Awesome! Thanks a ton.

Greg--

On Sun, Mar 6, 2022 at 6:11 PM JOHN R CASKEY <jrcas...@medicine.wisc.edu.invalid> wrote:

> Hi Greg,
>
> I created a class based on https://stackoverflow.com/a/93029 (see
> attached). The usage could be:
>
> df_text = df['TEXT_FIELD'].tolist()
> cleaned = [XMLcleaner(x).xmlstring for x in df_text]
> df['TEXT_FIELD'] = cleaned
>
> Best,
> John
>
> From: Greg Silverman <g...@umn.edu.INVALID>
> Date: Sunday, March 6, 2022 at 5:10 PM
> To: jrcas...@medicine.wisc.edu.invalid <jrcas...@medicine.wisc.edu.invalid>
> Cc: dev@ctakes.apache.org <dev@ctakes.apache.org>
> Subject: Re: Issue with serializable XML
>
> Hi John,
> I thought I did. I'm using a pandas dataframe and passing it through this:
> files['note_text'] = files['note_text'].apply(lambda x: x.replace('[^\x00-\x7F]',''))
> ... obviously it wasn't enough.
> Any suggestions?
>
> Thanks!
>
> Greg--
>
> On Sun, Mar 6, 2022 at 2:46 PM JOHN R CASKEY
> <jrcas...@medicine.wisc.edu.invalid> wrote:
>
> > I’ve encountered that when the input text file has control characters,
> > for example ^M
> >
> > The fix I used was to remove all control characters from the input text
> > files ahead of time via python.
> >
> > Best,
> > John Caskey
> > UW-Madison
> > jrcas...@wisc.edu
> > ________________________________
> > From: Greg Silverman <g...@umn.edu.INVALID>
> > Sent: Sunday, March 6, 2022 12:40:00 PM
> > To: dev@ctakes.apache.org <dev@ctakes.apache.org>
> > Subject: Issue with serializable XML
> >
> > Got the error during processing of a large set of documents about mid-way
> > through:
> > org.xml.sax.SAXParseException: Trying to serialize non-XML 1.0 character: , 0x1c
> >
> > I encountered this once before, but I don't remember what the fix was.
> > Running apache-ctakes-4.0.1-SNAPSHOT.
> >
> > Thanks!
> >
> > Greg--
> >
> > --
> > Greg M. Silverman
> > Senior Systems Developer
> > NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
> > Department of Surgery
> > University of Minnesota
> > g...@umn.edu
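
For anyone who lands on this thread with the same SAXParseException, the fix discussed above (removing characters that XML 1.0 cannot represent before the notes reach cTAKES) can be sketched roughly as follows. This is not the XMLcleaner class from the attachment, which is not reproduced in the thread; the function name `strip_invalid_xml_chars` is made up for illustration, and the character ranges are taken from the "Char" production of the XML 1.0 specification. One likely reason the earlier `x.replace('[^\x00-\x7F]','')` attempt had no effect is that Python's `str.replace` treats its first argument as a literal substring rather than a regular expression; and even as a regex, that pattern would keep 0x1c, since it is an ASCII character.

```python
import re

# Matches any character NOT allowed by the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Return text with every character that XML 1.0 forbids removed.

    Hypothetical helper for illustration; tab, LF and CR are kept because
    XML 1.0 allows them, while other control characters such as 0x1c are
    dropped.
    """
    return _INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    # 0x1c is the character reported in the exception above.
    print(repr(strip_invalid_xml_chars("note\x1c text")))  # 'note text'
```

With a pandas column like the one in the thread, and assuming every value is a string, this could be applied as `files['note_text'] = files['note_text'].map(strip_invalid_xml_chars)`. Note that carriage returns (^M) are valid XML 1.0 and are left in place by this sketch; they would need to be removed separately if they cause trouble downstream.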