Awesome! Thanks a ton.

Greg--

On Sun, Mar 6, 2022 at 6:11 PM JOHN R CASKEY
<jrcas...@medicine.wisc.edu.invalid> wrote:

> Hi Greg,
>
>
>
> I created a class based on https://stackoverflow.com/a/93029 (see
> attached). The usage could be:
>
>
>
> df_text = df['TEXT_FIELD'].tolist()
>
> cleaned = [XMLcleaner(x).xmlstring for x in df_text]
>
> df['TEXT_FIELD'] = cleaned
>
>
>
> Best,
>
> John
>
>
>
> *From: *Greg Silverman <g...@umn.edu.INVALID>
> *Date: *Sunday, March 6, 2022 at 5:10 PM
> *To: *jrcas...@medicine.wisc.edu.invalid
> <jrcas...@medicine.wisc.edu.invalid>
> *Cc: *dev@ctakes.apache.org <dev@ctakes.apache.org>
> *Subject: *Re: Issue with serializable XML
>
> Hi John,
> I thought I did. I'm using a pandas dataframe and passing it through this:
> files['note_text'] = files['note_text'].apply(lambda x: x.replace('[^\x00-\x7F]',''))
> ... obviously it wasn't enough.
> Any suggestions?
>
> Thanks!
>
> Greg--
>
> On Sun, Mar 6, 2022 at 2:46 PM JOHN R CASKEY
> <jrcas...@medicine.wisc.edu.invalid> wrote:
>
> > I’ve encountered that when the input text file has control characters,
> > for example ^M
> >
> > The fix I used was to remove all control characters from the input text
> > files ahead of time via python.
> >
> > Best,
> > John Caskey
> > UW-Madison
> > jrcas...@wisc.edu
> > ________________________________
> > From: Greg Silverman <g...@umn.edu.INVALID>
> > Sent: Sunday, March 6, 2022 12:40:00 PM
> > To: dev@ctakes.apache.org <dev@ctakes.apache.org>
> > Subject: Issue with serializable XML
> >
> > Got the error during processing of a large set of documents about mid-way
> > through:
> > org.xml.sax.SAXParseException: Trying to serialize non-XML 1.0 character: , 0x1c
> >
> > I encountered this once before, but I don't remember what the fix was.
> > Running apache-ctakes-4.0.1-SNAPSHOT.
> >
> > Thanks!
> >
> > Greg--
> >
> > --
> > Greg M. Silverman
> > Senior Systems Developer
> > NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
> > Department of Surgery
> > University of Minnesota
> > g...@umn.edu
> >
>
>
> --
> Greg M. Silverman
> Senior Systems Developer
> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
> Department of Surgery
> University of Minnesota
> g...@umn.edu
>
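
For reference, the cleanup discussed in the quoted thread (removing characters that XML 1.0 cannot serialize before the text reaches cTAKES) might look roughly like the sketch below. It is not John's attached XMLcleaner class, which is not reproduced here; the function name strip_invalid_xml_chars and the sample note are made up for illustration. Two things seem worth noting about the earlier attempt: Python's str.replace does a literal substring replacement, so a character-class pattern passed to it is not treated as a regex, and 0x1c falls inside \x00-\x7F, so an ASCII-only filter would keep it anyway. The sketch instead removes everything outside the XML 1.0 "Char" range, which keeps tab, LF, and CR.

```python
import re

# Matches any character outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Return text with characters that XML 1.0 cannot represent removed.

    Hypothetical helper for illustration; not part of cTAKES or pandas.
    """
    return _INVALID_XML_CHARS.sub("", text)


# Made-up sample containing the 0x1c character from the error message.
note = "BP 120/80\x1c stable\r\n"
print(repr(strip_invalid_xml_chars(note)))  # 'BP 120/80 stable\r\n'
```

With a dataframe like the one in the thread, this could presumably be applied as files['note_text'] = files['note_text'].apply(strip_invalid_xml_chars), assuming every value in that column is a string.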


-- 
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
g...@umn.edu
