I believe the option Sean mentioned, writing your own custom consumer (one that omits the UIMA IDs that are causing your issues), should meet these needs.
Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
britt.fi...@wiredinformatics.com

On Oct 7, 2014, at 11:29 AM, Kim Ebert <kim.eb...@perfectsearchcorp.com> wrote:

> Hi Sean,
>
> Well, of course that makes plenty of sense. When testing different cTakes
> configurations, you would expect different output. In our testing we've
> found several cases where running with the same configuration outputs
> different data under different moons. Having consistent results helps us
> know whether or not we've improved our quality. Having output in a
> predictable order makes checking for differences much cheaper when you
> are dealing with larger data sets.
>
> Kim Ebert
> 1.801.669.7342
> Perfect Search Corp
> http://www.perfectsearchcorp.com/
>
> On 10/07/2014 08:50 AM, Finan, Sean wrote:
>> Hi Kim,
>>
>> One might want to compare the Sentence detector that uses end-of-line
>> characters as sentence splitters with one that does not. Such a change in
>> sentence splitting would affect not only the sentence type discoveries but
>> also practically every type that follows.
>>
>> Another might want to compare a note with "skin cancer" vs. one in which you
>> replace "skin cancer" with "melanoma" just to see what the CUI differences
>> might be. There are changes in two words vs. one, 11 characters vs. 8, a
>> removed adjective(?), and of course changes in CUIs.
>>
>> Of course, if you are just running notes on a new moon and then again on a
>> full moon ...
>>
>> Sean
>>
>> -----Original Message-----
>> From: Kim Ebert [mailto:kim.eb...@perfectsearchcorp.com]
>> Sent: Tuesday, October 07, 2014 10:41 AM
>> To: dev@ctakes.apache.org
>> Subject: Re: cTakes output predictability
>>
>> Sean,
>>
>> "...being different because of a possibly intentional difference."
>>
>> I would like you to elaborate a bit on what would be intentionally
>> different between the processing of the same document multiple times. It
>> would help my understanding of cTakes.
>>
>> Thanks,
>>
>> Kim Ebert
>> 1.801.669.7342
>> Perfect Search Corp
>> http://www.perfectsearchcorp.com/
>>
>> On 10/07/2014 07:30 AM, Finan, Sean wrote:
>>> Steve Bethard wrote:
>>>> I spent some time writing a script for diff-ing CASes
>>> I urge anyone interested in comparing cTakes CASes / output to use this
>>> type of approach. Comparison of program output is a post-process task, and
>>> unless absolutely necessary, code to juggle data and metadata belongs there.
>>> Attempts to force every module past, present and future to abide by fixed
>>> orderings, enumerations etc. are not as simple as one might initially
>>> think - especially if third-party libraries are involved. I won't get into
>>> problems associated with why one is comparing output (swapped module?) and
>>> IDs, orders etc. being different because of a possibly intentional
>>> difference.
>>>
>>> In addition to or instead of creating a post-processing script, one could
>>> write a new "cas-consumer" that writes output in a desired format - but
>>> this should not require changes to engines.
>>>
>>> "If it ain't broke, don't fix it"
>>>
>>> Sean
>>>
>>>
>>> -----Original Message-----
>>> From: Steven Bethard [mailto:steven.beth...@gmail.com]
>>> Sent: Monday, October 06, 2014 11:23 PM
>>> To: dev@ctakes.apache.org
>>> Subject: Re: cTakes output predictability
>>>
>>> On Mon, Oct 6, 2014 at 3:59 PM, Bruce Tietjen
>>> <bruce.tiet...@perfectsearchcorp.com> wrote:
>>>> Since I started working with cTakes some time ago, I have found it
>>>> difficult to compare the output between subsequent runs on the same
>>>> files because annotations are often assigned different IDs, are
>>>> listed in different order, etc.
>>> At one point, I spent some time writing a script for diff-ing CASes
>>> that was intended to address some of these kinds of issues. It's still
>>> here in cTAKES:
>>>
>>> ctakes-temporal/src/main/java/org/apache/ctakes/temporal/data/analysis/CompareFeatureStructures.java
>>>
>>> You might see if you could use or adapt that to your needs.
>>>
>>> Steve
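A minimal sketch of the post-processing comparison discussed in this thread, written in Python. It is not cTAKES code and does not use the UIMA API, and it is not the CompareFeatureStructures.java tool mentioned above. It assumes annotations have already been extracted from each run (for example from XMI output) into plain dicts with "type", "begin" and "end" keys; the ID field names in ID_FIELDS are hypothetical placeholders to adapt. Features that point at other annotations by ID would also need normalizing, which this sketch does not attempt.

```python
from collections import Counter

# Hypothetical names of ID fields that may change between runs on the same
# input; adjust these to match your own type system and output format.
ID_FIELDS = frozenset({"xmi:id", "id"})
POSITION_FIELDS = ("type", "begin", "end")


def _order_key(record):
    # Sort by offsets, then type, then the remaining features; repr() keeps
    # the ordering deterministic even when feature values have mixed types.
    begin, end, type_name, features = record
    return (begin, end, type_name, repr(features))


def normalize(annotations):
    """Return ID-free, hashable annotation records in a stable order.

    Each annotation is assumed to be a dict with at least "type", "begin"
    and "end" keys, whose other values are hashable.
    """
    records = []
    for ann in annotations:
        # Keep every non-ID, non-position feature, sorted by feature name.
        features = tuple(sorted(
            ((key, value) for key, value in ann.items()
             if key not in ID_FIELDS and key not in POSITION_FIELDS),
            key=lambda item: item[0],
        ))
        records.append((ann["begin"], ann["end"], ann["type"], features))
    return sorted(records, key=_order_key)


def diff_runs(run_a, run_b):
    """Return (only_in_a, only_in_b), ignoring IDs and output order.

    Counter keeps duplicates, so an annotation produced twice in one run
    and once in the other is still reported.
    """
    counts_a = Counter(normalize(run_a))
    counts_b = Counter(normalize(run_b))
    only_a = sorted((counts_a - counts_b).elements(), key=_order_key)
    only_b = sorted((counts_b - counts_a).elements(), key=_order_key)
    return only_a, only_b
```

Writing each normalized record on its own line would give the kind of predictable, ID-free output a custom consumer could produce, so an ordinary line diff stays cheap on large data sets. Comparing multisets rather than position-by-position keeps one inserted annotation from making every later line look different.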