I believe the option Sean mentioned, writing your own custom consumer (one 
that omits the UIMA id that is causing your issues), should meet these needs. 
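
Here is a minimal sketch of the same idea done as a post-processing script 
rather than a consumer, assuming the output is written as XMI. It drops 
xmi:id and sofa, keeps begin, end, the type name and any features you name, 
and sorts the result so two runs can be diffed line by line. The function 
name canonical_lines, the "ex" namespace and the two sample documents are 
made up for illustration; none of this is cTAKES or UIMA API.

```python
import xml.etree.ElementTree as ET

def canonical_lines(xmi_text, keep=()):
    """Return one 'begin end Type [feature=value ...]' line per annotation,
    sorted by span and type, with run-specific ids left out."""
    root = ET.fromstring(xmi_text)
    rows = []
    for elem in root:
        attrs = elem.attrib
        # Assumption: anything without begin/end (Sofa, views, ...) is not
        # an annotation and is skipped.
        if "begin" not in attrs or "end" not in attrs:
            continue
        # Strip the '{namespace}' prefix ElementTree puts on tag names.
        type_name = elem.tag.rsplit("}", 1)[-1]
        # Only features listed in `keep` are emitted, so xmi:id and sofa
        # never reach the output.
        feats = tuple((name, attrs[name]) for name in sorted(keep) if name in attrs)
        rows.append((int(attrs["begin"]), int(attrs["end"]), type_name, feats))
    rows.sort()
    return [
        " ".join([str(b), str(e), t] + [f"{k}={v}" for k, v in feats])
        for b, e, t, feats in rows
    ]

# Two hypothetical runs over the same note: same annotations, but with
# different ids and a different element order.
RUN_A = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:ex="http:///example.ecore">
  <ex:Sentence xmi:id="13" sofa="1" begin="0" end="11"/>
  <ex:Token xmi:id="21" sofa="1" begin="0" end="4" partOfSpeech="NN"/>
</xmi:XMI>"""

RUN_B = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:ex="http:///example.ecore">
  <ex:Token xmi:id="8" sofa="1" begin="0" end="4" partOfSpeech="NN"/>
  <ex:Sentence xmi:id="40" sofa="1" begin="0" end="11"/>
</xmi:XMI>"""

if __name__ == "__main__":
    for line in canonical_lines(RUN_A, keep=("partOfSpeech",)):
        print(line)
```

Features whose values are references to other feature structures (lists of 
ids) would still differ between runs; they would have to be resolved to the 
referenced content before being added to keep.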

                         
Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
britt.fi...@wiredinformatics.com

On Oct 7, 2014, at 11:29 AM, Kim Ebert <kim.eb...@perfectsearchcorp.com> wrote:

> Hi Sean,
> 
> Well, of course that makes plenty of sense. When testing different cTakes
> configurations, you would expect different output. In our testing, though,
> we've found several cases where running with the same configuration outputs
> different data under different moons. Having consistent results helps us
> know whether we've made improvements to our quality or not. Having output
> that is in a predictable order makes checking for differences much cheaper
> when you are dealing with larger data sets.
> 
> Kim Ebert
> 1.801.669.7342
> Perfect Search Corp
> http://www.perfectsearchcorp.com/
> 
> On 10/07/2014 08:50 AM, Finan, Sean wrote:
>> Hi Kim,
>> 
>> One might want to compare the Sentence detector that uses end of line 
>> characters as sentence splitters with one that does not.  Such a change in 
>> sentence splitting would not only affect the sentence type discoveries but 
>> also practically every type that follows.
>> 
>> Another might want to compare a note with "skin cancer" vs. one in which you 
>> replace "skin cancer" with "melanoma" just to see what the CUI differences 
>> might be.  There are changes in two words vs. one, 11 characters vs. 8, a 
>> removed adjective(?), and of course changes in CUIs.
>> 
>> Of course, if you are just running notes on a new moon and then again on a 
>> full moon ...
>> 
>> Sean
>> 
>> -----Original Message-----
>> From: Kim Ebert [mailto:kim.eb...@perfectsearchcorp.com] 
>> Sent: Tuesday, October 07, 2014 10:41 AM
>> To: dev@ctakes.apache.org
>> Subject: Re: cTakes output predictability
>> 
>> Sean,
>> 
>> "...being different because of a possibly intentional difference."
>> 
>> I would like you to elaborate a bit on what would be intentionally 
>> different between the processing of the same document multiple times. It 
>> would help my understanding of cTakes.
>> 
>> Thanks,
>> 
>> Kim Ebert
>> 1.801.669.7342
>> Perfect Search Corp
>> http://www.perfectsearchcorp.com/
>> 
>> On 10/07/2014 07:30 AM, Finan, Sean wrote:
>>> Steve Bethard wrote:
>>>> I spent some time writing a script for diff-ing CASes
>>> I urge anyone interested in comparing cTakes CASes / output to use this 
>>> type of approach.  Comparison of program output is a post-process task, and 
>>> unless absolutely necessary, code to juggle data and metadata belongs there. 
>>>  Attempting to force every module past, present, and future to abide by fixed 
>>> orderings, enumerations, etc. is not as simple a task as one might initially 
>>> think - especially if third-party libraries are involved.  I won't get into 
>>> problems associated with why one is comparing output (swapped module?) and 
>>> IDs, orders, etc. being different because of a possibly intentional 
>>> difference.
>>> 
>>> In addition to or instead of creating a post-processing script, one could 
>>> write a new "cas-consumer" that writes output in a desired format - but 
>>> this should not require changes to engines.
>>> 
>>> "If it ain't broke, don't fix it"
>>> 
>>> Sean
>>> 
>>> 
>>> -----Original Message-----
>>> From: Steven Bethard [mailto:steven.beth...@gmail.com]
>>> Sent: Monday, October 06, 2014 11:23 PM
>>> To: dev@ctakes.apache.org
>>> Subject: Re: cTakes output predictability
>>> 
>>> On Mon, Oct 6, 2014 at 3:59 PM, Bruce Tietjen 
>>> <bruce.tiet...@perfectsearchcorp.com> wrote:
>>>> Since I started working with cTakes some time ago, I have found it 
>>>> difficult to compare the output between subsequent runs on the same 
>>>> files because annotations are often assigned different IDs, are 
>>>> listed in different order, etc.
>>> At one point, I spent some time writing a script for diff-ing CASes 
>>> that was intended to address some of these kinds of issues. It's still 
>>> here in cTAKES:
>>> 
>>> ctakes-temporal/src/main/java/org/apache/ctakes/temporal/data/analysis/CompareFeatureStructures.java
>>> 
>>> You might see if you could use or adapt that to your needs.
>>> 
>>> Steve
> 
