Re: cTakes output predictability

Kim Ebert Tue, 07 Oct 2014 09:08:57 -0700

It concerns me a bit that making the code return consistent results would
be so concerning. This should be the default mode of operation.


Kim Ebert
1.801.669.7342
Perfect Search Corp
http://www.perfectsearchcorp.com/

On 10/07/2014 09:59 AM, britt fitch wrote:
> I think changing the code raises at least some concern about affecting
> others, while adding a custom consumer raises none. Given how easy it
> is to write a custom consumer, that is my vote.
>
> Britt Fitch
> Wired Informatics
> 265 Franklin St Ste 1702
> Boston, MA 02110
> http://wiredinformatics.com
> britt.fi...@wiredinformatics.com
>
> On Oct 7, 2014, at 11:56 AM, Kim Ebert
> <kim.eb...@perfectsearchcorp.com
> <mailto:kim.eb...@perfectsearchcorp.com>> wrote:
>
>> I think we really prefer the first method. Since there don't appear
>> to be any consequences to changing the code, we would really like to
>> move forward with this approach.
>>
>> Kim Ebert
>> 1.801.669.7342
>> Perfect Search Corp
>> http://www.perfectsearchcorp.com/
>>
>> On 10/07/2014 09:35 AM, britt fitch wrote:
>>> The option Sean mentioned of writing your own custom consumer (without
>>> the UIMA id that is causing your issues) should meet these needs, I
>>> believe.
>>>
>>> Britt Fitch
>>> Wired Informatics
>>> 265 Franklin St Ste 1702
>>> Boston, MA 02110
>>> http://wiredinformatics.com
>>> britt.fi...@wiredinformatics.com
>>>
>>> On Oct 7, 2014, at 11:29 AM, Kim Ebert
>>> <kim.eb...@perfectsearchcorp.com
>>> <mailto:kim.eb...@perfectsearchcorp.com>> wrote:
>>>
>>>> Hi Sean,
>>>>
>>>> Well, of course that makes plenty of sense. When testing different
>>>> cTakes configurations, you would expect different output. In our
>>>> testing we've found several cases where running with the same
>>>> configuration outputs different data under different moons. Having
>>>> consistent results helps us know whether we've made improvements to
>>>> our quality or not. Having output that is in a predictable order makes
>>>> checking for differences much cheaper when you are dealing with larger
>>>> data sets.
>>>>
>>>> Kim Ebert
>>>> 1.801.669.7342
>>>> Perfect Search Corp
>>>> http://www.perfectsearchcorp.com/
>>>>
>>>> On 10/07/2014 08:50 AM, Finan, Sean wrote:
>>>>> Hi Kim,
>>>>>
>>>>> One might want to compare the Sentence detector that uses end of line
>>>>> characters as sentence splitters with one that does not.  Such a
>>>>> change in sentence splitting would not only affect the sentence type
>>>>> discoveries but also practically every type that follows.
>>>>>
>>>>> Another might want to compare a note with "skin cancer" vs. one in
>>>>> which you replace "skin cancer" with "melanoma" just to see what the
>>>>> CUI differences might be.  There are changes in two words vs. one,
>>>>> 11 characters vs. 8, a removed adjective(?), and of course changes
>>>>> in CUIs.
>>>>>
>>>>> Of course, if you are just running notes on a new moon and then
>>>>> again on a full moon ...
>>>>>
>>>>> Sean
>>>>>
>>>>> -----Original Message-----
>>>>> From: Kim Ebert [mailto:kim.eb...@perfectsearchcorp.com]
>>>>> Sent: Tuesday, October 07, 2014 10:41 AM
>>>>> To: dev@ctakes.apache.org
>>>>> Subject: Re: cTakes output predictability
>>>>>
>>>>> Sean,
>>>>>
>>>>> "...being different because of a possibly intentional difference."
>>>>>
>>>>> I would like you to elaborate a bit on what would be
>>>>> intentionally different between the processing of the same document
>>>>> multiple times. It would help my understanding of cTakes.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Kim Ebert
>>>>> 1.801.669.7342
>>>>> Perfect Search Corp
>>>>> http://www.perfectsearchcorp.com/
>>>>>
>>>>> On 10/07/2014 07:30 AM, Finan, Sean wrote:
>>>>>> Steve Bethard wrote:
>>>>>>> I spent some time writing a script for diff-ing CASes
>>>>>> I urge anyone interested in comparing cTakes CASes / output to use
>>>>>> this type of approach.  Comparison of program output is a
>>>>>> post-process task, and unless absolutely necessary, code to juggle
>>>>>> data and metadata belongs there.  Forcing every module, past,
>>>>>> present and future, to abide by fixed orderings, enumerations,
>>>>>> etc. is not as simple a task as one might initially think,
>>>>>> especially if third-party libraries are involved.  I won't get into
>>>>>> problems associated with why one is comparing output (swapped
>>>>>> module?) and IDs, orders etc. being different because of a possibly
>>>>>> intentional difference.
>>>>>>
>>>>>> In addition to or instead of creating a post-processing script, one
>>>>>> could write a new "cas-consumer" that writes output in a desired
>>>>>> format - but this should not require changes to engines.
>>>>>>
>>>>>> "If it ain't broke, don't fix it"
>>>>>>
>>>>>> Sean
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Steven Bethard [mailto:steven.beth...@gmail.com]
>>>>>> Sent: Monday, October 06, 2014 11:23 PM
>>>>>> To: dev@ctakes.apache.org
>>>>>> Subject: Re: cTakes output predictability
>>>>>>
>>>>>> On Mon, Oct 6, 2014 at 3:59 PM, Bruce Tietjen
>>>>>> <bruce.tiet...@perfectsearchcorp.com> wrote:
>>>>>>> Since I started working with cTakes some time ago, I have found it
>>>>>>> difficult to compare the output between subsequent runs on the same
>>>>>>> files because annotations are often assigned different IDs, are
>>>>>>> listed in different order, etc.
>>>>>> At one point, I spent some time writing a script for diff-ing CASes
>>>>>> that was intended to address some of these kinds of issues. It's
>>>>>> still here in cTAKES:
>>>>>>
>>>>>> ctakes-temporal/src/main/java/org/apache/ctakes/temporal/data/analysis/CompareFeatureStructures.java
>>>>>>
>>>>>> You might see if you could use or adapt that to your needs.
>>>>>>
>>>>>> Steve
>
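
The post-processing approach discussed in this thread (compare runs without relying on annotation IDs or listing order) could look roughly like the sketch below. It is only an illustration under stated assumptions: `CanonicalOutput`, `Span`, and `canonicalize` are hypothetical names, the sample values are placeholders, and nothing here uses the cTAKES or UIMA APIs or reflects how CompareFeatureStructures.java works. The idea is to keep only the fields expected to be stable across runs, sort them by a fixed key, and compare the resulting lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch only: Span is a hypothetical stand-in for an annotation,
// not a cTAKES or UIMA type.
final class CanonicalOutput {

    /** Fields assumed to be stable across runs; run-specific ids are left out. */
    static final class Span {
        final String type;
        final int begin;
        final int end;
        final String code;

        Span(String type, int begin, int end, String code) {
            this.type = type;
            this.begin = begin;
            this.end = end;
            this.code = code;
        }
    }

    /** Returns one line per span, sorted by begin, end, type, then code. */
    static List<String> canonicalize(List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt((Span s) -> s.begin)
                .thenComparingInt(s -> s.end)
                .thenComparing(s -> s.type)
                .thenComparing(s -> s.code));
        List<String> lines = new ArrayList<>();
        for (Span s : sorted) {
            lines.add(s.begin + "\t" + s.end + "\t" + s.type + "\t" + s.code);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Placeholder data: the same two spans, listed in a different order per run.
        Span a = new Span("Disease", 10, 21, "CODE-1");
        Span b = new Span("Symptom", 40, 44, "CODE-2");
        List<String> run1 = canonicalize(Arrays.asList(a, b));
        List<String> run2 = canonicalize(Arrays.asList(b, a));
        System.out.println(run1.equals(run2) ? "identical" : "different");
    }
}
```

Because each run is reduced to sorted text lines, two runs can also be written to files and compared with an ordinary line-based diff tool, which keeps the engines themselves unchanged.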
