Re: Permutations
Hi Kim, Thanks for pointing that out. https://issues.apache.org/jira/browse/CTAKES-310 has been opened for this. If you commit the changes, we can see if we can include in the 3.2.1 patch release. I was looking at the changelist for this file, and it may look like some of these optimizations may have been intentional by Sean so he may have some more insight in this bit of the logic. On Thu, Sep 4, 2014 at 6:22 PM, Kim Ebert wrote: > Hi All, > > I was reviewing the use of permutations, and I noticed that we sorted > the permutation list before creating the string to do the concept lookup > with. It also appears that we were sorting the object that was stored in > the parent list. > > I've made a few changes, and now it appears I can discover some > additional concepts based upon the permutations. > > Let me know what you think of the following changes. > > Thanks, > > Kim > > === modified file > 'ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/algorithms/FirstTokenPermutationImpl.java' > --- > ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/algorithms/FirstTokenPermutationImpl.java > 2014-07-31 22:00:48 + > +++ > ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/algorithms/FirstTokenPermutationImpl.java > 2014-09-04 18:39:59 + > @@ -210,11 +210,12 @@ >final List> permutationList = iv_permCacheMap.get( > permutationIndex ); >for ( List permutations : permutationList ) { > // Moved sort and offset calculation from inner (per > MetaDataHit) iteration 2-21-2013 spf > - Collections.sort( permutations ); > + List permutationsSorted = (List) > ((ArrayList)permutations).clone(); > + Collections.sort( permutationsSorted ); > int startOffset = firstWordStartOffset; > int endOffset = firstWordEndOffset; > - if ( !permutations.isEmpty() ) { > -int firstIdx = permutations.get( 0 ); > + if ( !permutationsSorted.isEmpty() ) { > +int firstIdx = permutationsSorted.get( 0 ); > if ( firstIdx <= firstTokenIndex ) { > firstIdx--; > } > @@ -222,7 +223,7 @@ > if ( firstToken.getStartOffset() < firstWordStartOffset ) { > startOffset = firstToken.getStartOffset(); > } > -int lastIdx = permutations.get( permutations.size() - 1 ); > +int lastIdx = permutationsSorted.get( > permutationsSorted.size() - 1 ); > if ( lastIdx <= firstTokenIndex ) { > lastIdx--; > } > > > -- > Kim Ebert > 1.801.669.7342 > Perfect Search Corp > http://www.perfectsearchcorp.com/ >
Re: Permutations
Hi Pei and Sean, Sean, any thoughts about this would be helpful. We also had issues in cTAKES 2.5. Here is the patch for 2.5. Before I got the patch to 3.0 Sean made his changes. === modified file 'src/edu/mayo/bmi/lookup/algorithms/FirstTokenPermutationImpl.java' --- src/edu/mayo/bmi/lookup/algorithms/FirstTokenPermutationImpl.java 2012-11-28 01:56:50 + +++ src/edu/mayo/bmi/lookup/algorithms/FirstTokenPermutationImpl.java 2013-02-06 16:39:37 + @@ -294,14 +294,16 @@ Iterator mdhIterator = mdhSet.iterator(); while (mdhIterator.hasNext()) { -MetaDataHit mdh = (MetaDataHit) mdhIterator.next(); +MetaDataHit mdh = (MetaDataHit) mdhIterator.next(); + +List permutationSorted = (List) ((ArrayList)permutation).clone(); // figure out start and end offsets -Collections.sort(permutation); +Collections.sort(permutationSorted); int startOffset; -if (permutation.size() > 0) +if (permutationSorted.size() > 0) { -int firstIdx = ((Integer) permutation.get(0)).intValue(); +int firstIdx = ((Integer) permutationSorted.get(0)).intValue(); if (firstIdx <= firstTokenIndex.intValue()) { firstIdx--; @@ -322,9 +324,9 @@ } int endOffset; -if (permutation.size() > 0) +if (permutationSorted.size() > 0) { -int lastIdx = ((Integer) permutation.get(permutation.size() - 1)).intValue(); +int lastIdx = ((Integer) permutationSorted.get(permutationSorted.size() - 1)).intValue(); if (lastIdx <= firstTokenIndex.intValue()) { lastIdx--; Kim Ebert 1.801.669.7342 Perfect Search Corp http://www.perfectsearchcorp.com/ On 09/05/2014 10:17 AM, Pei Chen wrote: > Hi Kim, > Thanks for pointing that out. > https://issues.apache.org/jira/browse/CTAKES-310 has been opened for > this. > If you commit the changes, we can see if we can include in the 3.2.1 > patch release. > I was looking at the changelist for this file, and it may look like > some of these optimizations may have been intentional by Sean so he > may have some more insight in this bit of the logic. > > On Thu, Sep 4, 2014 at 6:22 PM, Kim Ebert > wrote: >> Hi All, >> >> I was reviewing the use of permutations, and I noticed that we sorted >> the permutation list before creating the string to do the concept lookup >> with. It also appears that we were sorting the object that was stored in >> the parent list. >> >> I've made a few changes, and now it appears I can discover some >> additional concepts based upon the permutations. >> >> Let me know what you think of the following changes. >> >> Thanks, >> >> Kim >> >> === modified file >> 'ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/algorithms/FirstTokenPermutationImpl.java' >> --- >> ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/algorithms/FirstTokenPermutationImpl.java >> 2014-07-31 22:00:48 + >> +++ >> ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/algorithms/FirstTokenPermutationImpl.java >> 2014-09-04 18:39:59 + >> @@ -210,11 +210,12 @@ >>final List> permutationList = iv_permCacheMap.get( >> permutationIndex ); >>for ( List permutations : permutationList ) { >> // Moved sort and offset calculation from inner (per >> MetaDataHit) iteration 2-21-2013 spf >> - Collections.sort( permutations ); >> + List permutationsSorted = (List) >> ((ArrayList)permutations).clone(); >> + Collections.sort( permutationsSorted ); >> int startOffset = firstWordStartOffset; >> int endOffset = firstWordEndOffset; >> - if ( !permutations.isEmpty() ) { >> -int firstIdx = permutations.get( 0 ); >> + if ( !permutationsSorted.isEmpty() ) { >> +int firstIdx = permutationsSorted.get( 0 ); >> if ( firstIdx <= firstTokenIndex ) { >> firstIdx--; >> } >> @@ -222,7 +223,7 @@ >> if ( firstToken.getStartOffset() < firstWordStartOffset ) { >> startOffset = firstToken.getStartOffset(); >> } >> -int lastIdx = permutations.get( permutations.size() - 1 ); >> +int lastIdx = permutationsSorted.get( >> permutation
RE: Permutations
Hi Kim, Pei, I don't think that I changed anything to which Kim is referring, just a couple of other things that happen to be in the same segment. From the attached it looks like Kim's change is to copy a list and sort the copy, while mine were moving the sort from an inner to outer loop. At any rate, whatever I did I did a year and a half ago and I'm concentrating on the new lookup these days. Sean From: Pei Chen [chen...@apache.org] Sent: Friday, September 05, 2014 12:17 PM To: dev@ctakes.apache.org Subject: Re: Permutations Hi Kim, Thanks for pointing that out. https://issues.apache.org/jira/browse/CTAKES-310 has been opened for this. If you commit the changes, we can see if we can include in the 3.2.1 patch release. I was looking at the changelist for this file, and it may look like some of these optimizations may have been intentional by Sean so he may have some more insight in this bit of the logic. On Thu, Sep 4, 2014 at 6:22 PM, Kim Ebert wrote: > Hi All, > > I was reviewing the use of permutations, and I noticed that we sorted > the permutation list before creating the string to do the concept lookup > with. It also appears that we were sorting the object that was stored in > the parent list. > > I've made a few changes, and now it appears I can discover some > additional concepts based upon the permutations. > > Let me know what you think of the following changes. > > Thanks, > > Kim > > === modified file > 'ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/algorithms/FirstTokenPermutationImpl.java' > --- > ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/algorithms/FirstTokenPermutationImpl.java > 2014-07-31 22:00:48 + > +++ > ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/algorithms/FirstTokenPermutationImpl.java > 2014-09-04 18:39:59 + > @@ -210,11 +210,12 @@ >final List> permutationList = iv_permCacheMap.get( > permutationIndex ); >for ( List permutations : permutationList ) { > // Moved sort and offset calculation from inner (per > MetaDataHit) iteration 2-21-2013 spf > - Collections.sort( permutations ); > + List permutationsSorted = (List) > ((ArrayList)permutations).clone(); > + Collections.sort( permutationsSorted ); > int startOffset = firstWordStartOffset; > int endOffset = firstWordEndOffset; > - if ( !permutations.isEmpty() ) { > -int firstIdx = permutations.get( 0 ); > + if ( !permutationsSorted.isEmpty() ) { > +int firstIdx = permutationsSorted.get( 0 ); > if ( firstIdx <= firstTokenIndex ) { > firstIdx--; > } > @@ -222,7 +223,7 @@ > if ( firstToken.getStartOffset() < firstWordStartOffset ) { > startOffset = firstToken.getStartOffset(); > } > -int lastIdx = permutations.get( permutations.size() - 1 ); > +int lastIdx = permutationsSorted.get( > permutationsSorted.size() - 1 ); > if ( lastIdx <= firstTokenIndex ) { > lastIdx--; > } > > > -- > Kim Ebert > 1.801.669.7342 > Perfect Search Corp > http://www.perfectsearchcorp.com/ >
RE: Permutations
> We also had issues in cTAKES 2.5. > Here is the patch for 2.5. Before I got the patch to 3.0 Sean made his > changes. Thanks for verifying, Sean From: Kim Ebert [kim.eb...@perfectsearchcorp.com] Sent: Friday, September 05, 2014 12:28 PM To: dev@ctakes.apache.org; Finan, Sean Subject: Re: Permutations Hi Pei and Sean, Sean, any thoughts about this would be helpful. We also had issues in cTAKES 2.5. Here is the patch for 2.5. Before I got the patch to 3.0 Sean made his changes. === modified file 'src/edu/mayo/bmi/lookup/algorithms/FirstTokenPermutationImpl.java' --- src/edu/mayo/bmi/lookup/algorithms/FirstTokenPermutationImpl.java 2012-11-28 01:56:50 + +++ src/edu/mayo/bmi/lookup/algorithms/FirstTokenPermutationImpl.java 2013-02-06 16:39:37 + @@ -294,14 +294,16 @@ Iterator mdhIterator = mdhSet.iterator(); while (mdhIterator.hasNext()) { -MetaDataHit mdh = (MetaDataHit) mdhIterator.next(); +MetaDataHit mdh = (MetaDataHit) mdhIterator.next(); + +List permutationSorted = (List) ((ArrayList)permutation).clone(); // figure out start and end offsets -Collections.sort(permutation); +Collections.sort(permutationSorted); int startOffset; -if (permutation.size() > 0) +if (permutationSorted.size() > 0) { -int firstIdx = ((Integer) permutation.get(0)).intValue(); +int firstIdx = ((Integer) permutationSorted.get(0)).intValue(); if (firstIdx <= firstTokenIndex.intValue()) { firstIdx--; @@ -322,9 +324,9 @@ } int endOffset; -if (permutation.size() > 0) +if (permutationSorted.size() > 0) { -int lastIdx = ((Integer) permutation.get(permutation.size() - 1)).intValue(); +int lastIdx = ((Integer) permutationSorted.get(permutationSorted.size() - 1)).intValue(); if (lastIdx <= firstTokenIndex.intValue()) { lastIdx--; Kim Ebert 1.801.669.7342 Perfect Search Corp http://www.perfectsearchcorp.com/ On 09/05/2014 10:17 AM, Pei Chen wrote: > Hi Kim, > Thanks for pointing that out. > https://issues.apache.org/jira/browse/CTAKES-310 has been opened for > this. > If you commit the changes, we can see if we can include in the 3.2.1 > patch release. > I was looking at the changelist for this file, and it may look like > some of these optimizations may have been intentional by Sean so he > may have some more insight in this bit of the logic. > > On Thu, Sep 4, 2014 at 6:22 PM, Kim Ebert > wrote: >> Hi All, >> >> I was reviewing the use of permutations, and I noticed that we sorted >> the permutation list before creating the string to do the concept lookup >> with. It also appears that we were sorting the object that was stored in >> the parent list. >> >> I've made a few changes, and now it appears I can discover some >> additional concepts based upon the permutations. >> >> Let me know what you think of the following changes. >> >> Thanks, >> >> Kim >> >> === modified file >> 'ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/algorithms/FirstTokenPermutationImpl.java' >> --- >> ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/algorithms/FirstTokenPermutationImpl.java >> 2014-07-31 22:00:48 + >> +++ >> ctakes-dictionary-lookup/src/main/java/org/apache/ctakes/dictionary/lookup/algorithms/FirstTokenPermutationImpl.java >> 2014-09-04 18:39:59 + >> @@ -210,11 +210,12 @@ >>final List> permutationList = iv_permCacheMap.get( >> permutationIndex ); >>for ( List permutations : permutationList ) { >> // Moved sort and offset calculation from inner (per >> MetaDataHit) iteration 2-21-2013 spf >> - Collections.sort( permutations ); >> + List permutationsSorted = (List) >> ((ArrayList)permutations).clone(); >> + Collections.sort( permutationsSorted ); >> int startOffset = firstWordStartOffset; >> int endOffset = firstWordEndOffset; >> - if ( !permutations.isEmpty() ) { >> -int firstIdx = permutations.get( 0 ); >> + if ( !permutationsSorted.isEmpty() ) { >> +int firstIdx = permutationsSorted.get( 0 ); >> if ( firstIdx <= firstTokenIndex ) { >> firstIdx--; >> } >> @@