Agreed. Yes, I think this is a slight augmentation and extension of the original vision of the clinical common type system, by having it work with other UIMA-based NLP systems. Having worked on item (3) for cTAKES, I actually think the tough part will be getting consensus and agreement on a system between all parties, and less the required code changes. Hence, I just wanted to ping the community to gauge interest and see if this actually makes sense. [It would be nice to plug in different POS taggers, for example, without having to remap types.] If we have a willing volunteer (Richard :)?) to perform some of the prelim analysis in Q1 2014 with our existing type system, perhaps we can actually make this happen.

4a) I think the SHARP4 development group has essentially moved to the cTAKES ASF community, which is probably even better since it already has a meritocratic/governance mechanism to handle changes.

On Tue, Oct 1, 2013 at 10:39 AM, Wu, Stephen T., Ph.D. <wu.step...@mayo.edu> wrote:

> Pei et al,
> That was the vision for the SHARP "common type system", except it was
> meant to include medical-related projects rather than general projects.
>
> Steve's process below is probably the most realistic way to do things, and
> it's basically how we did the current cTAKES type system. Unfortunately,
> the "someone" doing #1 was me, and I didn't realize that it would be quite
> difficult. I guess I know more about how to do it now but #1 and #2 were
> surprisingly harder than I expected. I'm adding a #4:
>
> (1) Have someone inspect the various type systems closely and make a
> proposal
>   A. Know each of the type systems on their own. Essential to visualize
>   them appropriately, but it is still difficult to understand the
>   implications of type changes just by looking. (By the way, we never came
>   up with a really great automatic visualization tool, closest was a Protégé
>   plugin. Excellent visualization would go a long way, especially if edits
>   were possible.)
>   B. Categorize portions of type systems to compare and take them a step
>   at a time.
>   C. Clearly limit which type systems you are going to consider for your
>   comparison and reconciliation.
>   D. Pick a starting point. I found it nearly impossible to create from
>   scratch when you're staring at 4-5 other type systems. We started from
>   the old cTAKES type system but that did cause some bias!
>   E. Develop real criteria (or at least opinions) for choosing between the
>   many options.
>
> (2) Agree on the proposal.
>   A. Multiple projects should make a binding agreement to implement. This
>   means, most likely, that somebody needs to have assurance of funding.
>   In our case, we only made it binding for cTAKES, so it is only used by
>   cTAKES (as far as I know).
>   B. With different projects' vested interests on the line, have some real
>   discussions of what your project is going to give up with the proposed
>   stuff.
>
> (3) Spend the time to re-write all the code to use the new type system.
>   * As Steve said, this is time-consuming, especially if things get broken
>   and models need to be retrained, etc.
>
> (4) Ensure maintenance and modifiability across projects.
>   A. The original SHARP common type system vision handed off the
>   maintenance to the Software Development Group, but that never really
>   happened. I hope the Apache community can serve as this to some degree,
>   but so far it has still depended on unreliable people like myself.
>   B. A means of having everyone automatically draw from the same source
>   code would be preferable.
>   C. If, in the future, you need to consider another UIMA project whose
>   type system should be reconciled... Well, that's happening right now. I
>   guess you can worry about it when you get there if you have a community
>   that's willing to deal with it.
>
> Those are just some thoughts. It's not impossible, but neither is it
> simple.
>
> stephen
>
> On 9/30/13 8:17 PM, "Steven Bethard" <steven.beth...@gmail.com> wrote:
>
> >We (ClearTK) talked with Richard (DKPro) about doing this for ClearTK
> >and DKPro. Basically, both groups were all for it, but the main issue
> >was time. Basically you need to:
> >
> >(1) Have someone inspect the various type systems closely and make a
> >proposal
> >(2) Agree on the proposal.
> >(3) Spend the time to re-write all the code to use the new type system.
> >
> >Step (3) is especially time consuming, but in fact, we never managed
> >to get the free time for step (1).
> >
> >That all said, ClearTK would love to share a common type system with
> >other projects.
> >
> >Steve
> >
> >On Mon, Sep 30, 2013 at 7:38 PM, Pei Chen <chen...@apache.org> wrote:
> >> Richard, I, and a few others had an interesting bar conversation...
> >> In the spirit of interoperability, what if we had a baseline common type
> >> system that could be reused across UIMA-compatible NLP systems?
> >> Imagine for a moment that OpenNLP, ClearTK, ClearNLP, DKPro, cTAKES,
> >> etc. could come up with a common baseline type system that could be
> >> reused. It may sound like a dream, but it could be doable, if we could
> >> factor out and find the common ground. Perhaps we could start with the
> >> syntactical features... and then extend it for more specific domain use
> >> cases?
> >>
> >> --Pei