Hello cTAKES devs,

Before I start submitting pull requests, I wanted to introduce myself to the team. My name is Jeffery Painter and I am currently working at GlaxoSmithKline as a researcher in our AI/ML group for drug safety.  I have worked with the UMLS pretty extensively since around 2006-2007 and was one of the co-developers of what became the OMOP Common Data Model (CDM).  You can see my full list of publications here: https://javastats.com/about.html

I have been a long-time contributor to the Apache Turbine and Apache Torque projects, but I have not participated directly in other Apache projects, so I am not sure what the etiquette is for this group.

I had an itch to scratch: I had been using the old Perl UMLS::Similarity module for years, and about a year ago I discovered that the ctakes-ytex package could potentially cover the same ground. I have been able to wrangle it into producing what I need to support my work. However, as you probably all know, this code has not really been touched since 2013, from what I can tell.

I have been able to update the ctakes-ytex build process to run with modern versions of MySQL (tested on Ubuntu 23.10 with MySQL 8.0.35), and I would like to contribute this back to the project. In addition, I found some computational "bugs" in a couple of the kernel metrics in the ctakes-ytex package. I have corrected these so that the results now match the outputs generated by the old Perl UMLS::Similarity package.  I have also added a couple of metrics provided by the Perl module which were not in the ctakes-ytex code (such as the Resnik and Faith algorithms).
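
For anyone less familiar with these two measures, here is a minimal sketch of how Resnik and Faith are usually defined in the literature, assuming the information content (IC) of each concept and of their least common subsumer (LCS) is already known. The class and method names are hypothetical and chosen only for illustration; this is not the ctakes-ytex code, and how IC is obtained (corpus-based or intrinsic) is outside the sketch.

```java
// Illustrative only: hypothetical class, not part of ctakes-ytex.
class IcMetricsSketch {

    // IC(c) = -ln p(c), where p(c) is the probability of concept c.
    // Written as ln(1/p) so that p = 1.0 yields 0.0 rather than -0.0.
    static double informationContent(double probability) {
        if (probability <= 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in (0, 1]");
        }
        return Math.log(1.0 / probability);
    }

    // Resnik: the similarity is the IC of the least common subsumer.
    static double resnik(double icLcs) {
        return icLcs;
    }

    // Faith: IC(lcs) / (IC(c1) + IC(c2) - IC(lcs)).
    // Returns 0.0 when the denominator is zero (assumed convention).
    static double faith(double ic1, double ic2, double icLcs) {
        double denominator = ic1 + ic2 - icLcs;
        return denominator == 0.0 ? 0.0 : icLcs / denominator;
    }

    public static void main(String[] args) {
        // Made-up IC values, purely to exercise the formulas.
        double ic1 = 4.0, ic2 = 2.0, icLcs = 1.0;
        System.out.println("resnik = " + resnik(icLcs));
        System.out.println("faith  = " + faith(ic1, ic2, icLcs));
    }
}
```

With the made-up values above, Resnik returns the LCS information content unchanged, while Faith normalizes it by the information the two concepts do not share.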


I was going to propose submitting as 3 separate pull requests:

PR-1 : update build process to support modern MySQL database

Q's - is it appropriate to update the supplied MRCONSO.RRF and MRSTY.RRF files from UMLS with current versions? How about me updating the pre-built concept graphs as binary .gz files?

To support the MySQL connection, there are XML templates which parse the DB connection, and I don't have an elegant way to pass through the & vs &amp; distinction. For now I have created two separate DB properties: one for direct JDBC connections in the ctakes-ytex code, which can't work with the &amp;-escaped version, and another for the XML templates to parse.
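
One possible alternative to maintaining two properties would be to keep only the raw JDBC URL and derive the XML-escaped form from it at build time. The sketch below is only an illustration of that idea: the class name, the property keys (`db.url`, `db.url.xml`), and the example URL are all hypothetical and do not come from the ctakes-ytex build.

```java
import java.util.Properties;

// Illustrative only: hypothetical class, not part of ctakes-ytex.
class JdbcUrlEscapeSketch {

    // Escape the five XML special characters one character at a time,
    // so an '&' produced by an earlier replacement is never re-escaped.
    static String escapeXml(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 16);
        for (char c : raw.toCharArray()) {
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical key and URL: the raw form is what JDBC itself needs.
        props.setProperty("db.url",
                "jdbc:mysql://localhost:3306/umls?useSSL=false&serverTimezone=UTC");

        // Derive the form that is safe to substitute into an XML template.
        props.setProperty("db.url.xml", escapeXml(props.getProperty("db.url")));

        System.out.println(props.getProperty("db.url.xml"));
    }
}
```

Whether this fits depends on how the existing templates are filled in, so treat it as a starting point for discussion rather than a drop-in fix.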

PR-2 : submit corrections to the metrics which are broken

PR-3 : submit new metrics

Please let me know if this makes sense, and how best to work with your team. Should I fork the GitHub repo into my own account, or create a new branch and submit the PRs from there?  What are the preferred ways of working and contributing to the cTAKES project?

As I said, I already have Apache credentials from my work with the Turbine team since 2003, and I was elected a member a couple of years ago :-)


Best,

Jeffery Painter

j...@jivecast.com

pain...@apache.org







