intro and contributing to ctakes-ytex

Jeffery Painter Mon, 22 Jan 2024 10:02:16 -0800

Hello cTakes devs,

Before I start submitting pull requests, I wanted to introduce myself to the team. My name is Jeffery Painter and I am currently working at GlaxoSmithKline as a researcher in our AI/ML group for drug safety. I have worked with the UMLS pretty extensively since around 2006-2007 and was one of the co-developers of what became the OMOP Common Data Model (CDM). You can see my full list of publications here: https://javastats.com/about.html

I have been a long-time contributor to the Apache Turbine and Apache Torque projects, but have not participated in other Apache projects directly, so I am not sure what the etiquette is for this group.

I had an itch to scratch: I had been using the old Perl UMLS::Similarity module for years, and discovered about a year ago that the ctakes-ytex package could potentially help solve these issues. I have been able to wrangle it into producing what I need to support my work. However, as you probably all know, this code has not really been touched since 2013, from what I can tell.

I have been able to update the ctakes-ytex build process to run with modern versions of MySQL (using Ubuntu 23.10 and MySQL 8.0.35), which I would like to contribute back to the project. I have also found some computational "bugs" in a couple of the kernel metrics in the ctakes-ytex package, which I have now corrected so that they match the outputs generated by the old Perl UMLS::Similarity package. Finally, I've added a couple of metrics provided by the Perl module that were not in the ctakes-ytex code (such as the Resnik and Faith algorithms).



I was planning to propose submitting this work as three separate pull requests:

PR-1 : update the build process to support modern MySQL databases

Q's - is it appropriate to update the supplied MRCONSO.RRF and MRSTY.RRF files from UMLS with current versions? How about updating the pre-built concept graphs (binary .gz files)?

To support the MySQL connection, there are XML templates which parse the DB connection, and I don't have an elegant way to pass through a raw & versus its XML-escaped form &amp;. So I created two separate DB properties: one for direct JDBC connections in the ctakes-ytex code, which can't work with the &amp;-escaped version, and another for the XML templates to parse.
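For illustration, the two property values differ only in how the ampersand between URL parameters is written. The minimal Java sketch below assumes a hypothetical `xmlEscape` helper and an example MySQL URL (the host, schema name, and query parameters are placeholders, not the actual ctakes-ytex configuration). It shows how the XML-safe value relates to the raw JDBC URL:

```java
public class JdbcUrlEscaping {

    // Hypothetical helper (not part of ctakes-ytex): escapes the five XML
    // special characters so a raw JDBC URL can be embedded in an XML template.
    static String xmlEscape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Example URL only; host, schema, and parameters are assumptions.
        String raw = "jdbc:mysql://localhost:3306/ytex?useSSL=false&serverTimezone=UTC";
        String escaped = xmlEscape(raw);

        // Raw form: what a direct JDBC call such as DriverManager.getConnection expects.
        System.out.println(raw);
        // Escaped form: safe to substitute into an XML template.
        System.out.println(escaped);
    }
}
```

Whether to derive one value from the other or keep two explicit properties is a design choice; this sketch only illustrates the escaping relationship between them.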


PR-2 : submit corrections to the metrics which are broken

PR-3 : submit new metrics

Please let me know if this makes sense, and how best to work with your team. Should I fork the GitHub repo into my own account, or create a new branch and submit the PRs from there? What are the preferred ways of working on and contributing to the cTakes project?

As I said, I already have Apache credentials from my work with the Turbine team since 2003, and I was elected a member a couple of years ago :-)



Best,

Jeffery Painter

j...@jivecast.com

pain...@apache.org
