Here's some documentation on the config option. It actually lives in the UMLS::Interface module, which underlies much of UMLS::Similarity. That unfortunately makes it a bit hard to find, but in general most details about anything other than the actual similarity measure calculation are documented in UMLS::Interface.
https://metacpan.org/pod/UMLS::Interface#CONFIGURATION-FILE

On Mon, Aug 18, 2014 at 10:45 AM, Ted Pedersen <[email protected]> wrote:

> You are correct, the information content files are specific to the sources
> and relations you'd like to be using. When the ic files are created, counts
> of terms found in your text are propagated up whatever resource you are
> using (following the relations you have given) so each different
> combination of resources and relations will give you different ic values.
>
> And you are also correct that using the --config option is the way to
> specify the sources and relations. In the simplest case the config files
> are short text files with two main fields (SAB and REL). The following says
> I'd like to use SNOMEDCT with PAR, CHD relations...
>
> SAB :: include SNOMEDCT
> REL :: include PAR, CHD
>
> So, if this file was called snomedct.config, then you could use it like
> this:
>
> ted@maraca:~$ create-icfrequency.pl --config config/snomedct.config ic.out test.txt
> Default Settings:
> --term
>
> User Settings:
> --config config/snomedct.config
>
> CuiFinder User Options:
> --config option set
>
> UMLS-Interface Configuration Information
> Sources (SAB):
> SNOMEDCT
> Relations (REL):
> CHD
> PAR
> Database:
> umls (MMSYS-2013AA-20130404)
>
> PathFinder User Options:
> --realtime option set
>
> ted@maraca:~$ cat config/snomedct.config
> SAB :: include SNOMEDCT
> REL :: include PAR, CHD
>
> ted@maraca:~$ cat test.txt
> my diabetes is awful and I have the flu too
>
> This is the output file generated (with frequency counts). I have omitted
> all the 0 counts for CUIs (which is most of the concepts in SNOMEDCT).
>
> SAB :: include SNOMEDCT
> REL :: include PAR, CHD
> N :: 6
> C0021400<>1
> C0439068<>1
> C0439135<>1
> C0441913<>1
> C1706104<>1
> C1706368<>1
>
> More to come...
> Ted
>
> On Sat, Aug 16, 2014 at 7:55 PM, Steven Bethard [email protected]
> [umls-similarity] <[email protected]> wrote:
>
>> On Aug 16, 2014, at 4:04 PM, Steven Bethard [email protected]
>> [umls-similarity] <[email protected]> wrote:
>> > On Jul 30, 2014, at 9:55 AM, Bridget McInnes [email protected]
>> > [umls-similarity] <[email protected]> wrote:
>> >> The icpropagation files need to go into the:
>> >> /var/www/umls_similarity/icpropagation/
>> > [snip]
>> >> create-icfrequency.pl ICFREQUENCY_FILE INPUTFILE
>> > [snip]
>> >> create-icpropagation.pl ICPROPAGATION_FILE ICFREQUENCY_FILE
>> >
>> > Thanks, this solved the problem. Some notes for anyone else who has to
>> > do this:
>> >
>> > * The create-icfrequency.pl script took about 20 minutes on a text
>> >   file of about 160M words.
>> > * The create-icpropagation.pl script took about 10 minutes.
>> > * The icpropagation file has to be named
>> >   /var/www/umls_similarity/icpropagation/icprop.msh.par.chd for the
>> >   server to run.
>>
>> Ok, it looks like this didn’t completely solve the problem because when I
>> try sources other than MSH, I get errors like:
>>
>> "Could not open file
>> /var/www/umls_similarity/icpropagation/icprop.fma.par.chd"
>>
>> How do I run the create-ic* scripts so that they generate all the
>> different icprop.* files that the server might search for? It seemed like
>> maybe I needed to use the --config option, but I couldn’t find the
>> documentation on what a config file looks like. And, assuming someone can
>> point me to the config file documentation, do I need to run the script once
>> for each combination of MSH/FMA/OMIM/SNOMEDCT/UMLS_ALL, CUI/PAR/CHD/RB/RN?
>> Is there a way to make sure I have all the possible combinations?
>>
>> Steve
>>
>
> --
> Ted Pedersen
> http://www.d.umn.edu/~tpederse

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
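
As a rough sketch of the once-per-source approach asked about in the thread, the loop below writes one config file per source in the two-field SAB/REL format quoted above, and prints (rather than runs) the create-ic* commands that would use each file. Several things here are assumptions and not confirmed anywhere in this thread: the list of sources, the config/ directory, the corpus.txt input name, the icfreq.* intermediate file names, and the icprop.<source>.par.chd naming, which is extrapolated from the two file names mentioned above. It only covers the PAR, CHD relation pair.

```shell
#!/bin/sh
# Sketch only: write one UMLS::Interface-style config file per source and
# print the create-ic* commands that would use it. Nothing here calls UMLS.

# Assumptions (hypothetical, adjust to your setup): the source list, the
# directory names, corpus.txt, and the icfreq.* / icprop.* file naming.
CONFIG_DIR="${CONFIG_DIR:-config}"
ICPROP_DIR="${ICPROP_DIR:-/var/www/umls_similarity/icpropagation}"
CORPUS="${CORPUS:-corpus.txt}"
SOURCES="MSH FMA OMIM SNOMEDCT"

mkdir -p "$CONFIG_DIR"

for sab in $SOURCES; do
    # Lowercase the source name for use in file names (e.g. MSH -> msh).
    lower=$(printf '%s' "$sab" | tr '[:upper:]' '[:lower:]')
    config="$CONFIG_DIR/$lower.par.chd.config"

    # Two-field config format: sources (SAB) and relations (REL).
    printf 'SAB :: include %s\nREL :: include PAR, CHD\n' "$sab" > "$config"

    # Dry run: echo the commands instead of executing them.
    echo "create-icfrequency.pl --config $config icfreq.$lower.par.chd $CORPUS"
    echo "create-icpropagation.pl $ICPROP_DIR/icprop.$lower.par.chd icfreq.$lower.par.chd"
done
```

If the printed commands look right for your installation, they can be pasted into a shell or the echo lines replaced with the real calls. Other relation sets (RB, RN, and so on) would need their own REL line and, presumably, their own output file name.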
