Here's some documentation on the config option. It is actually found in
the UMLS::Interface module, which underlies much of UMLS::Similarity. That
unfortunately makes it a bit hard to find, but in general most details
about anything other than the actual similarity measure calculation are
found in UMLS::Interface.

https://metacpan.org/pod/UMLS::Interface#CONFIGURATION-FILE
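
For the question of covering every source/relation combination, here is a rough Python sketch that builds one config body per combination, using the two-field SAB/REL format shown in the quoted message below. The source list, the relation sets, and the file-naming scheme are assumptions for illustration only; nothing here is prescribed by UMLS::Interface. Each resulting file would still have to be passed to the create-ic* scripts via --config, one run per combination.

```python
from itertools import product

# Hypothetical lists: adjust to the sources and relation sets you actually need.
SOURCES = ["MSH", "FMA", "OMIM", "SNOMEDCT"]
RELATION_SETS = [("PAR", "CHD"), ("RB", "RN")]


def config_text(sab, rels):
    """Return the two-line SAB/REL config body for one combination."""
    return f"SAB :: include {sab}\nREL :: include {', '.join(rels)}\n"


def config_name(sab, rels):
    """Hypothetical file name for a combination, e.g. msh.par.chd.config."""
    parts = [sab.lower()] + [r.lower() for r in rels]
    return ".".join(parts) + ".config"


def all_configs(sources=SOURCES, relation_sets=RELATION_SETS):
    """Map each (assumed) file name to its config body."""
    return {
        config_name(sab, rels): config_text(sab, rels)
        for sab, rels in product(sources, relation_sets)
    }


if __name__ == "__main__":
    # Print the configs rather than writing files, so nothing is overwritten.
    for name, body in all_configs().items():
        print(f"# {name}")
        print(body)
```

Printing rather than writing keeps the sketch free of side effects; redirecting each body into its own file is a small change once the naming is settled.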




On Mon, Aug 18, 2014 at 10:45 AM, Ted Pedersen <[email protected]> wrote:

> You are correct, the information content files are specific to the sources
> and relations you'd like to be using. When the ic files are created, counts
> of terms found in your text are propagated up whatever resource you are
> using (following the relations you have given) so each different
> combination of resources and relations will give you different ic values.
>
> And you are also correct that using the --config option is the way to
> specify the sources and relations. In the simplest case the config files
> are short text files with two main fields (SAB and REL). The following says
> I'd like to use SNOMEDCT with PAR, CHD relations...
>
> SAB :: include SNOMEDCT
> REL :: include PAR, CHD
>
> So, if this file was called snomedct.config, then you could use it like
> this:
>
> ted@maraca:~$ create-icfrequency.pl --config config/snomedct.config ic.out test.txt
> Default Settings:
>   --term
>
> User Settings:
>   --config config/snomedct.config
>
>
> CuiFinder User Options:
>    --config option set
>
>
> UMLS-Interface Configuration Information
>   Sources (SAB):
>     SNOMEDCT
>   Relations (REL):
>     CHD
>     PAR
>   Database:
>     umls (MMSYS-2013AA-20130404)
>
>
>
> PathFinder User Options:
>   --realtime option set
>
> ted@maraca:~$ cat config/snomedct.config
> SAB :: include SNOMEDCT
> REL :: include PAR, CHD
>
> ted@maraca:~$ cat test.txt
> my diabetes is awful and I have the flu too
>
> This is the output file generated (with frequency counts). I have omitted
> all the 0 counts for CUIs (which is most of the concepts in SNOMEDCT).
>
> SAB :: include SNOMEDCT
> REL :: include PAR, CHD
> N :: 6
> C0021400<>1
> C0439068<>1
> C0439135<>1
> C0441913<>1
> C1706104<>1
> C1706368<>1
>
> More to come...
> Ted
>
>
>
> On Sat, Aug 16, 2014 at 7:55 PM, Steven Bethard [email protected]
> [umls-similarity] <[email protected]> wrote:
>
>> On Aug 16, 2014, at 4:04 PM, Steven Bethard [email protected]
>> [umls-similarity] <[email protected]> wrote:
>> > On Jul 30, 2014, at 9:55 AM, Bridget McInnes [email protected]
>> [umls-similarity] <[email protected]> wrote:
>> >> The icpropagation files need to go into the:
>> >> /var/www/umls_similarity/icpropagation/
>> > [snip]
>> >> create-icfrequency.pl ICFREQUENCY_FILE INPUTFILE
>> > [snip]
>> >> create-icpropagation.pl ICPROPAGATION_FILE ICFREQUENCY_FILE
>> >
>> > Thanks, this solved the problem. Some notes for anyone else who has to
>> do this:
>> >
>> > * The create-icfrequency.pl script took about 20 minutes on a text
>> file of about 160M words.
>> > * The create-icpropagation.pl script took about 10 minutes
>> > * The icpropagation file has to be named
>> /var/www/umls_similarity/icpropagation/icprop.msh.par.chd for the server to
>> run
>>
>> Ok, it looks like this didn’t completely solve the problem because when I
>> try sources other than MSH, I get errors like:
>>
>> "Could not open file
>> /var/www/umls_similarity/icpropagation/icprop.fma.par.chd"
>>
>> How do I run the create-ic* scripts so that they generate all the
>> different icprop.* files that the server might search for? It seemed like
>> maybe I needed to use the --config option, but I couldn’t find the
>> documentation on what a config file looks like. And, assuming someone can
>> point me to the config file documentation, do I need to run the script once
>> for each combination of MSH/FMA/OMIM/SNOMEDCT/UMLS_ALL, CUI/PAR/CHD/RB/RN?
>> Is there a way to make sure I have all the possible combinations?
>>
>> Steve
>>
>
>
>
> --
> Ted Pedersen
> http://www.d.umn.edu/~tpederse
>



-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse
