Thank you for pointing out that you are using SNOMEDCT_US, which I think is new as of 2013AB. I am using 2013AA, so my comparison is not exact here. I will see if I have 2013AB or a later version installed somewhere.
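As a rough way to put the two indexes side by side, here is a minimal Python sketch that compares the largest table from each report quoted below, using the TABLE_ROWS and DATA_LENGTH values exactly as they appear there. The class and function names are made up for illustration, and TABLE_ROWS may only be an estimate depending on the MySQL storage engine, so treat the result as a ballpark figure rather than a measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    """Row count and data size as reported by information_schema.tables."""
    label: str
    rows: int          # TABLE_ROWS (may be approximate for some engines)
    data_length: int   # DATA_LENGTH in bytes


def bytes_per_row(stats: TableStats) -> float:
    """Average number of data bytes stored per row."""
    return stats.data_length / stats.rows


def row_ratio(larger: TableStats, smaller: TableStats) -> float:
    """How many times more rows `larger` has than `smaller`."""
    return larger.rows / smaller.rows


# Values copied from the two reports quoted in this thread.
SNOMEDCT_US_2013AB = TableStats("2013AB SNOMEDCT_US", 513_902_275, 100_299_885_452)
SNOMEDCT_2013AA = TableStats("2013AA SNOMEDCT", 12_761_986, 2_353_004_544)

if __name__ == "__main__":
    for stats in (SNOMEDCT_US_2013AB, SNOMEDCT_2013AA):
        print(f"{stats.label}: {bytes_per_row(stats):.1f} bytes/row")
    ratio = row_ratio(SNOMEDCT_US_2013AB, SNOMEDCT_2013AA)
    print(f"row ratio: {ratio:.1f}x")
```

If the bytes-per-row figures come out similar while the row counts differ by a large factor, that would suggest the two tables hold the same kind of data and the difference is in how much of it has been generated.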
SNOMEDCT_US is the US edition of SNOMEDCT, which does seem to have some additional content.

http://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/SNOMEDCT_US/

On Fri, Jul 18, 2014 at 11:21 AM, Albert Lai [email protected] [umls-similarity] <[email protected]> wrote:

> Chaitanya and I are getting something much larger and I think the indexing
> has now been running for around 71 hours. We've generated 513mil rows in
> the MMSYS_2013AB_20131113_SNOMEDCT_US_CHD_PAR_table. See below. (I noticed
> that yours says SNOMEDCT and not SNOMEDCT_US, though when we try SNOMEDCT,
> it says SAB (SNOMEDCT) does not exist in your current UMLS view.)
>
> -Albert
>
> mysql> select * from tableindex;
> +-----------------------------------------------------+-------------------------------------------+
> | TABLENAME                                           | HEX                                       |
> +-----------------------------------------------------+-------------------------------------------+
> | MMSYS_2013AB_20131113_SNOMEDCT_US_CHD_PAR_intrinsic | af2b21ff9b5244bb5455ca8bc79d0257c2385d17a |
> | MMSYS_2013AB_20131113_SNOMEDCT_US_CHD_PAR_parent    | a2cc318432dc1b3e4f397f102b3ab8e705e4ffc34 |
> | MMSYS_2013AB_20131113_SNOMEDCT_US_CHD_PAR_child     | a1a69641a1b573c0baa75f92885a39bb576bceef1 |
> | MMSYS_2013AB_20131113_SNOMEDCT_US_CHD_PAR_info      | ad174db1eda8d0e60161f39ebf25429ecf62db544 |
> | MMSYS_2013AB_20131113_SNOMEDCT_US_CHD_PAR_cache     | ac59c43986dcdfe4e5770d8a032adc8118a9b8acf |
> | MMSYS_2013AB_20131113_SNOMEDCT_US_CHD_PAR_table     | a360e225eec93d0400d75f5faee4256db6e22417b |
> +-----------------------------------------------------+-------------------------------------------+
> 6 rows in set (0.00 sec)
>
> mysql> select TABLE_NAME,TABLE_ROWS,DATA_LENGTH,UPDATE_TIME from
> information_schema.tables where TABLE_SCHEMA='umlsinterfaceindex';
> +-------------------------------------------+------------+--------------+---------------------+
> | TABLE_NAME                                | TABLE_ROWS | DATA_LENGTH  | UPDATE_TIME         |
> +-------------------------------------------+------------+--------------+---------------------+
> | a1a69641a1b573c0baa75f92885a39bb576bceef1 |          1 |           17 | 2014-07-14 15:14:07 |
> | a2cc318432dc1b3e4f397f102b3ab8e705e4ffc34 |          1 |           17 | 2014-07-14 15:14:07 |
> | a360e225eec93d0400d75f5faee4256db6e22417b |  513902275 | 100299885452 | 2014-07-18 11:52:24 |
> | ac59c43986dcdfe4e5770d8a032adc8118a9b8acf |          0 |            0 | 2014-07-14 15:14:08 |
> | ad174db1eda8d0e60161f39ebf25429ecf62db544 |          0 |            0 | 2014-07-14 15:13:36 |
> | af2b21ff9b5244bb5455ca8bc79d0257c2385d17a |     158185 |      2689145 | 2014-07-17 03:06:15 |
> | tableindex                                |          6 |          584 | 2014-07-14 15:14:38 |
> +-------------------------------------------+------------+--------------+---------------------+
> 7 rows in set (0.00 sec)
>
> On Friday, July 18, 2014 11:38 AM, "Ted Pedersen [email protected]
> [umls-similarity]" <[email protected]> wrote:
>
> And here are the specific tables associated with SNOMEDCT.
>
> acda27da591145d3b4e9ebf9d3c2a3e1dd4d0f40b |        0 |      16384 |
> a61994af45895a14eeb8f720eaba083cb9383b698 |        1 |      16384 |
> a494252ed0bdc44c0cac223f40735da144a347be0 |        1 |      16384 |
> aa337e568342857c2f7b00366cfba8c8c2859621e | 12761986 | 2353004544 |
>
> So it's clearly the last one that takes up the most space. How close have
> you gotten?
>
> BTW, I think I'm going to go ahead and remove the SNOMEDCT index I have
> and re-create it, just to get a sense of how long that takes (it's been a
> while since I've done that so I don't exactly remember).
>
> Good luck,
> Ted
>
> On Fri, Jul 18, 2014 at 10:29 AM, Ted Pedersen <[email protected]> wrote:
>
> We have quite a few different resources indexed, so the output from your
> command is a little messy. So to start with, here are the table names for
> the indices associated with SNOMEDCT (available via the command shown),
> This is using PAR CHD relations with SNOMEDCT in 2013AA version of UMLS...
>
> ted@maraca:~$ getTableNames.pl --config config/snomedct.config
>
> CuiFinder User Options:
>   --config option set
>
> UMLS-Interface Configuration Information
>   Sources (SAB):
>     SNOMEDCT
>   Relations (REL):
>     CHD
>     PAR
>   Database:
>     umls (MMSYS-2013AA-20130404)
>
> The tables associated with the given configuration file are as follows:
>
> Table                                      Table Name
> acda27da591145d3b4e9ebf9d3c2a3e1dd4d0f40b  MMSYS_2013AA_20130404_SNOMEDCT_CHD_PAR_cache
> a61994af45895a14eeb8f720eaba083cb9383b698  MMSYS_2013AA_20130404_SNOMEDCT_CHD_PAR_child
> a494252ed0bdc44c0cac223f40735da144a347be0  MMSYS_2013AA_20130404_SNOMEDCT_CHD_PAR_parent
> aa337e568342857c2f7b00366cfba8c8c2859621e  MMSYS_2013AA_20130404_SNOMEDCT_CHD_PAR_table
>
> On Fri, Jul 18, 2014 at 8:57 AM, [email protected]
> [umls-similarity] <[email protected]> wrote:
>
> Hello
>
> Can you share the output of the following SQL query:
>
> select TABLE_NAME,TABLE_ROWS,DATA_LENGTH from information_schema.tables
> where TABLE_SCHEMA='umlsinterfaceindex';
>
> This will give us an idea of how much more we have to go.
>
> Also, is it possible for you to share the SQL dumps of the indexed tables.
> That would be awesome !
>
> Thanks,
> Chaitanya.
>
> --
> Ted Pedersen
> http://www.d.umn.edu/~tpederse
>
> --
> Ted Pedersen
> http://www.d.umn.edu/~tpederse

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
