Re: Reuters

2006-04-21 Thread Marvin Humphrey
On Apr 21, 2006, at 11:56 AM, Malcolm Clark wrote: Has anyone attempted to index/search the Reuters collection, which consists of SGML? Mine seems to run through the process okay, but alas I'm left with nothing in the index when I check with Luke or my own search engine. Anyone got any hints

Re: Reuters

2006-04-21 Thread Malcolm Clark
Okay converting to XML sounds like a great option. Thanks, Malcolm

Re: Reuters

2006-04-21 Thread Lorenzo Viscanti
Some months ago I created an index from the Reuters collection. I converted the SGML files to XML using a tool I found somewhere on the net (just google for it), then parsed the files with a standard DOM parser to create the index. If you have problems parsing the SGML files I think you
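
A rough sketch of the "convert to XML, then parse with a DOM parser" step, in Python rather than anything taken from the thread. It assumes the converted files use REUTERS, TITLE and BODY elements with a NEWID attribute, as in the commonly distributed Reuters-21578 layout; the sample data and the helper names (element_text, extract_documents) are made up for illustration. Printing which articles come out with an empty body is one way to see why an index might end up empty before any indexing code runs.

```python
from xml.dom import minidom, Node

# Hypothetical sample standing in for one converted file; the element and
# attribute names are an assumption about the converted XML, not from the thread.
SAMPLE = """<COLLECTION>
<REUTERS NEWID="1">
<TEXT><TITLE>Sample headline</TITLE><BODY>Sample body text.</BODY></TEXT>
</REUTERS>
<REUTERS NEWID="2">
<TEXT><TITLE>Headline only</TITLE></TEXT>
</REUTERS>
</COLLECTION>"""


def element_text(parent, tag):
    """Return the text of the first <tag> under parent, or "" if absent."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes:
        return ""
    parts = [
        child.data
        for child in nodes[0].childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    ]
    return "".join(parts).strip()


def extract_documents(xml_text):
    """Parse the XML with a DOM parser and return one dict per article."""
    dom = minidom.parseString(xml_text)
    docs = []
    for article in dom.getElementsByTagName("REUTERS"):
        docs.append({
            "id": article.getAttribute("NEWID"),
            "title": element_text(article, "TITLE"),
            "body": element_text(article, "BODY"),
        })
    return docs


docs = extract_documents(SAMPLE)
# Articles with no body text would add nothing searchable to a body field.
empty = [d["id"] for d in docs if not d["body"]]
print(len(docs), "articles parsed; empty bodies:", empty)
```

Each dict would then be handed to whatever indexer is in use, one field per key. If every article shows up in the empty list, the problem is likely in the parsing step rather than in the index itself.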