On Apr 21, 2006, at 11:56 AM, Malcolm Clark wrote:
Has anyone attempted to index/search the Reuters collection, which
consists of SGML?
Mine seems to run through the process okay, but alas I'm left with
nothing in the index when I check with Luke or my own search engine.
Anyone got any hints?
Okay, converting to XML sounds like a great option.
Thanks,
Malcolm
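
For anyone trying the XML route mentioned above, here is a minimal sketch of the convert-then-parse step using only the JDK's standard DOM parser. It is not the conversion tool anyone in this thread used. It assumes the collection is Reuters-21578, whose .sgm files start with an SGML DOCTYPE line and hold a flat run of <REUTERS> elements with <TITLE> and <BODY> inside <TEXT>. The class and method names (ReutersDomSketch, toXml, parse) are made up for illustration, and the Lucene indexing step is left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ReutersDomSketch {

    // Hypothetical helper: patch up the text of one .sgm file so that an
    // XML parser will accept it. Assumes the Reuters-21578 layout.
    static String toXml(String sgml) {
        // Drop the SGML doctype declaration, which is not valid XML.
        String s = sgml.replaceAll("<!DOCTYPE[^>]*>", "");
        // Replace every numeric character reference (e.g. &#3;) with a space,
        // since control characters are not allowed in XML 1.0.
        s = s.replaceAll("&#\\d+;", " ");
        // The articles have no common parent, so wrap them in a single root.
        return "<collection>" + s + "</collection>";
    }

    // Parse the patched text with a DOM parser and return one
    // {title, body} pair per <REUTERS> element.
    static List<String[]> parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = builder.parse(new InputSource(new StringReader(xml)));
        NodeList articles = dom.getElementsByTagName("REUTERS");
        List<String[]> out = new ArrayList<>();
        for (int i = 0; i < articles.getLength(); i++) {
            Element article = (Element) articles.item(i);
            out.add(new String[] { text(article, "TITLE"), text(article, "BODY") });
        }
        return out;
    }

    // Text of the first descendant with the given tag, or "" if there is none.
    static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Tiny made-up sample in the assumed layout, not real collection data.
        String sample = "<!DOCTYPE lewis SGML \"lewis.dtd\">\n"
                + "<REUTERS NEWID=\"1\"><TEXT><TITLE>SAMPLE TITLE</TITLE>"
                + "<BODY>Sample body text.&#3;</BODY></TEXT></REUTERS>";
        for (String[] doc : parse(toXml(sample))) {
            System.out.println(doc[0] + " | " + doc[1]);
        }
    }
}
```

Each {title, body} pair would then go into one Lucene document. If the index still comes out empty, it is worth checking that parse() returns anything at all before looking at the indexing code. Real files may need more cleanup than this (stray & or < characters in the text, for example), so treat the two regexes as a starting point only.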
Some months ago I created an index from the Reuters collection. I converted
the SGML files to XML using a tool that I found somewhere on the net
(just Google for it), then parsed the resulting XML to create the index, using a
standard DOM parser. If you have problems parsing the SGML files I think you