On 4/26/06, Hannes Carl Meyer <[EMAIL PROTECTED]> wrote:
>
> Hi All,
>
> I would like enable users to do an acronym search on my index.
> My idea is the following:
>
> 1.) Extract acronyms (ABS, ESP, VCG etc.) from the given document (which
> is going to be indexed)

In case you haven't already looked at it, you might find this useful:
http://www.cs.waikato.ac.nz/~nzdl/publications/1999/Yeates-Auto-Extract.pdf

> 2.) Store the extracted acronyms in a field, for example called "case"
>
> 3.) On search, asking the user to use case:"ABS" to search for acronyms

I would rather store them in the same field as the other terms, so that you can run phrase queries. Store the acronyms just as you would store synonyms; the "Lucene in Action" book has more information on how to store synonyms. This would facilitate queries like "USA President". If you store "USA" in a separate field, you wouldn't be able to match this query, because a phrase query needs all of its terms in one field.

> Any experience with this kind of pattern? Other ideas or best practices?

I would also look at HMMs/CRFs to extract acronyms. You need to come up with a list of features to identify a potential acronym. For example:

- All caps
- The acronym appears repeatedly in the rest of the text
- Found in the acronym dictionary
...etc.

Hope this helps,

--Rajesh Munavalli
Blog: http://munavalli.blogspot.com
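
P.S. To make the feature list above a bit more concrete, here is a rough Python sketch. It is not an HMM or CRF; it only computes the three features mentioned (all caps, repeated in the text, found in a dictionary) and sums them as a naive score, where a trained sequence model would normally weigh them. The names `KNOWN_ACRONYMS`, `acronym_features`, and `likely_acronyms`, the 2-6 letter length limit, and the score threshold are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-in for a real acronym dictionary.
KNOWN_ACRONYMS = {"ABS", "ESP", "USA"}

# Purely alphabetic tokens; digits and punctuation are ignored in this sketch.
TOKEN_RE = re.compile(r"[A-Za-z]+")


def acronym_features(text, known=KNOWN_ACRONYMS):
    """Return {token: feature dict} for each all-caps token of 2-6 letters."""
    counts = Counter(TOKEN_RE.findall(text))
    features = {}
    for token, count in counts.items():
        # Feature 1: all caps (also used here as the candidate filter).
        if token.isupper() and 2 <= len(token) <= 6:
            features[token] = {
                "all_caps": True,
                # Feature 2: appears more than once in the text.
                "repeated": count > 1,
                # Feature 3: present in the acronym dictionary.
                "in_dictionary": token in known,
            }
    return features


def likely_acronyms(text, min_score=2, known=KNOWN_ACRONYMS):
    """Naive scoring: keep candidates with at least min_score true features."""
    scored = acronym_features(text, known)
    return sorted(
        token for token, feats in scored.items()
        if sum(feats.values()) >= min_score
    )


if __name__ == "__main__":
    sample = "The ABS system works with ESP. ABS is standard; VCG is optional."
    print(likely_acronyms(sample))
```

The tokens that pass could then be indexed in the same field as the rest of the text, as suggested above. In practice the features would be fed to a trained model instead of being summed.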