Hi All,

I am working on some language data and I need to index and search it. I have used Lucene before for indexing plain text documents (no fancy tricks, just plain text indexing). The data I have now is transcribed text, mostly conversations and interviews, and it is heavily marked up.

I can easily remove the markup, extract the text, and feed it to a Lucene indexer, but I need to preserve some important markup so that the text makes sense at retrieval time. If I leave the required markup intact and index the documents, I fear the markup will be tokenized too and will become searchable. I don't want the markup to be searchable, but I need to keep it attached to the actual text somehow to make retrieval easy.

Can you suggest what to do and how to do it? Correct me if I am wrong. :)
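To make the concern concrete, here is a minimal plain-Java sketch with no Lucene calls in it. The sample markup, the class name `MarkupSplit`, and the helpers `stripMarkup` and `tokenize` are all hypothetical; `tokenize` is only a crude stand-in for what an analyzer might do, not how any Lucene analyzer actually behaves.

```java
import java.util.ArrayList;
import java.util.List;

public class MarkupSplit {

    // Hypothetical helper: drops anything that looks like <...> markup
    // and collapses the leftover whitespace.
    static String stripMarkup(String marked) {
        return marked.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    // Crude stand-in for an analyzer (an assumption, not Lucene's behavior):
    // lowercases the text and splits on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Made-up example of a marked-up transcript line.
        String original = "<turn speaker=\"A\">so <pause/> where were you born</turn>";

        // Tokenizing the raw line: tag and attribute names show up as terms.
        System.out.println(tokenize(original));

        // Tokenizing the stripped text: only the spoken words remain,
        // while 'original' is still available unchanged for display.
        System.out.println(tokenize(stripMarkup(original)));
    }
}
```

What I imagine is keeping both strings per document: the stripped text as the part that gets searched, and the original marked-up line kept alongside it only for display. Whether that maps cleanly onto Lucene fields is exactly what I am unsure about.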
Thanks for the help.
Sulman.