If this file has a predefined structure, e.g.: title: something location: new york .... then you can write a simple parser that extracts that information.
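Here is a minimal sketch of such a parser, assuming the keys are known in advance and each key is followed by a colon. The class name FieldExtractor and the method extract are hypothetical names chosen for illustration, and the code uses only the JDK, not Lucene. A value is taken to be everything between one key and the next key (or the end of the input).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Lucene: extracts "key: value" pairs from text.
public class FieldExtractor {

    /**
     * Returns a map from lower-cased key to value for each known key found in the text.
     * Assumes knownKeys is non-empty and that a key is always followed by a colon.
     */
    public static Map<String, String> extract(String text, List<String> knownKeys) {
        // Build an alternation such as (title|location), quoting each key literally.
        String alternation = knownKeys.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Pattern keyPattern = Pattern.compile(
                "\\b(" + alternation + ")\\s*:", Pattern.CASE_INSENSITIVE);

        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = keyPattern.matcher(text);
        String currentKey = null;
        int valueStart = 0;
        while (m.find()) {
            // The previous key's value ends where this key begins.
            if (currentKey != null) {
                fields.put(currentKey, text.substring(valueStart, m.start()).trim());
            }
            currentKey = m.group(1).toLowerCase(Locale.ROOT);
            valueStart = m.end();
        }
        // The last key's value runs to the end of the input.
        if (currentKey != null) {
            fields.put(currentKey, text.substring(valueStart).trim());
        }
        return fields;
    }
}
```

Calling FieldExtractor.extract("Location: New York", List.of("location")) gives a map whose "location" entry is "New York". For a multi-line document you would probably call it once per line, so that a value does not run on into unrelated text.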
But otherwise, unless I misunderstood you, I think this falls outside the scope of Lucene. If I had to give it a long shot, though, I'd try to index all the data using WhitespaceAnalyzer, and then query for "Location". I'd also use the Highlighter in contrib to find the matching segments of text, and take whatever comes after "Location". You would still need to know how much text to take after "Location", though ...

Maybe if you post a sample input here, it'll trigger something in me :).

Shai

On Wed, Aug 12, 2009 at 12:27 AM, xs2Abhishek <abhis...@ontinc.com> wrote:

> Hi,
>
> I am trying to decide whether or not I can use Lucene for my
> requirements, which mainly involve data tagging. I have to be able to parse
> or index a .txt file and then extract text from it accordingly. For example,
> if the input document has some text like "Location: New York", then for this
> input I should be able to extract "New York" if the keyword Location is
> present. I am trying to learn about Lucene and looked into
> "tokensFromAnalysis(analyzer, text)", but I'm still not sure how I could
> extract data using Lucene. Can I use queries to extract this piece of
> information?
>
> Any help on this would be appreciated.
>
> Thanks,
> Abhishek
> --
> View this message in context:
> http://www.nabble.com/How-to-tune-Analyzer-for-Text-Extraction-tp24926082p24926082.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.