Re: How to tune Analyzer for Text Extraction

2009-08-12 Thread xs2Abhishek
Hi, well, you completely understood my problem. The point you mentioned, about how much to extract after the word Location, is something I'll have to figure out. So let's say the input to my system is: " Location : Montvale, NJ Duration : 7 months ". Now the problem is when the in…

2009-08-12 Thread xs2Abhishek
Hi, thanks for your replies; they really helped me a lot. Thanks & Regards, Abhishek -- View this message in context: http://www.nabble.com/How-to-tune-Analyzer-for-Text-Extraction-tp24926082p24938899.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.

2009-08-12 Thread Julien Nioche
Hi, you should also have a look at GATE (http://gate.ac.uk), which comes with a named-entity recognition (NER) application called ANNIE. You could use it to analyse your documents before indexing them with Lucene or Solr. As Grant mentioned, UIMA can also be used for that, as a number of NER annotators are available for it…

2009-08-12 Thread Grant Ingersoll
On Aug 11, 2009, at 5:27 PM, xs2Abhishek wrote: Hi, I am trying to decide whether or not I can use Lucene for my requirements, which mainly involve data tagging. I have to be able to parse or index a .txt file and then extract text accordingly. For example, if the inpu…

2009-08-11 Thread Shai Erera
If this file has a predefined structure, e.g. "title: something location: new york", then you can write a simple parser that extracts that information. Otherwise, I think this falls outside the scope of Lucene, unless I misunderstood you. If I had to take a long shot, though, I'd try to in…
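A minimal sketch of the kind of "simple parser" suggested above, using plain java.util.regex rather than any Lucene API. It assumes the field labels are known in advance (the LABELS list of Title, Location and Duration is hypothetical) and that each label is followed by a colon. Each value is taken to run up to the next known label, which is one way to answer "how much to extract after the word Location" for input like " Location : Montvale, NJ Duration : 7 months ".

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LabelExtractor {
    // Hypothetical label set: assumes the field names are known in advance.
    static final List<String> LABELS = List.of("Title", "Location", "Duration");

    // Splits text on "<Label> :" markers; each value runs up to the next known label.
    static Map<String, String> extract(String text, List<String> labels) {
        StringBuilder alternation = new StringBuilder();
        for (String label : labels) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(label));
        }
        // Matches e.g. "Location :" or "location:" as a whole word followed by a colon.
        Pattern marker = Pattern.compile("\\b(" + alternation + ")\\s*:", Pattern.CASE_INSENSITIVE);
        Matcher m = marker.matcher(text);

        Map<String, String> fields = new LinkedHashMap<>();
        String currentLabel = null;
        int valueStart = 0;
        while (m.find()) {
            if (currentLabel != null) {
                // The previous value ends where this label begins.
                fields.put(currentLabel, text.substring(valueStart, m.start()).trim());
            }
            currentLabel = m.group(1); // label as written in the text
            valueStart = m.end();
        }
        if (currentLabel != null) {
            // The last value runs to the end of the text.
            fields.put(currentLabel, text.substring(valueStart).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        String input = " Location : Montvale, NJ Duration : 7 months ";
        extract(input, LABELS).forEach((label, value) -> System.out.println(label + " -> " + value));
    }
}
```

For the example input this should print "Location -> Montvale, NJ" and "Duration -> 7 months". The design choice is to delimit values by the next label instead of by a fixed length or word count, so it only works when the document really follows such a predefined structure; a value that itself contains a known label followed by a colon would be split incorrectly, and free text with no labels at all is the case where NER tools such as GATE or UIMA come in.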

2009-08-11 Thread Michael Wechner
xs2Abhishek wrote: Hi, I am trying to decide whether or not I can use Lucene for my requirements, which mainly involve data tagging. I have to be able to parse or index a .txt file and then extract text accordingly. For example, if the input document has some text like: "Loca…