Lucene will probably only help if you already know what you are looking for, e.g. you want to search for a given person, a given street or a given time interval.

Is this what you want to do?

If you are instead looking for a way to extract every person, street and time interval that a document mentions, you probably want a natural language processing project that can do something like semantic part-of-speech tagging for you.
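
To make the difference concrete, here is a minimal sketch of the extraction side in plain Java, without Lucene. It assumes the candidates can be found with a deliberately loose e-mail regex, and it treats "proximity" as the character distance to the nearest occurrence of a cue word such as "Contacts". The class ProximityRanker, its Candidate type and the rank method are hypothetical names made up for this illustration; they are not part of Lucene or of any NLP library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: finds e-mail-like strings and
// ranks them by character distance to the nearest occurrence of a cue word.
class ProximityRanker {

    // One extracted candidate and its distance (in characters) to the cue word.
    static final class Candidate {
        final String text;
        final int distance;

        Candidate(String text, int distance) {
            this.text = text;
            this.distance = distance;
        }
    }

    // Deliberately loose pattern (an assumption); real data needs more care.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    static List<Candidate> rank(String document, String cueWord) {
        // Collect the start offset of every case-insensitive cue word match.
        List<Integer> cuePositions = new ArrayList<>();
        Matcher cue = Pattern
                .compile(Pattern.quote(cueWord), Pattern.CASE_INSENSITIVE)
                .matcher(document);
        while (cue.find()) {
            cuePositions.add(cue.start());
        }

        // For each candidate, keep the distance to the closest cue word.
        // If the cue word never occurs, the distance stays Integer.MAX_VALUE.
        List<Candidate> result = new ArrayList<>();
        Matcher m = EMAIL.matcher(document);
        while (m.find()) {
            int best = Integer.MAX_VALUE;
            for (int pos : cuePositions) {
                best = Math.min(best, Math.abs(m.start() - pos));
            }
            result.add(new Candidate(m.group(), best));
        }

        // Smallest distance first, i.e. closest to the cue word ranks highest.
        result.sort(Comparator.comparingInt((Candidate c) -> c.distance));
        return result;
    }
}
```

Calling ProximityRanker.rank(text, "Contacts") returns the candidates with the one nearest to "Contacts" first. This only works for data a regex can describe; for things like street addresses, which often cannot be matched that way, you are back at the NLP tooling mentioned above.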


      karl

On 13 Jan 2010, at 17:39, Ortelli, Gian Luca wrote:

Hi community,



I have a general understanding of Lucene concepts, and I'm wondering if
it's the right tool for my job:



- I need to extract data such as time intervals ("8am - 12pm") and street
addresses from a set of files. The common issue with these data units is
that they contain spaces and are not always definable through regexes.



- the extraction must take "proximity" into consideration: for
example, a mail address which is close to the word "Contacts" will
receive a higher rank, since I'm looking for contact data.



Do you think I can get any advantage from building a solution on Lucene?



 Gianluca


