Lucene will probably only help if you already know what you are
looking for, e.g. if you search for a given person, a given street or
a given time interval.
Is this what you want to do?
If you are instead looking for a way to extract every person, street
and time interval that a document mentions, you probably want a
natural language processing project that can do something like
semantic part-of-speech tagging for you.
karl
On 13 Jan 2010, at 17:39, Ortelli, Gian Luca wrote:
Hi community,
I have a general understanding of Lucene concepts, and I'm wondering
if it's the right tool for my job:
- I need to extract data such as time intervals ("8am - 12pm") and
street addresses from a set of files. The common issue with these data
units is that they contain spaces and are not always definable through
regexes.
- the extraction must take "proximity" into consideration: for
example, a mail address which is close to the word "Contacts" will
receive a higher rank, since I'm looking for contact data.
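
To make the proximity requirement concrete, here is a minimal sketch in
plain Java, without Lucene. It assumes that "mail address" means an
e-mail address, that candidates can be found with a rough regex, and
that the rank is simply 1 / (1 + character distance to the nearest
occurrence of the anchor word). The class name ProximityRank, the
EMAIL pattern and the scoring formula are all hypothetical choices
made for illustration, not something Lucene provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProximityRank {
    // Assumption: a deliberately rough e-mail pattern, for illustration only.
    static final Pattern EMAIL =
            Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");

    /** One extracted candidate together with its proximity score. */
    public static final class Hit {
        public final String text;
        public final double score;

        Hit(String text, double score) {
            this.text = text;
            this.score = score;
        }
    }

    /**
     * Finds e-mail candidates in doc and ranks them by how close they start
     * to the nearest case-insensitive occurrence of anchor.
     * Score is 1 / (1 + distance in characters), or 0 if anchor never occurs.
     */
    public static List<Hit> rank(String doc, String anchor) {
        // Collect the start offsets of every occurrence of the anchor word.
        List<Integer> anchorPositions = new ArrayList<>();
        Matcher a = Pattern.compile(Pattern.quote(anchor), Pattern.CASE_INSENSITIVE)
                .matcher(doc);
        while (a.find()) {
            anchorPositions.add(a.start());
        }

        // Score each candidate by its distance to the nearest anchor.
        List<Hit> hits = new ArrayList<>();
        Matcher m = EMAIL.matcher(doc);
        while (m.find()) {
            int nearest = Integer.MAX_VALUE;
            for (int pos : anchorPositions) {
                nearest = Math.min(nearest, Math.abs(m.start() - pos));
            }
            double score = anchorPositions.isEmpty() ? 0.0 : 1.0 / (1.0 + nearest);
            hits.add(new Hit(m.group(), score));
        }

        // Highest score first.
        hits.sort((x, y) -> Double.compare(y.score, x.score));
        return hits;
    }

    public static void main(String[] args) {
        String doc = "From bob@example.com, some long newsletter text here. "
                + "Contacts: alice@example.org";
        for (Hit h : rank(doc, "Contacts")) {
            System.out.println(h.text + " " + h.score);
        }
    }
}
```

In this sketch the address next to "Contacts" is ranked above the one
in the header. Measuring distance in characters is only one option;
a token-based distance would likely be more robust.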
Do you think I can get any advantage from building a solution on
Lucene?
Gianluca