For phrase searches there is no need to "detect the phrases" at indexing time:
the position of each "word" is saved in the index and then used at search
time to match phrase queries (see also the query syntax document).
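
To make the idea of stored positions concrete, here is a minimal sketch in plain Java. It is not Lucene code and does not use any Lucene API; the class name PositionalIndexSketch, its methods, and the toy tokenizer (lowercase, split on non-alphanumerics) are all assumptions made up for illustration. It only shows why recording each word's position at indexing time is enough to answer phrase queries later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index, not part of Lucene.
public class PositionalIndexSketch {
    // term -> (document id -> positions at which the term occurs)
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    // Toy analyzer (an assumption): lowercase, split on non-alphanumerics.
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Indexing time: record the position of every word, nothing phrase-specific.
    public void addDocument(int docId, String text) {
        int position = 0;
        for (String token : tokenize(text)) {
            postings.computeIfAbsent(token, t -> new HashMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(position++);
        }
    }

    // Search time: a document matches if the phrase terms occur at
    // consecutive positions starting from some occurrence of the first term.
    public Set<Integer> searchPhrase(String phrase) {
        Set<Integer> hits = new TreeSet<>();
        List<String> terms = tokenize(phrase);
        if (terms.isEmpty()) {
            return hits;
        }
        Map<Integer, List<Integer>> first = postings.get(terms.get(0));
        if (first == null) {
            return hits;
        }
        for (Map.Entry<Integer, List<Integer>> entry : first.entrySet()) {
            for (int start : entry.getValue()) {
                if (matchesAt(terms, entry.getKey(), start)) {
                    hits.add(entry.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    // Checks that terms 1..n-1 sit at start+1 .. start+n-1 in the given document.
    private boolean matchesAt(List<String> terms, int docId, int start) {
        for (int i = 1; i < terms.size(); i++) {
            Map<Integer, List<Integer>> byDoc = postings.get(terms.get(i));
            if (byDoc == null) {
                return false;
            }
            List<Integer> positions = byDoc.get(docId);
            if (positions == null || !positions.contains(start + i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        PositionalIndexSketch index = new PositionalIndexSketch();
        index.addDocument(1, "The quick brown fox");
        index.addDocument(2, "The brown quick fox");
        System.out.println(index.searchPhrase("quick brown")); // [1]
        System.out.println(index.searchPhrase("quick fox"));   // [2]
    }
}
```

Both documents contain the same words, but only the word order recorded in the positions decides which one matches a given phrase.
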
Lucene takes plain text as document input - extraction of content text and
prope
Hi Maryam,
You can index the content of a specific field as UN_TOKENIZED and then search
for the phrase on that field.
The whole field value is indexed as a single term, so a search on that field
matches only the complete phrase, not the individual tokens.
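
As a rough illustration of what un-tokenized indexing implies, here is a small sketch in plain Java. It does not call Lucene; the class UntokenizedFieldSketch and its methods are hypothetical names invented for this example, and the whitespace tokenizer is an assumption. It only contrasts storing a value as one term with storing it as separate word terms.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model of a single indexed field, not part of Lucene.
public class UntokenizedFieldSketch {
    // term -> ids of the documents containing that term
    private final Map<String, Set<Integer>> terms = new HashMap<>();

    private void addTerm(String term, int docId) {
        terms.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
    }

    // Tokenized: every whitespace-separated word becomes its own term.
    public void addTokenized(int docId, String value) {
        for (String word : value.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!word.isEmpty()) {
                addTerm(word, docId);
            }
        }
    }

    // Un-tokenized: the complete value is kept as one single term.
    public void addUntokenized(int docId, String value) {
        addTerm(value.toLowerCase(Locale.ROOT), docId);
    }

    // Exact term lookup, with no analysis of the query string.
    public Set<Integer> lookup(String term) {
        return terms.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        UntokenizedFieldSketch field = new UntokenizedFieldSketch();
        field.addUntokenized(1, "information retrieval");
        field.addTokenized(2, "information retrieval");
        System.out.println(field.lookup("information retrieval")); // [1]
        System.out.println(field.lookup("information"));           // [2]
    }
}
```

The trade-off is that the un-tokenized value can only be found by its exact full text, while the tokenized one can be found by any single word.
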
To index HTML pages you can use any HTML parser.
This may be useful to you:
http://lucene.apache.org/java/docs/api/org/apache
- Original Message -
From: "Maryam" <[EMAIL PROTECTED]>
To:
Sent: Thursday, March 15, 2007 7:55 AM
Subject: Indexing HTML pages and phrases
Hi,
I am wondering if we can index a phrase (not a term) in
Lucene? Also, I am not sure if it can index HTML
pages? I need to have access to the