Re: Indexing HTML pages and phrases

2007-03-16 Thread Doron Cohen
For phrase searches there's no need to "detect the phrases" at indexing time: the position of each "word" is saved in the index and then used at search time to match phrase queries (also see the 'query syntax document'). Lucene takes plain text as document input - extraction of content text and prope
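
To illustrate the point about positions, here is a minimal sketch of a toy positional index in plain Java. It is not Lucene's implementation and uses no Lucene classes; the class name `PositionalIndexSketch` and its methods are made up for this example, and the tokenizer (lowercase, split on non-letter/digit characters) is an assumption. It only shows why recording each word's position at indexing time is enough to answer phrase queries at search time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy positional index. Illustrates the idea only; NOT how Lucene is implemented. */
public class PositionalIndexSketch {
    // term -> (docId -> positions at which the term occurs in that document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    /** Indexing time: lowercase, split on non-letter/digit characters, record word positions. */
    public void addDocument(int docId, String text) {
        int position = 0;
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) continue; // split() may yield a leading empty token
            postings.computeIfAbsent(token, t -> new HashMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(position++);
        }
    }

    /** Search time: a document matches if the words occur at consecutive positions. */
    public List<Integer> phraseSearch(String phrase) {
        // The phrase is split on whitespace only (a simplification).
        String[] words = phrase.toLowerCase().trim().split("\\s+");
        List<Integer> hits = new ArrayList<>();
        Map<Integer, List<Integer>> first = postings.get(words[0]);
        if (first == null) return hits;
        for (Map.Entry<Integer, List<Integer>> entry : first.entrySet()) {
            int docId = entry.getKey();
            for (int start : entry.getValue()) {
                if (matchesAt(docId, start, words)) {
                    hits.add(docId);
                    break; // one occurrence per document is enough
                }
            }
        }
        Collections.sort(hits);
        return hits;
    }

    // Checks that words[1..] appear at start+1, start+2, ... in the given document.
    private boolean matchesAt(int docId, int start, String[] words) {
        for (int i = 1; i < words.length; i++) {
            Map<Integer, List<Integer>> byDoc = postings.get(words[i]);
            if (byDoc == null) return false;
            List<Integer> positions = byDoc.get(docId);
            if (positions == null || !positions.contains(start + i)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        PositionalIndexSketch index = new PositionalIndexSketch();
        index.addDocument(1, "Lucene indexes plain text documents");
        index.addDocument(2, "Plain documents, text only");
        System.out.println(index.phraseSearch("plain text"));
    }
}
```

Both sample documents contain "plain" and "text", but only the first has them next to each other, so only it matches the phrase. No phrase list had to be prepared when the documents were added.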

Re: Indexing HTML pages and phrases

2007-03-14 Thread Bhavin Pandya
Hi Maryam, You can index the content of a specific field as UN_TOKENIZED and then do a phrase search on that field. It will search only for phrases, not tokens. To index HTML pages you can use any HTML parser. This may be useful to you: http://lucene.apache.org/java/docs/api/org/apache

Re: Indexing HTML pages and phrases

2007-03-14 Thread Bhavin Pandya
- Original Message - From: "Maryam" <[EMAIL PROTECTED]> To: Sent: Thursday, March 15, 2007 7:55 AM Subject: Indexing HTML pages and phrases Hi, I am wondering if we can index a phrase (not a term) in Lucene? Also, I am not sure if it can index HTML pages. I need to have access to the