(12/04/06 2:34), okayndc wrote:
Hello,
I currently use Lucene version 3.0...probably need to upgrade to a more
current version soon.
The problem that I have is when I test search for an HTML tag (ex.
), Lucene returns
the highlighted HTML tag ~ which is what I DO NOT want. Is there a way to
"
I want to retain the formatted HTML in a result, but I want to ignore (or
filter out) HTML tags in a search, if this makes sense.
On Thu, Apr 5, 2012 at 3:44 PM, Steven A Rowe wrote:
> okayndc,
>
> A field configured to use HTMLStripCharFilter as part of its index-time
> analyzer will strip out HT
okayndc,
A field configured to use HTMLStripCharFilter as part of its index-time
analyzer will strip out HTML tags before index terms are created by the
tokenizer, so HTML tags will not be put into the index. As a result, queries
for HTML tags cannot match the original documents' HTML tags (in
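The effect Steve describes can be sketched without Lucene. What follows is a minimal illustration in plain Java, not Lucene's HTMLStripCharFilter or its API: the class HtmlStripSketch and its methods are hypothetical names made up for this example, and the regex is a crude stand-in for real HTML handling. It only shows that once markup is removed before tokenizing, tag names never become index terms, while the stored text can stay untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative stand-in only: this is NOT Lucene's HTMLStripCharFilter.
final class HtmlStripSketch {

    // Replace every <...> run with a space so tag names never reach the tokenizer.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    // Hypothetical "analyzer": strip tags, lowercase, split on non-alphanumerics.
    static List<String> indexTerms(String html) {
        List<String> terms = new ArrayList<>();
        String text = stripTags(html).toLowerCase(Locale.ROOT);
        for (String token : text.split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // The stored value keeps its markup; only the indexed terms lose it.
        String stored = "<p>Lucene <b>highlighting</b> example</p>";
        System.out.println(indexTerms(stored));
    }
}
```

With terms built this way, a query for "b" or "p" has nothing to match, so there is no tag for a highlighter to mark up.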
Hello,
I want to ignore HTML tags within a search. ~ I should not be able to
search for an HTML tag (ex. ) and get back the highlighted HTML tag
(ex. ) in a result set.
Thanks
On Thu, Apr 5, 2012 at 3:24 PM, Steven A Rowe wrote:
> Hi okayndc,
>
> What *do* you want?
>
> Steve
>
> -Origina
Hi Mike,
Response inline:
On Thu, Apr 5, 2012 at 11:36 AM, Michael McCandless
wrote:
> I'm assuming this is a "build once and never change" index...? Else,
> it sounds like you should never run forceMerge...
Correct. The forceMerge was merely to preserve the previous 2.3
behavior of using opti
Hi okayndc,
What *do* you want?
Steve
-Original Message-
From: okayndc [mailto:bodymo...@gmail.com]
Sent: Thursday, April 05, 2012 1:34 PM
To: java-user@lucene.apache.org
Subject: HTML tags and Lucene highlighting
Hello,
I currently use Lucene version 3.0...probably need to upgrade to
I'm assuming this is a "build once and never change" index...? Else,
it sounds like you should never run forceMerge...
To preserve insertion order you just need to use one of the
Log*MergePolicy (which you are already doing). Merge factor doesn't
affect this...
For the fastest way to get to a s
I recently migrated a legacy Lucene application from 2.3 to 3.5. The
code was filled with numerous custom
filters/analyzers/similarities/collectors. It took about a week to convert
all the token streams to the new API and remove deprecated classes.
Most importantly, there is a collector that enables fa
Hello,
I currently use Lucene version 3.0...probably need to upgrade to a more
current version soon.
The problem that I have is when I test search for an HTML tag (ex.
), Lucene returns
the highlighted HTML tag ~ which is what I DO NOT want. Is there a way to
"filter" HTML tags?
I have read up
Thank you both Mike and Shai for your answers.
If anyone has a similar problem:
I ended up using a column that provides my own "document ids", whose
values I got using the FieldCache.
I then precalculate the indirection per IndexReader and store it in a
WeakHashMap to save the extra lookup.
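A minimal sketch of that caching idea, in plain Java with no Lucene dependency. It assumes the indirection is an array mapping a reader's internal document number to the application's own id. The class PerReaderIdCache, its methods, and the loader function are hypothetical names for illustration; in the real application the loader would be whatever reads the id column (for example through the FieldCache), and the key type R would be the IndexReader.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical helper: caches one precomputed id array per reader instance.
final class PerReaderIdCache<R> {

    // Weak keys: an entry can be collected once its reader is no longer referenced.
    private final Map<R, int[]> cache = new WeakHashMap<>();

    // Assumed to build the internal-doc-number -> own-id array for a reader.
    private final Function<R, int[]> loader;

    PerReaderIdCache(Function<R, int[]> loader) {
        this.loader = loader;
    }

    // Returns the cached array, computing it on first use for this reader.
    synchronized int[] idsFor(R reader) {
        return cache.computeIfAbsent(reader, loader);
    }

    // Translates an internal document number into the application's own id.
    int externalId(R reader, int docId) {
        return idsFor(reader)[docId];
    }
}
```

Keying the map weakly on the reader means nothing has to be evicted by hand when a reader is closed and dropped, which is presumably why a WeakHashMap was chosen.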
When I use QueryParser parser = new QueryParser(Version.LUCENE_32,
"title", analyzer); I get results back, but when I change to
MatchAllDocsQuery I get 0 results.
Did I miss something in my code? I googled and searched the mailing list; it
seems many users are confused by MatchAllDocsQuery. Any advice?
tha
I use Lucene 3.5 in my app. I set a field to Store.YES, Index.NO.
When I look for the stored field value via Luke 3.5, I do not see it.
But if I set the field to Store.YES, Index.NOT_ANALYZED, I get the stored
field value.
Why is that? Thanks
--
twitter.com/loujianwen