Re: HTML tags and Lucene highlighting

2012-04-05 Thread Koji Sekiguchi
(12/04/06 2:34), okayndc wrote: Hello, I currently use Lucene version 3.0...probably need to upgrade to a more current version soon. The problem that I have is when I test search for an HTML tag (ex. ), Lucene returns the highlighted HTML tag ~ which is what I DO NOT want. Is there a way to "

Re: HTML tags and Lucene highlighting

2012-04-05 Thread okayndc
I want to retain the formatted HTML in a result, but want to ignore (or filter out) HTML tags in a search, if this makes sense? On Thu, Apr 5, 2012 at 3:44 PM, Steven A Rowe wrote: > okayndc, > > A field configured to use HTMLStripCharFilter as part of its index-time > analyzer will strip out HT

RE: HTML tags and Lucene highlighting

2012-04-05 Thread Steven A Rowe
okayndc, A field configured to use HTMLStripCharFilter as part of its index-time analyzer will strip out HTML tags before index terms are created by the tokenizer, so HTML tags will not be put into the index. As a result, queries for HTML tags cannot match the original documents' HTML tags (in
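
The index-time behaviour Steve describes can be illustrated with a minimal, library-free sketch. It is not Lucene's HTMLStripCharFilter: the class name, the regex-based `stripTags`, and the toy tokenizer are all assumptions made for illustration. The point is only that once markup is removed before tokenization, tag names never become index terms, while the original HTML can still be kept separately for display.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for "strip HTML, then tokenize"; not a Lucene API.
public class StripBeforeIndexSketch {

    // Crude tag removal: replaces anything shaped like <...> with a space.
    // A real char filter also handles entities, comments, scripts, and offsets.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    // Toy analyzer: strip tags, lowercase, split on non-alphanumerics.
    static List<String> indexTerms(String html) {
        List<String> terms = new ArrayList<>();
        String text = stripTags(html).toLowerCase(Locale.ROOT);
        for (String token : text.split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // The stored value keeps its markup; only the indexed terms are stripped.
        String stored = "<p>Lucene <b>highlighting</b> example</p>";
        List<String> terms = indexTerms(stored);
        System.out.println("stored:  " + stored);
        System.out.println("indexed: " + terms);
        System.out.println("matches 'b'? " + terms.contains("b"));
    }
}
```

With terms produced this way, a query for a tag name such as `b` has nothing to match, so the highlighter has no tag to mark up.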

Re: HTML tags and Lucene highlighting

2012-04-05 Thread okayndc
Hello, I want to ignore HTML tags within a search. ~ I should not be able to search for an HTML tag (ex. ) and get back the highlighted HTML tag (ex. ) in a result set. Thanks On Thu, Apr 5, 2012 at 3:24 PM, Steven A Rowe wrote: > Hi okayndc, > > What *do* you want? > > Steve > > -Origina

Re: Slow merging after upgrading to 3.5

2012-04-05 Thread Ivan Brusic
Hi Mike, Response inline: On Thu, Apr 5, 2012 at 11:36 AM, Michael McCandless wrote: > I'm assuming this is a "build once and never change" index...?  Else, > it sounds like you should never run forceMerge... Correct. The forceMerge was merely to preserve the previous 2.3 behavior of using opti

RE: HTML tags and Lucene highlighting

2012-04-05 Thread Steven A Rowe
Hi okayndc, What *do* you want? Steve -Original Message- From: okayndc [mailto:bodymo...@gmail.com] Sent: Thursday, April 05, 2012 1:34 PM To: java-user@lucene.apache.org Subject: HTML tags and Lucene highlighting Hello, I currently use Lucene version 3.0...probably need to upgrade to

Re: Slow merging after upgrading to 3.5

2012-04-05 Thread Michael McCandless
I'm assuming this is a "build once and never change" index...? Else, it sounds like you should never run forceMerge... To preserve insertion order you just need to use one of the Log*MergePolicy (which you are already doing). Merge factor doesn't affect this... For the fastest way to get to a s

Slow merging after upgrading to 3.5

2012-04-05 Thread Ivan Brusic
I recently migrated a legacy Lucene application from 2.3 to 3.5. The code was filled with numerous custom filters/analyzers/similarities/collectors. It took about a week to convert all the token streams to the new API and remove deprecated classes. Most importantly, there is a collector that enables fa

HTML tags and Lucene highlighting

2012-04-05 Thread okayndc
Hello, I currently use Lucene version 3.0...probably need to upgrade to a more current version soon. The problem that I have is when I test search for an HTML tag (ex. ), Lucene returns the highlighted HTML tag ~ which is what I DO NOT want. Is there a way to "filter" HTML tags? I have read up

Re: Document-Ids and Merges

2012-04-05 Thread Christoph Kaser
Thank you both Mike and Shai for your answers. If anyone has a similar problem: I ended up using a column that provides my own "document ids", whose values I got using the FieldCache. I then precalculate the indirection per IndexReader and store it in a WeakHashMap to save the extra lookup.
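
A rough sketch of the caching pattern Christoph describes, written without any Lucene dependency. The class name `PerReaderIdCache`, the generic reader type `R`, and the `loader` function (standing in for whatever reads the id column, e.g. via the FieldCache) are assumptions for illustration, not his actual code. The idea is that the id array is computed once per reader and held in a WeakHashMap, so the entry becomes collectable once the reader is no longer referenced.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical per-reader cache of application-level document ids.
public class PerReaderIdCache<R> {

    // Weak keys: an entry can be reclaimed once its reader is unreachable.
    private final Map<R, int[]> cache = new WeakHashMap<>();

    // Assumed loader: returns the id for every internal doc number of a reader.
    private final Function<R, int[]> loader;

    private int loads = 0;

    public PerReaderIdCache(Function<R, int[]> loader) {
        this.loader = loader;
    }

    // Maps a reader-local doc number to the application's own document id.
    public synchronized int externalId(R reader, int docId) {
        int[] ids = cache.get(reader);
        if (ids == null) {
            ids = loader.apply(reader); // computed once per reader
            cache.put(reader, ids);
            loads++;
        }
        return ids[docId];
    }

    // Number of times the loader actually ran (useful for checking the cache).
    public synchronized int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        Object reader = new Object(); // placeholder for an IndexReader
        PerReaderIdCache<Object> cache =
                new PerReaderIdCache<>(r -> new int[] {100, 101, 102});
        System.out.println(cache.externalId(reader, 1));
        System.out.println(cache.externalId(reader, 2));
        System.out.println("loader runs: " + cache.loadCount());
    }
}
```

The array must not hold a strong reference back to the reader, otherwise the weak key would never be cleared.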

lucene matchalldocsquery

2012-04-05 Thread jianwen lou
I use QueryParser parser = new QueryParser(Version.LUCENE_32, "title", analyzer); and I get results back, but when I change to MatchAllDocsQuery I get 0 results. Did I miss something in my code? I googled and searched the mailing list; it seems many users are confused by MatchAllDocsQuery. Any advice? tha

can not store field when field store.yes,index.no

2012-04-05 Thread jianwen lou
I use Lucene 3.5 in my app. I set a field to Store.YES, Index.NO. When I look for the stored field value via Luke 3.5, I do not see it. But when I set the field to Store.YES, Index.NOT_ANALYZED, I get the stored field value. Why is that? Thanks -- twitter.com/loujianwen