Hi,
I am new to Lucene. Recently I was assigned some Lucene-related work
items.
Now there is one problem. Previously we used StandardAnalyzer in our
application, and our application has been online for about two years.
Now we must write a custom Analyzer to replace the StandardAnalyzer for
enhanc…
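For reference, a custom Analyzer in the Lucene 3.x API is a small class.
A minimal, hypothetical sketch (the tokenizer/filter chain shown is just
an illustration, not the poster's actual requirement):

  import java.io.Reader;

  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.LowerCaseFilter;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.standard.StandardTokenizer;
  import org.apache.lucene.util.Version;

  // Minimal custom Analyzer sketch (Lucene 3.0.x). The filter chain is
  // only an example; the real chain depends on the enhancement needed.
  public final class MyCustomAnalyzer extends Analyzer {
      @Override
      public TokenStream tokenStream(String fieldName, Reader reader) {
          TokenStream stream = new StandardTokenizer(Version.LUCENE_30, reader);
          stream = new LowerCaseFilter(stream); // example filter
          return stream;
      }
  }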
Hi,
You were using a system for two years, and it used an index created with
Lucene and the StandardAnalyzer. So there must be index creation code in
your system.
Anyway, since you have the book "*Lucene in action*", you can find out how
to create an index by reading chapter 2 (Indexing). Please…
Hi,
> "If you’re changing analyzers, you should rebuild your index using the
new analyzer so that all documents are analyzed in the same manner."
It says everything: Take your original data and re-create the index.
Indexing is a lossy operation, so you must recreate the index using *all*
the original data.
Hmm, I see. Thanks very much.
2011/1/21 Uwe Schindler
> Hi,
>
> > "If you’re changing analyzers, you should rebuild your index using the
> new analyzer so that all documents are analyzed in the same manner."
>
> It says everything: Take your original data and re-create the index.
> Indexing is a lossy operation, so you must recreate the index using
> *all* the original data.
The standard recommendation for paging is to re-execute the search
for second and subsequent pages and return the second or subsequent
chunk of hits. Would that not work in your case?
An alternative is to read and cache hits from the initial search but
that is generally more complex.
--
Ian.
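For what it's worth, the re-execute approach is only a few lines in the
3.x API; a minimal sketch, assuming "searcher", "query", a zero-based
"page" and PAGE_SIZE come from the surrounding application:

  // Re-execute the search for each page and return that page's slice.
  int end = (page + 1) * PAGE_SIZE;                 // hits needed up to this page
  TopDocs topDocs = searcher.search(query, end);
  ScoreDoc[] hits = topDocs.scoreDocs;
  for (int i = page * PAGE_SIZE; i < Math.min(hits.length, end); i++) {
      Document doc = searcher.doc(hits[i].doc);     // stored fields for display
      // render the hit ...
  }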
First of all, try it in a different folder than your current index folder.
The new analyzer will build a different index from the same data. First
create the index in a different folder, then replace your current index
files with the new ones. If it works as expected, replace the code and you
are done.
2011/1/21 黄
Hi,
Each night I optimize an index that contains 35 million docs. It takes
about 1.5 hours. For maintenance reasons, it may happen that the machine
gets rebooted. In that case, the server gets a chance to shut down
gracefully, but eventually the reboot script will kill the processes that
did not…
The problem is that, due to the "filtering" AFTER having searched the
index, we don't know how many TopDocs to read in order to have "enough"
for page x.
Does Lucene's search allow injecting a kind of "voter"/"vetoer" that is
called for every hit (ScoreDoc) Lucene has encountered? This voter should…
The best thing is to re-index from your original source data, but if that
is not available, you can also re-index from stored fields, assuming that
you created the index using stored fields for your text fields. You would
have to write custom code to retrieve the stored values (not the actual
terms, since…
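A minimal sketch of that custom code in the 3.x API, assuming every field
you need was stored; oldDir, newDir and newAnalyzer are placeholders:

  IndexReader reader = IndexReader.open(oldDir);          // the existing index
  IndexWriter writer = new IndexWriter(newDir, newAnalyzer,
          true, IndexWriter.MaxFieldLength.UNLIMITED);    // create the new index
  for (int i = 0; i < reader.maxDoc(); i++) {
      if (reader.isDeleted(i)) continue;                  // skip deleted docs
      Document doc = reader.document(i);                  // stored fields only
      writer.addDocument(doc);                            // re-analyzed with newAnalyzer
  }
  writer.close();
  reader.close();

Note that index-time-only data (unstored fields) cannot be recovered
this way.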
> The problem is that, due to the "filtering" AFTER having searched the
> index, we don't know how many TopDocs to read in order to have "enough"
> for page x.
Think of a number and double it? Unless the number gets really high,
Lucene is generally plenty fast enough. Or read n and, if after
filtering…
You can write a custom Collector that does this (simply not delegating the
collect(int) call) and wrap TopDocsCollector with it.
Alternative: plug in a Filter that filters your documents during the query.
As doing this while iterating over hits is often costly, the ideal solution
would be to create a cach…
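A sketch of such a wrapping Collector against the 3.x API; the DocVeto
callback interface is hypothetical, invented here only to show the shape:

  import java.io.IOException;

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.search.Collector;
  import org.apache.lucene.search.Scorer;
  import org.apache.lucene.search.TopDocsCollector;

  // Delegates to a TopDocsCollector, but only for hits the veto accepts.
  public class VetoingCollector extends Collector {
      // Hypothetical callback: return false to drop a hit.
      public interface DocVeto { boolean accept(int globalDocId); }

      private final TopDocsCollector<?> delegate;
      private final DocVeto veto;
      private int docBase;

      public VetoingCollector(TopDocsCollector<?> delegate, DocVeto veto) {
          this.delegate = delegate;
          this.veto = veto;
      }

      @Override public void setScorer(Scorer scorer) throws IOException {
          delegate.setScorer(scorer);
      }

      @Override public void setNextReader(IndexReader reader, int docBase)
              throws IOException {
          this.docBase = docBase;
          delegate.setNextReader(reader, docBase);
      }

      @Override public void collect(int doc) throws IOException {
          if (veto.accept(docBase + doc)) {   // only keep accepted hits
              delegate.collect(doc);
          }
      }

      @Override public boolean acceptsDocsOutOfOrder() {
          return delegate.acceptsDocsOutOfOrder();
      }
  }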
If you call optimize(false), that'll return immediately but run the
optimize "in the background" (assuming you are using the default
ConcurrentMergeScheduler).
Later, when it's time to stop optimizing, call IW.close(false), which
will abort any running merges yet keep any merges that had finished
Would that happen "automagically" at finalization?
paul
On 21 Jan 2011, at 15:13, Michael McCandless wrote:
> If you call optimize(false), that'll return immediately but run the
> optimize "in the background" (assuming you are using the default
> ConcurrentMergeScheduler).
>
> Later, when it's time to stop optimizing, call IW.close(false), which
> will abort any running merges yet keep any merges that had finished
No.
If you just do IW.close() <-- no boolean specified, then that defaults
to IW.close(true) which means "wait for all BG merges to finish".
So "normally" IW.close() reserves the right to take a long time.
But IW.close(false) should finish relatively quickly...
Mike
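Putting Mike's two emails together, the pattern looks roughly like this
(Lucene 3.x; "writer" is an assumed open IndexWriter):

  writer.optimize(false);  // returns immediately; merges run in the background

  // ... later, e.g. from a shutdown hook when the machine is rebooting:
  writer.close(false);     // abort running merges, keep the ones that finished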
Hello all,
Does anyone know if it is possible in Lucene to do a query based on the
string length of the value of a field?
For example, if I wanted all index matches where a specific field like
'first_name' was between 10 and 20 characters.
Thanks!
-Camden Daily
Not directly, but you could index a NumericField called "length" and
do a NumericRangeQuery on it.
Or loop through all the terms checking length. But that isn't a query
and will be slow.
--
Ian.
On Fri, Jan 21, 2011 at 3:15 PM, Camden Daily wrote:
> Hello all,
>
> Does anyone know if it is possible in Lucene to do a query based on the
> string length of the value of a field?
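A sketch of Ian's first suggestion in the 3.x API (the field names and
the writer/searcher variables are illustrative):

  // At index time: store the length alongside the text field.
  Document doc = new Document();
  String firstName = "Bartholomew";
  doc.add(new Field("first_name", firstName, Field.Store.YES, Field.Index.ANALYZED));
  doc.add(new NumericField("first_name_length").setIntValue(firstName.length()));
  writer.addDocument(doc);

  // At search time: match lengths between 10 and 20, inclusive.
  Query q = NumericRangeQuery.newIntRange("first_name_length", 10, 20, true, true);
  TopDocs hits = searcher.search(q, 10);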
A wildcard query with 10 leading question marks, each of which requires a
single character. This would also depend on leading wildcards being enabled
in your query parser (if you are using one).
first_name:??????????*
The performance would not necessarily be great, but functionally it would do
Oops... I only solved half the problem; the other half was to limit the
length to 20, which would be done with a negated leading wildcard of 21
question marks:
first_name:??????????* -first_name:?????????????????????*
-- Jack Krupansky
Wouldn't that also match names with length > 20?
--
Ian.
On Fri, Jan 21, 2011 at 3:26 PM, Jack Krupansky
wrote:
> A wildcard query with 10 leading question marks, each of which requires a
> single character. This would also depend on leading wildcards being enabled
> in your query parser (if you are using one).
Thank you Ian and Jack,
I believe I'll go with simply creating a NumericField for the length, as
that will result in the best performance.
-Camden
On Fri, Jan 21, 2011 at 10:35 AM, Ian Lea wrote:
> Wouldn't that also match names with length > 20?
>
>
> --
> Ian.
Hi, sorry for the long delay.
The idea is that a single user is editing a single document. As they edit,
any indexes built against the document become stale, actually wrong.
Example: references to specific localities within this document are all
instantly wrong the first time a user types a new be…
Hi,
One workaround would be to version the documents and store the version as
well as the timestamp of the indexed document in the index.
Reading between the lines, I assume that the document is
a) stored in some DB/file
b) indexed in a Lucene index
Users search on b) and get back document ids, but the documents are d…
If I understand you correctly, I think that this:
If T2 < T1, skip the result.
will always be the case. The live document being edited is always "later"
in time than the indexed information about it.
On Fri, Jan 21, 2011 at 9:11 PM, Umesh Prasad wrote:
> Hi,
> One workaround would be to
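A sketch of how that "T2 < T1" check could look at search time; all names
here (the "version"/"id" fields and the db lookup) are hypothetical:

  // Compare the version captured at index time with the live version.
  for (ScoreDoc sd : topDocs.scoreDocs) {
      Document doc = searcher.doc(sd.doc);
      long indexedVersion = Long.parseLong(doc.get("version"));
      long liveVersion = db.currentVersion(doc.get("id")); // hypothetical DAO call
      if (indexedVersion < liveVersion) {
          continue; // the index is stale for this doc, skip it
      }
      // ... use the hit
  }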
[x] ASF Mirrors (linked in our release announcements or via the Lucene
website)
[] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[x] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirrors them internally or via a
downstream project)
Hi,
I have started to use Lucene for searching in HTML files. Is it possible
to get hits per document when we search for phrases like "Hello World" and
wildcard searches like "te?t"?
I managed to return the number of hits per document if there is only one
term, using term frequency vecto…
[x] ASF Mirrors (linked in our release announcements or via the Lucene
website)
[] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[x] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirrors them internally or via a
downstream project)
There's a feature in Lucene called an "instantiated" index. This has all
of the Lucene data structures directly as objects instead of serialized
to disk or to a RAMDirectory. It never needs to be committed: you index a
document and it is immediately searchable. It is larger and faster than a
normal in…
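A rough sketch against the contrib/instantiated module
(org.apache.lucene.store.instantiated in Lucene 3.x); treat the factory
method names as recalled from the contrib javadocs, an outline rather
than tested code:

  // Build an all-in-objects index; no commit step is needed.
  InstantiatedIndex index = new InstantiatedIndex();
  InstantiatedIndexWriter writer =
          index.indexWriterFactory(new StandardAnalyzer(Version.LUCENE_30), true);
  Document doc = new Document();
  doc.add(new Field("body", "hello world", Field.Store.YES, Field.Index.ANALYZED));
  writer.addDocument(doc);
  writer.close();
  IndexReader reader = index.indexReaderFactory(); // sees the doc immediately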
Nope, it won't always be the case. Users will not always be editing the
document. They will edit the document, then save, which will be persisted
in the DB. You can use DB triggers to push it into an indexing queue, from
which an indexer can regularly pick up the document for indexing. You can
schedule you…
[] ASF Mirrors (linked in our release announcements or via the Lucene
website)
[x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[x] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirrors them internally or via a
downstream project)
On F