Thanks for these directions, Ian.
We are running Lucene 2.9.1 on CentOS 5 64-bit machines.
We do use the compound file format, and will look into replacing it with the
simple file format, although I believe this will create too many files.
We will also consider the rsync option.
Thanks again,
-- Yuval
---
On Tue, Feb 09, 2010 at 03:47:19PM -0500, Michael McCandless wrote:
> Interesting... and segment merging just does its own private
> concatenation/mapping-around-deletes of the doc/positions?
I think the answer is yes, but I'm not sure I understand the question
completely since I'm not sure why y
On Tue, Feb 9, 2010 at 1:12 PM, Marvin Humphrey wrote:
> On Tue, Feb 09, 2010 at 11:51:31AM -0500, Michael McCandless wrote:
>
>> You should (when possible/reasonable) instead use
>> ReaderUtil.gatherSubReaders, then iterate through those sub readers
>> asking each for its flex fields.
>
>> But if
Maybe I'm missing something, but what is wrong with SynonymTokenFilter in
contrib/wordnet?
simon
On Tue, Feb 9, 2010 at 5:03 PM, Ian Lea wrote:
> Lucene in Action second edition has Synonym stuff that I think will
> work with lucene 3.0.
>
> Source code available from http://www.manning.com/hatcher3/
On Tue, Feb 09, 2010 at 11:51:31AM -0500, Michael McCandless wrote:
> You should (when possible/reasonable) instead use
> ReaderUtil.gatherSubReaders, then iterate through those sub readers
> asking each for its flex fields.
>
> But if this is only for testing purposes, and Multi*Enum is more
> c
Write a simple Collector (read the javadocs) that has a collect(int
doc) method that does nothing except increment a counter. Use it via
one of the search methods that takes a Collector.
By the way, TopDocCollector won't load them all into memory, but obviously it
will keep track of the top-scoring docs.
I'm not sure what you mean by "loading them all into memory".
I'm pretty sure that the numHits you specify just limits the number
of documents kept in the internal ScoreDocs, and getTotalHits
can easily be much greater than numHits. But that would be
trivial to test (you shouldn't take my word for it).
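The counting Collector described above is simple to sketch. Note this is illustrative only: the Collector interface below is a simplified stand-in for org.apache.lucene.search.Collector (which in 2.9/3.0 also has setScorer(), setNextReader() and acceptsDocsOutOfOrder()), and the loop in main() just simulates the search runtime calling collect() once per hit.

```java
// Simplified stand-in for org.apache.lucene.search.Collector.
interface Collector {
    void collect(int doc);
}

// Counts matching documents without buffering any hits in memory.
class CountingCollector implements Collector {
    private int count = 0;

    public void collect(int doc) {
        count++; // only increment a counter, nothing is stored
    }

    public int getCount() {
        return count;
    }
}

public class CountDemo {
    public static void main(String[] args) {
        CountingCollector c = new CountingCollector();
        // Simulate the search runtime calling collect() for each hit.
        for (int doc : new int[] {0, 5, 42}) {
            c.collect(doc);
        }
        System.out.println(c.getCount()); // prints 3
    }
}
```

In real code you would pass the collector to one of the IndexSearcher.search methods that accepts a Collector.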
> is there any way I can search for Documents that have a
> specific Field not set?
Yes, if you are using QueryParser: *:* -specificField:[* TO *]
> I was hoping that a simple TermQuery where the term value
> was set to be an empty String would help me out but I was proven
> wrong.
org.apache
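For completeness, the suggested query string can also be assembled programmatically. The sketch below only builds the QueryParser syntax shown above ("specificField" is the example field name from the question): match every document, then subtract the ones that have any value in the field.

```java
public class MissingFieldQueryDemo {
    public static void main(String[] args) {
        // Example field name from the question; substitute your own.
        String field = "specificField";
        // "*:*" matches every document; the prohibited open-ended range
        // clause removes documents that have any value in the field,
        // leaving only documents where the field is not set.
        String query = "*:* -" + field + ":[* TO *]";
        System.out.println(query);
    }
}
```

Feeding this string to Lucene's QueryParser should produce the match-all query minus an open-ended range over the field.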
Hi,
Is there any way I can search for Documents that have a specific Field not set?
The use case is obvious: consider you introduce a new field to your
documents but don't want to migrate all other documents;
how would you be able to write a Query that covers both old and new documents?
I was hoping that a simple TermQuery where the term value was set to be an
empty String would help me out, but I was proven wrong.
On Tue, Feb 9, 2010 at 11:35 AM, Renaud Delbru wrote:
>> This particular patch doesn't change the Codecs API -- it "only"
>> factors out the Multi* APIs from MultiReader. Likely you won't need
>> to change your codec... but try applying the patch and see :)
>>
>
> Ok, good news ;o).
Flex is sti
Hi Guys,
Is there a way to speed up counting documents that satisfy a search query other
than by using TopDocCollector.getTotalHits()?
For instance, if there are 100 documents satisfying my search query, how
can I count them without loading them all into memory?
Thanks,
Klaus.
On 09/02/10 16:04, Michael McCandless wrote:
On Tue, Feb 9, 2010 at 9:08 AM, Renaud Delbru wrote:
So, does it mean that the codec interface is likely to change? Do I need to
be prepared to change again all my code ;o)?
This particular patch doesn't change the Codecs API -- it "only"
factors out the Multi* APIs from MultiReader. Likely you won't need
to change your codec... but try applying the patch and see :)
Since the update commands may run in different order on different
shards you might get different sets of segments because merges happen
to be triggered at different points in the different batches of
updates. But you shouldn't have different numbers of deleted docs if
you have really been applying
On Tue, Feb 9, 2010 at 9:08 AM, Renaud Delbru wrote:
> Hi Michael,
>
> On 09/02/10 13:35, Michael McCandless wrote:
>>
>> It's great that you're testing the flex APIs... things are still "in
>> flux" as you've seen. There's another big patch pending on
>> LUCENE-2111...
>>
>
> So, does it mean that the codec interface is likely to change? Do I need
> to be prepared to change again all my code ;o)?
Lucene in Action second edition has Synonym stuff that I think will
work with lucene 3.0.
Source code available from http://www.manning.com/hatcher3/
--
Ian.
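The Lucene in Action approach (and contrib/wordnet's SynonymTokenFilter) boils down to injecting synonym tokens alongside the original token during analysis. Lucene's TokenStream API aside, the expansion step itself is just a map lookup; the sketch below is self-contained and the synonym map is made up for illustration (a real setup would load it from WordNet or a config file, and a real TokenFilter would set position increments to 0 so synonyms share the original token's position).

```java
import java.util.*;

public class SynonymDemo {
    // Hypothetical synonym map, hard-coded for illustration.
    static final Map<String, List<String>> SYNONYMS = Map.of(
            "quick", List.of("fast", "speedy"));

    // Emit each token followed by its synonyms, mimicking what a
    // synonym TokenFilter does on a token stream.
    static List<String> expand(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t);
            out.addAll(SYNONYMS.getOrDefault(t, List.of()));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand(List.of("quick", "fox")));
        // prints [quick, fast, speedy, fox]
    }
}
```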
On Tue, Feb 9, 2010 at 2:03 PM, Marc Schwarz wrote:
> Hi,
>
> I try to implement synonyms, but I don't know exactly how to do it
> (lu
We are running a large sharded Lucene-based application.
Our configuration supports near real-time updates, by incrementally
updating documents (using delete-then-add) on the shards.
Every shard is replicated to several machines in order to improve performance.
We replicate the shard by sending the
Hi Michael,
On 09/02/10 13:35, Michael McCandless wrote:
It's great that you're testing the flex APIs... things are still "in
flux" as you've seen. There's another big patch pending on
LUCENE-2111...
So, does it mean that the codec interface is likely to change? Do I need
to be prepared to change again all my code ;o)?
Hi,
I'm trying to implement synonyms, but I don't know exactly how to do it
(Lucene 3.0).
Is anybody out there who has some small code snippets or a good link?
Thanks & Greetings,
Marc
Renaud,
It's great that you're testing the flex APIs... things are still "in
flux" as you've seen. There's another big patch pending on
LUCENE-2111...
Out of curiosity... in what circumstances do you see a Multi*Enum appearing?
Lucene's core always searches "by segment". Are you doing somethin
Hi Renaud,
> On 09/02/10 12:16, Uwe Schindler wrote:
> > In flex the correct way to add additional posting data to these
> classes would be the usage of custom attributes, registered in the
> attributes() AttributeSource.
> >
> Ok, I have changed my code to use the AttributeSource interface.
>
Hi Uwe,
On 09/02/10 12:16, Uwe Schindler wrote:
In flex the correct way to add additional posting data to these classes would
be the usage of custom attributes, registered in the attributes()
AttributeSource.
Ok, I have changed my code to use the AttributeSource interface.
Due to some l
Hi Renaud,
In flex the correct way to add additional posting data to these classes would
be the usage of custom attributes, registered in the attributes()
AttributeSource.
Due to some limitations, there is currently no working support in MultiReaders
to have a "view" on the underlying Enums, b
Hi Michael,
I have updated my lucene-1458, and I discovered there were big
modifications in the StandardCodec interface.
I updated my own codecs to this new interface, but I encounter a
problem. My codecs create DocsAndPositionsEnum subclasses that
allow access to more information than si
Moreover, a search for "Mr. Arun Kumar" also matches other names because "Mr."
matches.
I am ready to use "Mr." as a stop word in an analyzer.
Rohit Banga
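Treating "Mr." as a stop word, as proposed above, just means dropping that token during analysis so it never matches across names. A real setup would add "mr" to the analyzer's stop-word set; the self-contained sketch below shows only the filtering step, with a hypothetical stop set.

```java
import java.util.*;

public class StopWordDemo {
    // Hypothetical extra stop words (titles) added on top of an
    // analyzer's default stop set.
    static final Set<String> STOP = Set.of("mr", "mrs", "dr");

    // Drop stop words, case-insensitively, as a stop filter would.
    static List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP.contains(t.toLowerCase(Locale.ROOT))) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(filter(List.of("Mr", "Arun", "Kumar")));
        // prints [Arun, Kumar]
    }
}
```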
On Tue, Feb 9, 2010 at 2:42 PM, Rohit Banga wrote:
> I'll try using Luke.
>
> How do I want to use Lucene?
>
> There is a sentence that may contain th
I'll try using Luke.
How do I want to use Lucene?
There is a sentence that may contain the names of some people from among
those in a list. The names may be incomplete or may have spelling mistakes.
So I created a Lucene index, with each person as a document,
e.g.
Mr. Arun Kumar
with a hit highli
If you don't get it working that way, then you have to ask yourself the question:
why do you want it indexed that way? Is it because you don't want to find all
people in that field when you add only "Mr." to a search query? It looks like
you use StandardAnalyzer, and in this case, I would add "mr", no
Use Luke.
It can show you the index contents and your parsed query and should show what
is breaking down here.
On 9 Feb 2010, at 08:03, Rohit Banga wrote:
> let us assume this is the only field that is relevant (others are stored and
> not indexed).
> I tried TermQuery and it does not work.
> i
Let us assume this is the only field that is relevant (others are stored and
not indexed).
I tried TermQuery and it does not work.
I also tried KeywordAnalyzer and still could not make it work.
@Mark:
I cannot escape the spaces in my query as I am using Lucene to identify
occurrences of names among