The use case is as follows.
I have two indexes: one at the master and one at the slave. The user
commits occasionally on the master, and the delta is
replicated every time. But when the optimize happens the transfer size
can be really large. So I am thinking of doing the optimize
separatel
On Wed, Sep 3, 2008 at 2:06 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
>
> Noble Paul നോബിള് नोब्ळ् wrote:
>
>> On Tue, Sep 2, 2008 at 1:56 PM, Michael McCandless
>> <[EMAIL PROTECTED]> wrote:
>>>
>>> Are you thinking this would just fall back to Directory.fileModified on
>>> the
>>> segment
I don't think << category:* >> does what you think it does.
category:[* TO *] will find all docs that have any indexed tokens in the
category field, so combining that as a prohibited clause with a
mandatory MatchAllDocsQuery will give you all docs that don't have
anything indexed in the cate
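
The set logic described in this snippet can be modeled outside Lucene. What follows is a minimal sketch over a toy in-memory index, not Lucene code: the `docs` dictionary, its field names, and `docs_without_field` are hypothetical names made up for illustration.

```python
# Toy model of the boolean logic, not Lucene API.
# Each document maps field names to its list of indexed tokens.
docs = {
    1: {"title": ["lucene"], "category": ["search"]},
    2: {"title": ["solr"]},
    3: {"title": ["ocean"], "category": []},
}

def docs_without_field(index, field):
    # Mandatory clause: every document (the role MatchAllDocsQuery plays).
    all_docs = set(index)
    # Prohibited clause: documents with at least one indexed token in the
    # field (the role of field:[* TO *]).
    with_tokens = {doc_id for doc_id, fields in index.items() if fields.get(field)}
    # Subtracting the prohibited set leaves the documents lacking the field.
    return all_docs - with_tokens

missing = docs_without_field(docs, "category")
```

A purely negative clause on its own has nothing to subtract from, which is why the match-all clause has to be present.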
In fact, I think the important causes are the Directory class and the Analyzer
class.
If you don't want to keep the IndexSearcher open for the entire life of a web
application, you can do that.
I think it will not cause a memory leak.
But the Directory and Analyzer classes can cause the problem if th
More details may change my opinion (not quite sure how others feel
yet), but with the way you've described it so far, it seems like all
you need is a basic string matcher:
For every message:
- if message.subject is found in the pool, then this
message is "similar to" the message in the poo
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Hi Yonik,
The SOLR 2 list looks good. The question is, who is going to do the
work? I tried to simplify the scope of Ocean as much as possible to
make it possible (and slowly at that over time) for me to eventually
finish what is mentioned on the wiki. I think SOLR is very cool and
was major
On Sep 3, 2008, at 4:09 PM, Paul Elschot wrote:
Op Saturday 30 August 2008 18:22:50 schreef Matt Ronge:
On Aug 30, 2008, at 6:13 AM, Paul Elschot wrote:
Op Saturday 30 August 2008 03:34:01 schreef Matt Ronge:
Hi all,
I am working on implementing a new Query, Weight and Scorer that
is expens
Op Saturday 30 August 2008 18:22:50 schreef Matt Ronge:
> On Aug 30, 2008, at 6:13 AM, Paul Elschot wrote:
> > Op Saturday 30 August 2008 03:34:01 schreef Matt Ronge:
> >> Hi all,
> >>
> >> I am working on implementing a new Query, Weight and Scorer that
> >> is expensive to run. I'd like to limit
On Wed, Sep 3, 2008 at 3:20 PM, Jason Rutherglen
<[EMAIL PROTECTED]> wrote:
> I am wondering
> if there are social networks (or anyone else) out there who would be
> interested in collaborating with Apache on realtime search to get it
> to the point it can be used in production.
Good timing Jason,
I don't know how much of this is a Lucene problem, but -- as I'm sure
you will inevitably hear from others on the list -- it depends on
what your definition of "similar" is.
By similar, do you mean:
1. Identical, except for variations in case (upper/lower)
2. Allow 1., but also allow prefix
I was kind of waiting for a more efficient solution based on
TermDocs/TermEnum, but I feel since the term is not there at all, the only
thing we can do is to do some deduction.
I can copy the bitmap of all the deleted docs, and go through all
the TermDocs/TermEnum, and set the bit if there is a ter
Hello all,
I don't mean this to sound like a solicitation. I've been working on
realtime search and created some Lucene patches etc. I am wondering
if there are social networks (or anyone else) out there who would be
interested in collaborating with Apache on realtime search to get it
to the poi
Oh.. I wonder if TermDocs/TermEnum would work for you
instead.
Would it work to just create a document validator at index
time that threw an exception if all required fields weren't
present? Or is that outside your control?
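
A validator of the kind suggested here could look roughly like the sketch below. It assumes documents are plain dictionaries checked before they are handed to the indexer; `MissingFieldError`, `REQUIRED_FIELDS`, the field names, and `validate` are all hypothetical and not part of any Lucene API.

```python
class MissingFieldError(ValueError):
    """Raised when a document lacks a required field."""

# Hypothetical list of fields every document must carry.
REQUIRED_FIELDS = ("id", "category")

def validate(document, required=REQUIRED_FIELDS):
    # Treat an absent key, None, or an empty string as a missing field.
    missing = [name for name in required if document.get(name) in (None, "")]
    if missing:
        raise MissingFieldError("missing required fields: " + ", ".join(missing))
    # Return the document unchanged so the call can be chained before indexing.
    return document
```

Calling `validate(doc)` just before the document is added means a bad document fails at index time instead of surfacing later as an empty field.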
Best
Erick
On Wed, Sep 3, 2008 at 3:11 PM, Chris Lu <[EMAIL PROTECTE
Thanks, Erick, for reminding me of this!
I only need to validate an index and make sure the content is correctly
retrieved and the index doesn't have empty fields.
So I'd better simply go through all documents by id and check them directly.
Thanks!
--
Chris Lu
-
Instant Scalable
This has been discussed multiple times, so looking at the
searchable archive will give you more detailed info. But as
I remember, the consensus suggestion was to index some
"impossible" value for those documents that lack a field.
For instance, say your field was "sometimes". A document
that had no
: I have attempted to find a concise definition of how the Lucene score is
: calculated, something that can be understood by most people.
The answer tends to vary based on exactly what type of query you are
talking about ... TermQuery? PhraseQuery? BooleanQuery containing a mix?
I'm going to
If you are looking for reasonable performance you should not close
your IndexSearcher if not necessary. It is actually best practice to
leave an IndexSearcher instance open and even share it between threads
/ requests of your web application. The searcher will not pollute your
memory. Just keep the
Op Wednesday 03 September 2008 18:06:57 schreef Matt Ronge:
> On Aug 30, 2008, at 3:01 PM, Paul Elschot wrote:
> > Op Saturday 30 August 2008 18:19:09 schreef Matt Ronge:
> >> On Aug 30, 2008, at 4:43 AM, Karl Wettin wrote:
> >>> Can you tell us a bit more about what your custom query does?
> >>> Pe
I took your advice and created Singletons for the Directory, Analyzer, and
IndexSearcher classes. I also undid the closing of the Directory and
IndexSearcher. This seemed to fix my memory leak problem. However, I don't
like the fact that I am leaving open the IndexSearcher for the entire life
of a
On Aug 30, 2008, at 3:14 PM, Andrzej Bialecki wrote:
Matt Ronge wrote:
Hi all,
I am working on implementing a new Query, Weight and Scorer that is
expensive to run. I'd like to limit the number of documents I run
this query on by first building a candidate set of documents with a
boolean
On Aug 30, 2008, at 3:01 PM, Paul Elschot wrote:
Op Saturday 30 August 2008 18:19:09 schreef Matt Ronge:
On Aug 30, 2008, at 4:43 AM, Karl Wettin wrote:
Can you tell us a bit more about what your custom query does?
Perhaps you can build the "candidate filter" and reuse it over and
over again?
What's not concise about a complex math formula? :-)
The basic Term Vector approach to IR, that Lucene more or less
implements, says that the score for a document given a query is the
cosine of the angle formed between the query vector and the document
vector.
I like to draw a standard x
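
The cosine measure described in this snippet can be written out as a short sketch. It assumes the query and the document are each represented as a dictionary mapping terms to weights, which is an illustrative choice; `cosine` is a hypothetical helper, and this is the textbook vector-space formula, not Lucene's scoring code, which the snippet itself says only "more or less" follows it.

```python
import math

def cosine(query_vec, doc_vec):
    # Vectors are dicts mapping term -> weight; absent terms weigh 0.
    dot = sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())
    q_norm = math.sqrt(sum(w * w for w in query_vec.values()))
    d_norm = math.sqrt(sum(w * w for w in doc_vec.values()))
    # An all-zero vector has no direction, so report no similarity.
    if q_norm == 0 or d_norm == 0:
        return 0.0
    return dot / (q_norm * d_norm)
```

Vectors pointing the same way score close to 1 regardless of length, and vectors with no terms in common score 0.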
Hi all,
I have attempted to find a concise definition of how the Lucene score is
calculated, something that can be understood by most people.
The information I found is accurate, but not particularly concise.
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/se
Noble Paul നോബിള് नोब्ळ् wrote:
On Tue, Sep 2, 2008 at 1:56 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
Are you thinking this would just fall back to Directory.fileModified
on the
segments_N file for that commit?
You could actually do that without any API change, because
IndexComm
On Tue, Sep 2, 2008 at 1:56 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
>
> Are you thinking this would just fall back to Directory.fileModified on the
> segments_N file for that commit?
>
> You could actually do that without any API change, because IndexCommit
> exposes a getSegmentsFileName(