I'm curious, why is LogMergePolicy named *Log*MergePolicy?
(Why not ExpMergePolicy? :-)
Well, I guess it's a matter of perspective. When you look at the way the
algorithm works, the merge decisions are based on a concept of level and
levels are assigned based on the log of the numb
Probably want a combination of extractWikipedia.alg and wikipedia.alg?
You want the EnwikiDocMaker from extractWikipedia.alg which reads the
uncompressed xml file but rather than using WriteLineDoc, you want to go
ahead and index as wikipedia.alg does. (Ditch the query part.)
You'll need an accep
If I'm reading this correctly, there's something a little wonky here. In
your example code, you close the IndexWriter and then, without creating
a new IndexWriter, you call addDocument again. This shouldn't be
possible. (What version of Lucene are you using?)
Assuming for the time being that you ar
See IndexWriter#addIndexesNoOptimize, released with 2.1. Note that it
doesn't optimize before or after, so if you want an optimize at the end,
you need to ask for it manually.
-Original Message-
From: Chandan Tamrakar [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 03, 2007 12:46 AM
To: jav
Yup, 845 is relevant, as is 847. I haven't had time to digest all that
David wrote yet, but I'm starting. It's particularly relevant because
before I get to the point of making 847 committable, I need a way of
testing merge performance (the factoring in 847 proposes to simplify the
API slightly, so
Lucene doesn't use a pure Boolean algebra, so things don't always do
what one might expect and things like De Morgan's laws don't hold.
The source of this comes from the combination of IR prefix notation
(+/-) with standard Boolean AND/OR.
If you look at the source, there are a number of rules that di
You'll have a difficult time updating Lucene indexes in place. A lot of
coordination exists within Lucene specifically not to do this: it's the
fact that Lucene does not do this that enables a lot of the lockless
parallelism in Lucene. This applies equally to the data store and the
inverted index p
It's possible to do leading wildcard searches in Lucene as of 2.1. See
http://wiki.apache.org/lucene-java/LuceneFAQ#head-4d62118417eaef0dcb87f4370583f809848ea695
(http://tinyurl.com/366suf)
-Original Message-
From: Oystein Reigem [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 13, 2007 11
But, letting it stay in the text stream and not putting it in a separate
date field would give you some trouble with ranges because things that
weren't dates could mess you up.
This is why Chris suggested putting a prefix on the token. For example,
leading underscor
If all you want to do is find docs containing dates within a range, it
probably doesn't make much difference whether you give dates their own
field or put them into your content field. It'll probably be easier to
just add them into the token stream since that's the way the analyzer
architecture wan
Yeah, date finding is a little like entity extraction, since dates can
have many formats, depending on how crazy you want to get ("a week from
tomorrow" is 3/8/2007 if you know that this e-mail was written today).
So much so that I went and looked up LingPipe, but they seem to not be
concerned with
http://lucene.apache.org/java/docs/scoring.html
(which you can also find by googling "lucene scoring")
-Original Message-
From: Jong Kim [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 28, 2007 2:21 PM
To: java-user@lucene.apache.org
Subject: ranking/scoring algorithm in details
Hi,
Are unindexed fields stored separately from the main inverted index?
If so then, one could implement the field value change as a delete and
re-add of just that value?
The short answer is that won't work. Field values are stored in a
different data structure than the posting
age-
From: Neal Richter [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 27, 2007 11:52 AM
To: java-user@lucene.apache.org
Subject: RE: document field updates
Steven Parkes wrote:
>There are no plans to do this. It's essentially impossible, given (1)
>the reverse nature of te
There are no plans to do this. It's essentially impossible, given (1)
the reverse nature of text indexes and (2) Lucene's write-once segment
architecture.
-Original Message-
From: Arnone, Anthony [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 27, 2007 10:18 AM
To: java-user@lucene.apac
The easiest way to pin this down is to get the backtrace from the
exception, e.g., e.printStackTrace(). That would tell a lot.
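As a minimal sketch of capturing that backtrace, using only the JDK: the class name, the helper names, and the simulated failure below are hypothetical stand-ins, not code from the original poster's application.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical example class; only standard JDK calls are used.
class StackTraceSketch {

    // Renders the full backtrace of a Throwable as text, e.g. for pasting into a mail.
    static String backtrace(Throwable t) {
        StringWriter buffer = new StringWriter();
        t.printStackTrace(new PrintWriter(buffer, true));
        return buffer.toString();
    }

    // Assumed stand-in for whatever call is actually failing.
    static void failingOperation() throws IOException {
        throw new IOException("simulated failure");
    }

    public static void main(String[] args) {
        try {
            failingOperation();
        } catch (IOException e) {
            e.printStackTrace();              // writes the trace to System.err
            System.out.println(backtrace(e)); // same trace, captured as a String
        }
    }
}
```

The captured text starts with the exception class and message, followed by one line per stack frame, which is usually enough to see which call raised the error.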
That said, prior to 2.1, Lucene would put lock files outside the index
directory. I don't know if that's what you're hitting, though, because I
think the writer should hav
See the wiki:
http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-48921635adf2c968f7936dc07d51dfb40d638b82
-Original Message-
From: Michael Prichard [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 14, 2007 5:02 PM
To: java-user@lucene.apache.org
Subject: Too many open files?!
I am
You can go to Jira and get the patch and/or vote for it:
https://issues.apache.org/jira/browse/LUCENE-489
[Not that this issue needs much voting, I just like the idea of
encouraging voting. Get Out the Vote (if that's TM'd, I take it back.)]
-Original Message-
From: Otis Gospodnetic
e date/time values for the
query? In my case I have done nothing special to index my dates. I
just treat them as a string of numbers.
-Original Message-
From: Steven Parkes [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 17, 2006 12:13 PM
To: java-user@lucene.apache.org
Subject: RE: Bo
Lucene takes your date range, enumerates all the unique date/time values
in your corpus within that range, and then executes that query. So the
number of terms in your query is going to be equal to the number of
unique date/time values in the range.
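To make the term count concrete, here is a minimal sketch in plain Java. It is not Lucene code: a sorted set stands in for the index's term dictionary, and the class name, method name, and sample timestamps are all made up for illustration.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical model only: a TreeSet stands in for the term dictionary.
class RangeExpansionSketch {

    // Returns the unique terms in [lower, upper], i.e. the terms an
    // expanded range query would contain in this model.
    static NavigableSet<String> termsInRange(NavigableSet<String> dictionary,
                                             String lower, String upper) {
        return dictionary.subSet(lower, true, upper, true);
    }

    public static void main(String[] args) {
        // Made-up date/time values stored as sortable strings (yyyyMMddHHmm).
        // The duplicate value collapses to a single term in the set.
        NavigableSet<String> dictionary = new TreeSet<>(Arrays.asList(
                "200610170915", "200610170915", "200610171213",
                "200610181030", "200611010800"));

        NavigableSet<String> expanded =
                termsInRange(dictionary, "200610170000", "200610182359");
        System.out.println(expanded.size() + " terms: " + expanded);
    }
}
```

In this model every distinct value inside the range contributes one term, so a corpus with many distinct timestamps in the queried range yields a correspondingly large query.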
The most common way of handling this is to not i
I think the idea is that 2.0.1 would be a patch-fix release from the
branch created at 2.0 release. This release would incorporate only
back-ported high-impact patches, where "high-impact" is defined by the
community. Certainly security vulnerabilities would be included. As Otis
said, to date, nobo
to:[EMAIL PROTECTED]
> Sent: Thursday, October 05, 2006 2:53 PM
> To: java-user@lucene.apache.org
> Subject: Problem with Field.Text()
>
> I hope now I am in the right mailing list. In the -dev mailing list Steven
> Parkes said that I have to change this:
>
> > Fiel
I hope now I am in the right mailing list. In the -dev mailing list Steven
Parkes said that I have to change this:
> Field.Text(String, String);
to
> Field.Text(String, String, Field.Store.YES, Field.Index.TOKENIZED);
But it seems that there isn't such a method declaration. Where is the
m
I stopped procrastinating on this today.
I signed up for a BOF slot at 8 on Thursday. Hopefully not against other
stuff of interest.
I've not done this before, but the BOF slots were filling.
From my perspective, it'd be great to have people from any of the
subprojects. Plenty of cross fertiliz
So it looks like there's only a little Lucene-oriented stuff on the
program at ApacheCon 2006. The Solr talk looks interesting.
I was wondering if there have been any other self/semi-organized things
around Lucene in the past, like a BOF?
--
25 matches