Any more ideas?
On 4/17/07, Mohammad Norouzi <[EMAIL PROTECTED]> wrote:
Chris,
Thank you for your reply. Your solution doesn't work in my case because I
was thinking of indexing more than one document in a single index, with each
document representing a table in the database. So if I put more than one
William Mee wrote:
I'd like to add metadata which I get *after* indexing a document's
contents to the index. To be more specific: I'm implementing
shingling (detection of near-duplicate documents) and want to add the
document fingerprint (which is based on the sequence of tokens) to
the index.
T
Michael,
Our application includes indexing and archiving documents to meet
compliance requirements.
A couple of reasons that led to the merge approach:
- Source documents are written to archive media and retrieval is
relatively slow. Add to that our processing pipeline (including
text extrac
Erick & Steven,
I looked at 845, but I'm a bit confused:
Are you suggesting that 845 is the cause for the spikes seen in test
Runs 1 & 2 - and that in 2.1 addIndexesNoOptimize() is, under the
covers, relying on calls to ramSizeInBytes() to trigger new segment
creation before hitting the 10,000 v
You are correct, sir! I failed Lucene History 101 :-) And I failed
my fundamental rule for discussing a search library, which is to do a
search first to see if the answer already exists!
At any rate, here's some history on it:
http://www.gossamer-threads.com/lists/lucene/java-dev/22104
N
On 4/18/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
At any rate, MultiSearcher has been around a lot longer (2001 versus
2004, or at least that is what the changelog seems to indicate)
That doesn't sound right... MultiReader is more fundamental as it's needed
to read any multi-segment index.
I will try to take a crack at these, but not sure I know exactly what
you are looking for, so maybe others can chime in too.
At any rate, MultiSearcher has been around a lot longer (2001 versus
2004, or at least that is what the changelog seems to indicate) and
it works over Searchables, in
Hi all,
Before posting this question I have read a couple of old threads on this list
on ways to update an existing index. The conclusion was to recreate a new
index. Unfortunately, this may not be possible in my case (please correct me if I
am wrong).
Let me first describe what I am having trou
I'm not sure I understand how you are using Lucene without writing code,
but here goes
(NOT tested)
String id = "123"; // this is the identifier
String text = "this is some document text";
doc.add(new org.apache.lucene.document.Field("id",id,
org.apache.lucene.document.Field.Store.YES,
or
On 18 Apr 2007, at 19:15, jim shirreffs wrote:
The documents I am indexing into Lucene all have an integer unique
ID (21345). This ID is not and will not be in the document context.
But I need to be able to retrieve from Lucene the document ID. I
know the document ID at index time so I can te
Hi, I have been using Lucene "out of the box" since 1.4.3. It's a wonderful
full-text engine, and I love it.
But I can't use it "out of the box" any more; I am going to have to write
some code (Oh no! Mr Bill.). I am fairly certain that the code needed will
be trivial, but I am unfamiliar with Lucene's A
Amazon.com's Darwin team is looking for exceptional software engineers
to develop algorithms and build systems to automatically detect
duplicate products for sale in the Amazon.com catalog.
Merchants on Amazon.com provide information about the products they want
to sell. Amazon attempts to match
d m wrote:
I'd like to share index merge performance data and have a couple
of questions about it...
We (AXS-One, www.axsone.com) build one "master" index per day.
For backup and recovery purposes, we also build many individual
"mini" indexes from the docs added to the master index.
Should one
On 18 Apr 2007, at 18:25, William Mee wrote:
The only way I could get this information *before* adding a
document to an index is to create a token stream manually (and then
have this happen all over again when the document is indexed). This
isn't a satisfying solution.
Why is it not a satisf
Yup, 845 is relevant, as is 847. I haven't had time to digest all that
David wrote yet, but I'm starting. It's particularly relevant because
before I get to the point of making 847 committable, I need a way of
testing merge performance (the factoring in 847 proposes to simplify the
API slightly, so
I'd like to add metadata which I get *after* indexing a document's contents to
the index. To be more specific: I'm implementing shingling (detection of
near-duplicate documents) and want to add the document fingerprint (which is
based on the sequence of tokens) to the index.
There doesn't seem
This *may* be relevant, I haven't needed to investigate
it yet...
http://issues.apache.org/jira/browse/LUCENE-845
Also, see the thread titled
"MergeFactor and MaxBufferedDocs value should ...?" for an
interesting discussion of how to optimize indexing, although
I'm not sure the notion of using I
I'd like to share index merge performance data and have a couple
of questions about it...
We (AXS-One, www.axsone.com) build one "master" index per day.
For backup and recovery purposes, we also build many individual
"mini" indexes from the docs added to the master index.
Should one of our maste