Re: question about field equality in query

2007-04-18 Thread Mohammad Norouzi
Any more ideas? On 4/17/07, Mohammad Norouzi <[EMAIL PROTECTED]> wrote: Chris, Thank you for your reply. Your solution doesn't work in my case because I was thinking of indexing more than one document in a single index, with each document representing a table in the database. So if I put more than one

Re: adding a field at index-time

2007-04-18 Thread Daniel Noll
William Mee wrote: I'd like to add metadata which I get *after* indexing a document's contents to the index. To be more specific: I'm implementing shingling (detection of near-duplicate documents) and want to add the document fingerprint (which is based on the sequence of tokens) to the index. T

Re: Merge performance

2007-04-18 Thread david m
Michael, Our application includes indexing and archiving documents to meet compliance requirements. A couple of reasons that lead to the merge approach: - Source documents are written to archive media and retrieval is relatively slow. Add to that our processing pipeline (including text extrac

Re: Merge performance

2007-04-18 Thread david m
Erick & Steven, I looked at 845, but I'm a bit confused: Are you suggesting that 845 is the cause for the spikes seen in test Runs 1 & 2 - and that in 2.1 addIndexesNoOptimize() is, under the covers, relying on calls to ramSizeInBytes() to trigger new segment creation before hitting the 10,000 v

Re: MultiSearcher vs MultiReader

2007-04-18 Thread Grant Ingersoll
You are correct, sir! I failed Lucene History 101 :-) And I failed my fundamental rule for discussing a search library, which is to do a search first to see if the answer already exists! At any rate, here's some history on it: http://www.gossamer-threads.com/lists/lucene/java-dev/22104 N

Re: MultiSearcher vs MultiReader

2007-04-18 Thread Yonik Seeley
On 4/18/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote: At any rate, MultiSearcher has been around a lot longer (2001 versus 2004, or at least that is what the changelog seems to indicate) That doesn't sound right... MultiReader is more fundamental as it's needed to read any multi-segment index.

Re: MultiSearcher vs MultiReader

2007-04-18 Thread Grant Ingersoll
I will try to take a crack at these, but not sure I know exactly what you are looking for, so maybe others can chime in too. At any rate, MultiSearcher has been around a lot longer (2001 versus 2004, or at least that is what the changelog seems to indicate) and it works over Searchables, in

another question about update an unidexed field.

2007-04-18 Thread Omar Didi
Hi all, Before posting this question I have read a couple of old threads on this list on ways to update an existing index. The conclusion was to recreate a new index. Unfortunately this may not be possible in my case (please correct me if I am wrong). Let me first describe what I am having trou

Re: Newbie needs help "addField"

2007-04-18 Thread Donna L Gresh
I'm not sure I understand how you are using Lucene without writing code, but here goes (NOT tested) String id = "123"; // this is the identifier String text = "this is some document text"; doc.add(new org.apache.lucene.document.Field("id",id, org.apache.lucene.document.Field.Store.YES, or

Re: Newbie needs help "addField"

2007-04-18 Thread karl wettin
18 Apr 2007 at 19:15, jim shirreffs wrote: The documents I am indexing into Lucene all have an integer unique ID (21345). This ID is not and will not be in the document context. But I need to be able to retrieve from Lucene the document ID. I know the document ID at index time so I can te

Newbie needs help "addField"

2007-04-18 Thread jim shirreffs
Hi, I have been using Lucene "out of the box" since 1.4.3, wonderful full text engine, I love it. But I can't use it "out of the box" any more, I am going to have to write some code (Oh no! Mr Bill.). I am fairly certain that the code needed will be trivial, but I am unfamiliar with Lucene's A

Job Opportunity at Amazon.com (Seattle)

2007-04-18 Thread Breslin, Dan
Amazon.com's Darwin team is looking for exceptional software engineers to develop algorithms and build systems to automatically detect duplicate products for sale in the Amazon.com catalog. Merchants on Amazon.com provide information about the products they want to sell. Amazon attempts to match

Re: Merge performance

2007-04-18 Thread Michael D. Curtin
d m wrote: I'd like to share index merge performance data and have a couple of questions about it... We (AXS-One, www.axsone.com) build one "master" index per day. For backup and recovery purposes, we also build many individual "mini" indexes from the docs added to the master index. Should one

Re: adding a field at index-time

2007-04-18 Thread karl wettin
18 Apr 2007 at 18:25, William Mee wrote: The only way I could get this information *before* adding a document to an index is to create a token stream manually (and then have this happen all over again when the document is indexed). This isn't a satisfying solution. Why is it not a satisf

RE: Merge performance

2007-04-18 Thread Steven Parkes
Yup, 845 is relevant, as is 847. I haven't had time to digest all that David wrote yet, but I'm starting. It's particularly relevant because before I get to the point of making 847 committable, I need a way of testing merge performance (the factoring in 847 proposes to simplify the API slightly, so

adding a field at index-time

2007-04-18 Thread William Mee
I'd like to add metadata which I get *after* indexing a document's contents to the index. To be more specific: I'm implementing shingling (detection of near-duplicate documents) and want to add the document fingerprint (which is based on the sequence of tokens) to the index. There doesn't seem
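
For readers unfamiliar with the term, shingling as mentioned above can be sketched roughly as follows. This is only an illustration in plain Java, not the poster's code and not a Lucene API: the class name ShingleSketch, the window size of 4, and the use of Jaccard resemblance over whole shingle sets (rather than a compact fingerprint) are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: near-duplicate detection via w-shingles.
class ShingleSketch {

    // Collect every window of w consecutive tokens (a "shingle") into a set.
    static Set<String> shingles(List<String> tokens, int w) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + w <= tokens.size(); i++) {
            result.add(String.join(" ", tokens.subList(i, i + w)));
        }
        return result;
    }

    // Jaccard resemblance: size of the intersection divided by size of the union.
    // Two empty sets are treated as identical (1.0).
    static double resemblance(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Whitespace splitting stands in for a real analyzer here (an assumption).
        List<String> d1 = Arrays.asList("a rose is a rose is a rose".split(" "));
        List<String> d2 = Arrays.asList("a rose is a flower which is a rose".split(" "));
        // d1 has 3 distinct shingles, d2 has 6, and they share 1, so this prints 0.125
        System.out.println(resemblance(shingles(d1, 4), shingles(d2, 4)));
    }
}
```

Because the shingles depend on the token sequence, they can only be computed once the text has been tokenized, which is why the fingerprint is not available before analysis.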

Re: Merge performance

2007-04-18 Thread Erick Erickson
This *may* be relevant, I haven't needed to investigate it yet... http://issues.apache.org/jira/browse/LUCENE-845 Also, see the thread titled "MergeFactor and MaxBufferedDocs value should ...?" for an interesting discussion of how to optimize indexing, although I'm not sure the notion of using I

Merge performance

2007-04-18 Thread d m
I'd like to share index merge performance data and have a couple of questions about it... We (AXS-One, www.axsone.com) build one "master" index per day. For backup and recovery purposes, we also build many individual "mini" indexes from the docs added to the master index. Should one of our maste