Duplicates records in index

2006-02-08 Thread Anton Potehin
Is it possible to add records into a Lucene index using the following algorithm: 1) create a Document object 2) add 5 fields to the Document (id, name, field1, field2, field3). All fields are stored, indexed and tokenized 3) check if the document with the current id and name was added before 4) if yes

RE: Search on Keyword rather than Text?

2006-02-08 Thread Mike Streeton
Override QueryParser and intercept queries on specific fields, producing a TermQuery directly instead of letting the default parser generate the query from the analyzed value. If you want to look for "New Yo", try also creating a prefix query from the TermQuery. Mike www.ardentia.com the home of NetSearch

Re: Reindexing

2006-02-08 Thread Erik Hatcher
Sorry, I've no clue as I've never used Hibernate and thus never touched its Lucene support. Erik On Feb 8, 2006, at 1:18 AM, Raul Raja Martinez wrote: Hi Eric, I'm in the same situation, I wouldn't normally ask something related to hibernate here but I posted something similar in

AW: Reindexing

2006-02-08 Thread Klaus
Hi, you have to index all objects already contained in the database? Then there is no other way than fetching all objects from the database and indexing them. On Feb 8, 2006, at 1:18 AM, Raul Raja Martinez wrote: > Hi Eric, I'm in the same situation, I wouldn't normally ask > something related t

RE: JVM Crash in Lucene

2006-02-08 Thread Daniel Pfeifer
I resolved this issue for the time being by adding the following parameter to the command: -XX:CompileCommand=exclude,org/apache/lucene/index/IndexReader$1,doBody /Daniel -----Original Message----- From: Daniel Pfeifer [mailto:[EMAIL PROTECTED] Sent: 8 February 2006 08:05 To: java-user@lucene.a

RE: Duplicates records in index

2006-02-08 Thread Pasha Bizhan
Hi, > From: Anton Potehin [mailto:[EMAIL PROTECTED] > 1) create Document object > > 2) add 5 fields into Document (id, name, field1, field2, > field3). All fields are stored, indexed and tokenized > > 3) check if the document with current id and name was added before Just perform the sear

Re: Queries not derived from the text index

2006-02-08 Thread Erik Hatcher
On Feb 7, 2006, at 6:17 PM, Daniel Noll wrote: So a user might want to enter something like this: text:camel AND tag:zoo In this case we would want a real FieldQuery object for the text:camel portion, and a non-Lucene Query instance for the "tag:zoo" portion which actually queries the

Using Range Queries

2006-02-08 Thread Shivani Sawhney
Hi, I am trying to search across some documents and have min and max experience, min and max ctc and email as some of the search fields. I have a problem using the Range Query. The problem is as follows. If I am trying to search for documents with exp between 0 and 9, I get 15 hits, assuming that

RE: Using Range Queries

2006-02-08 Thread Mike Streeton
You need to encode the numbers, by padding to the left or by another method. We do this: we know which fields are numeric and extend QueryParser to encode those fields for searching. We also decode the numbers for display. Below are the functions we use; the tricky bit is getting negative numbers to work corr

RE: Using Range Queries

2006-02-08 Thread Koji Sekiguchi
> I guess is that somehow the code is not taking my range as numerals but is > probably doing string compare. Right. We should treat fields as strings. Use "00" to "20" instead of 0 to 20. When doing this, the term values which are indexed should be "00", "01", ... Thanks, Koji > -Original
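
The padding advice in this thread can be illustrated with a minimal sketch in plain Java, without any Lucene classes. The class name `PaddedNumber`, its method names, and the fixed width are assumptions made for illustration; they are not the functions posted in the thread. The sketch handles only non-negative values, since the thread notes that negative numbers need extra care.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: shows why range terms must be padded.
class PaddedNumber {
    /** Left-pads a non-negative number with zeros to the given width. */
    static String encode(int value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        String digits = Integer.toString(value);
        if (digits.length() > width) {
            throw new IllegalArgumentException("value does not fit in width " + width);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    /** Turns a padded term back into a number for display. */
    static int decode(String term) {
        return Integer.parseInt(term);
    }

    /** Returns the terms within [lower, upper] under plain string comparison. */
    static List<String> inRange(List<String> terms, String lower, String upper) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```

With unpadded terms, a string range of "0" to "9" also matches "15" and "20", because comparison stops at the first differing character. With terms encoded to width 2, the range "00" to "09" matches only single-digit values.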

RE: How to get mapping of query terms to number of their occurrences in a doc?

2006-02-08 Thread Dmitry Goldenberg
Chris, That's what I did, for debugging. The query is "biology", and here's what the API tells me for term frequencies: biolog 15 biologi 31 biologist 4 I actually see 13 occurrences of "biologist" and "biologists", 64 occurrences of "biology", 27 occurrences of "biological". I see "inform 2

RE: Queries not derived from the text index

2006-02-08 Thread John Powers
This may be a tangent, but for my filters and searches, I construct the query with "+" and "-" and what not.. is this not the right way to do this? I haven't had to extend or write any special AND or OR classes, I just write the query and search the once. Any advantage to writing Filter su

RE: How to get mapping of query terms to number of their occurrences in a doc?

2006-02-08 Thread Chris Hostetter
: That's what I did, for debugging. The query is "biology", and here's : what the API tells me for term frequencies: : biolog 15 : biologi 31 : biologist 4 : : I actually see 13 occurrences of "biologist" and "biologists", 64 : occurrences of "biology", 27 occurrences of "biological". : : I see "

RE: How to get mapping of query terms to number of their occurrences in a doc?

2006-02-08 Thread Dmitry Goldenberg
Duh! Bingo! Mystery solved. I should have thought of this :) The discrepancies come in with larger documents, definitely > 10K terms which is Lucene's default maxFieldLength. Thanks for your help, Chris - Dmitry From: Chris Hostetter [mailto:[EMAIL PROTECTED]
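
The effect described here, where term counts come out lower than expected once a document exceeds the field length limit, can be reproduced with a small plain-Java sketch. The class `TruncatedTermCounter` and its parameter names are hypothetical and do not use Lucene's API; the sketch simply assumes that tokens past the limit are ignored when frequencies are counted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation, not Lucene code: counts terms but stops at a token limit.
class TruncatedTermCounter {
    /** Counts term frequencies over at most maxFieldLength leading tokens. */
    static Map<String, Integer> count(String[] tokens, int maxFieldLength) {
        Map<String, Integer> freq = new HashMap<>();
        int limit = Math.min(tokens.length, maxFieldLength);
        for (int i = 0; i < limit; i++) {
            freq.merge(tokens[i], 1, Integer::sum);
        }
        return freq;
    }
}
```

Counting the same token array with a small limit and with a limit larger than the array shows the discrepancy: occurrences that fall after the cutoff never reach the frequency map.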

scalability recommendations for large performance-intensive indexes

2006-02-08 Thread Vince Taluskie
Hello all, I'm looking for some advice on how to improve scalability - we have a fairly large Lucene index of 35M documents, max 1k document size (most much smaller) and 14 fields. We combine descriptive text together into a "contents" field and search on that and have been very pleased with han

Re: scalability recommendations for large performance-intensive indexes

2006-02-08 Thread markharw00d
Hi Vince, sounds like the same issue I highlighted recently on the java-dev list. See here: http://www.nabble.com/Preventing-%22killer%22-queries-t1077895.html The problem lies in the underlying cost of reading TermDocs for very common terms (a problem for both queries and filters). For your

1.9 lucene version

2006-02-08 Thread Aigner, Thomas
Hello all, I have a couple of questions for the community about the 1.9 Lucene version. As I understand it, this has not been released and I can't find an approximate date for release (I know you can download the development version and compile it). I see a nightly build going on (http:/

Re: Queries not derived from the text index

2006-02-08 Thread Daniel Noll
John Powers wrote: This may be a tangent, but for my filters and searches, I construct the query with "+" and "-" and what not.. is this not the right way to do this? I haven't had to extend or write any special AND or OR classes, I just write the query and search the once. Any advantage

Re: Queries not derived from the text index

2006-02-08 Thread Daniel Noll
Erik Hatcher wrote: One interesting option is to subclass QueryParser and override getFieldQuery. When the field is "tag", return a FilteredQuery (see trunk codebase, or the nightly 1.9 binaries) using a Filter that interfaces with your database. Caching of the filters would be desirable for

Re: Duplicates records in index

2006-02-08 Thread Daniel Noll
Pasha Bizhan wrote: Hi, From: Anton Potehin [mailto:[EMAIL PROTECTED] 1) create Document object 2) add 5 fields into Document (id, name, field1, field2, field3). All fields are stored, indexed and tokenized 3) check if the document with current id and name was added before Just perform
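
The check-before-add step discussed in this thread can be sketched in plain Java. This is only an illustration under stated assumptions: the class `DedupingIndexer` is hypothetical, and an in-memory `HashSet` keyed on id and name stands in for the lookup against the index that the thread talks about.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for an index writer: a set replaces the real index lookup.
class DedupingIndexer {
    private final Set<String> seenKeys = new HashSet<>();
    private int added = 0;

    /** Adds the record unless one with the same id and name was added before. */
    boolean addIfAbsent(String id, String name) {
        // A NUL separator keeps ("ab", "c") distinct from ("a", "bc").
        String key = id + '\u0000' + name;
        if (!seenKeys.add(key)) {
            return false; // duplicate: skip the add
        }
        added++;
        return true;
    }

    /** Number of records actually added. */
    int size() {
        return added;
    }
}
```

The design point is that the duplicate test uses an exact, untokenized key built from both fields, so two records count as duplicates only when id and name both match.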

Re: Queries not derived from the text index

2006-02-08 Thread Erik Hatcher
On Feb 8, 2006, at 6:46 PM, Daniel Noll wrote: Erik Hatcher wrote: One interesting option is to subclass QueryParser and override getFieldQuery. When the field is "tag", return a FilteredQuery (see trunk codebase, or the nightly 1.9 binaries) using a Filter that interfaces with your data

Too many required clauses for a BooleanQuery

2006-02-08 Thread Kevin Dutcher
Hey Everyone, I'm running into the "More than 32 required/prohibited clauses in query" exception when running a query. I thought I understood the problem but the following two scenarios confuse me. 1st - No Error 33 required clauses plus additional clauses that are left off b/c they are the same

Re: Queries not derived from the text index

2006-02-08 Thread Daniel Noll
Erik Hatcher wrote: Actually I'm pretty certain that it'll work with just getFieldQuery overriding. You can AND or OR a FilteredQuery with any other Query inside a BooleanQuery. I'd be surprised if it didn't work. Scoring is the one tricky caveat to this sort of thing, and perhaps the new "

Build vs. Buy?

2006-02-08 Thread jwang
I'm trying to upgrade our search functionality (currently, RTF/text only, and exact phrase match only) at my company, and have run into some concerns. Our 4 main formats are: RTF - javax.swing looks fine, we use those classes already. MS Word - I know that POI exists, but development on th

Re: Too many required clauses for a BooleanQuery

2006-02-08 Thread Chris Hostetter
I don't know a lot about the error you're encountering (or not encountering as the case may be) but please for the love of all that is sane use a Filter instead of putting all those categories in your Query. Your search performance and your scores will thank you. : Date: Wed, 8 Feb 2006 18:52:22 -
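
The idea of filtering by category instead of adding one query clause per category can be sketched with `java.util.BitSet` alone. The class `CategoryFilterSketch` and its methods are hypothetical and do not use Lucene's Filter class; the sketch assumes one bit per document id, a category lookup by document id, and a set of query hits computed separately.

```java
import java.util.BitSet;

// Hypothetical sketch, not Lucene code: a category restriction applied as a bit set.
class CategoryFilterSketch {
    /** Sets the bit for every document whose category is in the allowed list. */
    static BitSet buildFilter(int[] docCategory, int[] allowedCategories) {
        BitSet bits = new BitSet(docCategory.length);
        for (int doc = 0; doc < docCategory.length; doc++) {
            for (int allowed : allowedCategories) {
                if (docCategory[doc] == allowed) {
                    bits.set(doc);
                    break;
                }
            }
        }
        return bits;
    }

    /** Returns the query hits that pass the filter; the inputs are not modified. */
    static BitSet apply(BitSet queryHits, BitSet filter) {
        BitSet result = (BitSet) queryHits.clone();
        result.and(filter);
        return result;
    }
}
```

Because the filter only removes documents, the category list never appears as query clauses, and the bit set can be built once and reused across searches.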

Re: Queries not derived from the text index

2006-02-08 Thread Chris Hostetter
: I think that overriding getFieldQuery would work, yeah... you're right. : It's just a matter of comparing efficiency of this: : : BooleanQuery of (TermQuery, FilteredQuery of (AllDocsQuery, Filter)) : : to the efficiency of this: : : FilteredQuery of (TermQuery, Filter) the third o