Re: max number of documents

2005-08-10 Thread Andrzej Bialecki
Yonik Seeley wrote: I think it would be 2 billion. There are many places that wouldn't like the overflow to negative docids I think... We have indexes up to 200M documents, so 1/10th the max. 64 bit ids are definitely something to think about for the near future. Until/unless Java grows 64 bit

Re: create a single or multiple index?

2005-08-10 Thread Otis Gospodnetic
Search against multiple indices is easy enough with MultiSearcher or its Parallel cousin. From the usage point of view, it's as simple as using a single IndexSearcher. Otis --- aurora <[EMAIL PROTECTED]> wrote: > I have two sources of data, let's say one is a set of articles and > one is >

create a single or multiple index?

2005-08-10 Thread aurora
I have two sources of data, let's say one is a set of articles and one is from forum messages. I'd like to hear opinions on whether to create one single index or a separate index for each kind of document. The user interface is not yet finalized. The search result may be presented as separated

Re: max number of documents

2005-08-10 Thread Yonik Seeley
I think it would be 2 billion. There are many places that wouldn't like the overflow to negative docids I think... We have indexes up to 200M documents, so 1/10th the max. 64 bit ids are definitely something to think about for the near future. > Who's got Lucene indexes nearing the maximum intege
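The "2 billion" figure in this thread matches the range of a Java int, whose largest value is Integer.MAX_VALUE (2,147,483,647). A minimal sketch of the overflow being described, in plain Java with no Lucene calls; the class and method names below are hypothetical and only illustrate int arithmetic, not how Lucene assigns document ids:

```java
public class DocIdOverflow {
    // Hypothetical helper: the id after docId, using plain int arithmetic.
    // Adding 1 to Integer.MAX_VALUE wraps silently to Integer.MIN_VALUE.
    public static int nextDocId(int docId) {
        return docId + 1;
    }

    // Hypothetical variant that fails loudly instead of wrapping.
    // Math.addExact throws ArithmeticException on int overflow.
    public static int nextDocIdChecked(int docId) {
        return Math.addExact(docId, 1);
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE);            // 2147483647
        System.out.println(nextDocId(Integer.MAX_VALUE)); // -2147483648
    }
}
```

Plain int arithmetic raises no exception at the limit, which is why code that assumes non-negative ids would misbehave rather than fail cleanly.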

Re: max number of documents

2005-08-10 Thread Erik Hatcher
On Aug 10, 2005, at 6:47 PM, Chris Fraschetti wrote: maybe this is a stupid question, maybe not... hits.id returns an int.. which would lead me to assume the obvious limitations of the size of the index (size meaning number of docs) ... assuming I reach this limit, can I expect lucene to throw s

max number of documents

2005-08-10 Thread Chris Fraschetti
maybe this is a stupid question, maybe not... hits.id returns an int.. which would lead me to assume the obvious limitations of the size of the index (size meaning number of docs) ... assuming I reach this limit, can I expect lucene to throw some sort of exception? What is the best practice for t

RE: Indexing terms limit

2005-08-10 Thread Tim Johnson
I posted before with issues about searching multiple indexes to get a total number of docs that match a query with a given category. The best performance I found was to have a separate index for each category and iterate over each category and do a hits.length() to get the total hits. Well each que

Re: Indexing terms limit

2005-08-10 Thread Chris Hostetter
: I'm currently attempting to index the distinct list of terms found in a : Lucene index using the TermEnum. I'm creating a document with each list : and indexing the document of terms. It appears there's a limit of : 10,000 distinct terms within a given document. Can this be overcome?? Out of

Re: Indexing terms limit

2005-08-10 Thread Chris Lamprecht
See IndexWriter.setMaxFieldLength(), I think it's what you want: from javadocs: public void setMaxFieldLength(int maxFieldLength) The maximum number of terms that will be indexed for a single field in a document. This limits the amount of memory required for indexing, so that collections with ve

Re: Why is Hits.java not Serializable?

2005-08-10 Thread Ali Rouhi
Thanks for the quick answer! Ali On 8/10/05, Doug Cutting <[EMAIL PROTECTED]> wrote: > Ali Rouhi wrote: > > I can think of 3 reasons why search methods returning Hits objects > > are not exposed in Searchable: > > > > 1) Someone forgot to declare Hits Serializable > > 2) There is a fundamental r

Indexing terms limit

2005-08-10 Thread Tim Johnson
I'm currently attempting to index the distinct list of terms found in a Lucene index using the TermEnum. I'm creating a document with each list and indexing the document of terms. It appears there's a limit of 10,000 distinct terms within a given document. Can this be overcome?? --

Re: DOM or XML representation of a query?

2005-08-10 Thread jian chen
Well, the good practice I think is to decouple the backend from the front end as much as possible. You might have different versions of java running for each end and also, there might be code compatibility issues with different versions. Jian On 8/10/05, Andrew Boyd <[EMAIL PROTECTED]> wrote: > Q

Indexing document instances and retrieving instance attributes

2005-08-10 Thread Chris D
I'm adding files to an index over time, so after some time I'm likely to see the same file more than once. I would like to be able to search for the information about that particular instance of the file (Filename, date etc) For instance I index File1 and then File2 (which are identical) at differe

Re: Why is Hits.java not Serializable?

2005-08-10 Thread Doug Cutting
Ali Rouhi wrote: I can think of 3 reasons why search methods returning Hits objects are not exposed in Searchable: 1) Someone forgot to declare Hits Serializable 2) There is a fundamental reason the forms of search which return Hits objects cannot be called remotely, some non optimal form of se

Re: DOM or XML representation of a query?

2005-08-10 Thread Andrew Boyd
Query is Serializable; why not use that? -----Original Message----- From: Roy Klein <[EMAIL PROTECTED]> Sent: Aug 10, 2005 10:08 AM To: java-user@lucene.apache.org Subject: DOM or XML representation of a query? Hi, The "front-end" guys working on my application need a way to pass me complex quer

RE: OutOfMemoryError on addIndexes()

2005-08-10 Thread Otis Gospodnetic
Is -Xmx case sensitive? Should it be 1000m instead of 1000M? Not sure. Otis --- Trezzi Michael <[EMAIL PROTECTED]> wrote: > I tried it first without any parameter, and now just for test i tried > it with -Xmx1000M, but still no luck. It gives the error almost > instantly. > > Michael > > __
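One way to check whether a given -Xmx spelling took effect is to ask the running JVM for its heap ceiling. A minimal sketch using only java.lang.Runtime; the class name HeapCheck is hypothetical, and the reported value can differ somewhat from the requested size depending on the JVM:

```java
public class HeapCheck {
    // Maximum heap the JVM will attempt to use, in MiB.
    // Runtime.maxMemory() returns Long.MAX_VALUE when there is no inherent limit.
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it once as `java -Xmx1000m HeapCheck` and once as `java -Xmx1000M HeapCheck`, then comparing the two outputs, would answer the case-sensitivity question for the JVM in use.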

Re: DOM or XML representation of a query?

2005-08-10 Thread Andrzej Bialecki
Roy Klein wrote: Hi, The "front-end" guys working on my application need a way to pass me complex queries. I was thinking that it'd be pretty straightforward to hand them a package which helps them to create a DOM object which describes a query (i.e. nested Booleans combined with phrases and key

DOM or XML representation of a query?

2005-08-10 Thread Roy Klein
Hi, The "front-end" guys working on my application need a way to pass me complex queries. I was thinking that it'd be pretty straightforward to hand them a package which helps them to create a DOM object which describes a query (i.e. nested Booleans combined with phrases and keyword searches, sort

Re: Why is Hits.java not Serializable?

2005-08-10 Thread Miles Barr
On Tue, 2005-08-09 at 23:37 -0700, Ali Rouhi wrote: > I can think of 3 reasons why search methods returning Hits objects > are not exposed in Searchable: > > 1) Someone forgot to declare Hits Serializable > 2) There is a fundamental reason the forms of search which return Hits > objects cannot be

RE: OutOfMemoryError on addIndexes()

2005-08-10 Thread Trezzi Michael
I tried it first without any parameter, and now just for a test I tried it with -Xmx1000M, but still no luck. It gives the error almost instantly. Michael From: Ian Lea [mailto:[EMAIL PROTECTED] Sent: Wed 10.8.2005 12:34 To: java-user@lucene.apache.org Subject

Re: OutOfMemoryError on addIndexes()

2005-08-10 Thread Ian Lea
How much memory are you giving your programs? java -Xmx<size> set maximum Java heap size -- Ian. On 10/08/05, Trezzi Michael <[EMAIL PROTECTED]> wrote: > Hello, > I have a problem and I tried everything I could think of to solve it. To > understand my situation, I create indexes on several

OutOfMemoryError on addIndexes()

2005-08-10 Thread Trezzi Michael
Hello, I have a problem and I tried everything I could think of to solve it. To understand my situation, I create indexes on several computers on our network and they are copied to one server. There, once a day, they are merged into one masterIndex, which is then searched. The problem is in merg