Re: Autonomy search technology

2009-04-04 Thread Grant Ingersoll
Note that I believe with some work (marking the "zones" during analysis), one can accomplish this with Spans without the field creation problem that John mentions. -Grant On Apr 3, 2009, at 7:24 PM, John Wang wrote: Not quite. For example, # of fields is static throughout the corpus. # zone

RE: simultaneous indexing and searching causing intermittently long searches.

2009-04-04 Thread Dan OConnor
Mike, Thanks for the response -- we've already jumped on a couple of your suggestions. Here is some feedback and follow-ups: We have watched GC times closely in the past. Most of the settings we tried made GC worse rather than better. We didn't know about reopen() unti

Re: Term Limit?

2009-04-04 Thread Michael McCandless
OK I opened https://issues.apache.org/jira/browse/LUCENE-1586 to track this. Thanks deminix! Mike

Re: Term Limit?

2009-04-04 Thread deminix
Ah yes. I'd be happy with the ability to monitor it for now, assuming it is too involved to remove the limitation. For all practical purposes we should only be using, worst case, 10% of the term space today. That happens to make it risky enough that it needs an eye kept on it, as this will be o

Re: Term Limit?

2009-04-04 Thread Michael McCandless
On Sat, Apr 4, 2009 at 11:57 AM, deminix wrote: > Yea.  That is all that matters anyway right, is the limit at the segment > level? Well... the problem is when merges kick off. You could have N segments that each are below the limit, but when a merge runs the merged segment would try to exceed t
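The merge concern above can be sketched in plain Java. This is only an illustration, not Lucene code: the class and method names are hypothetical, the per-segment term counts are assumed to be known by some other means (the thread notes there is no API for them), and the limit constant is taken from the 2,147,483,648 figure quoted elsewhere in this thread. Because segments can share terms, the sum of per-segment counts is an upper bound on the merged segment's term count, not the exact value.

```java
// Hypothetical helper, not part of Lucene: checks whether merging segments
// could push the merged segment past the int-based term limit.
final class TermLimitCheck {

    // Assumed limit, taken from the figure quoted in this thread.
    static final long MAX_TERMS_PER_SEGMENT = (long) Integer.MAX_VALUE + 1; // 2,147,483,648

    // Upper bound on the merged segment's unique term count:
    // segments may share terms, so the real count can be lower.
    static long worstCaseMergedTerms(long[] perSegmentTermCounts) {
        long total = 0;
        for (long count : perSegmentTermCounts) {
            total += count; // long arithmetic, so the sum itself cannot wrap like an int
        }
        return total;
    }

    // True if the worst-case merged term count exceeds the assumed limit.
    static boolean mergeCouldExceedLimit(long[] perSegmentTermCounts) {
        return worstCaseMergedTerms(perSegmentTermCounts) > MAX_TERMS_PER_SEGMENT;
    }

    public static void main(String[] args) {
        // Two segments, each individually below the limit.
        long[] counts = {1_500_000_000L, 1_000_000_000L};
        System.out.println("worst case merged terms: " + worstCaseMergedTerms(counts));
        System.out.println("merge could exceed limit: " + mergeCouldExceedLimit(counts));
    }
}
```

A check like this is conservative: it may flag merges that would in fact fit once duplicate terms are collapsed, which seems the safer direction for monitoring.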

Re: Term Limit?

2009-04-04 Thread Michael McCandless
On Sat, Apr 4, 2009 at 11:52 AM, deminix wrote: > My crude regex'ing of the code has me thinking it is only term vectors that > are limited to 32 bits, since they allocate arrays.  Otherwise it seems > good.  Does that sound right? Not quite... SegmentTermEnum.seek takes "int p". TermInfosReader

Re: Term Limit?

2009-04-04 Thread deminix
Yea. That is all that matters anyway right, is the limit at the segment level? On Sat, Apr 4, 2009 at 8:44 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Sat, Apr 4, 2009 at 10:25 AM, deminix wrote: > > > AFAIK there isn't an api that returns the current number of terms, > cor

Re: Term Limit?

2009-04-04 Thread deminix
My crude regex'ing of the code has me thinking it is only term vectors that are limited to 32 bits, since they allocate arrays. Otherwise it seems good. Does that sound right? On Sat, Apr 4, 2009 at 7:25 AM, deminix wrote: > Thanks for the clarification. > > I'm partitioning the document spac

Re: Term Limit?

2009-04-04 Thread Michael McCandless
On Sat, Apr 4, 2009 at 10:25 AM, deminix wrote: > AFAIK there isn't an api that returns the current number of terms, correct? Alas, no. This limitation has been talked about before... maybe we should add it. But: it's not actually simple to compute, at the MultiSegmentReader level. Each Segme

Re: Term Limit?

2009-04-04 Thread deminix
Thanks for the clarification. I'm partitioning the document space, so I'm not really concerned about the fact documents are ints. Some fields have very unique value spaces though (and many values per document), and they don't align with the way the documents are partitioned, so may have a very

Re: Term Limit?

2009-04-04 Thread Michael McCandless
Correct, and, not that I know of. Mike On Sat, Apr 4, 2009 at 7:55 AM, Murat Yakici wrote: > > I assume the total number of documents that you can index is also limited > by Java max int. Is this correct? Is there any way to index documents > beyond this number in a single index? > > Murat > > >

Re: Term Limit?

2009-04-04 Thread Murat Yakici
I assume the total number of documents that you can index is also limited by Java max int. Is this correct? Is there any way to index documents beyond this number in a single index? Murat > I tentatively think you are correct: the file format itself does not > impose this limitation. > > But in

Re: simultaneous indexing and searching causing intermittently long searches.

2009-04-04 Thread Michael McCandless
On Fri, Apr 3, 2009 at 10:21 PM, Dan OConnor wrote: > All, > > I have several questions regarding query response time and I would > appreciate any help that can be provided. > > We have a system that indexes approximately 200,000 documents per day at a > fairly constant rate and holds them in

Re: Term Limit?

2009-04-04 Thread Michael McCandless
I tentatively think you are correct: the file format itself does not impose this limitation. But in at least a couple of places internally, Lucene uses a Java int to hold the term number, which is actually a limit of 2,147,483,648 terms. I'll update fileformats.html for 2.9. Mike On Sat, Apr 4, 200