Re: Understanding lucene indexes and disk I/O

2010-04-13 Thread Michael McCandless
On Tue, Apr 13, 2010 at 11:55 AM, Burton-West, Tom wrote: > At some point maybe the File Formats Document could be updated to make it > clear that the tii has an entry similar to the IndexInterval'th tis entry but > instead of holding frq/prx deltas it holds absolute pointers. Is it worth > e

RE: Understanding lucene indexes and disk I/O

2010-04-13 Thread Burton-West, Tom
, 2010 5:27 AM To: java-user@lucene.apache.org Subject: Re: Understanding lucene indexes and disk I/O Hi Tom, Fear not: we only scan up to 128 terms, to find the specific term. First, the terms dict index (tii) is fully loaded into RAM, and then a binary search is done on this (in-RAM) to find t

Re: Understanding lucene indexes and disk I/O

2010-04-13 Thread Michael McCandless
Hi Tom, Fear not: we only scan up to 128 terms, to find the specific term. First, the terms dict index (tii) is fully loaded into RAM, and then a binary search is done on this (in-RAM) to find the nearest index term just before the term you want. Then, we seek to that spot in the main terms dict
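The lookup described above (binary search over the in-RAM index, then a bounded scan of the main terms dict) can be sketched roughly as follows. This is only a toy model in Python, not Lucene's actual code: the names `build_index` and `lookup` are hypothetical, the terms dict is modeled as a sorted in-memory list, and positions are list offsets rather than the file pointers a real tii entry would hold. The interval of 128 is taken from the "up to 128 terms" figure in the message.

```python
from bisect import bisect_right

# Assumption: the interval matches the "up to 128 terms" bound mentioned above.
INDEX_INTERVAL = 128


def build_index(sorted_terms, interval=INDEX_INTERVAL):
    """Toy stand-in for the tii: every interval-th term plus its absolute position.

    Hypothetical helper; a real index would store file pointers, not list offsets.
    """
    return [(term, pos) for pos, term in enumerate(sorted_terms) if pos % interval == 0]


def lookup(sorted_terms, index, target, interval=INDEX_INTERVAL):
    """Return (position, terms_scanned); position is None if the term is absent."""
    index_terms = [term for term, _ in index]
    # Binary search the in-memory index for the last index term <= target.
    slot = bisect_right(index_terms, target) - 1
    if slot < 0:
        return None, 0  # target sorts before the first term in the dictionary
    start = index[slot][1]
    scanned = 0
    # "Seek" to that spot in the main terms dict, then scan at most `interval` terms.
    for pos in range(start, min(start + interval, len(sorted_terms))):
        scanned += 1
        term = sorted_terms[pos]
        if term == target:
            return pos, scanned
        if term > target:
            break  # passed the place where the term would be
    return None, scanned


if __name__ == "__main__":
    terms = [f"term{i:06d}" for i in range(10000)]
    index = build_index(terms)
    print(lookup(terms, index, "term005000"))
```

In this model the scan never touches more than one interval's worth of entries per lookup, no matter how large the terms list grows, which is the point being made about large tis files.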

Understanding lucene indexes and disk I/O

2010-04-12 Thread Burton-West, Tom
Hi all, Please let me know if this should be posted instead to the Lucene java-dev list. We have very large tis files (about 36 GB). I have not been too concerned as I assumed that due to the indexing of the tis file by the tii file, only a small portion of the file needed to be read. However