Re: Non-index files under the search directory
Correct, this data is associated with individual IndexCommits (you should
be able to see the key-value pairs in the raw contents of the segments_N
files in an index directory). To consolidate the entries, you'll have to
retrieve the user data from each sub-index, put all of it into a new map,
then set this data on the aggregate writer.

On Tue, Nov 22, 2016 at 9:02 PM, Xiaolong Zheng wrote:

> Hi András,
>
> Thanks, this is what I need!
>
> I also noticed that this user commit data does not carry over when I
> consolidate several search databases into a new one. I guess the
> solution is to explicitly call getCommitData() on each sub-index, then
> set the result on the new consolidated search database, right?
>
> Best,
> --Xiaolong
>
> On Tue, Nov 22, 2016 at 12:10 PM, András Péteri wrote:
>
>> Hi Xiaolong,
>>
>> A Map of key-value pairs can be supplied to
>> IndexWriter#setCommitData(Map) and will be persisted when committing
>> changes (setting the commit data counts as a change). It can be
>> retrieved with IndexWriter#getCommitData() later.
>>
>> This may serve as good storage for metadata; as an example,
>> Elasticsearch stores attributes related to its transaction log there
>> (UUID and generation identifier).
>>
>> Regards,
>> András
>>
>> On Tue, Nov 22, 2016 at 5:40 PM, Xiaolong Zheng wrote:
>>
>> > Thanks. StoredField is still down at the per-document level, which
>> > means every document will contain this search field.
>> >
>> > What I would really like is a global-level storage to hold this
>> > single value. Maybe this is impossible.
>> >
>> > Sincerely,
>> > --Xiaolong
>> >
>> > On Tue, Nov 22, 2016 at 5:13 AM, Michael McCandless
>> > <luc...@mikemccandless.com> wrote:
>> >
>> >> Lucene won't merge foreign files for you, and in general it's
>> >> dangerous to put such files into Lucene's index directory, because
>> >> if they look like codec files Lucene may delete them.
>> >>
>> >> Can you just add a StoredField to each document to hold your
>> >> information?
>> >>
>> >> Mike McCandless
>> >> http://blog.mikemccandless.com
>> >>
>> >> On Mon, Nov 21, 2016 at 11:38 PM, Xiaolong Zheng wrote:
>> >>
>> >> > Hello,
>> >> >
>> >> > I am trying to add some metadata to the search database. Instead
>> >> > of adding a new search field or adding a phony document, I am
>> >> > looking at the method
>> >> > org.apache.lucene.store.Directory#createOutput, which creates a
>> >> > new file in the search directory.
>> >> >
>> >> > I am wondering: can IndexWriter also merge this non-index file
>> >> > while it merges multiple search indexes?
>> >> >
>> >> > And, stepping back a little bit, what is the best way to add
>> >> > metadata to the search database?
>> >> >
>> >> > For example, I would like to add an indicator showing which kind
>> >> > of stemmer was being used when the index was created.
>> >> >
>> >> > Thanks,
>> >> > --Xiaolong

--
András Péteri

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
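To make the consolidation step András describes concrete, here is a small self-contained sketch of the merging itself, using plain Java collections. The per-sub-index key prefixing and the commented-out Lucene calls are assumptions for illustration, not the only way to do it:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CommitDataMerge {
    // Combine the user commit data of several sub-indexes into one map
    // that can be handed to the aggregate IndexWriter. Keys are prefixed
    // with a sub-index id so entries with the same key cannot clobber
    // each other.
    static Map<String, String> merge(List<Map<String, String>> perIndexData) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (int i = 0; i < perIndexData.size(); i++) {
            for (Map.Entry<String, String> e : perIndexData.get(i).entrySet()) {
                merged.put("sub" + i + "." + e.getKey(), e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> a = Map.of("stemmer", "porter");
        Map<String, String> b = Map.of("stemmer", "snowball");
        Map<String, String> merged = merge(List.of(a, b));
        System.out.println(merged.get("sub0.stemmer")); // porter
        System.out.println(merged.get("sub1.stemmer")); // snowball
        // In real code (Lucene 3.x/4.x-era API, sketched here):
        //   Map<String,String> data = reader.getIndexCommit().getUserData();
        //   ... collect one map per sub-index ...
        //   aggregateWriter.setCommitData(merged);
        //   aggregateWriter.commit();
    }
}
```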
Re: Understanding Query Parser Behavior
Hi,

You should double-check which analyzer you are using during indexing. The
same analyzer on the same string should produce the same tokens.

Mike McCandless

http://blog.mikemccandless.com

On Wed, Nov 23, 2016 at 9:38 PM, Peru Redmi wrote:

> Could someone elaborate on this?
>
> On Tue, Nov 22, 2016 at 11:41 AM, Peru Redmi wrote:
>
>> Hello,
>> Can you help me out on your "No"?
>>
>> On Mon, Nov 21, 2016 at 11:16 PM, wmartin...@gmail.com wrote:
>>
>>> No
>>>
>>> Sent from my LG G4, an AT&T 4G LTE smartphone
>>>
>>> -- Original message --
>>> From: Peru Redmi
>>> Date: Mon, Nov 21, 2016 10:44 AM
>>> To: java-user@lucene.apache.org
>>> Subject: Understanding Query Parser Behavior
>>>
>>> Hello All,
>>>
>>> Could someone explain QueryParser behavior in these cases?
>>>
>>> 1. While indexing:
>>>
>>>     Document doc = new Document();
>>>     doc.add(new Field("Field", "http://www.google.com",
>>>         Field.Store.YES, Field.Index.ANALYZED));
>>>
>>>    The index has two terms: http & www.google.com
>>>
>>> 2. While searching:
>>>
>>>     Analyzer anal = new ClassicAnalyzer(Version.LUCENE_30,
>>>         new StringReader(""));
>>>     QueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30,
>>>         new String[]{"Field"}, anal);
>>>     Query query = parser.parse("http://www.google.com");
>>>
>>>    Now the query has three terms: (Field:http) (Field://)
>>>    (Field:www.google.com)
>>>
>>> i) Why do I get 3 terms while parsing, but 2 terms on indexing (using
>>> the same ClassicAnalyzer in both cases)?
>>> ii) Is this expected behavior of ClassicAnalyzer(Version.LUCENE_30)
>>> with the parser?
>>> iii) What should be done to avoid the query part (Field://)?
>>>
>>> Thanks,
>>> Peru.
Range query on date field
Hi - I seem to be having trouble correctly executing a range query on a
date field. The following Solr document is indexed via a unit test
followed by a commit:

    view
    test_key
    2013-01-09T17:11:40Z

I can retrieve the document simply by wrapping term queries in a boolean
query like this:

    BooleanQuery.Builder queryBuilder = new BooleanQuery.Builder();
    Query typeQuery = new TermQuery(new Term("type", "view"));
    queryBuilder.add(typeQuery, Occur.MUST);
    long count = searcher.get().count(queryBuilder.build());

This gets me exactly 1 in variable count. This is all fine. But I also
need to restrict the query to a date, so I add a simple (or so I thought)
range query:

    TermRangeQuery timeQuery = TermRangeQuery.newStringRange("time",
        date + "T00:00:00Z", date + "T23:59:59Z", true, true);
    queryBuilder.add(timeQuery, Occur.MUST);

But no, it doesn't work. No matter what I do, I don't get any results!
Thinking there was something wrong with my range query, I even tried
StandardQueryParser; nothing can go wrong if Lucene builds the query for
me, right?

    StandardQueryParser parser = new StandardQueryParser();
    Query q = parser.parse("type:view AND time:[" + date +
        "T00:00:00Z TO " + date + "T23:59:59Z]", "query");

In both cases, toString() of the final query yields similar results, only
the order is different. The letters T and Z are somehow lowercased by the
query parser. I feel incredibly stupid, so many thanks in advance!

Markus
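The lowercasing Markus observes matters because a string range query matches by plain lexicographic order, and a lowercase 't' sorts after an uppercase 'T'. This standalone sketch (plain Java string comparison, no Lucene; the timestamps are taken from the message above) shows how the lowercased bound already lies beyond the stored term:

```java
public class DateRangeCheck {
    // A TermRangeQuery on strings matches by lexicographic order.
    // ISO-8601 timestamps compare correctly as strings, but only if the
    // case matches: lowercasing the bounds breaks the range.
    public static void main(String[] args) {
        String stored = "2013-01-09T17:11:40Z";
        String lower  = "2013-01-09T00:00:00Z";
        String upper  = "2013-01-09T23:59:59Z";

        // Correct-case bounds contain the stored term:
        System.out.println(lower.compareTo(stored) <= 0
            && stored.compareTo(upper) <= 0);            // true

        // Lowercased bounds do not: 't' (0x74) sorts after 'T' (0x54),
        // so the lower bound already exceeds the stored term.
        String lcLower = lower.toLowerCase();
        System.out.println(lcLower.compareTo(stored) <= 0); // false
    }
}
```

So if the query parser's analyzer lowercases the bounds while the indexed terms keep their uppercase 'T' and 'Z', the range can never match; the bounds and the indexed terms must go through the same (non-lowercasing) analysis.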
Re: Understanding Query Parser Behavior
Hello Mike,

Here is how I analyze my text using QueryParser (with ClassicAnalyzer)
and a plain ClassicAnalyzer. On checking the same in Luke, I get "//" as
a RegexQuery. Here is my code snippet:

    String value = "http\\://www.google.com";
    Analyzer anal = new ClassicAnalyzer(Version.LUCENE_30,
        new StringReader(""));
    QueryParser parser = new QueryParser(Version.LUCENE_30, "name", anal);
    Query query = parser.parse(value);
    System.out.println(" output terms from query parser ::" + query);

    ArrayList<String> list = new ArrayList<>();
    TokenStream stream = anal.tokenStream("name", new StringReader(value));
    stream.reset();
    while (stream.incrementToken())
    {
        list.add(stream.getAttribute(CharTermAttribute.class).toString());
    }
    System.out.println(" output terms from analyzer " + list);

Output:

    output terms from query parser ::name:http name:// name:www.google.com
    output terms from analyzer [http, www.google.com]

On Thu, Nov 24, 2016 at 5:10 PM, Michael McCandless
<luc...@mikemccandless.com> wrote:

> Hi,
>
> You should double check which analyzer you are using during indexing.
>
> The same analyzer on the same string should produce the same tokens.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
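The mismatch in the output above arises because the classic query parser splits the input on its own syntax characters before the analyzer ever sees the text, so the stray "//" survives as a query term even though the analyzer alone would drop it. Escaping every special character before parsing avoids this. Below is a standalone sketch of such escaping; the character list is modeled on what QueryParser.escape() is believed to treat as syntax and should be read as an illustration, not the authoritative implementation:

```java
public class QueryEscape {
    // Characters the classic Lucene QueryParser treats as syntax
    // (assumed list, mirroring QueryParser.escape()).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Prefix each special character with a backslash so the parser
    // passes it through to the analyzer as plain text.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("http://www.google.com"));
        // -> http\:\/\/www.google.com
    }
}
```

In real code you would call QueryParser.escape(value) itself rather than reimplementing it; the point is that the whole URL, not just the first colon, needs escaping before parser.parse() sees it.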
term frequency
I'm using SolrJ to find the term frequency for each term in a field. I
wrote this code, but it is not working:

    String urlString = "http://localhost:8983/solr/huda";
    SolrClient solr = new HttpSolrClient.Builder(urlString).build();

    SolrQuery query = new SolrQuery();
    query.setTerms(true);
    query.addTermsField("name");
    SolrRequest req = new QueryRequest(query);
    QueryResponse rsp = req.process(solr);

    System.out.println(rsp);
    System.out.println("numFound: " + rsp.getResults().getNumFound());

    TermsResponse termResp = rsp.getTermsResponse();
    List<TermsResponse.Term> terms = termResp.getTerms("name");
    System.out.print(terms.size());

I got this error:

    Exception in thread "main" java.lang.NullPointerException
        at solr_test.solr.App2.main(App2.java:50)
Re: term frequency
The exception line does not match the code you pasted, but do make sure
your object is actually not null before accessing its methods.

On Thu, Nov 24, 2016 at 5:42 PM, huda barakat wrote:

> I'm using SolrJ to find the term frequency for each term in a field. I
> wrote this code, but it is not working: [...]
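The null object here is most plausibly the TermsResponse: getTermsResponse() returns null when the response contains no terms section, which happens when the request goes to a handler that never runs the TermsComponent. A hedged sketch of a version that targets a /terms handler and guards against the null (the handler name, core name, and field are assumptions based on a default Solr setup, and this needs a running Solr to execute):

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.TermsResponse;

public class TermFreq {
    public static void main(String[] args) throws Exception {
        SolrClient solr = new HttpSolrClient.Builder(
            "http://localhost:8983/solr/huda").build();

        SolrQuery query = new SolrQuery();
        // Plain /select usually has no TermsComponent configured, so the
        // response would carry no terms section at all.
        query.setRequestHandler("/terms");
        query.setTerms(true);
        query.addTermsField("name");
        query.setTermsLimit(-1); // all terms, not just the default top 10

        QueryResponse rsp = new QueryRequest(query).process(solr);
        TermsResponse termResp = rsp.getTermsResponse();
        if (termResp == null) {
            // The likely source of the NullPointerException above.
            throw new IllegalStateException("no terms section in response");
        }
        for (TermsResponse.Term t : termResp.getTerms("name")) {
            System.out.println(t.getTerm() + " -> " + t.getFrequency());
        }
        solr.close();
    }
}
```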
Re: how do lucene read large index files?
Erick, thanks a lot for sharing an excellent post...

Btw, I am using NIOFSDirectory. Could you please elaborate on the lines
below? Or any further pointers?

> NIOFSDirectory or SimpleFSDirectory, we have to pay another price: Our
> code has to do a lot of syscalls to the O/S kernel to copy blocks of
> data between the disk or filesystem cache and our buffers residing in
> Java heap. This needs to be done on every search request, over and over
> again.

--
Kumaran R

On Wed, Nov 23, 2016 at 9:17 PM, Erick Erickson wrote:

> See Uwe's blog:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Short form: files are read into the OS's memory as needed; the whole
> file isn't read at once.
>
> Best,
> Erick
>
> On Wed, Nov 23, 2016 at 12:04 AM, Kumaran Ramasubramanian wrote:
>
> > Hi All,
> >
> > How does Lucene read large index files? For example, if one file
> > (e.g. a .dat file) is 4 GB, does Lucene read only part of the file
> > into RAM? Or is there a different approach for different Lucene file
> > formats?
> >
> > Related link:
> > How do applications (and OS) handle very big files?
> > http://superuser.com/a/361201
> >
> > --
> > Kumaran R
Re: how do lucene read large index files?
Not really, as I don't know that code well; Uwe and company are the
masters of that realm ;)

Sorry I can't be more help there.

Erick

On Thu, Nov 24, 2016 at 7:29 AM, Kumaran Ramasubramanian wrote:

> Erick, thanks a lot for sharing an excellent post...
>
> Btw, I am using NIOFSDirectory. Could you please elaborate on the lines
> below? Or any further pointers? [...]
RE: how do lucene read large index files?
Hi Kumaran, hi Erick,

> Not really, as I don't know that code well, Uwe and company
> are the masters of that realm ;)
>
> Sorry I can't be more help there

I can help!

> On Thu, Nov 24, 2016 at 7:29 AM, Kumaran Ramasubramanian wrote:
>
> > Btw, I am using NIOFSDirectory. Could you please elaborate on the
> > lines below? Or any further pointers? [...]

The blog post puts it simply: you should use MMapDirectory and avoid
SimpleFSDir or NIOFSDir! It also explains why: SimpleFSDir and NIOFSDir
extend BufferedIndexInput. This class uses an on-heap buffer for reading
index files (which is 16 KB). For some parts of the index (like doc
values), this is not ideal. E.g., if you sort against a doc values field
and it needs to access a sort value (e.g. a short, integer, or byte,
which is very small), it will ask the buffer for those few bytes. In most
cases when sorting, the buffer will not contain those bytes, as sorting
requires random access over a huge file (so it is unlikely that the
buffer will help). BufferedIndexInput will then seek the NIO/Simple file
pointer and read 16 KiB into the buffer. This requires a syscall to the
OS kernel, which is expensive; during the sorting of search results this
can happen millions or billions of times. In addition, it will copy
chunks of memory between the Java heap and the operating system cache
over and over.

With MMapDirectory, no buffering is done: the Lucene code directly
accesses the file system cache, and this is much more optimized.

So for fast index access:
- Avoid SimpleFSDir and NIOFSDir (those are only there for legacy 32-bit
  operating systems and JVMs).
- Configure your operating system kernel as described in the blog post
  and use MMapDirectory.
- Tell the sysadmin to inform himself about the output of the Linux
  commands free/top/... (or their Windows counterparts).

Uwe
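Uwe's contrast between per-access reads and memory mapping can be seen with plain java.nio, outside Lucene. This standalone sketch performs one random access both ways; the file contents and offset are arbitrary, chosen only to make the two reads comparable:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapVsBuffered {
    // Illustration of the two read styles: an explicit read() into a
    // heap buffer per access (a syscall each time, like
    // BufferedIndexInput on a cache miss) vs. random access through a
    // memory-mapped region, where the OS pages data in on demand.
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("demo", ".dat");
        byte[] data = new byte[1 << 16];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        Files.write(p, data);

        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            // Positional-read style: one read() syscall per random access,
            // copying bytes from the OS cache into a Java heap buffer.
            ByteBuffer buf = ByteBuffer.allocate(4);
            ch.read(buf, 12345);
            buf.flip();
            System.out.println(buf.get(0)); // 57, i.e. (byte) 12345

            // Mapped style: the file looks like memory; no per-access
            // syscall and no extra copy into the Java heap.
            MappedByteBuffer map =
                ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            System.out.println(map.get(12345)); // 57 again
        } finally {
            Files.deleteIfExists(p);
        }
    }
}
```

Lucene's MMapDirectory builds on exactly this FileChannel.map mechanism, which is why it avoids the syscall-per-access cost Uwe describes.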
Re: how do lucene read large index files?
Thanks Uwe!

On Thu, Nov 24, 2016 at 9:41 AM, Uwe Schindler wrote:

> [...]
java.lang.IndexOutOfBoundsException: Index: 9634, Size: 97 opening an index
Hi all,

I have a client who has what appears to be a corrupted Lucene index. When
they try to open the index they get:

    java.lang.IndexOutOfBoundsException: Index: 9634, Size: 97
        at java.util.ArrayList.rangeCheck(ArrayList.java:638)
        at java.util.ArrayList.get(ArrayList.java:414)
        at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:255)
        at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:244)
        at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:86)
        at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:133)
        at org.apache.lucene.index.TermInfosReaderIndex.<init>(TermInfosReaderIndex.java:76)
        at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:116)
        at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:83)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:94)
        at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:105)
        at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27)
        at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:78)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:709)
        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:72)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:256)

Running CheckIndex didn't seem to really help:

    NOTE: testing will be more thorough if you run java with
    '-ea:org.apache.lucene...', so assertions are enabled
    Opening index @ C:\Issue\TextIndex

    Segments file=segments_2 numSegments=1 version=3.6.2 format=FORMAT_3_1
    [Lucene 3.1+]
      1 of 1: name=_64 docCount=1764481
        compound=false
        hasProx=true
        numFiles=10
        size (MB)=119.050,043
        diagnostics = {os=Windows Server 2012, java.vendor=Oracle
        Corporation, java.version=1.8.0_05, lucene.version=3.6.2-SNAPSHOT
        - 2014-01-16 16:14:14, mergeMaxNumSegments=1, os.arch=amd64,
        source=merge, mergeFactor=20, os.version=6.2}
        no deletions
        test: open reader.FAILED
        WARNING: fixIndex() would remove reference to this segment; full
        exception:
        java.lang.IndexOutOfBoundsException: Index: 9634, Size: 97
            at java.util.ArrayList.rangeCheck(ArrayList.java:638)
            at java.util.ArrayList.get(ArrayList.java:414)
            at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:255)
            at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:244)
            at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:86)
            at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:133)
            at org.apache.lucene.index.TermInfosReaderIndex.<init>(TermInfosReaderIndex.java:76)
            at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:116)
            at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:83)
            at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116)
            at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:94)
            at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:523)
            at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1064)

    WARNING: 1 broken segments (containing 1764481 documents) detected
    WARNING: 1764481 documents will be lost
    NOTE: will write new segments file in 5 seconds; this will remove
    1764481 docs from the index. THIS IS YOUR LAST CHANCE TO CTRL+C!
    5... 4... 3... 2... 1... Writing... OK
    Wrote new segments file "segments_3"

Are there any approaches to try and repair this index? It is 120 GB in
size and there are no backups... :-/

Cheers,
David
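For reference, CheckIndex's -fix mode writes a new segments file that drops any unreadable segment; as the output above shows, here that would discard all 1,764,481 documents, so it should only ever be run on a copy. A hedged sketch of the invocation for a Lucene 3.6-era index (jar name and paths are placeholders, not taken from the message):

```shell
# Inspect only (no changes); run on a COPY of the index directory.
java -ea:org.apache.lucene... -cp lucene-core-3.6.2.jar \
  org.apache.lucene.index.CheckIndex /path/to/index-copy

# Only after reviewing the report: -fix removes broken segments for good.
java -cp lucene-core-3.6.2.jar \
  org.apache.lucene.index.CheckIndex /path/to/index-copy -fix
```

Since the single segment holds every document, -fix cannot save anything here; recovery would have to come from re-indexing the source data.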