Re: Non-index files under the search directory

2016-11-24 Thread András Péteri
Correct, this data is associated with individual IndexCommits (you
should be able to see the key-value pairs in the raw contents of the
segments_N files in an index directory). To consolidate the entries,
you'll have to retrieve the user data from each sub-index, put all of
it into a new map, then set this data on the aggregate writer.
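
A minimal sketch of that consolidation step (the directory paths and the
duplicate-key policy are hypothetical), using the
IndexWriter#getCommitData()/#setCommitData(Map) generation of the API
referenced in the quoted messages below; each sub-index's user data is read
off its current commit point:

import java.io.IOException;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ConsolidateCommitData {
    public static void main(String[] args) throws IOException {
        String[] subIndexPaths = {"/tmp/sub1", "/tmp/sub2"}; // hypothetical
        Directory[] subDirs = new Directory[subIndexPaths.length];
        Map<String, String> merged = new HashMap<>();

        // Collect the user commit data from each sub-index's current
        // commit point. Later entries win on key collisions here; pick
        // whatever merge policy fits your data.
        for (int i = 0; i < subIndexPaths.length; i++) {
            subDirs[i] = FSDirectory.open(Paths.get(subIndexPaths[i]));
            try (DirectoryReader reader = DirectoryReader.open(subDirs[i])) {
                merged.putAll(reader.getIndexCommit().getUserData());
            }
        }

        try (Directory target = FSDirectory.open(Paths.get("/tmp/merged"));
             IndexWriter writer = new IndexWriter(target,
                     new IndexWriterConfig(new StandardAnalyzer()))) {
            writer.addIndexes(subDirs);   // consolidate the sub-indexes
            writer.setCommitData(merged); // attach the combined user data
            writer.commit();              // persisted with this commit
        }
        for (Directory d : subDirs) {
            d.close();
        }
    }
}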

On Tue, Nov 22, 2016 at 9:02 PM, Xiaolong Zheng  wrote:
> Hi András,
>
> Thanks, this is what I need!
>
>  I also noticed that this user commit data does not carry over when I
> consolidate several search databases into a new one. I guess the solution
> is to explicitly use getCommitData on each sub-index, then set the combined
> data on the new consolidated search database, right?
>
> Best,
>
> --Xiaolong
>
>
> On Tue, Nov 22, 2016 at 12:10 PM, András Péteri wrote:
>
>> Hi Xiaolong,
>>
>> A Map of key-value pairs can be supplied to
>> IndexWriter#setCommitData(Map) and will be persisted
>> when committing changes (setting the commit data counts as a change).
>> It can be retrieved with IndexWriter#getCommitData() later.
>>
>> This may serve as good storage for metadata; as an example,
>> Elasticsearch stores attributes related to its transaction log there
>> (UUID and generation identifier).
>>
>> Regards,
>> András
>>
>> On Tue, Nov 22, 2016 at 5:40 PM, Xiaolong Zheng wrote:
>> > Thanks. StoredField is still at the per-document level, which means
>> > every document will contain this search field.
>> >
>> > What I would really like is a global-level store to hold this single
>> > value. Maybe this is impossible.
>> >
>> > Sincerely,
>> >
>> > --Xiaolong
>> >
>> >
>> > On Tue, Nov 22, 2016 at 5:13 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
>> >
>> >> Lucene won't merge foreign files for you, and in general it's
>> >> dangerous to put such files into Lucene's index directory because if
>> >> they look like codec files Lucene may delete them.
>> >>
>> >> Can you just add a StoredField to each document to hold your
>> >> information?
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >>
>> >> On Mon, Nov 21, 2016 at 11:38 PM, Xiaolong Zheng wrote:
>> >> > Hello,
>> >> >
>> >> > I am trying to add some metadata to the search database. Instead of
>> >> > adding a new search field or a phony document, I am looking at the
>> >> > method org.apache.lucene.store.Directory#createOutput, which creates
>> >> > a new file in the search directory.
>> >> >
>> >> > I am wondering whether IndexWriter can also merge this non-index
>> >> > file while it merges multiple search indexes?
>> >> >
>> >> > Stepping back a bit, what is the best way to add metadata to the
>> >> > search database?
>> >> >
>> >> > For example, I would like to add an indicator showing which kind of
>> >> > stemmer was used when the index was created.
>> >> >
>> >> > Thanks,
>> >> >
>> >> > --Xiaolong
>> >>
>>
>> --
>> András Péteri
>>

-- 
András Péteri




Re: Understanding Query Parser Behavior

2016-11-24 Thread Michael McCandless
Hi,

You should double check which analyzer you are using during indexing.

The same analyzer on the same string should produce the same tokens.

Mike McCandless

http://blog.mikemccandless.com


On Wed, Nov 23, 2016 at 9:38 PM, Peru Redmi  wrote:
> Could someone elaborate this.
>
> On Tue, Nov 22, 2016 at 11:41 AM, Peru Redmi  wrote:
>
>> Hello,
>> Can you help me out on your "No" .
>>
>> On Mon, Nov 21, 2016 at 11:16 PM, wmartin...@gmail.com <
>> wmartin...@gmail.com> wrote:
>>
>>> No
>>>
>>> Sent from my LG G4, an AT&T 4G LTE smartphone
>>>
>>> -- Original message --
>>> From: Peru Redmi
>>> Date: Mon, Nov 21, 2016 10:44 AM
>>> To: java-user@lucene.apache.org
>>> Cc:
>>> Subject: Understanding Query Parser Behavior
>>>
>>> Hello All, could someone explain QueryParser behavior in these cases?
>>>
>>> 1. While indexing:
>>> Document doc = new Document();
>>> doc.add(new Field("Field", "http://www.google.com", Field.Store.YES, Field.Index.ANALYZED));
>>> The index has two terms - http & www.google.com
>>>
>>> 2. While searching:
>>> Analyzer anal = new ClassicAnalyzer(Version.LUCENE_30, new StringReader(""));
>>> QueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30, new String[]{"Field"}, anal);
>>> Query query = parser.parse("http://www.google.com");
>>> Now the query has three terms - (Field:http) (Field://) (Field:www.google.com)
>>>
>>> i) Why do I get 3 terms when parsing but 2 terms when indexing (using the same ClassicAnalyzer in both cases)?
>>> ii) Is this the expected behavior of ClassicAnalyzer(Version.LUCENE_30) in the parser?
>>> iii) What should be done to avoid the query part (Field://)?
>>>
>>> Thanks, Peru.
>>>
>>>
>>




Range query on date field

2016-11-24 Thread Markus Jelsma
Hi - I seem to be having trouble correctly executing a range query on a date
field.

The following Solr document is indexed via a unit test followed by a commit:

  <doc>
    <field name="type">view</field>
    <field name="...">test_key</field>
    <field name="time">2013-01-09T17:11:40Z</field>
  </doc>

I can retrieve the document simply by wrapping term queries in a boolean query
like this:

  BooleanQuery.Builder queryBuilder = new BooleanQuery.Builder();

  Query typeQuery = new TermQuery(new Term("type", "view"));
  queryBuilder.add(typeQuery, Occur.MUST);
  
  long count = searcher.get().count(queryBuilder.build());

This gets me exactly 1 in the count variable, which is all fine. But I also
need to restrict the query to a date, so I add a simple (or so I thought)
range query!

  TermRangeQuery timeQuery = TermRangeQuery.newStringRange("time",
      date + "T00:00:00Z", date + "T23:59:59Z", true, true);
  queryBuilder.add(timeQuery, Occur.MUST);

But no, it doesn't work. No matter what I do, I don't get any results! Thinking
there was something wrong with my range query, I even tried StandardQueryParser;
nothing can go wrong if Lucene builds the query for me, right?

  StandardQueryParser parser = new StandardQueryParser();
  Query q = parser.parse("type:view AND time:[" + date + "T00:00:00Z TO "
      + date + "T23:59:59Z]", "query");

In both cases, toString of the final query yields similar results, only the 
order is different. The letters T and Z are somehow lowercased by the query 
parser.

I feel incredibly stupid, so many thanks in advance!
Markus
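
If the time field is a Solr TrieDateField (an assumption; the schema is not
shown), it is indexed as trie-encoded numeric terms rather than as the literal
string, which would explain why a string TermRangeQuery matches nothing; an
analysis chain with a lowercase filter would likewise explain the lowercased T
and Z. A minimal sketch of a numeric range instead, assuming hypothetically
that the field were indexed as a LongPoint holding epoch milliseconds:

import java.time.Instant;

import org.apache.lucene.document.LongPoint;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class DateRangeSketch {
    // Builds type:view AND time:[start of day TO end of day], numerically.
    public static Query build(String date) {
        long lower = Instant.parse(date + "T00:00:00Z").toEpochMilli();
        long upper = Instant.parse(date + "T23:59:59Z").toEpochMilli();

        BooleanQuery.Builder queryBuilder = new BooleanQuery.Builder();
        queryBuilder.add(new TermQuery(new Term("type", "view")), Occur.MUST);
        // LongPoint.newRangeQuery is inclusive on both bounds.
        queryBuilder.add(LongPoint.newRangeQuery("time", lower, upper), Occur.MUST);
        return queryBuilder.build();
    }
}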




Re: Understanding Query Parser Behavior

2016-11-24 Thread Peru Redmi
Hello Mike,

Here is how I analyze my text using QueryParser (with ClassicAnalyzer) and
with a plain ClassicAnalyzer. When checking the same in Luke, I get "//"
as a RegexQuery.

Here is my code snippet:

String value = "http\\://www.google.com";
Analyzer anal = new ClassicAnalyzer(Version.LUCENE_30, new StringReader(""));
QueryParser parser = new QueryParser(Version.LUCENE_30, "name", anal);
Query query = parser.parse(value);
System.out.println(" output terms from query parser ::" + query);

ArrayList<String> list = new ArrayList<String>();
TokenStream stream = anal.tokenStream("name", new StringReader(value));
stream.reset();
while (stream.incrementToken())
{
    list.add(stream.getAttribute(CharTermAttribute.class).toString());
}
System.out.println(" output terms from analyzer " + list);



output:

output terms from query parser ::name:http name:// name:www.google.com
output terms from analyzer [http, www.google.com]
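
One hedged workaround sketch: quote the URL so that ':' and '/' are not
treated as query syntax; the parser then hands the whole string to
ClassicAnalyzer, which produces the same two tokens as at index time:

import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.ClassicAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class QuotedUrlQuery {
    public static void main(String[] args) throws Exception {
        Analyzer anal = new ClassicAnalyzer(Version.LUCENE_30, new StringReader(""));
        QueryParser parser = new QueryParser(Version.LUCENE_30, "name", anal);
        // Inside quotes, ':' is not a field separator and '/' is not syntax;
        // ClassicAnalyzer tokenizes the phrase to [http, www.google.com].
        Query query = parser.parse("\"http://www.google.com\"");
        System.out.println(query); // expected: name:"http www.google.com"
    }
}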






On Thu, Nov 24, 2016 at 5:10 PM, Michael McCandless <luc...@mikemccandless.com> wrote:

> Hi,
>
> You should double check which analyzer you are using during indexing.
>
> The same analyzer on the same string should produce the same tokens.
>
> Mike McCandless
>
> http://blog.mikemccandless.com


term frequency

2016-11-24 Thread huda barakat
I'm using SolrJ to find the term frequency of each term in a field. I wrote
this code, but it is not working:


   String urlString = "http://localhost:8983/solr/huda";
   SolrClient solr = new HttpSolrClient.Builder(urlString).build();

   SolrQuery query = new SolrQuery();
   query.setTerms(true);
   query.addTermsField("name");
   SolrRequest<QueryResponse> req = new QueryRequest(query);
   QueryResponse rsp = req.process(solr);

   System.out.println(rsp);

   System.out.println("numFound: " + rsp.getResults().getNumFound());

   TermsResponse termResp = rsp.getTermsResponse();
   List<TermsResponse.Term> terms = termResp.getTerms("name");
   System.out.print(terms.size());


I got this error:

Exception in thread "main" java.lang.NullPointerException at
solr_test.solr.App2.main(App2.java:50)


Re: term frequency

2016-11-24 Thread Jason Wee
The exception line does not match the code you pasted, but do make sure your
objects are actually non-null before accessing their methods.
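
A defensive sketch along those lines (the /terms request handler is an
assumption; that is where the TermsComponent is typically wired up in a
default Solr config, and a terms-only response carries no "results" section,
which is consistent with the NullPointerException above):

import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.TermsResponse;

public class TermFrequencySketch {
    public static void main(String[] args) throws Exception {
        SolrClient solr =
            new HttpSolrClient.Builder("http://localhost:8983/solr/huda").build();

        SolrQuery query = new SolrQuery();
        query.setRequestHandler("/terms"); // terms component usually lives here
        query.setTerms(true);
        query.addTermsField("name");

        QueryResponse rsp = new QueryRequest(query).process(solr);

        // Guard every access: a terms-only response has no results section.
        TermsResponse termResp = rsp.getTermsResponse();
        if (termResp == null) {
            System.err.println("No terms in response; is the TermsComponent enabled?");
            return;
        }
        List<TermsResponse.Term> terms = termResp.getTerms("name");
        if (terms != null) {
            for (TermsResponse.Term t : terms) {
                System.out.println(t.getTerm() + " -> " + t.getFrequency());
            }
        }
        solr.close();
    }
}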




Re: how do lucene read large index files?

2016-11-24 Thread Kumaran Ramasubramanian
Erick, thanks a lot for sharing an excellent post...

By the way, I am using NIOFSDirectory. Could you please elaborate on the
lines quoted below, or share any further pointers?

NIOFSDirectory or SimpleFSDirectory, we have to pay another price: Our code
> has to do a lot of syscalls to the O/S kernel to copy blocks of data
> between the disk or filesystem cache and our buffers residing in Java heap.
> This needs to be done on every search request, over and over again.




--
Kumaran R



On Wed, Nov 23, 2016 at 9:17 PM, Erick Erickson wrote:

> see Uwe's blog:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Short form: files are read into the OS's memory as needed. The whole
> file isn't read at once.
>
> Best,
> Erick
>
> On Wed, Nov 23, 2016 at 12:04 AM, Kumaran Ramasubramanian wrote:
> > Hi All,
> >
> > How does Lucene read large index files?
> > For example, if one file (e.g. a .dat file) is 4 GB,
> > does Lucene read only part of the file into RAM? Or
> > is the approach different for different Lucene file formats?
> >
> >
> > Related Link:
> > How do applications (and OS) handle very big files?
> > http://superuser.com/a/361201
> >
> >
> > --
> > Kumaran R
>


Re: how do lucene read large index files?

2016-11-24 Thread Erick Erickson
Not really, as I don't know that code well, Uwe and company
are the masters of that realm ;)

Sorry I can't be more help there

Erick




RE: how do lucene read large index files?

2016-11-24 Thread Uwe Schindler
Hi Kumaran, hi Erick,

> Not really, as I don't know that code well, Uwe and company
> are the masters of that realm ;)
> 
> Sorry I can't be more help there

I can help!

> On Thu, Nov 24, 2016 at 7:29 AM, Kumaran Ramasubramanian wrote:
> > Erick, Thanks a lot for sharing an excellent post...
> >
> > Btw, am using NIOFSDirectory, could you please elaborate on below
> > mentioned lines? or any further pointers?
> >
> > NIOFSDirectory or SimpleFSDirectory, we have to pay another price: Our code
> >> has to do a lot of syscalls to the O/S kernel to copy blocks of data
> >> between the disk or filesystem cache and our buffers residing in Java heap.
> >> This needs to be done on every search request, over and over again.

The blog post puts it simply: you should use MMapDirectory and avoid
SimpleFSDir and NIOFSDir! The blog post explains why: SimpleFSDir and
NIOFSDir extend BufferedIndexInput. This class uses an on-heap buffer for
reading index files (which is 16 KB). For some parts of the index (like doc
values), this is not ideal. E.g. if you sort against a doc values field and it
needs to access a sort value (e.g. a short, integer or byte, which is very
small), it will ask the buffer for something like 4 bytes. In most cases when
sorting, the buffer will not contain those bytes, as sorting requires random
access over a huge file (so it is unlikely that the buffer will help). Then
BufferedIndexInput will seek the NIO/Simple file pointer and read 16 KiB into
the buffer. This requires a syscall to the OS kernel, which is expensive.
While sorting search results this can happen millions or billions of times. In
addition it will copy chunks of memory between the Java heap and the operating
system cache over and over.

With MMapDirectory no buffering is done, the Lucene code directly accesses the 
file system cache and this is much more optimized.

So for fast index access:
- avoid SimpleFSDir and NIOFSDir (those are only there for legacy 32-bit
operating systems and JVMs)
- configure your operating system kernel as described in the blog post and use
MMapDirectory
- tell the sysadmin to inform himself about the output of Linux commands like
free/top/... (or their Windows equivalents).

Uwe
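
A minimal sketch of that recommendation (the index path is hypothetical): on
64-bit JVMs FSDirectory.open already returns an MMapDirectory, and it can also
be requested explicitly:

import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;

public class OpenWithMMap {
    public static void main(String[] args) throws Exception {
        // FSDirectory.open(path) would pick MMapDirectory on 64-bit
        // platforms; new MMapDirectory(path) forces it regardless.
        Directory dir = new MMapDirectory(Paths.get("/path/to/index"));
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // Reads now go through the mmapped region and the OS file system
            // cache: no per-read syscalls, no copies into Java heap buffers.
            System.out.println(searcher.count(new MatchAllDocsQuery()));
        }
        dir.close();
    }
}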






Re: how do lucene read large index files?

2016-11-24 Thread Erick Erickson
Thanks Uwe!








java.lang.IndexOutOfBoundsException: Index: 9634, Size: 97 opening an index

2016-11-24 Thread David Sitsky
Hi all,

I have a client who has what appears to be a corrupted Lucene index. When
they try to open the index, they get:

java.lang.IndexOutOfBoundsException: Index: 9634, Size: 97
    at java.util.ArrayList.rangeCheck(ArrayList.java:638)
    at java.util.ArrayList.get(ArrayList.java:414)
    at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:255)
    at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:244)
    at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:86)
    at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:133)
    at org.apache.lucene.index.TermInfosReaderIndex.<init>(TermInfosReaderIndex.java:76)
    at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:116)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:83)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:94)
    at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:105)
    at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27)
    at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:78)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:709)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:72)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:256)


Running CheckIndex didn't seem to really help:


NOTE: testing will be more thorough if you run java with
'-ea:org.apache.lucene...', so assertions are enabled

Opening index @ C:\Issue\TextIndex

Segments file=segments_2 numSegments=1 version=3.6.2 format=FORMAT_3_1 [Lucene 3.1+]
  1 of 1: name=_64 docCount=1764481
    compound=false
    hasProx=true
    numFiles=10
    size (MB)=119.050,043
    diagnostics = {os=Windows Server 2012, java.vendor=Oracle Corporation,
java.version=1.8.0_05, lucene.version=3.6.2-SNAPSHOT - 2014-01-16 16:14:14,
mergeMaxNumSegments=1, os.arch=amd64, source=merge, mergeFactor=20,
os.version=6.2}
    no deletions
    test: open reader.FAILED
    WARNING: fixIndex() would remove reference to this segment; full exception:
java.lang.IndexOutOfBoundsException: Index: 9634, Size: 97
    at java.util.ArrayList.rangeCheck(ArrayList.java:638)
    at java.util.ArrayList.get(ArrayList.java:414)
    at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:255)
    at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:244)
    at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:86)
    at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:133)
    at org.apache.lucene.index.TermInfosReaderIndex.<init>(TermInfosReaderIndex.java:76)
    at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:116)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:83)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:94)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:523)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1064)

WARNING: 1 broken segments (containing 1764481 documents) detected
WARNING: 1764481 documents will be lost

NOTE: will write new segments file in 5 seconds; this will remove 1764481 docs
from the index. THIS IS YOUR LAST CHANCE TO CTRL+C!
  5...
  4...
  3...
  2...
  1...
Writing...
OK
Wrote new segments file "segments_3"


Are there any approaches to try and repair this index?  It is 120 GB in
size and there are no backups.. :-/

Cheers,
David
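
For reference, a hedged sketch of the CheckIndex invocation used above (the
classpath is hypothetical); -fix performs exactly the segment-dropping write
shown in the countdown, so it is best run only on a copy of the index:

  java -ea:org.apache.lucene... -cp lucene-core-3.6.2.jar org.apache.lucene.index.CheckIndex C:\Issue\TextIndex
  java -ea:org.apache.lucene... -cp lucene-core-3.6.2.jar org.apache.lucene.index.CheckIndex C:\Issue\TextIndex -fix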