Re: Size + memory restrictions
Hi Greg,

Thanks. We are actually running against 4 segments of 4 GB each, so about 20 million docs. We can't merge the segments, as there seem to be problems with our Linux box handling files over about 4 GB. Not sure why that is.

If I were to upgrade to 8 GB of RAM, does it seem likely this would double the number of docs we can handle, or would it provide more than a proportional increase?

Thanks

Leon

- Original Message -
From: "Greg Gershman" <[EMAIL PROTECTED]>
Sent: Wednesday, February 15, 2006 12:41 AM
Subject: Re: Size + memory restrictions

> You may consider incrementally adding documents to your index; I'm not sure
> why there would be problems adding to an existing index, but you can always
> add additional documents. You can optimize later to get everything back
> into a single segment.
>
> Querying is a different story; if you are using the Sort API, you will need
> enough memory to store a full sorting of your documents in memory. If
> you're trying to sort on a string or anything other than an int or float,
> this could require a lot of memory.
>
> I've used indices much bigger than 5 mil. docs / 3.5 GB with less than 4 GB
> of RAM and had no problems.
>
> Greg
>
> --- Leon Chaddock <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>> we are having tremendous problems building a large Lucene index and
>> querying it.
>>
>> The programmers are telling me that when the index file reaches 3.5 GB or
>> 5 million docs, the index file can no longer grow any larger.
>>
>> To rectify this they have built index files in multiple directories. Now
>> apparently my 4 GB of memory is not enough to query.
>>
>> Does this seem right to people, or does anyone have any experience on
>> largish-scale projects?
>>
>> I am completely tearing my hair out here and don't know what to do.
>>
>> Thanks
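For reference, a bare-bones sketch of the incremental approach Greg describes: open the existing index with create=false, add documents, and optionally optimize() to merge everything back into one segment. The path, field name and analyzer are placeholders, and the calls shown are the 1.4-era API.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class IncrementalAdd {
    public static void main(String[] args) throws Exception {
        // false = append to the existing index instead of creating a new one
        IndexWriter writer = new IndexWriter("/path/to/index",
                                             new StandardAnalyzer(), false);

        Document doc = new Document();
        doc.add(Field.Text("contents", "some new page text"));
        writer.addDocument(doc);

        // Optional, and expensive on a very large index: merges all segments
        // back into a single one.
        // writer.optimize();

        writer.close();
    }
}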
Re: Relevance Feedback Lucene+Algorithms
You might also want to look at the LucQE project (http://sourceforge.net/projects/lucene-qe/), which implements a couple of automated relevance feedback methods, including Rocchio's formula.

On 2/15/06, Koji Sekiguchi <[EMAIL PROTECTED]> wrote:
> Please check Grant Ingersoll's presentation at ApacheCon 2005.
> He put out great demo programs for relevance feedback using Lucene.
>
> Thank you,
>
> Koji
>
> > -----Original Message-----
> > From: varun sood [mailto:[EMAIL PROTECTED]]
> > Sent: Wednesday, February 15, 2006 3:36 PM
> > To: java-user@lucene.apache.org
> > Subject: Relevance Feedback Lucene+Algorithms
> >
> > Hi,
> > Can anyone share the experience of how to implement Relevance Feedback
> > in Lucene?
> >
> > Can someone suggest some algorithms and papers which can help me in
> > building an effective Relevance Feedback system?
> >
> > Thanks in advance.
> >
> > Dexter.

--
Dave Kor, Research Assistant
Center for Information Mining and Extraction
School of Computing
National University of Singapore
Re: QueryParser behaviour ..
Chris Hostetter wrote:
>> Exactly this is my question: why does the QueryParser create a PhraseQuery
>> when it gets several tokens from the analyzer, and not a BooleanQuery?
>
> Because if it did that, there would be no way to write phrase queries :)

I'm not so sure about this ...

> QueryParser only returns a BooleanQuery when *it* can tell you have several
> clauses. For each "chunk" of text that it thinks of as one continuous piece
> of text (either because it doesn't contain whitespace or because it has
> quotes around it) it gives it to the analyzer; if the analyzer says there
> are multiple terms there, then QueryParser makes a PhraseQuery out of it.

Wouldn't it be better to let the analyzer decide whether there is a continuous piece of text, and to build PhraseQueries only when the quote sign is found?

> Or in a nutshell:
> 1) if the Parser detects multiple terms, it makes a BooleanQuery
> 2) if the Analyzer detects multiple terms, it makes a PhraseQuery

This is related to my comment above. From the user's point of view, I think it makes sense to build a phrase query only when quotes are found in the search string. I think there are pro and con arguments for "unifying" the behaviour. I would be happy if the QueryParser didn't create phrase queries unless I explicitly asked it to. Does someone have a different opinion?

> If you don't like this behavior, it can all be circumvented by overriding
> getFieldQuery(). You don't even have to deal with the analyzer if you don't
> want to. Just call super.getFieldQuery(), and if you get back a
> PhraseQuery, take it apart and build TermQueries wrapped in a BooleanQuery.
>
> -Hoss

Well, there is always a workaround. It is obvious that searching for word1,word2,word3 was a silly mistake, but I needed an hour to find out why a PhraseQuery was created when no quotes existed in the query string. So ... my opinion is that what I suggest would improve the usability of Lucene, and I hope the Lucene developers share it.

Best,
Sergiu
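A minimal sketch of the workaround Hoss describes: override getFieldQuery(), and when the analyzer-produced result is a PhraseQuery, unpack it into OR'ed TermQueries. The class name is invented, and the exact getFieldQuery() signature and BooleanQuery.add() variant differ between Lucene releases, so treat this as an outline against the 1.4-era API rather than drop-in code.

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class NoImplicitPhraseQueryParser extends QueryParser {

    public NoImplicitPhraseQueryParser(String field, Analyzer analyzer) {
        super(field, analyzer);
    }

    protected Query getFieldQuery(String field, String queryText)
            throws ParseException {
        // Let the normal machinery run first.
        Query q = super.getFieldQuery(field, queryText);

        // If the analyzer split the chunk into several terms, QueryParser
        // built a PhraseQuery; take it apart and OR the terms instead.
        if (q instanceof PhraseQuery) {
            Term[] terms = ((PhraseQuery) q).getTerms();
            BooleanQuery bq = new BooleanQuery();
            for (int i = 0; i < terms.length; i++) {
                // optional (not required, not prohibited) clause
                bq.add(new TermQuery(terms[i]), false, false);
            }
            return bq;
        }
        return q;
    }
}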
Re: Relevance Feedback Lucene+Algorithms
URL is http://www.cnlp.org/apachecon2005/

Koji Sekiguchi wrote:
> Please check Grant Ingersoll's presentation at ApacheCon 2005.
> He put out great demo programs for relevance feedback using Lucene.

--
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886
Re: QueryParser behaviour ..
> From the user's point of view I think it will make sense to build a phrase
> query only when the quotes are found in the search string.

You make an interesting point, Sergiu. Your proposal would increase the expressive power of the QueryParser by allowing the construction of either phrase queries or boolean queries when multiple tokens are produced by analysis. The main downside is that it's not backward compatible, and without quotes (and hence phrase queries) many older queries would produce worse results. I also think that a majority of the time, when multiple tokens are produced, you do want a phrase search (or at least a sloppy one). Of course, the backward-compatibility issue could be fixed via a flag on the query parser that defaults to the old behavior.

-Yonik
Re: Help with mass delete from large index
> perform such a cull again, you might make several distinct indexes (one per
> day, per week, per whatever) during that reindexing so the next time will
> be much easier.

How would you search and consolidate the results across multiple indexes? Hits from each index will have independent scoring.

CL

--- "Michael D. Curtin" <[EMAIL PROTECTED]> wrote:
> Now that it's already in 1 index, I'm afraid you can't just delete a few
> files. On the other hand, if it's only a one-time thing, reindexing with
> only the docs you want shouldn't be too bad. If you think you might ever
> need to perform such a cull again, you might make several distinct indexes
> (one per day, per week, per whatever) during that reindexing so the next
> time will be much easier.
>
> Good luck!
>
> --MDC
Re: Size + memory restrictions
Looking into the memory problems further, I read:

"Every time you open an IndexSearcher/IndexReader, resources are used which take up memory. For an application pointed at a static index, you only ever need one IndexReader/IndexSearcher that can be shared among multiple threads issuing queries. If your index is being incrementally updated, you should never need more than two searcher/reader pairs open at a time."

We may have many different segments of our index, and it seems below we are using one IndexSearcher per segment. Could this explain why we run out of memory when using more than 2/3 segments? Anyone else have any comments on the below?

Many thanks

Leon

p.s. At the moment I think it is set to only look at 2 segments:

private Searcher getSearcher() throws IOException {
    if (mSearcher == null) {
        synchronized (Monitor) {
            Searcher[] srs = new IndexSearcher[SearchersDir.size()];
            int maxI = 2;
            // Searcher[] srs = new IndexSearcher[maxI];
            int i = 0;
            for (Iterator iter = SearchersDir.iterator(); iter.hasNext() && i < maxI; i++) {
                String dir = (String) iter.next();
                try {
                    srs[i] = new IndexSearcher(IndexDir + dir);
                } catch (IOException e) {
                    log.error(ClassTool.getClassNameOnly(e) + ": " + e.getMessage(), e);
                }
            }
            mSearcher = new MultiSearcher(srs);
            changeTime = System.currentTimeMillis();
        }
    }
    return mSearcher;
}
Re: Help with mass delete from large index
Chandramohan wrote:
>> perform such a cull again, you might make several distinct indexes (one
>> per day, per week, per whatever) during that reindexing so the next time
>> will be much easier.
>
> How would you search and consolidate the results across multiple indexes?
> Hits from each index will have independent scoring.

Frankly, I ignore the scores in my application. The data itself isn't English prose, so the TF/IDF calculations are stretched at best as a measure of relevance. I presort the documents to be in "relevance" order (a popularity metric), then specify index ordering for the results.

If that wouldn't work for your application, it seems to me that large-enough sub-sections *would* produce equivalent scores. That is, if the sub-indexes were big enough, one could directly compare scores, so a simple merge would work. If the total document corpus is small, then the need for sub-indexes isn't there anyhow.

--MDC
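A rough illustration of the "ignore the scores, use index ordering" idea, assuming the Sort API of 1.4-era Lucene: within a single index, index order is exactly insertion order, so documents pre-sorted by a popularity metric come back in that order. The index path, field names and query string are made up for the example.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;

public class IndexOrderSearch {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/path/to/index");

        Query q = QueryParser.parse("some query", "contents", new StandardAnalyzer());

        // Ignore TF/IDF scores entirely: return hits in index (insertion)
        // order, which here reflects the pre-sorted popularity metric.
        Hits hits = searcher.search(q, Sort.INDEXORDER);
        for (int i = 0; i < hits.length() && i < 10; i++) {
            System.out.println(hits.doc(i).get("title"));
        }
        searcher.close();
    }
}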
Performance Issues
Hi All,

My system requires traversing Hits (search results) and extracting some data from them. If the result set is very large, my system becomes very slow. Is there a way to increase performance? Is there a way I can limit the number of most relevant documents returned?

Best regards,
Urvashi
RE: index merging
I have tried to use the isCurrent() method of IndexReader to figure out if an index is merging, but since I have to do this every time I need to add a document, performance got very slow.

Here is what I am doing: I create 4 indexes and I am running with 4 threads. I do a round robin on the indexes whenever I process a new document. Before adding a document I need to check if the index is merging; if that is the case, I send the document to an index that is not merging.

Is there a better way to index with multiple threads? Or what is the fastest way to check that an index is not merging?

Thanks for any hints,
- Omar

-----Original Message-----
From: Yonik Seeley [mailto:[EMAIL PROTECTED]]
Sent: Monday, February 06, 2006 10:03 AM
To: java-user@lucene.apache.org
Subject: Re: index merging

On 2/6/06, Vanlerberghe, Luc <[EMAIL PROTECTED]> wrote:
> Sorry to contradict you Yonik, but I'm pretty sure the commit lock is *not*
> locked during a merge, only while the "segments" file is being updated.

Oops, you're right. Good thing too... if the commit lock was held during merges, one couldn't even open up a new IndexReader.

-Yonik
Iterating hits
Hi Lucene users, I have a strange error and I don't know what to do.

My logs say this:

java.lang.ArrayIndexOutOfBoundsException: 100 >= 100
    at java.util.Vector.elementAt(Vector.java:431)
    at org.apache.lucene.search.Hits.hitDoc(Hits.java:127)
    at org.apache.lucene.search.Hits.doc(Hits.java:89)

My code is this:

PrefixQuery p = new PrefixQuery(new Term("TOOL_REF_ID", getINITIAL(tool)));
Hits h = sr.search(p);
for (int i = 0; i < h.length(); i++) {
    log.debug(h.doc(i).getField("TYPE") + " " + h.doc(i).getField("TOOL_REF_ID"));
    reader.delete(h.id(i));
}

Why? And how can I delete all the documents whose TOOL_REF_ID begins with, for example, "AK"?

Searching about it I found this:
http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200306.mbox/[EMAIL PROTECTED]

Thanks for any reply.
Re: Iterating hits
Try using a different reader to delete the documents. Hits can re-execute a query, and if the searcher you are using is sharing the reader you are deleting with, it's like changing a list you are iterating over (fewer hits will be found the next time the query is executed).

-Yonik

On 2/15/06, Daniel Cortes <[EMAIL PROTECTED]> wrote:
> Hi Lucene users, I have a strange error and I don't know what to do.
> [...]
> Why? And how can I delete all the documents whose TOOL_REF_ID begins with,
> for example, "AK"?
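For this particular delete-by-prefix case, one way to follow Yonik's advice is to skip Hits altogether and walk the terms of the field with a dedicated IndexReader, deleting by Term. A sketch, assuming the 1.4-era API where IndexReader.delete(Term) returns the number of documents deleted; only the field name and prefix come from the question, the rest is illustrative.

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

public class PrefixDelete {

    /** Deletes every document whose 'field' term starts with 'prefix'. */
    public static int deleteByPrefix(String indexPath, String field, String prefix)
            throws IOException {
        IndexReader reader = IndexReader.open(indexPath);
        int deleted = 0;
        // Enumeration starts at the first term >= (field, prefix)
        TermEnum terms = reader.terms(new Term(field, prefix));
        try {
            do {
                Term t = terms.term();
                if (t == null || !t.field().equals(field) || !t.text().startsWith(prefix)) {
                    break; // past the prefix range
                }
                deleted += reader.delete(t); // returns how many docs were deleted
            } while (terms.next());
        } finally {
            terms.close();
            reader.close(); // commits the deletions
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(deleteByPrefix("/path/to/index", "TOOL_REF_ID", "AK"));
    }
}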
Re: Size + memory restrictions
: We may have many different segments of our index, and it seems below we are
: using one IndexSearcher per segment. Could this explain why we run out of
: memory when using more than 2/3 segments?
: Anyone else have any comments on the below?

Terminology is a big issue here ... when you use the word "segment" it seems like you are talking about a segment of your data, which is a self-contained index in its own right.

My point in the comment you quoted was that for a given index, you don't need more than one active IndexSearcher open at a time; any more than that can waste resources. I don't know what kind of memory overhead there is in a MultiSearcher, but besides that you should also be looking at the other issues in the message you quoted from: who/when is calling your getSearcher() method? Is it getting called more often than the underlying indexes change? Who is closing the old searchers when you open new ones?

-Hoss
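A rough sketch of the pattern Hoss is pointing at: keep exactly one shared (Multi)Searcher open, and only when the underlying indexes have actually changed, open a new one, swap it in, and close the old one. The class and method names are invented for illustration, and it glosses over queries that may still be in flight against the old searcher when it is closed.

import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.Searcher;

public class SharedSearcherHolder {
    private final List indexDirs; // directory paths of the sub-indexes
    private Searcher current;     // the one searcher shared by all query threads

    public SharedSearcherHolder(List indexDirs) {
        this.indexDirs = indexDirs;
    }

    /** Returns the shared searcher, opening it on first use. */
    public synchronized Searcher getSearcher() throws IOException {
        if (current == null) {
            current = open();
        }
        return current;
    }

    /** Call only after the indexes have actually been updated. */
    public synchronized void reopen() throws IOException {
        Searcher old = current;
        current = open();   // open the new view first
        if (old != null) {
            old.close();    // then release the old one
        }
    }

    private Searcher open() throws IOException {
        Searchable[] subs = new Searchable[indexDirs.size()];
        int i = 0;
        for (Iterator it = indexDirs.iterator(); it.hasNext(); i++) {
            subs[i] = new IndexSearcher((String) it.next());
        }
        return new MultiSearcher(subs);
    }
}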
Re: Size + memory restrictions
Hi Chris,

Thanks. When I quoted "segment" I meant index file. So if we have 10 separate index files, are you saying we should have one IndexSearcher for the index collectively, or one per index file?

Thanks

Leon
Re: Size + memory restrictions
Leon,

An index is typically a directory on disk with files (commonly called "index files") in it. Each index can have one or more segments, and each segment consists of several index files. If you are using the compound index format, the situation is a bit different (fewer index files).

Otis

P.S. You asked about Lucene in Action... :)
Re: Relevance Feedback Lucene+Algorithms
Hi,

Thanks for replying. I read your ppt; it is good, but the code for basic relevance feedback is not explained there. Also, I am not familiar with JSP, JUnit, Maven, etc. I guess it will take me a lot of time to discover how things work in the demo program, because I have to learn all those technologies first.

Is there any documentation or some brief notes on how relevance feedback has been, or could be, done? I am looking at a manual relevance feedback system.

Thanks,
Dexter

On 2/15/06, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> URL is http://www.cnlp.org/apachecon2005/
Re: index merging
Omar Didi wrote:
> I have tried to use the isCurrent() method of IndexReader to figure out if
> an index is merging, but since I have to do this every time I need to add
> a document, performance got very slow.
> [...]
> Is there a better way to index with multiple threads? Or what is the
> fastest way to check that an index is not merging?

I've done this before by having a single work queue of documents which need adding. Each of the four indexing threads refers to that queue and can pull documents off it. The concurrency utility classes in java.util.concurrent may help with this approach.

Daniel

--
Daniel Noll
Nuix Australia Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia
Phone: (02) 9280 0699  Fax: (02) 9212 6902
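A minimal sketch of that single-work-queue idea using java.util.concurrent: one shared BlockingQueue of Documents, several indexing threads each pulling from it and writing to its own index. The directory names, analyzer, and poison-pill shutdown are illustrative assumptions, not Omar's actual setup.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

public class QueueIndexer {
    private static final Document POISON = new Document(); // shutdown marker

    public static void main(String[] args) throws Exception {
        final BlockingQueue queue = new LinkedBlockingQueue(1000);
        int nThreads = 4;
        Thread[] workers = new Thread[nThreads];

        for (int t = 0; t < nThreads; t++) {
            final String dir = "/indexes/part" + t; // one index per thread
            workers[t] = new Thread() {
                public void run() {
                    try {
                        IndexWriter writer =
                            new IndexWriter(dir, new StandardAnalyzer(), true);
                        try {
                            while (true) {
                                Document doc = (Document) queue.take();
                                if (doc == POISON) {
                                    break; // no more work
                                }
                                writer.addDocument(doc);
                            }
                        } finally {
                            writer.close();
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            };
            workers[t].start();
        }

        // Producer side: call queue.put(someDocument) for every document to
        // index, then one POISON per worker so each thread shuts down cleanly.
        for (int t = 0; t < nThreads; t++) {
            queue.put(POISON);
        }
        for (int t = 0; t < nThreads; t++) {
            workers[t].join();
        }
    }
}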
Re: Relevance Feedback Lucene+Algorithms
In the example code, take a look at the SearchServlet.java code and the performFeedback() and getTopTerms() methods, which demonstrate the use of term vectors. It is fairly well commented. You don't need Maven, JSP or JUnit for this. On the indexing side, look at TVHTMLDocument for how to construct the term vectors.

As for how to do relevance feedback, you can search the mailing list archive; there have been many discussions in the past that will offer insights into RF in Lucene. I also like the book "Modern Information Retrieval" by Baeza-Yates et al. as a text for the theory behind RF. You may also find that the MoreLikeThis implementation (again, search this mailing list and look in the Lucene contrib section) satisfies your needs.

Hope this helps,
Grant

varun sood wrote:
> Is there any documentation or some brief notes on how relevance feedback
> has been, or could be, done? I am looking at a manual relevance feedback
> system.
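For a rough idea of what term-vector-based feedback looks like in code (this is not the actual demo source), the sketch below pulls the term vector of a document the user marked relevant and ORs its more frequent terms into the original query. The field name, the frequency cutoff, and the absence of weighting are arbitrary simplifications; a real system would weight terms, e.g. Rocchio- or tf-idf-style.

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class SimpleFeedback {

    /** Expand the original query with top terms from one relevant document. */
    public static Query expand(IndexReader reader, Query original, int docId)
            throws IOException {
        // Requires the field to have been indexed with term vectors enabled.
        TermFreqVector tfv = reader.getTermFreqVector(docId, "contents");

        BooleanQuery expanded = new BooleanQuery();
        expanded.add(original, false, false); // keep the user's query (optional clause)

        if (tfv != null) {
            String[] terms = tfv.getTerms();
            int[] freqs = tfv.getTermFrequencies();
            // Crude selection: just take the more frequent terms.
            for (int i = 0; i < terms.length; i++) {
                if (freqs[i] >= 2) {
                    expanded.add(new TermQuery(new Term("contents", terms[i])),
                                 false, false); // optional clause
                }
            }
        }
        return expanded;
    }
}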
Hardware Requirements for a large index?
Hi,

I am in the process of deciding specs for a crawling machine and a searching machine (two machines), which will support merging/indexing and searching operations on a single Lucene index that may scale to several million pages (at which point it would be about 2-10 GB, assuming linear growth with pages).

What range of hardware should I be looking at? Could anyone share their deployment/hardware specs for a large index? I'm looking for RAM and CPU considerations.

Also, what is the preferred platform? Java seems to have a maximum memory allocation of 4 GB on Solaris and 2 GB on Linux; does it make sense to get more RAM than that?

Thanks!
CW
How to index numeric fields
Hi, What is the best way to index numeric decimal fields, like experience, when I want to use a range search on this field? Thanks in advance. Regards, Shivani
Re: How to index numeric fields
Here are a few bits: http://www.lucenebook.com/search?query=indexing+numbers

The Wiki and the FAQ also have some information about indexing numbers/dates. Basically, you want them small (ints sort faster, if you need sorting), and you don't want them too fine-grained if you'll be expanding them into a Boolean OR query.

Otis
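The padding trick those references usually describe, sketched out: store numbers as fixed-width, zero-padded keywords so that lexicographic range queries line up with numeric order. The field name, the width, and the scale-by-ten to keep one decimal place are made-up choices for the example.

import java.text.DecimalFormat;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.RangeQuery;

public class PaddedNumbers {
    // e.g. years of experience, scaled to tenths to keep one decimal place
    private static final DecimalFormat PAD = new DecimalFormat("00000");

    static String pad(double experienceYears) {
        return PAD.format(Math.round(experienceYears * 10)); // 12.5 -> "00125"
    }

    public static void main(String[] args) {
        Document doc = new Document();
        doc.add(Field.Keyword("experience", pad(12.5))); // untokenized keyword

        // Range search for 5.0 <= experience <= 15.0 (inclusive bounds).
        RangeQuery q = new RangeQuery(
            new Term("experience", pad(5.0)),
            new Term("experience", pad(15.0)),
            true);
        System.out.println(q);
    }
}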
Re: ArrayIndexOutOfBoundsException while closing the index writer
Who knows what else the app is doing. However, I can quickly suggest that you add a finally block and close your writer in there if writer != null.

Otis
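The shape Otis is suggesting, spelled out against the snippet from the original message below (indexDirFile, resumeFile, flag and indexFile() are taken from that snippet; the only change is closing the writer on every path):

IndexWriter indexwriter = null;
try {
    indexwriter = new IndexWriter(indexDirFile, new StandardAnalyzer(), flag);
    indexFile(indexwriter, resumeFile);
} catch (IOException e) {
    e.printStackTrace();
    throw new Error(e);
} finally {
    if (indexwriter != null) {
        try {
            indexwriter.close();
        } catch (IOException e) {
            e.printStackTrace(); // don't mask the original exception
        }
    }
}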
ArrayIndexOutOfBoundsException while closing the index writer
Hi,

I have used Lucene in my application and am just indexing and searching on some documents. The code that indexes the documents was working fine till yesterday and suddenly stopped working. I get an error when I am trying to close the index writer. The code is as follows:

.
IndexWriter indexwriter = new IndexWriter(indexDirFile, new StandardAnalyzer(), flag);
indexFile(indexwriter, resumeFile);
indexwriter.close(); // causing errors
} catch (IOException e) {
    e.printStackTrace();
    throw new Error(e);
}
.

And the error log is as follows:

2006-02-15 18:47:48,748 WARN [org.apache.struts.action.RequestProcessor] Unhandled Exception thrown: class java.lang.ArrayIndexOutOfBoundsException
2006-02-15 18:47:48,748 ERROR [org.jboss.web.localhost.Engine] StandardWrapperValve[action]: Servlet.service() for servlet action threw exception
java.lang.ArrayIndexOutOfBoundsException: 105 >= 25
    at java.util.Vector.elementAt(Vector.java:432)
    at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:135)
    at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:103)
    at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:237)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:169)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:97)
    at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:425)
    at org.apache.lucene.index.IndexWriter.flushRamSegments(IndexWriter.java:373)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:193)
    at rd.admin.NewIndexer.indexTextFile(NewIndexer.java:108)
    at rd.admin.AddResume.indexOneRow(AddResume.java:38)
    at rd.admin.LuceneGateway.buildMapAndIndex(LuceneGateway.java:46)
    at rd.admin.LuceneGateway.indexResume(LuceneGateway.java:30)
    at rd.admin.UploadResumeAgainstRequisition.npExecute(UploadResumeAgainstRequisition.java:106)
    at np.core.BaseNPAction.execute(BaseNPAction.java:116)
    at org.apache.struts.action.RequestProcessor.processActionPerform(RequestProcessor.java:421)
    at org.apache.struts.action.RequestProcessor.process(RequestProcessor.java:226)
    at org.apache.struts.action.ActionServlet.process(ActionServlet.java:1164)
    at org.apache.struts.action.ActionServlet.doPost(ActionServlet.java:415)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:810)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:237)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:157)
    at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:75)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:186)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:157)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:214)
    at org.apache.catalina.core.StandardValveContext.invokeNext(StandardValveContext.java:104)
    at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:520)
    at org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:198)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:152)
    at org.apache.catalina.core.StandardValveContext.invokeNext(StandardValveContext.java:104)
    at org.jboss.web.tomcat.security.CustomPrincipalValve.invoke(CustomPrincipalValve.java:66)
    at org.apache.catalina.core.StandardValveContext.invokeNext(StandardValveContext.java:102)
    at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:153)
    at org.apache.catalina.core.StandardValveContext.invokeNext(StandardValveContext.java:102)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:540)
    at org.apache.catalina.core.StandardValveContext.invokeNext(StandardValveContext.java:102)
    at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:54)
    at org.apache.catalina.core.StandardValveContext.invokeNext(StandardValveContext.java:102)
    at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:520)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:137)
    at org.apache.catalina.core.StandardValveContext.invokeNext(St
RE: ArrayIndexOutOfBoundsException while closing the index writer
Hi Otis,

Thanks for such a quick reply. I tried using finally, but it didn't help. I guess if I explain the integration of Lucene with my app in a little more detail, you can probably help me better.

I allow users to upload documents, which are then indexed, and search on them. I am getting this error when I try to index a document, and specifically while closing the index writer. If we look closely at the error log, it fails at

org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:135)

i.e. when Lucene tries to get something by the field name:

return (FieldInfo) byName.get(fieldName);

Now, what beats me is that indexing of the fields has already been done by the time we want to close the index writer, so why don't I get an error while indexing? What goes wrong when I try to close the index writer?

Please see if you can help me with this. Thanks in advance.

The code used for indexing is as follows:

public void indexFile(IndexWriter indexwriter, File resumeFile) {
    Document document = new Document();
    try {
        File afile[] = indexDirFile.listFiles();
        boolean flag = false;
        if (afile.length <= 0)
            flag = true;
        indexwriter = new IndexWriter(indexDirFile, new StandardAnalyzer(), flag);
        try {
            document.add(Field.Text(IndexerColumns.contents, new FileReader(resumeFile)));
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            throw new MyRuntimeException(e.getMessage(), e);
        }
        document.add(Field.Keyword(IndexerColumns.id,
                String.valueOf(mapLuceneParams.get(IndexerColumns.id))));
        for (int i = 0; i < this.columnInfos.length; i++) {
            ColumnInfo columnInfo = columnInfos[i];
            String value = String.valueOf(mapLuceneParams.get(columnInfo.columnName));
            if (value != null) {
                value = value.trim();
                if (value.length() != 0) {
                    if (columnInfo.istokenized) {
                        document.add(Field.Text(columnInfo.columnName, value));
                    } else {
                        document.add(Field.Keyword(columnInfo.columnName, value));
                    }
                }
            }
        }
        document.add(Field.Keyword(IndexerColumns.filePath,
                String.valueOf(mapLuceneParams.get(IndexerColumns.filePath))));
        try {
            indexwriter.addDocument(document);
        } catch (IOException e) {
            e.printStackTrace();
            throw new MyRuntimeException(e.getMessage(), e);
        }
        indexwriter.close();
    } catch (IOException e) {
        e.printStackTrace();
        throw new Error(e);
    } finally {
        if (indexwriter != null) {
            indexwriter.close();
        }
    }
}

Regards,
Shivani