Re: Question from a new user : IndexSearcher.doc

2010-06-21 Thread Victor Kabdebon
Hi Erick,

Thank you very much for your explanations. 588 years is a rather long way off, so
you're right that I probably don't need to worry about that problem for now.
To answer your final question: no, indeed I won't need to store a lot of
data, just some keys in order to find the data in Cassandra later on.

If you don't mind, please let me ask you another question :

Is it really worthwhile to begin with Lucene rather than going directly to Solr
(or Nutch)? What I mean is: is it about the same amount of work to implement
the search with Solr and stay with it, instead of first implementing it with
Lucene and then, when the project becomes very big, migrating to a new system?
My goal is to have something that can evolve over time, even if I end up with 1 million
documents added daily.

Thank you,
Victor

2010/6/21 Erick Erickson 

> By and large, you won't ever actually be interested in very many documents;
> what's returned in the TopDocs structure is the internal document ID and score, in
> score order. Retrieval by document ID is quite efficient; it's not a
> search. I'm quite sure this won't be a problem.
>
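
A rough sketch of that two-step flow with the Lucene 2.9/3.0-era API (the index path, field names and query text below are made up for illustration, not taken from this thread):

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class SearchThenFetch {
  public static void main(String[] args) throws Exception {
    IndexReader reader = IndexReader.open(
        FSDirectory.open(new File("/path/to/index")), true); // read-only reader
    IndexSearcher searcher = new IndexSearcher(reader);

    Query query = new QueryParser(Version.LUCENE_29, "body",
        new StandardAnalyzer(Version.LUCENE_29)).parse("cassandra");

    // The search itself only returns internal doc IDs and scores for the top N hits.
    TopDocs top = searcher.search(query, 10);
    for (ScoreDoc sd : top.scoreDocs) {
      // Fetching the stored fields is a separate, cheap lookup by doc ID, not a search.
      Document doc = searcher.doc(sd.doc);
      System.out.println(sd.score + " -> " + doc.get("cassandraKey"));
    }

    searcher.close();
    reader.close();
  }
}
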
> Adding 10,000 documents a day means that in 588 years you'll exceed a 31-bit
> number. I don't think you really need to worry about that either. And that's
> the worst case, assuming the ints are signed. And I believe that they're
> unsigned anyway.
>
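
(For the arithmetic behind that figure: a signed 32-bit int tops out at 2^31 - 1 = 2,147,483,647 document IDs, and 10,000 documents a day is roughly 3,650,000 a year, so 2,147,483,647 / 3,650,000 is about 588 years.)
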
> What you will have to worry about is the time to get the top N
> highest-scoring documents. That is, IndexSearcher.search() will be your
> limiting factor long before you reach these numbers. By that time, though,
> you'll have moved to SOLR or some other distributed search mechanism.
>
> Performance is influenced by the complexity of the queries and the structure
> and size of your index. The time spent retrieving the top few matches is
> completely dwarfed by the search time for an index of any size.
>
> All this may be irrelevant if you really want to retrieve a very large
> number of documents rather than, say, the top 100. But the use case would
> have to be very interesting for it to be a requirement to return, say,
> 100,000 documents to a user.
>
> But do be aware that you're not retrieving the *original* text with
> IndexSearcher. Typically, the relevant data is indexed but not stored. These
> two concepts are confusing when you start using Lucene, especially since
> they're specified in the same call. Indexing a field splits it up into
> tokens and normalizes it (e.g. lowercasing, stemming, adding synonyms, etc.). The
> indexed data is the part that's searched. You can also store the input
> verbatim, but the stored part is just a copy that's never searched but is
> available for retrieval.
>
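
A minimal sketch of that indexed-vs-stored distinction (Lucene 2.9/3.0-era API; the field names and the idea of storing a Cassandra row key are illustrative only):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class IndexVsStore {
  // Adds one document: the body is searchable but not retrievable,
  // while the Cassandra key is stored verbatim for retrieval and
  // indexed as a single, untokenized term.
  static void addDoc(IndexWriter writer, String bodyText, String cassandraKey)
      throws Exception {
    Document doc = new Document();
    doc.add(new Field("body", bodyText, Field.Store.NO, Field.Index.ANALYZED));
    doc.add(new Field("cassandraKey", cassandraKey, Field.Store.YES, Field.Index.NOT_ANALYZED));
    writer.addDocument(doc);
  }
}
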
> Which brings up one of the central decisions you need to make. Are you,
> indeed, going to store all the data for retrieval in your index, or just
> index the relevant text to be searched along with some locator information
> pointing back to the original document? You mention Cassandra, which leads me to
> speculate that it's the latter.
>
> HTH
> Erick
>
>
> On Sun, Jun 20, 2010 at 4:04 PM, Victor Kabdebon
> wrote:
>
> > Hello Simon,
> >
> > As I told you, I am quite new with Lucene, so there are many things that
> > might be wrong.
> > I'm using Lucene to make a search service for a website that receives a large
> > amount of information daily. This information is directly available
> > as text in a Cassandra database.
> > There might be as many as 10,000 new documents added daily, and yes, my
> > concern is whether it is possible to retrieve more documents than the integer
> > max value.
> > I also don't really see how IndexSearcher.doc( ) works, because
> > it seems like we give this method an ID and it is going to search in the
> > indexed documents. So what exactly does
> > IndexSearcher.doc(int) do?
> >
> > *Or are you concerned about retrieving all documents
> > containing term "XY" if the number of documents matching is large?*
> >
> > I'm also concerned by this problem, yes.
> >
> > Could you explain to me a little bit how it works, and how Lucene enables one
> > to retrieve a very large number of documents even though it uses int?
> >
> > Thank you for your answers,
> > Victor
> >
> > 2010/6/20 Simon Willnauer 
> >
> > > Hi, maybe I don't understand your question correctly. Are you asking
> > > if you could run into problems if you retrieve more documents than
> > > integer max value? Or are you concerned about retrieving all documents
> > > containing term "XY" if the number of documents matching is large? If
> > > you are afraid of loading all documents matched from a stored field I
> > > guess you are doing something wrong.
> > > What are you using lucene for?
> > >
> > > simon
> > >
> > > On Sun, Jun 20, 2010 at 8:00 PM, Victor Kabdebon
> > >  wrote:
> > > > Hello everybody,
> > > >
> > > > I am new to Apache Lucene and it seems to fit perfectly my needs for
> my
> > > > application.
> > > > However I'm a little concerned about something (pardon me

Re: Strange behaviour of StandardTokenizer

2010-06-21 Thread Anna Hunecke
Hi!

Basically, what I want is something that removes punctuation. 
But I realized now that things like email or number recognition are also very 
useful if I want to give suggestions. I want to be able to give 'nl-lt001' as a 
suggestion when the user enters 'nl'. This would of course not be possible if 
the tokenizer just blindly splits at the '-'. 
So, I'll stick with the tokenizer for now and fix the problems I had with the 
splitting of words by building the queries differently.
Thanks for your help!

- Anna
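
A small sketch that makes the behaviour discussed below easy to see, by printing each token and its type (Lucene 2.9/3.0-era API; the field name "f" is arbitrary):

import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;
import org.apache.lucene.util.Version;

public class ShowTokens {
  public static void main(String[] args) throws Exception {
    for (String text : new String[] { "nl-lt", "nl-lt0" }) {
      TokenStream ts = new StandardAnalyzer(Version.LUCENE_29)
          .tokenStream("f", new StringReader(text));
      TermAttribute term = ts.addAttribute(TermAttribute.class);
      TypeAttribute type = ts.addAttribute(TypeAttribute.class);
      System.out.println(text + ":");
      while (ts.incrementToken()) {
        System.out.println("  " + term.term() + " [" + type.type() + "]");
      }
      ts.close();
    }
  }
}

With the stock grammar, 'nl-lt' should come out as two ALPHANUM tokens ('nl' and 'lt') while 'nl-lt0' stays whole with type NUM, which is exactly the behaviour discussed in the quoted messages below.
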

--- Simon Willnauer  wrote on Fri, 18.6.2010:

> From: Simon Willnauer 
> Subject: Re: Strange behaviour of StandardTokenizer
> To: java-user@lucene.apache.org
> Date: Friday, 18 June 2010, 09:52
> Hi Anna,
> 
> what are you using your tokenizer for? There are a lot of different
> options in Lucene, and StandardTokenizer is not necessarily the best
> one. The behaviour you are seeing is that the tokenizer detects your token
> as a number. When you look at the grammar that is kind of obvious.
> 
> 
> // floating point, serial, model numbers, ip addresses, etc.
> // every other segment must have at least one digit
> NUM        = ({ALPHANUM} {P} {HAS_DIGIT}
>            | {HAS_DIGIT} {P} {ALPHANUM}
>            | {ALPHANUM} ({P} {HAS_DIGIT} {P} {ALPHANUM})+
>            | {HAS_DIGIT} ({P} {ALPHANUM} {P} {HAS_DIGIT})+
>            | {ALPHANUM} {P} {HAS_DIGIT} ({P} {ALPHANUM} {P} {HAS_DIGIT})+
>            | {HAS_DIGIT} {P} {ALPHANUM} ({P} {HAS_DIGIT} {P} {ALPHANUM})+)
> 
> // punctuation
> P          = ("_"|"-"|"/"|"."|",")
> 
> 
> 
> You can either build your own custom filter which fixes only the
> problem with numbers containing a '-', use the MappingCharFilter, or
> switch to a different tokenizer.
> If you could talk more about your use case you might get better suggestions.
> 
> Simon
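
For the MappingCharFilter route mentioned above, a sketch of how it could be wired in front of StandardTokenizer so that every '-' is rewritten to a space before tokenization (Lucene 2.9-era classes; treat the exact wiring as an assumption to check against the javadocs):

import java.io.Reader;
import java.io.StringReader;

import org.apache.lucene.analysis.CharReader;
import org.apache.lucene.analysis.MappingCharFilter;
import org.apache.lucene.analysis.NormalizeCharMap;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

public class DashToSpace {
  // Rewrites '-' to ' ' before the tokenizer sees the text, so splitting
  // happens unconditionally, even inside things like 'nl-lt0'.
  public static TokenStream tokenize(String text) {
    NormalizeCharMap map = new NormalizeCharMap();
    map.add("-", " ");
    Reader mapped = new MappingCharFilter(map, CharReader.get(new StringReader(text)));
    return new StandardTokenizer(Version.LUCENE_29, mapped);
  }
}

The trade-off is the one Anna describes at the top of the thread: with this in place, 'nl-lt001' would also be split, so it could no longer be offered as a single suggestion.
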
> 
> On Fri, Jun 18, 2010 at 9:03 AM, Anna Hunecke 
> wrote:
> > Hi Ahmet,
> > thanks for the explanation. :)
> > Okay, so it is recognized as a number? I didn't expect that, really. I expected
> > that all words would either be split at the minus or not.
> > Maybe I'll have to use another tokenizer.
> > Best,
> > Anna
> >
> > --- Ahmet Arslan  wrote on Thu, 17.6.2010:
> >
> >> From: Ahmet Arslan 
> >> Subject: Re: Strange behaviour of StandardTokenizer
> >> To: java-user@lucene.apache.org
> >> Date: Thursday, 17 June 2010, 15:50
> >>
> >> > I ran into a strange behaviour of the StandardTokenizer.
> >> > Terms containing a '-' are tokenized differently depending
> >> > on the context.
> >> > For example, the term 'nl-lt' is split into 'nl' and 'lt'.
> >> > The term 'nl-lt0' is tokenized into 'nl-lt0'.
> >> > Is this a bug or a feature?
> >>
> >> It is designed that way. The TypeAttribute of those tokens is
> >> different.
> >>
> >> > Can I avoid it somehow?
> >>
> >> Do you want to split at the '-' char no matter what? If yes,
> >> you can replace all '-' characters with whitespace using
> >> MappingCharFilter before StandardTokenizer.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >
> >
> >
> >
> >
> >
> 
> 
> 






Re: segment_N file is missed

2010-06-21 Thread maryam ma'danipour
That's great. I'll try it.
thanks

On Sat, Jun 19, 2010 at 11:10 AM, Lance Norskog  wrote:

> This code is old (2006!) and I've updated it for Lucene 2.9.2, and the
> trunk. This version only works for one CFS file (that I've tested). The
> code does not check versions carefully.  Here are both versions:
>
> Lucene 2.9.2:
> 
> package org.apache.lucene.index;
>
> import java.io.File;
> import java.io.FilenameFilter;
> import java.io.IOException;
>
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.FSDirectory;
> import org.apache.lucene.store.IndexInput;
>
> public class CFSopen {
>
>  // this code fixes up a directory
>  // make a ramdirectory, use this to fix it up if needed
>  // does compoundfilereader work directly?
>
>  Directory fixIndex(String path) throws IOException {
>File file = new File(path);
>Directory directory = FSDirectory.getDirectory(file, false);
>
>String[] files = file.list(new FilenameFilter() {
>  public boolean accept(File dir, String name) {
>return name.endsWith(".cfs");
>  }
>});
>
>SegmentInfos infos = new SegmentInfos();
>int counter = 0;
>for (int i = 0; i < files.length; i++) {
>  String fileName = files[i];
>  String segmentName = fileName.substring(1, fileName.lastIndexOf('.'));
>
>  int segmentInt = Integer.parseInt(segmentName, Character.MAX_RADIX);
>  counter = Math.max(counter, segmentInt);
>
>  segmentName = fileName.substring(0, fileName.lastIndexOf('.'));
>
>  Directory fileReader = new CompoundFileReader(directory, fileName);
>  IndexInput indexStream = fileReader.openInput(".fdx");
>  int size = (int) (indexStream.length() / 8);
>  indexStream.close();
>  fileReader.close();
>
>  SegmentInfo segmentInfo = new SegmentInfo(segmentName, size, directory);
>  infos.addElement(segmentInfo);
>}
>
>infos.counter = ++counter;
>
>infos.prepareCommit(directory);
>infos.finishCommit(directory);
>return directory;
>  }
>
>  /**
>   * @param args
>   * @throws IOException
>   */
>  public static void main(String[] args) throws IOException {
>// TODO Auto-generated method stub
>CFSopen cfsopen = new CFSopen();
>Directory dir = cfsopen.fixIndex("/cygwin/tmp/index");
>dir.hashCode();
>  }
>
> }
>
> ---
> trunk:
> ---
> package org.apache.lucene.index;
>
> import java.io.File;
> import java.io.FilenameFilter;
> import java.io.IOException;
>
> import org.apache.lucene.index.codecs.Codec;
> import org.apache.lucene.index.codecs.standard.StandardCodec;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.FSDirectory;
> import org.apache.lucene.store.IndexInput;
>
> public class CFSopen {
>
>  Directory fixIndex(String path) throws IOException {
>File file = new File(path);
>Directory directory = FSDirectory.open(file);
>
>String[] files = file.list(new FilenameFilter() {
>  public boolean accept(File dir, String name) {
>return name.endsWith(".cfs");
>  }
>});
>
>SegmentInfos infos = new SegmentInfos();
>int counter = 0;
>for (int i = 0; i < files.length; i++) {
>  String fileName = files[i];
>  String segmentName = fileName.substring(1, fileName.lastIndexOf('.'));
>
>  int segmentInt = Integer.parseInt(segmentName, Character.MAX_RADIX);
>  counter = Math.max(counter, segmentInt);
>
>  segmentName = fileName.substring(0, fileName.lastIndexOf('.'));
>
>  Directory fileReader = new CompoundFileReader(directory, fileName);
>  IndexInput indexStream = fileReader.openInput(".fdx");
>  int size = (int) (indexStream.length() / 8);
>  indexStream.close();
>  fileReader.close();
>
>  // need to get codec name out of the CFS files
>  Codec codec = new StandardCodec();
>  SegmentInfo segmentInfo = new SegmentInfo(segmentName, size, directory, false, -1, null, false, false, codec);
>  infos.addElement(segmentInfo);
>}
>
>infos.counter = ++counter;
>
>infos.prepareCommit(directory);
>infos.finishCommit(directory);
>return directory;
>  }
>
>  /**
>   * @param args
>   * @throws IOException
>   */
>  public static void main(String[] args) throws IOException {
>// TODO Auto-generated method stub
>CFSopen cfsopen = new CFSopen();
>Directory dir = cfsopen.fixIndex("/cygwin/tmp/index");
>dir.hashCode();
>   }
>
> }
>
>
>
>
> On 6/16/10, Michael McCandless  wrote:
> > On Wed, Jun 16, 2010 at 10:38 AM, Yonik Seeley
> >  wrote:
> >> On Tue, Jun 15, 2010 at 5:23 AM, Michael McCandless
> >>  wrote:
> >>> CheckIndex is not able to recover from this corruption (missing
> >>> segments_N file); this would be a nice addition...
> >>>
> >>> But it sounds like you've worked out a way to write your own
> >>> segments_N?
> >>>
> >>> Use oal.store.ChecksumIndexOutput (wraps any other IndexOutput) to
> >>> p

Re: Question from a new user : IndexSearcher.doc

2010-06-21 Thread Erick Erickson
They're quite different beasts to use. SOLR will have you up and running
with some configuration very quickly, and if you're comfortable with servlet
containers, it'll be even faster. It has the DataImportHandler (DIH), which will index data
from a database (again, with some configuration, but not necessarily
programming). SOLR has, out of the box, support for sharding, replication,
etc.

Lucene is a pure Java library that you have to write infrastructure for.
An understanding of Lucene, which SOLR uses under the covers, can be
quite valuable.

But from what you've described, I suspect you'll be better off starting off with
SOLR. You can add custom bits to SOLR if you need to, but it'll almost
certainly be some time before you do, if you ever do. And it won't be as likely to be
throw-away work as it would be if you started with Lucene and then migrated
to SOLR.

Nutch is a web-crawler/indexer, so from what you've described Nutch isn't
a good match for what you're trying to do.

HTH
Erick


On Mon, Jun 21, 2010 at 3:29 AM, Victor Kabdebon
wrote:

> Hi Erick,
>
> Thank you very much for your explanations. 588 years is a rather long way off, so
> you're right that I probably don't need to worry about that problem for now.
> To answer your final question: no, indeed I won't need to store a lot of
> data, just some keys in order to find the data in Cassandra later on.
>
> If you don't mind, please let me ask you another question :
>
> Is it really worthwhile to begin with Lucene rather than going directly to Solr
> (or Nutch)? What I mean is: is it about the same amount of work to implement
> the search with Solr and stay with it, instead of first implementing it with
> Lucene and then, when the project becomes very big, migrating to a new system?
> My goal is to have something that can evolve over time, even if I end up with 1 million
> documents added daily.
>
> Thank you,
> Victor
>
> 2010/6/21 Erick Erickson 
>
> > By and large, you won't ever actually be interested in very many documents;
> > what's returned in the TopDocs structure is the internal document ID and score, in
> > score order. Retrieval by document ID is quite efficient; it's not a
> > search. I'm quite sure this won't be a problem.
> >
> > Adding 10,000 documents a day means that in 588 years you'll exceed a 31-bit
> > number. I don't think you really need to worry about that either. And that's
> > the worst case, assuming the ints are signed. And I believe that they're
> > unsigned anyway.
> >
> > What you will have to worry about is the time to get the top N
> > highest-scoring documents. That is, IndexSearcher.search() will be your
> > limiting factor long before you reach these numbers. By that time, though,
> > you'll have moved to SOLR or some other distributed search mechanism.
> >
> > Performance is influenced by the complexity of the queries and the structure
> > and size of your index. The time spent retrieving the top few matches is
> > completely dwarfed by the search time for an index of any size.
> >
> > All this may be irrelevant if you really want to retrieve a very large
> > number of documents rather than, say, the top 100. But the use case would
> > have to be very interesting for it to be a requirement to return, say,
> > 100,000 documents to a user.
> >
> > But do be aware that you're not retrieving the *original* text with
> > IndexSearcher. Typically, the relevant data is indexed but not stored. These
> > two concepts are confusing when you start using Lucene, especially since
> > they're specified in the same call. Indexing a field splits it up into
> > tokens and normalizes it (e.g. lowercasing, stemming, adding synonyms, etc.). The
> > indexed data is the part that's searched. You can also store the input
> > verbatim, but the stored part is just a copy that's never searched but is
> > available for retrieval.
> >
> > Which brings up one of the central decisions you need to make. Are you,
> > indeed, going to store all the data for retrieval in your index, or just
> > index the relevant text to be searched along with some locator information
> > pointing back to the original document? You mention Cassandra, which leads me to
> > speculate that it's the latter.
> >
> > HTH
> > Erick
> >
> >
> > On Sun, Jun 20, 2010 at 4:04 PM, Victor Kabdebon
> > wrote:
> >
> > > Hello Simon,
> > >
> > > As I told you, I am quite new with Lucene, so there are many things that
> > > might be wrong.
> > > I'm using Lucene to make a search service for a website that receives a large
> > > amount of information daily. This information is directly available
> > > as text in a Cassandra database.
> > > There might be as many as 10,000 new documents added daily, and yes, my
> > > concern is whether it is possible to retrieve more documents than the integer
> > > max value.
> > > I also don't really see how IndexSearcher.doc( ) works, because
> > > it seems like we give this method an ID and it is going to search in the
> > > indexed documents. So what exactly is go

search hits not returned until I stop and restart application

2010-06-21 Thread andynuss

Hi,

I have an IndexWriter singleton in my program, and an IndexSearcher
singleton based on a readonly IndexReader singleton.  When I use the
IndexWriter to index a large document to lucene, and then, while the program
is still running, use my previously created IndexSearcher to find hits in
that book, they are not found.  But if I stop and restart the application,
then they are found.

Andy
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/search-hits-not-returned-until-I-stop-and-restart-application-tp911711p911711.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




RE: search hits not returned until I stop and restart application

2010-06-21 Thread Steven A Rowe
Hi Andy,

From the API docs for IndexWriter:

[D]ocuments are added with addDocument and removed
with deleteDocuments(Term) or deleteDocuments(Query).
A document can be updated with updateDocument (which
just deletes and then adds the entire document).
When finished adding, deleting and updating documents, 
close should be called.

These changes  are not visible to IndexReader
until either commit() or close() is called.

So you gotta call commit() or close().  Once you've done that, you can reduce 
the (expensive) cost of opening a new IndexReader by calling reopen():



Steve
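
A bare-bones sketch of that cycle (Lucene 2.9/3.0-era API; the method and variable names are illustrative):

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;

class RefreshExample {
  static IndexSearcher refreshAfterIndexing(IndexWriter writer, IndexReader reader)
      throws IOException {
    writer.commit();                          // make the new documents visible to readers
    IndexReader newReader = reader.reopen();  // may return the SAME instance if nothing changed
    if (newReader != reader) {
      reader.close();                         // release the old reader
    }
    return new IndexSearcher(newReader);      // search against the newly (re)opened reader
  }
}
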

> -Original Message-
> From: andynuss [mailto:andrew_n...@yahoo.com]
> Sent: Monday, June 21, 2010 11:02 AM
> To: java-user@lucene.apache.org
> Subject: search hits not returned until I stop and restart application
> 
> 
> Hi,
> 
> I have an IndexWriter singleton in my program, and an IndexSearcher
> singleton based on a readonly IndexReader singleton.  When I use the
> IndexWriter to index a large document to lucene, and then, while the
> program is still running, use my previously created IndexSearcher to find
> hits in that book, they are not found.  But if I stop and restart the
> application, then they are found.
> 
> Andy
> --
> View this message in context: http://lucene.472066.n3.nabble.com/search-
> hits-not-returned-until-I-stop-and-restart-application-
> tp911711p911711.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> 



RE: search hits not returned until I stop and restart application

2010-06-21 Thread andynuss

"So you gotta call commit() or close().  Once you've done that, you can
reduce the (expensive) cost of opening a new IndexReader by calling
reopen(): "

Steve,

I tried this, and I must have done something wrong.

After my document set was ingested, I called a function which (1) called the
IndexWriter singleton commit() function, (2) then called the IndexReader
singleton reopen() function (no arguments).  (My IndexReader is read only.)  
Still didn't find hits in that book.  Then I tried (3) creating a new
IndexSearcher on top of this IndexReader and that also didn't help.

Wonder what I could be doing wrong.

Andy
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/search-hits-not-returned-until-I-stop-and-restart-application-tp911711p912096.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




RE: search hits not returned until I stop and restart application

2010-06-21 Thread Steven A Rowe
Andy, it sounds like you're doing the right thing.

Maybe you aren't using the IndexReader instance returned by reopen(), but 
instead are continuing to use the instance on which you called reopen()?  It's 
tough to figure this kind of thing out without looking at the code.

For example, what do you mean by "singleton"? (You mentioned this in reference 
to both IndexWriter and IndexReader.)  Is it possible that some part of your 
code is maintaining a reference to the original IndexReader instance and using 
it, rather than using the newly opened instance?

Steve
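
For reference, a sketch of a holder object that makes it hard to keep searching against the stale instance (Lucene 2.9/3.0-era API; names are invented, and it ignores reference-counting for searches still running against the old reader):

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

public class SearcherHolder {
  private IndexReader reader;
  private IndexSearcher searcher;

  public SearcherHolder(IndexReader initialReader) {
    this.reader = initialReader;
    this.searcher = new IndexSearcher(initialReader);
  }

  // Call after IndexWriter.commit(): swaps in the reopened reader so callers
  // always get a searcher over the newest instance.
  public synchronized void refresh() throws IOException {
    IndexReader newReader = reader.reopen();
    if (newReader != reader) {   // reopen() returns the same object if nothing changed
      reader.close();
      reader = newReader;
      searcher = new IndexSearcher(newReader);
    }
  }

  public synchronized IndexSearcher getSearcher() {
    return searcher;
  }
}
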

> -Original Message-
> From: andynuss [mailto:andrew_n...@yahoo.com]
> Sent: Monday, June 21, 2010 1:29 PM
> To: java-user@lucene.apache.org
> Subject: RE: search hits not returned until I stop and restart application
> 
> 
> "So you gotta call commit() or close().  Once you've done that, you can
> reduce the (expensive) cost of opening a new IndexReader by calling
> reopen(): "
> 
> Steve,
> 
> I tried this, and I must have done something wrong.
> 
> After my document set was ingested, I called a function which (1) called
> the IndexWriter singleton commit() function, (2) then called the
> IndexReader singleton reopen() function (no arguments).  (My IndexReader
> is read only.) Still didn't find hits in that book.  Then I tried (3)
> creating a new IndexSearcher on top of this IndexReader and that also
> didn't help.
> 
> Wonder what I could be doing wrong.
> 
> Andy
> --
> View this message in context: http://lucene.472066.n3.nabble.com/search-
> hits-not-returned-until-I-stop-and-restart-application-
> tp911711p912096.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> 



RE: search hits not returned until I stop and restart application

2010-06-21 Thread andynuss

"Maybe you aren't using the IndexReader instance returned by reopen(), but
instead are continuing to use the instance on which you called reopen()? 
It's tough to figure this kind of thing out without looking at the code."

That was it, I was not using the newly (re)opened index.  By the way, one
last question.  It doesn't matter for this because I'm indexing one huge
document at a time, and then committing.  But later, I will also be indexing
very small documents frequently.  In that case, it would seem that if I
index a very small document, I don't want to be thrashing with a commit
after each one, and then a reopen of the reader and reconstruction of my
searcher.  Do others manage this type of thing with a thread that fires at
intervals to commit if dirty?
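
One possible shape for that idea, sketched only (the interval, names and error handling are arbitrary, and after each commit the reader/searcher still has to be refreshed as discussed above):

import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.lucene.index.IndexWriter;

public class PeriodicCommitter {
  private final AtomicBoolean dirty = new AtomicBoolean(false);
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public PeriodicCommitter(final IndexWriter writer, long intervalSeconds) {
    scheduler.scheduleWithFixedDelay(new Runnable() {
      public void run() {
        if (dirty.getAndSet(false)) {   // only commit if something was indexed
          try {
            writer.commit();
          } catch (IOException e) {
            dirty.set(true);            // leave it dirty and retry on the next tick
          }
        }
      }
    }, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
  }

  public void markDirty() {             // call after each addDocument()
    dirty.set(true);
  }
}
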
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/search-hits-not-returned-until-I-stop-and-restart-application-tp911711p912345.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




RE: search hits not returned until I stop and restart application

2010-06-21 Thread Steven A Rowe
Andy,

I think batching commits either by time or number of documents is common.

Do you know about NRT (Near Realtime Search)?  Using
IndexWriter.getReader(), you can avoid commits altogether, as well as reduce
update->search latency.  See the IndexWriter.getReader() javadocs for more details.

Depending on requirements, these two strategies can be combined.

Steve
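
A minimal sketch of the NRT path (Lucene 2.9/3.0-era API; note that the reader comes from the writer rather than from the directory, and periodic commits are still needed for durability):

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;

class NrtSearch {
  // The returned searcher can already see documents buffered in the writer,
  // without an intervening commit().
  static IndexSearcher nearRealtimeSearcher(IndexWriter writer) throws IOException {
    IndexReader nrtReader = writer.getReader();
    return new IndexSearcher(nrtReader);
  }
}
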

> -Original Message-
> From: andynuss [mailto:andrew_n...@yahoo.com]
> Sent: Monday, June 21, 2010 2:44 PM
> To: java-user@lucene.apache.org
> Subject: RE: search hits not returned until I stop and restart application
> 
> 
> "Maybe you aren't using the IndexReader instance returned by reopen(), but
> instead are continuing to use the instance on which you called reopen()?
> It's tough to figure this kind of thing out without looking at the code."
> 
> That was it, I was not using the newly (re)opened index.  By the way, one
> last question.  It doesn't matter for this because I'm indexing one huge
> document at a time, and then committing.  But later, I will also be
> indexing very small documents frequently.  In that case, it would seem
> that if I index a very small document, I don't want to be thrashing with a
> commit after each one, and then a reopen of the reader and reconstruction
> of my searcher.  Do others manage this type of thing with a thread that
> fires at intervals to commit if dirty?