SV: SV: OutOfMemoryError tokenizing a boring text file

2007-09-11 Thread Per Lindberg
> From: Chris Hostetter [mailto:[EMAIL PROTECTED] > : Setting writer.setMaxFieldLength(5000) (default is 1) > : seems to eliminate the risk for an OutOfMemoryError, > > that's because it now gives up after parsing 5000 tokens. > > : To me, it appears that simply calling > : new Field("c
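
To illustrate the point quoted above, that a field-length limit simply stops consuming tokens once the limit is reached, here is a minimal sketch in plain Java. It is not Lucene code: the class TokenCap, the method tokenize, and the whitespace splitting are assumptions made up for illustration, and the real tokenizer and limit handling in Lucene may differ.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical stand-in, not a Lucene class: shows only the "give up after N tokens" idea.
class TokenCap {
    /** Reads whitespace-separated tokens from in, stopping after maxTokens tokens. */
    static List<String> tokenize(Reader in, int maxTokens) {
        List<String> tokens = new ArrayList<>();
        Scanner scanner = new Scanner(in);
        // Tokens beyond the cap are never read, so they are never buffered.
        while (tokens.size() < maxTokens && scanner.hasNext()) {
            tokens.add(scanner.next());
        }
        return tokens;
    }
}
```

With a cap of 5000, everything after the 5000th token is silently dropped, which bounds memory use but also means the rest of the file is not indexed.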

SV: OutOfMemoryError tokenizing a boring text file

2007-09-03 Thread Per Lindberg
g lots of issues. > > > > - > > AZ > > > > On 9/1/07, Erick Erickson <[EMAIL PROTECTED]> wrote: > >> > >> I can't answer the question of why the same token > >> takes up memory, but I've indexed far more than > >> 20M

OutOfMemoryError tokenizing a boring text file

2007-08-31 Thread Per Lindberg
I'm creating a tokenized "content" Field from a plain text file using an InputStreamReader and new Field("content", in); The text file is large, 20 MB, and contains zillions of lines, each with the same 100-character token. That causes an OutOfMemoryError. Given that all tokens are the *same*,

SV: Caching IndexSearcher in a webapp [was: Find "latest" document (before a certain date)]

2007-08-29 Thread Per Lindberg
Kalle and Patrick: many thanks for the suggestions! Caching the IndexSearcher in the ServletContext sounds like a very good idea. However, I have to index a number of databases, each with a different Lucene index. So keeping an IndexSearcher for each may come with a prohibitive memory cost. But as
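
One generic way to limit the memory cost of keeping a searcher per index is to keep only the most recently used ones open. The sketch below is an assumption for illustration, not the approach chosen in this thread (the message above is cut off before it says): BoundedSearcherCache, its opener function, and the maxOpen limit are hypothetical names, and the type parameter S stands in for whatever searcher object is cached.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache: at most maxOpen searchers are kept, keyed by index name.
class BoundedSearcherCache<S> {
    private final Function<String, S> opener; // assumed factory that opens a searcher for an index
    private final Map<String, S> cache;

    BoundedSearcherCache(final int maxOpen, Function<String, S> opener) {
        this.opener = opener;
        // accessOrder = true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<String, S>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, S> eldest) {
                return size() > maxOpen; // evict the least recently used entry
            }
        };
    }

    /** Returns the cached searcher for indexName, opening it on a miss. */
    synchronized S get(String indexName) {
        S searcher = cache.get(indexName);
        if (searcher == null) {
            searcher = opener.apply(indexName);
            cache.put(indexName, searcher);
        }
        return searcher;
    }

    synchronized int size() {
        return cache.size();
    }
}
```

In a webapp, a single instance could live in the ServletContext. Note that this sketch does not close evicted searchers; real code would need to release them once no request is still using them.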

Caching IndexSearcher in a webapp [was: Find "latest" document (before a certain date)]

2007-08-29 Thread Per Lindberg
> From: Karl Wettin [mailto:[EMAIL PROTECTED] > On 29 Aug 2007 at 12:29, Per Lindberg wrote: > > >> how about using a RangeQuery and pick the hit with the > >> greatest document number? > > > > Yep, that did the trick! There seems to be no Filter that can

SV: Find "latest" document (before a certain date)

2007-08-29 Thread Per Lindberg
> From: Karl Wettin [mailto:[EMAIL PROTECTED] > On 28 Aug 2007 at 17:48, Per Lindberg wrote: > > > Now, I want to search the content, and return only the > > LATEST found document with each id. To complicate > > things a bit, I want the latest before a given date. In o

Find "latest" document (before a certain date)

2007-08-28 Thread Per Lindberg
trick. The query syntax does not seem to support a question like "for each value of the id field among the found hits, give me the one with the highest date less than x"... Cheers, Per Lindberg
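
The question quoted above ("for each value of the id field among the found hits, give me the one with the highest date less than x") can be expressed as a post-processing step over the hits. The following is a minimal sketch in plain Java, not Lucene API: the Hit class, the latestBefore method, and the yyyyMMdd string date format (chosen so that string order matches date order) are all assumptions for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LatestPerId {
    /** Hypothetical stand-in for a search hit; not a Lucene class. */
    static final class Hit {
        final String id;
        final String date; // assumed yyyyMMdd, so lexicographic order equals date order

        Hit(String id, String date) {
            this.id = id;
            this.date = date;
        }
    }

    /** For each id, keeps the hit with the greatest date strictly before cutoff. */
    static Map<String, Hit> latestBefore(List<Hit> hits, String cutoff) {
        Map<String, Hit> latest = new HashMap<>();
        for (Hit hit : hits) {
            if (hit.date.compareTo(cutoff) >= 0) {
                continue; // on or after the cutoff date: ignore
            }
            Hit best = latest.get(hit.id);
            if (best == null || hit.date.compareTo(best.date) > 0) {
                latest.put(hit.id, hit);
            }
        }
        return latest;
    }
}
```

This makes one pass over the hits and holds one entry per distinct id, so it is only practical when the hit list fits comfortably in memory.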