I'm creating a tokenized "content" Field from a plain text file using an InputStreamReader and new Field("content", in);
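Roughly what that looks like (writer setup and file/index paths are simplified placeholders for this sketch):

    import java.io.FileInputStream;
    import java.io.InputStreamReader;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class IndexBigFile {
        public static void main(String[] args) throws Exception {
            // "index" and "big.txt" are just placeholder paths for this sketch
            IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), true);

            InputStreamReader in = new InputStreamReader(new FileInputStream("big.txt"));

            Document doc = new Document();
            // Field(String, Reader) makes a tokenized, unstored field whose
            // contents are streamed from the Reader at indexing time
            doc.add(new Field("content", in));

            writer.addDocument(doc);   // this is where the OutOfMemoryError hits
            writer.close();
        }
    }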
The text file is large, 20 MB, and contains zillions of lines, each with the same 100-character token. That causes an OutOfMemoryError. Given that all the tokens are the *same*, why should this cause an OutOfMemoryError? Shouldn't StandardAnalyzer just chug along and note "ho hum, this token is the same"? That shouldn't take too much memory. Or have I missed something?