I can't answer why the same token should keep eating memory,
but I've indexed far more than 20 MB of data in a single
document field. As in, on the order of 150 MB. Of course,
I allocated 1 GB or so to the JVM, so you might try that....
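
For reference, here's a minimal sketch of that kind of setup against
the Lucene 2.x-era API (the index path, input file name, and class
name are just placeholders). The two points are the Reader-valued
Field and giving the JVM a bigger heap, e.g. running with java -Xmx1g:

    import java.io.FileInputStream;
    import java.io.InputStreamReader;
    import java.io.Reader;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class IndexBigFile {
        public static void main(String[] args) throws Exception {
            // Create a fresh index in a placeholder directory.
            IndexWriter writer =
                new IndexWriter("/tmp/bigindex", new StandardAnalyzer(), true);

            // A Reader-valued Field is tokenized and streamed by the
            // analyzer; its contents are not stored in the index.
            Reader in = new InputStreamReader(new FileInputStream("big.txt"));
            Document doc = new Document();
            doc.add(new Field("content", in));

            writer.addDocument(doc);
            writer.close();
        }
    }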

Best
Erick

On 8/31/07, Per Lindberg <[EMAIL PROTECTED]> wrote:
>
> I'm creating a tokenized "content" Field from a plain text file
> using an InputStreamReader and new Field("content", in);
>
> The text file is large, 20 MB, and contains zillions of lines,
> each with the same 100-character token.
>
> That causes an OutOfMemoryError.
>
> Given that all tokens are the *same*,
> why should this cause an OutOfMemoryError?
> Shouldn't StandardAnalyzer just chug along
> and note "ho hum, this token is the same"?
> That shouldn't take too much memory.
>
> Or have I missed something?
>
>
>
>
