Bit of thread necromancy here, but I figured it was relevant because
we're hitting exactly the same error.

On Thu, Jan 19, 2012 at 12:47 AM, Michael McCandless
<luc...@mikemccandless.com> wrote:
> Hmm, are you certain your RAM buffer is 3 MB?
>
> Is it possible you are indexing an absurdly enormous document...?

We're seeing a case here where the document certainly could qualify as
"absurdly enormous". The doc itself is 2GB in size, and the
tokenisation is per-character rather than per-word, so every character
becomes a token: on the order of two billion tokens for the one
document. Probably enough to fill 2GB of in-memory indexing state on
its own...
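
For context, the tokenisation is roughly this shape (a minimal sketch
against the 3.x analysis API, not our actual code, and the class name
is made up):

    import java.io.IOException;
    import java.io.Reader;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    // Sketch of per-character tokenisation: every character read from
    // the input becomes its own single-character token, so a 2GB
    // document emits roughly two billion tokens.
    public final class PerCharTokenizer extends Tokenizer {
        private final CharTermAttribute termAtt =
                addAttribute(CharTermAttribute.class);

        public PerCharTokenizer(Reader input) {
            super(input);
        }

        @Override
        public boolean incrementToken() throws IOException {
            clearAttributes();
            int c = input.read();   // 'input' is Tokenizer's Reader field
            if (c == -1) {
                return false;       // end of stream
            }
            termAtt.setEmpty().append((char) c);
            return true;
        }
    }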

So I'm wondering if there is more info somewhere on why this is (or
was? We're still using 3.6.x) a limit, and whether it can be detected
up-front. We could save a large amount of indexing time (~30 minutes)
if we could tell ahead of time that the document is going to fail.
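
In the meantime, the workaround we're considering is a crude pre-flight
check on the raw document size before addDocument() (a sketch; the 1GB
threshold is a guess on our part, not a documented limit):

    import java.io.File;
    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;

    // Hypothetical guard: skip documents whose raw size alone suggests
    // the in-memory indexing structures will blow past 2GB. The
    // threshold is a guess, tuned to roughly where ours falls over.
    public final class SizeGuard {
        private static final long MAX_RAW_BYTES = 1L << 30; // ~1GB headroom

        public static boolean addIfIndexable(IndexWriter writer, File source,
                Document doc) throws IOException {
            if (source.length() > MAX_RAW_BYTES) {
                System.err.println("Skipping " + source + " ("
                        + source.length() + " bytes): would likely exceed"
                        + " the in-memory indexing limit");
                return false;
            }
            writer.addDocument(doc);
            return true;
        }
    }

That would save the half hour, but it's a blunt instrument, hence the
question about whether the limit can be detected properly.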

TX
