: Subject: Size of Document
: To: java-user@lucene.apache.org
: References:
:
:
: Message-ID:
: In-Reply-To:
:
https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing
For the record, this is made even more complex by the fact that the disk
footprint of a document depends on other documents that are indexed nearby
in the same segment, and can change over merges.
On Thu, Jul 5, 2018 at 08:22, Chris Bamford wrote:
> Yes I see, I originally missed Terry’s response which is probably the source of
> the confusion.
Yes I see, I originally missed Terry’s response which is probably the source of
the confusion.
So to clarify: I already know the size of the source document. As you say, this
bears little resemblance to what actually gets written when indexed. It is this
latter figure I was hoping to get.
Thanks
I think we're not talking about the same thing.
You asked "How can I calculate the total size of a Lucene Document"...
I was responding to Terry's comment "In the document types I
usually index (.pdf, .docx/.doc, .eml), there exists a metadata field
called "stream_size" that contains the size
Hi Erick
Yes, size on disk is what I’m after as it will feed into an eventual
calculation regarding actual bytes written (not interested in the source data
document size, just real disk usage).
Thanks
Chris
> On 4 Jul 2018, at 17:08, Erick Erickson wrote:
>
> But does size on disk help?
But does size on disk help? If the doc has a zillion
images in it, those aren't part of the resulting index
(I'm excluding stored data here)
On Wed, Jul 4, 2018 at 7:49 AM, Terry Steichen wrote:
> In the document types I usually index (.pdf, .docx/.doc, .eml), there
> exists a metadata field called "stream_size" that contains the size of
> the document on disk.
In the document types I usually index (.pdf, .docx/.doc, .eml), there
exists a metadata field called "stream_size" that contains the size of
the document on disk. You don't have to compute it. Thus, when you
retrieve each document you can pull out the contents of this field and,
if you like, incl
It was called IndexWriter.ramSizeInBytes() in 4.10.3.
On Wed, Jul 4, 2018 at 15:35, Chris Bamford wrote:
> > IndexWriter.ramBytesUsed() gives you access to the current memory usage of
> > IndexWriter's buffers, but it can't tell you by how much it increased for a
> > given document assuming concurrent access to the IndexWriter.
> IndexWriter.ramBytesUsed() gives you access to the current memory usage of
> IndexWriter's buffers, but it can't tell you by how much it increased for a
> given document assuming concurrent access to the IndexWriter.
>
Thanks, although I can’t find that API. Is there an equivalent call for Lucen
IndexWriter.ramBytesUsed() gives you access to the current memory usage of
IndexWriter's buffers, but it can't tell you by how much it increased for a
given document assuming concurrent access to the IndexWriter.
On Wed, Jul 4, 2018 at 15:13, Chris Bamford wrote:
> Hello Adrien,
>
> > There is no way to compute the byte size of a document.
Hello Adrien,
>
> There is no way to compute the byte size of a document.
I feared that!
> Also note that the
> relationship between the size of a document and how much space it will use
> in the Lucene index is quite complex.
>
I understand. I was wondering if there was maybe some sneaky way
Hello,
There is no way to compute the byte size of a document. Also note that the
relationship between the size of a document and how much space it will use
in the Lucene index is quite complex.
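On Wed, Jul 4, 2018 at 11:26, Chris and Helen Bamford wrote:
> Hi there,
>
> How can I calculate the total size of a Lucene Document that I'm about
> to write to an index so I know how many bytes I am writing please?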
Hi there,
How can I calculate the total size of a Lucene Document that I'm about
to write to an index so I know how many bytes I am writing please? I
need it for some external metrics collection.
Thanks
- Chris
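
Since the replies in this thread say there is no per-document byte size, one rough fallback for metrics is to measure how much the whole index directory grows across a batch of writes. The following is only a minimal sketch under that assumption: the class name IndexDirSize and the method directorySizeInBytes are hypothetical, it uses plain java.nio rather than any Lucene API, and the difference it reports can shrink or even go negative after segment merges.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of Lucene: sums file sizes under a directory.
class IndexDirSize {

    /** Returns the total size in bytes of all regular files under dir, recursively. */
    static long directorySizeInBytes(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                // A file may be deleted (e.g. by a merge) between
                                // listing and sizing; count it as zero bytes.
                                return 0L;
                            }
                        })
                        .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument is the path of the index directory.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        long before = directorySizeInBytes(indexDir);
        // ... add documents and commit with your own indexing code here ...
        long after = directorySizeInBytes(indexDir);
        System.out.println("Index directory changed by " + (after - before) + " bytes");
    }
}
```

Taking the measurement only after a commit, and over a batch rather than a single document, should give a steadier figure, because the on-disk cost of one document depends on its neighbours in the segment.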