Re: Size of Document

2018-07-05 Thread Chris Hostetter
: Subject: Size of Document
: To: java-user@lucene.apache.org

https://people.apache.org/~hossman/#threadhijack

Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to an existing

Re: Size of Document

2018-07-05 Thread Adrien Grand
For the record, this is made even more complex by the fact that the disk footprint of a document depends on other documents that are indexed nearby in the same segment, and can change over merges.

On Thu, 5 Jul 2018 at 08:22, Chris Bamford wrote:
> Yes I see, I originally missed Terry’s resp

Re: Size of Document

2018-07-04 Thread Chris Bamford
Yes I see, I originally missed Terry’s response which is probably the source of the confusion. So to clarify: I already know the size of the source document. As you say, this bears little resemblance to what actually gets written when indexed. It is this latter figure I was hoping to get. Than

Re: Size of Document

2018-07-04 Thread Erick Erickson
I think we're not talking about the same thing. You asked "How can I calculate the total size of a Lucene Document"... I was responding to Terry's comment "In the document types I usually index (.pdf, .docx/.doc, .eml), there exists a metadata field called "stream_size" that contains the size

Re: Size of Document

2018-07-04 Thread Chris Bamford
Hi Erick,

Yes, size on disk is what I’m after as it will feed into an eventual calculation regarding actual bytes written (not interested in the source data document size, just real disk usage).

Thanks
Chris

> On 4 Jul 2018, at 17:08, Erick Erickson wrote:
>
> But does s
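One rough way to get at the "real disk usage" figure asked about here is to measure the index directory as a whole rather than any single document. The sketch below is an assumption, not something proposed in this thread: it sums the sizes of the regular files directly inside a directory using only java.nio.file, and the names IndexDirSize and sizeOnDisk are hypothetical. Comparing the total before and after a commit gives an aggregate figure only. It cannot attribute bytes to one document, and the total may shrink again after merges.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper: total bytes of the regular files directly inside a directory. */
public class IndexDirSize {

    /** Sums file sizes in dir (non-recursive; assumes a flat index directory). */
    public static long sizeOnDisk(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(IndexDirSize::sizeOf)
                        .sum();
        }
    }

    // Files.size throws a checked exception, so wrap it for use inside the stream.
    // A file deleted between listing and sizing (e.g. by a merge) surfaces here.
    private static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo on a temporary directory standing in for an index directory.
        Path dir = Files.createTempDirectory("index-size-demo");
        long before = sizeOnDisk(dir);
        Files.write(dir.resolve("segment.bin"), new byte[1024]);
        long after = sizeOnDisk(dir);
        System.out.println("bytes added: " + (after - before));
    }
}
```

In practice the two readings would be taken around a batch of documents followed by a commit, since buffered documents may not have reached disk yet.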

Re: Size of Document

2018-07-04 Thread Erick Erickson
But does size on disk help? If the doc has a zillion images in it, those aren't part of the resulting index (I'm excluding stored data here).

On Wed, Jul 4, 2018 at 7:49 AM, Terry Steichen wrote:
> In the document types I usually index (.pdf, .docx/.doc, .eml), there
> exists a metadata field

Re: Size of Document

2018-07-04 Thread Terry Steichen
In the document types I usually index (.pdf, .docx/.doc, .eml), there exists a metadata field called "stream_size" that contains the size of the document on disk.  You don't have to compute it.  Thus, when you retrieve each document you can pull out the contents of this field and, if you like, incl

Re: Size of Document

2018-07-04 Thread Adrien Grand
It was called IndexWriter.ramSizeInBytes() in 4.10.3.

On Wed, 4 Jul 2018 at 15:35, Chris Bamford wrote:
> > IndexWriter.ramBytesUsed() gives you access to the current memory usage of
> > IndexWriter's buffers, but it can't tell you by how much it increased for a
> > given document assumi

Re: Size of Document

2018-07-04 Thread Chris Bamford
> IndexWriter.ramBytesUsed() gives you access to the current memory usage of
> IndexWriter's buffers, but it can't tell you by how much it increased for a
> given document assuming concurrent access to the IndexWriter.

Thanks, although I can’t find that API. Is there an equivalent call for Lucen

Re: Size of Document

2018-07-04 Thread Adrien Grand
IndexWriter.ramBytesUsed() gives you access to the current memory usage of IndexWriter's buffers, but it can't tell you by how much it increased for a given document assuming concurrent access to the IndexWriter.

On Wed, 4 Jul 2018 at 15:13, Chris Bamford wrote:
> Hello Adrien, > > > > > > T
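The caveat about concurrent access suggests that a before/after reading only means something if nothing else touches the writer in between. Below is a minimal sketch of that idea, assuming every indexing call is routed through a single lock. RamDelta and measure are hypothetical names, not Lucene API, and a LongSupplier stands in for the usage reading so the example runs without Lucene on the classpath. With a real writer the supplier would presumably be `writer::ramBytesUsed`, and the action would wrap the call that adds the document.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper (not part of Lucene): change in a usage reading across one action. */
public class RamDelta {
    private final Object lock = new Object();
    private final LongSupplier usage;

    public RamDelta(LongSupplier usage) {
        this.usage = usage;
    }

    /** Runs the action while holding the lock and returns the change in reported usage. */
    public long measure(Runnable action) {
        synchronized (lock) {
            long before = usage.getAsLong();
            action.run();
            return usage.getAsLong() - before;
        }
    }

    public static void main(String[] args) {
        // Stand-in counter playing the role of the writer's buffer usage (assumption).
        long[] buffer = {0};
        RamDelta meter = new RamDelta(() -> buffer[0]);
        long delta = meter.measure(() -> buffer[0] += 256);
        System.out.println("delta = " + delta);
    }
}
```

Serializing writes this way gives up concurrent indexing, and the number is a change in buffer memory, not bytes on disk. The delta may also come out negative if the buffers are flushed during the action.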

Re: Size of Document

2018-07-04 Thread Chris Bamford
Hello Adrien,

> There is no way to compute the byte size of a document.

I feared that!

> Also note that the relationship between the size of a document and how much
> space it will use in the Lucene index is quite complex.

I understand. I was wondering if there was maybe some sneaky way

Re: Size of Document

2018-07-04 Thread Adrien Grand
Hello,

There is no way to compute the byte size of a document. Also note that the relationship between the size of a document and how much space it will use in the Lucene index is quite complex.

On Wed, 4 Jul 2018 at 11:26, Chris and Helen Bamford wrote:
> Hi there,
>
> How can I calculate

Size of Document

2018-07-04 Thread Chris and Helen Bamford
Hi there,

How can I calculate the total size of a Lucene Document that I'm about to write to an index so I know how many bytes I am writing please? I need it for some external metrics collection.

Thanks

- Chris