Sorry for the double email.

Also, I can provide more detailed data, but it's too big to email.

From: Nate Bellowe [mailto:nath...@windward.net]
Sent: Friday, August 26, 2016 11:06 AM
To: dev@openoffice.apache.org
Cc: David Thielen <da...@windward.net>
Subject: Line Spacing Comparison and Issues with Docx

Hello! Sorry to intrude on your dev mailing list. This may seem a little off 
topic, but I have been banging my head on this forever, and I know you guys are 
the experts! Perhaps we can all get some help out of this!

I have been running into an issue that I know the developers of OpenOffice have 
likely faced before! If the developers who have worked on this would be willing 
to talk about it, I'd really appreciate it!

Basically, I have some questions on how to calculate the line spacing between 
lines, when parsing and rendering a docx file.

My requirement is to exactly match Word, not necessarily the OOXML spec, in the 
spacing between lines in a simple paragraph.

In order to try to do this, I have built a tool to analyze the differences 
between my layout and Word's layout. To do so it does the following:

- First it generates one or more docx files.
- Next it creates PDFs from the docx files, rendering each docx once with Word 
("word.pdf") and once with my program ("me.pdf").
- Then it analyzes the resulting PDFs for differences in layout.

So, my tool would say:

- Create a document "template.docx" with 1000 "a" characters in a single run of 
text with the same properties.
- Make a "word.pdf" and "me.pdf" from this docx
- Calculate info from the PDFs, in particular the line spacing in terms of the 
leading between a line's ascent and the previous line's descent (our 
Ascent + Descent values are identical-ish, so all that differs is the 
whitespace between lines). I often think of it as the line's whitespace...
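
As a rough sketch of that last measurement, the whitespace between two lines 
can be derived from their baseline positions once ascent and descent are 
known. The function name, the top-down coordinate convention, and the sample 
numbers below are assumptions for illustration, not the actual tool.

```python
def leading_between_lines(baselines, ascent, descent):
    """Return the whitespace between each pair of consecutive lines.

    baselines: baseline y positions in points, measured downward from the
               top of the page, one per line, in reading order (assumed).
    ascent, descent: positive distances in points above/below the baseline.
    """
    gaps = []
    for prev, cur in zip(baselines, baselines[1:]):
        prev_bottom = prev + descent  # bottom of the previous line's descent
        cur_top = cur - ascent        # top of the current line's ascent
        gaps.append(cur_top - prev_bottom)
    return gaps


# Hypothetical baselines as they might be extracted from the two PDFs.
word_gaps = leading_between_lines([100.0, 114.0, 128.0], 9.0, 3.0)
my_gaps = leading_between_lines([100.0, 113.5, 127.0], 9.0, 3.0)
print(word_gaps, my_gaps)
```

Comparing the two lists line by line gives the per-line difference in leading 
between the two renderers.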

This tool showed me that the leading varies greatly from font to font.

To explore this, I used the tool to make thousands of these comparisons, in 
particular generating documents for:

- Each font on the system
- Text made of "a", "y", and a mix of letters and spaces
- Different font sizes
- Different line spacing types (Single, One and a half, and Double)
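
The combinations above amount to a Cartesian product of the four dimensions. 
A minimal sketch of building that test matrix follows; the sample texts, the 
spacing labels, and the `build_cases` helper are hypothetical stand-ins, not 
the real tool's inputs.

```python
from itertools import product

# Assumed labels for the three line spacing types named above.
LINE_SPACINGS = ["single", "one_and_a_half", "double"]

# Assumed sample texts: all "a", all "y", and a mix of letters and spaces.
SAMPLE_TEXTS = ["a" * 1000, "y" * 1000, "The quick brown fox " * 50]


def build_cases(fonts, sizes, texts=SAMPLE_TEXTS, spacings=LINE_SPACINGS):
    """Return one test case per font/size/text/spacing combination."""
    return [
        {"font": font, "size": size, "text": text, "spacing": spacing}
        for font, size, text, spacing in product(fonts, sizes, texts, spacings)
    ]


cases = build_cases(["Arial", "Calibri"], [10, 12])
print(len(cases))
```

Each resulting case would then be turned into one docx and rendered twice.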

I was hoping to find groupings, such as "this type of font has 1.3 times my 
calculation of leading".

I was able to conclude far less than I had hoped, and was wondering if you 
could help me further with the issue of calculating line spacing. I'm providing 
you with a file that is best downloaded and opened using the filters in the 
header row. Note that it's not totally complete. There are missing entries, but 
I doubt they will be a problem for anyone. I'm going to regenerate it soon, 
but it's pretty slow, so I'm finishing up some changes to it first.

Here is a comparison of our software's layout versus Word's layout for 
every font installed on my system, etc. (attached and linked)
https://drive.google.com/file/d/0BzQpUdPjnJUUclRXVXFkaEh3Mms/view?usp=sharing

I'm not positive, but I believe the issue could be one of the following:

- Word is using a different process than we are to calculate the "leading" of a 
font. We don't parse the font files ourselves and instead rely on libraries to 
get font sizing information. Perhaps in the "world of font files" I am missing 
something, and Word is parsing the fonts directly and differently.
- Word has some sort of lookup table that handles groups of fonts, or an 
algorithm, that scales a font's leading up or down based on some criteria I am 
unaware of.
- Word is using an additional criterion besides leading, ascent, and descent to 
determine line spacing.
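
One concrete way the first possibility can arise: TrueType/OpenType fonts carry 
more than one set of vertical metrics (the hhea table's ascender, descender, 
and lineGap; the OS/2 table's sTypoAscender, sTypoDescender, and sTypoLineGap; 
and the OS/2 table's usWinAscent and usWinDescent), and these can disagree. The 
sketch below only shows how the candidate line heights differ. Which set Word 
actually uses is not established here, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class VerticalMetrics:
    """Vertical metrics in font units, as found in the hhea and OS/2 tables."""
    units_per_em: int
    hhea_ascender: int
    hhea_descender: int  # usually negative
    hhea_line_gap: int
    typo_ascender: int
    typo_descender: int  # usually negative
    typo_line_gap: int
    win_ascent: int
    win_descent: int     # stored as a positive value


def candidate_line_heights(m, font_size):
    """Return a single-spaced line height in points for each metric set."""
    def to_points(units):
        return font_size * units / m.units_per_em

    return {
        "hhea": to_points(m.hhea_ascender - m.hhea_descender + m.hhea_line_gap),
        "typo": to_points(m.typo_ascender - m.typo_descender + m.typo_line_gap),
        "win": to_points(m.win_ascent + m.win_descent),
    }


# Illustrative values only; not taken from any real font.
example = VerticalMetrics(1000, 900, -200, 100, 750, -250, 150, 1000, 300)
print(candidate_line_heights(example, 10))
```

If a font library reports one set and the renderer being matched uses another, 
the leading would differ from font to font with no obvious pattern.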

Please feel free to email me at nath...@windward.net

Thank you so much for your time!!

I know you guys aren't trying to emulate Word (I very much enjoy Writer), but 
I am sure you've had complaints from people who open documents made in Word 
that a friend sent them and see major formatting differences, such as 
different pagination.

Microsoft is famous for having been both open and closed about this spec, 
which allowed its widespread adoption without proper competition, and I think 
that openness between communities trying to work with it is very helpful. 
We'd be happy to exchange information about any quirks and deviations from 
the spec that we find.
