[ 
https://issues.apache.org/jira/browse/TIKA-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison closed TIKA-2589.
-----------------------------
    Resolution: Not A Problem

Thank you for opening this issue.

MSWord calculates page counts dynamically and, IMHO, rarely stores the actual 
page count for a document. Instead, it typically stores "1", which is 
incorrect. If you append .zip to your file's name, unzip it, and look in 
docProps/app.xml, you'll see:

{noformat}
<Pages>1</Pages><Words>127171</Words><Characters>724878</Characters>
{noformat}

It is beyond the scope of Tika to calculate page counts dynamically, so we 
rely on whatever MSWord stored in the document.
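
The manual check above (rename, unzip, open docProps/app.xml) can also be 
scripted. What follows is only a minimal sketch using the JDK's zip support, 
not Tika's own code path: the class name StoredPageCount and its two helper 
methods are hypothetical, and it assumes the <Pages> element appears without a 
namespace prefix, as in the excerpt above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: prints the page count that Word stored in the file.
class StoredPageCount {

    // Assumes an unprefixed <Pages> element, as in the excerpt above.
    private static final Pattern PAGES =
            Pattern.compile("<Pages>\\s*(\\d+)\\s*</Pages>");

    /** Returns the stored page count, or -1 if no <Pages> element is found. */
    static int parsePages(String appXml) {
        Matcher m = PAGES.matcher(appXml);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Reads docProps/app.xml directly from the .docx, which is a zip archive. */
    static int readStoredPages(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("docProps/app.xml");
            if (entry == null) {
                return -1; // no extended-properties part in this file
            }
            try (InputStream in = zip.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return parsePages(new String(out.toByteArray(), StandardCharsets.UTF_8));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StoredPageCount path_to_file");
            return;
        }
        System.out.println("Stored <Pages>: " + readStoredPages(args[0]));
    }
}
```

The value printed is whatever Word last wrote into the file, so a result of 1 
for a long document points at the stored metadata rather than at Tika.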

> Wrong page count detection (docx from dotm template)
> ----------------------------------------------------
>
>                 Key: TIKA-2589
>                 URL: https://issues.apache.org/jira/browse/TIKA-2589
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata
>    Affects Versions: 1.17
>         Environment: $ java -version
> java version "1.8.0_161"
> Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
> Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
> OS Version: 6.1.7601 Service Pack 1 Build 7601
>            Reporter: Leonid Korsakov
>            Priority: Major
>         Attachments: 262 страницы.docx
>
>
> I have a docx file created from a dotm template. When I call 
> {code:java}
> java -jar tika-app.jar -m path_to_file
> {code}
> I see xmpTPg:NPages: 1, but the docx file contains 262 pages.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
