https://issues.apache.org/bugzilla/show_bug.cgi?id=53816

          Priority: P2
            Bug ID: 53816
          Assignee: dev@poi.apache.org
           Summary: Extracted word count is incorrect
          Severity: normal
    Classification: Unclassified
                OS: Linux
          Reporter: luc...@mikemccandless.com
          Hardware: PC
            Status: NEW
           Version: 3.9-dev
         Component: HPSF
           Product: POI

Created attachment 29316
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=29316&action=edit
Word document showing incorrect PID_WORDCOUNT=11

I have a Word doc (attached) that contains 6 words plus an embedded PDF document
(not sure whether that's relevant).  Word itself correctly reports a word count
of 6, but when I run org.apache.poi.hpsf.extractor.HPSFPropertiesExtractor
the reported word count is 11:

1 = 1252
PID_TITLE = 
PID_SUBJECT = 
PID_AUTHOR = IBMer
PID_KEYWORDS = 
PID_TEMPLATE = Normal.dot
PID_LASTAUTHOR = IBMer
PID_REVNUMBER = 3
PID_APPNAME = Microsoft Office Word
PID_EDITTIME = Sun Dec 31 19:03:00 EST 1600
PID_CREATE_DTM = Tue Jul 17 07:16:00 EDT 2012
PID_LASTSAVE_DTM = Mon Jul 23 07:21:00 EDT 2012
PID_PAGECOUNT = 1
PID_WORDCOUNT = 11
PID_CHARCOUNT = 55
PID_SECURITY = 0
PID_CODEPAGE = 1252
PID_COMPANY = IBM
PID_LINECOUNT = 1
PID_PARCOUNT = 1
17 = 65
23 = 730895
PID_SCALE = false
PID_LINKSDIRTY = false
19 = false
22 = false
PID_DOCPARTS =
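
To make the mismatch easy to check mechanically, here is a minimal sketch that parses a "NAME = VALUE" dump like the one above and compares PID_WORDCOUNT with the count Word displays. It uses only the JDK and does not call POI; the class name PropertyDump and its methods are hypothetical helpers made up for this illustration, and the expected value 6 is simply the number quoted in this report.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Apache POI.
final class PropertyDump {

    /** Parses "NAME = VALUE" lines into an ordered map; lines without '=' are skipped. */
    static Map<String, String> parse(String dump) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String line : dump.split("\\R")) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // not a property line
            }
            String name = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim(); // may be empty, e.g. PID_TITLE
            if (!name.isEmpty()) {
                props.put(name, value);
            }
        }
        return props;
    }

    /** Returns the named property as an int; throws NumberFormatException if absent or non-numeric. */
    static int intProperty(Map<String, String> props, String name) {
        return Integer.parseInt(props.get(name));
    }

    public static void main(String[] args) {
        // A few lines copied from the extractor output in this report.
        String dump = String.join("\n",
                "PID_PAGECOUNT = 1",
                "PID_WORDCOUNT = 11",
                "PID_CHARCOUNT = 55");

        int reported = intProperty(parse(dump), "PID_WORDCOUNT");
        int expected = 6; // the count Word itself shows, per this report

        System.out.println("reported=" + reported
                + " expected=" + expected
                + " match=" + (reported == expected));
        // prints: reported=11 expected=6 match=false
    }
}
```

In a real regression test the dump string would presumably come from running the extractor on the attached document rather than from a hard-coded literal.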
