https://issues.apache.org/bugzilla/show_bug.cgi?id=53816
Priority: P2
Bug ID: 53816
Assignee: dev@poi.apache.org
Summary: Extracted word count is incorrect
Severity: normal
Classification: Unclassified
OS: Linux
Reporter: luc...@mikemccandless.com
Hardware: PC
Status: NEW
Version: 3.9-dev
Component: HPSF
Product: POI

Created attachment 29316
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=29316&action=edit
Word document showing incorrect PID_WORDCOUNT=11

I have a Word doc (attached) that has 6 words, plus an embedded PDF document (not sure that's relevant).

When I view the word count in Word, it correctly says 6. But when I run org.apache.poi.hpsf.extractor.HPSFPropertiesExtractor, the word count incorrectly says 11:

1 = 1252
PID_TITLE =
PID_SUBJECT =
PID_AUTHOR = IBMer
PID_KEYWORDS =
PID_TEMPLATE = Normal.dot
PID_LASTAUTHOR = IBMer
PID_REVNUMBER = 3
PID_APPNAME = Microsoft Office Word
PID_EDITTIME = Sun Dec 31 19:03:00 EST 1600
PID_CREATE_DTM = Tue Jul 17 07:16:00 EDT 2012
PID_LASTSAVE_DTM = Mon Jul 23 07:21:00 EDT 2012
PID_PAGECOUNT = 1
PID_WORDCOUNT = 11
PID_CHARCOUNT = 55
PID_SECURITY = 0
PID_CODEPAGE = 1252
PID_COMPANY = IBM
PID_LINECOUNT = 1
PID_PARCOUNT = 1
17 = 65
23 = 730895
PID_SCALE = false
PID_LINKSDIRTY = false
19 = false
22 = false
PID_DOCPARTS =
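
For anyone trying to reproduce this, the mismatch can be checked mechanically from the extractor's text output. The following is only a rough sketch using the plain JDK, not POI code: the class name PropertyDumpCheck and its methods are made up for illustration, and it assumes the "NAME = value" line format shown in the dump in this report. The expected count of 6 is the figure Word displays, as stated in the report.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of POI): parses a "NAME = value" property dump
// like the one in this report and compares PID_WORDCOUNT to an expected value.
final class PropertyDumpCheck {

    // Splits the dump into lines and stores each "NAME = value" pair.
    // Lines without '=' are skipped; an empty value becomes "".
    static Map<String, String> parse(String dump) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String line : dump.split("\\R")) {
            int sep = line.indexOf('=');
            if (sep < 0) {
                continue;
            }
            String name = line.substring(0, sep).trim();
            String value = line.substring(sep + 1).trim();
            props.put(name, value);
        }
        return props;
    }

    // Returns the numeric value recorded under PID_WORDCOUNT.
    static int reportedWordCount(Map<String, String> props) {
        return Integer.parseInt(props.get("PID_WORDCOUNT"));
    }

    public static void main(String[] args) {
        // A few lines copied from the dump in this report.
        String dump = String.join("\n",
                "PID_PAGECOUNT = 1",
                "PID_WORDCOUNT = 11",
                "PID_CHARCOUNT = 55");
        int expected = 6; // word count shown by Word, per the report
        int reported = reportedWordCount(parse(dump));
        System.out.println("reported=" + reported
                + " expected=" + expected
                + " mismatch=" + (reported != expected));
    }
}
```

In a real test, the dump string would come from the extractor's output for the attached document rather than from hard-coded lines.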