https://issues.apache.org/bugzilla/show_bug.cgi?id=53951

          Priority: P2
            Bug ID: 53951
          Assignee: dev@poi.apache.org
           Summary: java.io.UnsupportedEncodingException: Codepage number
                    may not be 0
          Severity: normal
    Classification: Unclassified
                OS: other
          Reporter: m...@nearbyfyi.com
          Hardware: Macintosh
            Status: NEW
           Version: unspecified
         Component: HPSF
           Product: POI

Hi,

I'm using Nutch to crawl websites and Tika to parse the documents. I
encountered the following error and thought this would be the place to log it.

2012-09-22 22:30:03,321 ERROR tika.TikaParser - Error parsing http://www.montpelier-vt.org/upload/groups/384/files/meac_11.17.10.doc
java.io.UnsupportedEncodingException: Codepage number may not be 0
    at org.apache.poi.hpsf.VariantSupport.codepageToEncoding(VariantSupport.java:338)
    at org.apache.poi.hpsf.VariantSupport.read(VariantSupport.java:240)
    at org.apache.poi.hpsf.Property.<init>(Property.java:164)
    at org.apache.poi.hpsf.Section.<init>(Section.java:277)
    at org.apache.poi.hpsf.PropertySet.init(PropertySet.java:452)
    at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:247)
    at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:67)
    at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:57)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:182)
    at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:124)
    at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:36)
    at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:23)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)
2012-09-22 22:30:03,322 WARN  parse.ParseUtil - Unable to successfully parse content http://www.montpelier-vt.org/upload/groups/384/files/meac_11.17.10.doc of type application/x-tika-msoffice
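
For triage, here is a minimal, self-contained sketch of the failure mode the trace points to. It assumes the document's property set declares a codepage of 0 and that the codepage-to-encoding lookup rejects that value. Every name below (CodepageSketch, strictEncodingName, lenientEncodingName, decode) is hypothetical and is not POI's API, and the ISO-8859-1 fallback is only one possible lenient behaviour, not a proposed fix.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CodepageSketch {

    // Hypothetical strict lookup: imitates the exception seen in the trace.
    // It is NOT a copy of POI's VariantSupport.codepageToEncoding.
    static String strictEncodingName(int codepage) throws UnsupportedEncodingException {
        if (codepage <= 0) {
            throw new UnsupportedEncodingException("Codepage number may not be " + codepage);
        }
        switch (codepage) {
            case 1200:  return "UTF-16LE";      // Windows codepage 1200
            case 1252:  return "windows-1252";  // Western European
            case 65001: return "UTF-8";
            default:    return "cp" + codepage; // may or may not be known to the JVM
        }
    }

    // Hypothetical lenient lookup: returns a caller-supplied fallback
    // instead of failing when the codepage is rejected.
    static String lenientEncodingName(int codepage, String fallback) {
        try {
            return strictEncodingName(codepage);
        } catch (UnsupportedEncodingException e) {
            return fallback;
        }
    }

    // Decodes raw property bytes, falling back to ISO-8859-1 (assumed default)
    // when the codepage is invalid or the JVM does not support the charset.
    static String decode(byte[] raw, int codepage) {
        String name = lenientEncodingName(codepage, "ISO-8859-1");
        Charset charset = Charset.isSupported(name)
                ? Charset.forName(name)
                : StandardCharsets.ISO_8859_1;
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        byte[] title = {'M', 'E', 'A', 'C'};
        // Codepage 0 no longer aborts the parse in this sketch.
        System.out.println(decode(title, 0));
    }
}
```

With a strict lookup, a single bad codepage value aborts parsing of the whole summary stream, which is what the WARN line above reports. A lenient lookup would let the rest of the document be parsed, at the cost of possibly mis-decoding non-ASCII property strings.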
