[jira] [Updated] (TIKA-762) EXIF extraction from PNG images

2011-10-26 Thread Nick Burch (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch updated TIKA-762: Attachment: training.png The attached file "training.png" is an example PNG which contains EXIF metadata.

[jira] [Created] (TIKA-762) EXIF extraction from PNG images

2011-10-26 Thread Nick Burch (Created) (JIRA)
EXIF extraction from PNG images --- Key: TIKA-762 URL: https://issues.apache.org/jira/browse/TIKA-762 Project: Tika Issue Type: New Feature Components: parser Affects Versions: 1.0 Report

Build failed in Jenkins: Tika-trunk #692

2011-10-26 Thread Apache Jenkins Server
See Changes: [jukka] Summarize changelog entries by feature rather than by issue [jukka] TIKA-565: Improved OSGi bundling Don't use the context class loader of the current thread as the default. This helps prevent nondeterministic results i

Re: Updating CHANGES.txt?

2011-10-26 Thread Jukka Zitting
Hi, On Thu, Oct 20, 2011 at 2:26 PM, Michael McCandless wrote: > But I think API changes, issues a user has hit, new features, changes > in behavior, we really should include.  Generally, when I'm unsure, I > try to err on the side of "being verbose". See revision 1189334 for how we can improve

Re: Google's Compact Language Detector

2011-10-26 Thread reinhard schwab
I have also compared Tika performance with the Nutch language detector in version 1.0. It seems that Nutch is far better in performance than Tika (5 to 6 times faster than Tika). But my use case is quite special (short texts of ~140 characters in length) and I don't have time to investigate, so I have no

[jira] [Commented] (TIKA-736) OpenOffice parser: master footer text isn't extracted

2011-10-26 Thread Uwe Schindler (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136090#comment-13136090 ] Uwe Schindler commented on TIKA-736: {quote} Oh, this is because XHTMLContentHandler, on

Re: Tika is waiting for ODFToolkit to improve ODF file format processing

2011-10-26 Thread Michael McCandless
On Tue, Oct 25, 2011 at 5:40 PM, Rob Weir wrote: > Is there a list of the complete set of tags you use, or a schema or something? Hmm, I think technically any tags that are valid XHTML are fair game, but in practice the parsers seem to use a very limited set of tags (table/td/tr, a, img, p, br,

[jira] [Commented] (TIKA-736) OpenOffice parser: master footer text isn't extracted

2011-10-26 Thread Michael McCandless (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136070#comment-13136070 ] Michael McCandless commented on TIKA-736: - bq. Can you also check that parsing style

[jira] [Commented] (TIKA-582) Lithuanian language identification

2011-10-26 Thread Michael McCandless (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136018#comment-13136018 ] Michael McCandless commented on TIKA-582: - bq. Also, when I re-run my test (from the

[jira] [Resolved] (TIKA-582) Lithuanian language identification

2011-10-26 Thread Michael McCandless (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved TIKA-582. - Resolution: Fixed Fix Version/s: (was: 0.9) 1.0 Thanks Žygima

[jira] [Commented] (TIKA-736) OpenOffice parser: master footer text isn't extracted

2011-10-26 Thread Uwe Schindler (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135996#comment-13135996 ] Uwe Schindler commented on TIKA-736: Hi Michael, thanks for this simple improvement. Ca

[jira] [Commented] (TIKA-582) Lithuanian language identification

2011-10-26 Thread Michael McCandless (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135914#comment-13135914 ] Michael McCandless commented on TIKA-582: - Thanks Žygimantas! When testing Tika's l

[jira] [Reopened] (TIKA-582) Lithuanian language identification

2011-10-26 Thread Michael McCandless (Reopened) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reopened TIKA-582: - Assignee: Michael McCandless (was: Jukka Zitting) Reopen to switch to fixed ngp.

[jira] [Commented] (TIKA-761) Provide version number by CLI argument -V

2011-10-26 Thread Ingo Renner (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13135885#comment-13135885 ] Ingo Renner commented on TIKA-761: -- hmm, can't get it to work for me, stream is null. What

[jira] [Updated] (TIKA-582) Lithuanian language identification

2011-10-26 Thread Žygimantas Medelis (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Žygimantas Medelis updated TIKA-582: Attachment: lt.ngp The previous file had wrong ngrams; they included quote symbols. Please upd

[jira] [Updated] (TIKA-736) OpenOffice parser: master footer text isn't extracted

2011-10-26 Thread Michael McCandless (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated TIKA-736: Attachment: TIKA-736.patch This turned out to be fairly simple to fix, so I worked out a patc

[jira] [Assigned] (TIKA-736) OpenOffice parser: master footer text isn't extracted

2011-10-26 Thread Michael McCandless (Assigned) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned TIKA-736: --- Assignee: Michael McCandless > OpenOffice parser: master footer text isn't extracte

[jira] [Resolved] (TIKA-746) Support custom mime types

2011-10-26 Thread Jukka Zitting (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-746. Resolution: Fixed Fix Version/s: (was: 1.1) 1.0 There was a backwards c