[jira] [Commented] (TIKA-1706) Bring back commons-io to tika-core

2015-08-27 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716318#comment-14716318 ] Jukka Zitting commented on TIKA-1706: - Note that o.a.tika.io is a part of the public AP

[jira] [Resolved] (TIKA-1722) Tika methods that accept a File needlessly convert it to a URL

2015-08-27 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1722. - Resolution: Fixed Assignee: Jukka Zitting Thanks! Committed in revision 1698100. My original

[jira] [Resolved] (TIKA-1721) Replace IOExceptionWithCause in ForkClient

2015-08-27 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1721. - Resolution: Fixed Assignee: Jukka Zitting Thanks! Committed in 1698101. PS. Note that we bre
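The issue title refers to dropping a helper exception class in favour of plain JDK cause chaining. A minimal sketch of that idea, assuming only the standard `java.io.IOException(String, Throwable)` constructor (available since Java 6). `CauseChaining` and `wrap` are hypothetical names, not the actual ForkClient change.

```java
import java.io.IOException;

final class CauseChaining {
    /**
     * Wraps a cause in a plain IOException. The two-argument constructor
     * keeps both the message and the original cause, so no dedicated
     * IOException subclass is needed just to carry a cause.
     */
    static IOException wrap(String message, Throwable cause) {
        return new IOException(message, cause);
    }
}
```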

[jira] [Resolved] (TIKA-1720) Collect multiple exceptions in TemporaryResources.close() using Throwable.addSuppressed()

2015-08-27 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1720. - Resolution: Fixed Assignee: Jukka Zitting Thanks! Committed in revision 1698150. > Collect m
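The issue title names the technique: close every resource and collect the failures with `Throwable.addSuppressed()`. A minimal sketch under stated assumptions, not Tika's actual `TemporaryResources.close()`:

- The resources are a plain list of `Closeable`.
- They are closed in reverse registration order.
- The first `IOException` becomes the primary one.
- `MultiCloser` and `closeAll` are hypothetical names.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class MultiCloser {
    /**
     * Closes all resources, last registered first. Every resource gets a
     * close() attempt even if an earlier one failed. The first IOException
     * is rethrown at the end; any later ones are attached to it as
     * suppressed exceptions. Only IOExceptions are collected here.
     */
    static void closeAll(List<? extends Closeable> resources) throws IOException {
        IOException primary = null;
        for (int i = resources.size() - 1; i >= 0; i--) {
            try {
                resources.get(i).close();
            } catch (IOException e) {
                if (primary == null) {
                    primary = e;
                } else {
                    primary.addSuppressed(e);
                }
            }
        }
        if (primary != null) {
            throw primary;
        }
    }
}
```

With this design, the caller sees a single exception, and none of the individual close failures are lost.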

[jira] [Resolved] (TIKA-1719) Utilize try-with-resources where it is trivial

2015-08-30 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1719. - Resolution: Fixed Assignee: Jukka Zitting Committed in revision 1700195. Thanks a lot for thi
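For readers unfamiliar with the construct named in the title, here is a small illustration of a "trivial" try-with-resources case: a single stream that must be closed whether or not reading succeeds. `HeaderReader` and `readHeader` are hypothetical and not taken from the Tika commit.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class HeaderReader {
    /**
     * Reads up to maxBytes from the start of the file. The stream declared
     * in the try header is closed automatically, even if read() throws.
     */
    static byte[] readHeader(Path path, int maxBytes) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            byte[] buffer = new byte[maxBytes];
            int total = 0;
            while (total < maxBytes) {
                int n = in.read(buffer, total, maxBytes - total);
                if (n == -1) {
                    break; // end of file reached before maxBytes
                }
                total += n;
            }
            return Arrays.copyOf(buffer, total);
        }
    }
}
```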

[jira] [Commented] (TIKA-1672) Integrate tika-java7 component

2015-08-30 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722701#comment-14722701 ] Jukka Zitting commented on TIKA-1672: - I'm actually not sure if we should do this. The

[jira] [Commented] (TIKA-1726) Augment public methods that use a java.io.File with methods that use a java.nio.file.Path

2015-09-01 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725703#comment-14725703 ] Jukka Zitting commented on TIKA-1726: - Could createTemporaryFile() be accompanied with
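One common way to "augment" a `java.io.File` method with a `java.nio.file.Path` variant is to make the `Path` overload the primary implementation and have the `File` overload delegate through `File.toPath()`. The sketch below assumes exactly that shape. `FileOrPath` and `sizeOf` are hypothetical names, not Tika API.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class FileOrPath {
    /** Path-based variant: holds the actual logic. */
    static long sizeOf(Path path) throws IOException {
        return Files.size(path);
    }

    /** File-based variant, kept for existing callers; simply delegates. */
    static long sizeOf(File file) throws IOException {
        return sizeOf(file.toPath());
    }
}
```

Delegating in this direction keeps a single code path, so the two overloads cannot drift apart.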

[jira] [Commented] (TIKA-2001) Parsing XML outputs empty string

2016-06-09 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15322710#comment-15322710 ] Jukka Zitting commented on TIKA-2001: - By default Tika only extracts the text between X

[jira] [Commented] (TIKA-2849) TikaInputStream copies the input stream locally

2019-04-09 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813593#comment-16813593 ] Jukka Zitting commented on TIKA-2849: - There's a related TODO in {{detectZipFormat()}}

[jira] [Commented] (TIKA-2849) TikaInputStream copies the input stream locally

2019-04-19 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821995#comment-16821995 ] Jukka Zitting commented on TIKA-2849: - How about something like this: {code:java}

[jira] [Commented] (TIKA-2849) TikaInputStream copies the input stream locally

2019-04-19 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822038#comment-16822038 ] Jukka Zitting commented on TIKA-2849: - Exactly (sorry, s/n/maxBytesToSpool/). That sh

[jira] [Commented] (TIKA-2849) TikaInputStream copies the input stream locally

2019-04-19 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822055#comment-16822055 ] Jukka Zitting commented on TIKA-2849: - SGTM! Alternatively you could overload the getF

[jira] [Commented] (TIKA-565) Improved OSGi bundling

2011-08-15 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085337#comment-13085337 ] Jukka Zitting commented on TIKA-565: Patch committed in revision 1158018. > Improved OS

[jira] [Commented] (TIKA-683) RTF Parser issues with non european characters

2011-08-19 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087610#comment-13087610 ] Jukka Zitting commented on TIKA-683: > Just in case it can't be done with subclassing, a

[jira] [Updated] (TIKA-692) TikaCLI -x or -h on a Word doc sometimes adds newline after tag

2011-08-20 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-692: --- Attachment: 0001-TIKA-692-TikaCLI-x-or-h-on-a-Word-doc-sometimes-adds.patch I think the extra whitespac

[jira] [Issue Comment Edited] (TIKA-692) TikaCLI -x or -h on a Word doc sometimes adds newline after tag

2011-08-20 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088215#comment-13088215 ] Jukka Zitting edited comment on TIKA-692 at 8/20/11 4:30 PM: - Th

[jira] [Updated] (TIKA-692) TikaCLI -x or -h on a Word doc sometimes adds newline after tag

2011-08-20 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-692: --- Attachment: 0002-TIKA-692-TikaCLI-x-or-h-on-a-Word-doc-sometimes-adds.patch The content emitted by man

[jira] [Reopened] (TIKA-651) Unescaped attribute value generated

2011-08-20 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting reopened TIKA-651: Reopening based on discussion in TIKA-692. Could there be a way to implement this without the dependency

[jira] [Resolved] (TIKA-692) TikaCLI -x or -h on a Word doc sometimes adds newline after tag

2011-08-21 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-692. Resolution: Fixed Assignee: Jukka Zitting I committed all the patches, thus resolving this as f

[jira] [Resolved] (TIKA-447) Container aware mimetype detection

2011-08-21 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-447. Resolution: Fixed Fix Version/s: 1.0 As suggested above, I moved the detector classes from o.a

[jira] [Commented] (TIKA-692) TikaCLI -x or -h on a Word doc sometimes adds newline after tag

2011-08-21 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088375#comment-13088375 ] Jukka Zitting commented on TIKA-692: {quote} -prettyPrint option {quote} Sounds OK to m

[jira] [Commented] (TIKA-676) Boilerpipe fails

2011-08-21 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088377#comment-13088377 ] Jukka Zitting commented on TIKA-676: We can only update the boilerpipe dependency once t

[jira] [Resolved] (TIKA-677) Installing Tika 0.9 using Maven fails tests

2011-08-21 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-677. Resolution: Duplicate Fix Version/s: (was: 1.0) Resolving as a duplicate of TIKA-551. > I

[jira] [Commented] (TIKA-648) Parsing HTML anchors with embedded div faulty

2011-08-21 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088381#comment-13088381 ] Jukka Zitting commented on TIKA-648: This seems to be a result of TagSoup normalizing th

[jira] [Resolved] (TIKA-667) Changes to RFC822Parser to support turning off strict parsing

2011-08-21 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-667. Resolution: Fixed Assignee: Jukka Zitting Thanks! Patch committed in revision 1160018. Note th

[jira] [Commented] (TIKA-434) Bug in TagSoup causes IOException

2011-08-23 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089369#comment-13089369 ] Jukka Zitting commented on TIKA-434: TagSoup 1.2.1 is finally available, so in revision

[jira] [Commented] (TIKA-676) Boilerpipe fails

2011-08-23 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089401#comment-13089401 ] Jukka Zitting commented on TIKA-676: See [1] for why can't/shouldn't depend on external

[jira] [Commented] (TIKA-651) Unescaped attribute value generated

2011-08-24 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090159#comment-13090159 ] Jukka Zitting commented on TIKA-651: bq. Is it so bad to add a dependency on Xerce's/Xal

[jira] [Created] (TIKA-699) Automatic checks against backwards-incompatible API changes

2011-08-26 Thread Jukka Zitting (JIRA)
Automatic checks against backwards-incompatible API changes --- Key: TIKA-699 URL: https://issues.apache.org/jira/browse/TIKA-699 Project: Tika Issue Type: Improvement Repor

[jira] [Updated] (TIKA-699) Automatic checks against backwards-incompatible API changes

2011-08-26 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-699: --- Attachment: 0001-TIKA-699-Automatic-checks-against-backwards-incompat.patch The attached patch adds the

[jira] [Created] (TIKA-701) Fix problems with TemporaryFiles

2011-08-31 Thread Jukka Zitting (JIRA)
Fix problems with TemporaryFiles Key: TIKA-701 URL: https://issues.apache.org/jira/browse/TIKA-701 Project: Tika Issue Type: Bug Affects Versions: 0.9 Reporter: Jukka Zitting Fix

[jira] [Created] (TIKA-703) Drop deprecated methods/classes/interfaces

2011-09-01 Thread Jukka Zitting (JIRA)
Drop deprecated methods/classes/interfaces -- Key: TIKA-703 URL: https://issues.apache.org/jira/browse/TIKA-703 Project: Tika Issue Type: Improvement Reporter: Jukka Zitting Pri

[jira] [Resolved] (TIKA-701) Fix problems with TemporaryFiles

2011-09-01 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-701. Resolution: Fixed Assignee: Jukka Zitting Fixed in a series of recent commits. To summarize, I

[jira] [Commented] (TIKA-701) Fix problems with TemporaryFiles

2011-09-01 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095371#comment-13095371 ] Jukka Zitting commented on TIKA-701: The idea behind that logic is that if the stream we

[jira] [Resolved] (TIKA-687) Temporary file not removed after detection

2011-09-01 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-687. Resolution: Duplicate Assignee: Jukka Zitting Right, sorry for overlooking this issue! The prop

[jira] [Commented] (TIKA-683) RTF Parser issues with non european characters

2011-09-02 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095904#comment-13095904 ] Jukka Zitting commented on TIKA-683: +1, I'm eager to see us drop the javax.swing depend

[jira] [Resolved] (TIKA-207) MS word doc containing tracked changes produces incorrect text

2011-09-02 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-207. Resolution: Fixed Fix Version/s: 1.0 Assignee: Jukka Zitting Thanks, Curt! Patch comm

[jira] [Resolved] (TIKA-704) PDF and Outlook docs embedded in MS Word documents not parsed

2011-09-02 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-704. Resolution: Fixed Fix Version/s: 1.0 Assignee: Jukka Zitting Thanks for bringing this

[jira] [Resolved] (TIKA-702) Cannot compile Tika with Java 7 (ImageMetadataExtractor.java)

2011-09-02 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-702. Resolution: Fixed Assignee: Jukka Zitting Fixed in revision 1164617 by no longer using the com

[jira] [Resolved] (TIKA-698) "Invalid UTF-16 surrogate detected:" parsing PowerPoint 97-2003

2011-09-02 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-698. Resolution: Fixed Assignee: Jukka Zitting Thanks for reporting this! Fixed in revision 1164655.

[jira] [Commented] (TIKA-612) Specify PDFBox options via ParseContext

2011-09-02 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096266#comment-13096266 ] Jukka Zitting commented on TIKA-612: +1 looks good to me. A possible design improvement

[jira] [Commented] (TIKA-698) "Invalid UTF-16 surrogate detected:" parsing PowerPoint 97-2003

2011-09-05 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097160#comment-13097160 ] Jukka Zitting commented on TIKA-698: bq. Hmm, I think we should replace invalid chars wi

[jira] [Commented] (TIKA-704) PDF and Outlook docs embedded in MS Word documents not parsed

2011-09-05 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097212#comment-13097212 ] Jukka Zitting commented on TIKA-704: See also revisions 1165230 and 1165259 for followup

[jira] [Created] (TIKA-710) Make the Tika facade implement the Parser and Detector interfaces

2011-09-09 Thread Jukka Zitting (JIRA)
Make the Tika facade implement the Parser and Detector interfaces - Key: TIKA-710 URL: https://issues.apache.org/jira/browse/TIKA-710 Project: Tika Issue Type: Improvement

[jira] [Updated] (TIKA-710) Expose the Parser and Detector instances within the Tika facade

2011-09-09 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-710: --- Summary: Expose the Parser and Detector instances within the Tika facade (was: Make the Tika facade im

[jira] [Commented] (TIKA-704) PDF and Outlook docs embedded in MS Word documents not parsed

2011-09-09 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101077#comment-13101077 ] Jukka Zitting commented on TIKA-704: Thanks! I added the test cases in revision 1167052.

[jira] [Resolved] (TIKA-710) Expose the Parser and Detector instances within the Tika facade

2011-09-09 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-710. Resolution: Fixed Done in revision 1167051. > Expose the Parser and Detector instances within the Ti

[jira] [Commented] (TIKA-704) PDF and Outlook docs embedded in MS Word documents not parsed

2011-09-09 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101087#comment-13101087 ] Jukka Zitting commented on TIKA-704: Hmm, there was still a hidden copy of the Yamaha ma

[jira] [Updated] (TIKA-594) Upgrade Tika to pdfbox 1.6.0

2011-09-16 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-594: --- Description: Secured PDFs cause Tika to throw exceptions. Upgrading to pdfbox 1.4.0 (latest release) a

[jira] [Updated] (TIKA-605) Tika GDAL parser

2011-09-17 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-605: --- Fix Version/s: (was: 0.10) > Tika GDAL parser > > > Key: TIKA-605

[jira] [Resolved] (TIKA-692) TikaCLI -x or -h on a Word doc sometimes adds newline after tag

2011-09-17 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-692. Resolution: Fixed I committed Michael's patch in revision 1171929, so I guess we can resolve this as

[jira] [Updated] (TIKA-703) Drop deprecated methods/classes/interfaces

2011-09-17 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-703: --- Fix Version/s: (was: 0.10) 1.0 > Drop deprecated methods/classes/interfaces > --

[jira] [Commented] (TIKA-552) Further improvements to Word .doc and .docx parsing

2011-09-17 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107068#comment-13107068 ] Jukka Zitting commented on TIKA-552: Is there anything more we need to do here? If not,

[jira] [Resolved] (TIKA-691) java.lang.ArrayIndexOutOfBoundsException by MS Word CDF V2 Document

2011-09-17 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-691. Resolution: Duplicate Fix Version/s: (was: 0.10) This got fixed along with the POI upgrade

[jira] [Updated] (TIKA-705) Valid OOXML PPT file hits InvalidFormatException thrown in POI

2011-09-17 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-705: --- Fix Version/s: (was: 0.10) Removing from the 0.10 roadmap, let's set the fix version to the next re

[jira] [Updated] (TIKA-711) Word parser doesn't extract optional hyphen correctly

2011-09-17 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-711: --- Fix Version/s: (was: 0.10) > Word parser doesn't extract optional hyphen correctly > --

[jira] [Updated] (TIKA-712) Master slide text isn't extracted

2011-09-17 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-712: --- Fix Version/s: (was: 0.10) > Master slide text isn't extracted > -

[jira] [Updated] (TIKA-714) Word art isn't extracted for various doc types

2011-09-17 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-714: --- Fix Version/s: (was: 0.10) > Word art isn't extracted for various doc types > -

[jira] [Resolved] (TIKA-598) Update HDF parser and NetCDF parser to emit minimal XHTML

2011-09-17 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-598. Resolution: Fixed Done in revision 1171936. > Update HDF parser and NetCDF parser to emit minimal XH

[jira] [Resolved] (TIKA-603) Tika 0.9 compiles fine but failed a unit test

2011-09-17 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-603. Resolution: Invalid Fix Version/s: (was: 0.10) Assignee: Jukka Zitting I checked

[jira] [Updated] (TIKA-676) Boilerpipe fails

2011-09-17 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-676: --- Fix Version/s: (was: 0.10) Removed the 0.10 target version. We'll ship the fix once it's available.

[jira] [Resolved] (TIKA-688) Enhance content-type detector to recognize almost plain text

2011-09-17 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-688. Resolution: Fixed Assignee: Jukka Zitting Implemented in revision 1171952 by allowing text cont

[jira] [Commented] (TIKA-719) Concurrent usage of HtmlParser causes infinite loop in HashMap

2011-09-19 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108023#comment-13108023 ] Jukka Zitting commented on TIKA-719: Looks like a duplicate of TIKA-599. > Concurrent u

[jira] [Resolved] (TIKA-725) Empty title element makes Tika-generated HTML documents not open in Chromium

2011-09-20 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-725. Resolution: Fixed Fix Version/s: 0.10 Assignee: Jukka Zitting Hmm, good point. The XM

[jira] [Resolved] (TIKA-719) Concurrent usage of HtmlParser causes infinite loop in HashMap

2011-09-20 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-719. Resolution: Duplicate Resolving as a duplicate. > Concurrent usage of HtmlParser causes infinite loo

[jira] [Commented] (TIKA-640) RFC822Parser should configure Mime4j not to fail reading mails containing more than 1000 chars in one headers text (even if folded)

2011-09-20 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109066#comment-13109066 ] Jukka Zitting commented on TIKA-640: Note that along with TIKA-716 and Mime4J version 0.

[jira] [Resolved] (TIKA-716) Upgrade apache-Mime4J to Version 0.7

2011-09-20 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-716. Resolution: Fixed Fix Version/s: 0.10 Assignee: Jukka Zitting Done in revision 117342

[jira] [Updated] (TIKA-713) Tika can not parse all of the persian pdf files

2011-09-20 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-713: --- Fix Version/s: (was: 0.9) > Tika can not parse all of the persian pdf files > -

[jira] [Resolved] (TIKA-709) Tika network server does not print anything in response to, for example, Word documents

2011-09-21 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-709. Resolution: Fixed Fix Version/s: 0.10 Assignee: Jukka Zitting Good catch, thanks! Thi

[jira] [Commented] (TIKA-727) Improve the outputed XHTML by HSLFExtractor

2011-09-22 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112531#comment-13112531 ] Jukka Zitting commented on TIKA-727: bq. Note that the XML serializer will automatica

[jira] [Issue Comment Edited] (TIKA-727) Improve the outputed XHTML by HSLFExtractor

2011-09-22 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112531#comment-13112531 ] Jukka Zitting edited comment on TIKA-727 at 9/22/11 1:11 PM: - bq

[jira] [Resolved] (TIKA-508) HtmlParser link processing should skip usemap and codebase attributes

2011-09-22 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-508. Resolution: Fixed I removed codebase and the related data and classid attributes from the URL_ATTRIB

[jira] [Resolved] (TIKA-552) Further improvements to Word .doc and .docx parsing

2011-09-22 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-552. Resolution: Fixed Resolving as fixed. Let's use followup issues with tighter scopes for further impr

[jira] [Commented] (TIKA-241) Rar archive support

2011-09-23 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113266#comment-13113266 ] Jukka Zitting commented on TIKA-241: bq. The changes are done now. Cool! bq. Edmund wa

[jira] [Commented] (TIKA-508) HtmlParser link processing should skip usemap and codebase attributes

2011-09-23 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113799#comment-13113799 ] Jukka Zitting commented on TIKA-508: To fix a failing test case I actually did end up im

[jira] [Updated] (TIKA-648) Parsing HTML anchors with embedded div faulty

2011-09-23 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-648: --- Fix Version/s: (was: 0.10) Yep, this probably needs to be addressed in one way or another within Ta

[jira] [Created] (TIKA-732) Upgrade to Commons Codec 1.5

2011-09-26 Thread Jukka Zitting (JIRA)
Upgrade to Commons Codec 1.5 Key: TIKA-732 URL: https://issues.apache.org/jira/browse/TIKA-732 Project: Tika Issue Type: Improvement Components: parser Reporter: Jukka Zitting As

[jira] [Resolved] (TIKA-732) Upgrade to Commons Codec 1.5

2011-09-26 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-732. Resolution: Fixed Done in revision 1175915. > Upgrade to Commons Codec 1.5 > ---

[jira] [Resolved] (TIKA-867) UTF-8 encoding does not work on windows

2012-05-18 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-867. Resolution: Not A Problem The rationale why we did TIKA-324 is that the default platform encoding as

[jira] [Moved] (TIKA-931) Tika's PDFParser fails to parse documents embedded in a PDF Package

2012-05-22 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting moved PDFBOX-1303 to TIKA-931: Component/s: (was: Text extraction) parser Fix Ve

[jira] [Updated] (TIKA-931) Tika's PDFParser fails to parse documents embedded in a PDF Package

2012-05-22 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-931: --- Fix Version/s: 1.2 Assignee: Jukka Zitting I copied the changes to Tika in revision 1341463.

[jira] [Created] (TIKA-932) Upgrade to Commons Compress 1.4.1

2012-05-23 Thread Jukka Zitting (JIRA)
Jukka Zitting created TIKA-932: -- Summary: Upgrade to Commons Compress 1.4.1 Key: TIKA-932 URL: https://issues.apache.org/jira/browse/TIKA-932 Project: Tika Issue Type: Improvement Comp

[jira] [Commented] (TIKA-932) Upgrade to Commons Compress 1.4.1

2012-05-23 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281656#comment-13281656 ] Jukka Zitting commented on TIKA-932: Indeed, good point! As you say, upgrading to 1.4(.1

[jira] [Created] (TIKA-942) HTTP Accept header evaluator

2012-06-24 Thread Jukka Zitting (JIRA)
Jukka Zitting created TIKA-942: -- Summary: HTTP Accept header evaluator Key: TIKA-942 URL: https://issues.apache.org/jira/browse/TIKA-942 Project: Tika Issue Type: New Feature Component

[jira] [Resolved] (TIKA-932) Upgrade to Commons Compress 1.4.1

2012-06-29 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-932. Resolution: Fixed Done in revisions 1355521 and 1355562. In addition to simply upgrading the depende

[jira] [Resolved] (TIKA-941) Detecting KML / KMZ files

2012-06-30 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-941. Resolution: Fixed Fix Version/s: 1.2 Assignee: Jukka Zitting Thanks! I committed your

[jira] [Commented] (TIKA-811) Upgrade metadatExtractor version for OpenJDK 7 support

2012-06-30 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404441#comment-13404441 ] Jukka Zitting commented on TIKA-811: See http://www.sonatype.com/people/2009/02/why-put

[jira] [Resolved] (TIKA-929) Consistent, namespaced definitions for office file related metadata

2012-06-30 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-929. Resolution: Fixed Fix Version/s: 1.2 Assignee: Jukka Zitting Nice work, thanks! Patch

[jira] [Resolved] (TIKA-943) Add parameter to tika-app to supply password for decryption

2012-06-30 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-943. Resolution: Fixed Fix Version/s: 1.2 Assignee: Jukka Zitting Good idea! Done in revis

[jira] [Resolved] (TIKA-937) RFC822Parser is extracting only the first destination address

2012-06-30 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-937. Resolution: Not A Problem Assignee: Jukka Zitting No, the parser explicitly adds all addresses

[jira] [Commented] (TIKA-811) Upgrade metadatExtractor version for OpenJDK 7 support

2012-06-30 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404475#comment-13404475 ] Jukka Zitting commented on TIKA-811: Looking at TIKA-915 it appears that the process of

[jira] [Resolved] (TIKA-934) Tika in server mode stops responding and reports NPE over and over in logs

2012-06-30 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-934. Resolution: Fixed Fix Version/s: 1.2 Assignee: Jukka Zitting Thanks for the report! T

[jira] [Resolved] (TIKA-876) Signed pdf parsing

2012-06-30 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-876. Resolution: Fixed Fix Version/s: (was: 1.0) 1.2 Assignee: Jukka

[jira] [Resolved] (TIKA-871) Text in nested groups within a pptx not parsed

2012-06-30 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-871. Resolution: Duplicate OK, thanks for testing! Resolving this as a duplicate of some other change we

[jira] [Resolved] (TIKA-908) Adding XMP specification part one namespaces and properties

2012-06-30 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-908. Resolution: Fixed Assignee: Jukka Zitting Thanks! Patch committed in revision 1355732. I also s

[jira] [Resolved] (TIKA-900) Tika fails to detect ISO9660 disk images

2012-06-30 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-900. Resolution: Fixed Fix Version/s: 1.2 Assignee: Jukka Zitting Thanks! I committed the

[jira] [Resolved] (TIKA-747) Ogg Vorbis and FLAC Parsers

2012-06-30 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-747. Resolution: Fixed I fixed the tika-app issue in revision 1355741 by switching to the shade plugin fo

[jira] [Resolved] (TIKA-810) Upgrade to PDFbox 1.7.0 as available

2012-06-30 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-810. Resolution: Fixed Fix Version/s: 1.2 Assignee: Jukka Zitting Upgraded to PDFBox 1.7.0

[jira] [Resolved] (TIKA-686) Split tika-parsers into separate components

2012-06-30 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-686. Resolution: Won't Fix Resolving as Won't Fix as there's no clear consensus on how to proceed. Let's

[jira] [Resolved] (TIKA-758) Address TODOs when we upgrade to next PDFBox release

2012-06-30 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-758. Resolution: Incomplete Fix Version/s: (was: 1.2) Resolving as Incomplete. What are the TOD
