[jira] [Commented] (TIKA-2849) TikaInputStream copies the input stream locally

2019-04-19 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822055#comment-16822055 ] Jukka Zitting commented on TIKA-2849: - SGTM! Alternatively you could overload

[jira] [Commented] (TIKA-2849) TikaInputStream copies the input stream locally

2019-04-19 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822038#comment-16822038 ] Jukka Zitting commented on TIKA-2849: - Exactly (sorry, s/n/maxBytesToSpool/).
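The maxBytesToSpool idea in this thread is to copy a stream locally only when it is small enough. Below is a minimal sketch of that check. It assumes a hypothetical helper, BoundedSpool.spoolIfSmall (not TikaInputStream's actual API), and an input stream that supports mark/reset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class BoundedSpool {
    private BoundedSpool() {}

    // Hypothetical helper: returns the whole content if the stream holds at most
    // maxBytesToSpool bytes; otherwise rewinds the stream and returns null, so the
    // caller can keep streaming instead of copying everything locally.
    static byte[] spoolIfSmall(InputStream in, int maxBytesToSpool) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        // One extra byte tells "exactly at the limit" apart from "over the limit".
        in.mark(maxBytesToSpool + 1);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int total = 0;
        while (total <= maxBytesToSpool) {
            int n = in.read(chunk, 0, Math.min(chunk.length, maxBytesToSpool + 1 - total));
            if (n == -1) {
                return buffer.toByteArray(); // end of stream reached within the limit
            }
            buffer.write(chunk, 0, n);
            total += n;
        }
        in.reset(); // over the limit: rewind so no bytes are lost
        return null;
    }
}
```

Reading one byte past the limit is how the helper tells a stream of exactly maxBytesToSpool bytes apart from a larger one. The mark/reset requirement lets the caller keep reading from the original position when the content turns out to be too large.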

[jira] [Commented] (TIKA-2849) TikaInputStream copies the input stream locally

2019-04-19 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821995#comment-16821995 ] Jukka Zitting commented on TIKA-2849: - How about something like this: {code:

[jira] [Commented] (TIKA-2849) TikaInputStream copies the input stream locally

2019-04-09 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813593#comment-16813593 ] Jukka Zitting commented on TIKA-2849: - There's a related TODO in {{detect

Re: xmpcore in Maven Central?

2016-07-15 Thread Jukka Zitting
Searching for a groupId in https://issues.sonatype.org/projects/OSSRH can also help in other similar cases. Best, Jukka Zitting

[jira] [Commented] (TIKA-2001) Parsing XML outputs empty string

2016-06-09 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15322710#comment-15322710 ] Jukka Zitting commented on TIKA-2001: - By default Tika only extracts the text bet

New moderators needed

2016-01-15 Thread Jukka Zitting
tor, so it would be good to have one or two new volunteers. See http://apache.org/dev/committers.html#mailing-list-moderators for more details. Best, Jukka Zitting

[jira] [Commented] (TIKA-1726) Augment public methods that use a java.io.File with methods that use a java.nio.file.Path

2015-09-01 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725703#comment-14725703 ] Jukka Zitting commented on TIKA-1726: - Could createTemporaryFile() be accompanied
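A common way to add java.nio.file.Path variants alongside existing java.io.File methods is to put the logic in the Path overload and have the File overload delegate through File.toPath(). Here is a minimal sketch using a hypothetical SizeReporter class. It is not Tika code.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class SizeReporter {
    private SizeReporter() {}

    // New java.nio.file.Path entry point holds the actual logic.
    static long size(Path path) throws IOException {
        return Files.size(path);
    }

    // Existing java.io.File entry point is kept for compatibility and only delegates.
    static long size(File file) throws IOException {
        return size(file.toPath());
    }
}
```

Keeping a single implementation behind both overloads means the two entry points cannot drift apart in behavior.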

[jira] [Commented] (TIKA-1672) Integrate tika-java7 component

2015-08-30 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722701#comment-14722701 ] Jukka Zitting commented on TIKA-1672: - I'm actually not sure if we should do

[jira] [Resolved] (TIKA-1719) Utilize try-with-resources where it is trivial

2015-08-30 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1719. - Resolution: Fixed Assignee: Jukka Zitting Committed in revision 1700195. Thanks a lot for
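The "trivial" cases in TIKA-1719 are try/finally blocks whose only job is closing a resource. Below is a before/after sketch on a hypothetical firstByte helper. It shows the Java 7 rewrite pattern and is not code taken from the commit.

```java
import java.io.IOException;
import java.io.InputStream;

final class TryWithResourcesDemo {
    private TryWithResourcesDemo() {}

    // Before: an explicit finally block makes sure the stream is closed.
    static int firstByteOld(InputStream stream) throws IOException {
        try {
            return stream.read();
        } finally {
            stream.close();
        }
    }

    // After: the stream is closed automatically, even if read() throws.
    static int firstByteNew(InputStream stream) throws IOException {
        try (InputStream in = stream) {
            return in.read();
        }
    }
}
```

One difference beyond brevity: if both read() and close() fail, try-with-resources keeps the read() failure as the primary exception and attaches the close() failure as a suppressed one. The old finally block lets the close() failure replace it.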

[jira] [Resolved] (TIKA-1720) Collect multiple exceptions in TemporaryResources.close() using Throwable.addSuppressed()

2015-08-27 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1720. - Resolution: Fixed Assignee: Jukka Zitting Thanks! Committed in revision 1698150. > Coll
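A close() method that releases several resources should not lose failures along the way. The Java 7 pattern named in TIKA-1720 keeps the first IOException and attaches the rest with Throwable.addSuppressed(). Here is a minimal sketch with a hypothetical ResourceBag class (not the actual TemporaryResources code), assuming resources are closed in reverse registration order.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical container of resources: close everything, report every IOException.
final class ResourceBag implements Closeable {
    private final List<Closeable> resources = new ArrayList<>();

    void add(Closeable resource) {
        resources.add(resource);
    }

    @Override
    public void close() throws IOException {
        IOException primary = null;
        // Close in reverse order of registration, like nested try-with-resources.
        for (int i = resources.size() - 1; i >= 0; i--) {
            try {
                resources.get(i).close();
            } catch (IOException e) {
                if (primary == null) {
                    primary = e;              // first failure becomes the thrown exception
                } else {
                    primary.addSuppressed(e); // later failures are attached, not lost
                }
            }
        }
        resources.clear();
        if (primary != null) {
            throw primary;
        }
    }
}
```

Callers see a single exception, but its getSuppressed() array still lists every other failure. Before Java 7, code like this usually had to choose between rethrowing only the first failure and logging the rest.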

[jira] [Resolved] (TIKA-1721) Replace IOExceptionWithCause in ForkClient

2015-08-27 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1721. - Resolution: Fixed Assignee: Jukka Zitting Thanks! Committed in 1698101. PS. Note that we

[jira] [Resolved] (TIKA-1722) Tika methods that accept a File needlessly convert it to a URL

2015-08-27 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1722. - Resolution: Fixed Assignee: Jukka Zitting Thanks! Committed in revision 1698100. My

[jira] [Commented] (TIKA-1706) Bring back commons-io to tika-core

2015-08-27 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716318#comment-14716318 ] Jukka Zitting commented on TIKA-1706: - Note that o.a.tika.io is a part of the pu

Re: comparing Tika's file detect with other tools?

2015-04-22 Thread Jukka Zitting
Hi, Copyright also covers databases, so we'll need to honor the license terms equally when copying file's code or detection patterns. Luckily file (from http://www.darwinsys.com/file/) comes under a BSD license, so reusing the code or data is quite simple from a licensing perspective. In fact we'v

Re: Multiple parsers for the same MIME type

2015-01-02 Thread Jukka Zitting
tection to the parsing phase [2]. [1] https://tika.apache.org/1.6/api/org/apache/tika/io/TikaInputStream.html#getOpenContainer() [2] https://github.com/apache/tika/blob/1.6/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/POIFSContainerDetector.java#L385 BR, Jukka Zitting

Re: Multiple parsers for the same MIME type

2015-01-02 Thread Jukka Zitting
sm to correctly detect such files. To avoid the extra work, you could simply mark your new parser as being able to handle all files of the more generic type, and then in your parser include a fallback option to call the original Tika parser when encountering a file the new parser can't handle. BR, Jukka Zitting
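The fallback approach described above can be sketched as a delegating parser. This is a simplified illustration. SimpleParser, FallbackParser and the Optional-based "can't handle this file" signal are hypothetical, and they are much smaller than Tika's Parser interface, which takes an InputStream, ContentHandler, Metadata and ParseContext.

```java
import java.util.Optional;

// Hypothetical, simplified parser: returns extracted text, or Optional.empty()
// when it does not recognize the input.
interface SimpleParser {
    Optional<String> parse(byte[] document);
}

// Registered for the whole generic type: tries the specialized parser first and
// delegates to the original parser for anything it cannot handle.
final class FallbackParser implements SimpleParser {
    private final SimpleParser specialized;
    private final SimpleParser original;

    FallbackParser(SimpleParser specialized, SimpleParser original) {
        this.specialized = specialized;
        this.original = original;
    }

    @Override
    public Optional<String> parse(byte[] document) {
        Optional<String> result = specialized.parse(document);
        return result.isPresent() ? result : original.parse(document);
    }
}
```

This avoids a second, more detailed detection pass: the specialized parser decides for itself whether it recognizes a file, and everything else goes to the parser that handled the generic type before.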

Re: tika-dotnet Module

2014-07-28 Thread Jukka Zitting
solution. It would be great if someone wanted to carry on with further improvements. -- Jukka Zitting

Re: tika-trunk-jdk1.7 - Build # 37 - Failure

2014-06-10 Thread Jukka Zitting
tories/snapshots/org/apache/tika/tika-parsers/1.6-SNAPSHOT/maven-metadata.xml. Return code is: 503, ReasonPhrase:Service Temporarily Unavailable. -> [Help 1] Looks like a temporary problem with repository.apache.org. BR, Jukka Zitting

[jira] [Commented] (TIKA-1294) Add ability to turn off extraction of PDXObjectImages (TIKA-1268) from PDFs

2014-05-27 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009666#comment-14009666 ] Jukka Zitting commented on TIKA-1294: - +1 to making this configurable and of

[jira] [Updated] (TIKA-1287) Update NetCDF .jar file on Maven Central

2014-05-06 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-1287: Issue Type: Improvement (was: Bug) > Update NetCDF .jar file on Maven Cent

[jira] [Commented] (TIKA-1283) Add "thumbnail" as possible metadata item to TikaCoreProperties

2014-04-28 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983167#comment-13983167 ] Jukka Zitting commented on TIKA-1283: - I'm not sure if it's a good i

[jira] [Resolved] (TIKA-1277) Magic bytes from Wikipedia

2014-04-23 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1277. - Resolution: Fixed Fix Version/s: 1.6 The list in Wikipedia is in fact quite incomplete and

[jira] [Created] (TIKA-1277) Magic bytes from Wikipedia

2014-04-23 Thread Jukka Zitting (JIRA)
Jukka Zitting created TIKA-1277: --- Summary: Magic bytes from Wikipedia Key: TIKA-1277 URL: https://issues.apache.org/jira/browse/TIKA-1277 Project: Tika Issue Type: Improvement

[jira] [Updated] (TIKA-936) encoding of ZipArchiveInputStream

2014-04-18 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-936: --- Description: When extracting from the zip files which are zipped at Windows OS(Japanese), the file

[jira] [Resolved] (TIKA-1268) Extract images from PDF documents

2014-04-09 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1268. - Resolution: Fixed Fix Version/s: 1.6 Assignee: Jukka Zitting Implemented in

[jira] [Created] (TIKA-1268) Extract images from PDF documents

2014-04-09 Thread Jukka Zitting (JIRA)
Jukka Zitting created TIKA-1268: --- Summary: Extract images from PDF documents Key: TIKA-1268 URL: https://issues.apache.org/jira/browse/TIKA-1268 Project: Tika Issue Type: New Feature

Re: PDF parser (two more questions)

2014-03-28 Thread Jukka Zitting
the WriteOutContentHandler class. However, the only way for the WriteOutContentHandler to signal that parsing should be stopped is by throwing a SAXException, which is what we're doing here. By catching the exception and inspecting it with isWriteLimitReached() the client can determine whether this is what happened. BR, Jukka Zitting

Re: PDF parser (two more questions)

2014-03-27 Thread Jukka Zitting
Handler(out), ...); } catch (SAXException e) { if (!out.isWriteLimitReached(e)) { throw e; } } String content = out.toString(); BR, Jukka Zitting
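The write-limit pattern discussed here has two parts. The handler stops parsing by throwing a SAXException, and the caller asks the handler whether that exception was its own. Below is a self-contained sketch of the same pattern. LimitedCollector and WriteLimitDemo are hypothetical stand-ins for WriteOutContentHandler and a real parse() call, not Tika classes.

```java
import org.xml.sax.SAXException;

// Hypothetical stand-in for WriteOutContentHandler: collects text up to a limit
// and signals "stop parsing" with a SAXException it can recognize later.
final class LimitedCollector {
    private final StringBuilder buffer = new StringBuilder();
    private final int writeLimit;
    private final Object tag = new Object(); // identifies exceptions thrown by this instance

    LimitedCollector(int writeLimit) {
        this.writeLimit = writeLimit;
    }

    void characters(String text) throws SAXException {
        int room = writeLimit - buffer.length();
        if (text.length() > room) {
            buffer.append(text, 0, room); // keep what fits, then stop the "parser"
            throw new LimitReachedException(tag);
        }
        buffer.append(text);
    }

    boolean isWriteLimitReached(Throwable t) {
        return t instanceof LimitReachedException && ((LimitReachedException) t).tag == tag;
    }

    @Override
    public String toString() {
        return buffer.toString();
    }

    private static final class LimitReachedException extends SAXException {
        final Object tag;

        LimitReachedException(Object tag) {
            super("write limit reached");
            this.tag = tag;
        }
    }
}

final class WriteLimitDemo {
    private WriteLimitDemo() {}

    // Simulated parser pushing text chunks; only the limit exception is swallowed.
    static String extract(String[] chunks, int limit) throws SAXException {
        LimitedCollector out = new LimitedCollector(limit);
        try {
            for (String chunk : chunks) {
                out.characters(chunk);
            }
        } catch (SAXException e) {
            if (!out.isWriteLimitReached(e)) {
                throw e; // a genuine parsing error, not our limit
            }
        }
        return out.toString();
    }
}
```

The per-instance tag keeps one handler from mistaking a limit exception thrown by a different, nested handler for its own.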

Re: Parser.parse with file instead of stream

2014-03-27 Thread Jukka Zitting
e); try { parser.parse(stream, ...); } finally { stream.close(); } > What do you think about extending the Parse interface accordingly? See https://issues.apache.org/jira/browse/TIKA-153 (and the TikaInputStream javadocs) for details on how we already achieve this functionality. BR, Jukka Zitting

[jira] [Updated] (TIKA-1255) WordExtractor - bold hyperlink not closed properly

2014-03-25 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-1255: Fix Version/s: (was: 1.5) (was: 1.4) (was: 1.3

[jira] [Resolved] (TIKA-1261) Commons Compress version should be 1.5

2014-03-25 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1261. - Resolution: Fixed Assignee: Jukka Zitting Fixed in revision 1581402. > Commons Compr

[jira] [Commented] (TIKA-1262) parseToString fails to detect content-type / charset

2014-03-19 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941336#comment-13941336 ] Jukka Zitting commented on TIKA-1262: - The {{CharsetDetector}} class detects

[jira] [Resolved] (TIKA-1260) Detection result for zero-byte files is text/plain

2014-03-17 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1260. - Resolution: Not A Problem Fix Version/s: (was: 1.5) What you're seeing is the r

Re: Using guava on tika ?

2014-03-06 Thread Jukka Zitting
s a dependency in client applications. We've even gone as far as including copies of some Commons IO classes in org.apache.tika.io instead of referring to commons-io as a dependency. BR, Jukka Zitting

Re: [ANNOUNCE] Apache Tika 1.5 Released

2014-03-06 Thread Jukka Zitting
Hi, On Thu, Mar 6, 2014 at 10:14 AM, Hong-Thai Nguyen wrote: > I guess that users could maintain hotfixes basing on a released branch in > attending next release. Right, at least there's no harm in having the branch, so I just created it in revision 1574919. BR, Jukka Zitting

Re: [ANNOUNCE] Apache Tika 1.5 Released

2014-03-06 Thread Jukka Zitting
Hi, On Thu, Mar 6, 2014 at 8:27 AM, Hong-Thai Nguyen wrote: > Anyone can create branch remotes/origin/1.5 on git ? Do we need a 1.5 branch? BR, Jukka Zitting

Re: Submission to ApacheCon on Tika

2014-03-02 Thread Jukka Zitting
Hi, On Fri, Jan 31, 2014 at 10:44 AM, Jukka Zitting wrote: > OK, good! I'll adjust my submission so that it would work well as a > possible followup to your talk, and we can coordinate the details if > both get accepted. Looks like all the Tika talks got accept

Re: Submission to ApacheCon on Tika

2014-01-31 Thread Jukka Zitting
od correctly, there will be more tracks than usually, so we might have a chance to dig deeper over more than just one Tika talk. BR, Jukka Zitting

Re: Submission to ApacheCon on Tika

2014-01-31 Thread Jukka Zitting
r more Tika coverage. Do we have others coming in who're planning to present Tika? BR, Jukka Zitting

Re: Tika on Jenkins?

2014-01-29 Thread Jukka Zitting
t's gotten way too complex nowadays for part-time volunteers to manage. Unless one of us wants to step up and get their hands dirty fixing Jenkins issues, I'd probably opt to disable the Jenkins build entirely and instead use something like Travis (https://travis-ci.org/) for our CI builds. BR, Jukka Zitting

Re: [DISCUSS] Prepare Release 1.5?

2014-01-29 Thread Jukka Zitting
Hi, On Wed, Jan 29, 2014 at 12:56 PM, Sergey Beryozkin wrote: > I've updated CHANGES.txt but I can't assign to myself and in case of > TIKA-1196 I can't even resolve it I assigned you the PMC member role in the TIKA project on jira, which should give you full access to th

Re: status of jdk 1.5 support

2014-01-27 Thread Jukka Zitting
Hi, On Mon, Jan 27, 2014 at 11:20 AM, Yegor Kozlov wrote: > Does Tika support jdk 1.5? > We are discussing abandoning jdk 1.5 in future versions of POI. Will it be > a compatibility breaker for Tika? Shouldn't be an issue. Tika 1.4 switched to Java 6 as the base platform. BR, Jukka Zitting

Re: [DISCUSS] Prepare Release 1.5?

2014-01-27 Thread Jukka Zitting
Hi, On Mon, Jan 27, 2014 at 8:23 AM, Allison, Timothy B. wrote: > Should we wait for POI 3.10? That should be out within the next few weeks. No, we can cut Tika 1.6 then. BR, Jukka Zitting

[jira] [Resolved] (TIKA-1219) Add .svn to .gitignore

2014-01-14 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1219. - Resolution: Not A Problem Fix Version/s: (was: 1.5) > Add .svn to .gitign

[jira] [Commented] (TIKA-1219) Add .svn to .gitignore

2014-01-14 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870844#comment-13870844 ] Jukka Zitting commented on TIKA-1219: - There actually is a better way. :-) You

[jira] [Commented] (TIKA-1219) Add .svn to .gitignore

2014-01-14 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870826#comment-13870826 ] Jukka Zitting commented on TIKA-1219: - Why would you have a {{.svn}} director

[jira] [Commented] (TIKA-1217) Integrate with Java-7 FileTypeDetector API

2014-01-13 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869780#comment-13869780 ] Jukka Zitting commented on TIKA-1217: - Thanks! I committed the patch in revi
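Java 7 lets applications plug content-type detection into Files.probeContentType() through the java.nio.file.spi.FileTypeDetector service provider interface. Below is a minimal sketch of such a provider. It assumes a hypothetical MagicFileTypeDetector that knows only two magic-byte prefixes. A Tika-backed version would delegate to Tika's detector instead. For Files.probeContentType() to pick it up, the class would have to be public, have a public no-argument constructor, and be listed in META-INF/services/java.nio.file.spi.FileTypeDetector.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.spi.FileTypeDetector;

// Hypothetical detector that recognizes a file by its first few bytes.
final class MagicFileTypeDetector extends FileTypeDetector {
    private static final byte[] PDF = {0x25, 0x50, 0x44, 0x46}; // "%PDF"
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04}; // "PK\3\4"

    @Override
    public String probeContentType(Path path) throws IOException {
        byte[] head = new byte[4];
        int n = 0;
        try (InputStream in = Files.newInputStream(path)) {
            int r;
            while (n < head.length && (r = in.read(head, n, head.length - n)) != -1) {
                n += r;
            }
        }
        if (startsWith(head, n, PDF)) {
            return "application/pdf";
        }
        if (startsWith(head, n, ZIP)) {
            return "application/zip";
        }
        return null; // unknown: lets other installed detectors (or the default) try
    }

    private static boolean startsWith(byte[] head, int length, byte[] magic) {
        if (length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (head[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Returning null rather than a generic type such as application/octet-stream matters in this SPI. It is how a provider says "not mine", so the remaining detectors still get a chance.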

[jira] [Resolved] (TIKA-1214) Infinity Loop in Mpeg Stream

2014-01-13 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1214. - Resolution: Duplicate Fix Version/s: (was: 1.5) Resolving as duplicate of TIKA-1179

[jira] [Resolved] (TIKA-1215) Regression: Unable to parse a mp3 file on 1.5 which parsed successfully on 1.4

2014-01-13 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1215. - Resolution: Not A Problem You're misusing the {{ToHTMLContentHandler}} class:

[jira] [Commented] (TIKA-1218) Unable to parse a mp3 file on 1.5 getting a exception

2014-01-13 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869662#comment-13869662 ] Jukka Zitting commented on TIKA-1218: - Reproduced. It looks like the last frame

[jira] [Commented] (TIKA-1217) Integrate with Java-7 FileTypeDetector API

2014-01-09 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866722#comment-13866722 ] Jukka Zitting commented on TIKA-1217: - Nice idea! I think putting such a feature

[jira] [Resolved] (TIKA-1160) Add support for SolidWorks files

2013-12-27 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1160. - Resolution: Fixed Excellent, thanks! I committed the latest patch and test files in revision

[jira] [Resolved] (TIKA-1193) Allow access to HtmlParser's HtmlSchema

2013-12-27 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1193. - Resolution: Fixed The patch is perfect, thanks! Committed in revision 1553774. > Allow access

[jira] [Commented] (TIKA-245) Support of CHM Format

2013-12-27 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857520#comment-13857520 ] Jukka Zitting commented on TIKA-245: bq. tika is not able to extract contents from

[jira] [Resolved] (TIKA-1122) Tika fails to parse chm files

2013-12-27 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1122. - Resolution: Duplicate Fix Version/s: (was: 1.5) Seems like a duplicate of TIKA-1110

[jira] [Assigned] (TIKA-1193) Allow access to HtmlParser's HtmlSchema

2013-12-26 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting reassigned TIKA-1193: --- Assignee: Jukka Zitting > Allow access to HtmlParser's Ht

[jira] [Resolved] (TIKA-1210) Address tika-parsers o.a.t.mime.TestMimeTypes TODO: Need a test flash file

2013-12-26 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1210. - Resolution: Fixed Assignee: Jukka Zitting Thanks! Patch and test files committed in

[jira] [Resolved] (TIKA-1152) Process loops infinitely on parsing of a CHM file

2013-12-26 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1152. - Resolution: Fixed Assignee: Jukka Zitting Thanks! Patch committed in revision 1553621

[jira] [Resolved] (TIKA-1213) Parsing (extracting content) a single 5Mb pdf file takes 3minutes

2013-12-26 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1213. - Resolution: Not A Problem Resolving as Not A Problem based on the discussion in PDFBOX-1821

[jira] [Resolved] (TIKA-1110) Incorrectly declared SUPPORTED_TYPES in ChmParser.

2013-12-26 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1110. - Resolution: Fixed Assignee: Jukka Zitting Thanks, Vadim! I committed your patch (with

Re: Switch to JUnit 4.x?

2013-12-16 Thread Jukka Zitting
Hi, On Sat, Dec 14, 2013 at 6:39 PM, Ken Krugler wrote: > See https://issues.apache.org/jira/browse/TIKA-1209 > > Any objections to switching to JUnit 4.11? None from me. The patch looks good to me. BR, Jukka Zitting

[jira] [Commented] (TIKA-1193) Allow access to HtmlParser's HtmlSchema

2013-11-18 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825832#comment-13825832 ] Jukka Zitting commented on TIKA-1193: - A cleaner approach would probably be to a

[jira] [Commented] (TIKA-1190) ZipContainerDetector.detect() can spool the entire stream to a temporary file

2013-11-01 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811392#comment-13811392 ] Jukka Zitting commented on TIKA-1190: - bq. Isn't the right fix then to pull

[jira] [Commented] (TIKA-1190) ZipContainerDetector.detect() can spool the entire stream to a temporary file

2013-11-01 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811245#comment-13811245 ] Jukka Zitting commented on TIKA-1190: - bq. We need to buffer it if it's

[jira] [Created] (TIKA-1190) ZipContainerDetector.detect() can spool the entire stream to a temporary file

2013-10-31 Thread Jukka Zitting (JIRA)
Jukka Zitting created TIKA-1190: --- Summary: ZipContainerDetector.detect() can spool the entire stream to a temporary file Key: TIKA-1190 URL: https://issues.apache.org/jira/browse/TIKA-1190 Project

[jira] [Commented] (TIKA-817) (PPT/PPTX) Missing date/time in text content.

2013-10-31 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810987#comment-13810987 ] Jukka Zitting commented on TIKA-817: The tests were failing on Windows due to

Re: problem with the inputstream after calling the detect(InputStream in) method

2013-09-30 Thread Jukka Zitting
InputStream in = new BufferedInputStream(attachment.getFileInputStream()); [1] http://tika.apache.org/1.4/api/org/apache/tika/Tika.html#detect(java.io.InputStream) [2] http://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#mark(int) [3] http://docs.oracle.com/javase/7/docs/api/java/io/BufferedInpu
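Detection can only leave a stream untouched if it can mark the stream, peek at the first bytes, and reset. Wrapping a stream that lacks mark support in a BufferedInputStream provides that. Below is a minimal sketch of the pattern. PeekingDetector is hypothetical and only looks for a PDF header. Unlike Tika.detect(), which this sketch does not reproduce, it refuses streams without mark support outright.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

final class PeekingDetector {
    private PeekingDetector() {}

    // Looks at the first bytes, then rewinds so the caller can still read everything.
    static String detect(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("wrap the stream in a BufferedInputStream first");
        }
        byte[] head = new byte[4];
        in.mark(head.length);
        int n = 0;
        try {
            int r;
            while (n < head.length && (r = in.read(head, n, head.length - n)) != -1) {
                n += r;
            }
        } finally {
            in.reset(); // rewind even if reading failed
        }
        boolean pdf = n == 4 && head[0] == '%' && head[1] == 'P' && head[2] == 'D' && head[3] == 'F';
        return pdf ? "application/pdf" : "application/octet-stream";
    }

    // Adds mark/reset support only when the underlying stream lacks it.
    static InputStream prepare(InputStream raw) {
        return raw.markSupported() ? raw : new BufferedInputStream(raw);
    }
}
```

Failing fast on unmarkable streams is a deliberate choice in this sketch. It turns a silent "my stream lost its first bytes" bug into an immediate error.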

[jira] [Updated] (TIKA-1149) Improve parser lookup performance

2013-08-05 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-1149: Attachment: 0001-TIKA-1149-Improve-parser-lookup-performance.patch See the attached patch for a

Re: [Announce] Welcome Tim Allison as Tika PM member and committer

2013-08-02 Thread Jukka Zitting
d I focus on applied R&D in text processing (with emphasis on > Lucene, recently). Sounds really cool! You're based in the Boston area, right? I'm moving there in two weeks, and would love to catch up some day if you have time. BR, Jukka Zitting

[jira] [Updated] (TIKA-1149) Improve parser lookup performance

2013-08-01 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting updated TIKA-1149: Summary: Improve parser lookup performance (was: 12% performance improvement by caching in

[jira] [Commented] (TIKA-1149) 12% performance improvement by caching in CompositeParser

2013-07-22 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13715180#comment-13715180 ] Jukka Zitting commented on TIKA-1149: - Note that for exa

[jira] [Resolved] (TIKA-881) HtmlParser sometimes(!) throws IOException while determining Html-Encoding

2013-05-14 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-881. Resolution: Duplicate This has been fixed meanwhile with the AutoDetectReader class that the

[jira] [Commented] (TIKA-1103) Tika.parseToString(InputStream) does not output the same content as parseToString(File)

2013-04-11 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628753#comment-13628753 ] Jukka Zitting commented on TIKA-1103: - It looks like the mentioned PDF starts wi

[jira] [Commented] (TIKA-1101) XML parse error caused by org.xml.sax.SAXParseException;The entity "nbsp" was referenced, but not declared

2013-04-04 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623396#comment-13623396 ] Jukka Zitting commented on TIKA-1101: - We already have [heuristics|https://github

[jira] [Commented] (TIKA-1074) Extraction should continue if an exception is hit visiting an embedded document

2013-02-22 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584294#comment-13584294 ] Jukka Zitting commented on TIKA-1074: - bq. Wait, do you mean I should remove

[jira] [Commented] (TIKA-1074) Extraction should continue if an exception is hit visiting an embedded document

2013-02-22 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584229#comment-13584229 ] Jukka Zitting commented on TIKA-1074: - bq. InterruptedException is never throw

[jira] [Commented] (TIKA-1074) Extraction should continue if an exception is hit visiting an embedded document

2013-02-21 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584034#comment-13584034 ] Jukka Zitting commented on TIKA-1074: - If we get an InterruptedException, the

Re: [DISCUSS] Should Tika require Java6? (was Re: Build failed in Jenkins: Tika-trunk #977)

2013-02-08 Thread Jukka Zitting
the upgrade. BR, Jukka Zitting

[jira] [Commented] (TIKA-1080) Arabic characters under windows

2013-02-07 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573496#comment-13573496 ] Jukka Zitting commented on TIKA-1080: - If you don't provide an option like -

[jira] [Commented] (TIKA-1062) Add list detection to RTFParser

2013-01-24 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562457#comment-13562457 ] Jukka Zitting commented on TIKA-1062: - bq. coding style We've generally

Re: [ANNOUNCE] Apache Tika 1.3 Released

2013-01-22 Thread Jukka Zitting
ase file an improvement request for that? If there's popular demand, we can probably cut Tika 1.4 in near future with POI 3.9 in it. > If/when POI library upgrades are included in Tika, > are they mentioned in release notes? Yes. BR, Jukka Zitting

Re: KEYS file and dist.apache.org (Re: [VOTE] Apache Tika 1.3 Release Candidate #1)

2013-01-21 Thread Jukka Zitting
o http://www.apache.org/dist/tika/KEYS instead of using mirrors. BR, Jukka Zitting

Re: buildbot failure in ASF Buildbot on tika-trunk

2013-01-20 Thread Jukka Zitting
Hi, [cc += builds@] On Mon, Jan 21, 2013 at 9:16 AM, wrote: > http://ci.apache.org/builders/tika-trunk/builds/1020 Looks like a buildbot issue: "cp: writing `/home/buildmaster/master1/public_html/projects/tika/rat-output.xml': No space left on device" BR, Jukka Zitting

[jira] [Resolved] (TIKA-1060) Degrade gracefully when juniversalchardet not present

2013-01-20 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1060. - Resolution: Fixed Fix Version/s: 1.4 Fixed in revision 1436209. > Degr

[jira] [Created] (TIKA-1060) Degrade gracefully when juniversalchardet not present

2013-01-20 Thread Jukka Zitting (JIRA)
Jukka Zitting created TIKA-1060: --- Summary: Degrade gracefully when juniversalchardet not present Key: TIKA-1060 URL: https://issues.apache.org/jira/browse/TIKA-1060 Project: Tika Issue Type
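Degrading gracefully when an optional library such as juniversalchardet is missing usually comes down to probing for one of its classes before using it. Here is a minimal sketch with a hypothetical OptionalDependency helper. The probed class name org.mozilla.universalchardet.UniversalDetector is an assumption about how juniversalchardet is packaged, and the fallback behavior is left to the caller.

```java
// Hypothetical helper for checking whether an optional dependency is on the classpath.
final class OptionalDependency {
    private OptionalDependency() {}

    // Assumed main class of juniversalchardet; adjust if the library is packaged differently.
    static final String UNIVERSAL_CHARDET_CLASS = "org.mozilla.universalchardet.UniversalDetector";

    // Returns true if the class can be loaded, without running its static initializers.
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, OptionalDependency.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing jar (ClassNotFoundException) or a broken or partial one (LinkageError).
            return false;
        }
    }
}
```

A caller would run isAvailable(OptionalDependency.UNIVERSAL_CHARDET_CLASS) once and keep any code that touches the library's classes in a separate class, loaded only after the check succeeds.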

KEYS file and dist.apache.org (Re: [VOTE] Apache Tika 1.3 Release Candidate #1)

2013-01-20 Thread Jukka Zitting
.3 is out. I can volunteer to take care of this as I've already done it for Jackrabbit. [1] http://www.apache.org/dev/release-signing.html#keyserver [2] http://www.apache.org/dev/release-publishing.html#distribution_dist BR, Jukka Zitting

Re: [VOTE] Apache Tika 1.3 Release Candidate #1

2013-01-20 Thread Jukka Zitting
Hi, On Sat, Jan 19, 2013 at 6:30 AM, Dave Meikle wrote: > Please vote on releasing this package as Apache Tika 1.3. [x] +1 Release this package as Apache Tika 1.3 BR, Jukka Zitting

Re: [DISCUSS] Release Candidate for 1.3?

2013-01-17 Thread Jukka Zitting
Hi, We're planning to cut Jackrabbit 2.6 by the end of the month. It would be great if we could have Tika 1.3 out by then (say sometime next week), so we could ship it with the new Jackrabbit release. BR, Jukka Zitting

Re: [DISCUSS] Release Candidate for 1.3?

2013-01-09 Thread Jukka Zitting
for us to release again! Re: binary compatibility; Before cutting the release it would be a good idea to update the clirr plugin configuration to use Tika 1.2 instead of 1.0 when checking for binary compatibility. Also, happy to do the Release Management for it. > Great! BR, Jukka Zitting

[jira] [Commented] (TIKA-775) Embed Capabilities

2012-12-14 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532205#comment-13532205 ] Jukka Zitting commented on TIKA-775: bq. {{ catch (InterruptedException ig

[jira] [Resolved] (TIKA-1041) Tika 1.2 universalcharset errors

2012-12-13 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1041. - Resolution: Fixed Fix Version/s: (was: 1.2) Assignee: Jukka Zitting I fixed

Re: Build failed in Jenkins: Tika-trunk #948

2012-12-02 Thread Jukka Zitting
Hi, On Sun, Dec 2, 2012 at 6:26 PM, Michael McCandless wrote: > On Sun, Dec 2, 2012 at 11:17 AM, Jukka Zitting > wrote: >> Looks like there's a subdir/foo.txt file in the tika-app directory >> that for some reason doesn't get cleaned up by the testZipWithSubdirs &

Re: Build failed in Jenkins: Tika-trunk #948

2012-12-02 Thread Jukka Zitting
> committed, but it seems to not be working now (I had kicked this one > off manually...). Does anyone know why commits are not triggering > builds anymore? There's been a number of problems with the Jenkins server over the last year or so, and I'm not sure what the current status is. builds@a.o might know more. BR, Jukka Zitting

[jira] [Resolved] (TIKA-1034) MimeTypes seems to be doing unnecessary work in the detect method

2012-11-29 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1034. - Resolution: Won't Fix See the {{Detector}} javadocs. You can pass {{null}} as the {{InputS
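The resolution points at passing null instead of a stream. Assuming, as the snippet suggests, that a null stream means "detect from the metadata alone", a detector can skip the content work entirely in that case. Below is a minimal sketch with a hypothetical NameOrMagicDetector that takes a file name in place of Tika's Metadata object.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Locale;

final class NameOrMagicDetector {
    private NameOrMagicDetector() {}

    private static final byte[] PDF_MAGIC = {'%', 'P', 'D', 'F'};

    static String detect(InputStream stream, String fileName) throws IOException {
        // Content check only when a stream is given and it can be rewound afterwards.
        if (stream != null && stream.markSupported()) {
            byte[] head = new byte[PDF_MAGIC.length];
            stream.mark(head.length);
            int n = 0;
            try {
                int r;
                while (n < head.length && (r = stream.read(head, n, head.length - n)) != -1) {
                    n += r;
                }
            } finally {
                stream.reset();
            }
            if (n == head.length && Arrays.equals(head, PDF_MAGIC)) {
                return "application/pdf";
            }
        }
        // No stream (or no content match): fall back to the file name.
        if (fileName != null && fileName.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            return "application/pdf";
        }
        return "application/octet-stream";
    }
}
```

A caller that only has a file name can therefore avoid opening the file at all.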

[jira] [Commented] (TIKA-1027) Allow null values when setting metadata

2012-11-19 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500330#comment-13500330 ] Jukka Zitting commented on TIKA-1027: - Hmm, good point. I'd argue that the s

[jira] [Resolved] (TIKA-1027) Allow null values when setting metadata

2012-11-19 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1027. - Resolution: Fixed Done in revision 1411237. > Allow null values when sett
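One way to allow null values when setting metadata is to treat set(name, null) as "remove this property". These snippets do not say what semantics were chosen, so the following sketch assumes that interpretation. It uses a hypothetical SimpleMetadata class, not Tika's Metadata.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical metadata holder where setting a property to null removes it
// (assumed semantics) instead of throwing or storing a null value.
final class SimpleMetadata {
    private final Map<String, String> values = new HashMap<>();

    void set(String name, String value) {
        if (value == null) {
            values.remove(name);
        } else {
            values.put(name, value);
        }
    }

    String get(String name) {
        return values.get(name);
    }

    int size() {
        return values.size();
    }
}
```

With these semantics, callers can pass through possibly-null values from a parser without guarding every set() call.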

[jira] [Reopened] (TIKA-775) Embed Capabilities

2012-11-19 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting reopened TIKA-775: There are a few problems with the implementation. * The ExternalEmbedderTest fails in a plain Wi

[jira] [Created] (TIKA-1027) Allow null values when setting metadata

2012-11-19 Thread Jukka Zitting (JIRA)
Jukka Zitting created TIKA-1027: --- Summary: Allow null values when setting metadata Key: TIKA-1027 URL: https://issues.apache.org/jira/browse/TIKA-1027 Project: Tika Issue Type: Improvement

[jira] [Resolved] (TIKA-1026) ServiceLoader should respect OSGi service ranking

2012-11-19 Thread Jukka Zitting (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jukka Zitting resolved TIKA-1026. - Resolution: Fixed Fix Version/s: 1.3 Done in revision 148
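In OSGi, a service's priority comes from its service.ranking property (higher wins, default 0). Ties go to the lower service.id, i.e. the earlier registration. Below is a minimal sketch of applying that ordering to a list of discovered services. It uses hypothetical RankedService and ServiceOrdering classes rather than the OSGi API or Tika's ServiceLoader.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical record of a discovered service and its OSGi-style properties.
final class RankedService {
    final String name;
    final int ranking;    // value of "service.ranking" (0 if unset)
    final long serviceId; // value of "service.id"; lower means registered earlier

    RankedService(String name, int ranking, long serviceId) {
        this.name = name;
        this.ranking = ranking;
        this.serviceId = serviceId;
    }
}

final class ServiceOrdering {
    private ServiceOrdering() {}

    // Higher ranking first; on ties, the lower service id comes first.
    static final Comparator<RankedService> PREFERRED_FIRST =
            Comparator.comparingInt((RankedService s) -> s.ranking).reversed()
                    .thenComparingLong(s -> s.serviceId);

    static List<String> orderedNames(List<RankedService> services) {
        List<RankedService> sorted = new ArrayList<>(services);
        sorted.sort(PREFERRED_FIRST);
        List<String> names = new ArrayList<>();
        for (RankedService s : sorted) {
            names.add(s.name);
        }
        return names;
    }
}
```

Sorting a copy leaves the caller's list untouched, and the explicit tie-breaker makes the order deterministic even when many services share the default ranking.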
