[jira] [Commented] (TIKA-1423) Build a parser to extract data from GRIB formats

2015-03-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387863#comment-14387863 ] Hudson commented on TIKA-1423: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #589 (See [https://b

[jira] [Commented] (TIKA-1330) Add robust tika-batch code

2015-03-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387864#comment-14387864 ] Hudson commented on TIKA-1330: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #589 (See [https://b

[jira] [Commented] (TIKA-1511) Create a parser for SQLite3

2015-03-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387865#comment-14387865 ] Hudson commented on TIKA-1511: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #589 (See [https://b

[jira] [Commented] (TIKA-1585) Create Example Website with Form Submission

2015-03-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387826#comment-14387826 ] Tim Allison commented on TIKA-1585: --- Done. Let me know if it works before we shutdown 99

[jira] [Created] (TIKA-1588) Upgrade to PDFBox 1.8.10 when available

2015-03-30 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1588: - Summary: Upgrade to PDFBox 1.8.10 when available Key: TIKA-1588 URL: https://issues.apache.org/jira/browse/TIKA-1588 Project: Tika Issue Type: Improvement

Re: [DISCUSS] Tika 1.8 or 1.7.1

2015-03-30 Thread Tyler Palsulich
I just remembered TIKA-1509 and TIKA-1558 -- testing now for blacklist functionality through TIKA-1509. If that works, I'll back out TIKA-1558. Tim, I think you should run govdocs from the RC, in case something changes between your run and the cut. Tyler On Mon, Mar 30, 2015 at 10:17 AM, Allison

[jira] [Commented] (TIKA-1330) Add robust tika-batch code

2015-03-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387394#comment-14387394 ] Hudson commented on TIKA-1330: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #588 (See [https://b

Re: including refactored docs from govdocs1 in test suite

2015-03-30 Thread Konstantin Gribov
At the very least, a parser should not hang when processing a corrupted document. IMHO, cases where parser code hangs should be considered blocker issues. Personally, I prefer the variant with a partial result and some metadata saying that document parsing failed somehow. But it can be hard to do. -- Best regards, Ko
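The "don't hang; return a partial result plus a failure marker" idea can be illustrated with a minimal Java 8+ sketch. It is not how Tika handles this. The class `GuardedParse`, the `ParseTask` interface, and the status strings are hypothetical. The parse runs on a worker thread that appends text to a shared buffer. If the parse throws or exceeds the timeout, whatever text was collected so far is returned together with a status describing what went wrong.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class GuardedParse {

    /** Hypothetical parse callback: writes extracted text into the sink as it goes. */
    public interface ParseTask {
        void parse(StringBuffer sink) throws Exception;
    }

    /** Extracted text plus a status string ("ok", "timeout", "failed: ...", "interrupted"). */
    public static final class Result {
        public final String text;
        public final String status;

        Result(String text, String status) {
            this.text = text;
            this.status = status;
        }
    }

    /** Runs the task on a worker thread and waits at most timeoutMillis for it. */
    public static Result parseWithTimeout(ParseTask task, long timeoutMillis) {
        StringBuffer sink = new StringBuffer(); // synchronized, so safe to read from this thread
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // a worker that ignores interrupts won't keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(() -> {
            task.parse(sink);
            return null; // Callable form, so checked exceptions are allowed
        });
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return new Result(sink.toString(), "ok");
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck worker
            return new Result(sink.toString(), "timeout");
        } catch (ExecutionException e) {
            return new Result(sink.toString(), "failed: " + e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new Result(sink.toString(), "interrupted");
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Result ok = parseWithTimeout(sink -> sink.append("full text"), 1000);
        Result hung = parseWithTimeout(sink -> {
            sink.append("partial text");
            Thread.sleep(10_000); // simulates a parser stuck on a corrupt file
        }, 500);
        System.out.println(ok.status + ": " + ok.text);
        System.out.println(hung.status + ": " + hung.text);
    }
}
```

This approach cannot stop a thread that ignores interruption. The worker is only marked as a daemon so it won't keep the JVM alive. Isolating parsing in a separate JVM, as Tika's ForkParser does, is the more robust option, since a stuck process can be killed outright.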

[jira] [Commented] (TIKA-1581) jhighlight license concerns

2015-03-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386963#comment-14386963 ] Hudson commented on TIKA-1581: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #587 (See [https://b

Re: Broken build because of clirr plugin

2015-03-30 Thread Konstantin Gribov
Thanks for your help in resolving this issue, Tim. Committed in r1670125, Jenkins build succeeded. -- Best regards, Konstantin Gribov Mon, 30 March 2015 at 18:56, Allison, Timothy B. : +1. Thank you, Konstantin! > > -Original Message- > From: Konstantin Gribov [mailto:gros...@gmail.com]

[jira] [Resolved] (TIKA-1587) ForkParser::setJavaCommand should take List

2015-03-30 Thread Konstantin Gribov (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Gribov resolved TIKA-1587. - Resolution: Fixed > ForkParser::setJavaCommand should take List > -

[jira] [Commented] (TIKA-1587) ForkParser::setJavaCommand should take List

2015-03-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386922#comment-14386922 ] Hudson commented on TIKA-1587: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #586 (See [https://b

[jira] [Comment Edited] (TIKA-1584) Tika 1.7 possible regression (nested attachment files not getting parsed)

2015-03-30 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386906#comment-14386906 ] Tyler Palsulich edited comment on TIKA-1584 at 3/30/15 4:05 PM: -

[jira] [Commented] (TIKA-1584) Tika 1.7 possible regression (nested attachment files not getting parsed)

2015-03-30 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386906#comment-14386906 ] Tyler Palsulich commented on TIKA-1584: --- Yup! The 1.7 release process should start th

[jira] [Commented] (TIKA-1584) Tika 1.7 possible regression (nested attachment files not getting parsed)

2015-03-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386903#comment-14386903 ] Tim Allison commented on TIKA-1584: --- Community voted to cut release candidate from trunk

[jira] [Commented] (TIKA-1581) jhighlight license concerns

2015-03-30 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386900#comment-14386900 ] Hong-Thai Nguyen commented on TIKA-1581: And great thank to [~kkrugler] with many i

[jira] [Commented] (TIKA-1584) Tika 1.7 possible regression (nested attachment files not getting parsed)

2015-03-30 Thread Rob Tulloh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386899#comment-14386899 ] Rob Tulloh commented on TIKA-1584: -- Thanks for the quick turn around to fixing this. Expec

RE: Broken build because of clirr plugin

2015-03-30 Thread Allison, Timothy B.
+1. Thank you, Konstantin! -Original Message- From: Konstantin Gribov [mailto:gros...@gmail.com] Sent: Monday, March 30, 2015 11:19 AM To: dev@tika.apache.org Subject: Re: Broken build because of clirr plugin I think, simple way would be to keep old methods (and mark them @Deprecated) t

Re: Broken build because of clirr plugin

2015-03-30 Thread Konstantin Gribov
I think the simple way would be to keep the old methods (and mark them @Deprecated) to avoid build failure, and use the new ones internally. I'll do `mvn verify` before committing this time. Sorry for the inconvenience. -- Best regards, Konstantin Gribov Mon, 30 March 2015 at 18:09, Allison, Timothy B. : > H
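The deprecate-and-delegate migration discussed in this thread can be sketched in Java 8+ as follows. This is not the code committed to Tika. The class `JavaCommandHolder`, the getter name `getJavaCommandAsList()`, and the default command are hypothetical. The old String-based setter and getter remain, so existing callers still compile and link, and clirr sees no signature change. The new List-based methods hold the real state. Because the getter's return type changes, it needs a new name, which matches the separately named accessor suggested in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical class illustrating a deprecate-and-delegate API migration. */
public class JavaCommandHolder {

    // Illustrative default; each element is one command-line argument.
    private List<String> javaCommand = new ArrayList<>(Arrays.asList("java", "-Xmx32m"));

    /** New setter: arguments are taken as-is, so paths with spaces survive. */
    public void setJavaCommand(List<String> command) {
        this.javaCommand = new ArrayList<>(command);
    }

    /** New getter; it needs a new name because Java cannot overload on return type alone. */
    public List<String> getJavaCommandAsList() {
        return Collections.unmodifiableList(javaCommand);
    }

    /** Old setter, kept so existing callers still link; delegates to the new one. */
    @Deprecated
    public void setJavaCommand(String command) {
        setJavaCommand(Arrays.asList(command.trim().split("\\s+")));
    }

    /** Old getter, kept with its original String return type. */
    @Deprecated
    public String getJavaCommand() {
        return String.join(" ", javaCommand);
    }

    public static void main(String[] args) {
        JavaCommandHolder holder = new JavaCommandHolder();
        holder.setJavaCommand("java  -Xmx1g");              // old style, still works
        System.out.println(holder.getJavaCommandAsList());  // prints [java, -Xmx1g]
        holder.setJavaCommand(Arrays.asList("/opt/my jdk/bin/java", "-Xmx1g"));
        System.out.println(holder.getJavaCommand());        // prints /opt/my jdk/bin/java -Xmx1g
    }
}
```

The deprecated getter is lossy. Joining with spaces means its output cannot be split back into the original arguments, which is one more reason to steer callers to the List-based methods.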

[jira] [Reopened] (TIKA-1587) ForkParser::setJavaCommand should take List

2015-03-30 Thread Konstantin Gribov (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Gribov reopened TIKA-1587: - > ForkParser::setJavaCommand should take List > ---

RE: Broken build because of clirr plugin

2015-03-30 Thread Allison, Timothy B.
How much of an effort would it be to migrate somewhat slowly: leave in but deprecate setCommandLine(String) and String getCommandLine(), and add something like setCommandLineArr(String[]) and String[] getCommandLineArr()? -Original Message- From: Konstantin Gribov [mailto:gros...@gmail.

Broken build because of clirr plugin

2015-03-30 Thread Konstantin Gribov
Hi, folks. I've broken the build (by commit r1670105 for TIKA-1587). Should I revert this commit and change it to preserve the old API, or add an exclude to the clirr plugin configuration? -- Best regards, Konstantin Gribov

RE: [jira] [Commented] (TIKA-1584) Tika 1.7 possible regression (nested attachment files not getting parsed)

2015-03-30 Thread Allison, Timothy B.
Backwards compatibility issue found by clirr on TIKA-1587 [INFO] --- clirr-maven-plugin:2.3:check (default) @ tika-core --- [ERROR] org.apache.tika.fork.ForkParser: Return type of method 'public java.lang.String getJavaCommand()' has been changed to java.util.List [ERROR] org.apache.tika.fork.Fo

[jira] [Commented] (TIKA-1584) Tika 1.7 possible regression (nested attachment files not getting parsed)

2015-03-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386765#comment-14386765 ] Hudson commented on TIKA-1584: -- FAILURE: Integrated in tika-trunk-jdk1.7 #585 (See [https://b

tika-trunk-jdk1.7 - Build # 585 - Failure

2015-03-30 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-trunk-jdk1.7 (build #585) Status: Failure Check console output at https://builds.apache.org/job/tika-trunk-jdk1.7/585/ to view the results.

[jira] [Resolved] (TIKA-1587) ForkParser::setJavaCommand should take List

2015-03-30 Thread Konstantin Gribov (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Gribov resolved TIKA-1587. - Resolution: Fixed Fix Version/s: 1.8 Assignee: Konstantin Gribov Sorry, I f

[jira] [Commented] (TIKA-1587) ForkParser::setJavaCommand should take List

2015-03-30 Thread Konstantin Gribov (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386746#comment-14386746 ] Konstantin Gribov commented on TIKA-1587: - LGTM. Committed with integration test tri

RE: [DISCUSS] Tika 1.8 or 1.7.1

2015-03-30 Thread Allison, Timothy B.
All, I've made the changes that I had hoped to. Grib pdf exclusion remains for any takers. Let me know when I should initiate the run against govdocs1 to see if there are any surprises on that corpus with Tika 1.8. Best, Tim -Original Message- From: Allison, Timothy B. [

[jira] [Assigned] (TIKA-1587) ForkParser::setJavaCommand should take List

2015-03-30 Thread Konstantin Gribov (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Gribov reassigned TIKA-1587: --- Assignee: (was: Konstantin Gribov) > ForkParser::setJavaCommand should take List >

[jira] [Updated] (TIKA-1587) ForkParser::setJavaCommand should take List

2015-03-30 Thread Oleg Oshmyan (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oleg Oshmyan updated TIKA-1587: --- Attachment: TIKA-1587.patch Here’s a patch that changes the existing getter and setter signatures. Is t

[jira] [Assigned] (TIKA-1587) ForkParser::setJavaCommand should take List

2015-03-30 Thread Konstantin Gribov (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Gribov reassigned TIKA-1587: --- Assignee: Konstantin Gribov > ForkParser::setJavaCommand should take List > --

[jira] [Resolved] (TIKA-1584) Tika 1.7 possible regression (nested attachment files not getting parsed)

2015-03-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1584. --- Resolution: Fixed Fix Version/s: 1.8 r1670095. Thank you, [~rtulloh], for raising this issue!

[jira] [Comment Edited] (TIKA-1512) WordParser fails on many Word files

2015-03-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386697#comment-14386697 ] Tim Allison edited comment on TIKA-1512 at 3/30/15 1:54 PM: Tem

RE: including refactored docs from govdocs1 in test suite

2015-03-30 Thread Allison, Timothy B.
I think this is an open question within Tika. Some parsers prefer one thing over another. And there are different levels of corruption. In the two cases where govdocs1 docs might be useful in tests, the hyperlinks in .doc files do not appear to be "standard", but MSWord opens them without a

[jira] [Commented] (TIKA-1512) WordParser fails on many Word files

2015-03-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386720#comment-14386720 ] Hudson commented on TIKA-1512: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #584 (See [https://b

[jira] [Commented] (TIKA-1512) WordParser fails on many Word files

2015-03-30 Thread Yauheni Salopiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386716#comment-14386716 ] Yauheni Salopiy commented on TIKA-1512: --- Hi [~talli...@mitre.org], Thank You very mu

RE: including refactored docs from govdocs1 in test suite

2015-03-30 Thread Tyler Palsulich
Ah. I see. In general, what is the goal with handling corrupted files? Extract as much as possible and fail gracefully? Tyler On Mar 30, 2015 9:32 AM, "Allison, Timothy B." wrote: > > Unfortunately, no. MSOffice fixes the document when I do that. > > -Original Message- > From: Tyler Pa

RE: including refactored docs from govdocs1 in test suite

2015-03-30 Thread Allison, Timothy B.
Unfortunately, no. MSOffice fixes the document when I do that. -Original Message- From: Tyler Palsulich [mailto:tpalsul...@gmail.com] Sent: Monday, March 30, 2015 9:24 AM To: dev@tika.apache.org Subject: Re: including refactored docs from govdocs1 in test suite Can you copy the hyperlin

[jira] [Commented] (TIKA-1512) WordParser fails on many Word files

2015-03-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386697#comment-14386697 ] Tim Allison commented on TIKA-1512: --- Temporary fix ignoring tests and excluding test docs

Re: including refactored docs from govdocs1 in test suite

2015-03-30 Thread Tyler Palsulich
Can you copy the hyperlink into a new doc and change the URL? I have no idea about including the modified version. Tyler On Mar 30, 2015 9:18 AM, "Allison, Timothy B." wrote: > All, > > As part of TIKA-1512, I found that I can delete all of the contents, > including the metadata, except for on

[jira] [Commented] (TIKA-1587) ForkParser::setJavaCommand should take List

2015-03-30 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386685#comment-14386685 ] Tyler Palsulich commented on TIKA-1587: --- Thank you for reporting this! It seems like

including refactored docs from govdocs1 in test suite

2015-03-30 Thread Allison, Timothy B.
All, As part of TIKA-1512, I found that I can delete all of the contents, including the metadata, except for one hyperlink in two documents from govdocs1 and still get the proper behavior -- fail before fix, work after fix. These documents are in the public domain. Is it ok to include th

[jira] [Created] (TIKA-1587) ForkParser::setJavaCommand should take List

2015-03-30 Thread Oleg Oshmyan (JIRA)
Oleg Oshmyan created TIKA-1587: -- Summary: ForkParser::setJavaCommand should take List Key: TIKA-1587 URL: https://issues.apache.org/jira/browse/TIKA-1587 Project: Tika Issue Type: Improvement

[jira] [Updated] (TIKA-1587) ForkParser::setJavaCommand should take List

2015-03-30 Thread Oleg Oshmyan (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oleg Oshmyan updated TIKA-1587: --- Description: ForkParser::setJavaCommand currently takes a string and splits it on whitespace. This make
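The description is cut off here. The title, however, points at the usual problem with splitting on whitespace: an argument that itself contains a space, such as a JDK installed under a path with spaces, gets broken apart. The following Java sketch illustrates that failure mode under that assumption. `WhitespaceSplitDemo` and `splitOnWhitespace` are hypothetical and only mimic a split-on-whitespace setter, and the Windows path is just an example. The list form maps directly onto `java.lang.ProcessBuilder`, one standard way to launch a child JVM, which takes each list element as a single argument.

```java
import java.util.Arrays;
import java.util.List;

public class WhitespaceSplitDemo {

    /** Mimics a String-based setter that tokenizes the command on whitespace. */
    public static List<String> splitOnWhitespace(String command) {
        return Arrays.asList(command.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String javaExe = "C:\\Program Files\\Java\\bin\\java.exe";

        // Splitting cuts the path at its space:
        // prints [C:\Program, Files\Java\bin\java.exe, -Xmx512m]
        System.out.println(splitOnWhitespace(javaExe + " -Xmx512m"));

        // Passing a list keeps each argument intact; ProcessBuilder uses it as-is.
        ProcessBuilder pb = new ProcessBuilder(Arrays.asList(javaExe, "-Xmx512m"));
        // prints [C:\Program Files\Java\bin\java.exe, -Xmx512m]
        System.out.println(pb.command());
    }
}
```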

[jira] [Commented] (TIKA-1511) Create a parser for SQLite3

2015-03-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386648#comment-14386648 ] Hudson commented on TIKA-1511: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #583 (See [https://b

[jira] [Resolved] (TIKA-1511) Create a parser for SQLite3

2015-03-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1511. --- Resolution: Fixed r1670069. Removed "provided" in parsers' pom. Happy to revisit this if there are s

[jira] [Comment Edited] (TIKA-944) Extend tika-server API to be consistent with tika-app CLI

2015-03-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343665#comment-14343665 ] Tim Allison edited comment on TIKA-944 at 3/30/15 11:41 AM: Some

[jira] [Commented] (TIKA-1511) Create a parser for SQLite3

2015-03-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386562#comment-14386562 ] Tim Allison commented on TIKA-1511: --- Thank you, [~thetaphi]. I was aware of about half o

RE: [DISCUSS] Tika 1.8 or 1.7.1

2015-03-30 Thread Allison, Timothy B.
Unless there are objections, I'd like these to be resolved before 1.8: TIKA-1584 -- I'll fix; TIKA-1575 -- Resolved by Konstantin Gribov (thank you!); TIKA-1512 -- I'll put in a temporary fix so that we don't get IOOBEs, but I'll leave this open and do some more digging to see if we need to open a