Re: [DISCUSS] Enable specific ContentHandler for tika-server

2017-10-24 Thread Chris Mattmann
This makes sense to me, +1 Giuseppe! On 10/24/17, 6:12 PM, "Giuseppe Totaro" wrote: Hi folks, I am developing the proposed solutions within tika-server for enabling specific ContentHandlers. Basically, I am working on the ability to specify the name of the ContentHa

Re: [DISCUSS] Enable specific ContentHandler for tika-server

2017-10-24 Thread Giuseppe Totaro
Hi folks, I am developing the proposed solutions within tika-server for enabling specific ContentHandlers. Basically, I am working on the ability to specify the name of the ContentHandler to use, either on the command line or via an HTTP header. In order to complete my work, I would like to get you

Re: Tika 2 parsers

2017-10-24 Thread Sergey Beryozkin
I did try the modules in the earlier version of the CXF demo, see the right panel, https://github.com/apache/cxf/commit/c2ccecb23ba23497c95be89f9b37f38c69faba7a#diff-b5ed531ebf92978dcbcf1ac6cc6331c0 They should be available in the snapshot repo Cheers, Sergey On 24/10/17 19:45, Allison, Timoth

[jira] [Resolved] (TIKA-1788) message/rfc822 parser doesn't identify attachment filenames from Content-Disposition header

2017-10-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1788. --- Resolution: Fixed Fix Version/s: 1.17 Thank you, AarjavP! > message/rfc822 parser doesn't ident

[jira] [Comment Edited] (TIKA-2478) MBOX import includes redundant copies of the text

2017-10-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16217509#comment-16217509 ] Tim Allison edited comment on TIKA-2478 at 10/24/17 7:30 PM: - F

[jira] [Updated] (TIKA-2478) MBOX import includes redundant copies of the text

2017-10-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2478: -- Attachment: TIKA-2478.patch First patch. This incorporates the test file from TIKA-2471 and [~kkrugler]'

RE: Tika 2 parsers

2017-10-24 Thread Allison, Timothy B.
We'll switch master over to the 2.0 layout after our next release, which should happen shortly after the release of PDFBox 2.0.8...roughly in the next week for PDFBox, next month for Tika. We have abandoned keeping the current 2.x up to date, and I was hoping there would at least be a build her

Tika 2 parsers

2017-10-24 Thread Gethin James
Hi, I am interested in trying the more modular approach of using the Tika 2 parsers. Are the Tika 2 artifacts available in a Maven repo somewhere? Is there any documentation on how to use them or how they differ from Tika 1? Thanks, Gethin.