[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144865#comment-13144865
]
Joseph Vychtrle commented on TIKA-772:
Funny thing Jukka, I will talk to Cedric Beust
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144862#comment-13144862
]
Jukka Zitting commented on TIKA-772:
The metacharacters you mention do sound suspicious.
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144855#comment-13144855
]
Joseph Vychtrle commented on TIKA-772:
Attached... I'm on Linux, using UTF-8 encoding
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joseph Vychtrle updated TIKA-772:
Attachment: it.html
> media type detection fails for html documents, results in text/plain inst
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144854#comment-13144854
]
Jukka Zitting commented on TIKA-772:
The test case you added prints out "text/html" for
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144853#comment-13144853
]
Joseph Vychtrle commented on TIKA-772:
But to be honest, it makes sense. Tika doesn't
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144851#comment-13144851
]
Joseph Vychtrle commented on TIKA-772:
Weird,
{noformat}
java -jar tika-app-0.10.jar -
Hi Chris,
On 4 November 2011 15:42, Mattmann, Chris A (388J) <
chris.a.mattm...@jpl.nasa.gov> wrote:
>
> Please vote on releasing this package as Apache Tika 1.0.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
>[X] +1 Releas
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144849#comment-13144849
]
Jukka Zitting commented on TIKA-772:
The latter method also makes the .html suffix avail
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144840#comment-13144840
]
Joseph Vychtrle commented on TIKA-772:
Got it, if I do
{code}tika.detect(TikaInputStr
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144836#comment-13144836
]
Jukka Zitting commented on TIKA-772:
I piped the files to tika-app to prevent it from se
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144828#comment-13144828
]
Joseph Vychtrle commented on TIKA-772:
MimeType detector doesn't find it, name of the
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joseph Vychtrle updated TIKA-772:
Attachment: tika.png
I don't know then. Take a look at my results with Tika v0.10
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jukka Zitting resolved TIKA-772.
Resolution: Cannot Reproduce
Assignee: Jukka Zitting
Works for me:
{code}
$ for f in *.html; d
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joseph Vychtrle updated TIKA-772:
Attachment: html.zip
> media type detection fails for html documents, results in text/plain ins
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144772#comment-13144772
]
Joseph Vychtrle commented on TIKA-772:
Hey Jukka,
I found it happened only for html
[
https://issues.apache.org/jira/browse/TIKA-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144763#comment-13144763
]
Jukka Zitting commented on TIKA-772:
Can you attach an example document that illustrates
+1
BR
Christian
_
From: Mattmann, Chris A (388J) [mailto:chris.a.mattm...@jpl.nasa.gov]
To: dev@tika.apache.org [mailto:dev@tika.apache.org]
Cc: u...@tika.apache.org [mailto:u...@tika.apache.org]
Sent: Fri, 04 Nov 2011 16:42:29 +0100
Subject: [VOTE] Apache Tika 1.0 release rc #1
Hi Folk
I would love to see better integration w/ dynamic languages!
I can help on the Python side. Can we simply wrap Tika's APIs using
jcc, to expose in Python? Ooh, it's already been done:
http://redmine.djity.net/projects/pythontika/wiki
Mike McCandless
http://blog.mikemccandless.com
2011/11/5 Jé
>
> I totally am. I've got some PHP skillz and Python skillz
> that I would be willing to throw into the mix here.
>
Yes, I have some basic skillz on Python, and some advanced skillz on PHP,
so I can help you!
> One other thing along these lines I've had in mind for a while:
> how cool would it b
[
https://issues.apache.org/jira/browse/TIKA-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144652#comment-13144652
]
Michael McCandless commented on TIKA-529:
This patch looks safe, and avoids crazy a