[
https://issues.apache.org/jira/browse/TIKA-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445415#comment-13445415
]
Markus Jelsma commented on TIKA-980:
No, the Any23 parser is DOM-based and the Microdata
[
https://issues.apache.org/jira/browse/TIKA-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445351#comment-13445351
]
Ken Krugler commented on TIKA-539:
--
I'm looking at refactoring the detector code in Tika, t
On Aug 29, 2012, at 8:55am, chraj007 wrote:
> Hello,
> I'm trying to parse a file whose content type is UTF-16. I'm unable to
> parse the document using the following code. Please help me.
>
> ContentHandler textHandler = new BodyContentHandler();
> TeeContentHandler teeHandler
Hey Ken,
I personally don't care too much about having @author tags, or not having them,
but I know there are others more passionate (for example about NOT having them)
:)
Cheers,
Chris
On Aug 30, 2012, at 2:03 PM, Ken Krugler wrote:
> Hi all,
>
> I'm wondering if we've got any convention fo
On Aug 29, 2012, at 9:24am, Jukka Zitting wrote:
> Hi,
>
> On Wed, Aug 29, 2012 at 6:02 PM, chraj007 wrote:
>> http://lucene.472066.n3.nabble.com/file/n4004078/test.html test.html
>
> Looks like that file has an incorrect http-equiv declaration:
>
>
>
> The encoding of the file is not UT
[
https://issues.apache.org/jira/browse/TIKA-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445293#comment-13445293
]
Ken Krugler commented on TIKA-980:
--
Hi Markus - in general looks good. Did the guts of Micr
[
https://issues.apache.org/jira/browse/TIKA-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler reassigned TIKA-980:
Assignee: Ken Krugler
> MicrodataContentHandler for Apache Tika
> ---
Hi all,
I'm wondering if we've got any convention for including/excluding @author tags.
I remember discussion on other Apache project lists about explicitly not
including these, but I see 19 in the trunk Tika source.
Asking because some patch code being contributed has @author tags.
Thanks,
-
Hi Jukka,
I was looking into a failure in a Bixo test, when using BodyContentHandler
(wrapped by XHTMLContentHandler).
The issue is that BodyContentHandler uses MatchingContentHandler to find only
text in nodes under the /html/body hierarchy.
And this in turn winds up not matching the element
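
To illustrate the kind of path-restricted text extraction described above, here is a minimal sketch using only the JDK's SAX classes. It is not Tika's MatchingContentHandler or BodyContentHandler; the class name BodyTextSketch, the nested BodyOnlyHandler, and the method extractBodyText are hypothetical names chosen for this example. It assumes well-formed XHTML input with no namespaces and exact lower-case element names, and it keeps character data only while the open-element path begins with /html/body.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only: this is NOT Tika's implementation.
public class BodyTextSketch {

    /** Collects character data only while the open-element path starts with /html/body. */
    static class BodyOnlyHandler extends DefaultHandler {
        private final Deque<String> path = new ArrayDeque<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            path.addLast(qName); // push the element onto the current path
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            path.removeLast(); // pop the element when it closes
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (underBody()) {
                text.append(ch, start, length);
            }
        }

        /** True when the first two open elements are exactly "html" then "body". */
        private boolean underBody() {
            if (path.size() < 2) {
                return false;
            }
            Iterator<String> it = path.iterator();
            return "html".equals(it.next()) && "body".equals(it.next());
        }

        String text() {
            return text.toString();
        }
    }

    /** Parses the given XHTML string and returns the text found under /html/body. */
    public static String extractBodyText(String xhtml) throws Exception {
        BodyOnlyHandler handler = new BodyOnlyHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.text();
    }

    public static void main(String[] args) throws Exception {
        String doc = "<html><head><title>T</title></head><body><p>Hello</p></body></html>";
        System.out.println(extractBodyText(doc)); // prints "Hello"; the title text is skipped
    }
}
```

Because the match here is an exact string comparison on element names, any element that reaches the handler under a different name or path is silently dropped along with its text, which is the general failure mode a path-based matcher is exposed to.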
[
https://issues.apache.org/jira/browse/TIKA-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated TIKA-986:
Attachment: TIKA-986.patch
Awesome, thanks Robert: new patch with test case. I think it's re
[
https://issues.apache.org/jira/browse/TIKA-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated TIKA-986:
-
Attachment: smime.p7s
> NullPointerException trying to parse detached .pk7s signature
> ---
[
https://issues.apache.org/jira/browse/TIKA-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated TIKA-986:
Attachment: TIKA-986.patch
Patch, I think it's ready.
The example I have isn't shareable ...
Michael McCandless created TIKA-986:
---
Summary: NullPointerException trying to parse detached .pk7s
signature
Key: TIKA-986
URL: https://issues.apache.org/jira/browse/TIKA-986
Project: Tika
[
https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated TIKA-985:
---
Attachment: TIKA-985-1.3-2.patch
Here's a new patch listing all HTML5 elements that are missing in the
[
https://issues.apache.org/jira/browse/TIKA-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated TIKA-985:
---
Attachment: TIKA-985-1.3-1.patch
Here's a preliminary patch for 1.3. It adds some HTML5 elements to Tag
Markus Jelsma created TIKA-985:
--
Summary: Support for HTML5 elements
Key: TIKA-985
URL: https://issues.apache.org/jira/browse/TIKA-985
Project: Tika
Issue Type: Improvement
Components: