[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914482#comment-16914482
]
Ken Krugler commented on TIKA-1599:
---
From TIKA-2928, an example of text that fails with
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-1599:
--
Priority: Major (was: Minor)
> Switch from TagSoup to JSoup
[
https://issues.apache.org/jira/browse/TIKA-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914481#comment-16914481
]
Ken Krugler commented on TIKA-2928:
---
Hi [~Sargent_D] - thanks for trying this out! I'm g
[
https://issues.apache.org/jira/browse/TIKA-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2928:
--
Issue Type: Improvement (was: Bug)
Priority: Minor (was: Major)
> Less than sign within tag boun
[
https://issues.apache.org/jira/browse/TIKA-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913382#comment-16913382
]
Ken Krugler commented on TIKA-2928:
---
The issue isn't that this is "somewhat non-standard
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869004#comment-16869004
]
Ken Krugler commented on TIKA-2790:
---
Hi [~talli...@apache.org] - I finally got around to
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856107#comment-16856107
]
Ken Krugler commented on TIKA-2790:
---
[~talli...@apache.org] - I'd have to look at the co
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856052#comment-16856052
]
Ken Krugler commented on TIKA-2790:
---
Yalder processes the entire string. I thought Optim
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836738#comment-16836738
]
Ken Krugler commented on TIKA-2790:
---
Hi [~talli...@apache.org] - thanks for running the
[
https://issues.apache.org/jira/browse/TIKA-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812492#comment-16812492
]
Ken Krugler commented on TIKA-2849:
---
Hi [~boris-petrov] - two things here. First, do you
[
https://issues.apache.org/jira/browse/TIKA-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710767#comment-16710767
]
Ken Krugler commented on TIKA-2794:
---
Hi [~phallett] - it's better if you first post some
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707822#comment-16707822
]
Ken Krugler commented on TIKA-2790:
---
[~talli...@apache.org] - I've compared Yalder to Op
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707521#comment-16707521
]
Ken Krugler commented on TIKA-2790:
---
Yalder is about 2-2.5x faster than language-detecto
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707343#comment-16707343
]
Ken Krugler commented on TIKA-2790:
---
My concern with OpenNLP is that during a web crawl,
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707292#comment-16707292
]
Ken Krugler commented on TIKA-2790:
---
Hi [~talli...@apache.org] - Is there an issue with
[
https://issues.apache.org/jira/browse/TIKA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658028#comment-16658028
]
Ken Krugler commented on TIKA-2758:
---
[~markus17] - My comment above was about the previo
[
https://issues.apache.org/jira/browse/TIKA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657976#comment-16657976
]
Ken Krugler edited comment on TIKA-2758 at 10/20/18 7:51 PM:
-
[
https://issues.apache.org/jira/browse/TIKA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657976#comment-16657976
]
Ken Krugler commented on TIKA-2758:
---
At least for the "detroidnews.html" file, I believe
[
https://issues.apache.org/jira/browse/TIKA-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler resolved TIKA-2683.
---
Resolution: Fixed
Fixed via [PR #243|https://github.com/apache/tika/commit/8851d511c4768a3200eafa0623
[
https://issues.apache.org/jira/browse/TIKA-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler reassigned TIKA-2683:
-
Assignee: Ken Krugler
> Missing space and inappropriate new-line in Boilerpipe extracted text
[
https://issues.apache.org/jira/browse/TIKA-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16536396#comment-16536396
]
Ken Krugler commented on TIKA-2648:
---
[~wastl-nagel] - you mentioned that you thought thi
[
https://issues.apache.org/jira/browse/TIKA-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2671:
--
Description:
org.apache.tika.parser.html.HtmlEncodingDetector ignores the document's
metadata. So when
[
https://issues.apache.org/jira/browse/TIKA-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2671:
--
Component/s: detector
> HtmlEncodingDetector doesnt take provided metadata into account
[
https://issues.apache.org/jira/browse/TIKA-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516644#comment-16516644
]
Ken Krugler commented on TIKA-2671:
---
Hi [~gbouchar] - I'm curious how much testing you d
[
https://issues.apache.org/jira/browse/TIKA-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16514355#comment-16514355
]
Ken Krugler commented on TIKA-2671:
---
Unfortunately there's no great solution here. Ideal
[
https://issues.apache.org/jira/browse/TIKA-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493927#comment-16493927
]
Ken Krugler commented on TIKA-2654:
---
Hi Ankit - for problems encountered while building/
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482586#comment-16482586
]
Ken Krugler commented on TIKA-2643:
---
When you've got conflicting jars on the classpath, y
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481791#comment-16481791
]
Ken Krugler commented on TIKA-2643:
---
Looking at the crash log, I see the following duplic
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481786#comment-16481786
]
Ken Krugler commented on TIKA-2643:
---
Hi [~fyemaple] - how do you know that Tika 1.5 (or a
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479468#comment-16479468
]
Ken Krugler commented on TIKA-2643:
---
[~fyemaple] - yes, but note that {{kill -QUIT doesn
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477811#comment-16477811
]
Ken Krugler commented on TIKA-2643:
---
[~talli...@apache.org] - different versions of frame
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477513#comment-16477513
]
Ken Krugler commented on TIKA-2643:
---
If I was going to guess, it's that your Cloudera ins
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16384242#comment-16384242
]
Ken Krugler commented on TIKA-2592:
---
[~AndreasMeier] - I assume when you said:
{quote}I d
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2592:
--
Attachment: IANA Charset names.txt
> HTML with charset unicode handled as utf-16 instead utf-8
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2592:
--
Priority: Minor (was: Major)
> HTML with charset unicode handled as utf-16 instead utf-8
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2592:
--
Issue Type: Improvement (was: Bug)
> HTML with charset unicode handled as utf-16 instead utf-8
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382330#comment-16382330
]
Ken Krugler commented on TIKA-2592:
---
Before making this kind of change (default "unicode"
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380874#comment-16380874
]
Ken Krugler commented on TIKA-2592:
---
Hi [~AndreasMeier] - actually "unicode" is a support
[
https://issues.apache.org/jira/browse/TIKA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379747#comment-16379747
]
Ken Krugler commented on TIKA-2576:
---
[~talli...@mitre.org] - After some grepping, I found
[
https://issues.apache.org/jira/browse/TIKA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377744#comment-16377744
]
Ken Krugler commented on TIKA-2576:
---
Is this going to trigger more warnings in the logs?
[
https://issues.apache.org/jira/browse/TIKA-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler resolved TIKA-2539.
---
Resolution: Duplicate
> TagSoup HTML parser is project EOL
[
https://issues.apache.org/jira/browse/TIKA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215838#comment-16215838
]
Ken Krugler commented on TIKA-2478:
---
Hi [~talli...@apache.org] - I've attached two mixed
[
https://issues.apache.org/jira/browse/TIKA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2478:
--
Attachment: mixed-simple
mixed-with-pdf-inline
> MBOX import includes redundant copies of
[
https://issues.apache.org/jira/browse/TIKA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214491#comment-16214491
]
Ken Krugler commented on TIKA-2478:
---
I recently had to dig into extracting text from emai
[
https://issues.apache.org/jira/browse/TIKA-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213150#comment-16213150
]
Ken Krugler commented on TIKA-2471:
---
Hi [~talli...@apache.org] - I don't think using MBox
[
https://issues.apache.org/jira/browse/TIKA-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212870#comment-16212870
]
Ken Krugler commented on TIKA-2482:
---
Hi [~cermar] - in general it's best to first post th
[
https://issues.apache.org/jira/browse/TIKA-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195386#comment-16195386
]
Ken Krugler commented on TIKA-2472:
---
I had to deal with this before in another project -
[
https://issues.apache.org/jira/browse/TIKA-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423280#comment-15423280
]
Ken Krugler commented on TIKA-2056:
---
Hi [~chrismattmann] - I haven't actually dealt with
[
https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2038:
--
Description:
Currently, Tika uses icu4j for detecting charset encoding of HTML documents as
well as the
[
https://issues.apache.org/jira/browse/TIKA-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378434#comment-15378434
]
Ken Krugler commented on TIKA-2033:
---
Yes, of course...I was thinking of whether we'd want
[
https://issues.apache.org/jira/browse/TIKA-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378358#comment-15378358
]
Ken Krugler commented on TIKA-2033:
---
Do you have a suggestion for how the text should app
[
https://issues.apache.org/jira/browse/TIKA-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332124#comment-15332124
]
Ken Krugler commented on TIKA-2010:
---
OK - I think then we'll want to escalate [TIKA-1599]
[
https://issues.apache.org/jira/browse/TIKA-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2010:
--
Priority: Minor (was: Major)
Issue Type: Improvement (was: Bug)
> Unable to get value when heade
[
https://issues.apache.org/jira/browse/TIKA-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15331829#comment-15331829
]
Ken Krugler commented on TIKA-2010:
---
Would it be possible for you to try this broken HTML
[
https://issues.apache.org/jira/browse/TIKA-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler closed TIKA-1938.
-
Resolution: Fixed
Fix with commit da5bbbe..46d5775.
Thanks Joseph!
> HtmlParser drops elements found ins
[
https://issues.apache.org/jira/browse/TIKA-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler reassigned TIKA-1938:
-
Assignee: Ken Krugler
> HtmlParser drops elements found inside
[
https://issues.apache.org/jira/browse/TIKA-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227078#comment-15227078
]
Ken Krugler commented on TIKA-1835:
---
I’d rolled in Markus’s patch directly to support the
[
https://issues.apache.org/jira/browse/TIKA-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-1896:
--
Priority: Minor (was: Major)
Issue Type: Improvement (was: Bug)
> Invalid closing script tag not
[
https://issues.apache.org/jira/browse/TIKA-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218412#comment-15218412
]
Ken Krugler commented on TIKA-1896:
---
Hi Tim - hmm, changing the type of the script tag fr
[
https://issues.apache.org/jira/browse/TIKA-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202149#comment-15202149
]
Ken Krugler commented on TIKA-1855:
---
In general I'd still prefer to keep test data with i
[
https://issues.apache.org/jira/browse/TIKA-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167891#comment-15167891
]
Ken Krugler commented on TIKA-1855:
---
The things I don't like about this approach are that
[
https://issues.apache.org/jira/browse/TIKA-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15165642#comment-15165642
]
Ken Krugler commented on TIKA-1855:
---
I'm ok with having some duplicated test files - thou
[
https://issues.apache.org/jira/browse/TIKA-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150618#comment-15150618
]
Ken Krugler commented on TIKA-1858:
---
Hi Raghu,
This is a great question for the user mai
[
https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15145135#comment-15145135
]
Ken Krugler commented on TIKA-1851:
---
+1 for the proposal. Let me know if you want me to t
[
https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141632#comment-15141632
]
Ken Krugler commented on TIKA-1851:
---
Hi [~talli...@apache.org] - thanks for generating th
[
https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136079#comment-15136079
]
Ken Krugler commented on TIKA-1851:
---
After poking around a bit, my vote would be to (a) m
[
https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136077#comment-15136077
]
Ken Krugler commented on TIKA-1723:
---
OK, I've committed this code to a new tika-langdetec
[
https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136003#comment-15136003
]
Ken Krugler commented on TIKA-1851:
---
I got a clean build w/o any pre-installed modules, s
[
https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135342#comment-15135342
]
Ken Krugler commented on TIKA-1851:
---
Hmm, now the top-level build fails on the tika parse
[
https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135336#comment-15135336
]
Ken Krugler commented on TIKA-1851:
---
I did a top-level "mvn clean install", which failed
[
https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133629#comment-15133629
]
Ken Krugler commented on TIKA-1851:
---
I'm also curious why we have Groovy code and shell s
[
https://issues.apache.org/jira/browse/TIKA-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133624#comment-15133624
]
Ken Krugler commented on TIKA-1851:
---
Hi [~talli...@apache.org] - I'm also getting a local
[
https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132961#comment-15132961
]
Ken Krugler commented on TIKA-1723:
---
Good idea re gathering input - I just emailed the de
[
https://issues.apache.org/jira/browse/TIKA-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15131749#comment-15131749
]
Ken Krugler commented on TIKA-1824:
---
As someone who regularly deals with 100s of jars in
[
https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130676#comment-15130676
]
Ken Krugler commented on TIKA-1723:
---
[~talli...@apache.org] I must admit, focusing on thi
[
https://issues.apache.org/jira/browse/TIKA-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130666#comment-15130666
]
Ken Krugler commented on TIKA-1848:
---
Unless I'm not understanding the issues properly, I
[
https://issues.apache.org/jira/browse/TIKA-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1558#comment-1558
]
Ken Krugler edited comment on TIKA-1835 at 1/21/16 7:36 PM:
Git
[
https://issues.apache.org/jira/browse/TIKA-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler resolved TIKA-1835.
---
Resolution: Fixed
Git commit 489ab93..fe841bc
> LinkContentHandler skips iframe and rel tags
[
https://issues.apache.org/jira/browse/TIKA-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler reassigned TIKA-1835:
-
Assignee: Ken Krugler
> LinkContentHandler skips iframe and rel tags
[
https://issues.apache.org/jira/browse/TIKA-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15109054#comment-15109054
]
Ken Krugler commented on TIKA-1838:
---
Hi Raymond - this is a question that you should post
[
https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106908#comment-15106908
]
Ken Krugler commented on TIKA-1836:
---
This seems to be an issue for POI, as per the messag
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048819#comment-15048819
]
Ken Krugler commented on TIKA-1599:
---
I think we'd be wanting to parse the raw crawl resul
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048806#comment-15048806
]
Ken Krugler commented on TIKA-1599:
---
Hi [~markus.jel...@openindex.io] - I was actually ta
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048773#comment-15048773
]
Ken Krugler commented on TIKA-1599:
---
I'm hoping we could use one or the other, as I don't
[
https://issues.apache.org/jira/browse/TIKA-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047029#comment-15047029
]
Ken Krugler commented on TIKA-1808:
---
Hi Markus - I don't think this is actually a bug. I
[
https://issues.apache.org/jira/browse/TIKA-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006797#comment-15006797
]
Ken Krugler commented on TIKA-1794:
---
Tika uses XHTML 1.0, which doesn't allow the form-fe
[
https://issues.apache.org/jira/browse/TIKA-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006743#comment-15006743
]
Ken Krugler commented on TIKA-1794:
---
The output of the Tika parse process is XHTML, and I
[
https://issues.apache.org/jira/browse/TIKA-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984111#comment-14984111
]
Ken Krugler commented on TIKA-1443:
---
Hi [~talli...@apache.org] - I did look at it, and re
[
https://issues.apache.org/jira/browse/TIKA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901434#comment-14901434
]
Ken Krugler commented on TIKA-1726:
---
[~talli...@apache.org] had asked for input on this -
[
https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729595#comment-14729595
]
Ken Krugler commented on TIKA-1723:
---
Biggest remaining issue before I commit is how to de
[
https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729588#comment-14729588
]
Ken Krugler commented on TIKA-1723:
---
Hi Tim,
1. Not sure about "Make language detection
[
https://issues.apache.org/jira/browse/TIKA-491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729432#comment-14729432
]
Ken Krugler commented on TIKA-491:
--
Currently the language-detector library I'm integrating
[
https://issues.apache.org/jira/browse/TIKA-491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler reassigned TIKA-491:
Assignee: Ken Krugler
> Add language identification support for Norwegian Bokmål and Norwegian Nynors
[
https://issues.apache.org/jira/browse/TIKA-492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729427#comment-14729427
]
Ken Krugler commented on TIKA-492:
--
Currently the language-detector library I'm integrating
[
https://issues.apache.org/jira/browse/TIKA-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729416#comment-14729416
]
Ken Krugler commented on TIKA-856:
--
The language-detector project has support for Japanese,
[
https://issues.apache.org/jira/browse/TIKA-568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729414#comment-14729414
]
Ken Krugler commented on TIKA-568:
--
The new LanguageDetector API has a getRawScore() call o
[
https://issues.apache.org/jira/browse/TIKA-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler reassigned TIKA-568:
Assignee: Ken Krugler
> Language Detection isReasonablyCertain() hides valuable information
[
https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729250#comment-14729250
]
Ken Krugler commented on TIKA-1723:
---
Regarding the current detection code...
I'm going t
[
https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-1723:
--
Attachment: TIKA-1723-3.patch
New patch which uses Locale to handle language names (language tags).
> In
[
https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726266#comment-14726266
]
Ken Krugler commented on TIKA-1723:
---
Hi Tim - I just attached a new version of my patch,
1 - 100 of 395 matches