[jira] [Closed] (TIKA-2829) Security Vulnerability in boilerpipe (CVE-2018-16481)

2022-02-03 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth William Krugler closed TIKA-2829. - > Security Vulnerability in boilerpipe (CVE-2018-16481) >

[jira] [Resolved] (TIKA-2829) Security Vulnerability in boilerpipe (CVE-2018-16481)

2022-02-03 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth William Krugler resolved TIKA-2829. --- Resolution: Not A Bug > Security Vulnerability in boilerpipe (CVE-2018-16481)

[jira] [Assigned] (TIKA-2829) Security Vulnerability in boilerpipe (CVE-2018-16481)

2022-01-24 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth William Krugler reassigned TIKA-2829: - Assignee: Kenneth William Krugler > Security Vulnerability in boilerpipe

[jira] [Commented] (TIKA-2829) Security Vulnerability in boilerpipe (CVE-2018-16481)

2022-01-24 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481434#comment-17481434 ] Kenneth William Krugler commented on TIKA-2829: --- Hi Alex - I took a look at

[jira] [Commented] (TIKA-3546) Side effects of setting WriteOutContentHandler write limit as -1 are unknown

2021-09-13 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17414273#comment-17414273 ] Kenneth William Krugler commented on TIKA-3546: --- [~yash.mehta] As per the we

[jira] [Commented] (TIKA-3510) tika-parser-scientific-module seems to embbed many dependencies

2021-08-05 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394139#comment-17394139 ] Kenneth William Krugler commented on TIKA-3510: --- I'd vote for option #1. The

[jira] [Commented] (TIKA-3471) Some ideas

2021-07-09 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378304#comment-17378304 ] Kenneth William Krugler commented on TIKA-3471: --- Hi [~IvanRadovanovic] - it'

[jira] [Resolved] (TIKA-3471) Some ideas

2021-07-09 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth William Krugler resolved TIKA-3471. --- Resolution: Won't Fix > Some ideas > -- > > Key: TIKA-

[jira] [Commented] (TIKA-3466) Cannot detect mimetype of xhtml file when script is first node instead of html

2021-07-07 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376851#comment-17376851 ] Kenneth William Krugler commented on TIKA-3466: --- Hi [~psakkanan] - that name

[jira] [Commented] (TIKA-3466) Cannot detect mimetype of xhtml file when script is first node instead of html

2021-07-07 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376771#comment-17376771 ] Kenneth William Krugler commented on TIKA-3466: --- Browsers do all kinds of he

[jira] [Commented] (TIKA-3466) Cannot detect mimetype of xhtml file when script is first node instead of html

2021-07-07 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376712#comment-17376712 ] Kenneth William Krugler commented on TIKA-3466: --- This looks like broken HTML

[jira] [Commented] (TIKA-3464) Is it possible to extract individual pdf pages using Tika Server?

2021-07-07 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376701#comment-17376701 ] Kenneth William Krugler commented on TIKA-3464: --- Hi [~sallas] - just FYI, th

[jira] [Commented] (TIKA-3263) WriteLimitReachedException is not public

2021-05-03 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338637#comment-17338637 ] Kenneth William Krugler commented on TIKA-3263: --- Isn't this a duplicate of -

[jira] [Commented] (TIKA-3382) Improve writelimitreached handling

2021-05-03 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338636#comment-17338636 ] Kenneth William Krugler commented on TIKA-3382: --- I'm going to call this out

[jira] [Commented] (TIKA-3375) Release new version

2021-04-28 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334847#comment-17334847 ] Kenneth William Krugler commented on TIKA-3375: --- Yes. And I think it's best

[jira] [Commented] (TIKA-3343) Remove Tika custom lang detection for 2.x

2021-03-30 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311844#comment-17311844 ] Kenneth William Krugler commented on TIKA-3343: --- [~tallison] - I didn't find

[jira] [Commented] (TIKA-3339) Update tika's log framework from log4j to logback

2021-03-29 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310701#comment-17310701 ] Kenneth William Krugler commented on TIKA-3339: --- Hi [~wyx-98] - could you ad

[jira] [Updated] (TIKA-3339) Update tika's log framework from log4j to logback

2021-03-29 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth William Krugler updated TIKA-3339: -- Summary: Update tika's log framework from log4j to logback (was: update tika's

[jira] [Commented] (TIKA-3326) Code cleaning and Javadoc

2021-03-16 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302783#comment-17302783 ] Kenneth William Krugler commented on TIKA-3326: --- Hi [~subhajitdas298] - for

[jira] [Updated] (TIKA-3263) WriteLimitReachedException is not public

2021-01-05 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth William Krugler updated TIKA-3263: -- Issue Type: Improvement (was: Bug) > WriteLimitReachedException is not public >

[jira] [Updated] (TIKA-3263) WriteLimitReachedException is not public

2021-01-05 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth William Krugler updated TIKA-3263: -- Priority: Minor (was: Major) > WriteLimitReachedException is not public > -

[jira] [Commented] (TIKA-3263) WriteLimitReachedException is not public

2021-01-05 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259279#comment-17259279 ] Kenneth William Krugler commented on TIKA-3263: --- I've meant to comment on th

[jira] [Commented] (TIKA-3255) Parsing MP3 file with record size > 100000 fails

2020-12-22 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253631#comment-17253631 ] Kenneth William Krugler commented on TIKA-3255: --- Hi [~tallison] - looks like

[jira] [Updated] (TIKA-3255) Parsing MP3 file with record size > 100000 fails

2020-12-22 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth William Krugler updated TIKA-3255: -- Summary: Parsing MP3 file with record size > 100000 fails (was: Parsing MP3 fil

[jira] [Issue Comment Deleted] (TIKA-3255) Parsing MP3 file with record > 100000

2020-12-22 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth William Krugler updated TIKA-3255: -- Comment: was deleted (was: A description of what happens, or a title that indica

[jira] [Commented] (TIKA-3255) Parsing MP3 file with record > 100000

2020-12-22 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253630#comment-17253630 ] Kenneth William Krugler commented on TIKA-3255: --- A description of what happe

[jira] [Closed] (TIKA-3239) TikaException: data length must be < 1000000

2020-11-30 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth William Krugler closed TIKA-3239. - Resolution: Not A Problem > TikaException: data length must be < 1000000 > ---

[jira] [Commented] (TIKA-3239) TikaException: data length must be < 1000000

2020-11-30 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240795#comment-17240795 ] Kenneth William Krugler commented on TIKA-3239: --- Hi [~harirehm] - this is th

[jira] [Commented] (TIKA-3235) Build failure caused by timeouts in XMLReaderUtils

2020-11-24 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238388#comment-17238388 ] Kenneth William Krugler commented on TIKA-3235: --- It still happens on occasio

[jira] [Commented] (TIKA-3235) Build failure caused by timeouts in XMLReaderUtils

2020-11-24 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238215#comment-17238215 ] Kenneth William Krugler commented on TIKA-3235: --- I took a quick look at the

[jira] [Commented] (TIKA-3235) Build failure caused by timeouts in XMLReaderUtils

2020-11-23 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237713#comment-17237713 ] Kenneth William Krugler commented on TIKA-3235: --- A second build attempt didn

[jira] [Commented] (TIKA-3233) aaaa

2020-11-21 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236685#comment-17236685 ] Kenneth William Krugler commented on TIKA-3233: --- Hi [~863775910] - please fi

[jira] [Commented] (TIKA-3204) License incompliance with xmp-core 6.1.10

2020-09-24 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201587#comment-17201587 ] Kenneth William Krugler commented on TIKA-3204: --- And isn't the root issue th

[jira] [Commented] (TIKA-3204) License incompliance with xmp-core 6.1.10

2020-09-24 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201583#comment-17201583 ] Kenneth William Krugler commented on TIKA-3204: --- It's odd that 5.1.3 is unde

[jira] [Commented] (TIKA-3183) Tika 2.0.0 -- Move all versions to properties in tika-parent

2020-08-21 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182077#comment-17182077 ] Kenneth William Krugler commented on TIKA-3183: --- Apologies in advance if thi

[jira] [Commented] (TIKA-3155) Parse Error while extracting CSV files

2020-08-11 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175608#comment-17175608 ] Kenneth William Krugler commented on TIKA-3155: --- For the common cases, it wo

[jira] [Commented] (TIKA-3153) Text File identified as message/rfc822

2020-08-10 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174976#comment-17174976 ] Kenneth William Krugler commented on TIKA-3153: --- I think that for many text-

[jira] [Commented] (TIKA-3147) String punctuation in lang id component within tika-eval

2020-07-27 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165745#comment-17165745 ] Kenneth William Krugler commented on TIKA-3147: --- Hi [~tallison] - can you in

[jira] [Commented] (TIKA-3123) request to parse Chinese, but return Russian

2020-06-23 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142921#comment-17142921 ] Kenneth William Krugler commented on TIKA-3123: --- This looks like a character

[jira] [Commented] (TIKA-3114) Error reading transcript from document

2020-06-12 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134590#comment-17134590 ] Kenneth William Krugler commented on TIKA-3114: --- [~dbalasub] - unfortunately

[jira] [Commented] (TIKA-3115) Detect parquet files

2020-06-12 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134535#comment-17134535 ] Kenneth William Krugler commented on TIKA-3115: --- What would be the data you'

[jira] [Commented] (TIKA-3115) Detect parquet files

2020-06-12 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134511#comment-17134511 ] Kenneth William Krugler commented on TIKA-3115: --- Sadly, 'PAR1' is about all

[jira] [Commented] (TIKA-3114) Error reading transcript from document

2020-06-11 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133808#comment-17133808 ] Kenneth William Krugler commented on TIKA-3114: --- Hi [~dbalasub] - unfortunat

[jira] [Commented] (TIKA-3114) Error reading transcript from document

2020-06-11 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133798#comment-17133798 ] Kenneth William Krugler commented on TIKA-3114: --- Hi [~dbalasub] - please att

[jira] [Commented] (TIKA-3109) Ingest attachment: failed to extract text from iframe

2020-06-10 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17132737#comment-17132737 ] Kenneth William Krugler commented on TIKA-3109: --- I think we have to treat it

[jira] [Commented] (TIKA-3109) Ingest attachment: failed to extract text from iframe

2020-06-10 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17132729#comment-17132729 ] Kenneth William Krugler commented on TIKA-3109: --- I'm guessing the issue is t

[jira] [Commented] (TIKA-3109) Ingest attachment: failed to extract text from iframe

2020-06-10 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17132724#comment-17132724 ] Kenneth William Krugler commented on TIKA-3109: --- Over at [Elasticsearch|[ht

[jira] [Commented] (TIKA-3109) Ingest attachment: failed to extract text from iframe

2020-06-10 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17132719#comment-17132719 ] Kenneth William Krugler commented on TIKA-3109: --- Also I tried opening it wit

[jira] [Commented] (TIKA-3109) Ingest attachment: failed to extract text from iframe

2020-06-10 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17132718#comment-17132718 ] Kenneth William Krugler commented on TIKA-3109: --- Who generated this badly br

[jira] [Commented] (TIKA-3096) detect image in any document

2020-04-30 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096592#comment-17096592 ] Kenneth William Krugler commented on TIKA-3096: --- Hi [~suchendra] - please as

[jira] [Commented] (TIKA-3089) Text should be wrapped in pre-tags instead of in p-tags

2020-04-13 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082761#comment-17082761 ] Kenneth William Krugler commented on TIKA-3089: --- Wouldn't this be a breaking

[jira] [Commented] (TIKA-3035) Tika-app --extract mode outputs to stderr instead of stdout

2020-02-25 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044722#comment-17044722 ] Kenneth William Krugler commented on TIKA-3035: --- FWIW, [picocli|[https://pi

[jira] [Commented] (TIKA-3048) Tika unable to parse html files with non UTF-8 charset

2020-02-20 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041261#comment-17041261 ] Kenneth William Krugler commented on TIKA-3048: --- Hi [~tallison] - I think th

[jira] [Commented] (TIKA-3019) [9.8] [CVE-2019-17571] [tika-app] [1.23]

2020-01-08 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010980#comment-17010980 ] Kenneth William Krugler commented on TIKA-3019: --- For option (a), I think the

[jira] [Commented] (TIKA-3019) [9.8] [CVE-2019-17571] [tika-app] [1.23]

2020-01-07 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010262#comment-17010262 ] Kenneth William Krugler commented on TIKA-3019: --- Hi [~tallison] - if we're e

[jira] [Commented] (TIKA-2955) PDF parsing to XHTML results in tika attempting to write invalid HTML characters.

2019-10-28 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961497#comment-16961497 ] Kenneth William Krugler commented on TIKA-2955: --- Hi [~tallison] - no blocker

[jira] [Commented] (TIKA-2624) Rendering PDFs for OCR with Tesseract uses different DPI than claimed

2019-10-22 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957265#comment-16957265 ] Kenneth William Krugler commented on TIKA-2624: --- Seems like a good change to