Hi devs,
I saw this dataset on Hugging Face, seems useful for evaluating Tika OCR…
— Ken
https://huggingface.co/datasets/pixparse/idl-wds
Hi Tika devs,
Check out Magika at https://github.com/google/magika
Wondering if we could leverage Deeplearning4j to run the model from that project.
— Ken
--
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink & Pinot
) Keep Java 11 in "main"/3.x now and set the EOL for Tika 2.x/Java 8 in say
> 6 months or fewer?
>
> Thank you, all, for your feedback!
>
> Best,
>
> Tim
>
>
--
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink, Pinot, Solr, Elasticsearch
e, Sep 12, 2023 at 10:49 AM Tim Allison <talli...@apache.org> wrote:
>> >If Tika users will be happy to move on and drop Java 8 and/or javax. Please
>> >drop them :)))
>>
>> Fellow devs and broader Tika community, are we ok with EOL'ing Tika
with
> ASF projects. I'd want to copy the header pretty much literally about
> no endorsements, etc.
> What would you think of adding something similar to our wiki or our website?
>
>Best,
>
> Tim
--
Ken Krugler
http://www
s successfully.
>>
>>>
>>> [X] +1 Release this package as Apache Tika 2.2.0
>>
>> I did notice that the tika DL's module(s) are pulling in the entire Hadoop
>> dependency chain. I wonder if we can cut down on this... that is however a
>> concern outside of this release candidate review.
>>
>> Thanks for the quick turnaround.
>> lewismc
>>
--
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink, Pinot, Solr, Elasticsearch
> a) tika-pipes hands-on workshop
> b) get to know the users -- 5 minute go-around the room "this is how
> we use it; these are our pain points"
> c) ???
>
> Again, thank you!
>
> Best,
>
> Tim
--
Ken K
est this with more
> recent versions of the surefire plugin, or is there a recommended
> workaround?
>
> Thank you.
>
>Best,
>
> Tim
>
> [0]
> http://maven.apache.org/surefire/maven-surefire-plugin/faq.html#vm-termination
--
M Nicholas DiPiazza
>>> wrote:
>>>>
>>>> +1 on 1.27 release.
>>>>
>>>> On Mon, Jun 28, 2021, 10:57 AM Tim Allison wrote:
>>>>>
>>>>> All,
>>>>> The recent release of PDFBox fixed 2 DoS CVEs
dency, etc.
>
> Some options for classic-> basic, base, ...what else?
>
> Any other recommendations for these names? Thank you!
>
> Best,
>
> Tim
--
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
Allison
>> Priority: Minor
>>Fix For: 2.0.0
>>
>>
>> We or our dependencies use 4? json parsers last time I looked. It feels like
>> a majority of our dependencies use jackson. I used to have a preference for
>> GSON, which is why we h
>
> Please vote on releasing this package as Apache Tika 1.25.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.25
> [ ] -1 Do not release this package bec
rent SAXParser which is not handled correctly in
> XMLReaderUtils? What OS, what version of java?
>
> Thank you, again.
>
> Best,
>
> Tim
>
> On Mon, Nov 23, 2020 at 1:40 PM Ken Krugler
> wrote:
>
>> Hi
131-b11, mixed mode)
— Ken
> On Mon, Nov 23, 2020 at 1:40 PM Ken Krugler
> wrote:
>
>> Hi all,
>>
>> I got past the JCE issue, but now some tests are failing with timeouts.
>>
>> For this test:
>>
>> [INFO] Running org.apache.tika.parser.micr
asing the
XMLReaderUtils.POOL_SIZE
Nov 21, 2020 10:39:07 PM org.apache.tika.utils.XMLReaderUtils acquireSAXParser
WARNING: Contention waiting for a SAXParser. Consider increasing the
XMLReaderUtils.POOL_SIZE
… and so on…
Any suggestions?
Thanks!
— Ken
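To illustrate what that warning means, here is a minimal sketch of a fixed-size pool where a caller waits a bounded time for a free object and records a contention warning when none is available. This is not Tika's code: BoundedPool, acquire, release and warnings are hypothetical names for illustration only. As far as I know, XMLReaderUtils exposes a static setPoolSize(int) for raising the real limit, but treat that as an assumption and check the Javadoc for your Tika version.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a fixed-size pool of expensive parser objects. */
final class BoundedPool {
    private final BlockingQueue<Object> pool;
    private final AtomicInteger warnings = new AtomicInteger();

    BoundedPool(int size) {
        pool = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            pool.add(new Object()); // placeholder for a real parser instance
        }
    }

    /** Waits up to waitMillis for a free object; on timeout counts a warning and returns null. */
    Object acquire(long waitMillis) {
        try {
            Object item = pool.poll(waitMillis, TimeUnit.MILLISECONDS);
            if (item == null) {
                warnings.incrementAndGet(); // every object is checked out: contention
            }
            return item;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }

    /** Returns an object to the pool so that other callers can use it. */
    void release(Object item) {
        pool.offer(item);
    }

    int warnings() {
        return warnings.get();
    }

    public static void main(String[] args) {
        BoundedPool pool = new BoundedPool(1);
        Object held = pool.acquire(50);   // takes the only object
        Object second = pool.acquire(50); // pool is empty, so this times out
        System.out.println("second acquire got an object: " + (second != null));
        System.out.println("contention warnings: " + pool.warnings());
        pool.release(held);
    }
}
```

With more test threads than pooled objects, every extra thread ends up on the timeout path, which is why a larger pool (or fewer concurrent parses) makes the warning go away.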
--
Ken Krugler
http://www.scaleunlimite
e/lib/security/
> sudo cp ~/Downloads/UnlimitedJCEPolicyJDK8/local_policy.jar
> $JAVA_HOME/jre/lib/security/
— Ken
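A quick way to confirm whether the unlimited policy files took effect is to ask the JCE for the maximum AES key length. The sketch below uses the standard javax.crypto.Cipher.getMaxAllowedKeyLength call; the class name JcePolicyCheck and the helper isUnlimited are made up for this example, and the 128-bit threshold assumes the usual limited-policy default for AES.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

/** Hypothetical helper for checking which JCE policy is active in the running JVM. */
final class JcePolicyCheck {

    /** Assumption: the limited policy caps AES at 128 bits, so anything larger means unlimited. */
    static boolean isUnlimited(int maxKeyBits) {
        return maxKeyBits > 128;
    }

    /** Returns the maximum AES key length the JVM allows, or 0 if AES is not available. */
    static int maxAesKeyBits() {
        try {
            return Cipher.getMaxAllowedKeyLength("AES");
        } catch (NoSuchAlgorithmException e) {
            return 0;
        }
    }

    public static void main(String[] args) {
        int bits = maxAesKeyBits();
        System.out.println("Max AES key length: " + bits);
        System.out.println("Unlimited policy active: " + isUnlimited(bits));
    }
}
```

Run it with the same JAVA_HOME that Maven uses, since the policy jars are installed per JRE.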
>
> On Fri, Nov 20, 2020 at 1:43 PM Ken Krugler
> wrote:
>
>> Hi all,
>>
>> I was trying to build the 1.25-rc1 branch, and ran into this same issue
>&
ache.org/jira/browse/TIKA-2917
>>>Project: Tika
>>> Issue Type: Improvement
>>> Reporter: Tim Allison
>>> Assignee: Tim Allison
>>> Priority: Minor
>>>
>>> Inline images may have XMP associated with them. We are not currently
>>> extracting this metadata.
>>
>>
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v7.6.14#76016)
--
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
ime to start working on integrating Bob's
>>
>> work on the current main branch. I'll have to ignore most of the incoming
>>
>> issues for a bit...unlike the last 4 years...this time I mean it. :)
>>
>> Let me know if there are any objections to heading down this path now.
>>
>>
>>
>> Cheers,
>>
>>
>>
>> Tim
--
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
a wiki page, making a wholesale set of
> edits, getting review of those edits from the community, and assuming it
> passes muster, then bringing the edits back to the original page?
>
>
>
> Eric
>
>> On Oct 29, 2019, at 7:00 PM, Ken Krugler wrote:
>>
ve for
> change notifications double-check!)
>
> Thanks
> Nick
--
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
ent(someImage);
> creator.complete();
>
> It would be consistent with the Tika approach on the read side.
>
> Cheers, Sergey
> On Mon, Oct 14, 2019 at 4:13 PM Ken Krugler wrote:
>
>> If you’re suggesting ways to make it easier to use something like
>> YaHPConver
o the text to PDF
> (for a start, something on top of that transformer), and then may be even
> for other formats ?
>
> Sergey
--
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
Cheers,
>
> Tim
------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914482#comment-16914482
]
Ken Krugler commented on TIKA-1599:
---
From TIKA-2928, an example of text tha
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-1599:
--
Priority: Major (was: Minor)
> Switch from TagSoup to JS
[
https://issues.apache.org/jira/browse/TIKA-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914481#comment-16914481
]
Ken Krugler commented on TIKA-2928:
---
Hi [~Sargent_D] - thanks for trying this out!
[
https://issues.apache.org/jira/browse/TIKA-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2928:
--
Issue Type: Improvement (was: Bug)
Priority: Minor (was: Major)
> Less than sign within
[
https://issues.apache.org/jira/browse/TIKA-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913382#comment-16913382
]
Ken Krugler commented on TIKA-2928:
---
The issue isn't that this is "
ika.apache.org/
>
> -- Tim Allison, on behalf of the Apache Tika community
--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra
+1
— Ken
> On Jul 15, 2019, at 2:37 PM, Tim Allison wrote:
>
> Anyone have anything they want to get into 1.22? If not, I’ll kick off the
> regression tests shortly.
>
> Cheers,
> Tim
----------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimite
failed.
>
> In short, this is an area for improvement. I suspect our current
> mechanism would also be pretty awful on UTF-16.
>
> On Tue, Jun 18, 2019 at 4:26 PM Ken Krugler
> wrote:
>>
>> Hi devs,
>>
>> I’m trying to remember the history of how Tika’s cu
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869004#comment-16869004
]
Ken Krugler commented on TIKA-2790:
---
Hi [~talli...@apache.org] - I finally got ar
--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856107#comment-16856107
]
Ken Krugler commented on TIKA-2790:
---
[~talli...@apache.org] - I'd have to lo
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856052#comment-16856052
]
Ken Krugler commented on TIKA-2790:
---
Yalder processes the entire string. I tho
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836738#comment-16836738
]
Ken Krugler commented on TIKA-2790:
---
Hi [~talli...@apache.org] - thanks for running
ing wiki migration (from moin to
>> confluence)?
>>>
>>> I can try it via selfservice.a.o if you consent but I'm not sure if I
>> have
>>> enough access to do so. Maybe only Tim as PMC Chair can.
>>>
>>> --
>>> Best regards,
>>> Konstantin Gribov.
>>
--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra
[
https://issues.apache.org/jira/browse/TIKA-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812492#comment-16812492
]
Ken Krugler commented on TIKA-2849:
---
Hi [~boris-petrov] - two things here. First
s to do so. Maybe only Tim as PMC Chair can.
>
> --
> Best regards,
> Konstantin Gribov.
--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra
at
it then.
Regards,
— Ken
> On Jan 17, 2019, at 1:48 PM, Mike Thomsen wrote:
>
> Ken,
>
> Here's a Gist version of it:
>
> https://gist.github.com/MikeThomsen/84abb89aab903a8b21d64af532cc369b
>
> Thanks,
>
> Mike
>
> On Thu, Jan 17, 2019 at
nl
> ruru
> zhlt
>
> Is there something that needs to be done to enable the detection of Asian
> languages or should I file this as a bug report?
>
> Thanks,
>
> Mike
--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.co
for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.20
> [ ] -1 Do not release this package because...
>
> Here's my +1.
>
> Cheers,
>
> Tim
-
[
https://issues.apache.org/jira/browse/TIKA-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710767#comment-16710767
]
Ken Krugler commented on TIKA-2794:
---
Hi [~phallett] - it's better if you f
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707822#comment-16707822
]
Ken Krugler commented on TIKA-2790:
---
[~talli...@apache.org] - I've compared
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707521#comment-16707521
]
Ken Krugler commented on TIKA-2790:
---
Yalder is about 2-2.5x faster than lang
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707343#comment-16707343
]
Ken Krugler commented on TIKA-2790:
---
My concern with OpenNLP is that during a web c
[
https://issues.apache.org/jira/browse/TIKA-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707292#comment-16707292
]
Ken Krugler commented on TIKA-2790:
---
Hi [~talli...@apache.org] - Is there an issue
[
https://issues.apache.org/jira/browse/TIKA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658028#comment-16658028
]
Ken Krugler commented on TIKA-2758:
---
[~markus17] - My comment above was about
[
https://issues.apache.org/jira/browse/TIKA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657976#comment-16657976
]
Ken Krugler edited comment on TIKA-2758 at 10/20/18 7:5
[
https://issues.apache.org/jira/browse/TIKA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657976#comment-16657976
]
Ken Krugler commented on TIKA-2758:
---
At least for the "detroidnews.html
[
https://issues.apache.org/jira/browse/TIKA-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler resolved TIKA-2683.
---
Resolution: Fixed
Fixed via [PR
#243|https://github.com/apache/tika/commit
[
https://issues.apache.org/jira/browse/TIKA-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler reassigned TIKA-2683:
-
Assignee: Ken Krugler
> Missing space and inappropriate new-line in Boilerpipe extracted t
[
https://issues.apache.org/jira/browse/TIKA-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16536396#comment-16536396
]
Ken Krugler commented on TIKA-2648:
---
[~wastl-nagel] - you mentioned that you tho
e target? This would allow us to bake modularity in now.
> Given that I haven't actually tried modularizing/jigsawizing Tika yet, this
> could be a complete disaster, of course. :)
>
> Cheers,
>
> Tim
--
Ken Krugler
+1 530-
[
https://issues.apache.org/jira/browse/TIKA-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2671:
--
Description:
org.apache.tika.parser.html.HtmlEncodingDetector ignores the document's
metadata. So
[
https://issues.apache.org/jira/browse/TIKA-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2671:
--
Component/s: detector
> HtmlEncodingDetector doesnt take provided metadata into acco
[
https://issues.apache.org/jira/browse/TIKA-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516644#comment-16516644
]
Ken Krugler commented on TIKA-2671:
---
Hi [~gbouchar] - I'm curious how much te
[
https://issues.apache.org/jira/browse/TIKA-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16514355#comment-16514355
]
Ken Krugler commented on TIKA-2671:
---
Unfortunately there's no great solu
> 2018-05-29 16:11 GMT-03:00 Ken Krugler :
>
>> Thanks for the ref, Tim.
>>
>> I’m curious why SolrCell doesn’t fire up threads when parsing docs with
>> Tika (or use the fork parser), to mitigate issues with hangs & crashes?
>>
>> — Ken
Thanks for the ref, Tim.
I’m curious why SolrCell doesn’t fire up threads when parsing docs with Tika
(or use the fork parser), to mitigate issues with hangs & crashes?
— Ken
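For anyone wondering what the thread approach looks like, here is a minimal sketch: run the parse on a worker thread and give up after a timeout. It is not SolrCell or Tika code; TimedTask and runWithTimeout are hypothetical names, and the Callable stands in for whatever call actually invokes the parser.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs a task on a worker thread and abandons it after a timeout. */
final class TimedTask {

    /** Returns the task's result, or fallback if the task times out, throws, or is interrupted. */
    static <T> T runWithTimeout(Callable<T> task, long timeoutMillis, T fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-worker");
            t.setDaemon(true); // a hung worker must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; a stuck parser may ignore it
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the task itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        String fast = runWithTimeout(() -> "parsed text", 1000, "<timed out>");
        String slow = runWithTimeout(() -> {
            Thread.sleep(5000); // simulates a parser that hangs
            return "never returned in time";
        }, 100, "<timed out>");
        System.out.println(fast);
        System.out.println(slow);
    }
}
```

A thread only protects the caller from hangs. A parser that runs out of memory or crashes the JVM still takes the whole process down, which is the case a separate process (the fork parser) is meant to cover.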
> On May 29, 2018, at 11:54 AM, Tim Allison wrote:
>
> All,
>
> Over the weekend, Shawn Heisey very kindly drafted a
[
https://issues.apache.org/jira/browse/TIKA-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493927#comment-16493927
]
Ken Krugler commented on TIKA-2654:
---
Hi Ankit - for problems encountered while buil
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482586#comment-16482586
]
Ken Krugler commented on TIKA-2643:
---
When you've got conflicting jars on the
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481791#comment-16481791
]
Ken Krugler commented on TIKA-2643:
---
Looking at the crash log, I see the follo
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481786#comment-16481786
]
Ken Krugler commented on TIKA-2643:
---
Hi [~fyemaple] - how do you know that Tika 1.5
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479468#comment-16479468
]
Ken Krugler commented on TIKA-2643:
---
[~fyemaple] - yes, but note that {{kill -
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477811#comment-16477811
]
Ken Krugler commented on TIKA-2643:
---
[~talli...@apache.org] - different version
[
https://issues.apache.org/jira/browse/TIKA-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16477513#comment-16477513
]
Ken Krugler commented on TIKA-2643:
---
If I was going to guess, it's that your
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16384242#comment-16384242
]
Ken Krugler commented on TIKA-2592:
---
[~AndreasMeier] - I assume when you said:
{quo
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2592:
--
Attachment: IANA Charset names.txt
> HTML with charset unicode handled as utf-16 instead ut
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2592:
--
Priority: Minor (was: Major)
> HTML with charset unicode handled as utf-16 instead ut
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2592:
--
Issue Type: Improvement (was: Bug)
> HTML with charset unicode handled as utf-16 instead ut
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382330#comment-16382330
]
Ken Krugler commented on TIKA-2592:
---
Before making this kind of change (default "
[
https://issues.apache.org/jira/browse/TIKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380874#comment-16380874
]
Ken Krugler commented on TIKA-2592:
---
Hi [~AndreasMeier] - actually "unic
[
https://issues.apache.org/jira/browse/TIKA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379747#comment-16379747
]
Ken Krugler commented on TIKA-2576:
---
[~talli...@mitre.org] - After some greppin
[
https://issues.apache.org/jira/browse/TIKA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377744#comment-16377744
]
Ken Krugler commented on TIKA-2576:
---
Is this going to trigger more warnings in the
Hi devs,
I’m curious about the occasional use of java.util.logging in Tika:
> ./tika-core/src/main/java/org/apache/tika/config/InitializableProblemHandler.java:import
> java.util.logging.Logger;
> ./tika-core/src/main/java/org/apache/tika/config/LoadErrorHandler.java:import
> java.util.logging.
[
https://issues.apache.org/jira/browse/TIKA-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler resolved TIKA-2539.
---
Resolution: Duplicate
> TagSoup HTML parser is project
[
https://issues.apache.org/jira/browse/TIKA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215838#comment-16215838
]
Ken Krugler commented on TIKA-2478:
---
Hi [~talli...@apache.org] - I've attached
[
https://issues.apache.org/jira/browse/TIKA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2478:
--
Attachment: mixed-simple
mixed-with-pdf-inline
> MBOX import includes redundant cop
[
https://issues.apache.org/jira/browse/TIKA-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214491#comment-16214491
]
Ken Krugler commented on TIKA-2478:
---
I recently had to dig into extracting text
[
https://issues.apache.org/jira/browse/TIKA-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213150#comment-16213150
]
Ken Krugler commented on TIKA-2471:
---
Hi [~talli...@apache.org] - I don't th
[
https://issues.apache.org/jira/browse/TIKA-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212870#comment-16212870
]
Ken Krugler commented on TIKA-2482:
---
Hi [~cermar] - in general it's best to f
[
https://issues.apache.org/jira/browse/TIKA-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195386#comment-16195386
]
Ken Krugler commented on TIKA-2472:
---
I had to deal with this before in another pro
a step in the release?
No, I don’t believe so.
> Does it take a few weeks for the sync?
Here’s what I’ve heard (from a forum post):
> Also FYI, mvnrepository.com is unaffiliated with Maven Central, and lags it
> by anywhere from a few hours to a few days
So potentially a few days.
> (Maven/Ant+Ivy/Gradle/SBT/whatever) in their projects,
> so it shouldn't be something bothersome for end user.
>
> What do you think, folks?
>
> [1]: https://issues.apache.org/jira/browse/TIKA-2314
>
> --
>
> Best regards,
> Konstantin Gribov
tracking issue
> [5]: http://checkstyle.sourceforge.net/
> [6]: https://maven.apache.org/plugins/maven-checkstyle-plugin/
>
>
>
> --
>
> Best regards,
> Konstantin Gribov
--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
but was:<[
>
> ]>
> Tests in error:
> ODFParserTest.testNullStylesInODTFooter:367 » WriteLimitReached Your
> document ...
>
> ODFParserTest.testParagraphLevelFontStyles:388->TikaTest.getXML:191->TikaTest.getXML:205
> » SAX
--
K
[ ] +1 Release this package as Apache Tika 1.14
> [ ] -1 Do not release this package because..
>
> Cheers,
> Chris
>
> P.S. Of course here is my +1.
>
>
>
>
>
--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
-1 Do not release this package because..
>
> Cheers,
> Chris
>
> P.S. Of course here is my +1.
>
>
>
>
>
--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
Hi Lewis,
> On Sep 21, 2016, at 2:32pm, lewis john mcgibbney wrote:
>
> Hi Ken,
> Good question. Answer below
>
> On Wed, Sep 21, 2016 at 2:16 PM, wrote:
>
>>
>> From: Ken Krugler
>> To: dev@tika.apache.org
>> Cc:
>> Date: Tu
s.apache.org/jira/browse/INFRA-12186, it will help
> us to reduce major bugs in Tika over time.
> Thanks
> Lewis
>
> --
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney
--
Ken Krugler
+1 530-210-6378
http
his issue doesn't
> affect me directly.
>
> [1]: http://proguard.sourceforge.net/index.html#manual/usage.html
> [2]: http://www.oracle.com/technetwork/java/javase/clopts-139448.html#gbmtm
>
>
> Wed, Aug 24, 2016 at 21:16, Ken Krugler :
>
>> I think excluding mor
est coverage to ensure common usecases won't be broken, of course.
>
> [1]:
> https://issues.apache.org/jira/browse/TIKA-2007?focusedCommentId=15435206&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15435206
> --
>
> Be
[
https://issues.apache.org/jira/browse/TIKA-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423280#comment-15423280
]
Ken Krugler commented on TIKA-2056:
---
Hi [~chrismattmann] - I haven't actually d
[
https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2038:
--
Description:
Currently, Tika uses icu4j for detecting charset encoding of HTML documents as
well as the
org/browse/OSSRH-22250, looks like it’s
https://in.linkedin.com/in/meetabhishekjindal
— Ken
--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
[
https://issues.apache.org/jira/browse/TIKA-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378434#comment-15378434
]
Ken Krugler commented on TIKA-2033:
---
Yes, of course...I was thinking of whether
[
https://issues.apache.org/jira/browse/TIKA-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378358#comment-15378358
]
Ken Krugler commented on TIKA-2033:
---
Do you have a suggestion for how the text sh
[
https://issues.apache.org/jira/browse/TIKA-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332124#comment-15332124
]
Ken Krugler commented on TIKA-2010:
---
OK - I think then we'll want to escalate [
[
https://issues.apache.org/jira/browse/TIKA-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Krugler updated TIKA-2010:
--
Priority: Minor (was: Major)
Issue Type: Improvement (was: Bug)
> Unable to get value w