[GitHub] [tika] uschindler opened a new pull request #318: Add support for forbiddenapis 3.0

2020-04-26 Thread GitBox
uschindler opened a new pull request #318: URL: https://github.com/apache/tika/pull/318 This hides all warnings caused by commons-io not used in all modules (cf. new mojo parameter). See https://github.com/policeman-tools/forbidden-apis/wiki/Changes#version-30-released-2020-04-27

[GitHub] [tika] jusu opened a new pull request #319: Sync for 1.24.1

2020-04-27 Thread GitBox
jusu opened a new pull request #319: URL: https://github.com/apache/tika/pull/319 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

[GitHub] [tika] jusu commented on pull request #319: Sync for 1.24.1

2020-04-27 Thread GitBox
jusu commented on pull request #319: URL: https://github.com/apache/tika/pull/319#issuecomment-620215553 Ignore this.

[GitHub] [tika] KranthiGV commented on a change in pull request #317: fix for TIKA-3089 contributed by pvanderweerd

2020-06-03 Thread GitBox
KranthiGV commented on a change in pull request #317: URL: https://github.com/apache/tika/pull/317#discussion_r434687071 ## File path: tika-parsers/src/main/java/org/apache/tika/parser/csv/TextAndCSVParser.java ## @@ -306,7 +306,6 @@ private CSVParams getOverride(Metadata meta

[GitHub] [tika] pszemus opened a new pull request #320: tika-mimetypes: Add mimetypes for .mpd, .m3u8 and .m4s

2020-06-10 Thread GitBox
pszemus opened a new pull request #320: URL: https://github.com/apache/tika/pull/320

[GitHub] [tika] deathy opened a new pull request #321: fix for TIKA-3008 contributed by deathy

2020-06-14 Thread GitBox
deathy opened a new pull request #321: URL: https://github.com/apache/tika/pull/321 adds handling of superscript/subscript in Word parsers as described in TIKA-3008.

[GitHub] [tika] matthewford opened a new pull request #322: Update PDFParser.properties

2020-06-16 Thread GitBox
matthewford opened a new pull request #322: URL: https://github.com/apache/tika/pull/322 The auto option exists but is not documented.

[GitHub] [tika] tballison merged pull request #322: Update PDFParser.properties

2020-06-16 Thread GitBox
tballison merged pull request #322: URL: https://github.com/apache/tika/pull/322

[GitHub] [tika] tballison merged pull request #278: TIKA-2830 add heif mimetype support

2020-06-16 Thread GitBox
tballison merged pull request #278: URL: https://github.com/apache/tika/pull/278

[GitHub] [tika] tballison commented on pull request #278: TIKA-2830 add heif mimetype support

2020-06-16 Thread GitBox
tballison commented on pull request #278: URL: https://github.com/apache/tika/pull/278#issuecomment-644886130 @makepanic I'm sorry this took forever. We had to do some unpleasant shimming to upgrade drewnoakes' metadata extractor. We've done this now, and this _should_ just work now. TH

[GitHub] [tika] tballison merged pull request #320: tika-mimetypes: Add MIME types for .mpd, .m3u8 and .m4s

2020-06-16 Thread GitBox
tballison merged pull request #320: URL: https://github.com/apache/tika/pull/320

[GitHub] [tika] tballison merged pull request #276: Disable external DTD + Stylesheets with the TransformerFactory

2020-06-16 Thread GitBox
tballison merged pull request #276: URL: https://github.com/apache/tika/pull/276

[GitHub] [tika] tballison merged pull request #272: TIKA-2888 Add wmv2 codec detection for WMV files

2020-06-16 Thread GitBox
tballison merged pull request #272: URL: https://github.com/apache/tika/pull/272

[GitHub] [tika] nddipiazza opened a new pull request #323: writeLimit and maxEmbeddedResources for recursive parsing - add header

2020-07-03 Thread GitBox
nddipiazza opened a new pull request #323: URL: https://github.com/apache/tika/pull/323 parameters so that this can be customized.

[GitHub] [tika] nddipiazza commented on pull request #315: TIKA-3082 OpenAPI for tika-server

2020-07-04 Thread GitBox
nddipiazza commented on pull request #315: URL: https://github.com/apache/tika/pull/315#issuecomment-653789371 @lewismc @tballison What do you think about swagger? I want to take what Lewis did here and introduce swagger-annotations + swagger-jaxrs. This would remove the need for the o

[GitHub] [tika] nddipiazza edited a comment on pull request #315: TIKA-3082 OpenAPI for tika-server

2020-07-04 Thread GitBox
nddipiazza edited a comment on pull request #315: URL: https://github.com/apache/tika/pull/315#issuecomment-653789371 @lewismc @tballison What do you think about swagger? I want to take what Lewis did here and introduce swagger-annotations + swagger-jaxrs. This would remove the need fo

[GitHub] [tika] nddipiazza edited a comment on pull request #315: TIKA-3082 OpenAPI for tika-server

2020-07-04 Thread GitBox
nddipiazza edited a comment on pull request #315: URL: https://github.com/apache/tika/pull/315#issuecomment-653789371 @lewismc @tballison What do you think about swagger? I want to take what Lewis did here and put the documentation within swagger-annotations + swagger-jaxrs. This would

[GitHub] [tika] lewismc commented on pull request #315: TIKA-3082 OpenAPI for tika-server

2020-07-04 Thread GitBox
lewismc commented on pull request #315: URL: https://github.com/apache/tika/pull/315#issuecomment-653809226 Hi Nicholas, this work is nearly completed. We will update within the week. We can review then... thank you for your interest. On Sat, Jul 4, 2020 at 10:04 Nicholas DiPia

[GitHub] [tika] nddipiazza commented on pull request #315: TIKA-3082 OpenAPI for tika-server

2020-07-05 Thread GitBox
nddipiazza commented on pull request #315: URL: https://github.com/apache/tika/pull/315#issuecomment-653914555 @lewismc Cool! Do you mean the OpenAPI YAML work you have in this PR, or do you mean the Swagger implementation?

[GitHub] [tika] lewismc commented on pull request #315: TIKA-3082 OpenAPI for tika-server

2020-07-05 Thread GitBox
lewismc commented on pull request #315: URL: https://github.com/apache/tika/pull/315#issuecomment-653934091 Both the OpenAPI and the implementation. We will be delivering the jaxrs generated project with the existing tika server implementation ported over. On Sun, Jul 5, 2020 at

[GitHub] [tika] nddipiazza commented on pull request #323: Address TIKA-3126 - add headers to control writeLimit and maxEmbeddedResources for recursive parsing

2020-07-08 Thread GitBox
nddipiazza commented on pull request #323: URL: https://github.com/apache/tika/pull/323#issuecomment-655851164 @tballison just dropping you a ping to see if you get a chance to review this one.

[GitHub] [tika] michaelwda opened a new pull request #324: fix for TIKA-1570 contributed by michaelwda

2020-07-10 Thread GitBox
michaelwda opened a new pull request #324: URL: https://github.com/apache/tika/pull/324 See https://issues.apache.org/jira/browse/TIKA-1570 Add a stop method that will shut down the watchdog process and terminate the JVM. This is useful for Apache Commons Daemon, allowing a user to de
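
For context, Commons Daemon's procrun in `jvm` mode invokes a static start method and, when the service is stopped, a static stop method in the same JVM. The following is a minimal sketch of that pairing, assuming start blocks until stop is called. `TikaDaemonSketch` is a hypothetical name, not the class from the PR; a real procrun entry point would normally be declared public, and the watchdog shutdown and JVM exit the PR describes are left as comments so the sketch can be exercised.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical procrun-style entry points: start() runs until stop() is called.
final class TikaDaemonSketch {
    // Single-use signal; if stop() happens first, start() returns immediately.
    private static final CountDownLatch STOP_SIGNAL = new CountDownLatch(1);

    static void start(String[] args) throws InterruptedException {
        // A real service would launch the server and its watchdog here.
        STOP_SIGNAL.await();
        // The PR's stop path also shuts down the watchdog and terminates
        // the JVM; both are omitted so this sketch stays testable.
    }

    static void stop(String[] args) {
        STOP_SIGNAL.countDown();
    }
}
```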

[GitHub] [tika] clarkperkins opened a new pull request #325: TIKA-3131 -- swap default values of averageCharTolerance and spacingT…

2020-07-10 Thread GitBox
clarkperkins opened a new pull request #325: URL: https://github.com/apache/tika/pull/325 …olerance to match PDFBox defaults

[GitHub] [tika] nddipiazza closed pull request #323: Address TIKA-3126 - add headers to control writeLimit and maxEmbeddedResources for recursive parsing

2020-07-14 Thread GitBox
nddipiazza closed pull request #323: URL: https://github.com/apache/tika/pull/323

[GitHub] [tika] nddipiazza commented on pull request #323: Address TIKA-3126 - add headers to control writeLimit and maxEmbeddedResources for recursive parsing

2020-07-14 Thread GitBox
nddipiazza commented on pull request #323: URL: https://github.com/apache/tika/pull/323#issuecomment-658233475 Closing; re-opening under a new JIRA issue, TIKA-3133, specifically for adding these two headers.

[GitHub] [tika] nddipiazza opened a new pull request #326: TIKA-3133 - writeLimit and maxEmbeddedResources for recursive parsing - add header

2020-07-14 Thread GitBox
nddipiazza opened a new pull request #326: URL: https://github.com/apache/tika/pull/326 See https://issues.apache.org/jira/browse/TIKA-3133 and https://issues.apache.org/jira/browse/TIKA-3126. This will add new parameters to the `rmeta` REST endpoint: `writeLimit` - max number of
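
A sketch of a client request that sets these limits per call. It assumes the endpoint reads them from HTTP headers named `writeLimit` and `maxEmbeddedResources` (the parameter names from the PR; sending them as headers is an assumption based on the PR title) and that tika-server listens on its default port, 9998. It uses Java 11's `java.net.http` client and only builds the request, so it can be inspected without a running server. `RmetaRequestSketch` is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: PUT a document to /rmeta with per-request limits
// carried as HTTP headers (header names assumed from the PR).
final class RmetaRequestSketch {
    static HttpRequest build(URI server, byte[] document,
                             int writeLimit, int maxEmbeddedResources) {
        return HttpRequest.newBuilder(server.resolve("/rmeta"))
                .header("writeLimit", Integer.toString(writeLimit))
                .header("maxEmbeddedResources", Integer.toString(maxEmbeddedResources))
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }
}
```

Sending it is then `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.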

[GitHub] [tika] nddipiazza commented on pull request #326: TIKA-3133 - writeLimit and maxEmbeddedResources for recursive parsing - add header

2020-07-14 Thread GitBox
nddipiazza commented on pull request #326: URL: https://github.com/apache/tika/pull/326#issuecomment-658236941 @tballison I think this can be merged now. I disassociated it from TIKA-3126 so that this PR can be 100% focused on not hard-coding those values.

[GitHub] [tika] tothd91 opened a new pull request #327: fix for TIKA-3134 contributed by tothd

2020-07-15 Thread GitBox
tothd91 opened a new pull request #327: URL: https://github.com/apache/tika/pull/327

[GitHub] [tika] tballison merged pull request #326: TIKA-3133 - writeLimit and maxEmbeddedResources for recursive parsing - add header

2020-07-15 Thread GitBox
tballison merged pull request #326: URL: https://github.com/apache/tika/pull/326

[GitHub] [tika] tballison commented on pull request #327: fix for TIKA-3134 contributed by tothd

2020-07-15 Thread GitBox
tballison commented on pull request #327: URL: https://github.com/apache/tika/pull/327#issuecomment-658905767 @tothd91 thank you for opening this! It looks like there are quite a few changes that are white-space only. Would it be possible to update so that the diff includes only logic di

[GitHub] [tika] tballison commented on pull request #325: TIKA-3131 -- swap default values of averageCharTolerance and spacingT…

2020-07-15 Thread GitBox
tballison commented on pull request #325: URL: https://github.com/apache/tika/pull/325#issuecomment-658951531 Thank you @clarkperkins!

[GitHub] [tika] tballison merged pull request #325: TIKA-3131 -- swap default values of averageCharTolerance and spacingT…

2020-07-15 Thread GitBox
tballison merged pull request #325: URL: https://github.com/apache/tika/pull/325

[GitHub] [tika] jendabenda opened a new pull request #328: fix for TIKA-3139 contributed by wiwi

2020-07-16 Thread GitBox
jendabenda opened a new pull request #328: URL: https://github.com/apache/tika/pull/328

[GitHub] [tika] tballison merged pull request #328: fix for TIKA-3139 contributed by wiwi

2020-07-16 Thread GitBox
tballison merged pull request #328: URL: https://github.com/apache/tika/pull/328

[GitHub] [tika] tballison opened a new pull request #329: TIKA-3140

2020-07-16 Thread GitBox
tballison opened a new pull request #329: URL: https://github.com/apache/tika/pull/329 This should work once TIKA-3137 is merged.

[GitHub] [tika] tballison merged pull request #329: TIKA-3140

2020-07-17 Thread GitBox
tballison merged pull request #329: URL: https://github.com/apache/tika/pull/329

[GitHub] [tika] PeterAlfredLee opened a new pull request #330: Update urls of some bin files

2020-07-17 Thread GitBox
PeterAlfredLee opened a new pull request #330: URL: https://github.com/apache/tika/pull/330 The URLs of some bin files have changed.

[GitHub] [tika] PeterAlfredLee opened a new pull request #331: Fix some test error when jvm's default language is not en

2020-07-17 Thread GitBox
PeterAlfredLee opened a new pull request #331: URL: https://github.com/apache/tika/pull/331 Some tests' assertions expect the language to be English (e.g. org.apache.tika.parser.sas.SAS7BDAParserTest), so these tests fail when the JVM's default language is not en. This is a fix to set the JVM's default la
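
One way to make such tests independent of the machine's language is to pin the default `Locale` around the locale-sensitive call and restore it afterwards. This is a sketch, not necessarily what the PR does (its description is cut off), and `LocaleSketch.withDefaultLocale` is a hypothetical helper.

```java
import java.util.Locale;
import java.util.function.Supplier;

// Run locale-sensitive code under a fixed default Locale, then restore the
// previous default so later tests are unaffected.
final class LocaleSketch {
    static <T> T withDefaultLocale(Locale locale, Supplier<T> action) {
        Locale previous = Locale.getDefault();
        Locale.setDefault(locale); // also sets the FORMAT and DISPLAY categories
        try {
            return action.get();
        } finally {
            Locale.setDefault(previous);
        }
    }
}
```

For example, `String.format("%.1f", 1.5)` yields `1,5` under `Locale.GERMANY` but `1.5` under `Locale.US`, which is exactly the kind of difference that breaks English-only assertions.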

[GitHub] [tika] PeterAlfredLee opened a new pull request #332: Fix can't del tmp file in windows

2020-07-17 Thread GitBox
PeterAlfredLee opened a new pull request #332: URL: https://github.com/apache/tika/pull/332 Test case `org.apache.tika.image.HeifParserTest.testSimple` fails on Windows because `TemporaryResources.close()` sometimes fails to delete the temp file. We can make it `deleteOnExit` as it's only

[GitHub] [tika] PeterAlfredLee opened a new pull request #333: Adds github action CI builds on Ubuntu

2020-07-17 Thread GitBox
PeterAlfredLee opened a new pull request #333: URL: https://github.com/apache/tika/pull/333 Adds GitHub Actions CI builds on Ubuntu.

[GitHub] [tika] THausherr commented on pull request #332: Fix can't del tmp file in windows

2020-07-18 Thread GitBox
THausherr commented on pull request #332: URL: https://github.com/apache/tika/pull/332#issuecomment-660511155 Isn't this moot? https://issues.apache.org/jira/browse/TIKA-3135

[GitHub] [tika] tothd91 commented on pull request #327: fix for TIKA-3134 contributed by tothd

2020-07-20 Thread GitBox
tothd91 commented on pull request #327: URL: https://github.com/apache/tika/pull/327#issuecomment-660851533 Hello @tballison, I did it. I hope it's OK now.

[GitHub] [tika] PeterAlfredLee opened a new pull request #334: Tika-3141 : add empty environment variable handle

2020-07-30 Thread GitBox
PeterAlfredLee opened a new pull request #334: URL: https://github.com/apache/tika/pull/334 Trying to fix TIKA-3141 with an empty-string check in `TikaConfig`.
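
A sketch of the idea, assuming `TikaConfig` consults the `tika.config` system property before the `TIKA_CONFIG` environment variable: an empty or whitespace-only value is treated as unset, so an exported-but-empty variable falls back to the default configuration instead of being used as a path. `ConfigLocationSketch` is hypothetical, not the PR's code.

```java
// Hypothetical resolver: blank values count as "not set".
// Assumption: the system property takes precedence over the environment.
final class ConfigLocationSketch {
    static String resolve(String systemProperty, String environmentVariable) {
        if (!isBlank(systemProperty)) {
            return systemProperty.trim();
        }
        if (!isBlank(environmentVariable)) {
            return environmentVariable.trim();
        }
        return null; // caller falls back to the default configuration
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```

A caller would pass `System.getProperty("tika.config")` and `System.getenv("TIKA_CONFIG")`.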

[GitHub] [tika] PeterAlfredLee commented on pull request #332: Fix can't del tmp file in windows

2020-07-30 Thread GitBox
PeterAlfredLee commented on pull request #332: URL: https://github.com/apache/tika/pull/332#issuecomment-666331912 Hi @THausherr, sorry for the late reply. I think the fix in [TIKA-3135](https://issues.apache.org/jira/browse/TIKA-3135) is trying to avoid occupying the file, therefore w

[GitHub] [tika] keithrbennett commented on a change in pull request #334: Tika-3141 : add empty environment variable handle

2020-07-30 Thread GitBox
keithrbennett commented on a change in pull request #334: URL: https://github.com/apache/tika/pull/334#discussion_r463103079 ## File path: tika-core/src/main/java/org/apache/tika/config/TikaConfig.java ## @@ -249,11 +249,11 @@ public TikaConfig(ClassLoader loader) public T

[GitHub] [tika] THausherr commented on pull request #332: Fix can't del tmp file in windows

2020-07-30 Thread GitBox
THausherr commented on pull request #332: URL: https://github.com/apache/tika/pull/332#issuecomment-04910 I agree that it shouldn't stop the process. Suggestion: output a log message, because the cause is usually a programming oversight, so that it can be reported and fixed.

[GitHub] [tika] THausherr edited a comment on pull request #332: Fix can't del tmp file in windows

2020-07-30 Thread GitBox
THausherr edited a comment on pull request #332: URL: https://github.com/apache/tika/pull/332#issuecomment-04910 I agree that it shouldn't stop the process. Suggestion: also output a log message, because the cause is usually a programming oversight, so that it can be reported and fixed

[GitHub] [tika] PeterAlfredLee commented on a change in pull request #334: Tika-3141 : add empty environment variable handle

2020-07-31 Thread GitBox
PeterAlfredLee commented on a change in pull request #334: URL: https://github.com/apache/tika/pull/334#discussion_r463905876 ## File path: tika-core/src/main/java/org/apache/tika/config/TikaConfig.java ## @@ -249,11 +249,11 @@ public TikaConfig(ClassLoader loader) public

[GitHub] [tika] PeterAlfredLee commented on pull request #332: Fix can't del tmp file in windows

2020-07-31 Thread GitBox
PeterAlfredLee commented on pull request #332: URL: https://github.com/apache/tika/pull/332#issuecomment-667450631 > Suggestion: also output a log message, because the cause is usually a programming oversight, so that it can be reported and fixed. Just pushed the logging part. :)
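
The approach discussed in this thread (try to delete; if Windows refuses because a handle is still open, log a warning and fall back to `deleteOnExit`) might look roughly like this. `TempFileCleanupSketch` is a hypothetical helper, not Tika's `TemporaryResources`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.Logger;

// Delete a temp file now if possible; otherwise log and defer to JVM exit
// instead of failing the caller.
final class TempFileCleanupSketch {
    private static final Logger LOG =
            Logger.getLogger(TempFileCleanupSketch.class.getName());

    /** Returns true if the file is gone (or never existed), false if deferred. */
    static boolean deleteOrDefer(Path file) {
        try {
            Files.deleteIfExists(file);
            return true;
        } catch (IOException e) {
            // Usually a stream left open somewhere; log it so it can be fixed.
            LOG.warning("Could not delete " + file + ", deferring to JVM exit: " + e);
            file.toFile().deleteOnExit();
            return false;
        }
    }
}
```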

[GitHub] [tika] PeterAlfredLee commented on a change in pull request #334: Tika-3141 : add empty environment variable handle

2020-07-31 Thread GitBox
PeterAlfredLee commented on a change in pull request #334: URL: https://github.com/apache/tika/pull/334#discussion_r463906635 ## File path: tika-core/src/main/java/org/apache/tika/config/TikaConfig.java ## @@ -249,11 +249,11 @@ public TikaConfig(ClassLoader loader) public

[GitHub] [tika] keithrbennett commented on a change in pull request #334: Tika-3141 : add empty environment variable handle

2020-08-01 Thread GitBox
keithrbennett commented on a change in pull request #334: URL: https://github.com/apache/tika/pull/334#discussion_r463971330 ## File path: tika-core/src/main/java/org/apache/tika/config/TikaConfig.java ## @@ -249,11 +249,11 @@ public TikaConfig(ClassLoader loader) public T

[GitHub] [tika] jendabenda opened a new pull request #335: TIKA-307 truncated zip

2020-08-01 Thread GitBox
jendabenda opened a new pull request #335: URL: https://github.com/apache/tika/pull/335

[GitHub] [tika] PeterAlfredLee commented on a change in pull request #334: Tika-3141 : add empty environment variable handle

2020-08-02 Thread GitBox
PeterAlfredLee commented on a change in pull request #334: URL: https://github.com/apache/tika/pull/334#discussion_r464218911 ## File path: tika-core/src/main/java/org/apache/tika/config/TikaConfig.java ## @@ -249,11 +249,11 @@ public TikaConfig(ClassLoader loader) public

[GitHub] [tika] PeterAlfredLee opened a new pull request #336: Add more judge between Charset Windows-1252 and ISO-8859-1(5)

2020-08-05 Thread GitBox
PeterAlfredLee opened a new pull request #336: URL: https://github.com/apache/tika/pull/336 According to these web pages: [Windows-1252 Character list](https://www.fileformat.info/info/charset/windows-1252/list.htm), [ISO-8859-1 Character list](http://www.fileformat.info/info/charset/I
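
The distinction the PR relies on is that bytes 0x80 to 0x9F are C1 control codes in ISO-8859-1 and ISO-8859-15 but mostly printable characters in windows-1252 (for example, 0x80 is the euro sign). A deliberately simplified heuristic built on that fact, not the detector logic from the PR, could look like this; `Cp1252Heuristic` is hypothetical.

```java
import java.nio.charset.Charset;

// Simplified heuristic: any byte in 0x80-0x9F suggests windows-1252, since
// those positions are control codes in ISO-8859-1/-15. (Five of them are
// undefined even in windows-1252; this sketch ignores that.)
final class Cp1252Heuristic {
    static boolean hasWindows1252OnlyBytes(byte[] data) {
        for (byte b : data) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (v >= 0x80 && v <= 0x9F) {
                return true;
            }
        }
        return false;
    }

    static String decodeLatin(byte[] data) {
        String name = hasWindows1252OnlyBytes(data) ? "windows-1252" : "ISO-8859-1";
        return new String(data, Charset.forName(name));
    }
}
```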

[GitHub] [tika] JoaoGFarias opened a new pull request #337: Fixing typo on SAX Parser exception

2020-08-11 Thread GitBox
JoaoGFarias opened a new pull request #337: URL: https://github.com/apache/tika/pull/337 prooblem => problem

[GitHub] [tika] kkrugler merged pull request #337: Fixing typo on SAX Parser exception

2020-08-11 Thread GitBox
kkrugler merged pull request #337: URL: https://github.com/apache/tika/pull/337

[GitHub] [tika] kkrugler commented on pull request #337: Fixing typo on SAX Parser exception

2020-08-11 Thread GitBox
kkrugler commented on pull request #337: URL: https://github.com/apache/tika/pull/337#issuecomment-671978803 Thanks João!

[GitHub] [tika] PeterAlfredLee opened a new pull request #338: Tika-2421 : About the encoding of HTML

2020-08-13 Thread GitBox
PeterAlfredLee opened a new pull request #338: URL: https://github.com/apache/tika/pull/338 It seems we can use `charsetdetector.StandardHtmlEncodingDetector` for charset detection of HTML; I'm wondering why we are not using it. I also stopped treating ISO-8859-1 as Windows-1252.

[GitHub] [tika] tballison commented on pull request #338: Tika-2421 : About the encoding of HTML

2020-08-13 Thread GitBox
tballison commented on pull request #338: URL: https://github.com/apache/tika/pull/338#issuecomment-673508020 Inertia... I never got around to doing a bakeoff between the two, and, unless there's evidence of improvement, I'm hesitant to make the change as the default detector.

[GitHub] [tika] PeterAlfredLee commented on pull request #338: Tika-2421 : About the encoding of HTML

2020-08-13 Thread GitBox
PeterAlfredLee commented on pull request #338: URL: https://github.com/apache/tika/pull/338#issuecomment-673823280 As [TIKA-2421](https://issues.apache.org/jira/browse/TIKA-2421) says, according to the [W3C description](https://www.w3.org/International/questions/qa-html-encoding-declaration

[GitHub] [tika] PeterAlfredLee opened a new pull request #339: TIKA-2001: Add TextAndAttributeContentHandler and TextAndAttributeXMLParser

2020-08-17 Thread GitBox
PeterAlfredLee opened a new pull request #339: URL: https://github.com/apache/tika/pull/339 [TIKA-2001](https://issues.apache.org/jira/browse/TIKA-2001) requires an XML parser that can output both text and attributes. This PR provides one. Users can configure `TextAndAttributeXMLPa
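
A minimal SAX sketch of the idea, using only the JDK's built-in parser: emit each element's attributes as `name=value` alongside its character data. `TextAndAttributesSketch` and its output format are hypothetical and are not the PR's `TextAndAttributeContentHandler`.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Collects element text plus "name=value " for every attribute, in document order.
final class TextAndAttributesSketch extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(atts.getQName(i)).append('=').append(atts.getValue(i)).append(' ');
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        out.append(ch, start, length);
    }

    static String extract(String xml) throws Exception {
        TextAndAttributesSketch handler = new TextAndAttributesSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }
}
```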

[GitHub] [tika] PeterAlfredLee opened a new pull request #340: Modify some arg parse in TikaCLI

2020-08-20 Thread GitBox
PeterAlfredLee opened a new pull request #340: URL: https://github.com/apache/tika/pull/340 1. Fix parsing of the "--client=" argument. 2. Parse the "--compare-file-magic" argument the same way as the others.
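
For reference, a common way to handle `--name=value` options is to match the prefix including the `=` and take the remainder as the value. This is a hypothetical helper illustrating that pattern, not the actual TikaCLI code:

```java
// Hypothetical helper: returns the value of "--option=value", or null if
// the argument does not start with "--option=".
final class CliArgSketch {
    static String valueOf(String arg, String option) {
        String prefix = option + "=";
        return arg.startsWith(prefix) ? arg.substring(prefix.length()) : null;
    }
}
```

Matching on `option + "="` rather than on the bare option name avoids accidentally accepting a different flag that merely shares the same prefix.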

[GitHub] [tika] tballison opened a new pull request #341: Branch 2x

2020-08-21 Thread GitBox
tballison opened a new pull request #341: URL: https://github.com/apache/tika/pull/341 TIKA-3166 Tika 2.0.0 that builds at least locally...

[GitHub] [tika] tballison merged pull request #341: Branch 2x

2020-08-21 Thread GitBox
tballison merged pull request #341: URL: https://github.com/apache/tika/pull/341

[GitHub] [tika] PeterAlfredLee opened a new pull request #342: Modify TikaCLITest

2020-08-21 Thread GitBox
PeterAlfredLee opened a new pull request #342: URL: https://github.com/apache/tika/pull/342 1. Remove unnecessary imports. 2. Reset outContent and errContent if they are not empty, so that output from a previous TikaCLI.main run is not left over. 3. Add two test cases. 4. Modify previous test cases, use method

[GitHub] [tika] PeterAlfredLee opened a new pull request #343: Fix infinite loop in tika-parser-nlp-module

2020-08-24 Thread GitBox
PeterAlfredLee opened a new pull request #343: URL: https://github.com/apache/tika/pull/343 CTAKESParser should not be loaded via the parser service loader because it will cause an infinite loop. If `org.apache.tika.parser.ctakes.CTAKESParser` is in file `org.apache.tika.parser.Parser`:

[GitHub] [tika] asfgit merged pull request #343: Fix infinite loop in tika-parser-nlp-module

2020-08-24 Thread GitBox
asfgit merged pull request #343: URL: https://github.com/apache/tika/pull/343

[GitHub] [tika] bobpaulin opened a new pull request #344: TIKA-3178: Adding OSGi headers to tika modules

2020-08-24 Thread GitBox
bobpaulin opened a new pull request #344: URL: https://github.com/apache/tika/pull/344

[GitHub] [tika] PeterAlfredLee opened a new pull request #346: Refactor fillSet in OPCPackageDetector

2020-08-26 Thread GitBox
PeterAlfredLee opened a new pull request #346: URL: https://github.com/apache/tika/pull/346 Method `fillSet` in `OPCPackageDetector` can be simplified.

[GitHub] [tika] PeterAlfredLee opened a new pull request #345: Repalce const in OPCPackageDetector

2020-08-26 Thread GitBox
PeterAlfredLee opened a new pull request #345: URL: https://github.com/apache/tika/pull/345 Use the constants in class `PackageRelationshipTypes`, as the `TODO` says.

[GitHub] [tika] PeterAlfredLee opened a new pull request #347: Fix key typo in BatchProcessBuilder

2020-08-26 Thread GitBox
PeterAlfredLee opened a new pull request #347: URL: https://github.com/apache/tika/pull/347 We want to check whether the string "reporter" is a key of keyNodes, not whether null is. So I think this is a typo, and this PR fixes it.

[GitHub] [tika] PeterAlfredLee opened a new pull request #348: Refactor some code use method Math.max

2020-08-29 Thread GitBox
PeterAlfredLee opened a new pull request #348: URL: https://github.com/apache/tika/pull/348

[GitHub] [tika] kkrugler commented on a change in pull request #348: Refactor some code use method Math.max

2020-08-29 Thread GitBox
kkrugler commented on a change in pull request #348: URL: https://github.com/apache/tika/pull/348#discussion_r479656172 ## File path: tika-core/src/main/java/org/apache/tika/io/CountingInputStream.java ## @@ -56,7 +56,7 @@ public CountingInputStream(InputStream in) { @Over
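
The refactor relies on `read(...)` returning -1 at end of stream: `Math.max(n, 0)` adds the bytes actually read and never subtracts the EOF marker. A simplified counting stream showing that pattern (`CountingStreamSketch` is a stand-in, not Tika's `CountingInputStream`):

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Counts bytes read or skipped; Math.max(n, 0) ignores the -1 EOF result.
final class CountingStreamSketch extends FilterInputStream {
    private long count;

    CountingStreamSketch(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            count++;
        }
        return b;
    }

    // FilterInputStream.read(byte[]) delegates here, so it is counted once.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        count += Math.max(n, 0);
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        count += Math.max(skipped, 0L);
        return skipped;
    }

    long getCount() {
        return count;
    }
}
```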

[GitHub] [tika] PeterAlfredLee commented on a change in pull request #348: Refactor some code use method Math.max

2020-08-30 Thread GitBox
PeterAlfredLee commented on a change in pull request #348: URL: https://github.com/apache/tika/pull/348#discussion_r479852298 ## File path: tika-core/src/main/java/org/apache/tika/io/CountingInputStream.java ## @@ -56,7 +56,7 @@ public CountingInputStream(InputStream in) {

[GitHub] [tika] kkrugler merged pull request #348: Refactor some code use method Math.max

2020-08-30 Thread GitBox
kkrugler merged pull request #348: URL: https://github.com/apache/tika/pull/348

[GitHub] [tika] kkrugler commented on pull request #348: Refactor some code use method Math.max

2020-08-30 Thread GitBox
kkrugler commented on pull request #348: URL: https://github.com/apache/tika/pull/348#issuecomment-683516522 Thanks Peter!

[GitHub] [tika] tballison opened a new pull request #349: TIKA-3179

2020-09-01 Thread GitBox
tballison opened a new pull request #349: URL: https://github.com/apache/tika/pull/349 Refactor parser modules for three classes of parsers: basic, extended, advanced.

[GitHub] [tika] tballison merged pull request #349: TIKA-3179

2020-09-01 Thread GitBox
tballison merged pull request #349: URL: https://github.com/apache/tika/pull/349

[GitHub] [tika-docker] schmitch opened a new pull request #1: changed entrypoint to exec format

2020-09-01 Thread GitBox
schmitch opened a new pull request #1: URL: https://github.com/apache/tika-docker/pull/1 This also enables people to specify additional parameters, which is not possible with the current form: https://docs.docker.com/engine/reference/builder/#entrypoint > The shell form prevents any CMD

[GitHub] [tika] PeterAlfredLee opened a new pull request #350: Fix java build errors caused by wrong depenency settings

2020-09-02 Thread GitBox
PeterAlfredLee opened a new pull request #350: URL: https://github.com/apache/tika/pull/350 Hi all, I noticed that the main branch now succeeds when executing `mvn clean install`, but fails in `IDEA`. After some debugging, I found what's wrong: 1. Component tika-parsers and

[GitHub] [tika] PeterAlfredLee commented on pull request #342: Modify TikaCLITest

2020-09-03 Thread GitBox
PeterAlfredLee commented on pull request #342: URL: https://github.com/apache/tika/pull/342#issuecomment-686311831 Update: 5. Simplify some code in methods testExtract, testExtractTgz and testExtractInlineImages.

[GitHub] [tika] tballison commented on pull request #350: Fix java build errors caused by wrong depenency settings

2020-09-03 Thread GitBox
tballison commented on pull request #350: URL: https://github.com/apache/tika/pull/350#issuecomment-686565825 This is a great catch. Thank you! I'm really frustrated that I didn't catch this locally and also that the CI didn't catch it because of the snapshot repo. I had in

[GitHub] [tika] tballison commented on pull request #350: Fix java build errors caused by wrong depenency settings

2020-09-03 Thread GitBox
tballison commented on pull request #350: URL: https://github.com/apache/tika/pull/350#issuecomment-686602942 I just pushed a commit that should fix this. I was able to get a clean build after deleting my local tika repo and running the build offline. I use IntelliJ, too, and I have

[GitHub] [tika] tballison edited a comment on pull request #350: Fix java build errors caused by wrong depenency settings

2020-09-03 Thread GitBox
tballison edited a comment on pull request #350: URL: https://github.com/apache/tika/pull/350#issuecomment-686602942 I just pushed a commit that should fix this. I was able to get a clean build after deleting my local tika repo and running the build offline. I use IntelliJ, too, and

[GitHub] [tika] tballison merged pull request #342: Modify TikaCLITest

2020-09-03 Thread GitBox
tballison merged pull request #342: URL: https://github.com/apache/tika/pull/342

[GitHub] [tika] tballison commented on pull request #338: Tika-2421 : About the encoding of HTML

2020-09-03 Thread GitBox
tballison commented on pull request #338: URL: https://github.com/apache/tika/pull/338#issuecomment-686610377 Wait, it turns out I did get around to doing this study... https://github.com/tballison/share/blob/main/slides/Tika_charset_detector_study_201909.docx Let me read it a

[GitHub] [tika] PeterAlfredLee commented on pull request #350: Fix java build errors caused by wrong depenency settings

2020-09-03 Thread GitBox
PeterAlfredLee commented on pull request #350: URL: https://github.com/apache/tika/pull/350#issuecomment-686860503 Hi @tballison > I use IntelliJ, too, and I have had to run `find . -name *.iml -exec rm -rf {} \; && rm -r .idea/` a number of times during development of Tika 2.0 becaus

[GitHub] [tika] tballison commented on pull request #350: Fix java build errors caused by wrong depenency settings

2020-09-04 Thread GitBox
tballison commented on pull request #350: URL: https://github.com/apache/tika/pull/350#issuecomment-687141437 Oooo...fun... thank you! Is the build working for you now?

[GitHub] [tika] PeterAlfredLee closed pull request #350: Fix java build errors caused by wrong depenency settings

2020-09-04 Thread GitBox
PeterAlfredLee closed pull request #350: URL: https://github.com/apache/tika/pull/350

[GitHub] [tika] PeterAlfredLee commented on pull request #350: Fix java build errors caused by wrong depenency settings

2020-09-04 Thread GitBox
PeterAlfredLee commented on pull request #350: URL: https://github.com/apache/tika/pull/350#issuecomment-687511400 Yes, it works! Thank you!

[GitHub] [tika] PeterAlfredLee opened a new pull request #351: Modify FSBatchTestBase

2020-09-06 Thread GitBox
PeterAlfredLee opened a new pull request #351: URL: https://github.com/apache/tika/pull/351 1. Modify method tearDown: if deleting the output directory root fails, try to delete it on exit. 2. Simplify some code using collection.addAll. 3. Simplify some code using `for`. 4. Remove unnecessary imports.
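The tearDown change described in #351 (delete the output directory, and fall back to deleting it on exit if that fails) can be sketched in Java as follows. This is a minimal sketch, not the code from the PR: the class `DeleteOrScheduleOnExit` and its method are hypothetical names, and it relies only on the JDK's `java.nio.file.Files` and `File.deleteOnExit()`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper illustrating the tearDown idea from PR #351;
// not the actual FSBatchTestBase code.
final class DeleteOrScheduleOnExit {

    /**
     * Tries to delete the directory tree under root right away.
     * If any entry cannot be deleted, registers the remaining entries
     * with File.deleteOnExit() and returns false.
     */
    static boolean deleteOrScheduleOnExit(Path root) throws IOException {
        if (!Files.exists(root)) {
            return true;
        }
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(root)) {
            // Files.walk yields a directory before its contents (pre-order).
            entries = walk.collect(Collectors.toList());
        }
        // Reverse so that children are deleted before their parents.
        Collections.reverse(entries);
        boolean allDeleted = true;
        for (Path p : entries) {
            try {
                Files.deleteIfExists(p);
            } catch (IOException e) {
                allDeleted = false; // e.g. a locked file or a non-empty directory
            }
        }
        if (!allDeleted && Files.exists(root)) {
            // deleteOnExit() runs in reverse registration order, so registering
            // parents first means children are removed before their parents.
            try (Stream<Path> walk = Files.walk(root)) {
                walk.forEach(p -> p.toFile().deleteOnExit());
            }
        }
        return allDeleted;
    }
}
```

Registering every remaining entry, rather than only the root, matters because `deleteOnExit()` cannot remove a directory that is still non-empty when the JVM exits.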

[GitHub] [tika] PeterAlfredLee opened a new pull request #352: Modify TabularFormatsTest

2020-09-08 Thread GitBox
PeterAlfredLee opened a new pull request #352: URL: https://github.com/apache/tika/pull/352 As the `TODO` says, the % format should be checked again after https://github.com/epam/parso/issues/28 is fixed. Modify `table` to use regular-expression matching because `testXLS`, `testXLSX` and `testXLS

[GitHub] [tika] tballison merged pull request #344: TIKA-3178: Adding OSGi headers to tika modules

2020-09-08 Thread GitBox
tballison merged pull request #344: URL: https://github.com/apache/tika/pull/344

[GitHub] [tika] PeterAlfredLee opened a new pull request #353: Fix test fail in SAS7BDATParserTest caused by default language is not english

2020-09-08 Thread GitBox
PeterAlfredLee opened a new pull request #353: URL: https://github.com/apache/tika/pull/353 Use the short month names of the default language in the test.

[GitHub] [tika] PeterAlfredLee commented on pull request #353: Fix test fail in SAS7BDATParserTest caused by default language is not english

2020-09-08 Thread GitBox
PeterAlfredLee commented on pull request #353: URL: https://github.com/apache/tika/pull/353#issuecomment-689243546 This is an alternative implementation of the fix in PR [#331](https://github.com/apache/tika/pull/331), so that we do not need to modify the JVM args as #331 did. Only either one o
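The approach in #353 avoids the locale problem by deriving the expected month abbreviation from the default language rather than hard-coding English text. A minimal Java sketch of that idea, assuming the test compares dates formatted with an `MMM` pattern; `LocaleShortMonths` and its methods are hypothetical names, not code from SAS7BDATParserTest, and only the JDK's `DateFormatSymbols` and `SimpleDateFormat` are used.

```java
import java.text.DateFormatSymbols;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper illustrating the idea behind PR #353.
final class LocaleShortMonths {

    /** Abbreviated month name for a 0-based month (Calendar.JANUARY == 0) in the given locale. */
    static String shortMonth(int month, Locale locale) {
        return DateFormatSymbols.getInstance(locale).getShortMonths()[month];
    }

    /** Formats a UTC date with an "MMM" pattern, so the month text depends on the locale. */
    static String formatUtc(int year, int month, int day, Locale locale) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        Calendar cal = new GregorianCalendar(utc, locale);
        cal.clear();
        cal.set(year, month, day);
        SimpleDateFormat fmt = new SimpleDateFormat("dd MMM yyyy", locale);
        fmt.setTimeZone(utc);
        return fmt.format(cal.getTime());
    }
}
```

In a test, the expected value would then be built from `Locale.getDefault()`, for example `"08 " + LocaleShortMonths.shortMonth(Calendar.SEPTEMBER, Locale.getDefault()) + " 2020"`, instead of the literal `"08 Sep 2020"`. This assumes the default locale uses the Gregorian calendar and the same abbreviation form for formatting as `DateFormatSymbols` reports.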

[GitHub] [tika] PeterAlfredLee opened a new pull request #354: Remove dependency tika-parsers test-jar in tika-server and tika-examples

2020-09-08 Thread GitBox
PeterAlfredLee opened a new pull request #354: URL: https://github.com/apache/tika/pull/354 `tika-parsers` no longer generates a `test-jar` since the recent commit `a504d7e`, but `tika-server` and `tika-examples` still use it as a dependency. This leads to build failures, and

[GitHub] [tika] tballison merged pull request #354: Remove dependency tika-parsers test-jar in tika-server and tika-examples

2020-09-09 Thread GitBox
tballison merged pull request #354: URL: https://github.com/apache/tika/pull/354
