Re: [DISCUSS] Moving to Git

2015-11-18 Thread Hong-Thai Nguyen
Department > University of Southern California, Los Angeles, CA 90089 USA > ++++++ > > > > -- --- Hong-Thai NGUYEN Tel.: 06 27 04 86 22

RE: [VOTE] Apache Tika 1.10 Release Candidate #1

2015-08-05 Thread Hong-Thai Nguyen
+1 from me. Built on Windows and tested with an internal corpus: no regressions. Better still, a few more ppt documents now convert compared with 1.9. Great job, David and others! Thanks, Hong-Thai -Original Message- From: Tyler Palsulich [mailto:tpalsul...@gmail.com] Sent: mar

Re: [VOTE] Release Apache Tika 1.9 Candidate #2

2015-06-09 Thread Hong-Thai Nguyen
hief Architect > > Instrument Software and Science Data Systems Section (398) > > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA > > Office: 168-519, Mailstop: 168-527 > > Email: chris.a.mattm...@nasa.gov > > WWW: http://sunset.usc.edu/~mattmann/ > > ++ > > Adjunct Associate Professor, Computer Science Department > > University of Southern California, Los Angeles, CA 90089 USA > > ++ > > > > > > > -- --- Hong-Thai NGUYEN Tel.: 06 27 04 86 22

Re: Java 1.6 support for Tika 1.9?

2015-04-29 Thread Hong-Thai Nguyen
ycastle.org/latest_releases.html). > > -- > Regards, > Konstantin Gribov > > Wed, 29 Apr 2015 at 16:43, Hong-Thai Nguyen >: > > > Hi folks, > > > > I'm +1 for announcing the end of JDK 1.6 support in the upcoming 1.9. > > > > FYI, we are havin

RE: Java 1.6 support for Tika 1.9?

2015-04-29 Thread Hong-Thai Nguyen
Hi folks, I'm +1 for announcing the end of JDK 1.6 support in the upcoming 1.9. FYI, we still have some legacy dependencies built specifically for JDK 1.5 (*jdk15*): $ mvn dependency:tree [INFO] Scanning for projects... [INFO] [INFO]
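
The check described in this message (looking through `mvn dependency:tree` output for artifacts carrying the `jdk15` classifier) could be sketched roughly as below. This is only an illustration: the class name `ClassifierScan` and the sample coordinates are hypothetical, and it assumes the usual `groupId:artifactId:type:classifier:version:scope` layout that the tree prints for artifacts with a classifier.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds dependency:tree entries that carry a given classifier. */
public class ClassifierScan {

    /** Returns the coordinates whose classifier field equals the requested one. */
    public static List<String> withClassifier(String treeOutput, String classifier) {
        List<String> hits = new ArrayList<>();
        for (String line : treeOutput.split("\\R")) {
            // Drop the "[INFO]" prefix, then the tree-drawing characters (|, +, \, -).
            String coords = line.replaceFirst("^\\[INFO\\]\\s*", "")
                                .replaceFirst("^[|+\\\\\\-\\s]+", "")
                                .trim();
            String[] parts = coords.split(":");
            // Six fields means a classifier is present, in the fourth position.
            if (parts.length == 6 && parts[3].equals(classifier)) {
                hits.add(coords);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample output; these are not Tika's real dependencies.
        String tree = "[INFO] org.example:app:jar:1.0\n"
                + "[INFO] +- com.example:legacy-lib:jar:jdk15:1.0:compile\n"
                + "[INFO] \\- org.example:plain-lib:jar:2.0:compile\n";
        withClassifier(tree, "jdk15").forEach(System.out::println);
    }
}
```

In practice one would pipe the real `mvn dependency:tree` output into such a helper rather than hard-coding a sample string.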

RE: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-14 Thread Hong-Thai Nguyen
Hi, +1 from me. Great work, Tyler! Hong-Thai -Original Message- From: Tyler Palsulich [mailto:tpalsul...@apache.org] Sent: Monday, 13 April 2015 19:56 To: dev@tika.apache.org; u...@tika.apache.org Subject: [VOTE] Apache Tika 1.8 Release Candidate #2 Hi Folks, A candidate for the Tika

[jira] [Commented] (TIKA-1600) Unable to parse ODT files because of failed to close temporary resources

2015-04-13 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492084#comment-14492084 ] Hong-Thai Nguyen commented on TIKA-1600: The root exception is an NPE when par

RE: [VOTE] Release Apache Tika 1.8 Candidate #1

2015-04-13 Thread Hong-Thai Nguyen
Not yet, I'm investigating TIKA-1600 further today. Hong-Thai -Original Message- From: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: Monday, 13 April 2015 01:07 To: dev@tika.apache.org Subject: RE: [VOTE] Release Apache Tika 1.8 Candidate #1 I don't think we've solved TIKA-1600,

[jira] [Commented] (TIKA-1581) jhighlight license concerns

2015-03-30 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386900#comment-14386900 ] Hong-Thai Nguyen commented on TIKA-1581: And many thanks to [~kkrugler] with

Re: [DISCUSS] Tika 1.8 or 1.7.1

2015-03-29 Thread Hong-Thai Nguyen
+1 for 1.8 Hong-Thai > On 28 Mar 2015, at 16:01, Tyler Palsulich wrote: > > Hi Folks, > > Now that TIKA-1581 (JHighlight licensing issues) is resolved, we need to > release a new version of Tika. I'll volunteer to be the release manager > again. > > Should we release this as 1.8 or 1.7.1? >

[jira] [Resolved] (TIKA-1581) jhighlight license concerns

2015-03-27 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1581. Resolution: Fixed > jhighlight license conce

[jira] [Updated] (TIKA-1581) jhighlight license concerns

2015-03-27 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1581: --- Fix Version/s: 1.8 > jhighlight license conce

[jira] [Commented] (TIKA-1581) jhighlight license concerns

2015-03-27 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14383827#comment-14383827 ] Hong-Thai Nguyen commented on TIKA-1581: On r1669583, I switched to la

[jira] [Comment Edited] (TIKA-1581) jhighlight license concerns

2015-03-20 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371432#comment-14371432 ] Hong-Thai Nguyen edited comment on TIKA-1581 at 3/20/15 3:3

[jira] [Comment Edited] (TIKA-1581) jhighlight license concerns

2015-03-20 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371432#comment-14371432 ] Hong-Thai Nguyen edited comment on TIKA-1581 at 3/20/15 3:1

[jira] [Commented] (TIKA-1581) jhighlight license concerns

2015-03-20 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371432#comment-14371432 ] Hong-Thai Nguyen commented on TIKA-1581: I've contacted also 'g

Re: [VOTE] Apache Tika 1.7 Release

2015-01-14 Thread Hong-Thai Nguyen
I've rerun some regression tests. It seems fine to me too, so +1. Great job, Tyler! On Fri, Jan 9, 2015 at 11:02 PM, Tyler Palsulich wrote: > Hi All, > > A candidate for the Tika 1.7 release is available at: > https://dist.apache.org/repos/dist/dev/tika/ > > The release candidate is a z

Re: [VOTE] Apache Tika 1.7 Release

2015-01-08 Thread Hong-Thai Nguyen
Seems fine to me: +1. No major regression on our test corpus of 23K docs: 15-01-07 18:19:27 INFO (DocumentConversionErrorPlugin.java : 116) [pool-3-thread-1] Summary of document conversion errors: - pdf (4) * (2) org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apach
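
The log excerpt above groups conversion failures by file type and then by exception message. A minimal sketch of how such a summary could be tallied is shown below. It is not the `DocumentConversionErrorPlugin` named in the log (that class is not shown in this thread); `ConversionErrorSummary` and its methods are hypothetical names, and the output layout only approximates the quoted one.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical tally of conversion errors, grouped by file extension and message. */
public class ConversionErrorSummary {

    // extension -> (error message -> number of occurrences)
    private final Map<String, Map<String, Integer>> errors = new TreeMap<>();

    /** Lower-cased extension of a file name, or "(none)" if it has no usable extension. */
    public static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "(none)";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Records one failed conversion. */
    public void record(String fileName, String errorMessage) {
        errors.computeIfAbsent(extensionOf(fileName), k -> new TreeMap<>())
              .merge(errorMessage, 1, Integer::sum);
    }

    /** Total number of failures recorded for one extension. */
    public int countFor(String extension) {
        Map<String, Integer> byMessage = errors.get(extension);
        if (byMessage == null) {
            return 0;
        }
        return byMessage.values().stream().mapToInt(Integer::intValue).sum();
    }

    /** Renders the summary: one line per extension, then one per distinct message. */
    public String render() {
        StringBuilder sb = new StringBuilder("Summary of document conversion errors:\n");
        for (Map.Entry<String, Map<String, Integer>> byExt : errors.entrySet()) {
            sb.append("- ").append(byExt.getKey())
              .append(" (").append(countFor(byExt.getKey())).append(")\n");
            for (Map.Entry<String, Integer> byMsg : byExt.getValue().entrySet()) {
                sb.append("  * (").append(byMsg.getValue()).append(") ")
                  .append(byMsg.getKey()).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ConversionErrorSummary summary = new ConversionErrorSummary();
        summary.record("report.pdf", "example error A");
        summary.record("slides.PDF", "example error A");
        summary.record("letter.doc", "example error B");
        System.out.print(summary.render());
    }
}
```

Comparing two such summaries, one per release candidate, is one simple way to spot regressions across a corpus.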

[jira] [Commented] (TIKA-1505) chmparser breaks down when extracting from file of CHM format v3

2015-01-05 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264786#comment-14264786 ] Hong-Thai Nguyen commented on TIKA-1505: Can you provide also problem files

[jira] [Resolved] (TIKA-672) Proper error handling in the CHM parser

2014-11-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-672. --- Resolution: Fixed Checked: no more System.err/System.out inside the CHM parser > Proper er

[jira] [Updated] (TIKA-672) Proper error handling in the CHM parser

2014-11-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-672: -- Fix Version/s: 1.7 > Proper error handling in the CHM par

[jira] [Updated] (TIKA-1448) CHM parser : defect in file extraction

2014-11-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1448: --- Fix Version/s: 1.7 > CHM parser : defect in file extract

[jira] [Updated] (TIKA-1446) CHM parser : wrong decompression of aligned blocks

2014-11-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1446: --- Fix Version/s: 1.7 > CHM parser : wrong decompression of aligned blo

[jira] [Updated] (TIKA-1430) CHM parser gets faulty text (fix found)

2014-11-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1430: --- Fix Version/s: 1.7 > CHM parser gets faulty text (fix fo

[jira] [Resolved] (TIKA-1430) CHM parser gets faulty text (fix found)

2014-11-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1430. Resolution: Fixed > CHM parser gets faulty text (fix fo

[jira] [Updated] (TIKA-1447) CHM parser: wrong directory list

2014-11-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1447: --- Fix Version/s: 1.7 > CHM parser: wrong directory l

[jira] [Resolved] (TIKA-1448) CHM parser : defect in file extraction

2014-11-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1448. Resolution: Fixed > CHM parser : defect in file extract

[jira] [Resolved] (TIKA-1447) CHM parser: wrong directory list

2014-11-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1447. Resolution: Fixed > CHM parser: wrong directory l

[jira] [Resolved] (TIKA-1446) CHM parser : wrong decompression of aligned blocks

2014-11-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1446. Resolution: Fixed > CHM parser : wrong decompression of aligned blo

Re: svn commit: r1640017 - /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRConfig.java

2014-11-17 Thread Hong-Thai Nguyen
Hi, I've pushed a minor fix to pass this test on Windows. Thanks, On Mon, Nov 17, 2014 at 4:28 PM, Mattmann, Chris A (3980) < chris.a.mattm...@jpl.nasa.gov> wrote: > +1, agreed, Dave would be nice to have one as a default. > > ++ >

Re: Move definitively from SVN to Git ?

2014-11-17 Thread Hong-Thai Nguyen
Yes, that's exactly what I'm doing. If we move to Git, we'll avoid all the SVN stuff. Anyway, this concerns committers only. On Mon, Nov 17, 2014 at 12:08 PM, Nick Burch wrote: > On Mon, 17 Nov 2014, Hong-Thai Nguyen wrote: > >> I didn't realize that we could commit/pu

Re: Move definitively from SVN to Git ?

2014-11-17 Thread Hong-Thai Nguyen
I didn't realize that we could commit/push directly into the Git repo. Could we? Cheers On Mon, Nov 17, 2014 at 11:46 AM, Nick Burch wrote: > On Mon, 17 Nov 2014, Hong-Thai Nguyen wrote: > >> Git is supported everywhere and offers many new features. Should we >> abandon S

[jira] [Commented] (TIKA-1447) CHM parser: wrong directory list

2014-11-17 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214535#comment-14214535 ] Hong-Thai Nguyen commented on TIKA-1447: [~binhawking], The work on TIKA-

Move definitively from SVN to Git ?

2014-11-17 Thread Hong-Thai Nguyen
Hi all, Git is supported everywhere and offers many new features. Should we abandon the SVN repo and move to Git for good, to make it easier to apply fixes and contributions? Thanks, -- Hong-Thai

[jira] [Comment Edited] (TIKA-1446) CHM parser : wrong decompression of aligned blocks

2014-11-12 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208079#comment-14208079 ] Hong-Thai Nguyen edited comment on TIKA-1446 at 11/12/14 2:3

[jira] [Commented] (TIKA-1446) CHM parser : wrong decompression of aligned blocks

2014-11-12 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208079#comment-14208079 ] Hong-Thai Nguyen commented on TIKA-1446: Hi [~binhawking], I've merge

[jira] [Commented] (TIKA-1463) TesseractOCRParser does not work in Windows

2014-11-04 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196343#comment-14196343 ] Hong-Thai Nguyen commented on TIKA-1463: Thanks [~lfcnassif], without

[jira] [Closed] (TIKA-1463) TesseractOCRParser does not work in Windows

2014-11-03 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen closed TIKA-1463. -- Resolution: Fixed > TesseractOCRParser does not work in Wind

[jira] [Updated] (TIKA-1463) TesseractOCRParser does not work in Windows

2014-11-03 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1463: --- Description: STR: * Case 1: ** Setting tesseractPath to a common installation path of

[jira] [Updated] (TIKA-1463) TesseractOCRParser does not work in Windows

2014-11-03 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1463: --- Summary: TesseractOCRParser does not work in Windows (was: TesseractOCRParser does work in

[jira] [Commented] (TIKA-1463) TesseractOCRParser does work in Windows

2014-11-03 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194694#comment-14194694 ] Hong-Thai Nguyen commented on TIKA-1463: Fixed in r1636382 > TesseractOC

[jira] [Created] (TIKA-1463) TesseractOCRParser does work in Windows

2014-11-03 Thread Hong-Thai Nguyen (JIRA)
Hong-Thai Nguyen created TIKA-1463: -- Summary: TesseractOCRParser does work in Windows Key: TIKA-1463 URL: https://issues.apache.org/jira/browse/TIKA-1463 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-1446) CHM parser : wrong decompression of aligned blocks

2014-10-23 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181530#comment-14181530 ] Hong-Thai Nguyen commented on TIKA-1446: Thanks a lot [~binhawking], I've q

Re: svn commit: r1633325 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java test/java/org/apache/tika/parser/mail/RFC822ParserTest.java

2014-10-21 Thread Hong-Thai Nguyen
Hi Chris, Yes, I made a mistake in this commit: I missed a renamed file and broke the build. The next commit corrected it: Revision: 161 Author: thaichat04 Date: Tuesday, 21 October 2014 11:47:54 Message: TIKA-1422 - Fixing build & minor refactory of naming test class Modified: /tika/trunk/tika-

[jira] [Comment Edited] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-21 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178186#comment-14178186 ] Hong-Thai Nguyen edited comment on TIKA-1422 at 10/21/14 9:4

[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-21 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178186#comment-14178186 ] Hong-Thai Nguyen commented on TIKA-1422: Applied latest fix on r1633325 with

Re: 1.7 release?

2014-10-16 Thread Hong-Thai Nguyen
Hi Andrzej, We are eager for the 1.7 release too. I'm hitting the TIKA-1422 compile problem on my side. If anyone can build successfully on Windows, I have no objection to releasing 1.7. Thanks, On Thu, Oct 16, 2014 at 10:51 AM, Andrzej Białecki wrote: > Hi, > > Any news on the 1.7 release? or at leas

[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-16 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173537#comment-14173537 ] Hong-Thai Nguyen commented on TIKA-1422: I'm not using

[jira] [Commented] (TIKA-1176) ChmDirectoryListingSet does not correctly enumerate directory entries

2014-10-13 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169146#comment-14169146 ] Hong-Thai Nguyen commented on TIKA-1176: Hi [~mdgeek], thanks for your offe

[jira] [Commented] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2014-10-13 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169130#comment-14169130 ] Hong-Thai Nguyen commented on TIKA-1422: Strange, I'm unable to build cau

[jira] [Commented] (TIKA-1446) CHM parser : wrong decompression of aligned blocks

2014-10-13 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169098#comment-14169098 ] Hong-Thai Nguyen commented on TIKA-1446: Thanks [~binhawking], any chance you

[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

2014-10-13 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169090#comment-14169090 ] Hong-Thai Nguyen commented on TIKA-1445: Interesting question ! For me, pars

[jira] [Commented] (TIKA-1428) Microsoft Word 97 - 2003 (.doc) footnote references are Unicode Replacement Character

2014-09-25 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147880#comment-14147880 ] Hong-Thai Nguyen commented on TIKA-1428: Thanks [~theoettheo], any chance to

RE: NPE on all *.odt, odp, .ods documents

2014-09-22 Thread Hong-Thai Nguyen
odp, .ods documents > From: Hong-Thai Nguyen > Sent: September 11, 2014 1:40:08pm PDT > To: dev@tika.apache.org > Subject: Re: NPE on all *.odt, odp, .ods documents > > I was wrong to say that all OpenDocument files fail; some files > passed, but a lot of them failed

[jira] [Commented] (TIKA-1412) NPE in OpenDocumentParser

2014-09-22 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143043#comment-14143043 ] Hong-Thai Nguyen commented on TIKA-1412: Add a test at r1626706 >

[jira] [Updated] (TIKA-1421) Tika-Parsers tests fail on CentOS6 if tesseract isn't installed

2014-09-22 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1421: --- Priority: Blocker (was: Major) > Tika-Parsers tests fail on CentOS6 if tesseract is

[jira] [Commented] (TIKA-1421) Tika-Parsers tests fail on CentOS6 if tesseract isn't installed

2014-09-22 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143041#comment-14143041 ] Hong-Thai Nguyen commented on TIKA-1421: Not only CentOS, this test failed als

Re: NPE on all *.odt, odp, .ods documents

2014-09-11 Thread Hong-Thai Nguyen
- pptx (10) - doc (6) - ppt (14) - xls (9) - dwg (4) - odp (2) - pps (2) On Thu, Sep 11, 2014 at 8:55 PM, Ken Krugler wrote: > > > From: Hong-Thai Nguyen > > Sent: September 11, 2014 5:21:41am PDT > > To: dev@tika.apache.org > > Subject: NPE on all *.odt, odp, .ods d

Re: NPE on all *.odt, odp, .ods documents

2014-09-11 Thread Hong-Thai Nguyen
pache.org" > Subject: RE: NPE on all *.odt, odp, .ods documents > > >Probably want to add TIKA-1411. > > > >Nick and all, anything else? > > > >-Original Message- > >From: Hong-Thai Nguyen [mailto:thaicha...@gmail.com] > >Sent: Thursday, S

Re: NPE on all *.odt, odp, .ods documents

2014-09-11 Thread Hong-Thai Nguyen
Adjunct Associate Professor, Computer Science Department > University of Southern California, Los Angeles, CA 90089 USA > ++++++ > > > > > > > -Original Message- > From: Hong-Thai Nguyen > Reply-To: "dev@tika.apache.org"

NPE on all *.odt, odp, .ods documents

2014-09-11 Thread Hong-Thai Nguyen
Hi all, I've tested conversion with Tika 1.6 on our corpus: all OpenOffice document types fail with an NPE. A fix was made in https://issues.apache.org/jira/browse/TIKA-1412, but it is only available from 1.7. That's a fatal error for me. Should we release a 1.6.1 with the TIKA-1412 fix? Tack tr

[jira] [Resolved] (TIKA-1413) OOXML thumbnail name added to body

2014-09-09 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1413. Resolution: Fixed > OOXML thumbnail name added to b

[jira] [Commented] (TIKA-1413) OOXML thumbnail name added to body

2014-09-09 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126949#comment-14126949 ] Hong-Thai Nguyen commented on TIKA-1413: I agree. Fixed in r1623819 and _id

Re: [VOTE] Release Apache Tika 1.6 RC #2

2014-09-01 Thread Hong-Thai Nguyen
's not > even referenced in the pom.xml and isn't done yet? > > How about we fix it in 1.7 but give this one a pass? > > > Cheers, > Chris > > -Original Message- > From: Hong-Thai Nguyen > Reply-To: "dev@tika.apache.org" > Date: Monday,

Re: [VOTE] Release Apache Tika 1.6 RC #2

2014-09-01 Thread Hong-Thai Nguyen
-1 from me, because tika-dotnet/pom.xml refers to the parent POM with a snapshot version: <parent> <groupId>org.apache.tika</groupId> <artifactId>tika-parent</artifactId> <version>1.6-SNAPSHOT</version> <relativePath>../tika-parent/pom.xml</relativePath> </parent> On Mon, Sep 1, 2014 at 7:16 AM, Mattmann, Chris A (3980) < chris.a.mattm...@jpl.nasa.gov> wrote: > Hi Folks, > > A candidate for
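
The values quoted in this message (org.apache.tika, tika-parent, 1.6-SNAPSHOT, ../tika-parent/pom.xml) read like the children of a Maven `<parent>` element whose tags were lost in the archive. A release check for this kind of problem could be sketched as follows, using only the JDK's XML parser. The class `SnapshotParentCheck` is a hypothetical name, and the sample POM is a reconstruction for illustration, not the actual tika-dotnet/pom.xml.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical release check: does a POM point at a SNAPSHOT parent? */
public class SnapshotParentCheck {

    /** Returns the first direct child element with the given tag name, or null. */
    static Element directChild(Element parent, String name) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node instanceof Element && name.equals(node.getNodeName())) {
                return (Element) node;
            }
        }
        return null;
    }

    /** Returns the text of project/parent/version, or null if the POM declares none. */
    public static String parentVersion(String pomXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)));
            Element parent = directChild(doc.getDocumentElement(), "parent");
            if (parent == null) {
                return null;
            }
            Element version = directChild(parent, "version");
            return version == null ? null : version.getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable POM", e);
        }
    }

    /** True when the parent version ends with Maven's "-SNAPSHOT" suffix. */
    public static boolean hasSnapshotParent(String pomXml) {
        String version = parentVersion(pomXml);
        return version != null && version.endsWith("-SNAPSHOT");
    }

    public static void main(String[] args) {
        // Reconstructed sample, assumed from the values quoted in the message.
        String pom = "<project><parent>"
                + "<groupId>org.apache.tika</groupId>"
                + "<artifactId>tika-parent</artifactId>"
                + "<version>1.6-SNAPSHOT</version>"
                + "<relativePath>../tika-parent/pom.xml</relativePath>"
                + "</parent></project>";
        System.out.println(hasSnapshotParent(pom));
    }
}
```

Running a check like this over every module's pom.xml before cutting a release candidate would catch a module that was missed when versions were bumped.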

Re: [DISCUSS] Give examples of Parser, Detector, and Translator usage

2014-08-07 Thread Hong-Thai Nguyen
Nice idea. We could do more than samples: we could provide a Maven archetype for parsers, detectors, or translators, a kind of template so that users can quickly get a project for developing a new one. Regards, Hong-Thai > On 07 Aug 2014, at 18:56, Tyler Palsulich wrote: > > Hi All, > > I think we should add

[jira] [Commented] (TIKA-1373) AutoDetectParser extracts no text when SourceCodeParser is selected

2014-07-29 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077885#comment-14077885 ] Hong-Thai Nguyen commented on TIKA-1373: Normally it's on next off

[jira] [Resolved] (TIKA-1373) AutoDetectParser extracts no text when SourceCodeParser is selected

2014-07-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1373. Resolution: Fixed > AutoDetectParser extracts no text when SourceCodeParser is selec

[jira] [Commented] (TIKA-1373) AutoDetectParser extracts no text when SourceCodeParser is selected

2014-07-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073042#comment-14073042 ] Hong-Thai Nguyen commented on TIKA-1373: HtmlParser skips tags generate

[jira] [Comment Edited] (TIKA-1373) AutoDetectParser extracts no text when SourceCodeParser is selected

2014-07-23 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071643#comment-14071643 ] Hong-Thai Nguyen edited comment on TIKA-1373 at 7/23/14 1:4

[jira] [Commented] (TIKA-1373) AutoDetectParser extracts no text when SourceCodeParser is selected

2014-07-23 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071713#comment-14071713 ] Hong-Thai Nguyen commented on TIKA-1373: Yes, I saw the trouble when implemen

[jira] [Commented] (TIKA-1373) AutoDetectParser extracts no text when SourceCodeParser is selected

2014-07-23 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071643#comment-14071643 ] Hong-Thai Nguyen commented on TIKA-1373: Can you format your description

[jira] [Updated] (TIKA-1095) Only gibberish extracted from this PDF

2014-07-15 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1095: --- Labels: pdfbox (was: patch) > Only gibberish extracted from this

[jira] [Updated] (TIKA-1095) Only gibberish extracted from this PDF

2014-07-15 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1095: --- Component/s: (was: general) parser > Only gibberish extracted from t

[jira] [Commented] (TIKA-1095) Only gibberish extracted from this PDF

2014-07-15 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061867#comment-14061867 ] Hong-Thai Nguyen commented on TIKA-1095: Even with latest Tika can't con

[jira] [Commented] (TIKA-1332) Create "eval" code

2014-06-26 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044706#comment-14044706 ] Hong-Thai Nguyen commented on TIKA-1332: What you are describing is somet

Build failed

2014-06-24 Thread Hong-Thai Nguyen
Hi all, Sorry about my last, mistaken mail. I'm unable to build the latest snapshot on Windows. Any ideas? Thanks Tests in error: initializationError(org.apache.tika.bundle.BundleIT): Problem starting test container. Tests run: 1, Failures: 0, Errors: 1, Skipped: 0

Subcribe

2014-06-24 Thread Hong-Thai Nguyen
-- -- Hong-Thai

[jira] [Commented] (TIKA-1350) OutlookPSTParser: Unknown message type: IPM.Note

2014-06-23 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040519#comment-14040519 ] Hong-Thai Nguyen commented on TIKA-1350: Richard Johnson (author of java-ps

[jira] [Commented] (TIKA-1320) extract text from jpeg in solr tika

2014-06-04 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017473#comment-14017473 ] Hong-Thai Nguyen commented on TIKA-1320: OCR is a solution: TIKA-93. Unfortuna

[jira] [Commented] (TIKA-1308) Support in memory parse mode(don't create temp file): to support run Tika in GAE

2014-05-26 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008704#comment-14008704 ] Hong-Thai Nguyen commented on TIKA-1308: A virtual FileSystem may be a solu

RE: [DISCUSS] Nightly Jenkins Builds for Trunk

2014-05-20 Thread Hong-Thai Nguyen
And for >=Java7, we need a profile to activate building the 'tika-java7' module. Hong-Thai -Original Message- From: Nick Burch [mailto:apa...@gagravarr.org] Sent: Wednesday, 14 May 2014 18:30 To: dev@tika.apache.org Subject: Re: [DISCUSS] Nightly Jenkins Builds for Trunk On Wed, 14 May 2014,

[jira] [Resolved] (TIKA-1290) Upgrade to PDFBOX 1.8.5

2014-05-06 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1290. Resolution: Fixed r1592780 > Upgrade to PDFBOX 1.

[jira] [Updated] (TIKA-1290) Upgrade to PDFBOX 1.8.5

2014-05-06 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1290: --- Labels: trivial (was: ) > Upgrade to PDFBOX 1.

[jira] [Created] (TIKA-1290) Upgrade to PDFBOX 1.8.5

2014-05-02 Thread Hong-Thai Nguyen (JIRA)
Hong-Thai Nguyen created TIKA-1290: -- Summary: Upgrade to PDFBOX 1.8.5 Key: TIKA-1290 URL: https://issues.apache.org/jira/browse/TIKA-1290 Project: Tika Issue Type: Improvement

[jira] [Commented] (TIKA-1287) Update NetCDF .jar file on Maven Central

2014-05-02 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987521#comment-13987521 ] Hong-Thai Nguyen commented on TIKA-1287: Technically, not difficult to upload

[jira] [Commented] (TIKA-1283) Add "thumbnail" as possible metadata item to TikaCoreProperties

2014-04-28 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983434#comment-13983434 ] Hong-Thai Nguyen commented on TIKA-1283: +1 for me to create a thumbnail fiel

[jira] [Resolved] (TIKA-1279) Missing return lines at output of SourceCodeParser

2014-04-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1279. Resolution: Fixed Thanks [~rgauss] for this good catch. I fixed it, with more tests, in r1589742

[jira] [Resolved] (TIKA-1276) Missing embedded dependencies in tika-bundle

2014-04-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1276. Resolution: Fixed Thanks [~rwesten], added your patch at r1589717 > Missing embed

[jira] [Updated] (TIKA-1276) Missing embedded dependencies in tika-bundle

2014-04-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1276: --- Fix Version/s: 1.6 > Missing embedded dependencies in tika-bun

[jira] [Resolved] (TIKA-1279) Missing return lines at output of SourceCodeParser

2014-04-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1279. Resolution: Fixed Fixed at r1589687 > Missing return lines at output of SourceCodePar

[jira] [Commented] (TIKA-1224) Adding Source code (Java, Groovy, C) parser

2014-04-24 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979614#comment-13979614 ] Hong-Thai Nguyen commented on TIKA-1224: Thanks [~ben.12] for the feedback. For

[jira] [Created] (TIKA-1279) Missing return lines at output of SourceCodeParser

2014-04-24 Thread Hong-Thai Nguyen (JIRA)
Hong-Thai Nguyen created TIKA-1279: -- Summary: Missing return lines at output of SourceCodeParser Key: TIKA-1279 URL: https://issues.apache.org/jira/browse/TIKA-1279 Project: Tika Issue Type

RE: Tika VM Service

2014-04-10 Thread Hong-Thai Nguyen
Hi Tika members, Thanks for this great initiative. I can see several possible use cases for such a service: 1. Tika exploitation We could create a freely accessible Tika Server to parse documents coming from public requests, a kind of demo or free-trial document parser to check Tika fe

[jira] [Updated] (TIKA-623) Add support for Outlook PST

2014-04-04 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-623: -- Assignee: (was: Hong-Thai Nguyen) > Add support for Outlook

[jira] [Resolved] (TIKA-623) Add support for Outlook PST

2014-04-04 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-623. --- Resolution: Fixed Improvement: extract each mail as attachment document. Recursion down to

Unable to commit SVN ?

2014-04-03 Thread Hong-Thai Nguyen
Hi Tika men, I get a 500 error when committing to the Tika SVN. Do you have the same problem? POST request on '/repos/asf/!svn/me' failed: 500 Internal Server Error Thanks, Hong-Thai

RE: Add Outlook/PST files to supported formats on the web site?

2014-04-01 Thread Hong-Thai Nguyen
Yes, but from 1.6: https://issues.apache.org/jira/browse/TIKA-623 I'm finishing the work to return mails as extracted documents, as requested, but we'll have this format in 1.6. Hong-Thai -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Tuesday, 1 April 2014 13:42 To

[jira] [Resolved] (TIKA-1244) Better parsing of Mbox files

2014-03-31 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen resolved TIKA-1244. Resolution: Fixed Fix Version/s: 1.6 Committed in r1583305, thanks [~lfcnassif] I

[jira] [Assigned] (TIKA-1244) Better parsing of Mbox files

2014-03-28 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen reassigned TIKA-1244: -- Assignee: Hong-Thai Nguyen > Better parsing of Mbox fi
