[jira] [Commented] (TIKA-2434) Language detection slow, cpu intensive, CLI interrupts work

2017-08-18 Thread Mattmann, Chris A (388J) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16133468#comment-16133468 ] Mattmann, Chris A (388J) commented on TIKA-2434: Hi Everyone, I wil

[jira] [Updated] (TIKA-1804) Tika use no free json.org

2017-05-30 Thread Mattmann, Chris A (388J) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mattmann, Chris A (388J) updated TIKA-1804: --- Hi Everyone, I will be out of the office 5/29 – 6/6 on Vacation. During this

[jira] [Commented] (TIKA-1885) Tika MIME updates for *.cdf and *.xar and custom zero length file detector based on TREC-DD-Polar

2016-05-02 Thread Mattmann, Chris A (388J) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268058#comment-15268058 ] Mattmann, Chris A (388J) commented on TIKA-1885: Hello, I am on vaca

[jira] [Commented] (TIKA-774) ExifTool Parser

2016-03-23 Thread Mattmann, Chris A (388J) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209108#comment-15209108 ] Mattmann, Chris A (388J) commented on TIKA-774: --- Is this a replacement

[jira] [Commented] (TIKA-1696) Language Identification with Text Processing Toolkit from MITLL

2015-07-23 Thread Mattmann, Chris A (388J) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639531#comment-14639531 ] Mattmann, Chris A (388J) commented on TIKA-1696: It's fine to dis

[jira] [Commented] (TIKA-1619) SHA1 and MD5 verification hashes for v1.8 still show old v1.7 hashes

2015-04-29 Thread Mattmann, Chris A (388J) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520536#comment-14520536 ] Mattmann, Chris A (388J) commented on TIKA-1619: Hey Rishi yes it

[jira] [Commented] (TIKA-605) Tika GDAL parser

2014-10-11 Thread Mattmann, Chris A (388J) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168238#comment-14168238 ] Mattmann, Chris A (388J) commented on TIKA-605: --- Great +1 please updat

Google Summer of Code: GDAL parser

2013-03-23 Thread Mattmann, Chris A (388J)
Hey Guys, I tagged TIKA-605 GDAL parser [1] as a Google Summer of Code 2013 project. I'm available to help mentor. I'm copying the SIS list on this since people like Adam Estrada (SIS, VP), and Joe White have offered to help mentor as well. Note: Uli is going to send the Google Summer of Code pro

Re: Build failed in Jenkins: Tika-trunk » Apache Tika parsers #989

2013-03-22 Thread Mattmann, Chris A (388J)
Forgot one class in TIKA-1096, should be fixed now. Cheers, Chris On 3/22/13 9:48 PM, "Apache Jenkins Server" wrote: >See >/changes> > >Changes: > >[mattmann] Patch for TIKA-1096 CompressorParser: Add support for hand

FW: GSoC 2013

2013-03-18 Thread Mattmann, Chris A (388J)
[Apologies for cross post] Guys, to play in the GSoC 2013 spec, we just need to tag issues in JIRA with the gsoc2013 tag. I'll try and come up with a few projects soon :) Cheers, Chris On 3/15/13 11:15 AM, "Luciano Resende" wrote: >On Fri, Mar 15, 2013 at 11:01 AM, Manish Agrawal >wrote: >>

FW: [OPENING] Google Summer of Code Applications

2013-03-10 Thread Mattmann, Chris A (388J)
FYI On 3/10/13 5:10 PM, "Lewis John Mcgibbney" wrote: >I just told a huge lie. >I got my dates mixed up... >Students have from between April 22nd and May 3rd to get proposals in. >Sorry about the mix up. > >Lewis > >On Sun, Mar 10, 2013 at 5:09 PM, Lewis John Mcgibbney < >lewis.mcgibb...@gmail.c

FW: [Tika Wiki] Update of "RecursiveMetadata" by domtheo

2013-03-06 Thread Mattmann, Chris A (388J)
Guys I reverted this spammer but don't know how to block him. Help? Cheers, Chris On 3/6/13 7:12 PM, "Apache Wiki" wrote: >Dear Wiki user, > >You have subscribed to a wiki page or wiki category on "Tika Wiki" for >change notification. > >The "RecursiveMetadata" page has been changed by domtheo:

Re: Jenkins build is back to normal : Tika-trunk #981

2013-02-12 Thread Mattmann, Chris A (388J)
Yay! On 2/12/13 12:01 PM, "Michael McCandless" wrote: >Yay, Java 1.6 :) > >Mike McCandless > >http://blog.mikemccandless.com > >On Tue, Feb 12, 2013 at 2:59 PM, Apache Jenkins Server > wrote: >> See >>

Re: Build failed in Jenkins: Tika-trunk #980

2013-02-12 Thread Mattmann, Chris A (388J)
Thanks Mike! On 2/12/13 10:14 AM, "Michael McCandless" wrote: >Hmm, that didn't work. > >It looks like we have to fix our JAVA_HOME to point to a 1.6+ java: >http://stackoverflow.com/questions/11328677/error-when-using-javac-javac-i >nvalid-flag-s > >OK I managed to log in to builds.apache.org a

Re: [DISCUSS] Should Tika require Java6? (was Re: Build failed in Jenkins: Tika-trunk #977)

2013-02-12 Thread Mattmann, Chris A (388J)
"future". Aha moment !!! >>>> Here is mine +1. >>>> >>>> According to Oracle "In February 2011 Oracle announced the End of >>>>Public >>>> Updates for their Java SE 6 products for July 2012. In February 2012 >>>>Oracle

FW: [GSoC Mentors] Google Summer of Code 2013

2013-02-11 Thread Mattmann, Chris A (388J)
[Sorry for cross posting] Guys, FYI please note that you can participate as a mentor from a PMC via Apache as they are a GSoC org. ComDev will coordinate our participation but start thinking about what projects we may want to do. Cheers, Chris From: Carol Smith mailto:car...@google.com>> Date

Re: svn commit: r1443963 - in /tika/trunk/tika-server/src/main/java/org/apache/tika/server: CSVMessageBodyWriter.java JSONMessageBodyWriter.java

2013-02-08 Thread Mattmann, Chris A (388J)
Thanks Mike! On 2/8/13 3:54 AM, "mikemcc...@apache.org" wrote: >Author: mikemccand >Date: Fri Feb 8 11:54:26 2013 >New Revision: 1443963 > >URL: http://svn.apache.org/r1443963 >Log: >comment out @Overrides > >Modified: > >tika/trunk/tika-server/src/main/java/org/apache/tika/server/CSVMessag

[DISCUSS] Should Tika require Java6? (was Re: Build failed in Jenkins: Tika-trunk #977)

2013-02-08 Thread Mattmann, Chris A (388J)
Hey Guys, Just to summarize, the question on the table is whether or not Tika should require Java6. We had some discussions on this previously (if I get time, will dig up the threads -- ok found time ;) ): https://issues.apache.org/jira/browse/TIKA-888 http://mail-archives.apache.org/mod_mbox/ti

Re: Build failed in Jenkins: Tika-trunk #977

2013-02-07 Thread Mattmann, Chris A (388J)
Hey Mike, Weird. I did notice in the patch for: https://issues.apache.org/jira/browse/TIKA-1047 That there were some JDK7 stuff -- I went ahead and fixed it to be JDK6 compat and updated the patch and committed that version as I noted in the issue comments. I wonder if there was something I mis

Re: Crawler-Commons 0.2 released

2013-02-03 Thread Mattmann, Chris A (388J)
Thanks Ken! Cheers, Chris On 2/3/13 7:56 AM, "Ken Krugler" wrote: >Hi Chris, > >On Feb 2, 2013, at 7:34pm, Mattmann, Chris A (388J) wrote: > >> Awesome thanks Ken. Any pointers to the release? > >Sorry, should have included those details... > > - Project

Re: Crawler-Commons 0.2 released

2013-02-02 Thread Mattmann, Chris A (388J)
Awesome thanks Ken. Any pointers to the release? Cheers, Chris On 2/2/13 7:08 PM, "Ken Krugler" wrote: >Just a heads-up that we released version 0.2. > >This might be of interest to the Tika community, since it contains >parsers for both robots.txt and sitemaps. > >-- Ken > >---

Re: buildbot failure in ASF Buildbot on tika-trunk

2013-01-27 Thread Mattmann, Chris A (388J)
The latest SVN commit in r1439145 fixes this. Cheers, Chris On 1/27/13 11:19 AM, "build...@apache.org" wrote: >The Buildbot has detected a new failure on builder tika-trunk while >building ASF Buildbot. >Full details are available at: > http://ci.apache.org/builders/tika-trunk/builds/1023 > >Bu

Re: [ANNOUNCE] Apache Tika 1.3 Released

2013-01-22 Thread Mattmann, Chris A (388J)
Great job Dave!!! On 1/22/13 12:22 PM, "Dave Meikle" wrote: >The Apache Tika project is pleased to announce the release of Apache Tika >1.3. The release contents have been pushed out to the main Apache release >site and to the Maven Central sync, so the releases should be available as >soon as t

Re: KEYS file and dist.apache.org (Re: [VOTE] Apache Tika 1.3 Release Candidate #1)

2013-01-20 Thread Mattmann, Chris A (388J)
Thanks Jukka for the FYI... Cheers, Chris On 1/20/13 10:11 PM, "Jukka Zitting" wrote: >Hi, > >On Sun, Jan 20, 2013 at 11:24 PM, Mattmann, Chris A (388J) > wrote: >> +1 to that -- Dave feel free to simply copy the one out of dist into the >> RC dir -- or whomev

Re: [VOTE] Apache Tika 1.3 Release Candidate #1

2013-01-20 Thread Mattmann, Chris A (388J)
ush the release out, we include KEYS. > >Mike McCandless > >http://blog.mikemccandless.com > >On Sun, Jan 20, 2013 at 3:52 PM, Mattmann, Chris A (388J) > wrote: >> Hey Mike, >> >> I found the same thing -- scope the KEYS file here in case you need it: >>

Re: [VOTE] Apache Tika 1.3 Release Candidate #1

2013-01-20 Thread Mattmann, Chris A (388J)
Hey Mike, I found the same thing -- scope the KEYS file here in case you need it: curl -O http://www.apache.org/dist/tika/KEYS; gpg --import < KEYS Cheers, Chris On 1/20/13 3:35 AM, "Michael McCandless" wrote: >+1, but I think you need to add the KEYS file? > >Tests passed from the source rele

Re: [VOTE] Apache Tika 1.3 Release Candidate #1

2013-01-20 Thread Mattmann, Chris A (388J)
Hey Dave, On 1/18/13 8:30 PM, "Dave Meikle" wrote: >Hi Guys, > >A candidate for the Tika 1.3 release is available at: > >http://people.apache.org/~dmeikle/apache-tika-1.3-rc1/ > >The release candidate is a zip archive of the sources in: > >http://svn.apache.org/repos/asf/tika/tags/tika-1

Re: [DISCUSS] Release Candidate for 1.3?

2013-01-17 Thread Mattmann, Chris A (388J)
Hey Dave, No worries! There is more value in getting more people doing this. So all yours this weekend! :) If you need any help let me know. Cheers, Chris On 1/17/13 7:40 AM, "Dave Meikle" wrote: >Hi Chris, > >On 17 Jan 2013, at 15:31, "Mattmann, Chris A (388J)"

Re: [DISCUSS] Release Candidate for 1.3?

2013-01-17 Thread Mattmann, Chris A (388J)
Hey Jukka, I'll roll an RC #1 for 1.3 by the week-end if that works for everyone. Dave, I know you mentioned you wanted to give it a go. If you do that's fine too. Just saying I have time to do it if you'd like. To start, I've created a 1.4 version in JIRA and moved all unresolved 1.3s to 1.4. S

Re: svn commit: r1431316 - in /tika/trunk: CHANGES.txt tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml

2013-01-11 Thread Mattmann, Chris A (388J)
Echo that, thanks Nick! On 1/10/13 7:40 AM, "Michael McCandless" wrote: >Thanks Nick! > >Mike McCandless > >http://blog.mikemccandless.com > >On Thu, Jan 10, 2013 at 7:20 AM, wrote: >> Author: nick >> Date: Thu Jan 10 12:20:56 2013 >> New Revision: 1431316 >> >> URL: http://svn.apache.org/view

Re: [DISCUSS] Release Candidate for 1.3?

2013-01-08 Thread Mattmann, Chris A (388J)
Agreed +1 from me. Dave, I think it would be great for you to rock the release too. Any help I can provide I'd be happy to! Here's the last Tika 1.2 release ANNOUNCE email for pointers: http://s.apache.org/UEA Cheers, Chris On 1/8/13 5:21 PM, "Michael McCandless" wrote: >+1 for a 1.3 release

Re: [jira] [Updated] (TIKA-1048) XMLParser should add whitespace between elements

2012-12-20 Thread Mattmann, Chris A (388J)
+1... Cheers, Chris On 12/20/12 4:23 AM, "Michael McCandless" wrote: >Hi Oleg, > >UIMA could be useful for extracting text from XML (I'm not familiar >enough with it...), but I think we should still fix Tika's own XML >extraction. > >Mike McCandless > >http://blog.mikemccandless.com > >On Thu,

Re: Contribution of parser for FITS file format to Apache Tika

2012-12-06 Thread Mattmann, Chris A (388J)
Hey Rahul, This is great and I'm totally willing to work with you to shepherd this in. The first step would be to create a JIRA issue for your parser, and then to submit a patch to incorporate it into the tika-parsers module. Of course, you can start with changing the namespace to org.apache.* (fr

Re: MimeTypes.java final?

2012-10-29 Thread Mattmann, Chris A (388J)
Thanks Ryan you the man. Appreciate it. I will take a look at the issues and try to help shepherd them in! Cheers, Chris On Oct 29, 2012, at 6:52 PM, Ryan McKinley wrote: > On Mon, Oct 29, 2012 at 2:03 PM, Mattmann, Chris A (388J) > wrote: >> Hi Ryan, >> >> I thi

Re: MimeTypes.java final?

2012-10-29 Thread Mattmann, Chris A (388J)
Hi Ryan, I think #1 has been suggested before, in a thread called "Appending MIME Types": http://s.apache.org/TVe As for #2, I think that's the type of information we're trying to hide through the class interface. I like adding more URL information and URI stuff to the MIME registry though

Re: Apache CMS for Website?

2012-09-07 Thread Mattmann, Chris A (388J)
Hey Dave, I wouldn't be opposed to it, though I would have to do a better job of learning the CMS than I already have (which hasn't involved a lot of learning on my end ;) ). I'm used to running the couple mvn commands and svn commits to get the site up to date with new release information. But

Re: Standard practice with @author in comments

2012-08-30 Thread Mattmann, Chris A (388J)
Hey Ken, I personally don't care too much about having @author tags, or not having them, but I know there are others more passionate (for example about NOT having them) :) Cheers, Chris On Aug 30, 2012, at 2:03 PM, Ken Krugler wrote: > Hi all, > > I'm wondering if we've got any convention fo

Welcome to our new Tika PMC chair!

2012-08-19 Thread Mattmann, Chris A (388J)
Hey Folks, I decided to step down as chair of the Apache Tika PMC. We have a new chair, who graciously volunteered to step up and handle the chair duties, Dave Meikle. Dave's nomination was recently confirmed at the last Apache board meeting, on recommendation from the Tika PMC. Dave, welcome!

[RESULT] [VOTE] Graduate Apache Any23 from the Apache Incubator

2012-08-16 Thread Mattmann, Chris A (388J)
Hi Folks, This VOTE has passed with the following tallies: Tika PMC +1: Chris Mattmann* Jukka Zitting* Dave Meikle Oleg Tikhonov Any23 community (PPMC + others) +1: Lewis John McGibbney Andy Seaborne* Simone Tripodi Michele Mostarda Tommaso Teofili* * - indicates IPMC I'll now take the VOTE

[VOTE] Graduate Apache Any23 from the Apache Incubator

2012-08-03 Thread Mattmann, Chris A (388J)
Hi Folks, Based on prior positive discussions: http://s.apache.org/W1C http://s.apache.org/dw4 http://s.apache.org/xN I'm now going to call for a community VOTE (before heading to the Incubator to make it official) for Any23 to graduate from the Incubator. VOTEs are open to Any23 and Tika commun

[ANNOUNCE] Welcome Jörg Ehrlich as new Tika PMC member and committer

2012-07-31 Thread Mattmann, Chris A (388J)
Hi Folks, The Tika PMC has VOTEd to elect Jörg Ehrlich to our ranks as a PMC member and committer. Welcome Jörg! Feel free to mention a bit about yourself. Cheers, Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet P

[ANNOUNCE] Welcome Ingo Renner as Tika PMC member and committer

2012-07-31 Thread Mattmann, Chris A (388J)
Hi Folks, The Tika PMC VOTEd to add Ingo Renner to our ranks as a PMC member and committer. Welcome, Ingo! Please feel free to say a bit about yourself. Cheers, Chris

[ANNOUNCE] Welcome Sergey Beryozkin as Apache Tika PMC member and committer

2012-07-30 Thread Mattmann, Chris A (388J)
Hi Folks, The Tika PMC has elected to add Sergey Beryozkin as a PMC member and committer. Welcome Sergey! Feel free to say a bit about yourself! Cheers, Chris

[DISCUSS] Any23 Graduation to TLP

2012-07-26 Thread Mattmann, Chris A (388J)
Hey Tika PMC'ers, The Any23 podling is preparing to hold a graduation VOTE. The community feels that it would be best to graduate to a TLP. We've made a release, added new committers, communicated on list and in the spirit of the Apache way. Since the Tika PMC agreed to sponsor the Any23 projec

[DISCUSS] Including tika-server WAR in 1.3 artifacts?

2012-07-20 Thread Mattmann, Chris A (388J)
Hey Guys, Now that we have tika-server, etc., I was thinking of including it like we do tika-app as a release artifact from 1.3 on. Does that sound OK? Cheers, Chris

Fwd: Call for Papers for ApacheCon Europe 2012 now open!

2012-07-19 Thread Mattmann, Chris A (388J)
FYI... Begin forwarded message: > From: Nick Burch > Date: July 19, 2012 1:14:57 PM CDT > To: > Subject: Call for Papers for ApacheCon Europe 2012 now open! > Reply-To: > > Hi All > > We're pleased to announce that the Call for Papers for ApacheCon Europe 2012 > is finally open! > > (For t

[DISCUSS] Tika Hardener?

2012-07-19 Thread Mattmann, Chris A (388J)
Hey Jerome, I noticed on TIKA-815 that you mentioned you had a Tika "hardener" -- would you be willing to contribute that upstream to the Tika project? We appreciate your contributions to date and were just wondering? Thanks! Cheers, Chris

Re: Can't build javadocs for 1.2 API site docs

2012-07-17 Thread Mattmann, Chris A (388J)
le week workshop. I will try to have a >>> look at it as soon as possible. >>> >>> Regards >>> jörg >>> >>> -Original Message- >>> From: Mattmann, Chris A (388J) [mailto:chris.a.mattm...@jpl.nasa.gov] >>> Sent: Diensta

Re: Can't build javadocs for 1.2 API site docs

2012-07-17 Thread Mattmann, Chris A (388J)
ortunately I am currently in a whole week workshop. I will try to have a >>> look at it as soon as possible. >>> >>> Regards >>> jörg >>> >>> -Original Message- >>> From: Mattmann, Chris A (388J) [mailto:chris.a.mattm.

Can't build javadocs for 1.2 API site docs

2012-07-16 Thread Mattmann, Chris A (388J)
Hey Guys, When I run mvn javadoc:aggregate which normally works fine and builds the API docs for the website for me to push up to the site publish directory, in 1.2 I now get an error: /Users/mattmann/tmp/tika1.2/tika-xmp/src/main/java/org/apache/tika/xmp/XMPMetadata.java:75: warning - Tag @see

[ANNOUNCE] Apache Tika 1.2 released

2012-07-16 Thread Mattmann, Chris A (388J)
(...apologies for the cross posting...) The Apache Tika project is pleased to announce the release of Apache Tika 1.2. The release contents have been pushed out to the main Apache release site and to the Maven Central sync, so the releases should be available as soon as the mirrors get the syncs.

[RESULT] [VOTE] Apache Tika 1.2 release rc #1

2012-07-16 Thread Mattmann, Chris A (388J)
Hi Everyone, This VOTE has passed with the following tallies: +1 Chris Mattmann* Alex Ott Mike McCandless* Zabrane Mickael Joerg Ehrlich Dave Meikle* Jukka Zitting* Oleg Tikhonov* Ken Krugler* I'll push the bits out and announce the release. Thanks to all who VOTEd! Cheers, Chris * - indicat

Re: [VOTE] Apache Tika 1.2 release rc #1

2012-07-12 Thread Mattmann, Chris A (388J)
Hey Jukka, On Jul 11, 2012, at 4:48 PM, Jukka Zitting wrote: > Hi, > > On Wed, Jul 11, 2012 at 4:27 PM, Mattmann, Chris A (388J) > wrote: >> On Jul 11, 2012, at 6:43 AM, Michael McCandless wrote: >>> Why are there original-tika-app* files in the RC directory?

Re: [VOTE] Apache Tika 1.2 release rc #1

2012-07-11 Thread Mattmann, Chris A (388J)
Thanks Mike! On Jul 11, 2012, at 6:43 AM, Michael McCandless wrote: > +1 > > I smoke tested, extracting text for the Lucene in Action PDF (looked > good), and verified TIKA-948 is fixed. > > Why are there original-tika-app* files in the RC directory? Good question: this is the first time I've

[VOTE] Apache Tika 1.2 release rc #1

2012-07-10 Thread Mattmann, Chris A (388J)
Hi Folks, A candidate for the Tika 1.2 release is available at: http://people.apache.org/~mattmann/apache-tika-1.2/rc1/ The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/tika/tags/1.2/ The SHA1 checksum of the archive is 8146c1161d35e6b1dc670d078a773f
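The VOTE email publishes a SHA1 checksum for the release archive. Below is a minimal sketch of checking a downloaded archive against such a value with the JDK's MessageDigest. The class name ChecksumCheck and its command-line arguments are illustrative only and are not part of Tika's release tooling. Passing "MD5" instead of "SHA-1" produces the other kind of hash mentioned in TIKA-1619.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Illustrative helper (hypothetical name), not an Apache release tool.
public class ChecksumCheck {

    // Hex digest of a file, streamed in 8 KB chunks so large archives need little memory.
    static String hexDigest(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm); // e.g. "SHA-1" or "MD5"
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xff)); // two lowercase hex chars per byte
        }
        return hex.toString();
    }

    // Usage: java ChecksumCheck path/to/archive.zip expected-sha1-hex
    public static void main(String[] args) throws Exception {
        String actual = hexDigest(Paths.get(args[0]), "SHA-1");
        String expected = args[1].trim().toLowerCase(Locale.ROOT);
        System.out.println((actual.equals(expected) ? "OK " : "MISMATCH ") + actual);
    }
}
```

Streaming the file keeps memory use flat even for large binary archives. The comparison is case-insensitive because published checksums are sometimes written in upper case.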

Re: JAX-RS overhead in tika-server

2012-07-01 Thread Mattmann, Chris A (388J)
es.apache.org/jira/browse/TIKA-930 > > > On Jul 1, 2012, at 7:22 PM, Nick Burch wrote: > >> On Sun, 1 Jul 2012, Mattmann, Chris A (388J) wrote: >>>> It can be a big pain if an in-progress API is suddenly effectively frozen >>>> by the need to be compatible

Re: JAX-RS overhead in tika-server

2012-07-01 Thread Mattmann, Chris A (388J)
Hey Nick, On Jul 1, 2012, at 2:52 PM, Nick Burch wrote: > On Sun, 1 Jul 2012, Mattmann, Chris A (388J) wrote: >> I also plan to spin a 1.2 release candidate at some point in the next week >> or so. I realize the metadata stuff isn't done yet, but it's better to

Re: JAX-RS overhead in tika-server

2012-07-01 Thread Mattmann, Chris A (388J)
Hey Jukka, On Jul 1, 2012, at 12:01 PM, Jukka Zitting wrote: > Hi, > > On Sun, Jul 1, 2012 at 6:27 PM, Mattmann, Chris A (388J) > wrote: >> On Jul 1, 2012, at 5:09 AM, Jukka Zitting wrote: >> Sergey Beryozkin (who I'm CC'ing on this email since I'm not

Re: svn commit: r1355947 - /tika/trunk/tika-parent/pom.xml

2012-07-01 Thread Mattmann, Chris A (388J)
Great job Ray!! Cheers, Chris On Jul 1, 2012, at 9:39 AM, wrote: > Author: rgauss > Date: Sun Jul 1 16:39:29 2012 > New Revision: 1355947

Re: svn commit: r1355877 - in /tika/trunk: ./ tika-dll/ tika-dll/src/ tika-dll/src/main/ tika-dll/src/main/csharp/ tika-dll/src/main/csharp/Apache/

2012-07-01 Thread Mattmann, Chris A (388J)
WOW nice Jukka, you did it! Cheers, Chris On Jul 1, 2012, at 6:04 AM, wrote: > Author: jukka > Date: Sun Jul 1 13:04:00 2012 > New Revision: 1355877 > > URL: http://svn.apache.org/viewvc?rev=1355877&view=rev > Log: > TIKA-773: .NET version of Tika > > Add a basic Tika.dll build > > Added:

Re: JAX-RS overhead in tika-server

2012-07-01 Thread Mattmann, Chris A (388J)
Hey Jukka, On Jul 1, 2012, at 5:09 AM, Jukka Zitting wrote: > Hi, > > I looked at tika-server in a bit more detail, and I'm a bit concerned > about the dependency overhead it needs for the JAX-RS support: > > +- org.apache.cxf:cxf-rt-frontend-jaxrs:jar:2.5.2 > +- org.apache.cxf:cxf-common-

Re: Convert file before Tika processes it?

2012-06-21 Thread Mattmann, Chris A (388J)
+1, great solution, Jukka! Cheers, Chris On Jun 21, 2012, at 8:08 AM, Jukka Zitting wrote: > Hi, > > On Thu, Jun 21, 2012 at 4:35 AM, 122jxgcn wrote: >> Hi, I'm currently working on Tika to properly process custom file type (*.hwp >> file) I have a binary executable file which converts hwp fil

Re: Welcome Ray Gauss as a Tika committer/PMC

2012-06-08 Thread Mattmann, Chris A (388J)
Welcome Ray! Cheers, Chris On Jun 8, 2012, at 8:14 AM, Nick Burch wrote: > Hi All > > Many of you will have seen the JIRAs / patches from Ray Gauss over the last > few months, especially around metadata. I'm pleased to announce that Ray has > now been elected as a Tika committer and PMC membe

Re: svn commit: r1343137 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/pkg/PackageExtractor.java test/java/org/apache/tika/parser/pkg/ArParserTest.java

2012-05-27 Thread Mattmann, Chris A (388J)
s/Josh/John/ Sorry John! Cheers, Chris On May 27, 2012, at 9:18 PM, wrote: > Author: mattmann > Date: Mon May 28 04:18:21 2012 > New Revision: 1343137 > > URL: http://svn.apache.org/viewvc?rev=1343137&view=rev > Log: > - fix for TIKA-935 TikaException thrown when trying to parse archive (*.

[DISCUSS] Apache Tika 1.2 RC?

2012-05-27 Thread Mattmann, Chris A (388J)
Hey Guys, Looking at CHANGES.txt, we've got some nice new features and a few bug fixes. Is it time for a 1.2 RC? I'm mainly interested in releasing Tika Server (yay! :) ), but if folks are actively progressing e.g., on the Metadata reorg, and other fun stuff, I can wait. Just pinging since I hav

Re: A plan to improve the metadata property definitions

2012-05-16 Thread Mattmann, Chris A (388J)
Thanks Nick, +1. I'll try and follow and see if I can help in places. Cheers, Chris On May 16, 2012, at 5:50 AM, Nick Burch wrote: > Hi All > > I've just been brainstorming with Ray Gauss, and we think we've come up with > a way to move towards cleaner and clearer metadata property definition

Re: [metadata] Input on reorganization of Metadata interfaces

2012-05-08 Thread Mattmann, Chris A (388J)
Hi Jörg, On May 8, 2012, at 5:39 AM, Joerg Ehrlich wrote: > Hi Chris, > >> I'm OK with the code-level implications of that, but I will just have to >> scope out the patch and so forth. >> Thanks for pushing this. I really appreciate your help here. > > Sorry, I am not a native speaker: Does th

Re: [metadata] Input on reorganization of Metadata interfaces

2012-05-04 Thread Mattmann, Chris A (388J)
Hi Jörg, On May 4, 2012, at 6:43 AM, Joerg Ehrlich wrote: > Hi, > > I wanted to start submitting patches for the following and would like your > input on that: > > Create one "Core Properties" interface for the Metadata class which contains > just the keys for the properties which should be d

Re: Build failed in Jenkins: Tika-trunk #838

2012-04-27 Thread Mattmann, Chris A (388J)
Hey Jukka, In r1331457, I disabled tika-server build from the pom which should make Jenkins happy for now. I have no clue how to fix the shade plugin since I didn't do it before :), but if no one fixes it by next week (early), I'll research, investigate and address the problem. Take care dood.

Re: [metadata] roadmap proposal available on the wiki

2012-04-26 Thread Mattmann, Chris A (388J)
, at 2:30 PM, Antoni Mylka wrote: > 2012/04/26 Mattmann, Chris A (388J) napisał/wrote: >> Hi Guys, >> >> One comment RE: the below too -- this is precisely where I see >> Any23 coming into play and why there is a strong relationship >> between it and Tika: >>

Re: [metadata] roadmap proposal available on the wiki

2012-04-26 Thread Mattmann, Chris A (388J)
roperty) when a user wants > the entire structured object. > > That approach should be able to maintain backwards compatibility for existing > implementations and allow for structured and namespaced metadata. > > Just a thought, > > Ray > > > On Apr 26, 2012, at

Re: [metadata] roadmap proposal available on the wiki

2012-04-26 Thread Mattmann, Chris A (388J)
Hi Jörg, Thanks for your email, comments below: On Apr 26, 2012, at 3:35 AM, Joerg Ehrlich wrote: > Hi Chris, > > Those are all valid points and I agree that you could do everything with a > Hashmap. > Having the parsers fill the Metadata class and its Hashmap with all needed > information w

Re: [metadata] roadmap proposal available on the wiki

2012-04-25 Thread Mattmann, Chris A (388J)
Hi Jörg, On Apr 25, 2012, at 10:27 AM, Joerg Ehrlich wrote: > >> I am not strongly supportive of changing the HashMap internal >> representation in Metadata out. >> A couple of things I like about the HashMap: >> >> * It's simple. >> * It doesn't require dependency on any external libraries
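A small multi-valued store makes the HashMap properties listed above (simple, no external dependencies) easy to see. The sketch below is a hypothetical, simplified SimpleMetadata class and is not Tika's actual Metadata API. It assumes each key maps to a list of string values, so one property such as a creator can hold more than one value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a HashMap-backed metadata container (JDK only).
public class SimpleMetadata {
    private final Map<String, List<String>> values = new HashMap<>();

    // Appends a value and keeps any earlier values for the same key (multi-valued).
    public void add(String name, String value) {
        List<String> list = values.get(name);
        if (list == null) {
            list = new ArrayList<>();
            values.put(name, list);
        }
        list.add(value);
    }

    // Replaces all existing values for the key with a single value.
    public void set(String name, String value) {
        List<String> list = new ArrayList<>();
        list.add(value);
        values.put(name, list);
    }

    // Returns the first value, or null if the key is absent.
    public String get(String name) {
        List<String> list = values.get(name);
        return (list == null || list.isEmpty()) ? null : list.get(0);
    }

    // Returns all values as a read-only list, or an empty list if the key is absent.
    public List<String> getValues(String name) {
        List<String> list = values.get(name);
        return list == null ? Collections.<String>emptyList() : Collections.unmodifiableList(list);
    }

    // Number of distinct keys.
    public int size() {
        return values.size();
    }

    public static void main(String[] args) {
        SimpleMetadata m = new SimpleMetadata();
        m.add("dc:creator", "Alice");
        m.add("dc:creator", "Bob");
        System.out.println(m.getValues("dc:creator")); // prints [Alice, Bob]
    }
}
```

The trade-off raised in the thread still applies. Plain string keys keep the class dependency-free, but they carry no type or namespace information beyond what callers encode in the key name (here, a "dc:" prefix chosen purely for illustration).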

Git Pull question?

2012-04-25 Thread Mattmann, Chris A (388J)
Hey Guys, I saw a Git pull request come through the other day and followed it to: http://s.apache.org/l9t I commented there asking Kyle if he would be interested in joining our dev list and telling him I'd be happy to figure out how to get his patch in from there. I know Jukka has been working

Re: [metadata] roadmap proposal available on the wiki

2012-04-25 Thread Mattmann, Chris A (388J)
Hi Jörg, First off, thanks for taking the time to put your thoughts down on the Wiki. I will try to leverage that for helping push these ideas forward. I am +1 on most of the things you proposed. Regarding: {quote} Use XMP instead of Hashmap in Metadata class The idea is to have just one data

Re: Server component in Jira

2012-04-24 Thread Mattmann, Chris A (388J)
Done! Cheers, Chris On Apr 24, 2012, at 4:00 PM, Ingo Renner wrote: > Hi all, > > could we add "server" as a component in Jira? > > > thanks > Ingo > > -- > Ingo Renner > TYPO3 Core Developer, Release Manager TYPO3 4.2, Admin Google Summer of Code > > TYPO3 > Open Source Enterprise Content

Re: Pluggable language detection

2012-04-08 Thread Mattmann, Chris A (388J)
Hi Jan, It probably makes sense to provide pluggable language detection in Tika, since it's the lower level library, so I am +1 for figuring out a solution to implement it in Tika ville. If no one has started on this in the next few weeks I'll give it a go. Cheers, Chris On Apr 8, 2012, at 4:
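One possible shape for pluggable language detection is a small interface that several detectors implement, with the caller choosing among their answers. The sketch below is built on assumptions, none of which come from the thread: the LanguageDetector interface and all other names are hypothetical, the toy detector simply counts stop words, and the policy is "highest confidence wins". Production detectors typically use character n-gram profiles instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical plug-in registry for language detectors (JDK only, not a Tika API).
public class PluggableLanguageDetection {

    // Plug-in point: each implementation returns a guess with a confidence in [0, 1].
    interface LanguageDetector {
        Result detect(String text);
    }

    static final class Result {
        final String language;   // e.g. an ISO 639-1 code such as "en"
        final double confidence;
        Result(String language, double confidence) {
            this.language = language;
            this.confidence = confidence;
        }
    }

    // Toy detector: fraction of tokens found in a small stop-word list. Illustrative only.
    static final class StopWordDetector implements LanguageDetector {
        private final String language;
        private final Set<String> stopWords;
        StopWordDetector(String language, String... words) {
            this.language = language;
            this.stopWords = new HashSet<>(Arrays.asList(words));
        }
        public Result detect(String text) {
            int hits = 0, total = 0;
            for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (t.isEmpty()) continue;   // a leading delimiter yields one empty token
                total++;
                if (stopWords.contains(t)) hits++;
            }
            return new Result(language, total == 0 ? 0.0 : (double) hits / total);
        }
    }

    private final List<LanguageDetector> detectors = new ArrayList<>();

    void register(LanguageDetector d) {
        detectors.add(d);
    }

    // Asks every registered detector and keeps the most confident answer.
    Result detect(String text) {
        Result best = new Result("und", 0.0); // "und" = undetermined (ISO 639-2)
        for (LanguageDetector d : detectors) {
            Result r = d.detect(text);
            if (r.confidence > best.confidence) best = r;
        }
        return best;
    }

    public static void main(String[] args) {
        PluggableLanguageDetection registry = new PluggableLanguageDetection();
        registry.register(new StopWordDetector("en", "the", "and", "of", "is"));
        registry.register(new StopWordDetector("de", "der", "die", "und", "ist"));
        System.out.println(registry.detect("The cat and the dog").language); // prints en
    }
}
```

Keeping the selection policy in the registry rather than in each detector means an implementation can be swapped in without changing callers.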

Re: PUT vs. POST in tika-server

2012-04-05 Thread Mattmann, Chris A (388J)
Hi Guys, Yeah, I am happy to annotate the code with @POST too like Max suggested. I opened https://issues.apache.org/jira/browse/TIKA-891 to track this. Thanks! Cheers, Chris On Apr 5, 2012, at 1:29 AM, Jukka Zitting wrote: > Hi, > > I notice the tika-server component (nice work documenting

Re: Metadata situation and XMP support in Tika

2012-04-05 Thread Mattmann, Chris A (388J)
Hi Jörg, Great summary! I would be in favor of option #2 as well, with the caveat that if we take it slow, I think there might be a way to not really have as much of a client/API impact, using deprecations and other techniques as you suggested. Looking forward to your participation! Cheers, C

Re: Build failed in Jenkins: Tika-trunk #821

2012-03-27 Thread Mattmann, Chris A (388J)
Hey Jukka, I rolled it back to 1.5 on the Maven settings, so let's see if it compiles with 1.5. Cheers, Chris On Mar 27, 2012, at 3:17 PM, Jukka Zitting wrote: > Hi, > > On Tue, Mar 27, 2012 at 8:20 PM, Apache Jenkins Server > wrote: >> [INFO] --- maven-compiler-plugin:2.3.2:compile (default

Re: Build failed in Jenkins: Tika-trunk #820

2012-03-26 Thread Mattmann, Chris A (388J)
Hi Max, I will hopefully have a patch in the next day or so that migrates us to CXF with little to no changes (except for the Server and test components of tika-server, as you mentioned). I think this will help out in this regard. Cheers, Chris On Mar 26, 2012, at 9:26 AM, Maxim Valyanskiy wro

[ANNOUNCE] Apache Tika 1.1 released

2012-03-23 Thread Mattmann, Chris A (388J)
(...apologies for the cross posting...) The Apache Tika project is pleased to announce the release of Apache Tika 1.1. The release contents have been pushed out to the main Apache release site and to the Maven Central sync, so the releases should be available as soon as the mirrors get the syncs.

[RESULT] [VOTE] Apache Tika 1.1 release rc #1

2012-03-23 Thread Mattmann, Chris A (388J)
Hi Everyone, OK, this VOTE has passed with the following tallies: +1 PMC Chris Mattmann Ken Krugler Markus Jelsma Jukka Zitting Mike McCandless Dave Meikle +1 Community Zabrane Mickael Alex Ott Sorry took me a while to tally! :) I'll now push the dists out, and then push to Maven Central and

Re: [VOTE] Apache Tika 1.1 release rc #1

2012-03-07 Thread Mattmann, Chris A (388J)
Hey Ken, Sorry about that! Forgot to include the link to the staged Maven2 repo, here: https://repository.apache.org/content/repositories/orgapachetika-066/ There ya go. Cheers, Chris On Mar 7, 2012, at 4:36 PM, Ken Krugler wrote: > Hi Chris, > > On Mar 7, 2012, at 1:35pm, Mattmann

[VOTE] Apache Tika 1.1 release rc #1

2012-03-07 Thread Mattmann, Chris A (388J)
Hi Folks, A candidate for the Tika 1.1 release is available at: http://people.apache.org/~mattmann/apache-tika-1.1/rc1/ The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/tika/tags/1.1/ The SHA1 checksum of the archive is d3185bb22fa3c7318488838989af

Fwd: Google Summer of Code 2012 upcoming

2012-03-04 Thread Mattmann, Chris A (388J)
Guys, FYI...in case anyone is thinking of GSoC, deadlines are approaching. Process is described below... Thanks! Cheers, Chris Begin forwarded message: > From: Ulrich Stärk > Date: March 4, 2012 9:01:07 AM PST > To: "p...@apache.org" > Cc: "d...@community.apache.org" > Subject: Google Summe

Re: Tika 1.1 release

2012-03-01 Thread Mattmann, Chris A (388J)
Guys, +1 here. I'll create a 1.1 RC this weekend if no one beats me to it. Thanks! Cheers, Chris On Mar 1, 2012, at 11:51 AM, Jukka Zitting wrote: > Hi, > > On Thu, Mar 1, 2012 at 7:01 PM, Daniel Malmer > wrote: >> First, thanks for all the hard work you've put into this project. My compan

Re: Gdal Integration (TIKA 605)

2012-02-26 Thread Mattmann, Chris A (388J)
evolves around geospatial imagery. Has there been any > discussion about using Tika on any of the geospatial vector formats? I would > think they would go hand in hand, and OGR recognizes many of them. > > Joe > > On Feb 26, 2012, at 1:10 PM, Mattmann, Chris A (388J) wro

Re: Gdal Integration (TIKA 605)

2012-02-26 Thread Mattmann, Chris A (388J)
get a > test file to recognize any geospatial data, and then we will be off and > running. Great! Cheers, Chris > On Feb 26, 2012, at 1:10 PM, Mattmann, Chris A (388J) wrote: > >> Hi Joe, >> >> Awesome! Thanks for picking this up and getting interested in this work.

Re: Gdal Integration (TIKA 605)

2012-02-26 Thread Mattmann, Chris A (388J)
Hi Joe, Awesome! Thanks for picking this up and getting interested in this work. Right now, the only use case we've had so far is representing lats and lons (WGS84). It would be great to extract more information and come up with a policy for representing more WKTs and so forth. We should probab

TF-IDF parser and ContentHandler?

2012-02-07 Thread Mattmann, Chris A (388J)
Hey Guys, I've been toying around with the idea of writing a simple Tika Parser Decorator that extends the Text Parser, but that generates TF-IDF metadata, maybe top word counts (summarized) and a frequency/term map. I was also thinking of then writing a similar ContentHandler as well so it could

Fwd: [Announce] Google Summer of Code 2012

2012-02-05 Thread Mattmann, Chris A (388J)
FYI Begin forwarded message: > From: Ross Gardler > Date: February 5, 2012 1:45:18 PM PST > To: "d...@community.apache.org" > Subject: RE: [Announce] Google Summer of Code 2012 > Reply-To: "d...@community.apache.org" > > For those new to GSoC you might want to review the roles defined at > ht

Fwd: [Announce] Google Summer of Code 2012

2012-02-05 Thread Mattmann, Chris A (388J)
Anyone interested in mentoring a GSoC student for Tika? Begin forwarded message: > From: Luciano Resende > Date: February 4, 2012 10:40:03 AM PST > To: "d...@community.apache.org" , code-awards > > Subject: Fwd: [Announce] Google Summer of Code 2012 > Reply-To: "d...@community.apache.org" >

Re: % of different content types out there on the web

2012-01-31 Thread Mattmann, Chris A (388J)
on > those two. However, we also explicitly filter out all/most unwanted suffixes. > We do have a lot of suffixes that we encountered so far. > > On Saturday 28 January 2012 03:01:26 Mattmann, Chris A (388J) wrote: >> (sorry for the cross post) >> >> Hey Guys, >>

% of different content types out there on the web

2012-01-27 Thread Mattmann, Chris A (388J)
(sorry for the cross post) Hey Guys, I'm trying to find a good citation or estimate (if anyone has done one) of the breakdown (by % or some other metric) of non-HTML content types out there on the web (from a whole-web crawl or a meaningful representative dataset). Anyone

Re: [ANNOUNCEMENT][THANKS] Apache ODF Toolkit(Incubating) 0.5-incubating Release

2012-01-16 Thread Mattmann, Chris A (388J)
Congrats guys! Cheers, Chris On Jan 16, 2012, at 4:59 AM, Devin Han wrote: > Hi all, > > Thanks all of the voters from this list. Now there is a result ;) > > The Apache ODF Toolkit(Incubating) team is pleased to announce the release > of 0.5-incubating. This is our first Apache release. > >

InfoQ article on Tika published

2011-12-28 Thread Mattmann, Chris A (388J)
Hey Folks, InfoQ just released an article on the Tika 1.0 release: http://www.infoq.com/news/2011/12/tika-10 Hope everyone is having a nice Holiday season (for those that are celebrating!) Cheers, Chris ++ Chris Mattmann, Ph.D. Se

Re: Pushing parsers upstream

2011-12-13 Thread Mattmann, Chris A (388J)
Hey Jukka, For places like POI and PDFBox I think this could definitely work. And then for places where we have Parsers, but aren't ready to push upstream yet (I can think of two examples of this relevant to me, NetCDF/HDF and GDAL), we can just leave the Parser in tika-parsers I think. In this

[ANNOUNCE] Welcome Jerome Charron as Tika committer + PMC member

2011-12-12 Thread Mattmann, Chris A (388J)
Hi Folks, Please welcome Jerome Charron to the ranks of the Tika PMC and as a Tika committer. He's just been VOTEd in and we're really happy to have him around. Jerome, please feel free to say a bit about yourself. Thanks and welcome aboard! Cheers, Chris

[ANNOUNCE] Welcome Antoni Mylka as Tika committer + PMC member

2011-12-12 Thread Mattmann, Chris A (388J)
Hi Folks, Please welcome Antoni Mylka to the ranks of the Tika PMC and as a Tika committer. He's just been VOTEd in and we're really happy to have him around. Antoni, please feel free to say a bit about yourself. Thanks and welcome aboard! Cheers, Chris
