Re: Considering usage of tika-server(s)

2013-03-18 Thread Mattmann, Chris A (388J)
Hi Clemens, Thanks for your questions. Answers below: On 3/13/13 6:22 AM, "Clemens Wyss DEV" wrote: >We have several tomcats (each with several war's) running. At the moment >we use tika "in memory", i.e. extraction is being performed within the >tomcat processes/threads. > >Does a tika-server

Re: Tika-server stability

2013-03-10 Thread Mattmann, Chris A (388J)
Hi Milos, On 3/10/13 4:16 AM, "Milos" wrote: >Mattmann, Chris A (388J writes: > >> >> Hey Milos, >> >> Tika server is the JAX-RS server. >> >> Good differentiation here on stack overflow: >> >> >>http://stackoverf

Re: Tika-server stability

2013-03-09 Thread Mattmann, Chris A (388J)
Hey Milos, Tika server is the JAX-RS server. Good differentiation here on stack overflow: http://stackoverflow.com/questions/12231630/how-to-use-tika-in-server-mode HTH! Cheers, Chris On 3/9/13 3:43 PM, "Milos" wrote: > >Mattmann, Chris A (388J writes: > >> &g

Re: Tika-server stability

2013-03-09 Thread Mattmann, Chris A (388J)
Hi Milos, Are you talking about the Tika JAXRS server, or the Network protocol (lower layer/sockets) one? Cheers, Chris On 3/9/13 1:24 PM, "Milos" wrote: >Hello, >I plan to use tika server application tika-server.jar for my intranet web >site. >I experimented with tika parsers v1.1 before but

Re: Improvement in Metadata Class

2013-03-03 Thread Mattmann, Chris A (388J)
Hey Lewis, RE: #3 — it would be great to get Nutch using Tika's metadata container — I don't think we have anything special in Nutch that prevents it. RE: #2 — I committed your Tika doc patch during ApacheCon NA 2013 so thanks! Thanks! Cheers, Chris From: Lewis John Mcgibbney mailto:lewis.mc

Re: Unsubscirbe

2013-02-20 Thread Mattmann, Chris A (388J)
Send a blank email to user-unsubscr...@tika.apache.org Cheers, Chris From: , Sandor mailto:sandor.djarm...@roesberg.com>> Reply-To: "user@tika.apache.org" mailto:user@tika.apache.org>> Date: Wednesday, February 20, 2013 4:46 AM To: "user@tika.apache.org

Re: Leech crawler 1.3 released!

2013-02-05 Thread Mattmann, Chris A (388J)
Very cool! Good job guys. Cheers, Chris On 2/5/13 9:40 AM, "Christian Reuschling" wrote: >-----BEGIN PGP SIGNED MESSAGE----- >Hash: SHA1 > >Migrated to Tika 1.3, for those that use Tika and need further crawling >capabilities. > >https://github.com/leechcrawler/leech > >Enjoy! :) > >Christian >

Re: Tika 1.3 server (JAX-WS) usage

2013-02-02 Thread Mattmann, Chris A (388J)
Per our recent conversation will look forward to some JIRA issues on this. Thanks AJ. Cheers, Chris On 2/1/13 10:00 AM, "AJ Weber" wrote: > >> The tika-server only provides two command line options -p for port and >> -h for help message. >> >> >Say we wanted to add some of those additional opt

Re: JAX-WS Tika Server fails to start

2013-02-02 Thread Mattmann, Chris A (388J)
g>> Subject: Re: JAX-WS Tika Server fails to start I will do it on Monday, and will volunteer to help. While it's the first time I'm picking up the Tika source, I've been around the block more than a few times with java. ;-) -Aaron "Mattmann, Chris A (388J)" mai

Re: JAX-WS Tika Server fails to start

2013-02-02 Thread Mattmann, Chris A (388J)
to identify the requested return format would be valuable. Also adding something similar to the metadata service to return in json or xml. -Aaron "Mattmann, Chris A (388J)" mailto:chris.a.mattm...@jpl.nasa.gov>> wrote: Hi AJ, What command are you running to start the s

Re: JAX-WS Tika Server fails to start

2013-02-01 Thread Mattmann, Chris A (388J)
Hi AJ, What command are you running to start the server? Cheers, Chris On 1/31/13 12:01 PM, "AJ Weber" wrote: >Me again, decided to start a new thread with an appropriate subject. > >Having trouble getting the WS Server to start. Just trying to run it >with the defaults, using the latest CXF

Re: Tika 1.3 server (JAX-WS) usage

2013-02-01 Thread Mattmann, Chris A (388J)
Hey Guys, I suggested that we should release the WAR file as part of our Tika releases: http://s.apache.org/qM Let's try and include the WAR file as one of the published and signed artifacts for 1.4. Thoughts? Cheers, Chris On 1/31/13 10:54 AM, "Dave Meikle" wrote: >Hi, > >On 31 Jan 2013, a

Re: [ANNOUNCE] Apache Tika 1.3 Released

2013-01-22 Thread Mattmann, Chris A (388J)
Great job Dave!!! On 1/22/13 12:22 PM, "Dave Meikle" wrote: >The Apache Tika project is pleased to announce the release of Apache Tika >1.3. The release contents have been pushed out to the main Apache release >site and to the Maven Central sync, so the releases should be available as >soon as t

Re: Newlines not escaped in CSV Metadata (Tika Rest Server)

2012-10-27 Thread Mattmann, Chris A (388J)
Hey David, Thanks man for following this up on list and for the blog post -- great work! I love TIKA-593 (our REST server) too! :) Cheers, Chris On Oct 26, 2012, at 2:37 PM, David James wrote: > I have found no evidence that Tika is the problem. I have found reason > to suspect that Ruby 1.9.3

Welcome to our new Tika PMC chair!

2012-08-19 Thread Mattmann, Chris A (388J)
Hey Folks, I decided to step down as chair of the Apache Tika PMC. We have a new chair, who graciously volunteered to step up and handle the chair duties, Dave Meikle. Dave's nomination was recently confirmed at the last Apache board meeting, on recommendation from the Tika PMC. Dave, welcome!

Re: Interest for Tika at ApacheCon Europe

2012-08-03 Thread Mattmann, Chris A (388J)
Hey Jukka, I'm not planning to attend but look forward to hearing about your presentation post meeting (and scoping out your slides!) Cheers, Chris On Aug 3, 2012, at 3:52 AM, Jukka Zitting wrote: > Hi, > > As many of you probably already know, ApacheCon Europe 2012 [1] is > coming up on Novem

[ANNOUNCE] Welcome Jörg Ehrlich as new Tika PMC member and committer

2012-07-31 Thread Mattmann, Chris A (388J)
Hi Folks, The Tika PMC has VOTEd to elect Jörg Ehrlich to our ranks as a PMC member and committer. Welcome Jörg! Feel free to mention a bit about yourself. Cheers, Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet P

[ANNOUNCE] Welcome Ingo Renner as Tika PMC member and committer

2012-07-31 Thread Mattmann, Chris A (388J)
Hi Folks, The Tika PMC VOTEd to add Ingo Renner to our ranks as a PMC member and committer. Welcome, Ingo! Please feel free to say a bit about yourself. Cheers, Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Prop

[ANNOUNCE] Welcome Sergey Beryozkin as Apache Tika PMC member and committer

2012-07-30 Thread Mattmann, Chris A (388J)
Hi Folks, The Tika PMC has elected to add Sergey Beryozkin as a PMC member and committer. Welcome Sergey! Feel free to say a bit about yourself! Cheers, Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion L

Re: Tika Server mode accessing with CURL

2012-07-21 Thread Mattmann, Chris A (388J)
Hi Hayden, On Jul 21, 2012, at 6:24 AM, Mr Havercamp wrote: > Hi Chris > > Thanks for your links, etc. I have successfully built and run Tika JAXRS and > will look to incorporate it into my component so that users can configure and > use it for Tika extraction (currently I have local Tika and

Re: Tika Server mode accessing with CURL

2012-07-20 Thread Mattmann, Chris A (388J)
lrCell, or a remote tika server. In your > opinion, would TikaJAXRS be a viable option for remote tika extraction (for > example, running on a separate server) especially in regards to performance > and security? > > Thanks again > > > Hayden > > On 20/07/

Re: Tika Server mode accessing with CURL

2012-07-20 Thread Mattmann, Chris A (388J)
Hi Hayden, Thanks for your email! Have you tried the Tika JAXRS server, documented here: https://issues.apache.org/jira/browse/TIKA-593 http://wiki.apache.org/tika/TikaJAXRS It first appeared in 1.2 and can also be run on a port (9988 by default) to handle cURL interactions. Cheers, Chris On J
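For readers who would rather script the interaction than use cURL, here is a minimal Python sketch of a PUT upload to a running Tika JAXRS server. The `/tika` resource path, the `localhost` host, and the helper names `build_extract_request` and `extract_text` are assumptions made for illustration; the port is a parameter that defaults to the 9988 mentioned in the message, so check which port your server actually listens on.

```python
import urllib.request


def build_extract_request(data: bytes, host: str = "localhost", port: int = 9988,
                          path: str = "/tika") -> urllib.request.Request:
    """Build (but do not send) a PUT request carrying a document body.

    Host, port, and path are assumptions; adjust them to your server.
    """
    url = f"http://{host}:{port}{path}"
    return urllib.request.Request(url, data=data, method="PUT")


def extract_text(data: bytes, **kwargs) -> str:
    """Send the document and return the response body decoded as UTF-8."""
    request = build_extract_request(data, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

Building the request separately from sending it makes it easy to inspect the URL and method before any network traffic happens.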

Fwd: Call for Papers for ApacheCon Europe 2012 now open!

2012-07-19 Thread Mattmann, Chris A (388J)
FYI... Begin forwarded message: > From: Nick Burch > Date: July 19, 2012 1:14:57 PM CDT > To: > Subject: Call for Papers for ApacheCon Europe 2012 now open! > Reply-To: > > Hi All > > We're pleased to announce that the Call for Papers for ApacheCon Europe 2012 > is finally open! > > (For t

[ANNOUNCE] Apache Tika 1.2 released

2012-07-16 Thread Mattmann, Chris A (388J)
(...apologies for the cross posting...) The Apache Tika project is pleased to announce the release of Apache Tika 1.2. The release contents have been pushed out to the main Apache release site and to the Maven Central sync, so the releases should be available as soon as the mirrors get the syncs.

[RESULT] [VOTE] Apache Tika 1.2 release rc #1

2012-07-16 Thread Mattmann, Chris A (388J)
Hi Everyone, This VOTE has passed with the following tallies: +1 Chris Mattmann* Alex Ott Mike McCandless* Zabrane Mickael Joerg Ehrlich Dave Meikle* Jukka Zitting* Oleg Tikhonov* Ken Krugler* I'll push the bits out and announce the release. Thanks to all who VOTEd! Cheers, Chris * - indicat

Re: [VOTE] Apache Tika 1.2 release rc #1

2012-07-11 Thread Mattmann, Chris A (388J)
Thanks Mike! On Jul 11, 2012, at 6:43 AM, Michael McCandless wrote: > +1 > > I smoke tested, extracting text for the Lucene in Action PDF (looked > good), and verified TIKA-948 is fixed. > > Why are there original-tika-app* files in the RC directory? Good question: this is the first time I've

[VOTE] Apache Tika 1.2 release rc #1

2012-07-10 Thread Mattmann, Chris A (388J)
Hi Folks, A candidate for the Tika 1.2 release is available at: http://people.apache.org/~mattmann/apache-tika-1.2/rc1/ The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/tika/tags/1.2/ The SHA1 checksum of the archive is 8146c1161d35e6b1dc670d078a773f
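Voters on a release candidate typically compare the published SHA1 checksum against the archive they downloaded. A minimal Python sketch of that check follows; the helper names `sha1_hex` and `matches_published` are hypothetical, and the archive path and published checksum are supplied by the caller rather than taken from the message.

```python
import hashlib


def sha1_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA1 digest of a file as lowercase hex, reading in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published: str) -> bool:
    """Compare a file's SHA1 with a published checksum, ignoring case and padding."""
    return sha1_hex(path) == published.strip().lower()
```

Reading in chunks keeps memory use flat even for large source archives.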

Re: Server mode documentation?

2012-07-01 Thread Mattmann, Chris A (388J)
-- Jason > > > On 01/07/2012 02:49, Mattmann, Chris A (388J) wrote: >> Hi Jason, >> >> Try this out: >> >> >> http://wiki.apache.org/tika/TikaJAXRS >> >> >> We'd be totally happy for feedback and thanks for checking

Re: Server mode documentation?

2012-07-01 Thread Mattmann, Chris A (388J)
I'm totally interested in participating but not sure I could do it in person. If you guys do it, could you set up a Google Hangout for me? Thanks Nick! Cheers, Chris On Jul 1, 2012, at 9:37 AM, Nick Burch wrote: > On Sun, 1 Jul 2012, Jason Judge wrote: >> So, feature requests, command line, or

Re: Server mode documentation?

2012-07-01 Thread Mattmann, Chris A (388J)
Hi Guys, I'd say Jason your comments are well taken, and Nick's replies are spot on. I got involved with tika-server after Maxim Valyanskiy built a simple JAX-RS layer in his $dayjob and was willing to contribute it back in TIKA-593. His original contribution used the Jersey JAX-RS libraries an

Re: Server mode documentation?

2012-07-01 Thread Mattmann, Chris A (388J)
Hi Jason, On Jul 1, 2012, at 6:05 AM, Jason Judge wrote: > I see, so tika-app in server mode and tika-server are not the same thing. > tika-app in server mode is just a way of providing an alternative input > stream, but offers no control through that stream over what it actually does. > > I h

Re: Server mode documentation?

2012-06-30 Thread Mattmann, Chris A (388J)
Hi Jason, Try this out: http://wiki.apache.org/tika/TikaJAXRS We'd be totally happy for feedback and thanks for checking this out! Cheers, Chris On Jun 30, 2012, at 12:26 PM, Jason Judge wrote: > Is there any documentation on running tika in server mode? I bought the book, > hoping for some

Re: TIKA-198: Illegal IOException... MP4

2012-06-07 Thread Mattmann, Chris A (388J)
That is great to hear! Cheers, Chris On Jun 7, 2012, at 4:35 AM, Paulini, Matthew CTR USAF AFMC AFRL/RISA wrote: > Thanks, Nick. I swapped in tika-app-1.2-20120528.050306-57.jar and the issue > seems to be resolved. The project is building and testing correctly on Ubuntu > 12.04 x32 and x64 as

Re: Limitations of iWork parsing

2012-04-24 Thread Mattmann, Chris A (388J)
Hi Gabriel, Thanks for bringing these issues to light. We would appreciate you filing issues in our JIRA issue tracker: http://issues.apache.org/jira/browse/TIKA And if you are able to attach sample files that we can use to reproduce what you are seeing that would be great. The Tika version yo

[ANNOUNCE] Apache Tika 1.1 released

2012-03-23 Thread Mattmann, Chris A (388J)
(...apologies for the cross posting...) The Apache Tika project is pleased to announce the release of Apache Tika 1.1. The release contents have been pushed out to the main Apache release site and to the Maven Central sync, so the releases should be available as soon as the mirrors get the syncs.

[RESULT] [VOTE] Apache Tika 1.1 release rc #1

2012-03-23 Thread Mattmann, Chris A (388J)
Hi Everyone, OK, this VOTE has passed with the following tallies: +1 PMC Chris Mattmann Ken Krugler Markus Jelsma Jukka Zitting Mike McCandless Dave Meikle +1 Community Zabrane Mickael Alex Ott Sorry took me a while to tally! :) I'll now push the dists out, and then push to Maven Central and

[VOTE] Apache Tika 1.1 release rc #1

2012-03-07 Thread Mattmann, Chris A (388J)
Hi Folks, A candidate for the Tika 1.1 release is available at: http://people.apache.org/~mattmann/apache-tika-1.1/rc1/ The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/tika/tags/1.1/ The SHA1 checksum of the archive is d3185bb22fa3c7318488838989af

Re: Tika Architecture/Documentation Site

2012-02-29 Thread Mattmann, Chris A (388J)
Hi Vineet, Very cool! Is this the online version of the Understand tool? I've used that before in my Software Architecture course. I'll check this out. I did notice on the front page it says "Lucene" though. Thanks I'll check it out! Cheers, Chris On Feb 29, 2012, at 7:08 PM, Vineet Sinha wro

InfoQ article on Tika published

2011-12-28 Thread Mattmann, Chris A (388J)
Hey Folks, InfoQ just released an article on the Tika 1.0 release: http://www.infoq.com/news/2011/12/tika-10 Hope everyone is having a nice Holiday season (for those that are celebrating!) Cheers, Chris ++ Chris Mattmann, Ph.D. Se

Re: LinkCH need Link.getMethod() and .getRel()

2011-12-21 Thread Mattmann, Chris A (388J)
Thanks Markus, I'll take a look! Cheers, Chris On Dec 21, 2011, at 6:58 AM, Markus Jelsma wrote: > Issue with patch. I omitted the method as this applies only to forms and we > might actually not need it. > > https://issues.apache.org/jira/browse/TIKA-824 > > On Wednesday 21 December 2011 11:

[ANNOUNCE] Welcome Jerome Charron as Tika committer + PMC member

2011-12-12 Thread Mattmann, Chris A (388J)
Hi Folks, Please welcome Jerome Charron to the ranks of the Tika PMC and as a Tika committer. He's just been VOTEd in and we're really happy to have him around. Jerome, please feel free to say a bit about yourself. Thanks and welcome aboard! Cheers, Chris +

[ANNOUNCE] Welcome Antoni Mylka as Tika committer + PMC member

2011-12-12 Thread Mattmann, Chris A (388J)
Hi Folks, Please welcome Antoni Mylka to the ranks of the Tika PMC and as a Tika committer. He's just been VOTEd in and we're really happy to have him around. Antoni, please feel free to say a bit about yourself. Thanks and welcome aboard! Cheers, Chris +++

Re: dead API documentation links

2011-11-16 Thread Mattmann, Chris A (388J)
Fixed, thanks for reporting John! Cheers, Chris On Nov 16, 2011, at 6:38 AM, John M wrote: > Hello, > > I noticed that with the release of version 1.0, the API documentation > link, http://tika.apache.org/1.0/api/, returns a 404, as do all other > links concerning the 1.0 API classes. Would so

Re: [VOTE] Apache Tika 1.0 release rc #1

2011-11-09 Thread Mattmann, Chris A (388J)
tion is for the TextDetector.detect() class, which > erroneously reports "text/plain" for AR archives. > > Congratulations and +1 for the release! :-) > > > On Fri, Nov 4, 2011 at 5:42 PM, Mattmann, Chris A (388J) > wrote: > Hi Folks, > > A candidat

[ANNOUNCE] Apache Tika 1.0 released

2011-11-07 Thread Mattmann, Chris A (388J)
(...apologies for the cross posting...) The Apache Tika project is pleased to announce the release of Apache Tika 1.0. The release contents have been pushed out to the main Apache release site and to the Maven Central sync, so the releases should be available as soon as the mirrors get the syncs.

[RESULT] [VOTE] Apache Tika 1.0 release rc #1

2011-11-07 Thread Mattmann, Chris A (388J)
Hi All, This VOTE has passed! Tallies: +1 PMC: Chris Mattmann Jukka Zitting Michael McCandless Oleg Tikhonov Dave Meikle +1 Community: Zabrane Mickael Alex Ott Christian Goeller Thanks to everyone for VOTE'ing! I'll push the release out to the mirrors and update the website! Cheers, Chris

Re: [VOTE] Apache Tika 1.0 release rc #1

2011-11-04 Thread Mattmann, Chris A (388J)
P.S. Here's my +1. Cheers, Chris On Nov 4, 2011, at 8:42 AM, Mattmann, Chris A (388J) wrote: > Hi Folks, > > A candidate for the Tika 1.0 release is available at: > > http://people.apache.org/~mattmann/apache-tika-1.0/rc1/ > > The release candidate is a zip

[VOTE] Apache Tika 1.0 release rc #1

2011-11-04 Thread Mattmann, Chris A (388J)
Hi Folks, A candidate for the Tika 1.0 release is available at: http://people.apache.org/~mattmann/apache-tika-1.0/rc1/ The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/tika/tags/1.0/ The SHA1 checksum of the archive is 203d84b56c5b8879ce04b496e9

DZone article on Tika

2011-10-21 Thread Mattmann, Chris A (388J)
Hey Guys, Here's a new DZone article on Tika: http://java.dzone.com/articles/how-retrieveextract-metadata. Thanks to Steve A. for pointing it out to me! Cheers, Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Prop

Re: [ANNOUNCE] Apache Tika 0.10 released

2011-09-30 Thread Mattmann, Chris A (388J)
> On Sep 30, 2011, at 8:18 PM, Mattmann, Chris A (388J) wrote: > >> (...apologies for the cross posting...) >> >> The Apache Tika project is pleased to announce the release of Apache Tika >> 0.10 The release contents have been pushed out to the main Apache release

[ANNOUNCE] Apache Tika 0.10 released

2011-09-30 Thread Mattmann, Chris A (388J)
(...apologies for the cross posting...) The Apache Tika project is pleased to announce the release of Apache Tika 0.10 The release contents have been pushed out to the main Apache release site and to the Maven Central sync, so the releases should be available as soon as the mirrors get the syncs.

Re: Weird Eclipse errors?

2011-09-26 Thread Mattmann, Chris A (388J)
Hey Nick, Version 1.1 of ooxml-schemas fixed it. Thanks! Cheers, Chris On Sep 26, 2011, at 3:16 AM, Nick Burch wrote: > On Sun, 25 Sep 2011, Mattmann, Chris A (388J) wrote: >> Description Resource Path Location Type >> The method getBookmarkStartList() is un

[VOTE] Apache Tika 0.10 release rc #1

2011-09-25 Thread Mattmann, Chris A (388J)
Hi Folks, A first release candidate for the Tika 0.10 release is available at: http://people.apache.org/~mattmann/apache-tika-0.10/rc1/ The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/tika/tags/0.10/ The SHA1 checksum of the archive is 355d0b2f

Re: Weird Eclipse errors?

2011-09-25 Thread Mattmann, Chris A (388J)
46 AM, Nick Burch wrote: > On Fri, 23 Sep 2011, Mattmann, Chris A (388J) wrote: >> Weird. I am not seeing that in my Eclipse .classpath file: >> >> [chipotle:~/tmp] mattmann% grep -R poi $HOME/src/tika/.classpath >> > path="M2_REPO/org/apache/poi

Re: Weird Eclipse errors?

2011-09-23 Thread Mattmann, Chris A (388J)
/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml line 161 Java Problem Weird... Cheers, Chris On Sep 23, 2011, at 5:31 PM, Nick Burch wrote: > On Fri, 23 Sep 2011, Mattmann, Chris A (388J) wrote: >> However, Eclipse keeps reporting to me that my Tika project ha

Weird Eclipse errors?

2011-09-23 Thread Mattmann, Chris A (388J)
Hey Guys, I mentioned this before, but I thought it was something wrong with my JVM but now I'm thinking that something else is up. I'm on a Mac OSX 10.6, with Eclipse Helios Service Release 2. I've got Tika up and running with the following .classpath and .project files: http://pastebin.com/

Re: Closing streams (Was: Tika leaves files open)

2011-09-01 Thread Mattmann, Chris A (388J)
You guys rock! Cheers, Chris On Sep 1, 2011, at 8:33 AM, Jukka Zitting wrote: > Hi, > > On Thu, Sep 1, 2011 at 4:59 PM, Michael McCandless > wrote: >> Awesome, thanks Jukka! It looks great, but just a small typo: taken >> should be take in "Unlike most other Tika methods that taken an" > > G

Welcome Mike McCandless to the Tika PMC and as a Tika Committer

2011-08-29 Thread Mattmann, Chris A (388J)
Hi Folks, The Tika PMC just elected Mike McCandless as a Tika PMC member and committer. Mike's made a number of valuable contributions to Tika over the years and is a longtime contributor to the Apache Lucene project. Mike, feel free to say a bit about yourself and welcome aboard! Cheers, Ch

Re: Tika 0.8 failure rates

2011-08-10 Thread Mattmann, Chris A (388J)
Darn right. Thanks so much for contributing this too Charles! Cheers, Chris On Aug 10, 2011, at 10:26 AM, Nick Burch wrote: > On Tue, 9 Aug 2011, Charles wrote: >> FYI, here is a list of apparent Tika 0.8 conversion failures when run >> from Xapian's omindex on a Debian 6 Squeeze 64-bit system w

Fwd: Reminder: TAC Assistance to ApacheCon NA 2011 closes July 8th

2011-07-02 Thread Mattmann, Chris A (388J)
Begin forwarded message: > From: Gavin McDonald > Date: July 2, 2011 5:16:14 PM PDT > To: "p...@apache.org" > Subject: Reminder: TAC Assistance to ApacheCon NA 2011 closes July 8th > > > - > > > Hi All, > > Just a friendly (and final) reminder that applications for financial help > to

Re: Tika as server/daemon for content, metadata and language

2011-06-24 Thread Mattmann, Chris A (388J)
Hi Marian, Tika also now ships with a JAX-RS REST based service, called tika-server. You can check out some (sparse) documentation for it, here: http://wiki.apache.org/tika/TikaJAXRS If you have comments, questions, and/or patches, they are all welcome! :) Cheers, Chris On Jun 24, 2011, at 3:

[ANNOUNCE] Welcome Oleg Tikhonov as a Tika PMC member/committer

2011-04-13 Thread Mattmann, Chris A (388J)
Hi All, Some time ago I nominated Oleg Tikhonov for Tika PMC membership and committership. The VOTEs are in and I'm happy to say the result is that Oleg is now a member of the Tika PMC and a Tika committer! Oleg, feel free to say a bit about yourself, and welcome aboard! Cheers, Chris +

Fwd: [Announce] Now Open: Call for Participation for ApacheCon North America

2011-03-03 Thread Mattmann, Chris A (388J)
All, FYI (apologies for cross-posting) Begin forwarded message: > From: Grant Ingersoll > Date: March 3, 2011 1:52:52 PM MST > To: "u...@mahout.apache.org" , > "java-u...@lucene.apache.org" , > "solr-u...@lucene.apache.org" , > "opennlp-u...@incubator.apache.org" > Subject: Fwd: [Announce] N

Re: Tika metadata extracted per supported document format?

2011-02-25 Thread Mattmann, Chris A (388J)
Hi Andreas, In Tika 0.8+, you can run the --list-met-models command from tika-app: java -jar tika-app-.jar --list-met-models And get a print out of the met keys that Tika supports. Some parsers add their own that aren't part of this met listing, but this is a relatively comprehensive list. Ch
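The `--list-met-models` invocation above can also be driven from a script. The sketch below assembles the same command line in Python; the helper names are hypothetical, and the jar path is left to the caller because the version in the message is elided. Running it assumes a `java` executable on the PATH and a tika-app jar of version 0.8 or later.

```python
import subprocess
from typing import List


def list_met_models_command(jar_path: str) -> List[str]:
    """Assemble the tika-app command line that lists supported metadata keys."""
    return ["java", "-jar", jar_path, "--list-met-models"]


def list_met_models(jar_path: str) -> str:
    """Run tika-app and return whatever it prints on standard output."""
    result = subprocess.run(list_met_models_command(jar_path),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when the jar path contains spaces.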

Re: Apache Tika 0.9: failed to compile on OSX Leopard

2011-02-20 Thread Mattmann, Chris A (388J)
Hi Zabrane, Looks like it compiled fine, but failed a unit test: Can you provide the surefire report output? Also would be good to file a JIRA [1] issue about this. Cheers, Chris [1] http://issues.apache.org/jira/browse/TIKA On Feb 20, 2011, at 2:42 AM, Zabrane wrote: > Hi guys, > > Today,

[ANNOUNCE] Apache Tika 0.9 released

2011-02-17 Thread Mattmann, Chris A (388J)
(...apologies for the cross posting...) The Apache Tika project is pleased to announce the release of Apache Tika 0.9 The release contents have been pushed out to the main Apache release site and to the Maven Central sync, so the releases should be available as soon as the mirrors get the syncs.

[RESULTS] [VOTE] Apache Tika 0.9 Release Candidate #1

2011-02-16 Thread Mattmann, Chris A (388J)
Hi Folks, Okay this VOTE has passed with the following tallies: Tika PMC: +1s Jukka Zitting Chris A. Mattmann Maxim Valyanskiy Julien Nioche Ken Krugler Tika Community: +1s Zabrane Mickael Michael McCandless Alex Ott I'll push the src distros out to the mirrors and click the button on Nexus

[VOTE] Apache Tika 0.9 Release Candidate #1

2011-02-13 Thread Mattmann, Chris A (388J)
Hi Folks, I have posted a candidate for the Apache Tika 0.9 release. The source code is at: http://people.apache.org/~mattmann/apache-tika-0.9/rc1/ See the included CHANGES.txt file for details on release contents and latest changes. The release was made using the Maven2 release plugin, accordin

[Call for Papers] ICSE Software Engineering for Cloud Computing (SECLOUD) Workshop

2011-01-20 Thread Mattmann, Chris A (388J)
(apologies for the cross posting) *** PLEASE NOTE - the deadline for submitting papers has been extended by 1 week to 1/28/2011! *** Please consider submitting a paper to the ICSE 2011 Software Engineering for Cloud Computing (SECLOUD) Workshop to be held Sunday, May 22, 2011, at the Hilton Ha

[Call for Papers] ICSE Software Engineering for Cloud Computing (SECLOUD) Workshop

2011-01-03 Thread Mattmann, Chris A (388J)
(apologies for the cross posting) Please consider submitting a paper to the ICSE 2011 Software Engineering for Cloud Computing (SECLOUD) Workshop to be held Sunday, May 22, 2011, at the Hilton Hawaiian Village Resort in Waikiki, Honolulu, HI. This workshop focuses on identifying the grand chall

[ANNOUNCE] Apache Tika 0.8 released

2010-11-12 Thread Mattmann, Chris A (388J)
(...apologies for the cross posting...) The Apache Tika project is pleased to announce the release of Apache Tika 0.8 The release contents have been pushed out to the main Apache release site and to the Maven Central sync, so the releases should be available as soon as the mirrors get the syncs.

[RESULT] [VOTE] Apache Tika 0.8 Release Candidate #1

2010-11-12 Thread Mattmann, Chris A (388J)
Hi Folks, This VOTE has passed with the following tallies: +1: PMC (binding) Chris Mattmann Jukka Zitting Ken Krugler Community (non-binding) zabrane Mikael I'll go ahead and push the release out to the mirrors, and send an [ANNOUNCE] thread. I'll also let infra@ know that there's no need t

Re: [VOTE] Apache Tika 0.8 Release Candidate #1

2010-11-10 Thread Mattmann, Chris A (388J)
content -t or --text Output plain text content -T or --text-main Output plain text content (main content only) ... Is there any real difference between "-t" and "-T" options? -- Regards Zabrane 2010/11/9 Mattmann, Chris A (388J) : > Hi Folks, > > I have p

[VOTE] Apache Tika 0.8 Release Candidate #1

2010-11-09 Thread Mattmann, Chris A (388J)
Hi Folks, I have posted a candidate for the Apache Tika 0.8 release. The source code is at: http://people.apache.org/~mattmann/apache-tika-0.8/rc1/ See the included CHANGES.txt file for details on release contents and latest changes. The release was made using the Maven2 release plugin, accordin

[ANNOUNCE] Welcome Maxim Valyanskiy as Tika PMC/Committer

2010-11-07 Thread Mattmann, Chris A (388J)
Hi Folks, A while back the Tika PMC nominated Maxim Valyanskiy for Tika committership and PMC membership. The VOTE tallies in Tika PMC-ville have occurred and I'm happy to announce that Max is now a Tika committer! Max, feel free to say a little bit about yourself, and, welcome aboard! Cheers, Chr

My ApacheConNA 2010 slides

2010-11-06 Thread Mattmann, Chris A (388J)
are now posted online at Slideshare.net: http://s.apache.org/2ak Cheers, Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: c

Re: Tika for mediawiki ?

2010-10-24 Thread Mattmann, Chris A (388J)
Hi Guys, > [...] > Until there is a complete spec for parsing media wiki markup, or a java > library that does a good job of extracting text from documents formatted with > media wiki markup, I don't think extracting text from media wiki markup > documents is in scope for Tika. I'd disagree with

Re: FYI: Tika Integration for TYPO3 CMS

2010-10-14 Thread Mattmann, Chris A (388J)
Wow awesome Ingo! Cheers, Chris On 10/14/10 8:54 AM, "Ingo Renner" wrote: Hi there, just to let you know and for those who might be interested: I have just released an extension for the TYPO3 Open Source CMS which integrates Tika as services to extract meta data, text and detect languages

Re: Supported Metadata Tags

2010-09-29 Thread Mattmann, Chris A (388J)
Hey Grant, In the latest version of Tika 0.8 trunk, I added a utility to print the supported metadata models and associated tags that are part of Tika. If you've built a fresh copy of tika-app, you can do: java -jar tika-app-0.8-SNAPSHOT.jar --list-met-models That will print you out a list of:

Great 2-part blog article on Apache Tika

2010-09-24 Thread Mattmann, Chris A (388J)
by our very own Nick Burch! :) See here: http://s.apache.org/JMu Awesome job, Nick! Cheers, Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailst

Re: Fails to detect language for UTF-8 file, but it works for ISO-latin

2010-08-24 Thread Mattmann, Chris A (388J)
+1... On 8/24/10 8:00 AM, "Jukka Zitting" wrote: Hi, On Sat, Aug 21, 2010 at 5:55 PM, Jan Høydahl / Cominvent wrote: > Detected as english. The same is true for the other test language files. > It does not detect language for UTF-8 encoded files. The tika-app jar doesn't do language detectio

Re: Support for adding language profiles dynamically.

2010-08-20 Thread Mattmann, Chris A (388J)
Hi Jan, +1, this approach seems sound. Feel free to file an issue and submit a patch; otherwise, if I get some time next week I can take a look at it. Cheers, Chris On 8/20/10 1:56 PM, "Jan Høydahl / Cominvent" wrote: Hi, Currently the Tika LanguageIdentifier loads language profiles through

Feathercast podcast on Tika

2010-08-14 Thread Mattmann, Chris A (388J)
Hi Guys, In April 2010, I did a podcast about Tika when we went TLP. Rich Bowen, an ASF member and the creator of Feathercast [1], an unofficial podcast about ASF projects, interviewed me about Tika and it's now online. You can find it here [2]. Thanks! Cheers, Chris [1] http://feathercast.org/

Re: How to use Tika library jars?

2010-08-09 Thread Mattmann, Chris A (388J)
Hi Mark, No worries. You can use the tika-app-X.Y.jar, or you can include the jars specifically according to here: http://tika.apache.org/0.7/gettingstarted.html HTH, Chris On 8/9/10 8:50 PM, "Mark Kerzner" wrote: Sorry, stupid question - it's all in the documentation. Mark On Mon, Aug 9,

FW: [ESIP-all] Announcement AGU Session Earth and Space Science Informatics IN10: Open Source Remote Sensing for Environmental Mapping and Analysis

2010-08-04 Thread Mattmann, Chris A (388J)
(apologies for the cross posting) All, FYI below is some information on two special sessions of AGU this December in San Francisco, CA. The first involves open source software and remote sensing. If you are using any Apache software in the area of remote sensing, you might consider submitting to t

Post link to Tika in Action book on Tika website?

2010-08-02 Thread Mattmann, Chris A (388J)
Hi Tika community, Jukka Zitting and I are working on the Tika in Action book [1]. How would everyone feel about us posting a link to it on the Tika website [2]? If so, I'll prepare a patch and update the website shortly. Cheers, Chris [1] http://manning.com/mattmann/ [2] http://tika.apache.org

Re: Test suite for Tika?

2010-07-09 Thread Mattmann, Chris A (388J)
Hi David, The unit tests for the tika-parsers modules contains the test documents in the directory here: http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/test/resources/test-documents/ HTH, Chris On 7/9/10 8:32 PM, "David Kovar" wrote: Good evening, Is there an available set of

Short developerworks article on Tika

2010-06-16 Thread Mattmann, Chris A (388J)
Hi All, Oleg Tikhonov and I recently published a short IBM developerworks article on Tika. We wrote the article last year, and we've been working with the editor to put it online. You can check it out here: http://www.ibm.com/developerworks/opensource/tutorials/os-apache-tika/ Thanks, Chris +

Tika in Action

2010-06-11 Thread Mattmann, Chris A (388J)
Hi Folks, Just wanted to give you an FYI that the book that Jukka Zitting and I are writing on Tika titled "Tika in Action" is now available through Manning's Early Access Program [1]. Feedback, comments welcome. Thanks! Cheers, Chris [1] http://www.manning.com/mattmann/ +

Welcome Julien Nioche, new Tika PMC member and committer

2010-06-05 Thread Mattmann, Chris A (388J)
Hi Folks, In recognition of his contributions to the Tika project, the Tika PMC has voted to make Julien Nioche a Tika PMC member and committer, and Julien has accepted! Julien, please feel free to say a few words about yourself, and most importantly, welcome aboard! Cheers, Chris +

FW: [Travel Assistance] - Applications Open for ApacheCon NA 2010

2010-05-17 Thread Mattmann, Chris A (388J)
(apologies for the cross posting) -- Forwarded Message From: "Gav..." Date: Sun, 16 May 2010 16:23:11 -0700 To: Subject: [Travel Assistance] - Applications Open for ApacheCon NA 2010 The Travel Assistance Committee is now taking in applications for those wanting to attend ApacheCon North Am

TLP project website moved

2010-05-11 Thread Mattmann, Chris A (388J)
Hi All, I've been given notice that we have an area set up at tika.apache.org for our new Tika TLP website. I've moved the existing Lucene website files over to tika.apache.org and expect that a sync will occur at some point soon. So, please update your bookmarks to reflect the new TLP website. W

Tika now listed on projects.a.o

2010-05-11 Thread Mattmann, Chris A (388J)
Hi All, Apache Tika is now listed on projects.a.o: http://projects.apache.org/projects/tika.html Yay! Cheers, Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office:

Mailing lists moved

2010-05-11 Thread Mattmann, Chris A (388J)
Hi All, INFRA-2645 [1] is complete and we now have official tika user and dev mailing lists. Please update your bookmarks to use: d...@tika.apache.org user@tika.apache.org Instead of tika-...@lucene.apache.org tika-u...@lucene.apache.org All existing subscribers to these lists should have been