Re: [DISCUSS] 1.6 Release?

2014-07-18 Thread Michael McCandless
+1 to release 1.6, thanks Chris! Mike McCandless http://blog.mikemccandless.com On Fri, Jul 18, 2014 at 1:51 AM, Mattmann, Chris A (3980) wrote: > There have been discussions in the past about release notes and > CHANGE log - in reality the release notes page is generated from > the change log

Re: [VOTE] Release Apache Tika 1.9 Candidate #1

2015-06-01 Thread Michael McCandless
+1 to release: I smoke tested tika-app-1.9.jar by running the Lucene in Action manuscript (MS Word, PDF, a bit of RTF) and spot checked the output. TIKA-1562 (adding all examples from the "Tika in Action" book) is very cool: thank you! This is something we never quite succeeded in doing with Luce

Jirasearch for Tika, Lucene, Solr, Infra issues: you can now drill down by attachment type

2017-05-29 Thread Michael McCandless
For users searching for Tika, Lucene, Solr and Infra issues at http://jirasearch.mikemccandless.com, I just improved the "Attachment?" facet field so you can now drill down to find issues according to what's attached to the issue (e.g. Patch, JAR, Image, etc.). For example, all Lucene issues that

Re: [Tika Wiki] Trivial Update of "ReleaseProcess" by MikeMcCandless

2011-08-14 Thread Michael McCandless
You're welcome! Mike McCandless http://blog.mikemccandless.com On Sun, Aug 14, 2011 at 12:24 PM, Mattmann, Chris A (388J) wrote: > Thanks Mike! > > Cheers, > Chris > > On Aug 14, 2011, at 4:08 AM, Apache Wiki wrote: > >> Dear Wiki user, >> >> You have subscribed to a wiki page or wiki category

Re: Issue in text extraction in Solr / Tika

2011-08-19 Thread Michael McCandless
Can you post some example docs that don't extract correctly? Or, better, open a Jira issue(s) and attach the documents there? Thanks, Mike McCandless http://blog.mikemccandless.com On Fri, Aug 19, 2011 at 7:49 AM, nirnaydewan wrote: > I am using Solr 3.3.0 using the attached jetty server. Whe

Re: Issue in text extraction in Solr / Tika

2011-08-19 Thread Michael McCandless
I ran Tika to get the text: > java -jar ./tika-app/target/tika-app-1.0-SNAPSHOT.jar -T 2011-01-23-7-22-09_sample.doc And it produces this output for me: +9245114107060 (M) E-Mail: coolgaas.1...@rediffmail.com To enhance the organizational development by self development and motivation from the

Re: Issue in text extraction in Solr / Tika

2011-08-20 Thread Michael McCandless
OK one correction: I ran the TikaCLI tool with the -T option, which extracts "main content only"; when I re-ran with the -t (lowercase) option, which outputs all plain text, then it looks like all text appears correctly (phew!). On moving to 0.9, that's your call -- I'm not sure what's changed sin

Re: Issue in text extraction in Solr / Tika

2011-08-20 Thread Michael McCandless
a bug and I think I know why it's happening... I'll open an issue. Mike McCandless http://blog.mikemccandless.com On Sat, Aug 20, 2011 at 6:40 AM, Michael McCandless wrote: > OK one correction: I ran the TikaCLI tool with the -T option, which > extracts "main content only"

Re: Issue in text extraction in Solr / Tika

2011-08-20 Thread Michael McCandless
ds (see xml whitespace > rules). > > Uwe > -- > Uwe Schindler > H.-H.-Meier-Allee 63, 28213 Bremen > http://www.thetaphi.de > > > > Michael McCandless schrieb: > > One thing I still don't like is with the XML (-x) or XHTML (-h) > output, the result f

Re: Issue in text extraction in Solr / Tika

2011-08-20 Thread Michael McCandless
On Sat, Aug 20, 2011 at 10:19 AM, Uwe Schindler wrote: >> Hmm, actually: the element allows text, in addition to child elements? > So >> shouldn't any whitespace within the ... be treated as significant > (part of >> the content)? > > This is indeed very complicated. For mixed content elemen

Re: Issue in text extraction in Solr / Tika

2011-08-20 Thread Michael McCandless
because this is strange? If you look at > XHTMLContentHandler it does not. So the newline must come from somewhere > else. > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > >> -Origi

Re: Welcome Mike McCandless to the Tika PMC and as a Tika Committer

2011-08-29 Thread Michael McCandless
Thanks Chris! Here's a quick intro: I now work at IBM, who has (generously: thank you!) sponsored my contributions to Lucene/Solr for a long time now (like 5 years, wow!). Before that I was co-founder of a startup called iPhrase Technologies, selling enterprise search software; we didn't use Luc

Re: Welcome Mike McCandless to the Tika PMC and as a Tika Committer

2011-08-29 Thread Michael McCandless
now met you here! Welcome! > > > > > On Mon, Aug 29, 2011 at 6:14 PM, Michael McCandless < > luc...@mikemccandless.com> wrote: > >> Thanks Chris! >> >> Here's a quick intro: >> >> I now work at IBM, who has (generously: thank you!) spon

Jira karma

2011-08-30 Thread Michael McCandless
Could someone (jira admin) please give me (mikemccand in Jira) enough karma so I can assign issues to myself? Thanks! Mike McCandless http://blog.mikemccandless.com

Re: Jira karma

2011-08-30 Thread Michael McCandless
Thank you! Mike McCandless http://blog.mikemccandless.com On Tue, Aug 30, 2011 at 1:25 PM, Jukka Zitting wrote: > Hi, > > On Tue, Aug 30, 2011 at 7:20 PM, Michael McCandless > wrote: >> Could someone (jira admin) please give me (mikemccand in Jira) enough >> karma s

Re: svn commit: r1163336 - in /tika/trunk/tika-parsers/src/test: java/org/apache/tika/parser/rtf/ resources/test-documents/

2011-08-30 Thread Michael McCandless
Ahh OK I will fix! Mike McCandless http://blog.mikemccandless.com On Tue, Aug 30, 2011 at 5:35 PM, Jukka Zitting wrote: > Hi, > > On Tue, Aug 30, 2011 at 9:07 PM,   wrote: >> +        assertContains("zażółć gęślą jaźń", content); >> +        assertContains("ZAŻÓŁĆ GĘŚLĄ JAŹŃ",

when Tika closes InputStreams

2011-08-31 Thread Michael McCandless
On digging more into this I hit some questions/confusion: I think there are actually times when the other parse methods do close the input, eg if the parser wraps the incoming InputStream in a TikaInputStream, and then uses .getFile(), we copy the file contents to a temp file and close the origina
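
One way to check empirically whether a given call closes the stream it was handed is to wrap the stream in a small tracker. What follows is a minimal sketch using only the JDK. `CloseTrackingInputStream` is a hypothetical helper invented for illustration, not part of Tika, and the loop in `main` merely stands in for whatever parse call is being examined.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper (not part of Tika): records whether close() was called. */
final class CloseTrackingInputStream extends FilterInputStream {
    private boolean closed;

    CloseTrackingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public void close() throws IOException {
        closed = true;   // remember that some consumer closed us
        super.close();   // then close the wrapped stream as usual
    }

    boolean wasClosed() {
        return closed;
    }

    public static void main(String[] args) throws IOException {
        CloseTrackingInputStream in =
            new CloseTrackingInputStream(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        // Stand-in for the code under test: drain the stream without closing it.
        while (in.read() != -1) {
            // nothing to do with the bytes here
        }
        System.out.println("closed by consumer: " + in.wasClosed());
    }
}
```

Passing such a wrapper to the method in question, then inspecting `wasClosed()` afterwards, shows which code paths close the caller's stream.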

Re: svn commit: r1163336 - in /tika/trunk/tika-parsers/src/test: java/org/apache/tika/parser/rtf/ resources/test-documents/

2011-09-01 Thread Michael McCandless
On Tue, Aug 30, 2011 at 5:35 PM, Jukka Zitting wrote: > Hi, > > On Tue, Aug 30, 2011 at 9:07 PM,   wrote: >> +        assertContains("zażółć gęślą jaźń", content); >> +        assertContains("ZAŻÓŁĆ GĘŚLĄ JAŹŃ", content); > > I think it would be best if we used \u escapes for
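
For reference, this is roughly what the \u-escape form of the lowercase test phrase looks like. It is a sketch only: the class and constant names are made up for illustration, and the code points are the standard Unicode values for the Polish letters. It is not the text of the eventual commit.

```java
/** Sketch: the lowercase Polish test phrase kept pure ASCII via Unicode escapes. */
final class UnicodeEscapes {
    // Every non-ASCII letter is escaped, so the file compiles the same
    // regardless of the encoding the compiler assumes for source files.
    static final String LOWER =
        "za\u017c\u00f3\u0142\u0107 g\u0119\u015bl\u0105 ja\u017a\u0144";

    public static void main(String[] args) {
        // Print code points rather than glyphs, so the output is ASCII too.
        for (int i = 0; i < LOWER.length(); i++) {
            System.out.printf("U+%04X ", (int) LOWER.charAt(i));
        }
        System.out.println();
    }
}
```

The string is 17 characters long, and none of them depends on how the source file or the mail archive was encoded.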

Re: svn commit: r1163970 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/extractor/ tika-core/src/main/java/org/apache/tika/io/ tika-core/src/main/java/org/apache/tika/parser/ tika-core/src/

2011-09-01 Thread Michael McCandless
Can we just remove (not deprecate) TemporaryFiles...? (We are not at 1.0 release yet). Mike McCandless http://blog.mikemccandless.com On Thu, Sep 1, 2011 at 5:38 AM, wrote: > Author: jukka > Date: Thu Sep  1 09:38:04 2011 > New Revision: 1163970 > > URL: http://svn.apache.org/viewvc?rev=11639

Re: svn commit: r1163970 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/extractor/ tika-core/src/main/java/org/apache/tika/io/ tika-core/src/main/java/org/apache/tika/parser/ tika-core/src/

2011-09-01 Thread Michael McCandless
pands into descriptive prose). Mike McCandless http://blog.mikemccandless.com On Thu, Sep 1, 2011 at 7:33 AM, Jukka Zitting wrote: > Hi, > > On Thu, Sep 1, 2011 at 12:23 PM, Michael McCandless > wrote: >> Can we just remove (not deprecate) TemporaryFiles...? >> (We are n

Re: Resource management patterns (Was: Tika leaves files open)

2011-09-01 Thread Michael McCandless
On Thu, Sep 1, 2011 at 7:26 AM, Jukka Zitting wrote: > Hi, > > [update subject, move to dev@] > > On Thu, Sep 1, 2011 at 12:41 PM, Uwe Schindler wrote: >> With our internal Lucene IOUtils it's even simpler, see javadocs :-) > > Yep, Lucene's version is certainly better. > >> It's just a few line
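
To give a flavour of the kind of "few lines" utility under discussion, here is a minimal sketch of a close-everything helper. It is written from scratch for illustration: `CloseAll.closeAll` is a hypothetical name, and this is neither Lucene's IOUtils nor anything in Tika. It closes every resource even if an earlier close fails, then rethrows the first failure.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical helper, written for illustration; not Lucene's or Tika's code. */
final class CloseAll {
    /** Closes every resource, then rethrows the first IOException seen, if any. */
    static void closeAll(Closeable... resources) throws IOException {
        IOException first = null;
        for (Closeable resource : resources) {
            try {
                if (resource != null) {
                    resource.close();
                }
            } catch (IOException e) {
                if (first == null) {
                    first = e; // remember the first failure, keep closing the rest
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }

    public static void main(String[] args) {
        final boolean[] secondClosed = {false};
        Closeable failing = new Closeable() {
            @Override
            public void close() throws IOException {
                throw new IOException("first close failed");
            }
        };
        Closeable tracking = new Closeable() {
            @Override
            public void close() {
                secondClosed[0] = true;
            }
        };
        try {
            closeAll(failing, tracking);
        } catch (IOException e) {
            System.out.println("rethrown: " + e.getMessage());
        }
        System.out.println("second resource closed: " + secondClosed[0]);
    }
}
```

The point of the pattern is that one failing close() does not leak the remaining resources.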

Re: svn commit: r1163970 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/extractor/ tika-core/src/main/java/org/apache/tika/io/ tika-core/src/main/java/org/apache/tika/parser/ tika-core/src/

2011-09-02 Thread Michael McCandless
On Thu, Sep 1, 2011 at 12:39 PM, Jukka Zitting wrote: > Hi, > > On Thu, Sep 1, 2011 at 5:08 PM, Michael McCandless > wrote: >> We might want to mark APIs like TemporaryResources "internal" in the >> javadocs, ie, that we reserve the right to suddenly change t

Re: 1.0 RC in next 2 weeks

2011-09-16 Thread Michael McCandless
+1 Mike McCandless http://blog.mikemccandless.com On Fri, Sep 16, 2011 at 4:32 AM, Jukka Zitting wrote: > Hi, > > On Fri, Sep 16, 2011 at 5:09 AM, Mattmann, Chris A (388J) > wrote: >> That said, I'm happy when the dev community of Tika is ready >> to cut a release, and will gladly RC it. It's

Re: Release date of tika 1.0 or 0.10

2011-09-23 Thread Michael McCandless
I think before we release 0.10 we should address TIKA-712? I don't think we should hold the release... I think we should just turn off the new functionality (to extract text from master slides) for the time being, until we work out how to fix it more correctly, because right now it's always extrac

Re: Release date of tika 1.0 or 0.10

2011-09-24 Thread Michael McCandless
emccandless.com On Fri, Sep 23, 2011 at 6:03 PM, Mattmann, Chris A (388J) wrote: > Hey Mike, > > That's fine by me. If you could turn it off and commit before this weekend I'd > appreciate it. > > Cheers, > Chris > > On Sep 23, 2011, at 12:26 PM, Michael McCa

Re: Release date of tika 1.0 or 0.10

2011-09-24 Thread Michael McCandless
OK committed! Release away :) Mike McCandless http://blog.mikemccandless.com On Sat, Sep 24, 2011 at 6:30 AM, Michael McCandless wrote: > OK I will do that... I *think* it's just a matter of fixing XSLF and > HSLF parsers to not visit the master slide. > > I'll commit

Re: Release date of tika 1.0 or 0.10

2011-09-24 Thread Michael McCandless
Thanks Nick. I'll keep digging on TIKA-712 as to how we can figure out which master elements should and should not be extracted... Mike McCandless http://blog.mikemccandless.com On Sat, Sep 24, 2011 at 7:07 AM, Nick Burch wrote: > On Sat, 24 Sep 2011, Michael McCandless wrote: >>

Re: [VOTE] Apache Tika 0.10 release rc #1

2011-09-26 Thread Michael McCandless
+1 to release! I verified the signatures, and smoke tested the JAR on a few docs (should we name it apache-tika-app-NN.jar in the future? Ie, add apache- in front), and ran "mvn clean install test" from the src zip. Mike McCandless http://blog.mikemccandless.com On Mon, Sep 26, 2011 at 11:44 A

Re: apache-tika-app? (Was: [VOTE] Apache Tika 0.10 release rc #1)

2011-09-26 Thread Michael McCandless
On Mon, Sep 26, 2011 at 12:20 PM, Jukka Zitting wrote: > Hi, > > On Mon, Sep 26, 2011 at 6:03 PM, Michael McCandless > wrote: >> (should we name it apache-tika-app-NN.jar in the future?  Ie, add >> apache- in front) > > I don't think that's needed,

Re: Jenkins build became unstable: Tika-trunk » Apache Tika parsers #657

2011-10-01 Thread Michael McCandless
Sorry, this was my bad -- failed to add the test file for the new test case. Should be fixed now... Mike McCandless http://blog.mikemccandless.com On Sat, Oct 1, 2011 at 7:07 AM, Apache Jenkins Server wrote: > See >

Re: Build failed in Jenkins: Tika-trunk #664

2011-10-05 Thread Michael McCandless
Ugh, my bad: Java 1.6 only code. I'll fix... Mike McCandless http://blog.mikemccandless.com On Wed, Oct 5, 2011 at 7:13 AM, Apache Jenkins Server wrote: > See > > Changes: > > [mikemccand] TIKA-742: extract paragraphs inside PDF pages > >

Re: Updating CHANGES.txt?

2011-10-19 Thread Michael McCandless
Sorry, I've been updating CHANGES as I go (habit carried over from Lucene-land!). I think I favor update-as-you-go for various reasons: * The person who made the fix knows best what to say, while it's still fresh in their mind, vs RM who has to re-interpret, from a distance, some time l

Re: Updating CHANGES.txt?

2011-10-19 Thread Michael McCandless
On Wed, Oct 19, 2011 at 9:25 AM, Jukka Zitting wrote: > Hi, > > On Wed, Oct 19, 2011 at 1:06 PM, Nick Burch wrote: >> Quick query - I notice that some people are updating CHANGES.txt when they >> close out issues. I had thought that part of the release process was going >> through the FIXED list

Re: Updating CHANGES.txt?

2011-10-19 Thread Michael McCandless
On Wed, Oct 19, 2011 at 9:59 AM, Mattmann, Chris A (388J) wrote: > Yah I agree with Jukka here, but don't worry too much Mike if you're verbose > (or > anyone else for that matter). The RM (aka "moi" ;) ) always can take a look > at CHANGES.txt at the end of a release cycle (speaking of which, 1

Re: Updating CHANGES.txt?

2011-10-20 Thread Michael McCandless
On Wed, Oct 19, 2011 at 4:42 PM, Jukka Zitting wrote: >> Or... are you saying CHANGES should not include "minor" issues?  Only >> "big" ones?  And users need to go to Jira to get a complete list? > > Yes, basically. If you're doing something that you think the average > user of Tika should be awa

Re: Google's Compact Language Detector

2011-10-24 Thread Michael McCandless
I've only scratched the surface in figuring out how CLD works... excising the code and exposing a Python wrapper is much easier than actually understanding it! It has some neat features, like passing in three possible "hints": * domain extension (fr boosts French) * declared encoding * de

Re: Google's Compact Language Detector

2011-10-24 Thread Michael McCandless
On Mon, Oct 24, 2011 at 2:15 PM, Ken Krugler wrote: > Sounds like a great idea - see the recent comment thread on > https://issues.apache.org/jira/browse/TIKA-431 for some related discussions. > > And there's also https://issues.apache.org/jira/browse/TIKA-539 Those do look related (if you swap

Re: Google's Compact Language Detector

2011-10-25 Thread Michael McCandless
ndless.com On Mon, Oct 24, 2011 at 4:53 PM, Michael McCandless wrote: > On Mon, Oct 24, 2011 at 2:15 PM, Ken Krugler > wrote: > >> Sounds like a great idea - see the recent comment thread on >> https://issues.apache.org/jira/browse/TIKA-431 for some related di

Re: Google's Compact Language Detector

2011-10-25 Thread Michael McCandless
On Tue, Oct 25, 2011 at 12:32 PM, Robert Muir wrote: > On Tue, Oct 25, 2011 at 12:12 PM, Michael McCandless > wrote: > >> Tika seems to have a lot of trouble with Spanish (confuses w/ >> Galician) and Danish (confuses with Dutch). > > s/Dutch/Norwegian/ Woops, tha

Re: Tika is waiting for ODFToolkit to improve ODF file format processing

2011-10-25 Thread Michael McCandless
On Mon, Oct 24, 2011 at 9:17 AM, Rob Weir wrote: > On Mon, Oct 24, 2011 at 4:54 AM, Devin Han wrote: >> I saw this issue in Tika: OpenOffice parser: master footer text isn't >> extracted https://issues.apache.org/jira/browse/TIKA-736 >> >> The current ODF parser of Tika doesn't touch the styles p

Re: Tika is waiting for ODFToolkit to improve ODF file format processing

2011-10-26 Thread Michael McCandless
On Tue, Oct 25, 2011 at 5:40 PM, Rob Weir wrote: > Is there a list of the complete set of tags you use, or a schema or something? Hmm, I think technically any tag that is valid XHTML is fair game, but in practice the parsers seem to use a very limited set of tags (table/td/tr, a, img, p, br,

Re: Updating CHANGES.txt?

2011-10-27 Thread Michael McCandless
On Wed, Oct 26, 2011 at 2:03 PM, Jukka Zitting wrote: > On Thu, Oct 20, 2011 at 2:26 PM, Michael McCandless > wrote: >> But I think API changes, issues a user has hit, new features, changes >> in behavior, we really should include. Generally, when I'm unsure, I >

Re: Updating CHANGES.txt?

2011-10-27 Thread Michael McCandless
On Thu, Oct 27, 2011 at 10:05 AM, Jukka Zitting wrote: > Hi, > > On Thu, Oct 27, 2011 at 2:52 PM, Michael McCandless > wrote: >> Maybe, we can inline the references to the issues, so the user knows >> which is which? > > Sounds good, +1. > > BTW, I also mad

Re: A problem in the right-to-left languages

2011-11-01 Thread Michael McCandless
On Tue, Nov 1, 2011 at 8:48 AM, Robert Muir wrote: > I really think tika should include the parts of icu4j it depends on. > Often open source projects are hesitant to include icu jar because of > its size, but that's silly since the size is just a catch-all. > We can use the webapp to make a small

Re: [VOTE] Apache Tika 1.0 release rc #1

2011-11-04 Thread Michael McCandless
+1 to release Tested on Fedora 13, passed tests from the src zip, verified MD5 sums, extracted text from a few files using the app.jar. Looks good! Mike McCandless http://blog.mikemccandless.com On Fri, Nov 4, 2011 at 11:42 AM, Mattmann, Chris A (388J) wrote: > Hi Folks, > > A candidate for t

Re: Multilingual Tika

2011-11-05 Thread Michael McCandless
I would love to see better integration w/ dynamic languages! I can help on the Python side. Can we simply wrap Tika's APIs using jcc, to expose in Python? Ooh, it's already been done: http://redmine.djity.net/projects/pythontika/wiki Mike McCandless http://blog.mikemccandless.com 2011/11/5 Jé

Re: Updating CHANGES.txt?

2011-11-10 Thread Michael McCandless
Ooh, very nice :) Mike McCandless http://blog.mikemccandless.com On Thu, Nov 10, 2011 at 4:34 AM, Jukka Zitting wrote: > Hi, > > The effort spent on CHANGES.txt is clearly paying off. See for example > [1] where the information is nicely being spread to a wider audience. > > [1] http://java.dzo

Re: Possible re-opening of resolved issue TIKA-738?

2011-11-26 Thread Michael McCandless
Yes please go ahead and reopen TIKA-738... sounds like something is wrong! Thanks. Mike McCandless http://blog.mikemccandless.com On Fri, Nov 25, 2011 at 9:25 PM, John M wrote: > Hello, > > When I use the latest build of the Tika application jar's CLI with the > -h option to parse testAnnotati

Re: Possible re-opening of resolved issue TIKA-738?

2011-11-26 Thread Michael McCandless
later, which I > think we should avoid. > > Cheers, > Chris > > On Nov 26, 2011, at 3:56 AM, Michael McCandless wrote: > >> Yes please go ahead and reopen TIKA-738... sounds like something is wrong! >> >> Thanks. >> >> Mike McCandless >> >

Re: [ANNOUNCE] Welcome Jerome Charron as Tika committer + PMC member

2011-12-12 Thread Michael McCandless
Welcome Jerome! Mike McCandless http://blog.mikemccandless.com On Mon, Dec 12, 2011 at 1:26 PM, Mattmann, Chris A (388J) wrote: > Hi Folks, > > Please welcome Jerome Charron to the ranks of the Tika PMC and as a Tika > committer. > He's just been VOTEd in and we're really happy to have him aro

Re: [ANNOUNCE] Welcome Antoni Mylka as Tika committer + PMC member

2011-12-12 Thread Michael McCandless
Welcome Antoni! Mike McCandless http://blog.mikemccandless.com On Mon, Dec 12, 2011 at 1:19 PM, Antoni Mylka wrote: > W dniu 2011-12-12 17:58, Mattmann, Chris A (388J) pisze: > >> Hi Folks, >> >> Please welcome Antoni Mylka to the ranks of the Tika PMC and as a Tika >> committer. >> He's just b

Re: Pushing parsers upstream

2011-12-13 Thread Michael McCandless
+0 I agree, logically, parsers "belong" with their upstream project, since as that project improves how the document format is cracked, they can also make the matching fixes to Tika's parser. As long as there's enough love / advocate / testing for the Tika parser in that project... My only concern is

Re: [VOTE] Apache Tika 1.1 release rc #1

2012-03-09 Thread Michael McCandless
+1 to release. I used tika-app-1.1.jar to successfully extract all text from the Lucene in Action 2nd ed manuscript (PDF and MS Word). CHANGES looks good too. Mike McCandless http://blog.mikemccandless.com On Wed, Mar 7, 2012 at 4:35 PM, Mattmann, Chris A (388J) wrote: > Hi Folks, > > A candi

Re: Pluggable language detection

2012-03-21 Thread Michael McCandless
On Wed, Mar 21, 2012 at 12:55 PM, Ken Krugler wrote: > > On Mar 21, 2012, at 8:51am, Julien Nioche wrote: > >> Hi guys, >> >> Just wondering about the best way to make the language detection pluggable >> instead of having it hard-wired as it is now. We know that the resources >> that are currently

Re: Interested in being a commiter to tika

2012-04-23 Thread Michael McCandless
That's wonderful! Apache is a meritocracy, meaning one works their way towards becoming a committer by doing stuff -- posting patches, reviewing patches, commenting on issues, answering users' questions, posting ideas to the dev list, etc. Perhaps a good place to start would be to browse through

Re: [DISCUSS] Apache Tika 1.2 RC?

2012-05-28 Thread Michael McCandless
+1 to release 1.2! The more frequent releases the better :) Thanks Chris. Mike McCandless http://blog.mikemccandless.com On Mon, May 28, 2012 at 12:20 AM, Mattmann, Chris A (388J) wrote: > Hey Guys, > > Looking at CHANGES.txt, we've got some nice new features and a few bug fixes. > Is it time

Re: Welcome Ray Gauss as a Tika committer/PMC

2012-06-08 Thread Michael McCandless
Welcome aboard Ray! Happy committing, Mike McCandless http://blog.mikemccandless.com On Fri, Jun 8, 2012 at 11:14 AM, Nick Burch wrote: > Hi All > > Many of you will have seen the JIRAs / patches from Ray Gauss over the last > few months, especially around metadata. I'm pleased to announce tha

Re: [VOTE] Apache Tika 1.2 release rc #1

2012-07-11 Thread Michael McCandless
+1 I smoke tested, extracting text for the Lucene in Action PDF (looked good), and verified TIKA-948 is fixed. Why are there original-tika-app* files in the RC directory? Also, we used to name it apache-tika-1.1.src.* but now we dropped the apache- prefix? Is that intentional? (tika-app jar ha

Re: Welcome to our new Tika PMC chair!

2012-08-20 Thread Michael McCandless
Welcome Dave! Mike McCandless http://blog.mikemccandless.com On Sun, Aug 19, 2012 at 1:14 PM, Mattmann, Chris A (388J) wrote: > Hey Folks, > > I decided to step down as chair of the Apache Tika PMC. We have a new chair, > who > graciously volunteered to step up and handle the chair duties, Dav

Tika's Jenkins builds

2012-09-03 Thread Michael McCandless
It looks like Tika is configured to kick off a build only on a commit? But then, if the build fails, will it kick off another one at some point...? It looks like the build failed after my last commit, but it looks like a spurious failure (StreamCorruptedException): https://builds.apache.org/jo

Re: Tika's Jenkins builds

2012-09-03 Thread Michael McCandless
On Mon, Sep 3, 2012 at 3:29 PM, Dave Meikle wrote: > Hi Mike, > > On 3 Sep 2012, at 12:15, Michael McCandless wrote: > >> ... >> Separately, could I please get a Jenkins web login? It looks like our >> PMC Chair can do this: >> >>

Re: Jenkins build became unstable: Tika-trunk #926

2012-10-09 Thread Michael McCandless
Woops, I committed a fix ... Mike McCandless http://blog.mikemccandless.com On Tue, Oct 9, 2012 at 1:20 PM, Apache Jenkins Server wrote: > See >

Re: Build failed in Jenkins: Tika-trunk #928

2012-10-10 Thread Michael McCandless
Hmmm ... looks like we (Apache) have this bug open on Jenkins for this StreamCorruptedException: https://issues.jenkins-ci.org/browse/JENKINS-13395 I kicked off another build Mike McCandless http://blog.mikemccandless.com On Wed, Oct 10, 2012 at 8:19 AM, Apache Jenkins Server wrote: >

Re: Build failed in Jenkins: Tika-trunk #934

2012-10-29 Thread Michael McCandless
It looks like "new DecimalFormatSymbols(Locale)" is Java-6 only API? Mike McCandless http://blog.mikemccandless.com On Sun, Oct 28, 2012 at 10:04 PM, Apache Jenkins Server wrote: > See > > Changes: > > [rgauss] TIKA-984: JpegParserTest fail

Re: Build failed in Jenkins: Tika-trunk #934

2012-10-29 Thread Michael McCandless
util.Locale) > > > On Oct 29, 2012, at 6:48 AM, Michael McCandless > wrote: > >> It looks like "new DecimalFormatSymbols(Locale)" is Java-6 only API? >> >> Mike McCandless >> >> http://blog.mikemccandless.com >> >> On Sun, Oct 28,

Re: Build failed in Jenkins: Tika-trunk #943

2012-11-18 Thread Michael McCandless
Looks like another Jenkins hiccup: Nov 19, 2012 12:30:20 AM hudson.remoting.SynchronousCommandTransport$ReaderThread run SEVERE: I/O error in channel channel java.io.StreamCorruptedException at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1332) at java.io.ObjectInpu

Re: Build failed in Jenkins: Tika-trunk #948

2012-12-02 Thread Michael McCandless
Hmm: message : Failed to execute goal org.apache.rat:apache-rat-plugin:0.7:check (default) on project tika-app: Too many unapproved licenses: 1 cause : Too many unapproved licenses: 1 I can't see in this output which license is invalid ... does anyone know what's going on? Separately, Je

Re: svn commit: r1416195 - /tika/trunk/tika-app/src/test/java/org/apache/tika/cli/TikaCLITest.java

2012-12-02 Thread Michael McCandless
Woops, thanks Jukka! Mike McCandless http://blog.mikemccandless.com On Sun, Dec 2, 2012 at 11:15 AM, wrote: > Author: jukka > Date: Sun Dec 2 16:15:26 2012 > New Revision: 1416195 > > URL: http://svn.apache.org/viewvc?rev=1416195&view=rev > Log: > TIKA-1031: TikaCLI doesn't create sub-dirs wh

Re: Build failed in Jenkins: Tika-trunk #948

2012-12-02 Thread Michael McCandless
On Sun, Dec 2, 2012 at 11:17 AM, Jukka Zitting wrote: > Hi, > > On Sun, Dec 2, 2012 at 2:50 PM, Michael McCandless > wrote: >> Hmm: >> >> message : Failed to execute goal >> org.apache.rat:apache-rat-plugin:0.7:check (default) on project >>

Re: Build failed in Jenkins: Tika-trunk #948

2012-12-02 Thread Michael McCandless
On Sun, Dec 2, 2012 at 11:41 AM, Jukka Zitting wrote: > Hi, > > On Sun, Dec 2, 2012 at 6:26 PM, Michael McCandless > wrote: >> On Sun, Dec 2, 2012 at 11:17 AM, Jukka Zitting >> wrote: >>> Looks like there's a subdir/foo.txt file in the tika-app director

Re: [jira] [Updated] (TIKA-1048) XMLParser should add whitespace between elements

2012-12-20 Thread Michael McCandless
y be consider using of UIMA ("the rule engine") ? > > BR, > Oleg > > > > On Thu, Dec 20, 2012 at 1:05 PM, Michael McCandless (JIRA) > wrote: > >> >> [ >> https://issues.apache.org/jira/browse/TIKA-1048?page=com.atlassian.jira.plugin.syst

build hung?

2012-12-23 Thread Michael McCandless
I think this build is stuck? https://builds.apache.org/job/Lucene-Solr-Tests-trunk-Java6/15738/console TestStressVersions.testStressGetRealtimeVersions beating heart for 47656 sec... Mike McCandless http://blog.mikemccandless.com -- Forwarded message -- From: Michael

Re: build hung?

2012-12-23 Thread Michael McCandless
ler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > >> -Original Message- >> From: Michael McCandless [mailto:luc...@mikemccandless.com] >> Sent: Sunday, December 23, 2012 12:59 PM >> To: dev@tika.

Re: Tika Parser 1.2 - MP4Parser.java Query

2013-01-07 Thread Michael McCandless
Tika pulls this library in as a dependency; see tika-parsers/pom.xml: <dependency> <groupId>com.googlecode.mp4parser</groupId> <artifactId>isoparser</artifactId> <version>1.0-RC-1</version> </dependency> Mike McCandless http://blog.mikemccandless.com On Mon, Jan 7, 2013 at 9:58 AM, Sharon Corbett wrote: > Hi; > > I have a question regarding the MP4Parser.

Re: Tika Parser 1.2 - MP4Parser.java Query

2013-01-08 Thread Michael McCandless
Hi Sharon (CC'd directly in my response), We answered this question yesterday (eg see http://lucene.472066.n3.nabble.com/Tika-Parser-1-2-MP4Parser-java-Query-td4031254.html) but I think you must not be subscribed to the list so you missed it? Mike McCandless http://blog.mikemccandless.com On Tu

Re: Tika Parser 1.2 - MP4Parser.java Query

2013-01-08 Thread Michael McCandless
Super, welcome! Mike McCandless http://blog.mikemccandless.com On Tue, Jan 8, 2013 at 11:07 AM, Sharon Corbett wrote: > Thank you for the response and follow-up! > > I'm now subscribed:-) > > Regards, > Sharon > > -Original Message----- > From

Re: [DISCUSS] Release Candidate for 1.3?

2013-01-08 Thread Michael McCandless
+1 for a 1.3 release! Mike McCandless http://blog.mikemccandless.com On Tue, Jan 8, 2013 at 4:56 PM, Dave Meikle wrote: > Hi All, > > We have got some new features and bugs fixed with a couple of outstanding > binary compatibility ones (TIKA-962, TIKA-963) fixed on trunk, so I was > wondering

Re: svn commit: r1431313 - /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml

2013-01-10 Thread Michael McCandless
Can you add a CHANGES entry for this? Thanks. Mike McCandless http://blog.mikemccandless.com On Thu, Jan 10, 2013 at 7:11 AM, wrote: > Author: nick > Date: Thu Jan 10 12:11:43 2013 > New Revision: 1431313 > > URL: http://svn.apache.org/viewvc?rev=1431313&view=rev > Log: > Tika-1055 patch from

Re: svn commit: r1431316 - in /tika/trunk: CHANGES.txt tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml

2013-01-10 Thread Michael McCandless
Thanks Nick! Mike McCandless http://blog.mikemccandless.com On Thu, Jan 10, 2013 at 7:20 AM, wrote: > Author: nick > Date: Thu Jan 10 12:20:56 2013 > New Revision: 1431316 > > URL: http://svn.apache.org/viewvc?rev=1431316&view=rev > Log: > Remove three duplicated mimetype entries (keeping the

Re: [VOTE] Apache Tika 1.3 Release Candidate #1

2013-01-20 Thread Michael McCandless
+1, but I think you need to add the KEYS file? Tests passed from the source release, and I smoke tested the tika-app-1.3 JAR extracting text from the Lucene in Action 2 manuscript ... it looks good. Thanks Dave! Mike McCandless http://blog.mikemccandless.com On Fri, Jan 18, 2013 at 11:30 PM, D

Re: [VOTE] Apache Tika 1.3 Release Candidate #1

2013-01-20 Thread Michael McCandless
u need it: > > curl -O http://www.apache.org/dist/tika/KEYS > gpg --import < KEYS > > Cheers, > Chris > > On 1/20/13 3:35 AM, "Michael McCandless" wrote: > >>+1, but I think you need to add the KEYS file? >> >>Tests passed from the source release,

Re: KEYS file and dist.apache.org (Re: [VOTE] Apache Tika 1.3 Release Candidate #1)

2013-01-21 Thread Michael McCandless
Thanks Jukka. I had thought we were supposed to ship the KEYS file next to all release bits. As far as I can tell Lucene and Tika have done this for their past releases, eg: http://www.eng.lsu.edu/mirrors/apache/tika and http://apache.mesi.com.ar/lucene/java/4.0.0 But it sounds like this is actu

Re: KEYS file and dist.apache.org (Re: [VOTE] Apache Tika 1.3 Release Candidate #1)

2013-01-21 Thread Michael McCandless
On Mon, Jan 21, 2013 at 7:13 AM, Jukka Zitting wrote: > Hi, > > On Mon, Jan 21, 2013 at 1:39 PM, Michael McCandless > wrote: >> I had thought we were supposed to ship the KEYS file next to all release >> bits. > > Yes, my point is just that it's better if it

Re: Build failed in Jenkins: Tika-trunk #977

2013-02-07 Thread Michael McCandless
Hmm it looks like the Tika build is failing on Jenkins due to this: [ERROR] /home/jenkins/jenkins-slave/workspace/Tika-trunk/trunk/tika-server/src/main/java/org/apache/tika/server/CSVMessageBodyWriter.java:[51,3] method does not override a method from its superclass [ERROR] /home/jenkins/jenkins
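
That javac message is the one a Java 5 compiler gives when @Override annotates a method that implements an interface method rather than overriding a superclass method. Java 6 and later accept it. Here is a minimal sketch, assuming that is what the flagged lines do. The interface and class names are hypothetical stand-ins, not the tika-server sources.

```java
/** Hypothetical writer implementing a hypothetical interface (declared below). */
final class CsvBodyWriter implements BodyWriter {
    // Accepted at source level 1.6 and later; a 1.5 compiler rejects this
    // annotation because write() comes from an interface, not a superclass.
    @Override
    public String write(String[] cells) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cells.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(cells[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(new CsvBodyWriter().write(new String[] {"a", "b", "c"}));
    }
}

/** Hypothetical stand-in for an interface such as a message body writer. */
interface BodyWriter {
    String write(String[] cells);
}
```

Under that assumption there are two ways out: drop the annotation on interface methods, or build with a Java 6 (or later) compiler.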

Re: Build failed in Jenkins: Tika-trunk #977

2013-02-08 Thread Michael McCandless
On Thu, Feb 7, 2013 at 3:51 PM, Nick Burch wrote: > On Thu, 7 Feb 2013, Michael McCandless wrote: >> >> Hmm it looks like the Tika build is failing on Jenkins due to this: >> >> [ERROR] >> /home/jenkins/jenkins-slave/workspace/Tika-trunk/trunk/tika-server/src

Re: [DISCUSS] Should Tika require Java6? (was Re: Build failed in Jenkins: Tika-trunk #977)

2013-02-08 Thread Michael McCandless
-888 >> >> http://mail-archives.apache.org/mod_mbox/tika-dev/201011.mbox/%3CC8F38B50.2 >> 3828%25chris.a.mattm...@jpl.nasa.gov%3E >> >> >> I'm +1 for it. Seems like so is Mike, and also Ken K. Any objections from >> others to require Java6? >> >>

Re: svn commit: r1443963 - in /tika/trunk/tika-server/src/main/java/org/apache/tika/server: CSVMessageBodyWriter.java JSONMessageBodyWriter.java

2013-02-08 Thread Michael McCandless
Sure! Mike McCandless http://blog.mikemccandless.com On Fri, Feb 8, 2013 at 7:02 PM, Mattmann, Chris A (388J) wrote: > Thanks Mike! > > On 2/8/13 3:54 AM, "mikemcc...@apache.org" wrote: > >>Author: mikemccand >>Date: Fri Feb 8 11:54:26 2013 >>New Revision: 1443963 >> >>URL: http://svn.apache.

Re: [DISCUSS] Should Tika require Java6? (was Re: Build failed in Jenkins: Tika-trunk #977)

2013-02-12 Thread Michael McCandless
://blog.mikemccandless.com On Sat, Feb 9, 2013 at 9:52 AM, Dave Meikle wrote: > +1 from me. > > Cheers, > Dave > > On 8 Feb 2013, at 17:49, Michael McCandless wrote: > >> +1 >> >> Mike McCandless >> >> http://blog.mikemccandless.com >> >> >> >

Re: [DISCUSS] Should Tika require Java6? (was Re: Build failed in Jenkins: Tika-trunk #977)

2013-02-12 Thread Michael McCandless
t it, we're good. I think there haven't been any objections and > the discussion has sort of died down. > > Let's commit it! > > Cheers, > Chris > > > On 2/12/13 3:15 AM, "Michael McCandless" wrote: > >>Seems like this passes? >> >

Re: Build failed in Jenkins: Tika-trunk #980

2013-02-12 Thread Michael McCandless
Hmm, that didn't work. It looks like we have to fix our JAVA_HOME to point to a 1.6+ java: http://stackoverflow.com/questions/11328677/error-when-using-javac-javac-invalid-flag-s OK I managed to log in to builds.apache.org and change the JVM to 1.6 latest (it was on 1.5 latest), then kicked off a

Re: Jenkins build is back to normal : Tika-trunk #981

2013-02-12 Thread Michael McCandless
Yay, Java 1.6 :) Mike McCandless http://blog.mikemccandless.com On Tue, Feb 12, 2013 at 2:59 PM, Apache Jenkins Server wrote: > See >

Re: Build failed in Jenkins: Tika-trunk #986

2013-02-22 Thread Michael McCandless
Hmmm: ERROR: Maven JVM terminated unexpectedly with exit code 143 I think that means the JVM was killed with SIGTERM. I'll kick off a new build Mike McCandless http://blog.mikemccandless.com On Fri, Feb 22, 2013 at 3:30 PM, Apache Jenkins Server wrote: > See
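The reading of exit code 143 as SIGTERM follows the common shell convention that a process killed by signal N reports status 128 + N, so 143 - 128 = 15, which is SIGTERM. A minimal sketch of that arithmetic follows; `describe_exit_code` is a hypothetical helper written for illustration, not part of Maven, Jenkins, or Tika.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode a shell-style exit status.

    Assumes the common convention that a status above 128
    means the process was killed by signal (status - 128).
    """
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name for this signal number.
            name = signal.Signals(signum).name
        except ValueError:
            # Not every signal number is defined on every platform.
            name = f"unknown signal {signum}"
        return f"killed by {name} (signal {signum})"
    return f"exited normally with status {code}"


print(describe_exit_code(143))  # killed by SIGTERM (signal 15)
```

This only decodes the number; it does not say who sent the signal. On a shared build server, a job abort or a timeout are plausible causes, but that is a guess rather than something the log line shows.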

Re: Build failed in Jenkins: Tika-trunk #994

2013-05-01 Thread Michael McCandless
I just kicked off another build ... (it's queued). Mike McCandless http://blog.mikemccandless.com On Wed, May 1, 2013 at 5:12 PM, Ray Gauss II wrote: > Looks like a possible build server problem. Does anyone have access to > manually trigger another build? > > Regards, > > Ray > > On May 1,

Re: [DISCUSS] Apache Tika 1.4 RC?

2013-05-27 Thread Michael McCandless
+1, thanks Chris! Mike McCandless http://blog.mikemccandless.com On Mon, May 27, 2013 at 1:06 PM, Mattmann, Chris A (398J) wrote: > Hey Guys, > > I have some free cycles this week -- and the energy to produce a Tika 1.4 > RC. Sound good? I cleaned up JIRA and got all resolved (22) issues done

Re: Parser does not produce proper sentence breaks?

2013-06-03 Thread Michael McCandless
First off, those 3 *'s you see are annoying :) They are coming from the master slide, due to this issue: https://issues.apache.org/jira/browse/TIKA-1067 Would be nice to figure out how to stop this "false text" from coming out. Second, I think the PPT/X parsers do not put any information ab

Re: [VOTE] Apache TIka 1.4 Release Candidate #1

2013-06-16 Thread Michael McCandless
On Sun, Jun 16, 2013 at 6:21 AM, Dave Meikle wrote: > Hi, > > On 16 June 2013 04:52, Chris Mattmann wrote: >> >> Please vote on releasing this package as Apache Tika 1.4. >> The vote is open for the next 72 hours and passes if a majority of at >> least three +1 Tika PMC votes are cast. >> >>

Re: [VOTE] Apache TIka 1.4 Release Candidate #1

2013-06-16 Thread Michael McCandless
On Sun, Jun 16, 2013 at 7:00 AM, Michael McCandless wrote: > On Sun, Jun 16, 2013 at 6:21 AM, Dave Meikle wrote: >> Hi, >> >> On 16 June 2013 04:52, Chris Mattmann wrote: >>> >>> Please vote on releasing this package as Apache Tika 1.4. >>> The vo

Re: [VOTE] Apache TIka 1.4 Release Candidate #1

2013-06-16 Thread Michael McCandless
On Sun, Jun 16, 2013 at 9:00 AM, Dave Meikle wrote: > Hi Mike, > > On 16 June 2013 12:05, Michael McCandless wrote: > >> OK I committed that fix but I haven't tested on Windows (I don't have >> quick access to a Windows box). Can someone confirm that the tes

Re: [VOTE] Apache TIka 1.4 Release Candidate #1

2013-06-16 Thread Michael McCandless
On Sun, Jun 16, 2013 at 10:13 AM, Uwe Schindler wrote: > I can setup a windows build on the well-known "Policeman Jenkins" server with > the famous random JDK versions and many more features, running Lucene tests > in 24/7 :-) > http://goo.gl/qnxlJ for the talk > http://jenkins.thetaphi.de/ +1!

Re: [VOTE] Apache TIka 1.4 Release Candidate #1

2013-06-16 Thread Michael McCandless
here. > > RC #2 coming shortly. > > Cheers, > Chris > > > > > > -Original Message- > From: Dave Meikle > Reply-To: "dev@tika.apache.org" > Date: Sunday, June 16, 2013 6:00 AM > To: "dev@tika.apache.org" > Subject: Re: [V
