Re: tika-dotnet Module

2014-07-28 Thread Jukka Zitting
solution. It would be great if someone wanted to carry on with further improvements. -- Jukka Zitting

Re: Multiple parsers for the same MIME type

2015-01-02 Thread Jukka Zitting
sm to correctly detect such files. To avoid the extra work, you could simply mark your new parser as being able to handle all files of the more generic type, and then in your parser include a fallback option to call the original Tika parser when encountering a file the new parser can't handle. BR, Jukka Zitting
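
The fallback arrangement described above can be sketched as follows. This is a minimal illustration using only JDK classes, not Tika's API: `SimpleParser`, `FallbackParser` and the `SPECIAL` magic bytes are hypothetical names invented here, and Tika's real `Parser` interface takes a `ContentHandler`, `Metadata` and `ParseContext` as well. The wrapper peeks at the start of the stream, handles the file itself when it recognizes the format, and otherwise hands the untouched stream to the original parser.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical, simplified parser contract (not Tika's Parser interface).
interface SimpleParser {
    String parse(InputStream in) throws IOException;
}

// Registered for the generic type; delegates to the original parser
// whenever the specialized format is not recognized.
final class FallbackParser implements SimpleParser {
    // Hypothetical magic bytes that identify the specialized format.
    private static final byte[] MAGIC = "SPECIAL".getBytes(StandardCharsets.US_ASCII);

    private final SimpleParser specialized;
    private final SimpleParser original;

    FallbackParser(SimpleParser specialized, SimpleParser original) {
        this.specialized = specialized;
        this.original = original;
    }

    @Override
    public String parse(InputStream in) throws IOException {
        // mark/reset lets us peek at the header without consuming it.
        InputStream stream = in.markSupported() ? in : new BufferedInputStream(in);
        stream.mark(MAGIC.length);
        byte[] head = stream.readNBytes(MAGIC.length);
        stream.reset();
        SimpleParser target = Arrays.equals(head, MAGIC) ? specialized : original;
        return target.parse(stream);
    }
}

final class FallbackDemo {
    public static void main(String[] args) throws IOException {
        SimpleParser specialized = in -> "handled by the new parser";
        SimpleParser original = in -> "handled by the original parser";
        SimpleParser parser = new FallbackParser(specialized, original);
        byte[] doc = "plain old document".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parser.parse(new ByteArrayInputStream(doc)));
    }
}
```

The design choice is that detection stays untouched: only the parser registered for the generic type decides, per file, which implementation does the work.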

Re: Multiple parsers for the same MIME type

2015-01-02 Thread Jukka Zitting
tection to the parsing phase [2]. [1] https://tika.apache.org/1.6/api/org/apache/tika/io/TikaInputStream.html#getOpenContainer() [2] https://github.com/apache/tika/blob/1.6/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/POIFSContainerDetector.java#L385 BR, Jukka Zitting

Re: comparing Tika's file detect with other tools?

2015-04-22 Thread Jukka Zitting
Hi, Copyright also covers databases, so we'll need to honor the license terms equally when copying file's code or detection patterns. Luckily file (from http://www.darwinsys.com/file/) comes under a BSD license, so reusing the code or data is quite simple from a licensing perspective. In fact we'v

New moderators needed

2016-01-15 Thread Jukka Zitting
tor, so it would be good to have one or two new volunteers. See http://apache.org/dev/committers.html#mailing-list-moderators for more details. Best, Jukka Zitting

Re: xmpcore in Maven Central?

2016-07-15 Thread Jukka Zitting
Searching for a groupId in https://issues.sonatype.org/projects/OSSRH can also help in other similar cases. Best, Jukka Zitting

Re: Preview of Rich Documents

2011-08-21 Thread Jukka Zitting
uch images also as a simple preview mechanism. BR, Jukka Zitting

Re: Tika 0.9 integration in Solr 3.3.0

2011-08-22 Thread Jukka Zitting
If you want to use a more recent POI version, you need to use the latest Tika 1.0-SNAPSHOT version from svn trunk. [1] http://tika.apache.org/0.9/gettingstarted.html BR, Jukka Zitting

Re: Preview of Rich Documents

2011-08-22 Thread Jukka Zitting
ng? Correct, you'd need to store the preview somewhere. Note that with the TeeContentHandler class you can get both a text-only output for indexing and an XHTML output for preview from a single parsing pass through Tika. BR, Jukka Zitting
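
The single-pass idea can be illustrated with plain JDK SAX classes. The sketch below is a stand-in for the concept, not Tika's `TeeContentHandler` itself: `SimpleTee`, `TextCollector` and `MarkupCollector` are hypothetical names, and the markup collector does no escaping or attribute handling. One parse feeds two handlers, one gathering text for indexing and one rebuilding markup for a preview.

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Forwards every SAX event it receives to all wrapped handlers.
final class SimpleTee extends DefaultHandler {
    private final List<ContentHandler> handlers;

    SimpleTee(ContentHandler... handlers) {
        this.handlers = List.of(handlers);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        for (ContentHandler h : handlers) h.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        for (ContentHandler h : handlers) h.endElement(uri, local, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        for (ContentHandler h : handlers) h.characters(ch, start, length);
    }
}

// Keeps only character data, e.g. for a full-text index.
final class TextCollector extends DefaultHandler {
    final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }
}

// Rebuilds simple markup, e.g. for a preview (no escaping, no attributes).
final class MarkupCollector extends DefaultHandler {
    final StringBuilder markup = new StringBuilder();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        markup.append('<').append(qName).append('>');
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        markup.append("</").append(qName).append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        markup.append(ch, start, length);
    }
}

final class TeeDemo {
    // Returns { plain text, markup } produced by one parsing pass.
    static String[] run(String xml) throws Exception {
        TextCollector text = new TextCollector();
        MarkupCollector markup = new MarkupCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new SimpleTee(text, markup));
        return new String[] { text.text.toString(), markup.markup.toString() };
    }
}
```

With Tika, the same shape would presumably pair a text-only handler with an XHTML serializer inside a `TeeContentHandler`; the exact handler classes to combine are left to the reader.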

Re: Preview of Rich Documents

2011-08-25 Thread Jukka Zitting
t designed to preserve the full formatting of the original document. For a more accurate document preview feature you'll need to look for solutions beyond Tika. BR, Jukka Zitting

Re: Jira karma

2011-08-30 Thread Jukka Zitting
Hi, On Tue, Aug 30, 2011 at 7:20 PM, Michael McCandless wrote: > Could someone (jira admin) please give me (mikemccand in Jira) enough > karma so I can assign issues to myself? Done. BR, Jukka Zitting

Re: svn commit: r1163336 - in /tika/trunk/tika-parsers/src/test: java/org/apache/tika/parser/rtf/ resources/test-documents/

2011-08-30 Thread Jukka Zitting
s. Our Maven build already standardizes to UTF-8, but there's no guarantee that someone who later edits the file uses the correct encoding settings. BR, Jukka Zitting

Re: when Tika closes InputStreams

2011-08-31 Thread Jukka Zitting
worries about temporary files, so it won't close the possible temporary stream created in getFile(). And even more worryingly the getFile() or afterRead() methods of the temporary TikaInputStream instance could still end up closing the underlying stream even though that's exactly what we're trying to avoid with this construct. BR, Jukka Zitting

Resource management patterns (Was: Tika leaves files open)

2011-09-01 Thread Jukka Zitting
tions. I think that's too high a price to pay for the IMHO rather marginal benefits. Let's wait for the upgrade to Java 7 and do it properly then. BR, Jukka Zitting

Re: svn commit: r1163970 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/extractor/ tika-core/src/main/java/org/apache/tika/io/ tika-core/src/main/java/org/apache/tika/parser/ tika-core/src/

2011-09-01 Thread Jukka Zitting
tible class there. I filed TIKA-703 [1] to track the removal of all deprecated parts of our public API. [1] https://issues.apache.org/jira/browse/TIKA-703 BR, Jukka Zitting

Re: svn commit: r1163970 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/extractor/ tika-core/src/main/java/org/apache/tika/io/ tika-core/src/main/java/org/apache/tika/parser/ tika-core/src/

2011-09-01 Thread Jukka Zitting
major version upgrades like the 0.x to 1.x jump we're about to make. BR, Jukka Zitting

Re: svn commit: r1165230 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/microsoft/ooxml/ test/java/org/apache/tika/parser/microsoft/ test/resources/test-documents/

2011-09-05 Thread Jukka Zitting
onality with some more specific checks in revision 1165259, and the resulting code should now work correctly with all the test documents we have. Improvements welcome, as I'm no expert on POI or the Office file format. BR, Jukka Zitting

Re: Build failed in Jenkins: Tika-trunk #614

2011-09-05 Thread Jukka Zitting
s/trunk/.svn/lock'>: > Permission denied Not sure what's the problem there. As a workaround I simply configured the Tika-trunk build to not use the solaris2 build slave where this problem occurs. BR, Jukka Zitting

Re: svn commit: r1165230 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/microsoft/ooxml/ test/java/org/apache/tika/parser/microsoft/ test/resources/test-documents/

2011-09-05 Thread Jukka Zitting
Hi, 2011/9/5 Maxim Valyanskiy : > 05.09.2011, at 16:23, Jukka Zitting wrote: >> This was my attempt at properly handling the embedded PDF in >> TestWithPdf.docx. It was included in an OLE object with the PDF >> document as its "CONTENTS" entry. I restored

Re: 1.0 RC in next 2 weeks

2011-09-16 Thread Jukka Zitting
nk to get the latest code out while we wait for 1.0 to be ready for release. BR, Jukka Zitting

Re: Build failed in Jenkins: Tika-trunk » Apache Tika core #629

2011-09-18 Thread Jukka Zitting
Hi, On Sun, Sep 18, 2011 at 12:06 PM, Apache Jenkins Server wrote: > mojoFailed org.apache.felix:maven-bundle-plugin:2.3.5(default-bundle) My mistake, fixed in revision 1172241. Version 2.3.5 of the maven-bundle-plugin depends on Java 6, whereas version 2.3.4 also works with Java 5. BR, Ju

Re: svn commit: r1173743 - /tika/trunk/tika-bundle/pom.xml

2011-09-21 Thread Jukka Zitting
Hi, On Wed, Sep 21, 2011 at 6:18 PM, wrote: > TIKA-716 Fix tika-bundle dependency list following apache-Mime4J upgrade Good catch, thanks! BR, Jukka Zitting

Re: Release date of tika 1.0 or 0.10

2011-09-21 Thread Jukka Zitting
1.0 release. I think the trunk is pretty much ready to be released, so I'd suggest we cut the release this week, for example over the weekend. Chris, do you want to take care of it? I should also have some spare cycles to cut the release if needed. BR, Jukka Zitting

Re: Support for Open Graph meta tags

2011-09-23 Thread Jukka Zitting
l probably need to extend the Metadata class to handle things like namespaces and structured values. BR, Jukka Zitting

Re: Support for Open Graph meta tags

2011-09-23 Thread Jukka Zitting
Hi, On Fri, Sep 23, 2011 at 3:06 PM, Ken Krugler wrote: > On Sep 23, 2011, at 3:24am, Jukka Zitting wrote: >> In any case it would still be good to map RDFa tags also to the >> Metadata object. To do that properly (and to open the way to better >> XMP integration, m

Re: Support for Open Graph meta tags

2011-09-23 Thread Jukka Zitting
o position Tika more prominently on their radars. The Any23 proposal that Chris is championing is one good chance for this. Also, now that I work at Adobe, my XMP itch has been growing quite a bit, so I wouldn't be surprised if I ended up working on better XMP (and thus RDF) support soon after Tika 1.0 is out. BR, Jukka Zitting

Re: Jenkins build is still unstable: Tika-trunk #645

2011-09-23 Thread Jukka Zitting
Hi, On Fri, Sep 23, 2011 at 11:16 PM, Nick Burch wrote: > I'm fairly sure it's not related to my changes, but happy to be corrected if > it is! Looks like the culprit is my change to the way the attributes are resolved. I'm just fixing it. BR, Jukka Zitting

Re: Jenkins build is still unstable: Tika-trunk #645

2011-09-23 Thread Jukka Zitting
Hi, On Fri, Sep 23, 2011 at 11:18 PM, Jukka Zitting wrote: > Looks like the culprit is my change to the way the attributes > are resolved. I'm just fixing it. Fixed in revision 1175043. BR, Jukka Zitting

Re: [VOTE] Apache Tika 0.10 release rc #1

2011-09-26 Thread Jukka Zitting
ither way (recut or update the RC) is fine by me. BR, Jukka Zitting

Re: [VOTE] Apache Tika 0.10 release rc #1

2011-09-26 Thread Jukka Zitting
n/apache-tika-0.10/rc1/CHANGES-0.10.txt [2] http://www.apache.org/dist/tika/CHANGES-0.9.txt BR, Jukka Zitting

Re: commons-codec dependency

2011-09-26 Thread Jukka Zitting
5. BR, Jukka Zitting

apache-tika-app? (Was: [VOTE] Apache Tika 0.10 release rc #1)

2011-09-26 Thread Jukka Zitting
jar name as short as possible. Ideally we'd even drop the -app part, but that would make the Maven setup a bit awkward. BR, Jukka Zitting

Re: Newb: IDE + Maven?

2011-10-03 Thread Jukka Zitting
for the tika-bundle-it component. BR, Jukka Zitting

Re: Build failed in Jenkins: Tika-trunk #664

2011-10-05 Thread Jukka Zitting
tml BR, Jukka Zitting

Re: Download-Link to tika-app-0.10.jar doesn't work

2011-10-05 Thread Jukka Zitting
issue tracker to add a new bug? In the upper right corner of https://issues.apache.org/jira/browse/TIKA you should see a login link. If you don't already have an account, you can register one by following the link on the login screen. BR, Jukka Zitting

Re: Jenkins build became unstable: Tika-trunk » Apache Tika parsers #683

2011-10-14 Thread Jukka Zitting
Hi, On Fri, Oct 14, 2011 at 12:10 AM, Apache Jenkins Server wrote: > See > <https://builds.apache.org/job/Tika-trunk/org.apache.tika$tika-parsers/683/changes> Sorry, my mistake. Fixed in revision 1183239. BR, Jukka Zitting

Re: TikaConfig.getDetector?

2011-10-17 Thread Jukka Zitting
ghts? Why not just use Tika.getDetector()? Or new DefaultDetector()? TikaConfig doesn't currently have anything to do with Detectors, so my instinct would be to avoid such an extra method unless we actually want to add some sort of a custom detector configuration mechanism. BR, Jukka Zitting

Re: TikaConfig.getDetector?

2011-10-17 Thread Jukka Zitting
ess to the underlying functionality, and the Tika constructors allow complete customization of these component instances, including by specifying a custom TikaConfig. BR, Jukka Zitting

Re: TikaConfig.getDetector?

2011-10-18 Thread Jukka Zitting
tDetector() method to TikaConfig. BR, Jukka Zitting

Re: Updating CHANGES.txt?

2011-10-19 Thread Jukka Zitting
ting new features that otherwise might get lost in the noise. BR, Jukka Zitting

Re: Updating CHANGES.txt?

2011-10-19 Thread Jukka Zitting
what happened than a detailed listing of each individual change would have done. BR, Jukka Zitting

Re: Updating CHANGES.txt?

2011-10-26 Thread Jukka Zitting
never there's a good chance for that. We haven't really lived up to such an ideal lately, but big +1 for bringing this up and leading the way! BR, Jukka Zitting

Re: Updating CHANGES.txt?

2011-10-27 Thread Jukka Zitting
recent revisions for the details. BR, Jukka Zitting

Re: Tika 1.0 RC?

2011-10-27 Thread Jukka Zitting
r the release process, and we should then have the release out nicely just in time for the ApacheCon. BR, Jukka Zitting

Re: Build failed in Jenkins: Tika-trunk » Apache Tika OSGi bundle #703

2011-10-31 Thread Jukka Zitting
tests to a separate java6 profile. > [WARNING] File encoding has not been set, using platform encoding > ANSI_X3.4-1968, i.e. build is platform dependent! Hmm, looks like we should set source encoding explicitly to UTF-8... BR, Jukka Zitting

Re: A problem in the right-to-left languages

2011-11-01 Thread Jukka Zitting
ntly come out corrupted if you don't have > this in your classpath. +1 BR, Jukka Zitting

Re: Tika 1.0 RC?

2011-11-01 Thread Jukka Zitting
Hi, On Thu, Oct 27, 2011 at 6:42 PM, Jukka Zitting wrote: > How about if we leave the trunk open still for the weekend, and cut > the 1.0 release candidate at the beginning of next week? With TIKA-565 and TIKA-763 resolved the trunk is now ready for release as far as I'm concerned.

Re: Tika 1.0 RC?

2011-11-02 Thread Jukka Zitting
Hi, On Tue, Nov 1, 2011 at 6:10 PM, Nick Burch wrote: > On Tue, 1 Nov 2011, Jukka Zitting wrote: >> TIKA-764 is currently marked for 1.0. Nick, is it ready to be resolved or >> should we postpone it to a later release? > > We should maybe split it and resolve the first par

Re: [VOTE] Apache Tika 1.0 release rc #1

2011-11-04 Thread Jukka Zitting
Hi, On Fri, Nov 4, 2011 at 4:42 PM, Mattmann, Chris A (388J) wrote: > Please vote on releasing this package as Apache Tika 1.0.    [x] +1 Release this package as Apache Tika 1.0    [ ] -1 Do not release this package because... Signatures, build, etc. OK. Thanks! BR, Jukka Zitting

Multilingual Tika

2011-11-04 Thread Jukka Zitting
py, Tika.rb, Tika.js, Tika.pm and Tika.php bindings (plus whatever else people may be interested in) that just reflect the key functionality found in Tika.java. Anyone interested in joining such an effort? Any pointers to existing work along similar lines? BR, Jukka Zitting

Re: Updating CHANGES.txt?

2011-11-10 Thread Jukka Zitting
Hi, The effort spent on CHANGES.txt is clearly paying off. See for example [1] where the information is nicely being spread to a wider audience. [1] http://java.dzone.com/news/apache-tika-10-solidifies BR, Jukka Zitting

Re: Build failed in Jenkins: Tika-trunk #720

2011-11-11 Thread Jukka Zitting
lity error. That's such a minor issue that I just explicitly excluded the enum types from the clirr check in revision 1200889. BR, Jukka Zitting

Re: tika's beta dependency

2011-12-01 Thread Jukka Zitting
tadata-extractor BR, Jukka Zitting

Pushing parsers upstream

2011-12-13 Thread Jukka Zitting
hich I'm a member), but I suppose we should be able to come up with an arrangement where Tika committers can commit directly to the Tika parser implementation in PDFBox. It would be cool if we could do the same thing also with POI. WDYT? [1] https://issues.apache.org/jira/browse/PDFBOX-1132 BR, Jukka Zitting

Re: JIRA rights.

2011-12-13 Thread Jukka Zitting
of the tika-developers group which grants full admin access to the TIKA project in Jira. I just added you and Jérôme to this group. Enjoy! BR, Jukka Zitting

Re: Pushing parsers upstream

2011-12-16 Thread Jukka Zitting
nk the tradeoff favored focusing our work on Tika itself, but now with stable 1.0 APIs I think the time may be ripe to start reducing the size of tika-parsers (which has been growing pretty much, see [1]). [1] https://www.ohloh.net/p/tika/analyses/latest BR, Jukka Zitting

Re: Pushing parsers upstream

2011-12-16 Thread Jukka Zitting
y upgrading the relevant parser libraries if they face problems with a particular document. BR, Jukka Zitting

Re: Pushing parsers upstream

2011-12-16 Thread Jukka Zitting
settings. > - when we release new tika version, old pdfbox may not work > with it until the next release We're explicitly committed to maintaining backwards compatibility (see https://issues.apache.org/jira/browse/TIKA-699) until Tika 2.0, so any case where a new Tika release breaks an existing upstream parser should be treated as a bug and fixed. BR, Jukka Zitting

Re: Pushing parsers upstream

2011-12-16 Thread Jukka Zitting
as just thinking of stuff like that a parser should preferably use XMP schemas when exposing metadata, not about inventing our own schemas. BR, Jukka Zitting

Re: I would like to join this mailing list

2011-12-29 Thread Jukka Zitting
Hi Adam, Welcome! To subscribe, send a message to dev-subscr...@tika.apache.org. For more details, see http://tika.apache.org/mail-lists.html. BR, Jukka Zitting

Re: Sharing metadata logic between parsers

2012-01-30 Thread Jukka Zitting
Hi, On Mon, Jan 30, 2012 at 3:40 PM, Nick Burch wrote: > What do people think is the best way to handle this sort of thing? I'd go with XMPDM, as that's already a dependency of the described piece of code. BR, Jukka Zitting

Re: Sharing metadata logic between parsers

2012-01-30 Thread Jukka Zitting
uments). Opening the Metadata class for convenience methods like these can be a Pandora's box, but it would also simplify quite a bit of code both on the client and the parser side. BR, Jukka Zitting

Re: Sharing metadata logic between parsers

2012-01-30 Thread Jukka Zitting
Hi, On Mon, Jan 30, 2012 at 4:20 PM, Nick Burch wrote: > On Mon, 30 Jan 2012, Jukka Zitting wrote: >> What we might also consider as an extra convenience, are Metadata methods >> like: [...] > > If we're doing that sort of thing, then I'd rather we put the logic on

Re: buildbot failure in ASF Buildbot on tika-trunk

2012-02-17 Thread Jukka Zitting
Hi, On Fri, Feb 17, 2012 at 7:26 PM, wrote: > The Buildbot has detected a new failure on builder tika-trunk Sorry, my config handling change apparently broke OSGi service loading. I'll fix that later tonight. BR, Jukka Zitting

Re: Tika 1.1 release

2012-03-01 Thread Jukka Zitting
retty soon. In fact given the time that has passed since 1.0, I think it would be a good idea to push for a 1.1 release already this month. BR, Jukka Zitting

Re: [VOTE] Apache Tika 1.1 release rc #1

2012-03-09 Thread Jukka Zitting
Hi, On Wed, Mar 7, 2012 at 10:35 PM, Mattmann, Chris A (388J) wrote: > Please vote on releasing this package as Apache Tika 1.1.   [x] +1 Release this package as Apache Tika 1.1 Thanks! BR, Jukka Zitting

Re: Build failed in Jenkins: Tika-trunk #821

2012-03-27 Thread Jukka Zitting
ated when the build is run in a Java 6+ environment. BR, Jukka Zitting

PUT vs. POST in tika-server

2012-04-05 Thread Jukka Zitting
IMO a more appropriate verb to use is POST, that's meant (among other things) for: "Providing a block of data [...] to a data-handling process;" ... which is what tika-server does. BR, Jukka Zitting
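
For illustration, a client-side request built with POST could look like the sketch below, using the JDK's `java.net.http` API (Java 11+). The endpoint URL and the `TikaServerRequest` helper are assumptions made for this example; whether a given tika-server version accepts POST on that path is not confirmed by the message above.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class TikaServerRequest {
    // Builds (but does not send) a POST request that hands a block of data
    // to a data-handling process, as the quoted definition of POST puts it.
    static HttpRequest forDocument(URI endpoint, byte[] document) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical endpoint; adjust to the actual server location.
        URI endpoint = URI.create("http://localhost:9998/tika");
        byte[] document = "example document".getBytes(StandardCharsets.UTF_8);
        HttpRequest request = forDocument(endpoint, document);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request would additionally need an `HttpClient` and a running server, which is why the sketch stops at construction.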

Re: Build failed in Jenkins: Tika-trunk #836

2012-04-20 Thread Jukka Zitting
configuration or adding an explicit exclude rule to the rat plugin configuration. BR, Jukka Zitting

Re: Build failed in Jenkins: Tika-trunk #838

2012-04-27 Thread Jukka Zitting
n tika-server. Anyone? BR, Jukka Zitting

Re: JIRA links in CHANGES.txt are broken

2012-05-26 Thread Jukka Zitting
] https://marketplace.atlassian.com/plugins/com.sourcelabs.jira.plugin.report.contributions BR, Jukka Zitting

Re: HTML styles and tags are ignored

2012-06-04 Thread Jukka Zitting
mean the native list formatting of those document types? The Tika parsers for PDF and Office documents could/should automatically map such formatting to equivalent XHTML constructs, but I don't think they currently do. You'll need to look into the source code to see how to make that happen. BR, Jukka Zitting

Re: TikaInputStream customization

2012-06-06 Thread Jukka Zitting
Stream(stream, 1000)); However, see the concern in TIKA-307 [2]. Passing a truncated stream to Tika may produce unexpected results. [1] http://commons.apache.org/io/api-release/org/apache/commons/io/input/BoundedInputStream.html [2] https://issues.apache.org/jira/browse/TIKA-307 BR, Jukka Zitting
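
The message points to Commons IO's `BoundedInputStream` [1] for capping how much of a stream is read. The same idea can be sketched with JDK classes only; `LimitedInputStream` below is a hypothetical stand-in written for this example, not the Commons IO class. As the TIKA-307 caveat says, a parser fed a truncated stream may produce unexpected results.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Reports end-of-stream once `limit` bytes have been consumed,
// even if the underlying stream has more data.
final class LimitedInputStream extends FilterInputStream {
    private long remaining;

    LimitedInputStream(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) return -1;
        int b = super.read();
        if (b != -1) remaining--;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (remaining <= 0) return -1;
        int n = super.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) remaining -= n;
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        // Skipped bytes count against the limit too.
        long skipped = super.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override
    public boolean markSupported() {
        // mark/reset would let callers rewind past the accounting above.
        return false;
    }
}

final class LimitedInputStreamDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = "hello world".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new LimitedInputStream(new ByteArrayInputStream(data), 5)) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.US_ASCII));
        }
    }
}
```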

Re: TikaInputStream customization

2012-06-08 Thread Jukka Zitting
Hi, On Wed, Jun 6, 2012 at 2:15 PM, Baranee wrote: > Can u pls tell me how to use the beforeRead() method in TikaInputStream to > set readlimit for reading bytes from a stream. http://people.apache.org/~hossman/#xyproblem Why do you want to use TikaInputStream like this? BR, Jukka Zitting

Re: Convert file before Tika processes it?

2012-06-21 Thread Jukka Zitting
then invoke the standard XMLParser on the result. BR, Jukka Zitting

ZipContainerDetector and TikaInputStream.getFile()

2012-06-29 Thread Jukka Zitting
oaches where 1) is used to ensure contractual correctness and 2) to prevent too eager spooling of streams (and to act as a failsafe in case some code fails to honor requirement 1). WDYT? BR, Jukka Zitting

Re: buildbot failure in ASF Buildbot on tika-trunk

2012-06-29 Thread Jukka Zitting
Hi, On Fri, Jun 29, 2012 at 11:21 PM, wrote: > The Buildbot has detected a new failure on builder tika-trunk while building > ASF Buildbot. Oops, sorry about that. Fixed in revision 1355579. BR, Jukka Zitting

Re: Build failed in Jenkins: Tika-trunk #882

2012-06-30 Thread Jukka Zitting
dded a workaround in revision 1355746. [1] http://jira.codehaus.org/browse/MSHADE-23 BR, Jukka Zitting

JAX-RS overhead in tika-server

2012-07-01 Thread Jukka Zitting
brary [1]. [1] http://hc.apache.org/httpcomponents-core-ga/ BR, Jukka Zitting

Re: JAX-RS overhead in tika-server

2012-07-01 Thread Jukka Zitting
Hi, On Sun, Jul 1, 2012 at 6:27 PM, Mattmann, Chris A (388J) wrote: > On Jul 1, 2012, at 5:09 AM, Jukka Zitting wrote: > Sergey Beryozkin (who I'm CC'ing on this email since I'm not sure > he's subscribed to dev@) helped by providing guidance on the CXF > side

Re: JAX-RS overhead in tika-server

2012-07-02 Thread Jukka Zitting
ts, > advanced search capabilities, OAuth2, seem to be of possible use in the > project. That's all fine, but do we really need such features in Tika? For example, what could tika-server possibly need OAuth2 for? BR, Jukka Zitting

Re: Build failed in Jenkins: Tika-trunk #888

2012-07-02 Thread Jukka Zitting
(com/adobe/xmp/XMPException.class) > class file has wrong version 50.0, should be 49.0 Hmm, looks like Java 6 is needed for the xmpcore dependency. For now I simply solved this issue by moving the tika-xmp module to a separate java6 profile in revision 1356510, but I think we need some better

Re: Build failed in Jenkins: Tika-trunk #889

2012-07-03 Thread Jukka Zitting
Hi, On Tue, Jul 3, 2012 at 8:57 AM, Apache Jenkins Server wrote: > cause : Too many unapproved licenses: 1 That was the tika-dotnet/.gitignore file I added earlier. It's no longer needed, so I removed it in revision 1356619. BR, Jukka Zitting

Re: svn commit: r1355877 - in /tika/trunk: ./ tika-dll/ tika-dll/src/ tika-dll/src/main/ tika-dll/src/main/csharp/ tika-dll/src/main/csharp/Apache/

2012-07-03 Thread Jukka Zitting
o access Tika features without having to spawn a separate Java process for that. BR, Jukka Zitting

Re: Build failed in Jenkins: Tika-trunk #888

2012-07-03 Thread Jukka Zitting
Hi, On Tue, Jul 3, 2012 at 4:03 PM, Joerg Ehrlich wrote: > A new version of XMPCore compiled for JDK 1.5 has been uploaded to Maven > Central: 5.1.2 Great! In revision 1356776 I upgraded the XMPCore dependency and moved tika-xmp back to the main build. BR, Jukka Zitting

Re: buildbot failure in ASF Buildbot on tika-trunk

2012-07-04 Thread Jukka Zitting
Hi, On Wed, Jul 4, 2012 at 12:13 PM, wrote: > BUILD FAILED: failed compile Looks like a Buildbot error. The build works fine for me locally. BR, Jukka Zitting

FYI: text/plain and text/html media types now come with charset info

2012-07-08 Thread Jukka Zitting
c available in the media type registry. With the isInstanceOf helper method I just added this becomes: String type = metadata.get(Metadata.CONTENT_TYPE); MediaTypeRegistry registry = ...; if (registry.isInstanceOf(type, MediaType.TEXT_HTML)) { ... } BR, Jukka Zitting

Re: [VOTE] Apache Tika 1.2 release rc #1

2012-07-11 Thread Jukka Zitting
Hi, On Tue, Jul 10, 2012 at 10:29 PM, Mattmann, Chris A (388J) wrote: > Please vote on releasing this package as Apache Tika 1.2. [x] +1 Release this package as Apache Tika 1.2 BR, Jukka Zitting

Re: [VOTE] Apache Tika 1.2 release rc #1

2012-07-11 Thread Jukka Zitting
nce that's the pattern we've been following also in Jackrabbit, based originally on examples from HTTP Server and Lucene. BR, Jukka Zitting

Re: Fixing the problem of TIKA-895 and TIKA-914

2012-07-14 Thread Jukka Zitting
unky character would seem like the best workaround. BR, Jukka Zitting

Re: Fixing the problem of TIKA-895 and TIKA-914

2012-07-16 Thread Jukka Zitting
ature (automatically ignoring empty content). What's the SAX library you're using to serialize the output from Tika? You may also want to try the ToXMLContentHandler class in o.a.t.sax. It can serialize SAX events and doesn't suffer from this problem. BR, Jukka Zitting

Re: Can't build javadocs for 1.2 API site docs

2012-07-17 Thread Jukka Zitting
Hi, On Tue, Jul 17, 2012 at 12:10 PM, Ray Gauss II wrote: > Should I merge this to tags/1.2? It's a bad idea to modify tags that have already been released. But it should be fine to apply the patch manually before building the 1.2 javadocs for inclusion on the web site. BR, Jukka Zitting

Re: Build failed in Jenkins: Tika-trunk #906

2012-08-01 Thread Jukka Zitting
Hi, On Wed, Aug 1, 2012 at 4:22 PM, Ray Gauss II wrote: > Anyone have ideas on this one? Is it really something I did? Looks like a Jenkins problem. The Jenkins setup at Apache has been quite unstable over the last few months. BR, Jukka Zitting

Tika at ApacheCon

2012-08-03 Thread Jukka Zitting
Hi, Did someone already submit a talk about Tika to ApacheCon Europe [1]? If not, I'll submit one. [1] http://www.apachecon.eu/ BR, Jukka Zitting

Re: Tika at ApacheCon

2012-08-03 Thread Jukka Zitting
. I'll ask on the users@ list if there are people planning to attend the conference for more input on topics to cover. BR, Jukka Zitting

Re: [VOTE] Graduate Apache Any23 from the Apache Incubator

2012-08-06 Thread Jukka Zitting
inding) BR, Jukka Zitting

Re: TIKA-431 and CONTENT_ENCODING

2012-08-09 Thread Jukka Zitting
still be clients out there that expect this information to be present as CONTENT_ENCODING. In fact, unless the abuse of that field is actively harmful (i.e. clients need to add extra workarounds to clean up the metadata), I'd keep the field in place all the way until Tika 2.0. BR, Jukka Zitting

Re: AutoDetectParser is not parsing UTF-16 content types

2012-08-29 Thread Jukka Zitting
to automatically detect the correct encoding and use it if the declared one is obviously incorrect. BR, Jukka Zitting

Re: Question about XPath Matcher code & MatchingContentHandler

2012-09-03 Thread Jukka Zitting
ntHandler is only interested in stuff inside the element, not outside it. BR, Jukka Zitting

Re: Question about XPath Matcher code & MatchingContentHandler

2012-09-04 Thread Jukka Zitting
"body"); // no match, ignore startElement("p"); // match, call super.startElement("p") endElement("p"); // match, call super.endElement("p") endElement("body"); // no match, ignore endElement("html"); // no match, ignore BR, Jukka Zitting
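
The event trace above (ignore events outside the match, forward those inside it) can be reproduced with a small SAX filter. This is a sketch under stated assumptions: it selects elements by name rather than by an XPath expression, and `ElementFilter` and `RecordingHandler` are hypothetical classes built on the JDK's SAX API, not Tika's `MatchingContentHandler`.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Forwards events only for the named element and everything nested in it.
final class ElementFilter extends DefaultHandler {
    private final String elementName;
    private final ContentHandler target;
    private int depth = 0; // > 0 while inside a matched element

    ElementFilter(String elementName, ContentHandler target) {
        this.elementName = elementName;
        this.target = target;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (depth > 0 || qName.equals(elementName)) {
            depth++;
            target.startElement(uri, local, qName, atts); // match, forward
        } // otherwise: no match, ignore
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (depth > 0) {
            target.endElement(uri, local, qName);
            depth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth > 0) target.characters(ch, start, length);
    }
}

// Records the forwarded events as simple markup so they can be inspected.
final class RecordingHandler extends DefaultHandler {
    final StringBuilder events = new StringBuilder();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        events.append('<').append(qName).append('>');
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        events.append("</").append(qName).append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.append(ch, start, length);
    }
}

final class ElementFilterDemo {
    static String filter(String xml, String elementName) throws Exception {
        RecordingHandler recorder = new RecordingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)),
                        new ElementFilter(elementName, recorder));
        return recorder.events.toString();
    }
}
```

Calling `ElementFilterDemo.filter(xml, "p")` on a document whose `p` sits inside `html` and `body` passes through only the `p` element and its content; the enclosing `html` and `body` events never reach the target handler.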
