On 16/12/11 15:12, Jukka Zitting wrote:
As mentioned by Antoni, in the end the metadata keys are just strings,
so with a little coordination we don't need to delay the introduction
of new keys over multiple releases.
Hmm, they're not quite just strings - with the new Property stuff they
can al
On 2011-12-16 20:32, Jukka Zitting wrote:
Hi,
On Fri, Dec 16, 2011 at 7:45 PM, Antoni Mylka wrote:
The moment upstream libraries start depending on tika-core, they stop being
upstream libraries and become "side-stream" libraries. Putting POI between
core and parsers in the dependency chain
Hi,
On Fri, Dec 16, 2011 at 8:04 PM, Antoni Mylka wrote:
> I don't want to start new flames and understand that the current status quo
> is probably the best possible, given all requirements, yet let's not get
> carried away about creating yet another ultimate solution.
I was just thinking of st
Hi,
On Fri, Dec 16, 2011 at 7:45 PM, Antoni Mylka wrote:
> The moment upstream libraries start depending on tika-core, they stop being
> upstream libraries and become "side-stream" libraries. Putting POI between
> core and parsers in the dependency chain will bring all sorts of issues due
> to in
On 2011-12-16 16:12, Jukka Zitting wrote:
* Consistency - both of markup and metadata keys will be harder to
ensure when it isn't in the same codebase
Yep, that can be a problem. I guess the ultimate solution to this
would be to come up with a well documented definition of what a parser
s
scope
"import".
In general pushing parsers "upstream" brings:
- graceful degradation with missing dependencies
- ability to use a later pdfbox without updating tika
- "social" benefits of putting that code closer to people who'll
know most about how to make
Hi,
On Tue, Dec 13, 2011 at 6:05 PM, Michael McCandless
wrote:
> It's true users could directly upgrade their PDFBox w/o waiting for a
> Tika release but I suspect most users don't do that...
Currently people don't do that because it's so easy to break things by
upgrading a parser library in sync
Hi,
On Tue, Dec 13, 2011 at 12:23 PM, Nick Burch wrote:
> A couple of issues do spring to mind with this plan:
Good points.
> * Metadata keys - if a parser enhancement or new feature needs a new
> metadata key, then you end up having to wait for a new tika release to
> get it (so you can add
On 2011-12-13 18:05, Michael McCandless wrote:
Would it somehow be possible for Tika to ship an unreleased PDFBox? Or
does Maven fully tie our hands here?
That's the issue. Would it? AFAIU it's impossible. Tika can only depend
on jars in maven central. Is it possible to push a snapshot jar
+0
I agree, logically, parsers "belong" with their upstream project, since
as that project improves how the document format is cracked, they can
also make the matching fixes to Tika's parser. As long as there's
enough love / advocate / testing for the Tika parser in that project...
My only concern is
Hey Jukka,
For places like POI and PDFBox I think this could definitely work. And then for
places where we have Parsers, but aren't ready to push upstream yet (I can
think of two examples of this relevant to me, NetCDF/HDF and GDAL),
we can just leave the Parser in tika-parsers I think.
In this
" trunks, sometimes trunks with my
patches. See for instance
http://aperture.sourceforge.net/maven/org/apache/poi/poi/
This would clearly work for an "internal" project, but didn't work too
well for an open source project. It also takes lots of work.
With Tika such a solutio
On Tue, 13 Dec 2011, Jukka Zitting wrote:
To avoid this issue I propose that we start moving some of our parser
implementations to upstream projects. Now with Tika 1.0 out we have
stable Parser and Detector interfaces and related APIs that upstream
libraries could implement directly without u
Hi,
As you know, we see a lot of questions about version mismatches (which
POI or PDFBox version should go with this Tika version) and there's a
long queue of patches that are waiting for new official releases of
our upstream dependencies to become available.
To avoid this issue I propose that we
14 matches