[
https://issues.apache.org/jira/browse/TIKA-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095422#comment-13095422
]
Paul Jakubik commented on TIKA-701:
---
This is a very important fix. Will it be rele
Hi,
I was looking at http://tika.apache.org/0.8/formats.html and found several
issues with it:
- Says that it lists the formats supported by Tika 0.6 instead of 0.8.
- Says that it has links to parser class javadocs when it doesn't.
- Though the page promises that the parser class java d
On Sun, Oct 31, 2010 at 8:16 PM, Jukka Zitting wrote:
> I don't think the time is right yet for upgrading Tika's platform
> requirement from Java 5 to 6.
>
>
Java 6 has been out for almost 4 years (
http://en.wikipedia.org/wiki/Java_version_history). When will it be the
right time to require Java
Hi,
I'm wondering if there is a way to turn off character set detection when
parsing with the AutoDetectParser, or if there is a way to speed up
character set detection.
I ran a test that converted 52,717 documents to text. The documents were
emails embedded in a .tar file.
With character set de
Hi,
A while ago I added the http://wiki.apache.org/tika/MetadataDiscussion page
to the Tika wiki.
Since then, with the help of Jukka Zitting, a solution has been described
for using the current Tika library to capture nested document metadata and
associate that with the text extracted for each ne
I have added Jukka Zitting's recursive metadata example to the Tika wiki at
http://wiki.apache.org/tika/RecursiveMetadata. I also added some notes on
what I did so I could get the metadata for a nested document along with the
text for that document.
Finally, I modified the http://wiki.apache.org/ti
Thank you for this example! Is there any chance this example could be
added to the Tika wiki?
On Fri, Jul 16, 2010 at 1:30 AM, Jukka Zitting wrote:
> Hi,
>
> On Fri, Jul 16, 2010 at 2:43 AM, Paul Jakubik
> wrote:
> > On Thu, Jul 15, 2010 at 6:43 AM, Jukka Zitting >
On Thu, Jul 15, 2010 at 6:43 AM, Jukka Zitting wrote:
> The way I recommend is to pass a custom Parser implementation through
> the ParseContext. This gives you detailed access to each component
> document.
>
>
I looked at the code a little further, and I don't see exactly how I can do
this.
I am
On Thu, Jul 15, 2010 at 6:30 AM, Nick Burch wrote:
>
> Having looked through your proposed solutions, I can't see easy ways to
> implement these use cases:
> * enumerate all the Metadata objects at this depth
> eg top level has one Metadata object (for the parent file), 1 level
> down may have
On Thu, Jul 15, 2010 at 6:43 AM, Jukka Zitting wrote:
> The way I recommend is to pass a custom Parser implementation through
> the ParseContext. This gives you detailed access to each component
> document.
>
> You noted that this approach wouldn't work for recursive metadata. Why?
>
>
I didn't th
On Mon, Jul 12, 2010 at 10:37 AM, Nick Burch wrote:
> Assuming I've got all of the above correct, it might be worth creating a
> wiki page for this (probably + referencing jira entry), and start trying to
> work up a proposed solution that'll handle all the above problems and use
> cases.
>
I cre
On Mon, Jul 12, 2010 at 12:59 PM, Alex Ott wrote:
>
> Maybe it is worth separating metadata of top-level objects from metadata of
> embedded objects? And allowing traversal through the hierarchy of embedded
> objects? And providing several implementations, something like: collector of
> metadata for all
On Mon, Jul 12, 2010 at 10:37 AM, Nick Burch wrote:
> On Mon, 12 Jul 2010, Paul Jakubik wrote:
>
>> I'm using tika to parse packages (zip, tar.gz, tar.bz2, etc.) and I'd like
>> to get access to the metadata for the individual files inside of the
>> packa
Hi,
I'm using tika to parse packages (zip, tar.gz, tar.bz2, etc.) and I'd like
to get access to the metadata for the individual files inside of the
package.
It looks like there has been some discussion about how to provide the
metadata, and from looking at the code I don't think any of the propos