[
https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950218#comment-13950218
]
Anurag Indu commented on TIKA-93:
-
Hello All, I tried to use tesseract to extract all the ima
[
https://issues.apache.org/jira/browse/TIKA-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison reassigned TIKA-1010:
-
Assignee: Tim Allison
> Embedded documents in RTF are not extracted
>
Hi,
On Thu, Mar 27, 2014 at 6:21 PM, Stefano Fornari
wrote:
> 1. is the use of PDF2XHTML necessary? why is the pdf turned into an XHTML?
> for the purpose of indexing, wouldn't just the text be enough?
The XHTML output allows us to annotate the extracted text with
structural information (like "t
that worked! thanks.
Ste
On Thu, Mar 27, 2014 at 11:24 PM, Jukka Zitting wrote:
> Hi,
>
> On Thu, Mar 27, 2014 at 6:07 PM, Stefano Fornari
> wrote:
> > I am not sure tstream.hasFile() can ever be true, from my understanding of
> > the code it can be only false.
>
> It's true if you call the
Hi,
On Thu, Mar 27, 2014 at 6:07 PM, Stefano Fornari
wrote:
> I am not sure tstream.hasFile() can ever be true, from my understanding of
> the code it can be only false.
It's true if you call the parser like this:
InputStream stream = TikaInputStream.get(file);
try {
parser.pars
Hi,
I have two more questions on PDFParser:
1. is the use of PDF2XHTML necessary? why is the pdf turned into an XHTML?
for the purpose of indexing, wouldn't just the text be enough?
2. I need to limit indexing of the content to files whose size is below
a certain threshold; I was wondering if
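
A minimal sketch of the size check described in question 2, assuming the file is on disk and its size can be read before any parsing starts. SizeGate, isIndexable and maxBytes are hypothetical names used only for illustration; they are not part of Tika or Lucene. The only library call is the JDK's Files.size.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Tika or Lucene.
class SizeGate {
    // True only when the file is strictly smaller than maxBytes.
    static boolean isIndexable(Path file, long maxBytes) throws IOException {
        return Files.size(file) < maxBytes;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sample", ".bin");
        try {
            Files.write(tmp, new byte[1024]);
            long maxBytes = 4L * 1024 * 1024; // assumed 4 MB threshold
            if (isIndexable(tmp, maxBytes)) {
                System.out.println("would parse and index " + tmp.getFileName());
            } else {
                System.out.println("skipping " + tmp.getFileName());
            }
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Checking the size up front means an oversized file is never handed to the parser at all, which may matter in a memory-constrained setup.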
Hi All,
I am using Lucene in an embedded environment and I need to keep memory
use under control. In investigating a problem with big PDF files (a few
MB), I noticed that Parser.parse takes an InputStream as parameter but then
PDFParser has the following code:
TikaInputStream tstream = TikaInput
On Thu, 27 Mar 2014, Konstantin Gribov wrote:
Some containers (like matroska/mkv) tag audio and subtitle streams with
a language tag and a comment. From mplayer console output:
[lavf] stream 0: video (h264), -vid 0
[lavf] stream 1: audio (aac), -aid 0, -alang rus, Rus BaibaKo.tv
[lavf] stream
Hello, Nick.
Some containers (like matroska/mkv) tag audio and subtitle streams with
a language tag and a comment. From mplayer console output:
> [lavf] stream 0: video (h264), -vid 0
> [lavf] stream 1: audio (aac), -aid 0, -alang rus, Rus BaibaKo.tv
> [lavf] stream 2: audio (ac3), -aid 1, -ala
Hi All
Does anyone know if we have a recommended way, or a plan for one, to handle
video files with possibly multiple audio streams?
Most of the multimedia container formats support video and zero or one
audio streams, and a fair number support video and multiple audio streams.
A few can actuall
[
https://issues.apache.org/jira/browse/TIKA-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949463#comment-13949463
]
Nick Burch commented on TIKA-1112:
--
The checksum warning is now fixed upstream, and should
[
https://issues.apache.org/jira/browse/TIKA-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949460#comment-13949460
]
Nick Burch commented on TIKA-1079:
--
I think this might be the same problem as reported in
12 matches