[
https://issues.apache.org/jira/browse/TIKA-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042428#comment-14042428
]
Tim Allison commented on TIKA-758:
--
Y, my grand plan after TIKA-1302 is in place would be t
[
https://issues.apache.org/jira/browse/TIKA-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044668#comment-14044668
]
Tim Allison commented on TIKA-1302:
---
Agreed.
If there's a grad student with some time on
[
https://issues.apache.org/jira/browse/TIKA-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044682#comment-14044682
]
Tim Allison commented on TIKA-1332:
---
To my mind, there are three families of things that
[
https://issues.apache.org/jira/browse/TIKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-1300:
--
Attachment: tika_1_6_ClassicsVsNonSeq.zip
The attached shows the results of running Tika 1.6 trunk with
[
https://issues.apache.org/jira/browse/TIKA-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison closed TIKA-1298.
-
Resolution: Fixed
Turned test back on in PDFParser test. Thank you [~tilman]!
> testEmbeddedPDFEmbedding
[
https://issues.apache.org/jira/browse/TIKA-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044881#comment-14044881
]
Tim Allison commented on TIKA-1233:
---
Hindsight and current eval methodology turn out to b
[
https://issues.apache.org/jira/browse/TIKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045881#comment-14045881
]
Tim Allison commented on TIKA-1300:
---
[~tilman], [~tboehme] and [~msahyoun], thank you all
[
https://issues.apache.org/jira/browse/TIKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046972#comment-14046972
]
Tim Allison commented on TIKA-1300:
---
Don't think so. I'd recommend the 1000 zips vs 1m fi
[
https://issues.apache.org/jira/browse/TIKA-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047556#comment-14047556
]
Tim Allison commented on TIKA-1300:
---
[~tilman], I'm sorry for not responding to your earl
[
https://issues.apache.org/jira/browse/TIKA-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054995#comment-14054995
]
Tim Allison commented on TIKA-1364:
---
Are you getting the same problem if you only include
[
https://issues.apache.org/jira/browse/TIKA-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison closed TIKA-1372.
-
Resolution: Fixed
[~tilman], thank you for notifying us. Y, that was Tika's (well, my) fault. I
fixed t
Tim Allison created TIKA-1374:
-
Summary: Need to add code to look for OS-specific keys for
embedded files within PDFs
Key: TIKA-1374
URL: https://issues.apache.org/jira/browse/TIKA-1374
Project: Tika
[
https://issues.apache.org/jira/browse/TIKA-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-1374:
--
Description:
Embedded files in PDFs can be found by the general all purpose key we
currently use via P
Tim Allison created TIKA-1375:
-
Summary: Decrease memory consumption when extracting images from
PDFs
Key: TIKA-1375
URL: https://issues.apache.org/jira/browse/TIKA-1375
Project: Tika
Issue Type
Tim Allison created TIKA-1376:
-
Summary: Improve embedded file name extraction in PDFParser
Key: TIKA-1376
URL: https://issues.apache.org/jira/browse/TIKA-1376
Project: Tika
Issue Type: Improveme
[
https://issues.apache.org/jira/browse/TIKA-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074306#comment-14074306
]
Tim Allison commented on TIKA-1375:
---
I ran four versions of Tika against a random selecti
[
https://issues.apache.org/jira/browse/TIKA-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074306#comment-14074306
]
Tim Allison edited comment on TIKA-1375 at 7/25/14 11:43 AM:
-
I
[
https://issues.apache.org/jira/browse/TIKA-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074306#comment-14074306
]
Tim Allison edited comment on TIKA-1375 at 7/25/14 11:49 AM:
-
I
[
https://issues.apache.org/jira/browse/TIKA-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison closed TIKA-1375.
-
Resolution: Fixed
fixed r1613395.
Thank you, [~tilman] and [~lehmi] for your work on PDFBOX-2101 and advi
[
https://issues.apache.org/jira/browse/TIKA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison closed TIKA-1376.
-
Resolution: Fixed
r1613444
> Improve embedded file name extraction in PDFParser
> ---
[
https://issues.apache.org/jira/browse/TIKA-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison closed TIKA-1374.
-
Resolution: Fixed
r1613501.
> Need to add code to look for OS-specific keys for embedded files within PDF
[
https://issues.apache.org/jira/browse/TIKA-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082488#comment-14082488
]
Tim Allison commented on TIKA-1380:
---
A test in CLI needs a small change:
{noformat}
testE
[
https://issues.apache.org/jira/browse/TIKA-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-1380:
--
Attachment: TIKA-1380b.patch
This includes Nick's patch, the small change in CLITest and a small change
[
https://issues.apache.org/jira/browse/TIKA-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082662#comment-14082662
]
Tim Allison commented on TIKA-1372:
---
Looked at the stacktrace a bit more closely. This w
[
https://issues.apache.org/jira/browse/TIKA-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084537#comment-14084537
]
Tim Allison commented on TIKA-1380:
---
Thank you, Nick! I just noticed that we should bump
[
https://issues.apache.org/jira/browse/TIKA-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084583#comment-14084583
]
Tim Allison commented on TIKA-1380:
---
Added simple tests for comments in xls and xlsx: r16
[
https://issues.apache.org/jira/browse/TIKA-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084817#comment-14084817
]
Tim Allison commented on TIKA-1380:
---
[~gagravarr] and all, would you have an objection to
[
https://issues.apache.org/jira/browse/TIKA-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084820#comment-14084820
]
Tim Allison commented on TIKA-1380:
---
Also, [~gagravarr], should we bump codec to 1.9 to s
[
https://issues.apache.org/jira/browse/TIKA-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-1317:
--
Attachment: TIKA-1317.patch
If there are no objections, I'll commit this to the 1.6 branch and trunk
sh
[
https://issues.apache.org/jira/browse/TIKA-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison closed TIKA-1317.
-
Resolution: Fixed
Fix Version/s: 1.6
Committed in trunk: r1615667
Committed in 1.6 branch: r1615675
[
https://issues.apache.org/jira/browse/TIKA-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084820#comment-14084820
]
Tim Allison edited comment on TIKA-1380 at 8/4/14 9:05 PM:
---
Also,
[
https://issues.apache.org/jira/browse/TIKA-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086123#comment-14086123
]
Tim Allison commented on TIKA-1275:
---
I just tested on trunk, and all tests pass once we a
[
https://issues.apache.org/jira/browse/TIKA-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086143#comment-14086143
]
Tim Allison commented on TIKA-1275:
---
Sounds good. I added a tukaani.version param next t
[
https://issues.apache.org/jira/browse/TIKA-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison reopened TIKA-1380:
---
The 1.6 branch and trunk are failing one test on Windows.
testExtract(org.apache.tika.cli.TikaCLITest): F
[
https://issues.apache.org/jira/browse/TIKA-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086162#comment-14086162
]
Tim Allison commented on TIKA-1275:
---
Got it. Thank you.
> Upgrade Commons compress to 1
[
https://issues.apache.org/jira/browse/TIKA-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086189#comment-14086189
]
Tim Allison commented on TIKA-1380:
---
Something along these lines:
{noformat}
if (type ==
[
https://issues.apache.org/jira/browse/TIKA-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086213#comment-14086213
]
Tim Allison commented on TIKA-1380:
---
In POI 3.10-final, this particular attachment threw
[
https://issues.apache.org/jira/browse/TIKA-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-1275.
---
Resolution: Fixed
Fix Version/s: 1.6
upgraded in 1.6 branch: 1615923
in trunk: 1615926
> Upgra
[
https://issues.apache.org/jira/browse/TIKA-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-1380:
--
Attachment: TIKA-1380_nullOLELabel.patch
Proposed modification.
> Upgrade to Apache POI 3.11 beta 1
> -
[
https://issues.apache.org/jira/browse/TIKA-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-1380.
---
Resolution: Fixed
patch applied in:
1.6 branch: r1615970
trunk: r1615980
> Upgrade to Apache POI 3.11
[
https://issues.apache.org/jira/browse/TIKA-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-1329:
--
Attachment: test_recursive_embedded.docx
TIKA-1329v2.patch
Got this error on review boar
[
https://issues.apache.org/jira/browse/TIKA-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-1329:
--
Description:
Jukka and Nick have a great demo of parsing metadata recursively on the
[wiki|http://wiki.
[
https://issues.apache.org/jira/browse/TIKA-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097278#comment-14097278
]
Tim Allison commented on TIKA-1396:
---
In 1.5, Tika only extracts "attachments" from pdfs.
[
https://issues.apache.org/jira/browse/TIKA-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-1396:
--
Component/s: (was: cli)
parser
> Embedded images in PDF documents
> ---
[
https://issues.apache.org/jira/browse/TIKA-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099306#comment-14099306
]
Tim Allison commented on TIKA-1396:
---
Latest app build is available
[here|https://builds.
[
https://issues.apache.org/jira/browse/TIKA-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-1396.
---
Resolution: Fixed
Fix Version/s: 1.6
Feature available in 1.6.
> Embedded images in PDF docume
[
https://issues.apache.org/jira/browse/TIKA-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119745#comment-14119745
]
Tim Allison commented on TIKA-1330:
---
Looks like ballpark estimate on time for processing
[
https://issues.apache.org/jira/browse/TIKA-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121454#comment-14121454
]
Tim Allison commented on TIKA-1330:
---
Started documentation on the [wiki|https://wiki.apac
[
https://issues.apache.org/jira/browse/TIKA-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-1232:
--
Fix Version/s: 1.6
> Add PDF version to PDFParser output
> ---
>
>
[
https://issues.apache.org/jira/browse/TIKA-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128398#comment-14128398
]
Tim Allison commented on TIKA-1268:
---
These should do it, no?
Either with svn commandline
[
https://issues.apache.org/jira/browse/TIKA-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128398#comment-14128398
]
Tim Allison edited comment on TIKA-1268 at 9/10/14 12:13 PM:
-
T
[
https://issues.apache.org/jira/browse/TIKA-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131386#comment-14131386
]
Tim Allison commented on TIKA-1414:
---
From TIKA-1396:
bq. As a hack, you can also change
[
https://issues.apache.org/jira/browse/TIKA-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133832#comment-14133832
]
Tim Allison commented on TIKA-1396:
---
I just tested the tika 1.6 app jar on "testPDF_child
[
https://issues.apache.org/jira/browse/TIKA-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133841#comment-14133841
]
Tim Allison commented on TIKA-1396:
---
Now that we are using PDFBox 1.8.6, we might conside
[
https://issues.apache.org/jira/browse/TIKA-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133851#comment-14133851
]
Tim Allison commented on TIKA-1414:
---
[~tpalsulich], any interest in adding an example for
[
https://issues.apache.org/jira/browse/TIKA-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134700#comment-14134700
]
Tim Allison commented on TIKA-1414:
---
[~tpalsulich], great. Thank you! It might make sen
[
https://issues.apache.org/jira/browse/TIKA-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134702#comment-14134702
]
Tim Allison commented on TIKA-1414:
---
[~damiano], I'll close this out shortly unless I hea
Tim Allison created TIKA-1418:
-
Summary: Add TikaConfigDumperExample to example package
Key: TIKA-1418
URL: https://issues.apache.org/jira/browse/TIKA-1418
Project: Tika
Issue Type: New Feature
[
https://issues.apache.org/jira/browse/TIKA-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-1418:
--
Attachment: TikaConfigDumper.patch
> Add TikaConfigDumperExample to example package
> ---
[
https://issues.apache.org/jira/browse/TIKA-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-1418:
--
Attachment: tika-config-SNAPSHOT-1.7_20140919.xml
For posterity, this is what the tika-config file looks
[
https://issues.apache.org/jira/browse/TIKA-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-1418.
---
Resolution: Fixed
Fix Version/s: 1.7
Added the example and test. I also added the --config= opt
Tim Allison created TIKA-1419:
-
Summary: Upgrade to PDFBox 1.8.7
Key: TIKA-1419
URL: https://issues.apache.org/jira/browse/TIKA-1419
Project: Tika
Issue Type: Improvement
Reporter: Ti
[
https://issues.apache.org/jira/browse/TIKA-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-1329.
---
Resolution: Fixed
r1626300
> Add RecursiveParserWrapper aka Jukka's (and Nick's) RecursiveMetadataPars
[
https://issues.apache.org/jira/browse/TIKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-1419:
--
Attachment: compare_Tika-trunk-1.7_w_PDFBox1.8.6Vs.1.8.7.csv
> Upgrade to PDFBox 1.8.7
>
[
https://issues.apache.org/jira/browse/TIKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143588#comment-14143588
]
Tim Allison commented on TIKA-1419:
---
I just finished the run on 50,000 random pdfs from g
Tim Allison created TIKA-1424:
-
Summary: Clear PDFont's resources after each file to prevent
memory leak
Key: TIKA-1424
URL: https://issues.apache.org/jira/browse/TIKA-1424
Project: Tika
Issue T
[
https://issues.apache.org/jira/browse/TIKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143588#comment-14143588
]
Tim Allison edited comment on TIKA-1419 at 9/23/14 1:23 AM:
I j
Tim Allison created TIKA-1426:
-
Summary: Let's allow users to specify a tika config file on the
commandline for tika-app and tika-server
Key: TIKA-1426
URL: https://issues.apache.org/jira/browse/TIKA-1426
[
https://issues.apache.org/jira/browse/TIKA-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145369#comment-14145369
]
Tim Allison commented on TIKA-1396:
---
Thank you for attaching a test file! I'll take a lo
[
https://issues.apache.org/jira/browse/TIKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145381#comment-14145381
]
Tim Allison commented on TIKA-1419:
---
Yes, absolutely. I'm sorry for appearing to be (wel
[
https://issues.apache.org/jira/browse/TIKA-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison reopened TIKA-1396:
---
> Embedded images in PDF documents
>
>
> Key: TIKA-1396
>
[
https://issues.apache.org/jira/browse/TIKA-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145420#comment-14145420
]
Tim Allison commented on TIKA-1396:
---
When I run your file through a modified version of a
[
https://issues.apache.org/jira/browse/TIKA-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145420#comment-14145420
]
Tim Allison edited comment on TIKA-1396 at 9/23/14 8:53 PM:
Whe
[
https://issues.apache.org/jira/browse/TIKA-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146230#comment-14146230
]
Tim Allison commented on TIKA-1396:
---
Ah, ok. Y, pls open another issue. I should also a
[
https://issues.apache.org/jira/browse/TIKA-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison closed TIKA-1396.
-
Resolution: Not a Problem
> Embedded images in PDF documents
>
>
>
[
https://issues.apache.org/jira/browse/TIKA-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-1297.
---
Resolution: Fixed
Fix Version/s: 1.6
> Images not being extracted from PDFs
> --
[
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146283#comment-14146283
]
Tim Allison commented on TIKA-1422:
---
While work is going on to get the TesseractOCRParser
[
https://issues.apache.org/jira/browse/TIKA-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-1424.
---
Resolution: Fixed
r1627304
> Clear PDFont's resources after each file to prevent memory leak
> ---
[
https://issues.apache.org/jira/browse/TIKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-1419.
---
Resolution: Fixed
r1627308
> Upgrade to PDFBox 1.8.7
> ---
>
> Key
[
https://issues.apache.org/jira/browse/TIKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146294#comment-14146294
]
Tim Allison commented on TIKA-1419:
---
Happy to help (and again my apologies for the post-h
[
https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146537#comment-14146537
]
Tim Allison commented on TIKA-1422:
---
Sorry, user error. Needed to force update. Thank y
[
https://issues.apache.org/jira/browse/TIKA-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146588#comment-14146588
]
Tim Allison commented on TIKA-1396:
---
Y, I can think of a few options. We still need to a
[
https://issues.apache.org/jira/browse/TIKA-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-1330:
--
Attachment: TIKA-1330v1-patch.zip
This is the first version of tika-batch. Much cleanup remains.
This f
[
https://issues.apache.org/jira/browse/TIKA-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121454#comment-14121454
]
Tim Allison edited comment on TIKA-1330 at 9/25/14 4:18 PM:
Sta
[
https://issues.apache.org/jira/browse/TIKA-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147922#comment-14147922
]
Tim Allison commented on TIKA-1330:
---
[~tilman], I leave it as an exercise to implement a
[
https://issues.apache.org/jira/browse/TIKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151614#comment-14151614
]
Tim Allison commented on TIKA-1419:
---
Thank you! Let me know when I should run 1.8.8 v. 1
Tim Allison created TIKA-1433:
-
Summary: Extract documents embedded within annotations in PDFs
Key: TIKA-1433
URL: https://issues.apache.org/jira/browse/TIKA-1433
Project: Tika
Issue Type: New Fe
[
https://issues.apache.org/jira/browse/TIKA-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-1433.
---
Resolution: Fixed
r1628350
> Extract documents embedded within annotations in PDFs
> -
[
https://issues.apache.org/jira/browse/TIKA-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-1414.
---
Resolution: Not a Problem
> How to extract embedded images from PDFs?
> ---
[
https://issues.apache.org/jira/browse/TIKA-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-1427.
---
Resolution: Fixed
r1628354.
Let me know if the markup is sufficient for your needs.
> PDF Images don'
[
https://issues.apache.org/jira/browse/TIKA-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison reopened TIKA-1427:
---
Assignee: Tim Allison
Will modify to behave exactly as msoffice
> PDF Images don't appear in structure
[
https://issues.apache.org/jira/browse/TIKA-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-1427.
---
Resolution: Fixed
r1628707.
Made inline image tags equivalent to those created by Word parser. Let me
[
https://issues.apache.org/jira/browse/TIKA-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158170#comment-14158170
]
Tim Allison commented on TIKA-1427:
---
We're currently iterating through the images once we
[
https://issues.apache.org/jira/browse/TIKA-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158223#comment-14158223
]
Tim Allison commented on TIKA-1427:
---
On at least one test doc, I'm getting correct behavi
[
https://issues.apache.org/jira/browse/TIKA-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158223#comment-14158223
]
Tim Allison edited comment on TIKA-1427 at 10/3/14 5:33 PM:
On
[
https://issues.apache.org/jira/browse/TIKA-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160312#comment-14160312
]
Tim Allison edited comment on TIKA-1437 at 10/6/14 2:04 PM:
No
[
https://issues.apache.org/jira/browse/TIKA-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160312#comment-14160312
]
Tim Allison commented on TIKA-1437:
---
No encoding detector will be perfect.
Are you sur
[
https://issues.apache.org/jira/browse/TIKA-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165065#comment-14165065
]
Tim Allison commented on TIKA-1439:
---
Hi [~sunxingzhe359],
Thanks to your post with test
[
https://issues.apache.org/jira/browse/TIKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-1419:
--
Attachment: pdfbox_1_8_6V1_8_8-SNAPSHOT.zip
[~tilman], sorry for my delay. This contrasts Tika 1.7 trunk
[
https://issues.apache.org/jira/browse/TIKA-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165592#comment-14165592
]
Tim Allison commented on TIKA-1427:
---
Hmmm...I'm not able to grab the wmf embedded image f