[jira] [Commented] (TIKA-3464) Is it possible to extract individual pdf pages using Tika Server?

Tim Allison (Jira) Wed, 07 Jul 2021 08:12:05 -0700


    [ 
https://issues.apache.org/jira/browse/TIKA-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376633#comment-17376633
 ]


Tim Allison commented on TIKA-3464:
-----------------------------------

Can you use the XHTML output?  For PDFs, it wraps each page in a
<div class="page"> element, so the page boundaries are explicit.
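
As a rough illustration of that suggestion, here is a minimal Python sketch that asks a Tika Server for XHTML and splits the result on the per-page div elements. It rests on several assumptions: the server is running locally on port 9998, the /tika endpoint returns XHTML when sent "Accept: text/html", and the response is well-formed XML. The helper names fetch_xhtml and split_pages are made up for this example and are not part of Tika.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_xhtml(pdf_path, server_url="http://localhost:9998/tika"):
    """PUT a PDF to Tika Server and return the XHTML response as text.

    The URL is an assumption (a local server on port 9998); adjust it
    for your deployment.
    """
    with open(pdf_path, "rb") as f:
        data = f.read()
    req = urllib.request.Request(
        server_url,
        data=data,
        method="PUT",
        headers={"Accept": "text/html", "Content-Type": "application/pdf"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def split_pages(xhtml):
    """Return the text of each <div class="page"> element, in document order."""
    root = ET.fromstring(xhtml)
    pages = []
    for elem in root.iter():
        # Drop the namespace prefix ("{http://www.w3.org/1999/xhtml}div" -> "div").
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "div" and elem.get("class") == "page":
            pages.append("".join(elem.itertext()).strip())
    return pages
```

Calling split_pages(fetch_xhtml("example.pdf")) would then give one string per page, with "example.pdf" standing in for your own file. Parsing the markup rather than splitting on a literal string keeps the sketch independent of whitespace and attribute ordering in the output.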

> Is it possible to extract individual pdf pages using Tika Server?
> -----------------------------------------------------------------
>
>                 Key: TIKA-3464
>                 URL: https://issues.apache.org/jira/browse/TIKA-3464
>             Project: Tika
>          Issue Type: Wish
>          Components: server
>            Reporter: Sal
>            Priority: Trivial
>
> I was wondering if there exists the ability to call the Tika Server and get 
> back the text as individual pages, instead of all grouped together in a 
> single text file.  I just need to know where each pdf page begins and ends in 
> the output and it's not obvious from the text output.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
