[ 
https://issues.apache.org/jira/browse/TIKA-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18087648#comment-18087648
 ] 

Tilman Hausherr edited comment on TIKA-4730 at 6/9/26 12:27 PM:
----------------------------------------------------------------

commoncrawl3/VD/VDC5WE3CODN23B5LIHUZ7BEUUWQNL6ZN

I ran the command-line app on main and tika3. Both hit an exception, but tika3 
still produces really nice HTML. The file may be broken: I tried two online 
readers, and one shows only the title while the other shows nothing.

(Same effect with other files of the same type. It's possible you've mentioned 
it somewhere, sorry.)

commoncrawl3/P6/P6SEZFPJIUEKEVX2VTHLTPYPKNODUO42 charset problem

commoncrawl3/TE/TEC4JHGXSBFKKM3OBIWSRA6LZVM7YLXF charset improvement

commoncrawl3/JE/JE5YIFWTNTJSWSMCOJVHOHTQVSNZWICA charset improvement


> Prep for 4.0.0-beta-1 release
> -----------------------------
>
>                 Key: TIKA-4730
>                 URL: https://issues.apache.org/jira/browse/TIKA-4730
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>         Attachments: reports-2020609.tgz, reports.tar.gz
>
>
> We made a number of important fixes to the published artifacts in ASF's dist 
> repo, maven central and docker.
> I think we're set on changing APIs for 4.x generally.
> Is there anything else we need for this beta release?
> I propose starting the 4.0.0-beta-1 release in two weeks. WDYT?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
