That's wonderful. Thanks for that.
I'm concentrating on finishing up tika-pipes so I can get the removal PR
started.
Getting very close. Maybe we can set up a Zoom sometime to chat.

On Mon, Jan 27, 2025 at 9:50 AM Tim Allison <talli...@apache.org> wrote:

> I'm kicking off the regression tests for 3.x.
>
> Nicholas, I merged TIKA-4303 and cherry-picked it back to 3.x. I hope
> that's ok.
>
> On Fri, Jan 24, 2025 at 2:25 AM Tilman Hausherr <thaush...@t-online.de>
> wrote:
>
> > Hi,
> >
> > No opinion re release schedule but a comment on the PDFBox update:
> >
> > tl;dr: ignore the PDF differences this time.
> >
> > The new version includes the /ActualText support:
> > https://issues.apache.org/jira/browse/PDFBOX-5868
> >
> > It is always enabled. In most cases the extraction is better. But
> > sometimes content is lost because the feature is used for obfuscation
> > (see example in the issue above).
> >
> > Another major change is the detection of the space width:
> > https://issues.apache.org/jira/browse/PDFBOX-5920
> > It has been improved; however, this will result in many differences with
> > angled text if angle detection isn't enabled. Some scientific texts
> > with a superscript prefix will also look different: "1 Coupled" will
> > extract as "1Coupled". This is because these fonts don't have a space
> > and the fallback we are using sucks.
> >
> > Tilman
> >
> > On 16.01.2025 14:20, Tim Allison wrote:
> > > Sorry, on second thought, a small tweak:
> > >
> > > I propose that we release 3.1.0 after PDFBox 3.x is released. I further
> > > propose that we make a 2.9.3 release at some point after the 3.1.0
> > > release IF we get requests for a 2.x release...otherwise we'll do a
> > > final 2.x EOL release in April, 2025.
> > >
> > > On Thu, Jan 16, 2025 at 8:15 AM Tim Allison <talli...@apache.org>
> > > wrote:
> > >
> > >> All,
> > >>    It has been a while since we last released 2.x (April 2024) and 3.x
> > >> (October 2024). We've had a number of dependency updates. PDFBox is on
> > >> the cusp of a new 3.x release.
> > >>    I propose that we release 3.1.0 after PDFBox 3.x is released and
> > >> that we make a 2.9.3 release the following week.
> > >>    WDYT?
> > >>
> > >>              Best,
> > >>
> > >>                   Tim
> > >>
> >
> >
>
