Thanks, Tilman. "This happens twice during build tests, of over 100 text extractions."
Thanks for explaining... From the code comment, I thought this was a general behavior of the JDK7+ sort, but it sounds like it is only a problem in a rare edge case of specific compares. How exciting.

Cheers,

K

On Tue, Dec 17, 2024, 2:22 AM Tilman Hausherr <thaush...@t-online.de> wrote:

> Hi,
>
> Thank you, the change has been committed.
>
> re 1: we'll see what happens... re "but it is code that needs to be
> maintained" - that is a general problem. Sometimes it's even difficult
> to maintain one's own code.
>
> re 2: No, because most of the time the faster built-in sort works fine.
> The slower mergesort is only used when the exception is thrown. This
> happens twice during build tests, of over 100 text extractions.
>
> Tilman
>
> On 16.12.2024 15:55, Kevin Day wrote:
> > I am attaching the patch file.
> >
> > And yes, this patch is simply PDFBOX-3774 as an option, a small
> > cosmetic change to use idiomatic Java for PDFBOX-5487, and a unit test
> > that demonstrates the overlapping.
> >
> > A couple of additional thoughts:
> >
> > 1. I feel that PDFBOX-5487 isn't doing very much. The PDFBOX-3774
> > feature will address the problem fixed by PDFBOX-5487, and the
> > "problem" of having a space glyph entirely within the previous
> > character is a very restricted edge-case. In the end, the performance
> > hit is not a big deal, but it is code that needs to be maintained. I
> > thought I'd mention it in case the PDFBOX-5487 requester would be
> > happy with PDFBOX-3774 as a solution.
> >
> > 2. I noticed that there is a note about JDK7+ sorting
> > requiring transitive comparators. Given that the build requires
> > JDK8+, I wonder if it is time to remove the Collections.sort path (and
> > get rid of an exception throw, etc...)?
> >
> > - K
> >
> > On Mon, Dec 16, 2024 at 6:21 AM Tilman Hausherr
> > <thaush...@t-online.de> wrote:
> >
> >     On 16.12.2024 14:02, Kevin Day wrote:
> >     > I just realized that there is an incorrect note in the
> >     > getter/setter Javadocs about the setting only taking effect if
> >     > sorting is enabled.
> >     >
> >     > That note can be removed. The new setting is valid regardless of
> >     > whether sorting is enabled.
> >
> >     Hi,
> >
> >     Could you please resend the patch as text attachment? Somehow the
> >     mail program messed this up.
> >
> >     From what I understand, the patch is the suggestion from
> >     PDFBOX-3774 but as an option, plus a test. The other change
> >     (re PDFBOX-5487) is a (useful) cosmetic change. I wonder why I
> >     missed that when I committed it.
> >
> >     Tilman
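
For anyone following the "re 2" exchange in this thread: the fallback Tilman describes (try the faster built-in sort, use the slower mergesort only when the exception is thrown) could be sketched roughly as below. This is an illustration only, not the PDFBox code. The class FallbackSortSketch, the methods sortedWithFallback and mergeSort, and the tolerance-based comparator in main are hypothetical names made up for the example. It also assumes the exception in question is the IllegalArgumentException that the JDK7+ built-in sort can throw when it notices a comparator breaking its contract.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; none of these names come from PDFBox.
class FallbackSortSketch {

    /**
     * Returns a sorted copy of the input. The built-in sort is tried first;
     * the plain merge sort below is used only if the built-in sort rejects
     * the comparator.
     */
    static <T> List<T> sortedWithFallback(List<T> input, Comparator<? super T> cmp) {
        List<T> copy = new ArrayList<>(input);
        try {
            Collections.sort(copy, cmp); // fast path
            return copy;
        } catch (IllegalArgumentException e) {
            // Assumption: this is the "comparison method violates its general
            // contract" case. Start again from the untouched input.
            return mergeSort(input, cmp);
        }
    }

    /**
     * Simple top-down merge sort. It never checks the comparator contract,
     * so it always returns a permutation of the input, even when the
     * comparator is not transitive.
     */
    static <T> List<T> mergeSort(List<T> items, Comparator<? super T> cmp) {
        if (items.size() <= 1) {
            return new ArrayList<>(items);
        }
        int mid = items.size() / 2;
        List<T> left = mergeSort(items.subList(0, mid), cmp);
        List<T> right = mergeSort(items.subList(mid, items.size()), cmp);

        List<T> merged = new ArrayList<>(items.size());
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            // "<= 0" takes from the left half on ties, which keeps the sort stable
            if (cmp.compare(left.get(i), right.get(j)) <= 0) {
                merged.add(left.get(i++));
            } else {
                merged.add(right.get(j++));
            }
        }
        while (i < left.size()) {
            merged.add(left.get(i++));
        }
        while (j < right.size()) {
            merged.add(right.get(j++));
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical non-transitive comparator: values closer than 0.5
        // are treated as equal, so a == b and b == c does not imply a == c.
        Comparator<Double> fuzzy =
                (a, b) -> Math.abs(a - b) < 0.5 ? 0 : Double.compare(a, b);
        List<Double> xs = Arrays.asList(3.0, 1.2, 1.0, 2.6, 0.8, 2.2);
        System.out.println(sortedWithFallback(xs, fuzzy));
    }
}
```

The sketch sorts a copy so that a failed fast path cannot leave the caller's list half-reordered. Whether the exception is actually thrown depends on the data and on the comparisons the built-in sort happens to make, which fits the observation that it shows up only in a couple of the build-test extractions.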