Apologies for being absent... The xsb issue is why we haven't upgraded to 5.x on Tika yet. I _think_ we'd like to avoid the ooxml-full jar, but if that's the most robust option, we'll have to go with that.
I'm also happy to grab new files, or run against our corpus if that'd be of any use.

Many thanks!

Best,
Tim

On Tue, Oct 19, 2021 at 11:57 AM Dominik Stadler <dominik.stad...@gmx.at> wrote:
>
> Hi,
>
> You are right, we discussed the reasons before, I am sorry.
>
> LG... Dominik.
>
> On Mon, Oct 18, 2021 at 9:06 PM Andreas Beeker <kiwiwi...@apache.org> wrote:
>
> > Hi Dominik,
> >
> > > Would be nice to know what changes made them required now as the
> > > documents themselves did not change.
> >
> > I think you know, but to be safe, I mentioned it again.
> > Originally only the classes were filtered, but a while ago I also added
> > .xsb filtering to shrink the lite jar further, because the .xsb files
> > are a substantial part of the jar.
> >
> > I could iterate through all document parts and try to see if those parts
> > are parsed and check the structure recursively.
> >
> > On the other hand, I've optimized the xmlbeans schema generation, so the
> > schema shrunk from 19 MB to 13 MB.
> > There would be the option to split the schema jar into the sub schemas,
> > including the necessary dependencies on the base/abstract schemas - but I
> > guess this will be even more confusing for the user base.
> >
> > Regarding the 107 documents, it's fairly easy to add the documents locally
> > and see which deltas, i.e. not yet loaded .xsbs, are added from file to
> > file, to identify the files which need to be added to the integration
> > directory.
> >
> > Andi.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
> > For additional commands, e-mail: dev-h...@poi.apache.org
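
The "delta" idea Andi describes for the 107 documents could be sketched roughly as follows. This is only an illustration in Python, not POI code: the function name `xsb_deltas`, the file names, and the .xsb names are all hypothetical, and it assumes the set of .xsb resources loaded while parsing each document has already been collected by some other means.

```python
from typing import Iterable, List, Set, Tuple


def xsb_deltas(loaded_per_doc: Iterable[Tuple[str, Set[str]]]) -> List[Tuple[str, Set[str]]]:
    """Return (document, newly seen .xsb names) for every document that
    loads at least one .xsb not loaded by any earlier document."""
    seen: Set[str] = set()
    result: List[Tuple[str, Set[str]]] = []
    for doc, loaded in loaded_per_doc:
        delta = set(loaded) - seen  # .xsbs not yet loaded by earlier files
        if delta:
            result.append((doc, delta))
            seen |= delta
    return result


# Hypothetical input: document name -> .xsb resources loaded while parsing it.
runs = [
    ("a.docx", {"ctbody.xsb", "ctp.xsb"}),
    ("b.docx", {"ctp.xsb"}),
    ("c.xlsx", {"ctp.xsb", "ctsheet.xsb"}),
]

needed = xsb_deltas(runs)
for doc, delta in needed:
    print(doc, sorted(delta))
```

Only documents that contribute a non-empty delta would be candidates for the integration directory; in this made-up input, `b.docx` adds nothing new and is skipped. The result depends on the order in which the documents are processed.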