+1 I just finished the run against 38k documents. We're getting more attachments from doc files, and ~251 ppt files are no longer throwing exceptions.
I did discover a potential multithreading issue with ppt files, but so far I can only reproduce it with tika-app in batch mode when I run against files sorted by mime type (all ppts at once). I can reproduce it with 3.13 using the same setup (tika-app, batch mode with a list of files sorted by mime type), but I can't reproduce it yet in JUnit. I'll open an issue on our tracker for that.

Cheers,
Tim