Copilot commented on code in PR #2885:
URL: https://github.com/apache/tika/pull/2885#discussion_r3380637875
##########
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java:
##########
@@ -614,7 +614,13 @@ private void collectPictureSlides(ShapeContainer container, int slideNum,
 }
 for (HSLFShape shape : shapes) {
 if (shape instanceof HSLFPictureShape) {
- HSLFPictureData pd = ((HSLFPictureShape) shape).getPictureData();
+ HSLFPictureData pd;
+ try {
+ pd = ((HSLFPictureShape) shape).getPictureData();
+ } catch (Exception e) {
+ // corrupt Escher BSE record -- skip page anchoring for this shape
+ continue;
+ }
Review Comment:
Catching a blanket `Exception` here also swallows unexpected failures from
POI/Tika, which makes debugging harder, and it silently discards the error.
Since the reported issue is an `IndexOutOfBoundsException` from corrupt picture
records, catch that specific exception and record it via `EmbeddedDocumentUtil`
before continuing (consistent with nearby embedded-resource handling).
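
A minimal, self-contained sketch of the suggested pattern follows. It does not use the real POI or Tika classes: `PictureShape` is a hypothetical stand-in for `HSLFPictureShape`, and the `recorded` list is a hypothetical stand-in for whatever `EmbeddedDocumentUtil` call the surrounding code uses to record the failure. It only illustrates catching the narrow exception, recording it, and letting anything else propagate.

```java
import java.util.ArrayList;
import java.util.List;

public class NarrowCatchSketch {

    /** Hypothetical stand-in for HSLFPictureShape; not the POI type. */
    interface PictureShape {
        String getPictureData();
    }

    /**
     * Collects picture data from each shape. A shape whose lookup fails with
     * IndexOutOfBoundsException is skipped and the failure is appended to
     * `recorded` (stand-in for reporting through EmbeddedDocumentUtil).
     * Any other exception is deliberately not caught and propagates.
     */
    static List<String> collect(List<PictureShape> shapes, List<String> recorded) {
        List<String> result = new ArrayList<>();
        for (PictureShape shape : shapes) {
            String pd;
            try {
                pd = shape.getPictureData();
            } catch (IndexOutOfBoundsException e) {
                // corrupt picture record: record the failure, then skip this shape
                recorded.add(e.getClass().getName() + ": " + e.getMessage());
                continue;
            }
            result.add(pd);
        }
        return result;
    }

    public static void main(String[] args) {
        List<PictureShape> shapes = new ArrayList<>();
        shapes.add(() -> "image-1");
        shapes.add(() -> {
            throw new IndexOutOfBoundsException("corrupt BSE index");
        });
        shapes.add(() -> "image-2");

        List<String> recorded = new ArrayList<>();
        System.out.println("collected: " + collect(shapes, recorded));
        System.out.println("recorded:  " + recorded);
    }
}
```

With this shape, a corrupt record still lets the remaining pictures be processed, while an unrelated failure (for example an `IllegalStateException`) surfaces to the caller instead of being hidden.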
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.