We are running into the problem described here:
http://dspace.2283337.n4.nabble.com/Filter-media-on-PDFs-exported-from-Outlook-causes-a-TikaException-error-and-prevents-Items-from-inde-td4683489.html

The *.pdf.txt files output by the PDF Text Extractor media filter for a
couple of PDFs in our repository cause indexing to fail, not just for the
PDF full text but for all of the associated metadata. (In our case, the PDFs
were not exported from Microsoft Outlook mail folders, but I'm seeing the
same "org.apache.tika.exception.TikaException: Failed to parse an email
message" error in the dspace log file.)

The posting at the URL above mentions a work-around that involves creating an
exclusion list for filter-media, but I can't find any documentation on how to
create one. Can someone point me to it?

Thanks, Kerry

-- 
All messages to this mailing list should adhere to the Code of Conduct: 
https://duraspace.org/about/policies/code-of-conduct/
--- 
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/dspace-tech/85e9b754-31d4-4558-8bde-071facdf9d0bn%40googlegroups.com.
