On Tue, Dec 16, 2014 at 2:11 AM, Arnold, Garth <[email protected]> wrote:
> Hello Marius - thank you for the detailed reply. My goal is (2) - to find all
> documents with a .7z attachment, where those attachments include file(s)
> containing "foo". If I read your email correctly, Tika 1.6 [5] is the root
> cause of my failure to search successfully for text within the files
> contained in a .7z attachment. I am successful with my search when using a
> .zip file as the attachment - so we will instruct wiki users to avoid .7z
> attachments.

Yes, at least until we upgrade to Tika 1.7.

Thanks,
Marius

>
> Garth
>
>> -----Original Message-----
>> Message: 2
>> Date: Thu, 11 Dec 2014 08:42:20 +0200
>> From: Marius Dumitru Florea <[email protected]>
>> To: XWiki Users <[email protected]>
>> Subject: Re: [xwiki-users] XWiki search/Solr support for additional
>> filetypes
>> Message-ID:
>> <CALZcbBbprk=SJjhqGKKX1tx-TcMQbcq+qby6ZfnQqXZ-
>> [email protected]>
>> Content-Type: text/plain; charset=UTF-8
>>
>> It depends what you mean by "search attachments that are 7-Zip .7z
>> archives":
>>
>> (1) Give me all the documents that have an attachment of mime type
>> application/x-7z-compressed
>> (2) Give me all the documents that have a 7-Zip archive attached that
>> includes a file that contains the word "foo"
>>
>> If you use Solr, the default search engine for XWiki 6.2.4, then the
>> code that is responsible for indexing the attachments is
>> AttachmentSolrMetadataExtractor [1]. This is a component so it can be
>> overridden as per [2]. The current implementation uses Tika [3] to:
>>
>> (1) detect the mime type of the attachment
>> (2) extract indexable content from the attachment (whatever its mime
>> type may be)
>>
>> For (1) Tika supports detecting the 7-Zip mime type since version 1.2
>> [4]. For (2) judging by [5] it seems Tika also supports reading 7-Zip
>> archives but there were some issues in 1.6 that have been fixed in
>> 1.7. We are currently using Tika 1.6 in XWiki. We should probably
>> upgrade.
>>
>> Hope this helps,
>> Marius
>>
>> [1] https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwiki-platform-search/xwiki-platform-search-solr/xwiki-platform-search-solr-api/src/main/java/org/xwiki/search/solr/internal/metadata/AttachmentSolrMetadataExtractor.java
>> [2] http://extensions.xwiki.org/xwiki/bin/view/Extension/Component+Module#HOverrides
>> [3] https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwiki-platform-search/xwiki-platform-search-solr/xwiki-platform-search-solr-api/src/main/java/org/xwiki/search/solr/internal/metadata/AbstractSolrMetadataExtractor.java#L458
>> [4] https://issues.apache.org/jira/browse/TIKA-940
>> [5] https://issues.apache.org/jira/browse/TIKA-1411
>>
>> On Wed, Dec 10, 2014 at 9:20 PM, Arnold, Garth <[email protected]> wrote:
>> > Hello - is it possible to enable searching of additional filetypes
>> > within XWiki 6.2.4? Specifically I would like to be able to search
>> > attachments that are 7-Zip .7z archives. It looks to me as though the
>> > underlying library (Commons Compress) supports this filetype, but I am
>> > a new XWiki user and non-Java programmer so I may be assuming too much.
>> >
>> > Thanks in advance for your thoughts on this -
>> >
>> > Garth Arnold
_______________________________________________
users mailing list
[email protected]
http://lists.xwiki.org/mailman/listinfo/users
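
Goal (2) in the thread, finding archives that include a file containing "foo", can be illustrated for the .zip case that Garth reports as working. The sketch below is not XWiki or Tika code: it uses only the JDK's java.util.zip classes, and the class name ZipContentSearch, the method names, and the sample entries are all hypothetical. It treats every entry as UTF-8 text and does a plain case-sensitive substring match, which is far simpler than what a real indexer does. The JDK has no built-in 7z reader, so .7z files are out of scope here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration only; not part of XWiki or Tika.
class ZipContentSearch {

    /** Returns the names of the ZIP entries whose UTF-8 text contains the term. */
    public static List<String> entriesContaining(InputStream zipData, String term) {
        List<String> matches = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(zipData, StandardCharsets.UTF_8)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // readAllBytes() stops at the end of the current entry, not the archive.
                String text = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                if (text.contains(term)) {
                    matches.add(entry.getName());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return matches;
    }

    /** Builds a small in-memory ZIP with two made-up text entries. */
    public static byte[] sampleZip() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry("notes.txt"));
            zip.write("this file mentions foo".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("other.txt"));
            zip.write("nothing relevant here".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        List<String> hits = entriesContaining(new ByteArrayInputStream(sampleZip()), "foo");
        System.out.println(hits); // prints [notes.txt]
    }
}
```

A check like this could be run on an archive before attaching it, to confirm that the term is present as plain text inside at least one entry. Whether the wiki search then finds it still depends on the indexer and the Tika version in use.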
