On Tue, Dec 16, 2014 at 2:11 AM, Arnold, Garth <[email protected]> wrote:
> Hello Marius - thank you for the detailed reply. My goal is (2) - to find all 
> documents with a .7z attachment, where those attachments include file(s) 
> containing "foo". If I read your email correctly, the Tika 1.6 issue [5] is the 
> root cause of my failure to search successfully for text within the files in 
> a .7z attachment. I am successful with my search when using a .zip file as 
> the attachment - so we will instruct wiki users to avoid .7z attachments.

Yes, at least until we upgrade to Tika 1.7.
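
For reference, the mime type detection described as step (1) in the quoted reply below comes down to recognising a file signature. The following is a minimal sketch in plain Java, not the Tika or XWiki code: the class and method names are hypothetical, and it only checks the six-byte 7z signature (37 7A BC AF 27 1C), whereas Tika's detection covers far more cases.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, for illustration only; not part of Tika or XWiki.
public class SevenZipSniffer {
    // The six-byte signature at the start of every 7z archive.
    private static final byte[] SEVEN_ZIP_MAGIC =
        {0x37, 0x7A, (byte) 0xBC, (byte) 0xAF, 0x27, 0x1C};

    // True if the given leading bytes start with the 7z signature.
    public static boolean looksLikeSevenZip(byte[] header) {
        if (header == null || header.length < SEVEN_ZIP_MAGIC.length) {
            return false;
        }
        byte[] prefix = Arrays.copyOf(header, SEVEN_ZIP_MAGIC.length);
        return Arrays.equals(prefix, SEVEN_ZIP_MAGIC);
    }

    // Maps the header to a mime type, falling back to a generic binary type.
    public static String guessMimeType(byte[] header) {
        return looksLikeSevenZip(header)
            ? "application/x-7z-compressed"
            : "application/octet-stream";
    }

    // Reads up to SEVEN_ZIP_MAGIC.length bytes from the start of the stream.
    static byte[] readHeader(InputStream in) throws IOException {
        byte[] buffer = new byte[SEVEN_ZIP_MAGIC.length];
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n < 0) {
                break; // end of stream before a full header was read
            }
            total += n;
        }
        return Arrays.copyOf(buffer, total);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SevenZipSniffer <file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(guessMimeType(readHeader(in)));
        }
    }
}
```

A check like this only answers question (1), whether an attachment is a 7-Zip archive. Searching inside the archive, question (2), still depends on the content extraction that Tika performs.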

Thanks,
Marius

>
> Garth
>
>> -----Original Message-----
>> Message: 2
>> Date: Thu, 11 Dec 2014 08:42:20 +0200
>> From: Marius Dumitru Florea <[email protected]>
>> To: XWiki Users <[email protected]>
>> Subject: Re: [xwiki-users] XWiki search/Solr support for additional
>>                 filetypes
>> Message-ID:
>>                 <CALZcbBbprk=SJjhqGKKX1tx-TcMQbcq+qby6ZfnQqXZ-
>> [email protected]>
>> Content-Type: text/plain; charset=UTF-8
>>
>> It depends what you mean by "search attachments that are 7-Zip .7z
>> archives":
>>
>> (1) Give me all the documents that have an attachment of mime type
>> application/x-7z-compressed
>> (2) Give me all the documents that have a 7-Zip archive attached that
>> includes a file that contains the word "foo"
>>
>> If you use Solr, the default search engine for XWiki 6.2.4, then the
>> code that is responsible for indexing the attachments is
>> AttachmentSolrMetadataExtractor [1]. This is a component so it can be
>> overridden as per [2]. The current implementation uses Tika [3] to:
>>
>> (1) detect the mime type of the attachment
>> (2) extract indexable content from the attachment (whatever its mime
>> type may be)
>>
>> For (1) Tika supports detecting the 7-Zip mime type since version 1.2
>> [4]. For (2), judging by [5], it seems Tika also supports reading 7-Zip
>> archives but there were some issues in 1.6 that have been fixed in
>> 1.7. We are currently using Tika 1.6 in XWiki. We should probably
>> upgrade.
>>
>> Hope this helps,
>> Marius
>>
>> [1] https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwiki-platform-search/xwiki-platform-search-solr/xwiki-platform-search-solr-api/src/main/java/org/xwiki/search/solr/internal/metadata/AttachmentSolrMetadataExtractor.java
>> [2] http://extensions.xwiki.org/xwiki/bin/view/Extension/Component+Module#HOverrides
>> [3] https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwiki-platform-search/xwiki-platform-search-solr/xwiki-platform-search-solr-api/src/main/java/org/xwiki/search/solr/internal/metadata/AbstractSolrMetadataExtractor.java#L458
>> [4] https://issues.apache.org/jira/browse/TIKA-940
>> [5] https://issues.apache.org/jira/browse/TIKA-1411
>>
>> On Wed, Dec 10, 2014 at 9:20 PM, Arnold, Garth <[email protected]> wrote:
>> > Hello - is it possible to enable searching of additional filetypes within
>> > XWiki 6.2.4? Specifically I would like to be able to search attachments
>> > that are 7-Zip .7z archives. It looks to me as though the underlying
>> > library (Commons Compress) supports this filetype, but I am a new XWiki
>> > user and non-java programmer so I may be assuming too much.
>> >
>> > Thanks in advance for your thoughts on this -
>> >
>> > Garth Arnold
>
> _______________________________________________
> users mailing list
> [email protected]
> http://lists.xwiki.org/mailman/listinfo/users