[ 
https://issues.apache.org/jira/browse/TIKA-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883302#comment-17883302
 ] 

ASF GitHub Bot commented on TIKA-4309:
--------------------------------------

alexey-pelykh commented on PR #1947:
URL: https://github.com/apache/tika/pull/1947#issuecomment-2363927825

   
([source](https://github.com/duderman/mimetype/blob/d3851d6e3e6b3a0f40c3b56d7fe38f01d0938f90/internal/matchers/binary.go#L19-L27))
 
   ```go
   // Class matches a java class file.
   func Class(in []byte) bool {
        return classOrMachO(in) && in[7] > 30
   }
   
   // MachO matches Mach-O binaries format
   func MachO(in []byte) bool {
        return classOrMachO(in) && in[7] < 20
   }
   ```
   
   This approach relies on testing the 8th byte (`in[7]`), or in other words, for a fat 
Mach-O 
   ```
   struct fat_header {
        uint32_t        magic;          /* FAT_MAGIC */
        uint32_t        nfat_arch;      /* number of structs that follow */
   };
   ```
   the low-order byte of `nfat_arch`, and so far I can't find any reasonable explanation 
for why the count would be limited to 20. For a Java class file, it is the least 
significant byte of the uint16_t `major` version, and it is equally unclear why 30 is 
the threshold. These seem like odd empirical values to me.
   
   To test for a Mach-O universal binary, we could look for 0xCAFEBABE or 0xCAFEBABF, 
get the offset of the first Mach-O from the first struct, and verify that it's a 
Mach-O. Does Tika's XML allow "read a uint, then read a second value at the location 
the first one points to"?




> ExecutableParser: support MachO
> -------------------------------
>
>                 Key: TIKA-4309
>                 URL: https://issues.apache.org/jira/browse/TIKA-4309
>             Project: Tika
>          Issue Type: New Feature
>            Reporter: Alexey Pelykh
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
