[ https://issues.apache.org/jira/browse/TIKA-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883302#comment-17883302 ]
ASF GitHub Bot commented on TIKA-4309:
--------------------------------------

alexey-pelykh commented on PR #1947:
URL: https://github.com/apache/tika/pull/1947#issuecomment-2363927825

([source](https://github.com/duderman/mimetype/blob/d3851d6e3e6b3a0f40c3b56d7fe38f01d0938f90/internal/matchers/binary.go#L19-L27))

```go
// Class matches a java class file.
func Class(in []byte) bool {
	return classOrMachO(in) && in[7] > 30
}

// MachO matches Mach-O binaries format
func MachO(in []byte) bool {
	return classOrMachO(in) && in[7] < 20
}
```

This approach relies on testing the 8th byte, or in other words, for fat Mach-O

```
struct fat_header {
	uint32_t	magic;		/* FAT_MAGIC */
	uint32_t	nfat_arch;	/* number of structs that follow */
};
```

the low byte of `nfat_arch`, and so far I can't find any reasonable explanation for why the counter is limited to 20. For a Java class, that's the least significant byte of the uint16_t `major` version, and it also makes no sense why 30 is the threshold. These seem like weird empirical values to me.

To test for Mach-O universal, we could look for 0xCAFEBABE or 0xCAFEBABF, get the offset of the first Mach-O from the first struct, and verify that it's a Mach-O. Does Tika's XML allow "read a uint, then read a second uint at the location given by the first"?

> ExecutableParser: support MachO
> -------------------------------
>
>                 Key: TIKA-4309
>                 URL: https://issues.apache.org/jira/browse/TIKA-4309
>             Project: Tika
>          Issue Type: New Feature
>            Reporter: Alexey Pelykh
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
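
For illustration, the universal-binary check proposed in the comment (match the fat magic, read the first architecture's offset, confirm a thin Mach-O magic there) could be sketched in Go roughly as below. This is only a sketch under assumptions: `isThinMachO` and `isFatMachO` are hypothetical helper names, not part of Tika or the linked mimetype library, and the field offsets assume the `fat_arch` / `fat_arch_64` layouts from `<mach-o/fat.h>`, whose fields are stored big-endian.

```go
package main

import "encoding/binary"

// Magic numbers as defined in <mach-o/fat.h> and <mach-o/loader.h>.
const (
	fatMagic   = 0xCAFEBABE // FAT_MAGIC (also the Java class-file magic)
	fatMagic64 = 0xCAFEBABF // FAT_MAGIC_64
	mhMagic    = 0xFEEDFACE // MH_MAGIC
	mhMagic64  = 0xFEEDFACF // MH_MAGIC_64
	mhCigam    = 0xCEFAEDFE // MH_CIGAM (byte-swapped)
	mhCigam64  = 0xCFFAEDFE // MH_CIGAM_64 (byte-swapped)
)

// isThinMachO (hypothetical helper) reports whether b starts with a
// single-architecture Mach-O magic in either byte order.
func isThinMachO(b []byte) bool {
	if len(b) < 4 {
		return false
	}
	switch binary.BigEndian.Uint32(b) {
	case mhMagic, mhMagic64, mhCigam, mhCigam64:
		return true
	}
	return false
}

// isFatMachO (hypothetical helper) follows the first fat_arch entry and
// checks that a thin Mach-O header sits at the offset that entry names.
func isFatMachO(in []byte) bool {
	if len(in) < 8 {
		return false
	}
	var offset uint64
	switch binary.BigEndian.Uint32(in) {
	case fatMagic:
		// fat_arch: cputype, cpusubtype, offset (uint32), size, align.
		if len(in) < 8+20 {
			return false
		}
		offset = uint64(binary.BigEndian.Uint32(in[16:]))
	case fatMagic64:
		// fat_arch_64: cputype, cpusubtype, offset (uint64), size, align, reserved.
		if len(in) < 8+32 {
			return false
		}
		offset = binary.BigEndian.Uint64(in[16:])
	default:
		return false
	}
	// nfat_arch must announce at least one architecture.
	if binary.BigEndian.Uint32(in[4:]) == 0 {
		return false
	}
	// The offset must land inside the bytes we actually have.
	if offset > uint64(len(in)) {
		return false
	}
	return isThinMachO(in[offset:])
}
```

One caveat with this approach: it needs the bytes at the first architecture's offset, which may lie well beyond the short prefix a magic-based detector usually inspects, so it may not map directly onto a fixed-offset magic rule.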