alexey-pelykh commented on PR #1947: URL: https://github.com/apache/tika/pull/1947#issuecomment-2363927825
([source](https://github.com/duderman/mimetype/blob/d3851d6e3e6b3a0f40c3b56d7fe38f01d0938f90/internal/matchers/binary.go#L19-L27)))

```go
// Class matches a java class file.
func Class(in []byte) bool {
	return classOrMachO(in) && in[7] > 30
}

// MachO matches Mach-O binaries format
func MachO(in []byte) bool {
	return classOrMachO(in) && in[7] < 20
}
```

This approach relies on testing the 8th byte. For a fat Mach-O,

```
struct fat_header {
	uint32_t magic;     /* FAT_MAGIC */
	uint32_t nfat_arch; /* number of structs that follow */
};
```

that byte is the least significant byte of `nfat_arch`, and so far I can't find any reasonable explanation for why the counter would be limited to 20. For a Java class file, it is the least significant byte of the `uint16_t` `major` version, and it also makes no sense to me why 30 is the threshold. These look like odd empirical values to me.

To test for a Mach-O universal binary, we could look for 0xCAFEBABE or 0xCAFEBABF, get the offset of the first Mach-O from the first struct, and verify that it's a Mach-O. Does Tika's XML allow "read a uint, then read a second value at the location the first one points to"?