alexey-pelykh commented on PR #1947:
URL: https://github.com/apache/tika/pull/1947#issuecomment-2363927825

   
([source](https://github.com/duderman/mimetype/blob/d3851d6e3e6b3a0f40c3b56d7fe38f01d0938f90/internal/matchers/binary.go#L19-L27)))
 
   ```go
   // Class matches a java class file.
   func Class(in []byte) bool {
        return classOrMachO(in) && in[7] > 30
   }
   
   // MachO matches Mach-O binaries format
   func MachO(in []byte) bool {
        return classOrMachO(in) && in[7] < 20
   }
   ```
   
   This approach relies on testing the 8th byte (`in[7]`), or in other words,
for fat Mach-O
   ```c
   struct fat_header {
        uint32_t        magic;          /* FAT_MAGIC */
        uint32_t        nfat_arch;      /* number of structs that follow */
   };
   ```
   the least significant byte of `nfat_arch`, and so far I can't find any
reasonable explanation for why the counter is limited to 20. For a Java class,
that's the least significant byte of the uint16_t `major` version, and it also
makes no sense why 30 is the threshold. These seem like odd empirical values
to me.
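
To make the overlap concrete, here is a minimal sketch (not taken from the linked library) that decodes bytes 4..7 of a `0xCAFEBABE`-prefixed buffer both ways. The function name `headerFields` and the two sample headers are hypothetical; the layouts assumed are the class file's `minor_version`/`major_version` pair and the big-endian `fat_header` quoted above.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// headerFields (hypothetical helper) decodes the bytes after the 0xCAFEBABE
// magic under both interpretations:
//   - classMajor: Java class major version, bytes 6-7, big-endian
//   - nfatArch:   fat_header.nfat_arch, bytes 4-7, big-endian
// In both cases in[7] is the least significant byte of the field.
func headerFields(in []byte) (classMajor uint16, nfatArch uint32, ok bool) {
	if len(in) < 8 || binary.BigEndian.Uint32(in[0:4]) != 0xCAFEBABE {
		return 0, 0, false
	}
	return binary.BigEndian.Uint16(in[6:8]), binary.BigEndian.Uint32(in[4:8]), true
}

func main() {
	// Synthetic sample: class file header with minor=0, major=52.
	class := []byte{0xCA, 0xFE, 0xBA, 0xBE, 0x00, 0x00, 0x00, 0x34}
	// Synthetic sample: fat header announcing 2 architectures.
	fat := []byte{0xCA, 0xFE, 0xBA, 0xBE, 0x00, 0x00, 0x00, 0x02}

	for _, b := range [][]byte{class, fat} {
		major, n, _ := headerFields(b)
		fmt.Printf("in[7]=%d major=%d nfat_arch=%d\n", b[7], major, n)
	}
}
```

The first 8 bytes alone cannot tell the two formats apart; any threshold on `in[7]` is a heuristic about which values are plausible for each field.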
   
   To test for a Mach-O universal binary, we could look for 0xCAFEBABE or
0xCAFEBABF, get the offset of the first Mach-O from the first struct, and
verify that it's a Mach-O. Does Tika's XML allow "read a uint, then read a
second value at the location the first one points to"?
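
The check described above could be sketched roughly as follows. This is an illustration under stated assumptions, not Tika's or the linked library's implementation: `looksLikeFatMachO` and `isThinMachOMagic` are hypothetical names, the fat header is assumed to be big-endian, and the first `fat_arch` entry is assumed to place its `offset` field at byte 16 (a 32-bit value for 0xCAFEBABE, a 64-bit value for 0xCAFEBABF).

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	fatMagic   = 0xCAFEBABE // fat_header followed by 32-bit fat_arch entries
	fatMagic64 = 0xCAFEBABF // fat_header followed by 64-bit fat_arch_64 entries
)

// isThinMachOMagic reports whether m is a thin Mach-O magic
// (32-bit or 64-bit, in either byte order).
func isThinMachOMagic(m uint32) bool {
	switch m {
	case 0xFEEDFACE, 0xFEEDFACF, 0xCEFAEDFE, 0xCFFAEDFE:
		return true
	}
	return false
}

// looksLikeFatMachO (hypothetical helper) reads the offset stored in the
// first arch entry and checks for a thin Mach-O magic at that offset.
func looksLikeFatMachO(in []byte) bool {
	if len(in) < 8 || binary.BigEndian.Uint32(in[4:8]) == 0 {
		return false // too short, or nfat_arch == 0
	}
	var off uint64
	switch binary.BigEndian.Uint32(in[0:4]) {
	case fatMagic:
		// cputype, cpusubtype, offset: uint32 each, so offset is at 16..20.
		if len(in) < 20 {
			return false
		}
		off = uint64(binary.BigEndian.Uint32(in[16:20]))
	case fatMagic64:
		// cputype, cpusubtype: uint32 each; offset: uint64 at 16..24.
		if len(in) < 24 {
			return false
		}
		off = binary.BigEndian.Uint64(in[16:24])
	default:
		return false
	}
	// The slice must contain the 4 magic bytes at off.
	if off > uint64(len(in)) || uint64(len(in))-off < 4 {
		return false
	}
	return isThinMachOMagic(binary.BigEndian.Uint32(in[off : off+4]))
}

func main() {
	// Synthetic fat binary: one arch whose slice starts at byte 32.
	fat := make([]byte, 36)
	binary.BigEndian.PutUint32(fat[0:], fatMagic)
	binary.BigEndian.PutUint32(fat[4:], 1)
	binary.BigEndian.PutUint32(fat[16:], 32)
	binary.BigEndian.PutUint32(fat[32:], 0xCFFAEDFE)

	// Synthetic class file prefix: major version 52, remaining bytes zero.
	class := make([]byte, 36)
	copy(class, []byte{0xCA, 0xFE, 0xBA, 0xBE, 0x00, 0x00, 0x00, 0x34})

	fmt.Println(looksLikeFatMachO(fat), looksLikeFatMachO(class))
}
```

One caveat with this approach: real universal binaries usually place the first slice at a large aligned offset, so a detector that only sees a short prefix of the file may not be able to follow it.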


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

