alexey-pelykh commented on PR #1947:
URL: https://github.com/apache/tika/pull/1947#issuecomment-2363832999
> Y, that makes sense. If we treat it as a container, file though, we could make up/find a mime type for fatmachO (application/x-fat-mach-o I just made that up...is there an existing mime type?) and then the attachments/embedded files would have their own correct mime types.

I've only seen `application/x-mach-binary` and `application/x-mach-o`; however, I'd be more than happy to come up with more definite ones, like the ELF ones. Mach-O can also be a shared lib, an executable, etc. So technically we should be using `application/x-mach-o-executable`, `application/x-mach-o-sharedlib` and so on. For Fat I've seen `application/x-mach-universal`, but following the proposed structure it would be `application/x-mach-o-universal`.

> If at all possible, we should try to use magic to distinguish fat machO from the other cafebabes.

The structure of fat Mach-O is quite [vague](https://book.hacktricks.xyz/macos-hardening/macos-security-and-privilege-escalation/macos-files-folders-and-binaries/universal-binaries-and-mach-o-format#fat-header) (see also [this](https://opensource.apple.com/source/xnu/xnu-123.5/EXTERNAL_HEADERS/mach-o/fat.h)), so only deep validation in code can help. Ideally I'd give ExecutableParser priority and, if it fails, try the other parsers whose magic matches `cafebabe`.

> It is tricky if you're new to Tika. :)

That I've noticed :)

> I can try to help if you can create the skeleton for this file type

I'd gladly do so, the container is quite simple. It's 0xCAFEBABE + a uint32_t number of headers, and every header just contains cpu/arch/type flags.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
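
For illustration, here is a rough sketch of the "deep validation" idea from the comment above. It is not Tika code. It assumes the big-endian layout from the linked `fat.h` (a `fat_header` with the magic and an arch count, followed by `fat_arch` entries holding `cputype`, `cpusubtype`, `offset`, `size`, `align`) and that the whole file is in memory. The class name `FatMachOSketch`, the `parse` method, and the `MAX_PLAUSIBLE_ARCHS` cutoff are all hypothetical. The cutoff relies on the fact that a Java class file stores its minor/major version right after `0xCAFEBABE`, so the same four bytes read as an arch count come out as 45 or more.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

final class FatMachOSketch {
    static final int FAT_MAGIC = 0xCAFEBABE;
    static final int FAT_HEADER_SIZE = 8;   // magic + nfat_arch
    static final int FAT_ARCH_SIZE = 20;    // five 32-bit fields per entry
    // Hypothetical cutoff: Java class files carry a major version >= 45 here.
    static final int MAX_PLAUSIBLE_ARCHS = 30;

    /** One fat_arch entry, with unsigned fields widened to long. */
    static final class FatArch {
        final int cpuType;
        final int cpuSubtype;
        final long offset;
        final long size;
        final long align;

        FatArch(int cpuType, int cpuSubtype, long offset, long size, long align) {
            this.cpuType = cpuType;
            this.cpuSubtype = cpuSubtype;
            this.offset = offset;
            this.size = size;
            this.align = align;
        }
    }

    /**
     * Returns the arch entries if the bytes look like a fat Mach-O file,
     * or null so the caller can fall back to other cafebabe candidates.
     */
    static List<FatArch> parse(byte[] data) {
        if (data.length < FAT_HEADER_SIZE) {
            return null;
        }
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN);
        if (buf.getInt() != FAT_MAGIC) {
            return null;
        }
        long count = buf.getInt() & 0xFFFFFFFFL;
        if (count == 0 || count > MAX_PLAUSIBLE_ARCHS) {
            return null;
        }
        if (data.length < FAT_HEADER_SIZE + count * FAT_ARCH_SIZE) {
            return null;
        }
        List<FatArch> archs = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int cpuType = buf.getInt();
            int cpuSubtype = buf.getInt();
            long offset = buf.getInt() & 0xFFFFFFFFL;
            long size = buf.getInt() & 0xFFFFFFFFL;
            long align = buf.getInt() & 0xFFFFFFFFL;
            // Every embedded slice must lie inside the file.
            if (offset + size > data.length) {
                return null;
            }
            archs.add(new FatArch(cpuType, cpuSubtype, offset, size, align));
        }
        return archs;
    }
}
```

A `null` result would be the signal to try the next `cafebabe` candidate, such as the Java class parser. Each returned entry gives the offset and size of one embedded Mach-O slice, which could then be handed off as an embedded document with its own mime type.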