Gregory Lepore created TIKA-4054:
------------------------------------
Summary: Add various file identifications to reduce
application/octet-stream
Key: TIKA-4054
URL: https://issues.apache.org/jira/browse/TIKA-4054
Project: Tika
Issue Type: Sub-task
Reporter: Gregory Lepore
Catch-all task for adding identification data for various formats that are
currently identified as application/octet-stream. Most of the data comes from PRONOM.
SPSS Data File
||External signatures|File extension: sav|
||Internal signatures|
||Name|SPSS Data File|
||Description|BOF: $FL2@(#)|
||Byte sequences|
||Position type|Absolute from BOF|
||Offset|0|
||Maximum Offset|0|
||Byte order| |
||Value|24464C3240282329|
|
|
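As a rough illustration of how the SPSS signature above could be checked, here is a minimal Python sketch. The name `is_spss_sav` is hypothetical and not part of Tika; the function only compares the first eight bytes against the PRONOM value.

```python
# Hypothetical helper, not part of Tika: checks the PRONOM byte sequence
# 24464C3240282329 ("$FL2@(#)") at absolute offset 0.
SPSS_MAGIC = bytes.fromhex("24464C3240282329")


def is_spss_sav(data: bytes) -> bool:
    """Return True if data begins with the SPSS Data File signature."""
    return data.startswith(SPSS_MAGIC)
```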
Amiga Disk File
||External signatures|File extension: adf|
||Internal signatures|
||Name|Amiga Disk File|
||Description|BOF: ‘DOS’ followed by ‘00\|01\|02\|03\|04\|05\|06\|07’ depending
on the format of the disk. More information on the internal signature can be
found here: http://lclevy.free.fr/adflib/adf_info.html#p41|
||Byte sequences|
||Position type|Absolute from BOF|
||Offset|0|
||Maximum Offset|0|
||Byte order| |
||Value|444F53(00\|01\|02\|03\|04\|05\|06\|07)|
|
|
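The alternation in the Amiga Disk File value can be read as "the fourth byte is in the range 0x00 to 0x07". A minimal Python sketch under that reading follows; `adf_format_byte` is a hypothetical name used only for illustration.

```python
# Hypothetical helper: matches 444F53(00|01|02|03|04|05|06|07) at offset 0.
ADF_PREFIX = b"DOS"  # 44 4F 53


def adf_format_byte(data: bytes):
    """Return the fourth byte (0..7) if the signature matches, else None."""
    if len(data) >= 4 and data[:3] == ADF_PREFIX and data[3] <= 0x07:
        return data[3]
    return None
```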
JEOL NMR Spectroscopy
||External signatures|File extension: jdf|
||Internal signatures| |
||Name|JDF NMR Spectroscopy big endian|
||Description|Big Endian: BOF: 4A454F4C2E4E4D52 (JEOL.NMR)|
||Byte sequences|
|
||Position type|Absolute from BOF|
||Offset|0|
||Maximum Offset|0|
||Byte order| |
||Value|4A454F4C2E4E4D52|
| | |
||Name|JDF little endian|
||Description|Little Endian: 524D4E2E4C4F454A (RMN.LOEJ)|
||Byte sequences| |
||Position type|Absolute from BOF|
||Offset|0|
||Maximum Offset|0|
||Byte order| |
||Value|524D4E2E4C4F454A|
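The two JEOL signatures are byte-reversed copies of each other, so one check can report the byte order as well. A minimal Python sketch, with `jdf_byte_order` as a hypothetical name:

```python
# Hypothetical helper: distinguishes the two JEOL NMR signatures at offset 0.
JEOL_BIG = bytes.fromhex("4A454F4C2E4E4D52")     # "JEOL.NMR"
JEOL_LITTLE = bytes.fromhex("524D4E2E4C4F454A")  # "RMN.LOEJ"


def jdf_byte_order(data: bytes):
    """Return "big", "little", or None if neither signature matches."""
    if data.startswith(JEOL_BIG):
        return "big"
    if data.startswith(JEOL_LITTLE):
        return "little"
    return None
```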
ASPRS Lidar Data Exchange Format v1.2
||External signatures|File extension: las
File extension: laz|
||Internal signatures|
||Name|ASPRS Lidar Data Exchange Format 1.2|
||Description|ASCII header: LASF, followed after 20 bytes by version number 1.2|
||Byte sequences|
||Position type|Absolute from BOF|
||Offset|0|
||Byte order| |
||Value|4C415346\{20}0102\{78}[00:99]|
|
|
ASPRS Lidar Data Exchange Format v1.1
||External signatures|File extension: las
File extension: laz|
||Internal signatures|
||Name|ASPRS Lidar Data Exchange Format 1.1|
||Description|ASCII header: LASF, followed after 20 bytes by version number 1.1|
||Byte sequences|
||Position type|Absolute from BOF|
||Offset|0|
||Byte order| |
||Value|4C415346\{20}0101\{78}[00:99]|
|
|
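The two LAS values differ only in the minor version byte, so both can be covered by one pattern. The sketch below translates the PRONOM syntax into a Python bytes regex, assuming `{n}` means "skip n arbitrary bytes" and `[00:99]` means "one byte in the inclusive range 0x00 to 0x99". The name `las_version` is hypothetical.

```python
import re

# LASF, 20 arbitrary bytes, major version 01, minor version 01 or 02,
# 78 arbitrary bytes, then one byte in the range 0x00..0x99.
LAS_PATTERN = re.compile(
    rb"LASF.{20}\x01([\x01\x02]).{78}[\x00-\x99]", re.DOTALL
)


def las_version(data: bytes):
    """Return "1.1" or "1.2" if the signature matches, else None."""
    m = LAS_PATTERN.match(data)
    if m is None:
        return None
    return f"1.{m.group(1)[0]}"
```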
3D Studio
||External signatures|File extension: 3ds|
||Internal signatures|
||Name|3D Studio (V1)|
||Description|Primary chunk ID, chunk length, version subchunk ID, chunk
length, version, 3D-editor chunk ID.|
||Byte sequences|
||Position type|Absolute from BOF|
||Offset|0|
||Byte order|Little-endian|
||Value|4D4D\{4}02000A000000(03\|04)\{3}3D3D|
|
||Name|3D Studio (V2)|
||Description|Primary chunk ID, chunk length, 3D-editor chunk ID|
||Byte sequences|
||Position type|Absolute from BOF|
||Offset|0|
||Maximum Offset|0|
||Byte order| |
||Value|4D4D\{4}3D3D|
|
|
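Because the 3D Studio signatures describe little-endian chunk headers, they can also be checked by unpacking fields rather than comparing raw bytes. The following Python sketch assumes the layout given in the descriptions above (2-byte chunk ID, 4-byte chunk length); `identify_3ds` is a hypothetical name.

```python
import struct


def identify_3ds(data: bytes):
    """Return "V1", "V2", or None, following the two signatures above."""
    if len(data) < 8:
        return None
    # Primary chunk ID (4D4D), its 4-byte length, and the next chunk ID.
    main_id, _main_len, next_id = struct.unpack_from("<HIH", data, 0)
    if main_id != 0x4D4D:
        return None
    if next_id == 0x3D3D:
        return "V2"  # 4D4D{4}3D3D
    if next_id == 0x0002 and len(data) >= 18:
        # Version subchunk: length 10, version byte 03 or 04 followed by
        # 3 skipped bytes, then the 3D-editor chunk ID (3D3D).
        sub_len, version, editor_id = struct.unpack_from("<IB3xH", data, 8)
        if sub_len == 10 and version in (3, 4) and editor_id == 0x3D3D:
            return "V1"  # 4D4D{4}02000A000000(03|04){3}3D3D
    return None
```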
TAP (ZX Spectrum)
||External signatures|File extension: tap|
||Internal signatures|
||Name|TAPZX|
||Description|BOF: 130000, followed after 20 bytes by FF|
||Byte sequences|
||Position type|Absolute from BOF|
||Offset|0|
||Maximum Offset|0|
||Byte order| |
||Value|130000\{20}FF|
|
|
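The TAP value fixes the first three bytes and one byte after a 20-byte gap, which puts the final FF at offset 23 (3 + 20). A minimal Python sketch of that check, with `is_zx_tap` as a hypothetical name:

```python
# Hypothetical helper: matches 130000{20}FF at absolute offset 0.
def is_zx_tap(data: bytes) -> bool:
    """Return True if bytes 0..2 are 13 00 00 and byte 23 is FF."""
    return (
        len(data) >= 24
        and data[:3] == b"\x13\x00\x00"
        and data[23] == 0xFF
    )
```

Since only four bytes are actually constrained, a signature like this is probably best paired with the `tap` extension when deciding on a type.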
Sibelius
||External signatures|File extension: sib|
||Internal signatures|
||Name|Sibelius|
||Description|Absolute from beginning of file, magic bytes: 0x0F followed by SIBELIUS|
||Byte sequences|
||Position type|Absolute from BOF|
||Offset|0|
||Maximum Offset|0|
||Byte order| |
||Value|0F534942454C495553|
|
|
Portable Sound Format
||External signatures|File extension: psf
File extension: psf1
File extension: psflib
File extension: minipsf
File extension: minipsf1
File extension: gsf
File extension: gsflib
File extension: minigsf|
||Internal signatures|
||Name|Portable Sound Format|
||Description|BOF: PSFx, where x (the 4th byte) is one of the following values,
indicating the platform for which PSF has been adapted: 0x01: PlayStation (PSF1);
0x02: PlayStation 2 (PSF2); 0x11: Sega Saturn (SSF); 0x12: Sega Dreamcast (DSF);
0x13: Sega Genesis; 0x21: Nintendo 64 (USF); 0x22: GameBoy Advance (GSF);
0x23: Super NES (SNSF); 0x41: Capcom QSound (QSF). Format description:
http://web.archive.org/web/20140125155137/http://wiki.neillcorlett.com/PSFFormat|
||Byte sequences|
||Position type|Absolute from BOF|
||Offset|0|
||Maximum Offset|0|
||Byte order| |
||Value|505346(01\|02\|11\|12\|13\|21\|22\|23\|41)|
|
|
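For the Portable Sound Format, the fourth byte both completes the signature and names the target platform. The Python sketch below uses the byte values listed in the description above; `psf_platform` and `PSF_PLATFORMS` are hypothetical names.

```python
# Platform codes as listed in the PRONOM description above.
PSF_PLATFORMS = {
    0x01: "PlayStation (PSF1)",
    0x02: "PlayStation 2 (PSF2)",
    0x11: "Sega Saturn (SSF)",
    0x12: "Sega Dreamcast (DSF)",
    0x13: "Sega Genesis",
    0x21: "Nintendo 64 (USF)",
    0x22: "GameBoy Advance (GSF)",
    0x23: "Super NES (SNSF)",
    0x41: "Capcom QSound (QSF)",
}


def psf_platform(data: bytes):
    """Return the platform name if data matches 505346(..), else None."""
    if len(data) >= 4 and data[:3] == b"PSF":
        return PSF_PLATFORMS.get(data[3])
    return None
```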
--
This message was sent by Atlassian Jira
(v8.20.10#820010)