Gregory Lepore created TIKA-4060:
------------------------------------
Summary: Add magic to audio/aac in tika-mimetypes.xml
Key: TIKA-4060
URL: https://issues.apache.org/jira/browse/TIKA-4060
Project: Tika
Issue Type: Sub-task
Reporter: Gregory Lepore
Attachments:
067aece423d8694a891a61a45ac0e870914bc1314ef510ac40b36ca3397843ef,
cb1bec08898db7a733b42ac44bdd76b6177cd3a07a2435a83fd99b7453d564d1
Currently tika-mimetypes.xml recognizes audio/aac files only by their file
extension. PRONOM recently added support for identifying AAC files, but the
signature is tricky. There are two signatures, shown below in PRONOM format.
Curly braces give the range of bytes that may be skipped before the subsequent
pattern.
The first pattern is fairly basic. The second is the first pattern preceded by
an ID3 header of up to 2048 bytes (the "ID3" marker plus a gap of 0 to 2045
bytes).
||Name|Audio Data Transport Stream sig.1|
||Description|An FF pattern from BOF with variation of byte stream|
||Byte sequences|
||Position type|Absolute from BOF|
||Offset|0|
||Maximum Offset|0|
||Byte order| |
||Value|FF(F0\|F1\|F8\|F9)(40\|41\|44\|45\|48\|49\|4C\|4D\|50\|51\|54\|55\|58\|59\|5C\|5D\|60\|61\|64\|65\|68\|69\|6C\|6D\|70\|71\|80\|81\|84\|85\|88\|89\|8C\|8D\|90\|91\|94\|95\|98\|99\|9C\|9D\|A0\|A1\|A4\|A5\|A8\|A9\|AC\|AD\|B0\|B1)(00\|01\|20\|40\|41\|60\|80\|81\|60\|A0\|C0\|C1\|E0)|
|
||Name|Audio Data Transport Stream sig.2|
||Description|ID3 tag variation with variable byte stream|
||Byte sequences|
||Position type|Absolute from BOF|
||Offset|0|
||Maximum Offset|0|
||Byte order| |
||Value|494433\{0-2045}FF(F0\|F1\|F8\|F9)(40\|41\|44\|45\|48\|49\|4C\|4D\|50\|51\|54\|55\|58\|59\|5C\|5D\|60\|61\|64\|65\|68\|69\|6C\|6D\|70\|71\|80\|81\|84\|85\|88\|89\|8C\|8D\|90\|91\|94\|95\|98\|99\|9C\|9D\|A0\|A1\|A4\|A5\|A8\|A9\|AC\|AD\|B0\|B1)(00\|01\|20\|40\|41\|60\|80\|81\|60\|A0\|C0\|C1\|E0)|
|
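
As a rough illustration of what the two signatures above match, here is a minimal Python sketch that translates them into byte regexes. It is not Tika or PRONOM code: the names byte_class and looks_like_adts are made up for this example, and the duplicated 60 in the last byte group is listed only once.

```python
import re

# Byte alternatives copied from the "Value" rows of the two signatures.
BYTE2 = "F0 F1 F8 F9"
BYTE3 = ("40 41 44 45 48 49 4C 4D 50 51 54 55 58 59 5C 5D "
         "60 61 64 65 68 69 6C 6D 70 71 80 81 84 85 88 89 8C 8D "
         "90 91 94 95 98 99 9C 9D A0 A1 A4 A5 A8 A9 AC AD B0 B1")
BYTE4 = "00 01 20 40 41 60 80 81 A0 C0 C1 E0"


def byte_class(hex_values):
    """Build a regex character class matching any one of the listed bytes."""
    escapes = "".join("\\x" + h for h in hex_values.split())
    return ("[" + escapes + "]").encode("ascii")


# FF followed by one byte from each of the three groups.
ADTS = b"\\xFF" + byte_class(BYTE2) + byte_class(BYTE3) + byte_class(BYTE4)

# sig.1: the pattern at offset 0.
SIG1 = re.compile(ADTS)
# sig.2: "ID3" (49 44 33), a gap of 0 to 2045 arbitrary bytes, then the pattern.
SIG2 = re.compile(b"ID3.{0,2045}?" + ADTS, re.DOTALL)


def looks_like_adts(data):
    """Return True if data matches either signature, anchored at the start."""
    # 3 bytes of "ID3" + 2045 gap bytes + 4 pattern bytes = 2052 bytes at most.
    head = data[:2052]
    return bool(SIG1.match(head) or SIG2.match(head))
```

For example, looks_like_adts(bytes.fromhex("FFF15080")) matches sig.1, and the same four bytes placed after b"ID3" and some padding match sig.2 as long as the padding is no longer than 2045 bytes.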
--
This message was sent by Atlassian Jira
(v8.20.10#820010)