Gregory Lepore created TIKA-4060:
------------------------------------

             Summary: Add magic to audio/aac in tika-mimetypes.xml
                 Key: TIKA-4060
                 URL: https://issues.apache.org/jira/browse/TIKA-4060
             Project: Tika
          Issue Type: Sub-task
            Reporter: Gregory Lepore
         Attachments: 
067aece423d8694a891a61a45ac0e870914bc1314ef510ac40b36ca3397843ef, 
cb1bec08898db7a733b42ac44bdd76b6177cd3a07a2435a83fd99b7453d564d1

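Currently, tika-mimetypes.xml recognizes audio/aac files only by their file extension.
PRONOM recently added support for identifying AAC files, but the signature is
tricky. There are two signatures, given below in PRONOM format. In that format,
parentheses containing values separated by | list alternative byte values, and
curly braces (as in sig.2) mark a variable gap: the subsequent pattern may begin
anywhere between the two given numbers of bytes after the preceding one.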

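The first pattern is straightforward: a 4-byte ADTS frame header at offset 0.
The second pattern is the first pattern preceded by an ID3 tag: the file begins
with "ID3" (hex 494433), and the ADTS header may start anywhere in the next 2045
bytes, i.e. at any offset from 3 to 2048.
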
||Name|Audio Data Transport Stream sig.1|
||Description|An FF pattern from BOF with variation of byte stream|
||Byte sequences|
||Position type|Absolute from BOF|
||Offset|0|
||Maximum Offset|0|
||Byte order| |
||Value|FF(F0\|F1\|F8\|F9)(40\|41\|44\|45\|48\|49\|4C\|4D\|50\|51\|54\|55\|58\|59\|5C\|5D\|60\|61\|64\|65\|68\|69\|6C\|6D\|70\|71\|80\|81\|84\|85\|88\|89\|8C\|8D\|90\|91\|94\|95\|98\|99\|9C\|9D\|A0\|A1\|A4\|A5\|A8\|A9\|AC\|AD\|B0\|B1)(00\|01\|20\|40\|41\|60\|80\|81\|60\|A0\|C0\|C1\|E0)|
|
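A minimal Java sketch of what sig.1 checks, assuming the caller already has the first bytes of the file in a byte array. AdtsSig1 and matches() are hypothetical names, not Tika API; the actual change would be expressed as magic entries in tika-mimetypes.xml. The allowed values are copied from the Value row above (60 appears twice in the fourth-byte group and is listed once here).

```java
// Hypothetical helper, not part of Apache Tika: checks PRONOM
// "Audio Data Transport Stream sig.1" (a 4-byte ADTS header at offset 0).
final class AdtsSig1 {

    // Byte 2: alternatives from the PRONOM pattern (byte 1 must be 0xFF).
    private static final int[] BYTE2 = {0xF0, 0xF1, 0xF8, 0xF9};

    // Byte 3: alternatives from the PRONOM pattern (52 values).
    private static final int[] BYTE3 = {
        0x40, 0x41, 0x44, 0x45, 0x48, 0x49, 0x4C, 0x4D, 0x50, 0x51, 0x54, 0x55, 0x58,
        0x59, 0x5C, 0x5D, 0x60, 0x61, 0x64, 0x65, 0x68, 0x69, 0x6C, 0x6D, 0x70, 0x71,
        0x80, 0x81, 0x84, 0x85, 0x88, 0x89, 0x8C, 0x8D, 0x90, 0x91, 0x94, 0x95, 0x98,
        0x99, 0x9C, 0x9D, 0xA0, 0xA1, 0xA4, 0xA5, 0xA8, 0xA9, 0xAC, 0xAD, 0xB0, 0xB1
    };

    // Byte 4: alternatives from the PRONOM pattern (duplicate 0x60 listed once).
    private static final int[] BYTE4 = {
        0x00, 0x01, 0x20, 0x40, 0x41, 0x60, 0x80, 0x81, 0xA0, 0xC0, 0xC1, 0xE0
    };

    // True if the unsigned value of b is one of the allowed values.
    private static boolean oneOf(int[] allowed, byte b) {
        int v = b & 0xFF; // Java bytes are signed; compare as unsigned
        for (int a : allowed) {
            if (a == v) {
                return true;
            }
        }
        return false;
    }

    // True if data starts with a byte sequence matching sig.1.
    static boolean matches(byte[] data) {
        return data.length >= 4
            && (data[0] & 0xFF) == 0xFF
            && oneOf(BYTE2, data[1])
            && oneOf(BYTE3, data[2])
            && oneOf(BYTE4, data[3]);
    }

    public static void main(String[] args) {
        // ADTS header start: AAC LC, 44.1 kHz, stereo.
        byte[] adts = {(byte) 0xFF, (byte) 0xF1, (byte) 0x50, (byte) 0x80};
        // MPEG-1 Layer III (MP3) frame header start: FB is not an allowed byte 2.
        byte[] mp3 = {(byte) 0xFF, (byte) 0xFB, (byte) 0x90, (byte) 0x64};
        System.out.println(matches(adts)); // true
        System.out.println(matches(mp3));  // false
    }
}
```

Each group of alternatives maps to one array, so the sketch can be checked against the Value row value by value.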
||Name|Audio Data Transport Stream sig.2|
||Description|ID3 tag variation with variable byte stream|
||Byte sequences|
||Position type|Absolute from BOF|
||Offset|0|
||Maximum Offset|0|
||Byte order| |
||Value|494433\{0-2045}FF(F0\|F1\|F8\|F9)(40\|41\|44\|45\|48\|49\|4C\|4D\|50\|51\|54\|55\|58\|59\|5C\|5D\|60\|61\|64\|65\|68\|69\|6C\|6D\|70\|71\|80\|81\|84\|85\|88\|89\|8C\|8D\|90\|91\|94\|95\|98\|99\|9C\|9D\|A0\|A1\|A4\|A5\|A8\|A9\|AC\|AD\|B0\|B1)(00\|01\|20\|40\|41\|60\|80\|81\|60\|A0\|C0\|C1\|E0)|
|
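A similar Java sketch for sig.2, again with hypothetical names (AdtsSig2, adtsAt, matches) that are not Tika API. After the literal "ID3" (494433) at offset 0, it slides a 4-byte window over offsets 3 to 2048, which is what the 0 to 2045 byte gap allows. To keep it short, the byte 2 and byte 3 alternatives are written as bit tests; these accept exactly the same values as the PRONOM lists (byte 2: F0, F1, F8, F9; byte 3: top two bits 01 or 10, bits 5..2 at most 12, bit 1 clear).

```java
// Hypothetical helper, not part of Apache Tika: checks PRONOM
// "Audio Data Transport Stream sig.2": "ID3" at offset 0, then an
// ADTS header after a gap of 0 to 2045 bytes.
final class AdtsSig2 {

    private static final int ID3_LEN = 3;    // "ID3" = 49 44 33
    private static final int MAX_GAP = 2045; // from the 0-2045 gap in the pattern

    // Byte 4 alternatives from the PRONOM pattern (duplicate 0x60 listed once).
    private static final int[] BYTE4 = {
        0x00, 0x01, 0x20, 0x40, 0x41, 0x60, 0x80, 0x81, 0xA0, 0xC0, 0xC1, 0xE0
    };

    // True if the 4 bytes at off match the ADTS part of the pattern.
    static boolean adtsAt(byte[] d, int off) {
        if (off < 0 || off + 4 > d.length) {
            return false;
        }
        int b0 = d[off] & 0xFF, b1 = d[off + 1] & 0xFF;
        int b2 = d[off + 2] & 0xFF, b3 = d[off + 3] & 0xFF;
        // F0, F1, F8 or F9: only bits 3 and 0 may vary.
        boolean byte2 = (b1 & 0xF6) == 0xF0;
        // Same 52 values as the PRONOM list for byte 3.
        int top = b2 & 0xC0;
        boolean byte3 = (top == 0x40 || top == 0x80)
            && ((b2 >> 2) & 0x0F) <= 12
            && (b2 & 0x02) == 0;
        boolean byte4 = false;
        for (int v : BYTE4) {
            byte4 |= (v == b3);
        }
        return b0 == 0xFF && byte2 && byte3 && byte4;
    }

    // True if data starts with "ID3" and an ADTS header starts at offset 3..2048.
    static boolean matches(byte[] d) {
        if (d.length < ID3_LEN || d[0] != 'I' || d[1] != 'D' || d[2] != '3') {
            return false;
        }
        for (int off = ID3_LEN; off <= ID3_LEN + MAX_GAP && off + 4 <= d.length; off++) {
            if (adtsAt(d, off)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xFF, (byte) 0xF1, (byte) 0x50, (byte) 0x80};

        byte[] tagged = new byte[64];
        tagged[0] = 'I'; tagged[1] = 'D'; tagged[2] = '3';
        System.arraycopy(header, 0, tagged, 20, header.length);
        System.out.println(matches(tagged)); // true

        byte[] tooFar = new byte[4096];
        tooFar[0] = 'I'; tooFar[1] = 'D'; tooFar[2] = '3';
        System.arraycopy(header, 0, tooFar, 3000, header.length);
        System.out.println(matches(tooFar)); // false: header starts after offset 2048
    }
}
```

A bounded scan like this is cheap, but it also means a file whose ID3 tag runs past about 2 KB (for example, one with embedded cover art) will not match sig.2.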



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
