Hi,
Unfortunately, audio and video are what MH is all about. As I mentioned,
I'm not sure that Tika is being used correctly for mimetype detection.
I downloaded the runnable jar file for both 1.1 and the current 1.2. Both
seem to handle the mpg file that demonstrated the original issue without
a problem:
modules/matterhorn-workflow-service-impl/src/main/resources/sample/camera.mpg
I will try to play around with the code to see if I can get Tika to
work in the test case.
Regards
James
On 12/04/2012 08:04 AM, Lukas Rohner wrote:
Hi,
I was probably the last person working with Tika and mime type
detection. I improved the mime type detection a bit and updated Tika to
version 1.1. With version 1.1, Tika introduced an OSGi library. Because
of that, it looks a bit strange to use in the code with this service
dependency, but in fact it's better than before: we no longer have to
include Tika in every bundle.
Tika is a really good library for mime type detection, but
unfortunately not for audio and video at the moment, which is why
Matterhorn ships its own mime type detection. And yes, there are
other mime type detection libraries that could be used. So the best
approach may be to use several of the libraries together and wrap them
in a Util class, because at the moment it's not clear which mime type
detection is used where in the code.
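The idea of wrapping several detectors in one Util class could look
roughly like the sketch below. It is only an illustration: the class
name MimeTypeDetectors, the detect() method and the two stand-in
detectors are assumptions made for this example, not existing
Matterhorn or Tika code. Each detector is tried in turn, and a generic
application/octet-stream answer is treated as "don't know" so that the
next detector gets a chance.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical utility: tries several mime type detectors in order. */
final class MimeTypeDetectors {
    static final String UNKNOWN = "application/octet-stream";

    private final List<Function<String, Optional<String>>> detectors;

    @SafeVarargs
    MimeTypeDetectors(Function<String, Optional<String>>... detectors) {
        this.detectors = Arrays.asList(detectors);
    }

    /** Returns the first specific answer; empty or generic answers are skipped. */
    String detect(String fileName) {
        for (Function<String, Optional<String>> detector : detectors) {
            Optional<String> type = detector.apply(fileName);
            if (type.isPresent() && !UNKNOWN.equals(type.get())) {
                return type.get();
            }
        }
        return UNKNOWN;
    }

    public static void main(String[] args) {
        // Stand-ins for real detectors (assumed): a content-based one that
        // gives up, followed by a simple extension-based one.
        Function<String, Optional<String>> contentBased =
            name -> Optional.of(UNKNOWN);
        Function<String, Optional<String>> extensionBased =
            name -> name.endsWith(".mpg")
                ? Optional.of("video/mpeg")
                : Optional.<String>empty();

        MimeTypeDetectors util = new MimeTypeDetectors(contentBased, extensionBased);
        System.out.println(util.detect("camera.mpg")); // video/mpeg
        System.out.println(util.detect("notes.xyz"));  // application/octet-stream
    }
}
```

Keeping the order of detectors in one place would also make it clear
which detection is used where.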
Lukas Rohner
On 29.11.2012 at 10:19, Rubén Pérez <[email protected]> wrote:
I have just checked this page:
<http://www.rgagnon.com/javadetails/java-0487.html>. From what I'm
seeing, it seems to me that using Tika *only* for MimeType detection
is like "killing flies with a cannon". The "real" metadata detection
is done with MediaInfo, which is one of the best metadata detection
tools, and its only flaw (for our purposes) is that it does not return
MimeTypes.
If I were you, I would switch to a library with fewer dependencies that
is more specific than Tika. The activation .jar is already used in
Matterhorn, and the "fileNameMap" method seems to be Java's native
approach to the issue. On the other hand, the JMimeMagic or mime-util
approaches seem lightweight, efficient and to the point. Of course, I
haven't tested them myself; it's just a general impression.
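For reference, the "fileNameMap" approach is part of the JDK
(java.net.URLConnection.getFileNameMap()). A minimal sketch, assuming
that a lookup by file name alone is acceptable; the class name
FileNameMapExample and the fallback to application/octet-stream are
choices made for this example:

```java
import java.net.FileNameMap;
import java.net.URLConnection;

final class FileNameMapExample {
    /** Looks the file name up in the JDK's built-in table, with a generic fallback. */
    static String guessByName(String fileName) {
        FileNameMap map = URLConnection.getFileNameMap();
        String type = map.getContentTypeFor(fileName);
        // getContentTypeFor returns null when the extension is not in the table.
        return type != null ? type : "application/octet-stream";
    }

    public static void main(String[] args) {
        System.out.println(guessByName("camera.mpg"));
        System.out.println(guessByName("file.zzzunknown"));
    }
}
```

Note that this looks only at the file name, never at the file content,
and the result depends on the table shipped with the JDK in use.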
Anything that can help us to prune the thick bush of dependencies that
Matterhorn already has is a really good thing. But, of course, this is
only my opinion.
Rubén Pérez
TELTEK Video Research
www.teltek.es
2012/11/28 James Perrin <[email protected]>
Hi,
I had a look at the following issue about MPEG videos not being
correctly identified: http://opencast.jira.com/browse/MH-8288
Though the immediate solution was quite simple, it raised some
questions about whether mimetype identification is being done
correctly and whether it needs reviewing. I've no experience in this
area, so please correct me if I'm wrong.
The MediaInspectionServiceImpl is meant to make use of Apache Tika
for the initial inspection of files. I don't know anything about Tika,
but it seemed to attempt to get the mimetype in a rather odd way.
The extractContentType() function passes the input file as a stream to
a Tika parser, which returns a metadata object; the mimetype is then
obtained by querying the Content-Type HTTP header in that metadata.
OK, that may work.
However, in inspectTrack(), which calls extractContentType(), there
is a comment saying the library doesn't detect audio and video
metadata!? Indeed, in the issue I was looking at, it returned
application/octet-stream.
The code then falls back to Opencast's own MimeType class, which
matches the mimetype by file extension (this is where the original
problem was, with the extension associated with multiple mimetypes).
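To illustrate the ambiguity, here is a small sketch of an extension
table in which one extension maps to more than one mimetype. It is not
Opencast's MimeType class; the class ExtensionTable and the entries
registered for "mpg" are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical extension table; not the actual Opencast MimeType class. */
final class ExtensionTable {
    private final Map<String, List<String>> typesByExtension = new HashMap<>();

    void register(String extension, String... mimeTypes) {
        typesByExtension.put(extension.toLowerCase(Locale.ROOT), Arrays.asList(mimeTypes));
    }

    /** All mimetypes registered for the file's extension, in registration order. */
    List<String> candidates(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return Collections.emptyList();
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return typesByExtension.getOrDefault(ext, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        ExtensionTable table = new ExtensionTable();
        // Assumed entries, for illustration only.
        table.register("mpg", "audio/mpeg", "video/mpeg");
        // Two candidates come back: the extension alone cannot tell
        // an audio file from a video file.
        System.out.println(table.candidates("camera.mpg")); // [audio/mpeg, video/mpeg]
    }
}
```

Whenever more than one candidate comes back, something other than the
extension (file content, or track information from the inspection) has
to make the final choice.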
This may be one way of using Tika, but there is a more direct method
using Tika's MimeTypes class. It looks as though the Tika library should
be quite capable of detecting the mimetype correctly from the file.
Could we just replace the Opencast mimetype[s] classes altogether?
Regards
James
--
------------------------------------------------------------------------
James S. Perrin
Media Technologies Team
Devonshire House, University Precinct
The University of Manchester
Oxford Road, Manchester, M13 9PL
t: +44 (0) 161 275 6945
e: [email protected]
w: www.manchester.ac.uk/researchcomputing
------------------------------------------------------------------------
"The test of intellect is the refusal to belabour the obvious"
- Alfred Bester
------------------------------------------------------------------------
_______________________________________________
Matterhorn mailing list
[email protected]
http://lists.opencastproject.org/mailman/listinfo/matterhorn
To unsubscribe please email
matterhorn-unsubscribe@opencastproject.org
_______________________________________________