Hi,

Unfortunately, audio and video are what MH is about. As I mentioned, I'm not sure that Tika is being used correctly for mimetype detection.

I downloaded the runnable jar files for both 1.1 and the current 1.2. Both seem to handle the mpg file that demonstrated the original issue without a problem:

modules/matterhorn-workflow-service-impl/src/main/resources/sample/camera.mpg

I will try and play around with the code to see if I can get Tika to work in the test case.

Regards
James


On 12/04/2012 08:04 AM, Lukas Rohner wrote:
Hi,

I was probably the last person working with Tika and mime type
detection. I improved the mime type detection a bit and updated Tika to
version 1.1. With version 1.1, Tika introduced an OSGi library. Because
of that, it looks a bit strange in the code with this service
dependency, but in fact it's better than before: we no longer have to
include Tika in every bundle.

Tika is a really good library for mime type detection, but
unfortunately not for audio and video at the moment, which is why
Matterhorn ships its own mime type detection. And yes, there are several
other mime type detection libraries that could be used. So the best
approach may be to use several of the libraries together and wrap them
in a Util class, because at the moment it's not clear which mime type
detection is used where in the code.
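
A minimal sketch of the "several detectors behind one Util class" idea, using only the Java standard library. The class name MimeTypeUtil, the Detector interface and the small extension table are assumptions made for illustration and are not taken from Matterhorn; in practice each detector could delegate to Tika, MediaInfo or another library.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical utility: asks several mime type detectors in a fixed order. */
final class MimeTypeUtil {

    static final String UNKNOWN = "application/octet-stream";

    /** A detector sees the file name and the first bytes; empty means "no opinion". */
    interface Detector {
        Optional<String> detect(String fileName, byte[] head);
    }

    /** Content-based: an MPEG program stream begins with the pack start code 00 00 01 BA. */
    static final Detector MAGIC = (fileName, head) -> {
        boolean mpegPs = head.length >= 4 && head[0] == 0 && head[1] == 0
            && head[2] == 1 && (head[3] & 0xFF) == 0xBA;
        if (mpegPs) {
            return Optional.of("video/mpeg");
        }
        return Optional.empty();
    };

    /** Illustrative extension table only; a real one would be much larger. */
    static final Map<String, String> BY_EXTENSION =
        Map.of("mpg", "video/mpeg", "mp3", "audio/mpeg");

    /** Name-based fallback: looks at the extension after the last dot. */
    static final Detector EXTENSION = (fileName, head) -> {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(BY_EXTENSION.get(ext));
    };

    /** Order matters: content-based detection first, extension lookup second. */
    static final List<Detector> CHAIN = List.of(MAGIC, EXTENSION);

    /** Returns the first answer from the chain, or application/octet-stream. */
    static String detect(String fileName, byte[] head) {
        for (Detector detector : CHAIN) {
            Optional<String> type = detector.detect(fileName, head);
            if (type.isPresent()) {
                return type.get();
            }
        }
        return UNKNOWN;
    }
}
```

Having a single entry point like detect() would also make it obvious which detection strategy is used where, since every caller goes through the same chain.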

Lukas Rohner

On 29.11.2012 at 10:19, Rubén Pérez <[email protected]
<mailto:[email protected]>> wrote:

I have just checked THIS PAGE
<http://www.rgagnon.com/javadetails/java-0487.html>. From what I'm
seeing, it seems to me that using Tika *only* for MimeType detection
is like "killing flies with a cannon". The "real" metadata detection
is done with MediaInfo, which is one of the best metadata detection
tools; its only flaw (for what we want) is that it does not return
MimeTypes.

If I were you, I would switch to a library that has fewer dependencies
and is more specific than Tika. The activation .jar is already used in
Matterhorn, and the "fileNameMap" method seems to be Java's native
approach to the issue. On the other hand, the JMimeMagic and mime-util
approaches seem lightweight, efficient and to the point. Of course, I
haven't tested them myself; this is just my general impression.
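
For reference, a small sketch of what the "fileNameMap" approach looks like with only JDK classes. The class name FileNameMapExample and the helper guessByName are made up for this example; the lookup uses the JDK's built-in table and is based on the file name only, so it never inspects the file content.

```java
import java.net.FileNameMap;
import java.net.URLConnection;
import java.util.Optional;

/** Hypothetical helper around the JDK's built-in file name to mime type table. */
final class FileNameMapExample {

    /** Looks up a mime type from the file name alone. */
    static Optional<String> guessByName(String fileName) {
        FileNameMap map = URLConnection.getFileNameMap();
        // getContentTypeFor returns null when the extension is not in the table.
        return Optional.ofNullable(map.getContentTypeFor(fileName));
    }

    public static void main(String[] args) {
        String[] names = {"camera.mpg", "index.html", "data.zzzunknownext"};
        for (String name : names) {
            System.out.println(name + " -> " + guessByName(name).orElse("(no entry)"));
        }
    }
}
```

Because the result depends on the table shipped with the JDK, an extension that is missing there simply yields no answer, so a fallback would still be needed.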

Anything that can help us to prune the thick bush of dependencies that
Matterhorn already has is a really good thing. But, of course, this is
only my opinion.

Rubén Pérez
TELTEK Video Research
www.teltek.es <http://www.teltek.es/>



2012/11/28 James Perrin <[email protected]
<mailto:[email protected]>>

    Hi,

      I had a look at the following issue about video mpegs not being
    correctly identified: <http://opencast.jira.com/browse/MH-8288>

      Though the immediate solution was quite simple, it raised some
    questions about whether mimetype identification is being done
    correctly or needs reviewing. I've no experience in this area, so
    please correct me if I'm wrong.

    The MediaInspectionServiceImpl is meant to make use of Apache Tika
    for initial inspection of files. I don't know anything about Tika,
    but it seems to get the mimetype in a rather odd way. The
    extractContentType() function passes the input file as a stream to
    a Tika parser, which returns a metadata object; the mimetype is
    then obtained by querying the HTTP header Content-Type entry in
    that metadata. OK, that may work.

    However, in inspectTrack(), which calls extractContentType(), there
    is a comment saying the library doesn't detect audio and video
    metadata!? Indeed, in the issue I was looking at, it returned
    application/octet-stream.

    The code then falls back to Opencast's own MimeType class, which
    matches the mimetype by file extension (this is where the original
    problem was, with the extension associated with multiple mimetypes).
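
To illustrate the ambiguity described above, here is a small sketch of an extension lookup in which one extension can map to several mimetypes. The class ExtensionLookup, its table and the "preferred top-level type" hint are assumptions for illustration only and do not reflect how Opencast's MimeType class actually resolves the conflict.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical extension lookup where one extension may have several mime types. */
final class ExtensionLookup {

    // Illustrative table only.
    static final Map<String, List<String>> CANDIDATES = Map.of(
        "mpg", List.of("video/mpeg"),
        "mp4", List.of("video/mp4", "audio/mp4"),
        "ogg", List.of("audio/ogg", "video/ogg"));

    /** All mime types registered for the file's extension (possibly empty). */
    static List<String> candidates(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return CANDIDATES.getOrDefault(ext, List.of());
    }

    /**
     * Picks the candidate whose top-level type ("audio", "video") matches the
     * hint, and otherwise falls back to the first candidate in the table.
     */
    static Optional<String> resolve(String fileName, String preferredTopLevel) {
        List<String> all = candidates(fileName);
        return all.stream()
            .filter(type -> type.startsWith(preferredTopLevel + "/"))
            .findFirst()
            .or(() -> all.stream().findFirst());
    }
}
```

Without some hint of this kind (for example, whether the inspected track contains video), an extension-only lookup can only return whichever entry happens to come first.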

    This may be a way of using Tika, but there is a more direct method
    using the Tika MimeTypes class. It looks as though the Tika library
    should be quite capable of detecting the mimetype correctly from
    the file. Could we just replace the Opencast mimetype[s] classes
    altogether?

    Regards
    James


    --
    ------------------------------------------------------------------------
     James S. Perrin

     Media Technologies Team
     Devonshire House, University Precinct
     The University of Manchester
     Oxford Road, Manchester, M13 9PL

     t: +44 (0) 161 275 6945
     e: [email protected]
     w: www.manchester.ac.uk/researchcomputing
    ------------------------------------------------------------------------
    "The test of intellect is the refusal to belabour the obvious"
    - Alfred Bester
    ------------------------------------------------------------------------
    _______________________________________________
    Matterhorn mailing list
    [email protected]
    http://lists.opencastproject.org/mailman/listinfo/matterhorn


    To unsubscribe please email
    [email protected]
    _______________________________________________

