Re: [Opencast Matterhorn] Mediapackage element #proposal

Christoph Drießen Thu, 14 Feb 2013 02:08:02 -0800

Hi Ruben,

don't apologize for taking some time to write an answer. An elaborate answer is 
much better than firing off postings just to get a response out fast. And this 
topic needs some thinking.


It seems that the discussion has now been reduced to the question of whether 
the media package should know which elements make up a certain publication. Am 
I right?

First I'd like to define (maybe again) these terms:

publication
A consumable representation of a media package, either complete or partial. Not 
necessarily open to the public, maybe protected by some security mechanism.
Examples: YouTube page, Engage player page, RSS feed

channel
The logical entity that serves a publication.
Examples: Engage-Player, YouTube, Engage-RSS

Now to the points:

1) Which parts of a media package a channel uses to make up its publication is 
an implementation detail that concerns only the channel. Which parts are taken 
may even vary over time when a channel's implementation or capabilities change.
 
In a component-oriented architecture like the one we have in Matterhorn, each 
component has its own set of responsibilities. Therefore a channel _must not_ 
expose any of its publication details, because the rest of Matterhorn 
_must not_ deal with them. Whenever you have to deal with publications the 
system _has_ to ask the respective channel (service). This is due to the design 
principle of separation of concerns. Big systems like Matterhorn have to follow 
these principles (and they are already put in place through service 
orientation) to avoid an unmaintainable mess.
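
A minimal sketch of this principle, written in Python for brevity. All names 
here (Channel, EngageChannel, the example URI) are hypothetical illustrations 
and not actual Matterhorn APIs. The point is only that callers talk to the 
channel service, while the channel's bookkeeping stays private:

```python
from abc import ABC, abstractmethod


class Channel(ABC):
    """Hypothetical channel service interface (not a Matterhorn API)."""

    @abstractmethod
    def publish(self, media_package: dict) -> str:
        """Create a publication and return its URI."""

    @abstractmethod
    def retract(self, media_package_id: str) -> None:
        """Remove the publication of the given media package."""

    @abstractmethod
    def is_published(self, media_package_id: str) -> bool:
        """Tell whether a publication currently exists."""


class EngageChannel(Channel):
    """Illustrative channel; which elements it uses is its own business."""

    def __init__(self):
        # Private bookkeeping, never exposed to the rest of the system.
        self._used_elements = {}

    def publish(self, media_package):
        mp_id = media_package["id"]
        # Assumption for the example: this channel uses two elements.
        self._used_elements[mp_id] = ["track", "episode_dc"]
        return f"https://engage.example.org/watch/{mp_id}"

    def retract(self, media_package_id):
        self._used_elements.pop(media_package_id, None)

    def is_published(self, media_package_id):
        return media_package_id in self._used_elements
```

Anything outside the channel only ever sees publish, retract and is_published; 
the mapping from publication to source elements can change without affecting 
any caller.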

Let me quote you:

> Besides, your proposed scheme does not eliminate the need for a publishing 
> service to keep a relation between every single publication and the "source" 
> material for that publication. It just hides it, leaving it to each specific 
> implementation.

That's true and as I explained above there are very good reasons to hide it.

> I think this is bad in the long run, because if you hide such details, then 
> it is more difficult to coordinate the actions of different services. 

I disagree, for the reasons given under point 1). Only a clear separation of 
concerns leads to a good and stable architecture that stays maintainable in the 
long run. Can you think of a concrete use case where hiding these details makes 
coordination harder?

> And, since it's not obvious by examining the Mediapackage whether that/those 
> element(s) was/were published somewhere else, you need to poll all the 
> publishing services, informing them that this/these elements has/have been 
> modified, please republish.

This is the nature of a component oriented design.

Software is always subject to change. If the core has the responsibility to 
decide which channels to call, e.g. for republishing, based on former entries 
in the media package, how should we deal with the evolving capabilities of a 
channel? Take the following scenario:

a) There is a media package A (track, episode_dc, series_dc). Channel A uses 
"track" and the title of "episode_dc" and notes this in the media package.

b) Channel A evolves and now also takes some fields of "series_dc". 

c) "series_dc" is edited and the media package is republished. The core now 
does _not_ call channel A since in the media package it is only recorded that 
the channel uses "track" and "episode_dc". 

A possible solution could be to create a _new_ channel B instead of modifying 
channel A. But I don't think this is a very practicable solution.
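
The scenario can be sketched as follows. This is only an illustration under 
assumed names (ChannelA, core_driven, channel_driven are hypothetical, not 
Matterhorn code): once the channel has evolved, the entries recorded in the 
media package are stale, so a core that trusts them skips the channel.

```python
class ChannelA:
    """Hypothetical channel after it has evolved (step b)."""

    # Elements the *current* implementation actually reads.
    uses = {"track", "episode_dc", "series_dc"}

    def __init__(self):
        self.republished = []

    def republish(self, media_package_id, changed):
        # The channel itself decides whether the change is relevant.
        if changed & self.uses:
            self.republished.append(media_package_id)


# Step a): what channel A noted in the media package at first publication.
recorded_in_media_package = {"channel-a": {"track", "episode_dc"}}


def core_driven(channels, recorded, media_package_id, changed):
    """The core picks channels based on entries in the media package."""
    for name, channel in channels.items():
        if changed & recorded.get(name, set()):
            channel.republish(media_package_id, changed)


def channel_driven(channels, media_package_id, changed):
    """The core asks every channel and lets each one decide."""
    for channel in channels.values():
        channel.republish(media_package_id, changed)
```

With "series_dc" edited (step c), core_driven never calls channel A because the 
recorded entries do not mention "series_dc", whereas channel_driven reaches it 
and the channel republishes.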

3) How source elements of a media package are used depends on the channel. Some 
may copy the source elements as is and some may just extract parts of some 
elements, create a digest or transform them and put it in place in a different 
way. How to deal with this? An example media package contains (track, 
episode_dublincore, series_dublincore). Publication to an imaginary channel A 
results in a complete copy of "track" to some server and the extraction of a 
title from "series_dublincore" and "episode_dublincore" which are stored in a 
database. What should the media package contain? The old approach of adding a 
new element to the media package that points to the distribution by a URI does 
not work anymore since the metadata files aren't copied. They're just _used_. 
This would require a new element that only references a media package element 
but has no URI on its own. 
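
To make the difference concrete, here is a rough sketch of what such entries 
might look like. The class and field names (PublicationEntry, element_id, uri) 
and the example URI are assumptions made up for illustration; nothing like this 
exists in the media package today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PublicationEntry:
    """Hypothetical entry a channel might write to the media package."""

    channel: str
    element_id: str            # ID of the source element in the media package
    uri: Optional[str] = None  # set only if the element was actually copied


# Publication of (track, episode_dublincore, series_dublincore) to channel A.
entries = [
    PublicationEntry("channel-a", "track",
                     "https://media.example.org/mp-1/track.mp4"),
    PublicationEntry("channel-a", "episode_dublincore"),  # title extracted
    PublicationEntry("channel-a", "series_dublincore"),   # title extracted
]


def copied(items):
    """Elements that were distributed as files and therefore have a URI."""
    return [e.element_id for e in items if e.uri is not None]


def used_only(items):
    """Elements that were merely used, e.g. to extract a title."""
    return [e.element_id for e in items if e.uri is None]
```

Only the track has a URI; the two catalogs are pure references, which is 
exactly the new kind of element the old URI-based approach cannot express.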

If channels are supposed to write such references to the media package (and we 
therefore follow your proposal) we open the door for code that circumvents the 
"right" path of using the channels and instead starts messing around with these 
references directly. 

To conclude: If we really want channels to note which media package elements 
they've used to create their publications it must be clear that this is for 
display purposes, e.g. in the admin UI only. No code should ever rely on these 
entries.

Christoph




Am 12.02.2013 um 14:40 schrieb Rubén Pérez <rubenpe...@teltek.es>:

> Dear all,
> 
> Again, I have to start my mail apologising. First, again, for the long time 
> writing this mail took me (even though I intend to keep it short, I 
> promise!). Second, I'd like to apologise to Christoph because it looked as if 
> I totally neglected his answer to my email, and the reality is that somehow 
> that mail came in later (my Gmail account is doing weird things lately), and 
> I only saw it when I was re-reading the other mails. So let me come back to 
> your answers first :)
> 
> I see your point. An alternative to "presentation" may be "publication".
> 
> Nooow we are talking! :)
> 
> The idea with the "presentation/publication" element is different. It is 
> intended to describe a _logical_ presentation of the media package and should 
> _not_ deal with technical or implementation details like where to find the 
> actual media files. So in the case of Engage there's only one element, like 
> there would be only one element for YouTube. Take the YouTube example: Here 
> there would also be no reference to the actual media file. This is totally 
> subject to YouTube presentation layer to create the page to view the video in 
> a way that the embedded player knows where the media is located. This should 
> be the same with Engage. The entry only points to the presenting page. How 
> the player then knows where to fetch the media files from is an 
> implementation detail that should not be handled by the media package 
> structure. That being said the search index needs to expose the URLs to the 
> media files separately now. Currently (if I'm not totally wrong) the URLs are 
> parsed from the returned media package XML. But that's actually a design flaw 
> and should be corrected anyway.
> 
> I don't mean for the publications (allow me to use the alternative I like 
> best :) ) to reference "actual" files, but I see it as a good thing that 
> publications point to the tracks in the Mediapackage that are distributed. 
> Because, taking also the Youtube example, the files are sent to an external 
> server and we do not know anything else about them, but we do know which 
> Tracks (and metadata) we have sent in the first place. And we must agree that 
> the publication in Youtube is just a view or a (re)presentation of those 
> tracks and catalogs. Why not keep mutual references between the mediapackage 
> elements and their representations? From a human perspective, it makes it 
> easier to know which elements have been published and where, while it also 
> makes it easier for the services to know which publication(s) depend on which 
> element(s) and vice versa.
> 
> Besides, your proposed scheme does not eliminate the need for a publishing 
> service to keep a relation between every single publication and the "source" 
> material for that publication. It just hides it, leaving it to each specific 
> implementation. I think this is bad in the long run, because if you hide such 
> details, then it is more difficult to coordinate the actions of different 
> services. 
> 
> Take, for instance, the case in which you watch a video in the Engage player 
> and you discover there's something you have to edit. In order to change a 
> video/some metadata/whatever, you need to know the element ID and the 
> Mediapackage ID (now the latter happens to be the same as the engage ID, but 
> it's just a happy coincidence), so you need to ask the search service about 
> that info. Then you reprocess the element(s) and republish it/them. And, 
> since it's not obvious by examining the Mediapackage whether that/those 
> element(s) was/were published somewhere else, you need to poll all the 
> publishing services, informing them that this/these elements has/have been 
> modified, please republish.
> Of course, that edit may be intended only for Engage. You never know how it 
> is going to affect the other publishing services, since you don't know which 
> services' publications are based on the element(s) you have just modified, 
> since you don't know where that/those specific element(s) has/have been 
> published. What is worse, you don't know if you want that change to be 
> exclusive to Engage. Maybe that edit would be OK for Youtube, too, maybe not, 
> but if you want to know you need to explicitly ask: "Hey Youtube, do you know 
> something about this stuff?", and make your decision later, because you don't 
> have this information until you ask for it explicitly.
> Depending on which publishing services are serving the same element(s), you 
> may want to create a duplicate, or you may want to simply substitute it/them 
> by this/these new version(s), depending on the circumstances, but you can't 
> make a well-informed decision without having the whole picture, and you need 
> to explicitly gather information from all the services involved in order to 
> get the whole picture. This is just the situation we have now, and the one the 
> archive tries to solve: if you want to get the whole Mediapackage, you need to 
> ask the workflow service and the search service and all the potential services 
> that may have copied parts of the mediapackage that might have been deleted 
> after the workflow cleanup.
> 
> In conclusion, I think that knowing the relation between a certain 
> publication and the Mediapackage elements it depends on, but also whether a 
> certain element has been published to a certain "publication" (if you allow 
> the repetition), is crucial for efficient repository management. This is not 
> alluding to the internal implementation of your system, but to the structure 
> of your Mediapackage: what is there, where it is exactly and which view(s) of 
> that your audience has.
> 
> I was going to address Tobias' remarks explicitly, but I think that I sort of 
> have addressed them already. 
> 
> Rubén Pérez Vázquez
> 
> www.teltek.es
> 
> 
> 
> 2013/2/8 Tobias Wunden <tob...@entwinemedia.com>
> Hi Ruben,
> 
> > I'd like to briefly state my general opinion here: I don't think that this 
> > proposal is not necessary; quite the contrary, I do think that we lack the 
> > ability to keep track of where our media has been published. What I'm 
> > saying is that we need to be exhaustive about it, not just note down all 
> > the (so to speak) "external references" to our media with no more hierarchy 
> > than "who put those references there", because such references may depend 
> > on each other in ways that "those who put them there" may not understand 
> > (or even should not).
> 
> In order to stick to good old tradition, I can't agree with you :-) In my 
> opinion, the mediapackage is not the place to store how exactly the 
> distribution channel was doing its work. It is up to the channel 
> implementation to keep track of the inner mechanics if that should be needed 
> for retraction. The <dict> section in the presentation element already gives 
> the channel a chance to store some metadata or information it may need to get 
> back to.
> 
> > Let me explain this with your specific example: when you distribute your 
> > files via streaming, you *still* have to distribute those files to the 
> > streaming server AND publish them (announce them, include them in the 
> > search index, create some RSS pointing at the player, etc). Even though the 
> > file is not directly accessible (downloadable), this does not change the 
> > fact that if you retract the files from the streaming server, all the other 
> > stuff is invalid and outdated.
> 
> One of the overall design goals of the proposal is to *decouple* dependencies 
> that had so far been interpreted into various aspects of the mediapackage, so 
> I would not agree that there should be a dependency between the 
> presentations. One may publish to an LMS and Engage, and even if those two 
> presentations would rely on the same Apache or Red5 serving the files, I 
> would not want that to be expressed in the mediapackage, because now you are 
> storing part of your infrastructure setup in the mediapackage which may 
> change at any time, resulting in invalid mediapackages.
> 
> > I see two ways to go:
> >       • We do not care about this interdependency. If we retract media from 
> > the streaming server,  we must explicitly "unpublish" the reference(s) to 
> > that media, too. So this is a manual process, that will become more 
> > complicated as the number of available channels grows.
> 
> Here I see a difference in the ways we are thinking. You are talking about 
> retracting from the streaming server. The streaming server however is not a 
> "presentation". It is merely a helper service to Engage, your LMS and so on. 
> So you would retract from Engage or the LMS, not from the streaming server. 
> It is then up to the infrastructure setup to keep track of how many clients 
> rely on a stream (for example by introducing a counter that is increased 
> every time a service puts the same file there; only when that counter is 
> down to zero, which is after all "presentations" have been retracted, can the 
> file be deleted).
> 
> >       • We make the distribution services notify the publishing services 
> > about certain media being removed, so that they can "unpublish" their 
> > references to the media.
> 
> This again seems to be looking the other way. If you keep looking from the 
> other side, the presentation services would inform the file hosting service 
> that they no longer need that file (because it has just been retracted). If 
> the file hosting service receives as many "retract" notifications as it 
> received "publish" notifications, it may safely delete the file.
> 
> Looking forward to your (and anyone else's) thoughts,
> 
> Tobias
> _______________________________________________
> Matterhorn mailing list
> Matterhorn@opencastproject.org
> http://lists.opencastproject.org/mailman/listinfo/matterhorn
> 
> 
> To unsubscribe please email
> matterhorn-unsubscr...@opencastproject.org
> _______________________________________________
> 
