Hi all,
On Friday at the Un-Conference, I mentioned that we had lost series catalog
data because capture agents unintentionally modified the series catalog. The
capture agent parses the series catalog received from Matterhorn into a
structure that is not able to preserve all the elements. The unparsed elements
are lost when the series catalog is reconstructed for its return trip to
Matterhorn via ingest.
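
To illustrate the loss, here is a minimal sketch in Java. It is not the code of any actual capture agent: the class name, the two-field model, and the element names used as map keys are all assumptions made for the example. It only shows how an element the agent's structure does not know about disappears on the round trip.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LossyRoundTrip {
    /** Hypothetical capture-agent model: it only knows two catalog fields. */
    static final class AgentSeries {
        final String identifier;
        final String title;

        AgentSeries(String identifier, String title) {
            this.identifier = identifier;
            this.title = title;
        }
    }

    /** Parsing keeps the known fields and silently drops everything else. */
    static AgentSeries parse(Map<String, String> catalog) {
        return new AgentSeries(catalog.get("dcterms:identifier"), catalog.get("dcterms:title"));
    }

    /** Rebuilding the catalog can only emit what the model retained. */
    static Map<String, String> rebuild(AgentSeries series) {
        Map<String, String> catalog = new LinkedHashMap<>();
        catalog.put("dcterms:identifier", series.identifier);
        catalog.put("dcterms:title", series.title);
        return catalog;
    }

    public static void main(String[] args) {
        // Catalog as received from the server (assumed example values).
        Map<String, String> fromServer = new LinkedHashMap<>();
        fromServer.put("dcterms:identifier", "series-1");
        fromServer.put("dcterms:title", "Example series");
        fromServer.put("dcterms:license", "example license");

        // Catalog as it would be sent back with the ingest.
        Map<String, String> returned = rebuild(parse(fromServer));
        System.out.println("license survived: " + returned.containsKey("dcterms:license"));
    }
}
```

If an update built from `returned` overwrites the stored catalog, the license element is gone on the server as well.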

I would like feedback from the community on Matterhorn policy for catalog
updates by the ingest service. My preference is to keep Matterhorn's ability to
create new catalogs on ingest, but not to allow updates to existing series or
episode catalogs through the ingest service. Episodes and series that a capture
agent creates fresh from its own interface would still be created in
Matterhorn. Capture agents that support editing series and episode catalogs,
however, would need to use the recording and series REST endpoints directly.

I posted a potential patch at http://opencast.jira.com/browse/MH-928 (just 5
lines of new code: 1 try/catch and 1 extra method call; see the second diff
below). However, the patch conflicts with the method's header comment (see the
first diff below). That comment refers to a "potentially modified dublin core
document", which implies that modified series data should be honored by the
ingest service. This is the policy that needs clarification from the community.

// "-" marks a current line proposed for removal, "+" marks its replacement
// The IngestServiceImpl.updateSeries() comment change:
- * Updates the persistent representation of a series based on a potentially modified dublin core document.
+ * Only create a series if it does not exist. This ensures that the series exists for the ingested
+ * package's reference to isPartOf, yet prevents more current data in the system from being overwritten.
// The IngestServiceImpl.updateSeries() code change proposal:
- seriesService.updateSeries(dc);
+ try {
+ // test if series exists
+ seriesService.getSeries(id);
+ logger.debug("Series id {} already exists. Ignoring series catalog from ingest.", id);
+ } catch (NotFoundException nf) {
+ // safe to create series
+ seriesService.updateSeries(dc);
+ isCreated = true;
+ logger.info("Ingest created new series with id {} ", id);
+ }
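
For anyone who wants to try the proposed behavior in isolation, here is a self-contained sketch of the same create-only-if-absent logic. The in-memory service, its simplified method signatures, and the nested exception class are stand-ins invented for this example; they are not the real Matterhorn series service or its NotFoundException.

```java
import java.util.HashMap;
import java.util.Map;

public class CreateOnlyIngest {
    /** Stand-in for the exception thrown when a series is missing. */
    static class NotFoundException extends Exception {
        NotFoundException(String message) {
            super(message);
        }
    }

    /** Hypothetical in-memory series store; not the real series service. */
    static class InMemorySeriesService {
        private final Map<String, String> catalogs = new HashMap<>();

        String getSeries(String id) throws NotFoundException {
            String catalog = catalogs.get(id);
            if (catalog == null) {
                throw new NotFoundException("No series with id " + id);
            }
            return catalog;
        }

        void updateSeries(String id, String catalog) {
            catalogs.put(id, catalog);
        }
    }

    /** Creates the series only if it is missing; returns true when it was created. */
    static boolean createSeriesIfAbsent(InMemorySeriesService service, String id, String catalog) {
        try {
            service.getSeries(id);
            return false; // series exists: ignore the catalog that came with the ingest
        } catch (NotFoundException nf) {
            service.updateSeries(id, catalog); // safe to create the series
            return true;
        }
    }

    public static void main(String[] args) throws NotFoundException {
        InMemorySeriesService service = new InMemorySeriesService();
        service.updateSeries("s1", "full catalog already stored on the server");

        // Existing series: the ingested catalog is ignored.
        boolean created = createSeriesIfAbsent(service, "s1", "reduced catalog from capture agent");
        System.out.println(created + " / " + service.getSeries("s1"));

        // Unknown series: the ingested catalog creates it.
        System.out.println(createSeriesIfAbsent(service, "s2", "new catalog from capture agent"));
    }
}
```

One caveat with this shape: the existence check and the create are two separate calls, so two ingests for the same new series could both reach the create step. Whether that matters depends on how the real service handles concurrent updates, which I have not verified.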

FYI: according to earlier commit comments, the series is updated or created at
this early point in the ingest process because an ingest that references a
series not found in Matterhorn ends in error. The embedded series catalog
cannot be handled during the regular ingest zip file processing, because the
threads launched there cannot flush the new series in time for the package's
isPartOf to find the matching series. The ingest fails if the series is not
created before the ingest zip is processed.

During the Friday Matterhorn Un-Conference, there was an idea of decoupling
rich metadata from Matterhorn rather than enabling Matterhorn to accommodate
it. It would be helpful to investigate how Matterhorn could at least
accommodate the transport and integrity of rich metadata, so that it can be
harvested as supplemental data to the event media. This would avoid the
requirement of placing Matterhorn inside a metadata infrastructure, and would
allow Matterhorn to be more self-sufficient and self-standing.

- Karen