Great discussion and feedback in this thread - plenty to act on. Thanks Ted Clancy for kicking this off with an impassioned reality check. And thanks in particular to Benjamin Francis for summarizing product requirements and use-cases, and especially to both Ted and Ben for taking the time last week in Whistler to discuss all of this in person - I definitely came away with a better understanding of the data, problem space, and perspectives for Gaia's use-cases. I've also followed up with Gregor and jst to broaden and double-check my understanding and possible paths forward.
tl;dr: It's time. Let's land microformats parsing support in Gecko as a Q3 Platform deliverable that Gaia can use. Specifically:

On Mon, Jun 29, 2015 at 2:47 PM, Benjamin Francis <bfran...@mozilla.com> wrote:
> Thanks for the responses,
>
> Let me reiterate the Product requirements:
>
> 1. Support for a syntax and vocabulary already in wide use on the web to
> allow the creation of cards for the largest possible volume of existing
> pinnable content

I think there's rough consensus that a subset of OG, as described by Ted, satisfies this. Minimizing our exposure to OG (including Twitter Cards) is ideal for a number of reasons (backcompat/proprietary maintenance etc.).

> 2. Support for a syntax with a large enough and/or extensible vocabulary
> to allow cards to be created for all the types of pinnable content and
> associated actions we need in Gaia

There appear to be multiple options for this, with the best (most open, aligned with our mission, already open source interoperably implemented, etc.) being microformats. On that in particular:

> *Gaia Content*
>
> Open Graph does not have a large enough vocabulary, or (as Kelly says) the
> ability to associate actions with content, needed for the second requirement

The "associate actions with content" use-case is an interesting one that's worthy of more specific follow-up on Kelly's response. More on that separately.

> Schema.org has a large existing vocabulary which basically
> fulfils these use cases, though some parts are more tested than others,

"fulfils" mostly in theory. Schema is 99% overdesigned and aspirational, with most objects and properties not showing up anywhere, even in search results (except generic testing tools perhaps). Only a small handful of Schema objects and a subset of their properties are actually implemented by anyone in anything user-facing.
Everything else is untested, and claiming "fulfils these use cases" puts far too much faith in a company known for abandoning their overdesigned efforts (APIs, vocabularies, syntaxes!) every few years. Google Base / gData / etc. likely "fulfilled" these use cases too.

> with examples given in Microdata, RDFa and JSON-LD syntaxes, eg:
>
> - Contact - http://schema.org/Person
> - Event - http://schema.org/Event
> - Photo - http://schema.org/Photograph
> - Song - http://schema.org/MusicRecording
> - Video - http://schema.org/VideoObject
> - Radio station - http://schema.org/RadioChannel
> - Email - http://schema.org/EmailMessage
> - Message - http://schema.org/Comment

This explicit list of use-cases is very helpful. Existing interoperably implemented microformats support most of these:

- Contact - http://microformats.org/wiki/h-card
- Event - http://microformats.org/wiki/h-event
- Photo - http://microformats.org/wiki/h-entry with u-photo property
- Song - no current vocabulary - classic hAudio vocabulary could be simplified for this
- Video - http://microformats.org/wiki/h-entry with u-video property
- Radio station - no current vocabulary - worth researching with schema RadioChannel as input
- Email - http://microformats.org/wiki/h-entry with u-in-reply-to property
- Message - http://microformats.org/wiki/h-entry

For Song and Radio Station in particular - I will take the action of bringing these use-cases to the microformats community and see what the community can come up with, and how quickly. Discussion will be on #microformats on Freenode (archived, see microformats.org/wiki/irc) if anyone wants to contribute or just lurk.

> Schema.org also provides existing schemas for actions associated with items
> (https://schema.org/docs/actions.html),

The "actions" space has been a difficult and challenging one. Google's (abandoned) "web intents" was one such effort.
Currently the IndieWeb community is pursuing Web Actions (and has them working across sites): http://indiewebcamp.com/webactions

There's likely potential there to make webactions part of the format of the post/page to be parsed, consumed, and re-used. Again, this is something I'll take to the #microformats community and we can see what people there come up with.

> although examples are only given in
> JSON-LD syntax. Schema.org is just a vocabulary and Tantek tells me it's
> theoretically possible to express this vocabulary in Microformats syntax
> too - it's possible to create new vendor prefixed types, or suggest new
> standard types to be added to the Microformats wiki.

Yes.

> This would be required
> because Microformats does not have a big enough existing vocabulary for
> Gaia's needs.

Per the analysis above, what's needed is two objects (Song and RadioStation) and an approach to "actions".

> Microdata, RDFa and JSON-LD use URL namespaces so are
> extensible by design with a non-centralised vocabulary (this is seen as a
> strength by some, as a weakness by others).

Indeed. In CSS, -vendor- prefixes have had some success, and some downsides as well. Ironically, the ease-of-use of -vendor- prefixes over URL based namespaces led to perhaps more popularity than desired for vendor specific things (witness our -webkit- compat headaches), whereas the web seems to be (mostly?) surviving URL based namespace pollution. microformats2 takes the CSS approach of non-centralized -vendor- prefixes for the same ease-of-use reasons as CSS.

> There is resistance to implementing a full Microdata or RDFa parser in
> Gecko due to its complexity.

It's not just that, but the experience (that any Mozilla engineer who was here before Firefox will relay, e.g. ping jst sometime if you want to hear horror stories) of RDF, triple-stores etc. being a disaster for Mozilla, performance, etc. and taking ages to undo.
> JSON-LD is more self-contained by design (for
> better or worse)

Note: this is purely *in theory*. In practice, if you're actually bothering with JSON-LD (not just plain JSON), and using or depending on anything triples related, you're likely to run into similar problems and objections. It's a very high risk path. If you're ignoring all the "LD"ness of JSON-LD, then just admit that upfront and use some one-off JSON.

> Microformats is possibly
> less Gecko work to implement than Microdata or RDFa, but more than JSON-LD.

There are multiple open source interoperable microformats parsers, including in JavaScript (node-compatible even), verified with a test suite. Landing an existing open source modern microformats parser is very much doable, and is something we have been incrementally working towards for some time, in particular with aspirational use-cases for Gaia! (Gordon and Josh worked on this years ago).

> *Conclusions*
>
> My conclusion is that the least required work in Gecko for the highest
> return would be:
>
> 1. *Open Graph* (bug 1178484) - Extending the existing metachange
> Browser API event to include all meta tags with a "property" attribute.
> This would allow Gaia to add support for all of the Open Graph types,
> fulfilling requirement 1.

I'd still like to go with Ted's recommendation on this, and minimize exposure, minimize Open Graph implementation/vocab surface etc. for all the reasons we avoid adding backcompat tech debt.

> 2. *JSON-LD* (bug 1178491) - Adding a linkeddatachange event to the
> Browser API which is dispatched by Gecko whenever it encounters a script
> tag with a type of "application/ld+json" (as per the W3C recommendation
> [5]), including the JSON content in the payload of the event. This would
> allow the Gaia system app to support existing schema.org schemas
> (including actions), with the least amount of work in Gecko, and already in
> a JSON format it can store directly in the Places database
> (DataStore/IndexedDB).
>
> …
>
> It's clear that there's not a consensus amongst everyone that JSON-LD is
> the best format for Mozilla to promote for structured data on the web going
> forward

In fact quite the opposite! There *is* a pretty strong engineering consensus, in both this thread and other threads, *against* any use of JSON-LD, or anything Linked Data or otherwise rebranded RDF / Semantic Web, and for good reason. Ted's email provides the high-level outline for why. Annevk debunked the assumption of "W3C Spec = must be good".

> I would suggest that they go ahead with implementing Microformats in Gecko
> and we can use it in Gaia when it's ready.

Suggestion accepted. It's time. We have been supporting microformats to some degree or other in Firefox incrementally for years, since Firefox 3. In the meantime, microformats matured and have been indexed by search engines since 2006 (rich snippets since 2009), and microformats2 was designed (based on lessons learned from microdata and RDFa, focused by real-world use-cases), developed, implemented, tested, and shipped on thousands of sites (mostly IndieWeb based, withknown.com etc.), and consumed by various indie readers and other sites. There are now several microformats2 open source parsing libraries across languages, deployed live and testable: http://microformats.org/wiki/microformats2#Parsers

> I would recommend exposing it to
> Gaia via a getStructuredData() method on the Browser API (bug 1169634)
> which returns a Promise which resolves with the canonical JSON
> representation of any Microformats data present in a document.

The update to this bug makes sense:
* To support getStructuredData, using the canonical JSON representation of microformats on the page.

This will allow us to move forward with a much simpler JSON based model, and hopefully avoid all the LinkedData / triples pitfalls.

> This will
> then allow us to add the necessary support in the Gaia system app.
> (When implementing this it might also make sense to hook it up to the Open Graph
> and JSON-LD support to create a single API with support for multiple
> formats).

From all evidence so far, the simpler canonical JSON should suffice for this. I can also pose the question to the #microformats community of how to reinterpret Open Graph "og:" meta tag markup as what they would mean in canonical microformats JSON, as members of that community have already been working on parsing *both* OG: and microformats, for all the similar pragmatic reasons we've discussed here.

> In the mean time, given our tight schedule, I would be grateful if we could
> not to block the Gaia work on the implementation of Microformats or any
> more discussion on which formats we'd like to promote going forward.

From my understanding in discussing with Ted, there's nothing being blocked on the pragmatic minimal implementation support of OG: meta tags. If worst comes to worst, we could even make up our own "og:moz:…" vendor specific markup should we need to for Gaia (unexposed to web platform). I'm happy to sync up with Ted to make sure that we continue to not block the Gaia work.

> Thanks for everyone's input on this so far, I hope we can now get to work.

Thanks to you Ben for continuing to pursue and iteratively analyze the various options, and providing the data, with continued critical (re)analysis.

Q3 has begun, let's get to work.

Tantek
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
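
To make the Open Graph reinterpretation question above concrete, here is a minimal sketch of how a small subset of "og:" meta tags might be re-expressed in the shape of canonical microformats JSON (a list of items, each with a "type" list and a "properties" object whose values are always lists). The mapping table OG_TO_MF2, the function name og_to_canonical_json, and the choice of h-entry as the target type are all assumptions made for illustration; none of this is an agreed mapping from the #microformats community or part of any Gecko API.

```python
# Hypothetical mapping from Open Graph properties to microformats2
# property names. This table is an assumption for illustration only.
OG_TO_MF2 = {
    "og:title": "name",
    "og:description": "summary",
    "og:url": "url",
    "og:image": "photo",
    "og:video": "video",
}


def og_to_canonical_json(meta_tags):
    """Convert (property, content) pairs taken from <meta> tags into a dict
    shaped like the "items" part of canonical microformats JSON."""
    properties = {}
    for prop, content in meta_tags:
        mf2_name = OG_TO_MF2.get(prop)
        if mf2_name is None:
            continue  # skip OG properties outside the minimal subset
        # Property values are always lists in canonical microformats JSON.
        properties.setdefault(mf2_name, []).append(content)
    if not properties:
        return {"items": []}
    # Assumption: treat the page as a single h-entry.
    return {"items": [{"type": ["h-entry"], "properties": properties}]}


# Example input: the kind of pairs a metachange-style event might surface.
tags = [
    ("og:title", "Example page"),
    ("og:image", "http://example.com/a.png"),
    ("og:type", "website"),  # not in the subset, so it is ignored
]
result = og_to_canonical_json(tags)
```

Keeping the mapping in one small table is deliberate: it keeps the supported OG surface explicit and easy to shrink, in line with minimizing exposure to OG.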