On June 29, 2015 at 7:07:33 AM, Michael Henretty (mhenre...@mozilla.com) wrote:
> We will definitely start with the simple open graph stuff that Ted
> mentioned ("og:title", "og:type", "og:url", "og:image", "og:description")
> since they are so widely used. And yes, even these simple ones are
> problematic. For instance, when navigating between dailymotion videos they
> keep the current meta tags, and just update the HTML body content. In
> fact, single-page-apps in general are hard here. Also, on the mobile
> version of youtube they leave out og tags entirely, probably as a
> performance optimization. Turns out, many sites do this. So in 2.5 we will
> have to account for all of this and the solution might not be pretty.
OK, it's good to see you've already started to encounter the issues.
> I think Microformats addresses the aforementioned problems.
They might, though they can also change from under you in fun ways, or be
invalid/incorrect.
> But if youtube,
> wikipedia, pinterest, twitter, facebook, tumblr, etc don't use them widely
> what is the point of supporting them in a moz-internal API? Let's be
> pragmatic and start with og. What's the next biggest win for us? Is the
> data clear? Ben seems to think JSON-LD [1], does anyone have data to the
> contrary?
I don't have data, just some graying hair and warnings from the distant past
[1]. You've all seen already how controversial these formats are, and hopefully
you understand why now (expecting validity/sanity from the web is a non-starter
- it's the fallacy of the semantic web, and why we mockingly call it the
"pedantic web" and recoil in horror and lash out with rage at the mere mention
of it).
So flip the problem a bit: what you actually want is just simple data that can
be transformed into a card, right? Basically, we scrape some text values from an
HTML page and put them into a different HTML document: the card.
As long as you don't expect validity of that data (i.e., you don't expect a
standards-conforming JSON-LD, RDFa, microdata, microformat, whatever parser*)
then that frees us to build some kind of HTML scraper that is actually built
for purpose (one that is fault tolerant, and basically doesn't give a crap what
the RDFa or JSON-LD spec says, but is designed to aggressively find the data
you need to build nice cards). This is also why I suggest you start with og:
data, because it basically takes the same approach: it doesn't give a crap what
the RDFa spec says (and neither do developers that add it to their pages, as
I'm sure you've already seen), it just defines some things by using some HTML
elements that kinda-sorta look like RDFa. However, it comes with a ton of
problems which you will have a great time trying to deal with as you build the
pinned-sites feature. The same goes for Twitter's card format.
At the end of the day, what Gecko should be passing back is a simple JS object
that contains:
{
  og: {... name/value pairs...},
  twitter: {... name/value pairs...},
  other_because_we_can_add_new_things_as_needed_yay: {... name/value pairs...}
}
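
To make that shape concrete, here is a rough sketch of how scraped meta tags
could be bucketed into such an object. Everything in it is an assumption made
for illustration: the function name "groupMetaTags", the record shape
(property/name/content), the shortened "other" bucket name, and the rule that
the first value seen for a key wins. It is not what Gecko does or should do,
just one forgiving way to get from messy tags to name/value pairs.

```javascript
// Hypothetical helper; none of these names come from Gecko or FxOS.
// Input: plain records such as { property: "og:title", content: "Hi" }.
function groupMetaTags(records) {
  const result = { og: {}, twitter: {}, other: {} };
  const hasOwn = Object.prototype.hasOwnProperty;

  for (const record of records || []) {
    if (!record) continue;

    // Sites put the key in either "property" or "name", so accept both.
    const rawKey = record.property || record.name;
    const rawValue = record.content;
    if (typeof rawKey !== "string" || typeof rawValue !== "string") continue;

    // Be forgiving about case and stray whitespace; skip empty values.
    const key = rawKey.trim().toLowerCase();
    const value = rawValue.trim();
    if (!key || !value) continue;

    // "og:title" -> bucket "og", field "title"; anything else -> "other".
    const colon = key.indexOf(":");
    const prefix = colon > 0 ? key.slice(0, colon) : "";
    const known = prefix === "og" || prefix === "twitter";
    const group = known ? result[prefix] : result.other;
    const field = known ? key.slice(colon + 1) : key;
    if (!field) continue;

    // Assumption: the first value seen wins; duplicates are ignored.
    if (!hasOwn.call(group, field)) group[field] = value;
  }
  return result;
}
```

In a page context the records could be collected with something like
Array.from(document.querySelectorAll("meta"), m => ({ property:
m.getAttribute("property"), name: m.getAttribute("name"), content:
m.getAttribute("content") })). Missing attributes come back as null and are
simply skipped.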
If we are not going to be doing any semantic inferencing on that data or
actually doing the "linked data" part, then we don't need a JSON-LD
representation of it. We just need a fairly simple structure from which FxOS
can build different cards. That avoids talk of supporting controversial formats
like JSON-LD and RDFa, while actually supporting web content: in the sense
that, "we are just pulling this 'og' meta stuff from the page, we don't care
what it is".
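
As a sketch of the consuming side, a card could be assembled from that
structure with plain fallbacks and no parsing of any spec. The names
"buildCard" and "firstNonEmpty", the card fields, and the fallback order are
all made up for illustration, and the "page" argument (the document's own
title and URL) is an assumption about what else would be available. It only
covers missing or empty tags (the mobile YouTube case), not stale ones.

```javascript
// Hypothetical helpers; the card fields are illustrative, not an FxOS format.
function firstNonEmpty(...candidates) {
  for (const candidate of candidates) {
    if (typeof candidate === "string" && candidate.trim() !== "") {
      return candidate.trim();
    }
  }
  return null; // nothing usable was found
}

function buildCard(meta, page) {
  // Tolerate missing buckets: a page may carry no og or twitter data at all.
  const og = (meta && meta.og) || {};
  const twitter = (meta && meta.twitter) || {};
  const other = (meta && meta.other) || {};
  const fallback = page || {};

  return {
    // Assumed preference order: og, then twitter, then the page itself.
    title: firstNonEmpty(og.title, twitter.title, fallback.title, fallback.url),
    description: firstNonEmpty(og.description, twitter.description, other.description),
    image: firstNonEmpty(og.image, twitter.image),
    url: firstNonEmpty(og.url, fallback.url),
  };
}
```

A field that cannot be filled comes back as null, so the card UI can decide
what to show in its place.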
My 2c,
[1] A warning from 2003 that the same thing happened with RSS, where parsers
had to give up on strict XML:
http://www.xml.com/pub/a/2003/01/22/dive-into-xml.html
"I know, I know, this is how HTML got to be "tag soup": browsers that never
complained. Now the same thing is happening in the RSS world because the same
social dynamics apply. End users who can't even spell "XML" certainly don't
care about silly little formatting rules; they just want to follow their
favorite sites in their news aggregator. When 10% of the world's RSS feeds are
not well-formed -- including some high-profile feeds that thousands of people
want to read -- the ability to parse ill-formed feeds becomes a competitive
advantage. (And if you think the same thing won't happen when RDF and the
Semantic Web go mainstream, you're deluding yourself. The same social dynamics
apply. Boy, is that going to be messy.)"
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform