On June 29, 2015 at 7:07:33 AM, Michael Henretty (mhenre...@mozilla.com) wrote:
> We will definitely start with the simple open graph stuff that Ted
> mentioned ("og:title", "og:type", "og:url", "og:image", "og:description")
> since they are so widely used. And yes, even these simple ones are
> problematic. For instance, when navigating between dailymotion videos they
> keep the current meta tags, and just update the html body content. In
> fact, single-page-apps in general are hard here. Also, on the mobile
> version of youtube they leave out og tags entirely, probably as a
> performance optimization. Turns out, many sites do this. So in 2.5 we will
> have to account for all of this and the solution might not be pretty.

OK, it's good to see you've already started to encounter the issues.

> I think Microformats addresses the aforementioned problems.

They might, though they can also change from under you in fun ways, or be 
invalid/incorrect. 

> But if youtube,
> wikipedia, pinterest, twitter, facebook, tumblr, etc don't use them widely
> what is the point of supporting them in a moz-internal API? Let's be
> pragmatic and start with og. What's the next biggest win for us? Is the
> data clear? Ben seems to think JSON-LD [1], does anyone have data to the
> contrary?

I don't have data, just some graying hair and warnings from the distant past 
[1]. You've all seen already how controversial these formats are, and hopefully 
you understand why now (expecting validity/sanity from the web is a non-starter 
- it's the fallacy of the semantic web, and why we mockingly call it the 
"pedantic web" and recoil in horror and lash out with rage at the mere mention 
of it). 

So flip the problem a bit: what you actually want is just simple data that can 
be transformed into a card, right? Basically, we scrape some text values from an 
HTML page and you just put them into a different HTML document: the card. 

As long as you don't expect validity of that data (i.e., you don't expect a 
standards conforming JSON-LD, RDFa, microdata, microformat, whatever parser*) 
then that frees us to build some kind of HTML Scraper that is actually built 
for purpose (one that is fault tolerant, and basically doesn't give a crap what 
the RDFa or JSON-LD spec says, but is designed to aggressively find the data 
you need to build nice cards). This is also why I suggest you start with og: 
data, because it basically takes the same approach: it doesn't give a crap what 
the RDFa spec says (and neither do developers that add it to their pages, as 
I'm sure you've already seen), it just defines some things by using some HTML 
elements that kinda-sorta look like RDFa. However, it comes with a ton of 
problems which you will have a great time trying to deal with as you build the 
pinned-sites feature. The same with Twitter's card format. 

At the end of the day, what Gecko should be passing back is a simple JS object 
that contains:

{
  og: {... name/value pairs...},
  twitter: {... name/value pairs...},
  other_because_we_can_add_new_things_as_needed_yay: {... name/value pairs...}
}

If we are not going to be doing any semantic inferencing on that data or 
actually doing the "linked data" part, then we don't need a JSON-LD 
representation of it. We just need a fairly simple structure from which FxOS 
can build different cards. That avoids talk of supporting controversial formats 
like JSON-LD and RDFa, while actually supporting web content: in the sense 
that, "we are just pulling this 'og' meta stuff from the page, we don't care 
what it is".  
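
To make the shape of that object concrete, here is a minimal sketch of a fault-tolerant scraper, written in Python purely for illustration (it is not how Gecko does or would implement this). It assumes the key may sit in either a "property" or a "name" attribute, that the first occurrence of a key wins, and that the page's <title> goes into a hypothetical "other" bucket as a fallback for pages that ship no og tags. The names CardMetaScraper and scrape_card_data are made up for this example.

```python
from html.parser import HTMLParser


class CardMetaScraper(HTMLParser):
    """Collects <meta> name/value pairs grouped by prefix, ignoring any spec."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        # Hypothetical result shape: one bucket per vocabulary we care about.
        self.data = {"og": {}, "twitter": {}, "other": {}}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
            return
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Assumption: accept the key from either `property` or `name`.
        key = (attrs.get("property") or attrs.get("name") or "").strip().lower()
        value = attrs.get("content")
        if not key or value is None:
            return  # Incomplete tag: skip it rather than fail.
        prefix, sep, rest = key.partition(":")
        if sep and rest and prefix in ("og", "twitter"):
            # Assumption: the first occurrence of a key wins.
            self.data[prefix].setdefault(rest, value.strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Keep the document title as a fallback when og tags are missing.
        if self._in_title and data.strip():
            self.data["other"].setdefault("title", data.strip())


def scrape_card_data(html):
    """Return {"og": {...}, "twitter": {...}, "other": {...}} for a page."""
    scraper = CardMetaScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.data
```

Calling scrape_card_data(page_source) on a page returns plain nested dictionaries; nothing in it validates the markup against RDFa or any other spec, which is the point.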

My 2c,

[1] A warning from 2003 that the same thing happened with RSS, where feed 
parsers had to abandon strict XML parsing:
http://www.xml.com/pub/a/2003/01/22/dive-into-xml.html   

"I know, I know, this is how HTML got to be "tag soup": browsers that never 
complained. Now the same thing is happening in the RSS world because the same 
social dynamics apply. End users who can't even spell "XML" certainly don't 
care about silly little formatting rules; they just want to follow their 
favorite sites in their news aggregator. When 10% of the world's RSS feeds are 
not well-formed -- including some high-profile feeds that thousands of people 
want to read -- the ability to parse ill-formed feeds becomes a competitive 
advantage. (And if you think the same thing won't happen when RDF and the 
Semantic Web go mainstream, you're deluding yourself. The same social dynamics 
apply. Boy, is that going to be messy.)"



_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
