On Sun, May 8, 2016 at 9:15 AM, Simon Pieters <sim...@opera.com> wrote:
> httparchive (494,168 pages):
>
> SELECT COUNT(*) AS num, REGEXP_EXTRACT(LOWER(body),
> r'<track\s(?:[^>]+\s)?kind\s*=\s*([a-z]+|["\'][^"\']+["\'])') as match
> FROM [httparchive:har.2016_04_15_chrome_requests_bodies]
> GROUP BY match
> ORDER BY num DESC
>
> Row     num     match
> 1       17616286        null
> 2       523     "subtitles"
> 3       108     "captions"
> 4       58      "metadata"
> 5       6       "subtitle"
> 6       6       'subtitles'
> 7       5       "thumbnails"
> 8       3       'captions'
> 9       1       "dotsub"
> 10      1       "${assettracktype}"
> 11      1       'subtitle'
>
>
> We could add "subtitle" as a new keyword if that turns out to be a problem.

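Thanks for the data!  The invalid "subtitle" value accounts for only 7 of
the matches (6 double-quoted, 1 single-quoted) against 494,168 pages, which
is on the order of 0.001% of pages, so I think this can be safely landed.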
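
For anyone who wants to sanity-check the pattern and the percentage locally, here is a rough sketch in Python. It ports the regex from the quoted query to Python's `re` module and redoes the arithmetic from the quoted table. The sample bodies and the names `first_kind`, `samples`, and `share_percent` are made up for illustration and are not from httparchive. It also assumes each "subtitle" match corresponds to at most one page, since the query counts response bodies rather than pages.

```python
import re
from collections import Counter

# Same pattern as in the quoted query, written for Python's re module.
TRACK_KIND = re.compile(
    r'<track\s(?:[^>]+\s)?kind\s*=\s*([a-z]+|["\'][^"\']+["\'])'
)

def first_kind(body):
    """Return the first <track kind=...> value in body, or None.

    Mirrors REGEXP_EXTRACT(LOWER(body), ...): the body is lowercased
    first, and surrounding quotes are kept in the captured value.
    """
    m = TRACK_KIND.search(body.lower())
    return m.group(1) if m else None

# Hypothetical response bodies, for illustration only.
samples = [
    '<video><track kind="subtitles" src="a.vtt"></video>',
    "<track src='b.vtt' kind='subtitle'>",
    '<track KIND=captions src="c.vtt">',
    '<p>no track element here</p>',
]
counts = Counter(first_kind(b) for b in samples)

# Figures taken from the quoted table: "subtitle" (6) plus 'subtitle' (1).
subtitle_hits = 6 + 1
pages = 494168
share_percent = 100 * subtitle_hits / pages
print(f"{share_percent:.4f}% of pages")
```

Because the capture group keeps the quotes, double-quoted, single-quoted, and unquoted spellings of the same value land in separate rows, which is why the table lists "subtitle" and 'subtitle' separately.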