The data platform and tools teams work on our core Telemetry system and the data pipeline, provide core datasets, and maintain some central data viewing tools.
To make new work more visible, we intend to provide quarterly updates.

What's new in the last few months?

On the data collection side, scalars <https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/collection/scalars.html> are now supported through the pipeline, so new flag and count histograms are now disallowed on Desktop in favour of boolean and uint scalars. Event Telemetry <https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/collection/events.html> is now ready for adoption. A general events table <https://sql.telemetry.mozilla.org/queries/3415/source#table> is available, a sync events table is coming up, and further uses are being looked at.

For documentation, we re-worked the guide for adding new Telemetry probes <https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Adding_a_new_Telemetry_probe> and extended the detailed data collection documentation <https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/collection/>. The prototype for making probe history <https://georgf.github.io/fx-data-explorer/> more discoverable now has daily updates and supports Nightly too.

For filing or finding bugs, there is now a new Data Platform and Tools <https://bugzilla.mozilla.org/describecomponents.cgi?product=Data%20Platform%20and%20Tools> product. Note that client-side bugs still go into the separate Toolkit::Telemetry component <https://bugzilla.mozilla.org/enter_bug.cgi?product=Toolkit&component=Telemetry>.

The data pipeline work powers results for re:dash <https://sql.telemetry.mozilla.org/> and custom analysis <https://analysis.telemetry.mozilla.org/> among other things. Notable recent work here includes:
- Providing efficient lookup of client histories using HBase <https://python-moztelemetry.readthedocs.io/en/stable/userguide.html#module-moztelemetry.hbase>.
- Experimental support for Zeppelin <https://mail.mozilla.org/pipermail/fhr-dev/2017-March/001210.html>, a new notebook type that improves on Jupyter.
- The Telemetry dashboard <https://telemetry.mozilla.org/new-pipeline/dist.html> is now faster through a dedicated read replica and client-side caching.
- The Dataset API now has a select method <http://python-moztelemetry.readthedocs.io/en/stable/userguide.html#moztelemetry.dataset.Dataset.select> to return a subset of fields.
- Providing a framework for testable Python ETL jobs <https://github.com/mozilla/python_etl> generated from a template <https://github.com/harterrt/cookiecutter-python-etl>.
- Direct-to-parquet <https://mozilla-services.github.io/lua_sandbox_extensions/parquet/sandboxes/heka/output/s3_parquet.html> is in production, making it easier to build datasets from incoming pings.

The data tools work powers tools that make data analysis more accessible across Mozilla. Updates here are:
- For re:dash <https://sql.telemetry.mozilla.org/>, the UI was improved to make the dashboard list more accessible.
- re:dash query issues were reduced by handling failing queries using exponential back-off.
- There is also a Python re:dash client <https://github.com/mozilla/redash_client> (h/t to emtwo), allowing programmatic generation of queries and dashboards.
- The distribution viewer <https://gauss.telemetry.mozilla.org/> is now live, making distributions of a set of important Firefox metrics available.
- The analysis service <http://analysis.telemetry.mozilla.org/> gained features <https://github.com/mozilla/telemetry-analysis-service/blob/master/WHATSNEW.md> like persistent cluster storage and the ability to extend cluster lifetimes.

Coming soon

For the next few months, interesting projects in the pipeline include:
- Work to decrease data latency by sending the last ping of a Firefox session immediately. We will also start sending timely pings for new users and updates.
- Rebooting documentation <https://docs.google.com/presentation/d/1zWbzDCNkM5tzR9K6WgO4vR7fpiuJDP-JBNLrYDsbeUA/edit#slide=id.g1d58c03b5b_0_1>, providing guidance as well as tying existing documentation together.
- Starting to support new data collection from add-ons in Telemetry, beginning with events.

Contact us

Please reach out to us with any questions or concerns. You can find us on IRC in #telemetry and #datapipeline. The main mailing list for data topics is fhr-dev <https://mail.mozilla.org/listinfo/fhr-dev>. Bugs can be filed in one of these components <https://wiki.mozilla.org/Telemetry#Filing_Bugs>. You can also find us on Twitter as @MozTelemetry <https://twitter.com/moztelemetry>.

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform