No worries, I'll update the PR to refactor this CLI function into a reusable function. Luckily it's small enough and not too much logic is leaking.
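
The refactor will roughly follow the library-first shape Wes describes below; a minimal sketch (function and argument names are hypothetical, assuming a click-based CLI such as archery's) keeps the business logic in an importable function and reduces the click command to argument parsing and printing:

    import click


    def benchmark_diff(contender, baseline):
        """Library entry point: run both revisions' C++ benchmarks and
        return a list of per-benchmark comparisons (hypothetical names)."""
        comparisons = []
        # ... build both revisions, run the benchmarks, diff the results ...
        return comparisons


    @click.command(name="diff")
    @click.argument("contender")
    @click.argument("baseline")
    def benchmark_diff_cli(contender, baseline):
        """Thin CLI wrapper: only argument handling and output formatting,
        so the same logic stays usable from other Python code."""
        for comparison in benchmark_diff(contender, baseline):
            click.echo(comparison)

That way buildbot (or any other script) can import and call the function directly instead of shelling out to the CLI.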
On Tue, Apr 23, 2019 at 4:24 PM Wes McKinney <wesmck...@gmail.com> wrote:

> hi Francois,
>
> This sounds like good progress.
>
> For any tool consumable through a CLI/command-line interface, my
> recommendation is to ensure that the software is usable as a library
> equally as well as via a CLI interface.
>
> In this patch I see
>
> https://github.com/apache/arrow/pull/4141/files#diff-7a8805436a6884ddf74fe3eaec697e71R212
>
> Please be wary of making business logic exclusively available through
> a CLI; it makes composability and reuse harder (= requiring
> refactoring that might have been avoidable). AFAICT this is still a
> concern with Crossbow; there is task business logic that can only be
> accessed by using the command-line interface.
>
> - Wes
>
> On Tue, Apr 16, 2019 at 11:44 AM Francois Saint-Jacques
> <fsaintjacq...@gmail.com> wrote:
> >
> > Hello,
> >
> > A small status update: I recently implemented archery [1], a tool for
> > Arrow benchmark comparison [2]. The documentation ([3] and [4]) is in
> > the pull request. The primary goal is to compare two commits (and/or
> > build directories) for performance regressions. For now, it supports
> > C++ benchmarks. This is accessible via the command `archery benchmark
> > diff`. The end result is one comparison per line, with a regression
> > indicator.
> >
> > Currently, there is no facility to perform a single "run", e.g. run
> > benchmarks in the current workspace without comparing to a previous
> > version. This was initially implemented in [5] but depended heavily
> > on ctest (with no control over execution). Once [1] is merged, I'll
> > re-implement the single run (ARROW-5071) in terms of archery, since
> > it already executes and parses C++ benchmarks.
> >
> > The next goal is to be able to push the results into an upstream
> > database, be it the one defined in dev/benchmarking, or codespeed as
> > Areg proposed. The steps required for this:
> > - ARROW-5071: Run and format benchmark results for upstream
> >   consumption (ideally under the `archery benchmark run` sub-command)
> > - ARROW-5175: Make a list of benchmarks to include in regression checks
> > - ARROW-4716: Collect machine and benchmark context
> > - ARROW-TBD: Push benchmark results to the upstream database
> >
> > In parallel, with ARROW-4827, Krisztian and I are working on two
> > related buildbot sub-projects enabling some regression detection:
> > - Triggering on-demand benchmark comparisons via comments in PRs
> >   (as proposed by Wes)
> > - Regression checks on master merges (without database support)
> >
> > François
> >
> > P.S.
> > A side benefit of this PR is that archery is a modular Python library
> > and can be used for other purposes, e.g. it could centralize orphaned
> > scripts in dev/ (linting, release, and merge), since it offers
> > utilities to handle Arrow sources, git, and cmake, and exposes a
> > usable CLI interface (with documentation).
> >
> > [1] https://github.com/apache/arrow/pull/4141
> > [2] https://jira.apache.org/jira/browse/ARROW-4827
> > [3] https://github.com/apache/arrow/blob/512ae64bc074a0b620966131f9338d4a1eed2356/docs/source/developers/benchmarks.rst
> > [4] https://github.com/apache/arrow/pull/4141/files#diff-7a8805436a6884ddf74fe3eaec697e71R216
> > [5] https://github.com/apache/arrow/pull/4077
> >
> > On Fri, Mar 29, 2019 at 3:21 PM Melik-Adamyan, Areg <
> > areg.melik-adam...@intel.com> wrote:
> >
> > > >When you say "output is parsed", how is that exactly?
> > > >We don't have any scripts in the repository to do this yet (I have
> > > >some comments on this below). We also have to collect machine
> > > >information and insert that into the database. From my perspective
> > > >we have quite a bit of engineering work on this topic ("benchmark
> > > >execution and data collection") to do.
> > > Yes, I wrote one as a test. Then it can do a POST of the JSON
> > > structure to the needed endpoint. Everything else will be done in the
> > >
> > > >My team and I have some physical hardware (including an Aarch64
> > > >Jetson TX2 machine, might be interesting to see what the ARM64
> > > >results look like) where we'd like to run benchmarks and upload
> > > >the results also, so we need to write some documentation about how
> > > >to add a new machine and set up a cron job of some kind.
> > > If it can run Linux, then we can set it up.
> > >
> > > >I'd like to eventually have a bot that we can ask to run a
> > > >benchmark comparison versus master. Reporting on all PRs
> > > >automatically might be quite a bit of work (and load on the
> > > >machines).
> > > You should be able to choose the comparison between any two points:
> > > master-PR, master now - master yesterday, etc.
> > >
> > > >I thought the idea (based on our past e-mail discussions) was that
> > > >we would implement benchmark collectors (as programs in the Arrow
> > > >git repository) for each benchmarking framework, starting with
> > > >gbenchmark and expanding to include ASV (for Python) and then
> > > >others.
> > > I'll open a PR and am happy to put it into Arrow.
> > >
> > > >It seems like writing the benchmark collector script that runs the
> > > >benchmarks, collects machine information, and inserts data into an
> > > >instance of the database is the next milestone. Until that's done
> > > >it seems difficult to do much else.
> > > OK, I will update Jira 5070 and link it to 5071.
> > >
> > > Thanks.
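
For the collector step Areg describes (run the benchmarks, then POST the JSON results to an endpoint), a minimal sketch could look like the following; the endpoint URL, payload shape, machine fields, and file names are placeholder assumptions, not the dev/benchmarking schema:

    import json

    import requests

    # Placeholder endpoint; the real database/API is still to be defined.
    ENDPOINT = "http://localhost:8000/api/benchmark-results"


    def post_results(json_path, machine_info):
        # gbenchmark writes this report when run with
        # --benchmark_format=json (or --benchmark_out=<file>
        # --benchmark_out_format=json).
        with open(json_path) as f:
            report = json.load(f)
        payload = {
            "machine": machine_info,           # e.g. hostname, arch, CPU
            "context": report["context"],      # gbenchmark run context
            "benchmarks": report["benchmarks"],
        }
        response = requests.post(ENDPOINT, json=payload, timeout=30)
        response.raise_for_status()


    if __name__ == "__main__":
        post_results("arrow-cpp-benchmarks.json",
                     {"hostname": "bench-01", "arch": "x86_64"})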