I see that Francois is doing some work related to this in https://github.com/apache/arrow/pull/4077
On Fri, Mar 29, 2019 at 11:20 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> hi,
>
> After doing a little research I took a closer look at the shell scripts in
>
> https://github.com/apache/arrow/tree/master/dev/benchmarking
>
> While these may work for importing the gbenchmark data, the general
> approach seems inflexible to me, and I would recommend rewriting them
> as Python programs to enable better extensibility, finer-grained
> control (e.g. to refine and manipulate the output to be "nicer"), and
> to make it easier to support importing output from different kinds of
> benchmark frameworks.
>
> - Wes
>
> On Fri, Mar 29, 2019 at 10:06 AM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > hi Areg,
> >
> > On Fri, Mar 29, 2019 at 1:25 AM Melik-Adamyan, Areg
> > <areg.melik-adam...@intel.com> wrote:
> > >
> > > Back to the benchmarking per commit.
> > >
> > > Currently I have fired up a community TeamCity Edition here
> > > http://arrow-publi-1wwtu5dnaytn9-2060566241.us-east-1.elb.amazonaws.com
> > > and a dedicated pool of two Skylake bare-metal machines (Intel(R)
> > > Core(TM) i7-6700 CPU @ 3.40GHz). This can go up to 4 if needed.
> > > The machines are prepared for benchmarking in the following way:
> > > - In BIOS/Setup, power-saving features are disabled
> > > - Machines are locked for access using pam_access
> > > - Max frequency is set through cpupower and in /etc/sysconfig/cpupower
> > > - All services that are not needed are switched off; uptime shows:
> > >   23:15:17 up 26 days, 23:24, 1 user, load average: 0.00, 0.00, 0.00
> > > - Transparent huge pages are set on demand:
> > >   cat /sys/kernel/mm/transparent_hugepage/enabled -> always [madvise] never
> > > - Audit control is switched off: auditctl -e 0
> > > - A memory clean is added to the launch scripts: echo 3 > /proc/sys/vm/drop_caches
> > > - pstate=disable is added to the kernel config
> > >
> > > This config gives a relatively clean, low-noise machine.
> > > Commits to master trigger a build and ctest -L benchmarks; the output
> > > is parsed.
> >
> > When you say "output is parsed", how is that exactly? We don't have
> > any scripts in the repository to do this yet (I have some comments on
> > this below). We also have to collect machine information and insert
> > that into the database. From my perspective we have quite a bit of
> > engineering work on this topic ("benchmark execution and data
> > collection") to do.
> >
> > My team and I have some physical hardware (including an Aarch64 Jetson
> > TX2 machine; it might be interesting to see what the ARM64 results look
> > like) where we'd like to run benchmarks and upload the results also,
> > so we need to write some documentation about how to add a new machine
> > and set up a cron job of some kind.
> >
> > > What is missing:
> > > * Where should our Codespeed database reside? I can fire up a VM and
> > >   put it there, or if you have other preferences let's discuss.
> >
> > Since this isn't ASF-owned infrastructure, it can go anywhere. It
> > would be nice to make backups publicly available.
> >
> > > * What address should it have?
> >
> > The address can be anything, really.
> >
> > > * How do we make it available to all developers? Do we want to
> > >   integrate into CI or not?
> >
> > I'd like to eventually have a bot that we can ask to run a benchmark
> > comparison versus master. Reporting on all PRs automatically might be
> > quite a bit of work (and load on the machines).
> >
> > > * What is the standard benchmark output? I suppose Google Benchmark,
> > >   but let's state that.
> >
> > I thought the idea (based on our past e-mail discussions) was that we
> > would implement benchmark collectors (as programs in the Arrow git
> > repository) for each benchmarking framework, starting with gbenchmark
> > and expanding to include ASV (for Python) and then others.
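As a concrete starting point for such a gbenchmark collector: Google Benchmark can already emit JSON (--benchmark_format=json, or --benchmark_out=<file> with --benchmark_out_format=json), so the Python program Wes describes could begin with something like the sketch below. To be clear, this is only a rough illustration -- the function name and the keys I pull out of the report are placeholders of my own, not anything that exists under dev/benchmarking today.

    # Rough sketch of one collector step: read a JSON report produced by a
    # gbenchmark executable and flatten it into rows for later insertion.
    # The row keys are placeholders, not the actual database schema.
    import json

    def parse_gbenchmark_report(path):
        with open(path) as f:
            report = json.load(f)

        # run metadata gbenchmark records itself (date, num_cpus, ...)
        context = report["context"]
        rows = []
        for bench in report["benchmarks"]:
            rows.append({
                "benchmark_name": bench["name"],
                "iterations": bench["iterations"],
                "real_time": bench["real_time"],
                "cpu_time": bench["cpu_time"],
                "time_unit": bench.get("time_unit", "ns"),
                # throughput counters appear only when a benchmark sets them
                "bytes_per_second": bench.get("bytes_per_second"),
                "items_per_second": bench.get("items_per_second"),
            })
        return context, rows

    # e.g., after: ./some-arrow-benchmark --benchmark_format=json > report.json
    # context, rows = parse_gbenchmark_report("report.json")

Such a program could then loop over whatever "ctest -L benchmarks" (or direct invocation of the benchmark executables) produces and hand the rows to the insertion step.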
> > > * My interest is the C++ benchmarks only for now. Do we need to track
> > >   all benchmarks?
> >
> > Yes, I think we want to be able to run the Python benchmarks too and
> > insert that data. Other languages can implement a benchmark collector
> > to arrange their benchmark data according to the database schema.
> >
> > > * What is the process of adding benchmarks?
> >
> > Normal pull requests (see all the C++ programs that end in
> > "-benchmark.cc"). The benchmark collector / insertion scripts may need
> > to recognize when a benchmark has been run for the first time (I
> > haven't looked closely enough at the schema to see if there are any
> > primary keys associated with a particular benchmark name).
> >
> > > Anything else for the short term?
> >
> > It seems like writing the benchmark collector script that runs the
> > benchmarks, collects machine information, and inserts the data into an
> > instance of the database is the next milestone. Until that's done it
> > seems difficult to do much else.
> >
> > > -Areg.
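On the "collects machine information, and inserts the data into an instance of the database" half of that milestone, here is an equally rough sketch. I'm assuming the database is PostgreSQL and using a made-up machine table with placeholder column names; the real definitions would come from the schema under dev/benchmarking, which I have not reproduced here.

    # Rough sketch: gather basic machine information and record the machine
    # in a Postgres-backed benchmark database. Table and column names are
    # placeholders, not the actual dev/benchmarking schema.
    import os
    import platform

    import psycopg2  # assumes a PostgreSQL instance

    def collect_machine_info():
        return {
            "hostname": platform.node(),
            "os_name": platform.system(),
            "kernel": platform.release(),
            "arch": platform.machine(),
            "cpu_model": platform.processor(),
            "num_cores": os.cpu_count(),
        }

    def register_machine(dsn, info):
        with psycopg2.connect(dsn) as conn:
            with conn.cursor() as cur:
                cur.execute(
                    """
                    -- placeholder table; assumes a unique constraint on hostname
                    INSERT INTO machine
                        (hostname, os_name, kernel, arch, cpu_model, num_cores)
                    VALUES
                        (%(hostname)s, %(os_name)s, %(kernel)s, %(arch)s,
                         %(cpu_model)s, %(num_cores)s)
                    ON CONFLICT (hostname) DO NOTHING
                    """,
                    info,
                )

    # register_machine("dbname=benchmarks host=... user=...", collect_machine_info())

The same ON CONFLICT pattern would be one way to handle the "benchmark seen for the first time" question above, once we know which columns the schema actually keys on.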