> Side question: is it expected to be able to connect to the DB directly from the outside? I don't have any clue about the possible security implications.
This is doable by creating different database accounts. Also, Wes's solution was to back up the database periodically (daily?) to protect against accidents. The current setup has a root user (full permissions), an `arrow_anonymous` user (select + insert only), and an `arrow_admin` user (select, insert, update, delete); a sketch of how these grants might look appears at the end of this message.

On Wed, Feb 20, 2019 at 12:19 PM Antoine Pitrou <anto...@python.org> wrote:

> Side question: is it expected to be able to connect to the DB directly from the outside? I don't have any clue about the possible security implications.
>
> Regards
>
> Antoine.
>
> Le 20/02/2019 à 18:55, Melik-Adamyan, Areg a écrit :
>> There is a lot of discussion going on in the PR for ARROW-4313 itself; I would like to bring some of the high-level questions here to discuss. First of all, many thanks to Tanya for the work you are doing.
>>
>> Related to the dashboard itself, I would like to set a scope and stick to it, so that we do not waste any work and get maximum efficiency from the effort we are putting into the dashboard.
>>
>> One thing that IMHO we are missing is the set of requirements the work (the DDL) is being done against, and its scope. For me there are several things:
>>
>> 1. We want continuous *validated* performance tracking against check-ins to catch performance regressions and progressions. Validated means that the running environment is isolated enough that the stddev (assuming the distribution is normal) is as close to 0 as possible. Both hardware and software should be fixed and unchangeable, so there is only one variable to measure.
>>
>> 2. The unit-test framework (google/benchmark) can report the needed benchmark data in textual format, with a preamble containing information about the machine on which the benchmarks are run.
>>
>> 3. So with the environments set up and regular runs, you have all the artifacts, though not in a very comprehensible format. The reason to set up a dashboard is to make that data consumable and to track the performance of the various parts from a historical perspective, much more nicely and with visualizations.
>>
>> And here are the scope restrictions I have in mind:
>>
>> - Disallow entering any single benchmark run into the central repo, as single runs do not mean much when the goal is continuous and statistically relevant measurement. What information do you get if someone reports a single run? You do not know how cleanly it was done, and more importantly whether it can be reproduced elsewhere. That is why, whether it is better, worse, or the same, you cannot compare it with the data already in the DB.
>>
>> - Mandate that contributors have a dedicated environment for measurements. Otherwise they can use TeamCity to run and parse the data and publish it on their own site. Data that enters the Arrow performance DB becomes Arrow community-owned data, and it becomes the community's job to answer why certain things are better or worse.
>>
>> - Because the number of flavors of CPUs/GPUs/accelerators is huge, we cannot satisfy all needs upfront and create a DB that covers all possible variants. I think we should have simple CPU and GPU configs now, even if they are not perfect. By simple I mean the basic brand string; that should be enough. Having all the detailed info in the DB does not make sense: in my experience you never use it, you use the CPUID/brand name to get the info you need.
>>
>> - Scope and requirements will change over time, and going big now will make things complicated later.
>> So I think it will be beneficial to have something quick up and running, get a better understanding of our needs and gaps, and go from there.
>>
>> The needed infra is already up on AWS, so as soon as we resolve the DNS and key-exchange issues we can launch.
>>
>> -Areg.
>>
>> -----Original Message-----
>> From: Tanya Schlusser [mailto:ta...@tickel.net]
>> Sent: Thursday, February 7, 2019 4:40 PM
>> To: dev@arrow.apache.org
>> Subject: Re: Benchmarking dashboard proposal
>>
>> Late, but there's a PR now with first-draft DDL (https://github.com/apache/arrow/pull/3586). Happy to receive any feedback!
>>
>> I tried to think about how people would submit benchmarks, and added a Postgraphile container for HTTP via GraphQL. If others have strong opinions on the data modeling, please speak up, because I'm more a database user than a designer.
>>
>> I can also help with benchmarking work in R/Python given guidance/a roadmap/examples from someone else.
>>
>> Best,
>> Tanya
>>
>> On Mon, Feb 4, 2019 at 12:37 PM Tanya Schlusser <ta...@tickel.net> wrote:
>>
>>> I hope to make a PR with the DDL by tomorrow or Wednesday night—DDL along with a README in a new directory `arrow/dev/benchmarking` unless directed otherwise.
>>>
>>> A "C++ Benchmark Collector" script would be super. I expect some back-and-forth on this to identify naïve assumptions in the data model.
>>>
>>> Attempting to submit actual benchmarks is how to get a handle on that. I recognize I'm blocking downstream work. Better to get an initial PR and some discussion going.
>>>
>>> Best,
>>> Tanya
>>>
>>> On Mon, Feb 4, 2019 at 10:10 AM Wes McKinney <wesmck...@gmail.com> wrote:
>>>
>>>> hi folks,
>>>>
>>>> I'm curious where we currently stand on this project. I see the discussion in https://issues.apache.org/jira/browse/ARROW-4313 -- would the next step be to have a pull request with .sql files containing the DDL required to create the schema in PostgreSQL?
>>>>
>>>> I could volunteer to write the "C++ Benchmark Collector" script that will run all the benchmarks on Linux and collect their data to be inserted into the database.
>>>>
>>>> Thanks
>>>> Wes
>>>>
>>>> On Sun, Jan 27, 2019 at 12:20 AM Tanya Schlusser <ta...@tickel.net> wrote:
>>>>
>>>>> I don't want to be the bottleneck and have posted an initial draft data model in the JIRA issue https://issues.apache.org/jira/browse/ARROW-4313
>>>>>
>>>>> It should not be a problem to get the content into a form that would be acceptable for either a static site like ASV (via CORS queries to a GraphQL/REST interface) or a codespeed-style site (via a separate schema organized for Django).
>>>>>
>>>>> I don't think I'm experienced enough to actually write any benchmarks, though, so all I can contribute is backend work for this task.
>>>>>
>>>>> Best,
>>>>> Tanya
>>>>>
>>>>> On Sat, Jan 26, 2019 at 7:37 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>
>>>>>> hi folks,
>>>>>>
>>>>>> I'd like to propose some kind of timeline for getting a first iteration of a benchmark database developed and live, with scripts to enable one or more initial agents to start adding new data on a daily / per-commit basis. I have at least 3 physical machines where I could immediately set up cron jobs to start adding new data, and I could attempt to backfill data as far back as possible.
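
To make the "C++ Benchmark Collector" idea mentioned above a little more concrete, here is a minimal sketch, assuming the C++ benchmarks are built with google/benchmark (so they accept --benchmark_format=json). The BENCHMARK_DIR path and the *-benchmark naming pattern are placeholders, not the actual build layout.

    #!/usr/bin/env python3
    """Sketch of a benchmark collector: run google/benchmark binaries and
    bundle their JSON reports with basic machine information."""
    import json
    import platform
    import subprocess
    from datetime import datetime, timezone
    from pathlib import Path

    # Placeholder location of the built C++ benchmark executables.
    BENCHMARK_DIR = Path("cpp/build/release")


    def run_one(binary: Path) -> dict:
        """Run a single google/benchmark binary and parse its JSON report."""
        # --benchmark_format=json makes google/benchmark print JSON to stdout.
        out = subprocess.run(
            [str(binary), "--benchmark_format=json"],
            check=True, capture_output=True, text=True,
        )
        return json.loads(out.stdout)


    def collect() -> dict:
        """Run every *-benchmark binary and collect results plus machine info."""
        results = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "machine": platform.node(),
            "cpu": platform.processor(),
            "runs": [],
        }
        for binary in sorted(BENCHMARK_DIR.glob("*-benchmark")):
            results["runs"].append({"suite": binary.name, "report": run_one(binary)})
        return results


    if __name__ == "__main__":
        print(json.dumps(collect(), indent=2))

The JSON this produces could be archived as-is per run, or handed to the ingestion step sketched further down.
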
>>>>>> Personally, I would like to see this done by the end of February if not sooner -- if we don't have the volunteers to push the work to completion by then, please let me know, as I will rearrange my priorities to make sure that it happens. Does that sound reasonable?
>>>>>>
>>>>>> Please let me know if this plan sounds reasonable:
>>>>>>
>>>>>> * Set up a hosted PostgreSQL instance, configure backups
>>>>>> * Propose and adopt a database schema for storing benchmark results
>>>>>> * For C++, write a script (or Dockerfile) to execute all google-benchmarks and output results to JSON, then an adapter script (Python) to ingest them into the database
>>>>>> * For Python, a similar script that invokes ASV, then inserts the ASV results into the benchmark database
>>>>>>
>>>>>> This seems to be a prerequisite for having a front end to visualize the results, but the dashboard/front end can hopefully be implemented in such a way that the details of the benchmark database are not too tightly coupled.
>>>>>>
>>>>>> (Do we have any other benchmarks in the project that would need to be inserted initially?)
>>>>>>
>>>>>> Related work to trigger benchmarks on agents when new commits land in master can happen concurrently -- one task need not block the other.
>>>>>>
>>>>>> Thanks
>>>>>> Wes
>>>>>>
>>>>>> On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>
>>>>>>> Sorry, copy-paste failure: https://issues.apache.org/jira/browse/ARROW-4313
>>>>>>>
>>>>>>> On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I don't think there is one but I just created https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c5291a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E
>>>>>>>>
>>>>>>>> On Mon, Jan 21, 2019 at 10:35 AM Tanya Schlusser <ta...@tickel.net> wrote:
>>>>>>>>
>>>>>>>>> Areg,
>>>>>>>>>
>>>>>>>>> If you'd like help, I volunteer! No experience benchmarking, but tons of experience databasing—I can mock up the backend (database + HTTP) as a starting point for discussion if this is the way people want to go.
>>>>>>>>>
>>>>>>>>> Is there a Jira ticket for this that I can jump into?
>>>>>>>>>
>>>>>>>>> On Sun, Jan 20, 2019 at 3:24 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> hi Areg,
>>>>>>>>>>
>>>>>>>>>> This sounds great -- we've discussed building a more full-featured benchmark automation system in the past but nothing has been developed yet.
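
Picking up the "adapter script (Python) to ingest into database" step from the plan above, here is a minimal sketch. It assumes psycopg2 and a hypothetical benchmark_run table (this is not the schema from the DDL pull request); the host, database name, and credentials are placeholders.

    """Sketch of the ingestion step: take a google/benchmark JSON report
    and insert one row per benchmark into PostgreSQL."""
    import json

    import psycopg2  # assumed driver; any PostgreSQL client would do


    def ingest(report_path: str, commit_hash: str, machine: str) -> None:
        with open(report_path) as f:
            report = json.load(f)

        # One row per benchmark entry; "context" and "benchmarks" are the
        # top-level keys of google/benchmark's JSON output.
        rows = [
            (
                machine,
                commit_hash,
                report["context"]["date"],
                bench["name"],
                bench["real_time"],
                bench["time_unit"],
            )
            for bench in report["benchmarks"]
        ]

        # arrow_anonymous is the select+insert-only account described at the
        # top of this thread; the connection details are placeholders.
        conn = psycopg2.connect(
            host="benchmarks.example.org",
            dbname="arrow_benchmarks",
            user="arrow_anonymous",
            password="...",
        )
        with conn, conn.cursor() as cur:
            cur.executemany(
                """INSERT INTO benchmark_run
                   (machine, git_commit, run_timestamp, benchmark_name,
                    real_time, time_unit)
                   VALUES (%s, %s, %s, %s, %s, %s)""",
                rows,
            )
        conn.close()

A similar adapter for the Python side would read ASV's result files instead of google/benchmark JSON, but the database half would look the same.
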
>>>>>>>>>> Your proposal about the details sounds OK; the single most important thing to me is that we build and maintain a very general-purpose database schema for building the historical benchmark database.
>>>>>>>>>>
>>>>>>>>>> The benchmark database should keep track of:
>>>>>>>>>>
>>>>>>>>>> * Timestamp of benchmark run
>>>>>>>>>> * Git commit hash of codebase
>>>>>>>>>> * Machine unique name (sort of the "user id")
>>>>>>>>>> * CPU identification for machine, and clock frequency (in case of overclocking)
>>>>>>>>>> * CPU cache sizes (L1/L2/L3)
>>>>>>>>>> * Whether or not CPU throttling is enabled (if it can be easily determined)
>>>>>>>>>> * RAM size
>>>>>>>>>> * GPU identification (if any)
>>>>>>>>>> * Benchmark unique name
>>>>>>>>>> * Programming language(s) associated with benchmark (e.g. a benchmark may involve both C++ and Python)
>>>>>>>>>> * Benchmark time, plus mean and standard deviation if available, else NULL
>>>>>>>>>>
>>>>>>>>>> (maybe some other things)
>>>>>>>>>>
>>>>>>>>>> I would rather not be locked into the internal database schema of a particular benchmarking tool, so people in the community can just run SQL queries against the database and use the data however they like. We'll just have to be careful that people don't DROP TABLE or DELETE (but we should have daily backups so we can recover from such cases).
>>>>>>>>>>
>>>>>>>>>> So while we may make use of TeamCity to schedule the runs on the cloud and physical hardware, we should also provide a path for other people in the community to add data to the benchmark database from their hardware on an ad hoc basis. For example, I have several machines in my home on all operating systems (Windows / macOS / Linux, and soon also ARM64) and I'd like to set up scheduled tasks / cron jobs to report in to the database at least on a daily basis.
>>>>>>>>>>
>>>>>>>>>> Ideally the benchmark database would just be a PostgreSQL server with a schema we write down and keep backed up, etc. Hosted PostgreSQL is inexpensive ($200+ per year depending on the size of the instance; this probably doesn't need to be a crazy big machine).
>>>>>>>>>>
>>>>>>>>>> I suspect there will be a manageable amount of development involved in gluing each of the benchmarking frameworks together with the benchmark database. This can also handle querying the operating system for the system information listed above.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Wes
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 18, 2019 at 12:14 AM Melik-Adamyan, Areg <areg.melik-adam...@intel.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I want to restart/attach to the discussions about creating an Arrow benchmarking dashboard. I want to propose a performance benchmark run per commit to track the changes.
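
To make Wes's field list above concrete, here is one way a single record might look, sketched as a Python dataclass; the names and types are illustrative only and are not the schema from the DDL pull request.

    """Illustrative shape of one benchmark-database record, mirroring the
    field list Wes gives above."""
    from dataclasses import dataclass
    from datetime import datetime
    from typing import List, Optional


    @dataclass
    class BenchmarkResult:
        run_timestamp: datetime                 # timestamp of benchmark run
        git_commit: str                         # commit hash of the codebase
        machine_name: str                       # unique machine name (the "user id")
        cpu_model: str                          # CPU identification / brand string
        cpu_frequency_hz: int                   # clock frequency, in case of overclocking
        l1_cache_bytes: int                     # CPU cache sizes (L1/L2/L3)
        l2_cache_bytes: int
        l3_cache_bytes: int
        cpu_throttling_enabled: Optional[bool]  # None if not easily determined
        ram_bytes: int                          # RAM size
        gpu_model: Optional[str]                # GPU identification, if any
        benchmark_name: str                     # benchmark unique name
        languages: List[str]                    # e.g. ["C++", "Python"]
        value: float                            # benchmark time
        mean: Optional[float]                   # mean, if available, else NULL
        stddev: Optional[float]                 # standard deviation, if available

In a relational schema the machine-related fields would likely live in their own table, with results referencing it by key, but the set of tracked facts would be the same.
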
>>>>>>>>>>> The proposal includes building infrastructure for per-commit tracking comprising the following parts:
>>>>>>>>>>>
>>>>>>>>>>> - Hosted JetBrains TeamCity for OSS (https://teamcity.jetbrains.com/) as a build system
>>>>>>>>>>> - Agents running both in the cloud as VMs/containers (DigitalOcean, or others) and on bare metal (Packet.net/AWS) and on-premise (Nvidia boxes?)
>>>>>>>>>>> - JFrog Artifactory storage and management for OSS projects (https://jfrog.com/open-source/#artifactory2)
>>>>>>>>>>> - Codespeed as a frontend (https://github.com/tobami/codespeed)
>>>>>>>>>>>
>>>>>>>>>>> I am volunteering to build such a system (if needed, more Intel folks will be involved) so we can start tracking performance on various platforms and understand how changes affect it.
>>>>>>>>>>>
>>>>>>>>>>> Please let me know your thoughts!
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> -Areg.
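
Finally, coming back to the account split described at the top of this thread (and referenced there), here is a minimal sketch of how the arrow_anonymous and arrow_admin grants could look. The host, database name, root credentials, and passwords are placeholders, and the actual setup may differ.

    """Sketch of the role split: an insert/select-only role for submissions
    and a broader role for admins, created from the root/owner account."""
    import psycopg2

    STATEMENTS = [
        "CREATE ROLE arrow_anonymous LOGIN PASSWORD '...'",
        "CREATE ROLE arrow_admin LOGIN PASSWORD '...'",
        # Submitters can read and add results but never modify or delete them.
        "GRANT SELECT, INSERT ON ALL TABLES IN SCHEMA public TO arrow_anonymous",
        # Admins can also fix or remove bad rows, but cannot drop tables.
        "GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO arrow_admin",
    ]

    # Run the statements as the full-permission root/owner account.
    conn = psycopg2.connect(
        host="benchmarks.example.org",   # placeholder
        dbname="arrow_benchmarks",       # placeholder
        user="postgres",                 # placeholder root account
        password="...",
    )
    with conn, conn.cursor() as cur:
        for statement in STATEMENTS:
            cur.execute(statement)
    conn.close()

Combined with the daily backups Wes mentioned, a split like this keeps DROP TABLE out of reach of everyone but the owner and limits DELETE to the admin account.
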