Re: [Discuss] Benchmarking infrastructure

2019-04-24 Thread Francois Saint-Jacques
No worries, I'll update the PR to refactor this CLI function into a reusable function. Luckily it's small enough and not too much logic is leaking. On Tue, Apr 23, 2019 at 4:24 PM Wes McKinney wrote: > hi Francois, > > This sounds like good progress. > > For any tool consumable through a CLI/com

Re: [Discuss] Benchmarking infrastructure

2019-04-23 Thread Wes McKinney
hi Francois, This sounds like good progress. For any tool consumed through a CLI/command-line interface, my recommendation is to ensure that the software is usable as a library just as well as via the CLI. In this patch I see https://github.com/apache/arrow/pull/4141/files#diff-7a88
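
As an illustration of the library-plus-CLI split recommended here, a minimal sketch (not the actual archery or PR code; function and argument names are hypothetical): the comparison logic lives in an importable function, and the CLI is only a thin wrapper over it.

    import argparse
    import json


    def compare_benchmarks(baseline_path, contender_path):
        """Library entry point: importable from other Python code or tests."""
        with open(baseline_path) as f:
            baseline = json.load(f)
        with open(contender_path) as f:
            contender = json.load(f)
        # ... build and return a comparison structure ...
        return {"baseline": baseline, "contender": contender}


    def main():
        """CLI entry point: only argument parsing and output formatting."""
        parser = argparse.ArgumentParser(description="Compare two benchmark runs")
        parser.add_argument("baseline")
        parser.add_argument("contender")
        args = parser.parse_args()
        print(json.dumps(compare_benchmarks(args.baseline, args.contender), indent=2))


    if __name__ == "__main__":
        main()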

Re: [Discuss] Benchmarking infrastructure

2019-04-16 Thread Francois Saint-Jacques
Hello, A small status update: I recently implemented archery [1], a tool for Arrow benchmark comparison [2]. The documentation ([3] and [4]) is in the pull request. The primary goal is to compare two commits (and/or build directories) for performance regressions. For now, it supports C++ benchmarks.
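
To make the "compare two commits for regressions" idea concrete, a rough sketch of the comparison step (illustrative only; archery's actual implementation, metrics, and thresholds differ):

    def find_regressions(baseline, contender, threshold=0.05):
        """baseline/contender: dict of benchmark name -> mean runtime in seconds.

        Returns the benchmarks whose runtime grew by more than `threshold`.
        """
        regressions = {}
        for name, base_time in baseline.items():
            new_time = contender.get(name)
            if new_time is None:
                continue  # benchmark missing from the contender build
            change = (new_time - base_time) / base_time
            if change > threshold:
                regressions[name] = change
        return regressions


    # Example: a 10% slowdown is flagged, a small speedup is not.
    baseline = {"TakeKernel": 1.00, "ParquetRead": 2.00}
    contender = {"TakeKernel": 1.10, "ParquetRead": 1.95}
    print(find_regressions(baseline, contender))  # {'TakeKernel': 0.1000...}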

Re: [Discuss] Benchmarking infrastructure

2019-04-01 Thread Wes McKinney
hi David -- yes, we definitely should set up cross-host and cross-implementation performance testing (that we can measure and record in the benchmark database) for Flight. As one starting point, see https://issues.apache.org/jira/browse/ARROW-4566 - Wes On Mon, Apr 1, 2019 at 10:30 AM David Li wrote

Re: [Discuss] Benchmarking infrastructure

2019-04-01 Thread David Li
One more thought: is there interest in running cross-host Flight benchmarks, and perhaps validating them against iperf or a similar tool? It would be great to get latency/throughput numbers and make sure upgrades to gRPC don't tank performance by accident, and it would help argue for why people sho
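
A very rough sketch of what the client side of a cross-host Flight throughput measurement could look like, assuming a pyarrow build with Flight support and a server already exposing a dataset under a (hypothetical) ticket; comparing the resulting MiB/s against iperf on the same pair of hosts provides the sanity check mentioned above:

    import time

    import pyarrow.flight as flight


    def measure_throughput(host, port):
        client = flight.FlightClient(f"grpc://{host}:{port}")
        start = time.perf_counter()
        # Pull the whole stream and time it end to end.
        table = client.do_get(flight.Ticket(b"benchmark")).read_all()
        elapsed = time.perf_counter() - start
        mib = table.nbytes / (1 << 20)
        print(f"{mib:.1f} MiB in {elapsed:.3f} s -> {mib / elapsed:.1f} MiB/s")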

Re: [Discuss] Benchmarking infrastructure

2019-03-30 Thread Antoine Pitrou
On 29/03/2019 at 16:06, Wes McKinney wrote: > >> * How to make it available to all developers? Do we want to integrate into >> CI or not? > > I'd like to eventually have a bot that we can ask to run a benchmark > comparison versus master. Reporting on all PRs automatically might be > quite a

RE: [Discuss] Benchmarking infrastructure

2019-03-29 Thread Melik-Adamyan, Areg
>When you say "output is parsed", how is that exactly? We don't have any scripts in the repository to do this yet (I have some comments on this below). We also have to collect machine information and insert that into the database. From my perspective we have quite a bit of engineering work o
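
The "collect machine information" step can be done with the standard library alone; a sketch (the field names here are illustrative, the real schema lives in dev/benchmarking):

    import json
    import os
    import platform
    import socket


    def machine_info():
        return {
            "hostname": socket.gethostname(),
            "os_name": platform.system(),
            "os_version": platform.release(),
            "architecture": platform.machine(),
            "cpu_model": platform.processor(),
            "cpu_count": os.cpu_count(),
            "python_version": platform.python_version(),
        }


    print(json.dumps(machine_info(), indent=2))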

Re: [Discuss] Benchmarking infrastructure

2019-03-29 Thread Wes McKinney
I see that Francois is doing some work related to this in https://github.com/apache/arrow/pull/4077 On Fri, Mar 29, 2019 at 11:20 AM Wes McKinney wrote: > > hi, > > After doing a little research I took a closer look at the shell scripts in > > https://github.com/apache/arrow/tree/master/dev/benc

Re: [Discuss] Benchmarking infrastructure

2019-03-29 Thread Wes McKinney
hi, After doing a little research I took a closer look at the shell scripts in https://github.com/apache/arrow/tree/master/dev/benchmarking. While these may work for importing the gbenchmark data, the general approach seems inflexible to me, and I would recommend rewriting them as Python programs
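
As a sketch of what a Python rewrite of the import step could look like: read the JSON that Google Benchmark emits with --benchmark_format=json (or --benchmark_out=<file>) and flatten it into rows for the benchmark database (the row field names below are illustrative):

    import json


    def load_gbenchmark_results(path):
        with open(path) as f:
            report = json.load(f)
        context = report["context"]  # machine and build information
        rows = []
        for bench in report["benchmarks"]:
            rows.append({
                "name": bench["name"],
                "real_time": bench["real_time"],
                "cpu_time": bench["cpu_time"],
                "time_unit": bench.get("time_unit", "ns"),
                "iterations": bench["iterations"],
            })
        return context, rows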

Re: [Discuss] Benchmarking infrastructure

2019-03-29 Thread Wes McKinney
hi Areg, On Fri, Mar 29, 2019 at 1:25 AM Melik-Adamyan, Areg wrote: > > Back to the benchmarking per commit. > > So currently I have fired up a community TeamCity Edition here > http://arrow-publi-1wwtu5dnaytn9-2060566241.us-east-1.elb.amazonaws.com and > a dedicated pool of two Skylake bare metal m