On Thu, Apr 25, 2019 at 1:28 AM Melik-Adamyan, Areg
<areg.melik-adam...@intel.com> wrote:
>
> Hi,
>
> We are actually talking about the same thing, but you do not want to use
> 3rd-party tools.
> For 3 and 4: you run the first version and store the output in 1.out, then
> run the second version and store the output in 2.out, and then run the
> compare tool. Your tool does the two steps automatically, that is fine.
>
> > Various reasons why I think the archery route is preferred over a mix of
> > scattered scripts, CI pipeline steps and random go binaries.
> >
> > 1. It is OS agnostic since it's written in python, and depends on cmake +
> >    git installed in PATH.
> [>] So is Google Benchmark, cmake and git, no?
> >
> > 2. Self contained in arrow's repository, no need to manually install
> >    external dependencies (go toolchain, then compile & install benchstat,
> >    benchcmp). Assuming python3 and pip are provided, which we already need
> >    for pyarrow.
> [>] Those operations are lighter than 'conda install', but ok, point taken.
> >
> > 3. Written as a library where the command line is a frontend. This makes it
> >    very easy to test and re-use. It also opens the door to clearing
> >    technical debt we've accumulated in `dev/`. This is not relevant for the
> >    benchmark sub-project, but still relevant for arrow developers in
> >    general.
> [>] Agree, but out of the scope of the benchmarking.
> >
> > 4. Benchmark framework agnostic. This does not depend on google's
> >    benchmark and go benchmark output format. It does support it, but does
> >    not mandate it. Will be key to support Python (ASV) and other languages.
> [>] I do not understand what you mean by testing other languages: core
> performance will come from the core C++ libraries, everything else will be
> wrappers around them. So if I understand correctly, by testing languages we
> are testing wrappers?
> >

C++ is only one native Arrow implementation. There are 5 others: Java,
JavaScript, Rust, Go, and C#. There are 5 binding-centric languages:
C, Ruby, R, Python, and MATLAB. On a 2-year horizon I would expect to
see some other languages here: Swift and Julia are a couple of likely
ones.

Some benchmarks involve downstream languages -- e.g. Python, Ruby, and
R have performance critical integrations whose behavior needs
continuous monitoring. As an example, the performance of conversions
between pandas and Arrow columnar format is very important for
downstream use cases (see e.g. [1]).

[1]: http://arrow.apache.org/blog/2019/02/05/python-string-memory-0.12/
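
As a rough illustration of what continuous monitoring of such a conversion
could look like, here is a minimal sketch that compares two sets of timings
and flags regressions. It is not archery's actual implementation: the
function name `compare_runs`, the 5% threshold, the benchmark names, and the
timings are all made up for the example, and timings are assumed to be
positive.

```python
from statistics import median


def compare_runs(baseline, contender, threshold=0.05):
    """Compare two benchmark runs (hypothetical helper, not part of archery).

    `baseline` and `contender` map a benchmark name to a list of timings in
    seconds. Only benchmarks present in both runs are compared. Returns a list
    of (name, relative_change, regressed) tuples sorted by benchmark name.
    """
    report = []
    for name in sorted(baseline.keys() & contender.keys()):
        before = median(baseline[name])
        after = median(contender[name])
        # Positive change means the contender is slower than the baseline.
        change = (after - before) / before
        report.append((name, change, change > threshold))
    return report


# Made-up timings for two revisions; the names are illustrative only.
baseline = {
    "pandas_to_arrow_strings": [1.00, 1.02, 0.98],
    "arrow_to_pandas_strings": [2.00, 2.10, 1.90],
}
contender = {
    "pandas_to_arrow_strings": [1.20, 1.22, 1.18],
    "arrow_to_pandas_strings": [1.90, 2.00, 1.80],
}

for name, change, regressed in compare_runs(baseline, contender):
    status = "REGRESSION" if regressed else "ok"
    print(f"{name}: {change:+.1%} {status}")
```

Because the comparison works on plain name-to-timings mappings, the same
check could in principle be fed from any benchmark framework once its output
is converted to that shape.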

> > 5. Shell scripts tend to grow unmaintainable. I say this as someone who
> >    abuses them. (archery implementation is derived from a local bash
> >    script).
> [>] There is no shell script in the first approach, but I totally share your 
> pain.
> >
> > 6. It is not orchestrated by a complex CI pipeline (which effectively is a
> >    non-portable, hardly reproducible script). It is self contained, can run
> >    within a CI or on a local machine. This is very convenient for local
> >    testing and debugging. I loathe waiting for the CI, especially when
> >    iterating in development.
> [>] What you are really saying is that Archery *is the CI* that you ship
> with the source code. It does all the same things. I am not against it, but
> it will create a maintenance burden, and in a couple of years, you'll
> discover that it is outdated :)
>

It seems that we disagree about the scope of work involved in this
project, which is OK. We aren't asking you to do any extra work
yourself, but having a scalable (in project complexity and process)
and configurable long-term solution to benchmarking is important to
us, and we (myself and my colleagues) are committing ourselves to
building and maintaining it.

Thanks
Wes

> > You can get a sneak peek of the automation working here
> > http://nashville.ursalabs.org:4100/#/builders/16/builds/129,
> > note that this doesn't use dedicated hardware yet.
> [>] Nice, so when can we start using it? I guess nobody will object that
> perf.zaiteki.tech is not competing with Archery. So how can I help you to
> proceed faster? I can create and host DB from 5071 in the cloud if you want.
>
> -Areg.
