On Thu, 24 Sep 2020 at 14:12, Christophe Lyon
<christophe.l...@linaro.org> wrote:
>
> On Wed, 23 Sep 2020 at 17:50, Christophe Lyon
> <christophe.l...@linaro.org> wrote:
> >
> > On Wed, 23 Sep 2020 at 17:33, Martin Sebor <mse...@gmail.com> wrote:
> > >
> > > On 9/23/20 2:54 AM, Christophe Lyon wrote:
> > > > On Wed, 23 Sep 2020 at 01:47, Martin Sebor <mse...@gmail.com> wrote:
> > > >>
> > > >> On 9/22/20 9:15 AM, Christophe Lyon wrote:
> > > >>> On Tue, 22 Sep 2020 at 17:02, Martin Sebor <mse...@gmail.com> wrote:
> > > >>>>
> > > >>>> Hi Christophe,
> > > >>>>
> > > >>>> While checking recent test results I noticed many posts with results
> > > >>>> for various flavors of arm that at high level seem like duplicates
> > > >>>> of one another.
> > > >>>>
> > > >>>> For example, the batch below all have the same title, but not all
> > > >>>> of the contents are the same.  The details (such as test failures)
> > > >>>> on some of the pages are different.
> > > >>>>
> > > >>>> Can you help explain the differences?  Is there a way to avoid
> > > >>>> the duplication?
> > > >>>>
> > > >>>
> > > >>> Sure, I am aware that many results look the same...
> > > >>>
> > > >>>
> > > >>> If you look at the top of the report (~line 5), you'll see:
> > > >>> Running target myarm-sim
> > > >>> Running target myarm-sim/-mthumb/-mcpu=cortex-m3/-mfloat-abi=soft/-march=armv7-m
> > > >>> Running target myarm-sim/-mthumb/-mcpu=cortex-m0/-mfloat-abi=soft/-march=armv6s-m
> > > >>> Running target myarm-sim/-mcpu=cortex-a7/-mfloat-abi=hard/-march=armv7ve+simd
> > > >>> Running target myarm-sim/-mthumb/-mcpu=cortex-m7/-mfloat-abi=hard/-march=armv7e-m+fp.dp
> > > >>> Running target myarm-sim/-mthumb/-mcpu=cortex-m4/-mfloat-abi=hard/-march=armv7e-m+fp
> > > >>> Running target myarm-sim/-mthumb/-mcpu=cortex-m33/-mfloat-abi=hard/-march=armv8-m.main+fp+dsp
> > > >>> Running target myarm-sim/-mcpu=cortex-a7/-mfloat-abi=soft/-march=armv7ve+simd
> > > >>> Running target myarm-sim/-mthumb/-mcpu=cortex-a7/-mfloat-abi=hard/-march=armv7ve+simd
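[Editor's note: each slash-separated "Running target" entry above is a DejaGnu target-board specification — the board name (myarm-sim here) followed by per-run option components, typically passed via RUNTESTFLAGS="--target_board=...". As a rough illustration of how one such entry decomposes (the helper name is hypothetical, not part of the harness):]

```python
def split_target_board(entry):
    """Split a DejaGnu --target_board entry into (board, flag list).

    The board name comes first; each subsequent slash-separated
    component is an option applied to every test in that run.
    """
    board, *flags = entry.split("/")
    return board, flags

# Example using one of the configurations from the report above:
board, flags = split_target_board(
    "myarm-sim/-mthumb/-mcpu=cortex-m3/-mfloat-abi=soft/-march=armv7-m")
```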
> > > >>>
> > > >>> For all of these, the first line of the report is:
> > > >>> LAST_UPDATED: Tue Sep 22 09:39:18 UTC 2020 (revision
> > > >>> r11-3343-g44135373fcdbe4019c5524ec3dff8e93d9ef113c)
> > > >>> TARGET=arm-none-eabi CPU=default FPU=default MODE=default
> > > >>>
> > > >>> I have other combinations where I override the configure flags, eg:
> > > >>> LAST_UPDATED: Tue Sep 22 11:25:12 UTC 2020 (revision
> > > >>> r9-8928-gb3043e490896ea37cd0273e6e149c3eeb3298720)
> > > >>> TARGET=arm-none-linux-gnueabihf CPU=cortex-a9 FPU=neon-fp16 MODE=thumb
> > > >>>
> > > >>> I tried to see if I could fit something in the subject line, but that
> > > >>> didn't seem convenient (would be too long, and I fear modifying the
> > > >>> awk script....)
> > > >>
> > > >> Without some indication of a difference in the title there's no way
> > > >> to know what result to look at, and checking all of them isn't really
> > > >> practical.  The duplication (and the sheer number of results) also
> > > >> make it more difficult to find results for targets other than arm-*.
> > > >> There are about 13,000 results for September and over 10,000 of those
> > > >> for arm-* alone.  It's good to have data but when there's this much
> > > >> of it, and when the only form of presentation is as a running list,
> > > >> it's too cumbersome to work with.
> > > >>
> > > >
> > > > To help me track & report regressions, I build higher-level reports like:
> > > > https://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/0latest/report-build-info.html
> > > > where it's more obvious what configurations are tested.
> > >
> > > That looks awesome!  The regression indicator looks especially
> > > helpful.  I really wish we had an overview like this for all
> > > results.  I've been thinking about writing a script to scrape
> > > gcc-testresults and format an HTML table kind of like this for
> > > years.  With that, the number of posts sent to the list wouldn't
> > > be a problem (at least not for those using the page).  But it
> > > would require settling on a standard format for the basic
> > > parameters of each run.
> > >
> >
> > It's probably easier to detect regressions and format reports from
> > the .sum files than by extracting them from the mailing list.
> > But your approach has the advantage that you can detect regressions
> > from reports sent by other people, not only your own.
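[Editor's note: a minimal sketch of the .sum-based approach mentioned above. The parsing and the regression policy here are assumptions for illustration, not the actual scripts: a test counts as a regression when it was PASS in the reference run's .sum file and FAIL in the new one.]

```python
def parse_sum(lines):
    """Map test name -> result for the PASS/FAIL lines of a .sum file."""
    results = {}
    for line in lines:
        for status in ("PASS", "FAIL", "XPASS", "XFAIL", "UNRESOLVED"):
            prefix = status + ": "
            if line.startswith(prefix):
                results[line[len(prefix):].strip()] = status
                break
    return results

def regressions(ref_lines, new_lines):
    """Tests that went from PASS in the reference run to FAIL in the new run."""
    ref = parse_sum(ref_lines)
    new = parse_sum(new_lines)
    return sorted(t for t, s in new.items()
                  if s == "FAIL" and ref.get(t) == "PASS")
```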
> >
> >
> > > >
> > > > Each line of such reports can send a message to gcc-testresults.
> > > >
> > > > I can control when such emails are sent, independently for each line:
> > > > - never
> > > > - for daily bump
> > > > - for each validation
> > > >
> > > > So, I can easily reduce the number of emails (by disabling them
> > > > for some configurations), but that won't make the subject more
> > > > informative.
> > > > I included the short revision (rXX-YYYY) in the title to make it 
> > > > clearer.
> > > >
> > > > The number of configurations has grown over time because we regularly
> > > > found regressions in configurations not tested previously.
> > > >
> > > > I can probably easily add the values of --with-cpu, --with-fpu,
> > > > --with-mode and RUNTESTFLAGS as part of the [<branch> revision
> > > > rXX-YYYY-ZZZZZ] string in the title; would that help?
> > > > I fear that's going to make very long subject lines.
> > > >
> > > > It would probably be cleaner to update test_summary so that it
> > > > adds more info as part of $host (as in "... testsuite on $host"),
> > > > grabbing the useful configure parameters and RUNTESTFLAGS; however,
> > > > this would be more controversial.
> > >
> > > Until a way to present summaries is available, would grouping
> > > the results of multiple runs in the same "basic configuration"
> > > (for some definition of basic) in the same post work for you?
> > >
> >
> > That's not convenient for me at the moment: each build + make check
> > runs on a different server in a scratch area. It sends its results,
> > saves the logs, and everything else is discarded.
> > After that I have a pass that computes regressions once all the .sum
> > files are available, and that's when I build the HTML reports you saw.
> > It's not terribly hard to reorganize, but it does require some work
> > and probably some disruption. I try to make sure the reports and
> > results are still generated while I make changes to the scripts
> > :-)
> >
> > In the meantime, I am updating the title format following the
> > suggestions from Richard & Jakub. Hopefully this will be in place
> > quite soon, after the currently-running validations have completed.
> >
>
> I have updated my scripts, twice because I discovered that an empty
> DEV-PHASE had a special meaning when constructing the revision
> string....
>
> So a few reports last night had just:
> Results for 11.0.0 (GCC) testsuite on XXXX
> as title, which can now be as long as:
> Results for 8.4.1 [r8-10521 DEFMODE=arm DEFCPU=cortex-a9
> DEFFPU=neon-fp16
> TESTFLAGS=-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-
> abi=hard] (GCC) testsuite on XXX
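[Editor's note: a hedged sketch of how a subject line in the format shown above might be assembled; the function and its parameters are illustrative only — the real work is done by test_summary and the wrapper scripts.]

```python
def make_subject(basever, revision, overrides, testflags, host):
    """Build a 'Results for ...' subject line from run parameters.

    overrides is a list of (name, value) pairs such as
    ("DEFCPU", "cortex-a9"); empty values are skipped.
    """
    parts = [revision]
    parts += ["%s=%s" % (k, v) for k, v in overrides if v]
    if testflags:
        parts.append("TESTFLAGS=" + testflags)
    return "Results for %s [%s] (GCC) testsuite on %s" % (
        basever, " ".join(parts), host)
```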
>
> The leading version number (11.0.0 and 8.4.1 above) comes from
> BASEVER, but I didn't try to replace it with an empty string, as it
> seems many things depend on it.
>
> Does it help you?
>

Hi,

To drastically reduce the traffic on the list, I've switched to
sending results only for "Daily bump" builds on master.

For the time being I'm still sending results for every commit (times
the number of tested configurations) on release branches.

Christophe

> Thanks,
>
> Christophe
>
>
> > Thanks,
> >
> > Christophe
> >
> > > Martin
> > >
> > > >
> > > > Christophe
> > > >
> > > >> Martin
> > > >>
> > > >>>
> > > >>> I think HJ generates several "running targets" in the same log, I run
> > > >>> them separately to benefit from the compute farm I have access to.
> > > >>>
> > > >>> Christophe
> > > >>>
> > > >>>> Thanks
> > > >>>> Martin
> > > >>>>
> > > >>>> Results for 11.0.0 20200922 (experimental) [master revision
> > > >>>> r11-3343-g44135373fcdbe4019c5524ec3dff8e93d9ef113c] (GCC) testsuite on
> > > >>>> arm-none-eabi   Christophe LYON
> > > >>>>
> > > >>>> [the same title repeated for eight more posts]
> > > >>
> > >
