Re: duplicate arm test results?

Christophe Lyon via Gcc Thu, 24 Sep 2020 05:13:41 -0700

On Wed, 23 Sep 2020 at 17:50, Christophe Lyon
<christophe.l...@linaro.org> wrote:
>
> On Wed, 23 Sep 2020 at 17:33, Martin Sebor <mse...@gmail.com> wrote:
> >
> > On 9/23/20 2:54 AM, Christophe Lyon wrote:
> > > On Wed, 23 Sep 2020 at 01:47, Martin Sebor <mse...@gmail.com> wrote:
> > >>
> > >> On 9/22/20 9:15 AM, Christophe Lyon wrote:
> > >>> On Tue, 22 Sep 2020 at 17:02, Martin Sebor <mse...@gmail.com> wrote:
> > >>>>
> > >>>> Hi Christophe,
> > >>>>
> > >>>> While checking recent test results I noticed many posts with results
> > >>>> for various flavors of arm that at a high level seem like duplicates
> > >>>> of one another.
> > >>>>
> > >>>> For example, the batch below all have the same title, but not all
> > >>>> of the contents are the same.  The details (such as test failures)
> > >>>> on some of the pages are different.
> > >>>>
> > >>>> Can you help explain the differences?  Is there a way to avoid
> > >>>> the duplication?
> > >>>>
> > >>>
> > >>> Sure, I am aware that many results look the same...
> > >>>
> > >>>
> > >>> If you look at the top of the report (~line 5), you'll see:
> > >>> Running target myarm-sim
> > >>> Running target myarm-sim/-mthumb/-mcpu=cortex-m3/-mfloat-abi=soft/-march=armv7-m
> > >>> Running target myarm-sim/-mthumb/-mcpu=cortex-m0/-mfloat-abi=soft/-march=armv6s-m
> > >>> Running target myarm-sim/-mcpu=cortex-a7/-mfloat-abi=hard/-march=armv7ve+simd
> > >>> Running target myarm-sim/-mthumb/-mcpu=cortex-m7/-mfloat-abi=hard/-march=armv7e-m+fp.dp
> > >>> Running target myarm-sim/-mthumb/-mcpu=cortex-m4/-mfloat-abi=hard/-march=armv7e-m+fp
> > >>> Running target myarm-sim/-mthumb/-mcpu=cortex-m33/-mfloat-abi=hard/-march=armv8-m.main+fp+dsp
> > >>> Running target myarm-sim/-mcpu=cortex-a7/-mfloat-abi=soft/-march=armv7ve+simd
> > >>> Running target myarm-sim/-mthumb/-mcpu=cortex-a7/-mfloat-abi=hard/-march=armv7ve+simd
> > >>>
> > >>> For all of these, the first line of the report is:
> > >>> LAST_UPDATED: Tue Sep 22 09:39:18 UTC 2020 (revision
> > >>> r11-3343-g44135373fcdbe4019c5524ec3dff8e93d9ef113c)
> > >>> TARGET=arm-none-eabi CPU=default FPU=default MODE=default
> > >>>
> > >>> I have other combinations where I override the configure flags, e.g.:
> > >>> LAST_UPDATED: Tue Sep 22 11:25:12 UTC 2020 (revision
> > >>> r9-8928-gb3043e490896ea37cd0273e6e149c3eeb3298720)
> > >>> TARGET=arm-none-linux-gnueabihf CPU=cortex-a9 FPU=neon-fp16 MODE=thumb
> > >>>
> > >>> I tried to see if I could fit something in the subject line, but that
> > >>> didn't seem convenient (would be too long, and I fear modifying the
> > >>> awk script....)
> > >>
> > >> Without some indication of a difference in the title there's no way
> > >> to know what result to look at, and checking all of them isn't really
> > >> practical.  The duplication (and the sheer number of results) also
> > >> make it more difficult to find results for targets other than arm-*.
> > >> There are about 13,000 results for September and over 10,000 of those
> > >> for arm-* alone.  It's good to have data but when there's this much
> > >> of it, and when the only form of presentation is as a running list,
> > >> it's too cumbersome to work with.
> > >>
> > >
> > > To help me track & report regressions, I build higher level reports like:
> > > https://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/0latest/report-build-info.html
> > > where it's more obvious what configurations are tested.
> >
> > That looks awesome!  The regression indicator looks especially
> > helpful.  I really wish we had an overview like this for all
> > results.  I've been thinking about writing a script to scrape
> > gcc-testresults and format an HTML table kind of like this for
> > years.  With that, the number of posts sent to the list wouldn't
> > be a problem (at least not for those using the page).  But it
> > would require settling on a standard format for the basic
> > parameters of each run.
> >
>
> It's probably easier to detect regressions and format reports from the
> .sum files rather than extracting them from the mailing-list.
> But your approach has the advantage that you can detect regressions
> from reports sent by other people, not only by you.
>
>
> > >
> > > Each line of such reports can send a message to gcc-testresults.
> > >
> > > I can control when such emails are sent, independently for each line:
> > > - never
> > > - for daily bump
> > > - for each validation
> > >
> > > So, I can easily reduce the amount of emails (by disabling them for
> > > some configurations), but that won't make the subject more informative.
> > > I included the short revision (rXX-YYYY) in the title to make it clearer.
> > >
> > > The number of configurations has grown over time because we regularly
> > > found regressions in configurations not tested previously.
> > >
> > > I can probably easily add the values of --with-cpu, --with-fpu,
> > > --with-mode and RUNTESTFLAGS as part of the
> > > [<branch> revision rXX-YYYY-ZZZZZ] string in the title, would that help?
> > > I fear that's going to make very long subject lines.
> > >
> > > It would probably be cleaner to update test_summary such that it adds
> > > more info as part of $host (as in "... testsuite on $host"), so that it
> > > grabs useful configure parameters and runtestflags, however this would
> > > be more controversial.
> >
> > Until a way to present summaries is available, would grouping
> > the results of multiple runs in the same "basic configuration"
> > (for some definition of basic) in the same post work for you?
> >
>
> That's not convenient for me at the moment: each build+make check runs
> on a different server in a scratch area. It sends its results, saves
> the logs and everything else is discarded.
> After that I have a pass to compute regressions once all .sum files are
> available, and that's when I build the HTML reports you saw.
> It's not terribly hard to reorganize, but it does require some work
> and probably some disruption. I tend to try to make sure the reports
> and results are still generated while I make changes to the scripts
> :-)
>
> In the meantime, I am updating the title format following the
> suggestions from Richard & Jakub. Hopefully this will be in place
> quite soon, after the currently-running validations have completed.
>


I have updated my scripts. I had to do it twice, because I discovered
that an empty DEV-PHASE has a special meaning when the revision string
is constructed.

So a few reports last night had just:
Results for 11.0.0 (GCC) testsuite on XXXX
as title, which can now be as long as:
Results for 8.4.1 [r8-10521 DEFMODE=arm DEFCPU=cortex-a9 DEFFPU=neon-fp16
TESTFLAGS=-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard]
(GCC) testsuite on XXX

The leading version number (11.0.0 and 8.4.1 above) comes from BASEVER.
I didn't try to replace it with an empty string, as it seems many things
depend on it.
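
For anyone who wants to scrape gcc-testresults, a title in this format
could be split into its fields roughly as follows. This is only a minimal
Python sketch, not the script that generates the reports: the function
name parse_title and the regular expression are assumptions derived
solely from the two example titles above, and the older
"[master revision ...]" titles are not handled.

```python
import re

# Hypothetical pattern, based only on the two example titles in this message.
TITLE_RE = re.compile(
    r"^Results for (?P<version>\S+)"           # BASEVER, e.g. 8.4.1
    r"(?: \[(?P<info>[^\]]*)\])?"              # optional [revision KEY=VALUE ...]
    r" \(GCC\) testsuite on (?P<target>\S+)$"  # target the results are for
)


def parse_title(title):
    """Return a dict of the fields found in a report title, or None."""
    # Collapse runs of whitespace so a title wrapped between words still matches.
    m = TITLE_RE.match(" ".join(title.split()))
    if m is None:
        return None
    fields = {"version": m.group("version"), "target": m.group("target")}
    for word in (m.group("info") or "").split():
        # Split on the first '=' only: TESTFLAGS values contain '=' themselves.
        key, sep, value = word.partition("=")
        if sep:
            fields[key] = value
        else:
            # A bare word inside the brackets is taken to be the revision.
            fields.setdefault("revision", word)
    return fields


title = (
    "Results for 8.4.1 [r8-10521 DEFMODE=arm DEFCPU=cortex-a9 "
    "DEFFPU=neon-fp16 "
    "TESTFLAGS=-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard] "
    "(GCC) testsuite on XXX"
)
print(parse_title(title)["DEFCPU"])  # cortex-a9
```

Splitting each KEY=VALUE word on the first '=' only keeps the whole
TESTFLAGS value together, which matters because the flags themselves
contain '=' characters.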

Does it help you?

Thanks,

Christophe


> Thanks,
>
> Christophe
>
> > Martin
> >
> > >
> > > Christophe
> > >
> > >> Martin
> > >>
> > >>>
> > >>> I think HJ generates several "running targets" in the same log; I run
> > >>> them separately to benefit from the compute farm I have access to.
> > >>>
> > >>> Christophe
> > >>>
> > >>>> Thanks
> > >>>> Martin
> > >>>>
> > >>>> Results for 11.0.0 20200922 (experimental) [master revision
> > >>>> r11-3343-g44135373fcdbe4019c5524ec3dff8e93d9ef113c] (GCC) testsuite on
> > >>>> arm-none-eabi   Christophe LYON
> > >>>>
> > >>>>        Results for 11.0.0 20200922 (experimental) [master revision
> > >>>> r11-3343-g44135373fcdbe4019c5524ec3dff8e93d9ef113c] (GCC) testsuite on
> > >>>> arm-none-eabi   Christophe LYON
> > >>>>        Results for 11.0.0 20200922 (experimental) [master revision
> > >>>> r11-3343-g44135373fcdbe4019c5524ec3dff8e93d9ef113c] (GCC) testsuite on
> > >>>> arm-none-eabi   Christophe LYON
> > >>>>        Results for 11.0.0 20200922 (experimental) [master revision
> > >>>> r11-3343-g44135373fcdbe4019c5524ec3dff8e93d9ef113c] (GCC) testsuite on
> > >>>> arm-none-eabi   Christophe LYON
> > >>>>        Results for 11.0.0 20200922 (experimental) [master revision
> > >>>> r11-3343-g44135373fcdbe4019c5524ec3dff8e93d9ef113c] (GCC) testsuite on
> > >>>> arm-none-eabi   Christophe LYON
> > >>>>        Results for 11.0.0 20200922 (experimental) [master revision
> > >>>> r11-3343-g44135373fcdbe4019c5524ec3dff8e93d9ef113c] (GCC) testsuite on
> > >>>> arm-none-eabi   Christophe LYON
> > >>>>        Results for 11.0.0 20200922 (experimental) [master revision
> > >>>> r11-3343-g44135373fcdbe4019c5524ec3dff8e93d9ef113c] (GCC) testsuite on
> > >>>> arm-none-eabi   Christophe LYON
> > >>>>        Results for 11.0.0 20200922 (experimental) [master revision
> > >>>> r11-3343-g44135373fcdbe4019c5524ec3dff8e93d9ef113c] (GCC) testsuite on
> > >>>> arm-none-eabi   Christophe LYON
> > >>>>        Results for 11.0.0 20200922 (experimental) [master revision
> > >>>> r11-3343-g44135373fcdbe4019c5524ec3dff8e93d9ef113c] (GCC) testsuite on
> > >>>> arm-none-eabi   Christophe LYON
> > >>
> >
