Hi,

Quoting Christian Kastner (2020-07-16 14:08:34)
> On 2020-07-16 12:53, Pirate Praveen wrote:
> >> Generally speaking, I think it's a mistake to apply the question of
> >> "preferred form for modification" to unit test payloads. Unit tests are
> >> purely about functionality. The original source to a payload is an
> >> arbitrary choice (possibly even randomly generated), and could be
> >> replaced with any other appropriate arbitrary choice at no detriment to
> >> the software or the user.
> > I think this needs to be clearly documented in policy. I don't think
> > this interpretation is generally accepted. I have seen many cases where
> > tests are disabled for this reason.
> Perhaps I spoke too generally. For example, I can see, as one of
> probably many counter-examples, the case where the input is not
> completely arbitrary (eg: input is a captured stream).
>
> But to take the other extreme, using completely arbitrary data, as an
> example: say my code implements a ROT13 function and I create a test for
> it using a blob of random data as well as the expected output.
>
> That random data was generated somehow, eg: using Python's random
> module, and could therefore be regenerated given the correct program and
> seed. However, I did not include the code to generate that data.
>
> Would we really reasonably expect anyone to act upon that random blob in
> any way?

I have another data point from one of my packages (genext2fs), where I made a contribution upstream. Their unit tests execute the program with some input and a given set of parameters and then check that the md5sum of the created ext2 filesystem image matches the expected value. Without thinking, I added the following to their test script:

H4sIAAAAAAAAA+3WTW6DMBAF4Fn3FD6B8fj3PKAqahQSSwSk9vY1uKssGiJliFretzECJAYeY1s3JM4UKYRlLG7H5ZhdTIHZGevK+ZTYkgrypRFN17EdlKIh5/G3++5d/6N004qbA47er8/fWVduV2aLD7D7/A85C88Ba/ufA/sQIhk25VdA/2+h5t+1gx4/pd7vfv+Hm/ytmfNH/8vr+ql7e3UR8DK6uUx9L/uMtev/3P8p+KX/oyHlZMuqntX/9T34Z9yk9Gco8//xkGWf8Uj+Mbpl/Y+JVJQtq9r5/K+bj3Z474+Xk9wG4JH86/rvyzxAirfYnOw+/+vXWTb+uv9PaV3+JfiSv/WOlJVPf/f5AwAAAAAAAAAAAMD/9A0cPbO/ACgAAA==

This is a base64-encoded gzipped tarball with a few test files in it. I generated it using GNU tar. But since I found it likely that a future (or past) GNU tar version would produce a slightly different tarball, and because I needed fixed input that stays the same on systems without GNU tar (like BSD or macOS) as well as on older and future systems, I just dumped that binary blob into the upstream software. By now, that binary blob is also in the Debian package:

https://sources.debian.org/src/genext2fs/1.5.0-1/test.sh/#L89

The curious thing for me personally is that I didn't feel bad about this at all. At no point, from writing the code to packaging and uploading the Debian package containing the blob, did I think twice about whether this is DFSG compliant or not. Only now, after having read this thread, do I start wondering whether I have actually created an RC bug myself. Did I?

I love the principles of the DFSG, and it really surprises me that despite my love for these freedoms I didn't think twice about including that binary blob instead of generating it on the fly. Was my mind fooled by how short the blob is?
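
To make "generating it on the fly" concrete, here is a minimal sketch of what such a generator could look like. It is in Python rather than perl, purely for illustration, and it is not the code used for the blob above: the function name build_tarball and the file names are hypothetical, and the idea is simply to pin every piece of metadata that tar and gzip would otherwise take from the environment.

```python
import gzip
import io
import tarfile


def build_tarball(files):
    """Return gzipped tar bytes for a {name: bytes} mapping.

    Hypothetical helper: every header field that normally depends on the
    build environment (timestamps, owner, member order) is pinned.
    """
    tar_buf = io.BytesIO()
    # USTAR avoids PAX extended headers, which can carry extra metadata.
    with tarfile.open(fileobj=tar_buf, mode="w",
                      format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed member order
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 0          # no timestamp from the filesystem
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))

    gz_buf = io.BytesIO()
    # mtime=0 and an empty filename keep the gzip header constant.
    with gzip.GzipFile(filename="", fileobj=gz_buf, mode="wb", mtime=0) as gz:
        gz.write(tar_buf.getvalue())
    return gz_buf.getvalue()
```

One caveat with this sketch: the uncompressed tar stream is fully determined by the inputs, but the compressed bytes may still differ between zlib versions or implementations, so a checksum over the gzipped output is not guaranteed to be stable everywhere.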
A Perl script generating the tarball such that it is bit-for-bit identical across all platforms would be longer than this blob.

What do you guys think? Should I put work into writing a script that produces the above binary blob as part of the test suite, so that my package is not RC buggy? I would love to get some guidance.

Thanks!

cheers, josch