Hi,

in the Debian Med team there are two GSoC students who are very busy writing autopkgtests for (in the long run, and if possible) all our packages. For several packages it is necessary to provide data sets, which in many cases are not shipped with the upstream source. So the students searched the internet for data available in scientific publications or public databases. I personally think they did quite a good job of finding real-world examples to test our software with.

In the Debian Med team we have a rule that we do not rely only on the packaging source directory for the autopkgtest. In addition, we install the test script (and thus the needed data) in /usr/share/doc/pkgname. On one hand this serves as a useful example for the program in question, and on the other hand it enables users to run the test on their own machine. For larger data sets it seems natural to ship the data in a separate Architecture: all binary package, so as not to bloat the machines of users who do not want it and to save bandwidth on our mirror network. New binary packages require processing in the NEW queue, and my question here is about a set of rejection mails we received.

On Sun, Sep 13, 2020 at 12:00:08PM +0000, Thorsten Alteholz wrote: [1]
> your debian tar file is much too large.

I admit that the debian/ dir (2.7 MB) is far larger than the actual code (300 kB). However, could we please document somewhere in our packaging documentation what size of the debian/ dir is acceptable?

> Please put all data in a separate source package and don't forget to add the
> copyright information.

I think we should also try to document when a separate source package is needed. I would agree with the need for one if the code were a moving target while the data did not change, if there were a versioned, downloadable data tarball, or if the data could be shared between different software packages. But here none of these conditions is fulfilled.

> But as you don't really need 4we2.ply, you might just omit as well.

I think for edtsurf this is the key point. You gave Pranav this perfectly helpful hint in another case, and since then we ship checksums instead of data whose only purpose is comparing the result. So I think we can settle on a solution for this case.

On Sun Sep 13 13:00:09 BST 2020, Thorsten Alteholz wrote: [2]
> please explain why you need such a huge amount of test data in this package.

Shayan has explained it in [2a].
I also think that if upstream ships the source that way and the license is OK, we should not repack the tarball just to shrink the data originally shipped. The question I raised above about the term "much too large" applies here to "huge amount" as well. A rule of thumb for what is acceptable would be helpful.

I'm also wondering what you mean by "please explain" in a reject mail. As I understand it, one asks for an explanation before a decision is made, but a reject is already a decision. In what form would you expect the explanation? Probably not by mail (as Shayan did), since that would not bring the package back into the NEW queue. So could you please be more explicit, for instance:

   Please explain in debian/README.??? why you decided to keep all test data that is provided by upstream.

or something like this. Shayan is a very dedicated and extremely productive newcomer. It would be great if he got some more helpful advice on how to improve. Since I personally have no real clue either, I'm writing here to ask for some general clarification.

On Sun Sep 13 18:00:08 BST 2020, Thorsten Alteholz wrote: [3]
> please don't hide data under debian/*.

Sorry, Thorsten, but I think "hiding" is not the right term. We have no directory other than debian/ to add extra files to. That's why I think Nilesh was right to store the data there, where IMHO it naturally belongs: it is test data and thus sits next to the autopkgtest script.

> If you really need those data please create a separate source package

That's the question I keep wondering about, and that's why I am collecting all three rejects here in one mail: what is the general opinion on creating a separate source package in cases like this? I do not see any benefit in an extra source package. From my point of view autopkgtest data belong to the packaging code and are thus fine there. But of course I might be wrong, and I would like to clarify this here.

> and never ever do Recommend: such package.

That's absolutely correct, and definitely an oversight on my part as the sponsor of this package. Sorry about that. On the other hand, I'm not sure whether it is a reason for a reject. If there were no other issue with the package, I would consider it more productive for all of us if you accepted it and filed an RC bug that could be fixed in the source-only upload that is needed anyway.

> Don't forget to mention the copyright information.

In principle yes, but as far as I know this data is not copyrightable. Nilesh has documented the origin of the data in debian/tests/README as a reference. If you consider this information insufficient, please let us know a better way.

I'm trying to clarify these questions here, and we will add the answers at least to the Debian Med policy (as a start; I guess these questions might come up in other teams as well) to make sure we push better packages into the NEW queue in the future.

As always, I'd like to explicitly thank Thorsten, who has spent many hours on all our packages. We really appreciate this effort and would love to improve so that packages like these take less of your time.

Kind regards

      Andreas.

[1] https://alioth-lists.debian.net/pipermail/debian-med-packaging/2020-September/084428.html
[2] https://alioth-lists.debian.net/pipermail/debian-med-packaging/2020-September/084430.html
[2a] https://alioth-lists.debian.net/pipermail/debian-med-packaging/2020-September/084441.html
[3] https://alioth-lists.debian.net/pipermail/debian-med-packaging/2020-September/084432.html

-- 
http://fam-tille.de