On Mon, Jun 10, 2024 at 1:04 PM Andres Freund <and...@anarazel.de> wrote:
> Just for context for the rest of the email: I think we desperately need to move
> off perl for tests. The infrastructure around our testing is basically
> unmaintained and just about nobody that started doing dev stuff in the last 10
> years learned perl.

Okay. Personally, I'm going to try to stay out of discussions around
subtracting Perl and focus on adding Python, for a bunch of different
reasons:

- Tests aren't cheap, but in my experience, the maintenance-cost math
  for tests is a lot different than the math for implementations.
- I don't personally care for Perl, but having tests in any form is
  usually better than not having them.
- Trying to convince people to get rid of X while adding Y is a good
  way to make sure Y never happens.

> On 2024-06-10 11:46:00 -0700, Jacob Champion wrote:
> > 4. It'd be great to split apart client-side tests from server-side
> > tests. Driving Postgres via psql all the time is fine for acceptance
> > testing, but it becomes a big problem when you need to test how
> > clients talk to servers with incompatible feature sets, or how a peer
> > behaves when talking to something buggy.
>
> That seems orthogonal to using pytest vs something else?

Yes, I think that's fair. It's going to be hard not to talk about
"things that pytest+Python don't give us directly but are much easier
to build" in all of this (and I tried to call that out in the next
section, maybe belatedly). I think I'm going to have to convince both
a group of people who want to ask "why pytest in particular?" and a
group of people who ask "why isn't what we have good enough?"

> > == Why pytest? ==
> >
> > From the small and biased sample at the unconference session, it looks
> > like a number of people have independently settled on pytest in their
> > own projects. In my opinion, pytest occupies a nice space where it
> > solves some of the above problems for us, and it gives us plenty of
> > tools to solve the other problems without too much pain.
>
> We might be able to alleviate that by simply abstracting it away, but I found
> pytest's testrunner pretty painful.
> Oodles of options that are not very well
> documented and that often don't work because they are very specific to some
> situations, without that being explained.

Hm. There are a bunch of them, but I've never needed to go through the
oodles of options. Anything in particular that caused problems?

> > Problem 1 (rerun failing tests): One architectural roadblock to this
> > in our Test::More suite is that tests depend on setup that's done by
> > previous tests. pytest allows you to declare each test's setup
> > requirements via pytest fixtures, letting the test runner build up the
> > world exactly as it needs to be for a single isolated test. These
> > fixtures may be given a "scope" so that multiple tests may share the
> > same setup for performance or other reasons.
>
> OTOH, that's quite likely to increase overall test times very
> significantly. Yes, sometimes that can be avoided with careful use of various
> features, but often that's hard, and IME is rarely done rigorously.

Well, scopes are pretty front and center when you start building
pytest fixtures, and the complicated longer setups will hopefully
converge correctly early on and be reused everywhere else. I imagine
no one wants to build cluster setup from scratch.

On a slight tangent, is this not a problem today? I mean... part of my
personal long-term goal is in increasing test hygiene, which is going
to take some shifts in practice. As long as review keeps the quality
of the tests fairly high, I see the inevitable "our tests take too
long" problem as a good one. That's true no matter what framework we
use, unless the framework is so bad that no one uses it and the
runtime is trivial.

If we're worried that people will immediately start exploding the
runtime and no one will notice during review, maybe we can have some
infrastructure flag how much a patch increased it?

> > Problem 2 (seeing what failed): pytest does this via assertion
> > introspection and very detailed failure reporting.
> > If you haven't seen
> > this before, take a look at the pytest homepage [1]; there's an
> > example of a full log.
>
> That's not really different than what the perl tap test stuff allows. We
> indeed are bad at utilizing it, but I'm not sure that switching languages will
> change that.

Jelte already touched on this, but I wanted to hammer on the point: If
no one, not even the developers who chose and like Perl, is using
Test::More in a way that's maintainable, I would prefer to use a
framework that does maintainable things by default so that you have to
try really hard to screw it up. It is possible to screw up `assert
actual == expected`, but it takes more work than doing it the right
way.

> I think part of the problem is that the information about what precisely
> failed is often much harder to collect when testing multiple servers
> interacting than when doing localized unit tests.
>
> I think we ought to invest a bunch in improving that, I'd hope that a lot of
> that work would be largely independent of the language the tests are written
> in.

We do a lot more acceptance testing than internal testing, which came
up as a major complaint from me and others during the unconference.
One of the reasons people avoid writing internal tests in Perl is
because it's very painful to find a rhythm with Test::More. From
experience test-driving the OAuth work, I'm *very* happy with the
development cycle that pytest gave me.

Other languages _could_ do that, sure. It's a simple matter of
programming...

> Ugh, I think this is actually python's weakest area. There's about a dozen
> package managers and "python distributions", that are at best half compatible,
> and the documentation situation around this is *awful*.

So... don't support the half-compatible stuff? I thought this
conversation was still going on with Windows Perl (ActiveState?
Strawberry?) but everyone just seems to pick what works for them and
move on to better things to do. Modern CPython includes pip and venv.
Done. If someone comes to us with some horrible Anaconda setup wanting
to know why their duct tape doesn't work, can't we just tell them no?

> > When it comes to third-party packages, which I think we're
> > probably going to want in moderation, we would still need to discuss
> > supply chain safety. Python is not as mature here as, say, Go.
>
> What external dependencies are you imagining?

The OAuth pytest suite makes extensive use of

- psycopg, to easily drive libpq;
- construct, for on-the-wire packet representations and manipulation;
  and
- pyca/cryptography, for easy generation of certificates and manual
  crypto testing.

I'd imagine each would need considerable discussion, if there is
interest in doing the same things that I do with them.

> I think somewhere between 1 and 4 a *substantial* amount of work would be
> required to provide a bunch of the infrastructure that Cluster.pm etc
> provide. Otherwise we'll end up with a lot of copy pasted code between tests.

Possibly, yes. I think it depends on what you want to test first, and
there's a green-field aspect of hope/anxiety/ennui, too. Are you
trying to port the acceptance-test framework that we already have, or
are you trying to build a framework that can handle the things we
can't currently test? Will it be easier to refactor duplication into
shared fixtures when the language doesn't encourage an infinite number
of ways to do things? Or will we have to keep on top of it to avoid
pain?

--Jacob
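
For illustration only, here is a minimal sketch of the "flag how much a
patch increased it" idea raised above. Everything in it is an
assumption made for the example: the names `RuntimeChange`,
`flag_slowdowns`, and `total_increase` are hypothetical, the
thresholds are arbitrary, and per-test durations are taken as plain
dicts of seconds. How those durations get collected is left open; no
such infrastructure exists in the tree today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeChange:
    """Hypothetical record of one test's runtime before and after a patch."""
    name: str
    before: float  # seconds on the baseline run
    after: float   # seconds with the patch applied

    @property
    def delta(self) -> float:
        return self.after - self.before


def flag_slowdowns(baseline, patched, min_delta=1.0, min_ratio=1.25):
    """Return tests that grew by both an absolute and a relative margin.

    Tests that are new in the patched run are compared against 0.0
    seconds, so they are flagged whenever they exceed min_delta.
    """
    flagged = []
    for name, after in patched.items():
        before = baseline.get(name, 0.0)
        grew_enough = after - before >= min_delta
        # A brand-new test has no baseline to take a ratio against.
        ratio_exceeded = before == 0.0 or after / before >= min_ratio
        if grew_enough and ratio_exceeded:
            flagged.append(RuntimeChange(name, before, after))
    # Biggest absolute increase first, so reviewers see the worst offender.
    return sorted(flagged, key=lambda change: change.delta, reverse=True)


def total_increase(baseline, patched):
    """Overall change in suite runtime, in seconds."""
    return sum(patched.values()) - sum(baseline.values())
```

Requiring both an absolute and a relative increase is one possible
design choice: it keeps noise on very short tests from being reported
while still surfacing a long test that got noticeably longer.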