Re: [PATCH 3/5] dts: use Pydantic in the configuration

Juraj Linkeš Tue, 17 Sep 2024 04:13:21 -0700



On 22. 8. 2024 18:39, Luca Vizzarro wrote:

This change brings in Pydantic in place of Warlock. Pydantic offers
a built-in model validation system in the classes, which allows for
a more resilient and simpler code. As a consequence of this change:

- most validation is now built-in
- further validation is added to verify:
   - cross referencing of node names and ports
   - test suite and test cases names
- dictionaries representing the config schema are removed
- the config schema is no longer used for validation but kept as an
   alternative format for the developer

If it's not used, we should remove it right away (in this patch). I seethat it's updated in v5, but we can just add it back.

- the config schema can now be generated automatically from the
   Pydantic models
- the TrafficGeneratorType enum has been changed from inheriting
   StrEnum to the native str and Enum. This change was necessary to
   enable the discriminator for object unions
- the structure of the classes has been slightly changed to perfectly
   match the structure of the configuration files
- updates the test suite argument to catch the ValidationError that
   TestSuiteConfig can now raise

Passive voice is not used here, but the rest of the bullet points areusing it.

  delete mode 100644 dts/framework/config/types.py

A note, don't forget to remove this from doc sources if those get mergedbefore this.


diff --git a/dts/framework/config/__init__.py b/dts/framework/config/__init__.py

@@ -2,17 +2,19 @@

-the YAML test run configuration file
-and validates it according to :download:`the schema <conf_yaml_schema.json>`.
+the YAML test run configuration file and validates it against the 
:class:`Configuration` Pydantic
+dataclass model. The Pydantic model is also available as
+:download:`JSON schema <conf_yaml_schema.json>`.


This second sentence should be moved to the last patch.

@@ -33,29 +35,31 @@

+)
+from pydantic.config import JsonDict
+from pydantic.dataclasses import dataclass

We should probably distinguish between built-in dataclasses and pydanticdataclasses (as pydantic adds the extra argument). Importing them aspydantic_dataclass seems like the easiest way to achieve this.

@@ -116,14 +120,14 @@ class Compiler(StrEnum):

  @unique
-class TrafficGeneratorType(StrEnum):
+class TrafficGeneratorType(str, Enum):
      """The supported traffic generators."""

#:

-    SCAPY = auto()
+    SCAPY = "SCAPY"


Do discriminators not work with auto()?

-@dataclass(slots=True, frozen=True)
+@dataclass(slots=True, frozen=True, kw_only=True, 
config=ConfigDict(extra="forbid"))

Is there any special reason for kw_only? Maybe we should add the reasonfor this (and also the config arg) to the module dosctring and commit msg.

@@ -136,12 +140,17 @@ class HugepageConfiguration:

-@dataclass(slots=True, frozen=True)
+PciAddress = Annotated[
+    str, 
StringConstraints(pattern=r"^[\da-fA-F]{4}:[\da-fA-F]{2}:[\da-fA-F]{2}.\d:?\w*$")

We have a pattern for this in utils.py. We can reuse and maybe update itif needed.

+]
+"""A constrained string type representing a PCI address."""

This should be above the var and I think regular comment (with #) shouldsuffice.

@@ -150,69 +159,53 @@ class PortConfig:

+TrafficGeneratorConfigTypes = Annotated[ScapyTrafficGeneratorConfig, 
Field(discriminator="type")]

-@dataclass(slots=True, frozen=True)

-class ScapyTrafficGeneratorConfig(TrafficGeneratorConfig):
-    """Scapy traffic generator specific configuration."""

- pass

+LogicalCores = Annotated[
+    str,
+    
StringConstraints(pattern=r"^(([0-9]+|([0-9]+-[0-9]+))(,([0-9]+|([0-9]+-[0-9]+)))*)?$"),
+    Field(
+        description="Comma-separated list of logical cores to use. "
+        "An empty string means use all lcores.",
+        examples=["1,2,3,4,5,18-22", "10-15"],
+    ),
+]


These two types don't have have a docstring, but others have.

@@ -232,69 +225,25 @@ class NodeConfiguration:

      arch: Architecture
      os: OS

Adding the descriptions to all fields would be beneficial. Do we want todo that in this patch?

@@ -313,10 +264,14 @@ class TGNodeConfiguration(NodeConfiguration):

-    traffic_generator: TrafficGeneratorConfig
+    traffic_generator: TrafficGeneratorConfigTypes
+

+NodeConfigurationTypes = TGNodeConfiguration | SutNodeConfiguration

+"""Union type for all the node configuration types."""


Same note as with PciAddress.

@@ -405,31 +369,63 @@ class TestSuiteConfig:

+    test_suite_name: str = Field(
+        title="Test suite name",
+        description="The identifying name of the test suite.",

I think we need to update this to mention that it's the test suitemodule name. Maybe we can also update the field, as it's only used inthis object.

+        alias="test_suite",
+    )
+    test_cases_names: list[str] = Field(
+        default_factory=list,
+        title="Test cases by name",
+        description="The identifying name of the test cases of the test 
suite.",
+        alias="test_cases",
+    )


The attributes under Attributes need to be updated.

+
+    @cached_property
+    def test_suite_spec(self) -> "TestSuiteSpec":
+        """The specification of the requested test suite."""
+        from framework.test_suite import find_by_name
+
+        test_suite_spec = find_by_name(self.test_suite_name)
+        assert test_suite_spec is not None, f"{self.test_suite_name} is not a valid 
test suite name"

Doesn't end with a dot; the message should also mention that we'redealing with module name.

+        return test_suite_spec
+
+    @model_validator(mode="before")

I think it makes sense to exlude these from docs. I tried putting :metaprivate: into a docstring and it seems to be working.

      @classmethod
-    def from_dict(
-        cls,
-        entry: str | TestSuiteConfigDict,
-    ) -> Self:
-        """Create an instance from two different types.
+    def convert_from_string(cls, data: Any) -> Any:
+        """Convert the string representation into a valid mapping."""
+        if isinstance(data, str):
+            [test_suite, *test_cases] = data.split()
+            return dict(test_suite=test_suite, test_cases=test_cases)
+        return data
+

Why is this here? To unify the format with the one accepted by the--test-suite argument? Do we want to add an alternative format? If so,we need to make sure we document clearly that there are two alternativesand that they're equivalent.

+    @model_validator(mode="after")
+    def validate_names(self) -> Self:
+        """Validate the supplied test suite and test cases names."""

In Configuration.validate_test_runs_with_nodes() the docstring mentionsthe use of the cached property, let's also do that here.

+        available_test_cases = map(lambda t: t.name, 
self.test_suite_spec.test_cases)
+        for requested_test_case in self.test_cases_names:
+            assert requested_test_case in available_test_cases, (
+                f"{requested_test_case} is not a valid test case "
+                f"for test suite {self.test_suite_name}"

for test suite -> of test suite; also end with a dot. The dot is missingin a lot of places (and capital letters where the message doesn't startwith a var value).

@@ -442,143 +438,132 @@ class TestRunConfiguration:

-@dataclass(slots=True, frozen=True)
+
+@dataclass(frozen=True, kw_only=True)
  class Configuration:
      """DTS testbed and test configuration.

- The node configuration is not stored in this object. Rather, all used node configurations

-    are stored inside the test run configuration where the nodes are actually 
used.
-

I think it makes sense to explain the extra validation (with the@*_validator decorators) that's being done in the docstring (if weremove the validation methods from the generated docs). The docstringshould be updated for each model that doing the extra validation.

      Attributes:
          test_runs: Test run configurations.
+        nodes: Node configurations.
      """

- test_runs: list[TestRunConfiguration]

+    test_runs: list[TestRunConfiguration] = Field(min_length=1)
+    nodes: list[NodeConfigurationTypes] = Field(min_length=1)

+ @field_validator("nodes")

      @classmethod
-    def from_dict(cls, d: ConfigurationDict) -> Self:
-        """A convenience method that processes the inputs before creating an 
instance.
+    def validate_node_names(cls, nodes: list[NodeConfiguration]) -> 
list[NodeConfiguration]:
+        """Validate that the node names are unique."""
+        nodes_by_name: dict[str, int] = {}
+        for node_no, node in enumerate(nodes):
+            assert node.name not in nodes_by_name, (
+                f"node {node_no} cannot have the same name as node 
{nodes_by_name[node.name]} "
+                f"({node.name})"
+            )
+            nodes_by_name[node.name] = node_no
+
+        return nodes
+
+    @model_validator(mode="after")
+    def validate_ports(self) -> Self:
+        """Validate that the ports are all linked to valid ones."""
+        port_links: dict[tuple[str, str], Literal[False] | tuple[int, int]] = {
+            (node.name, port.pci): False for node in self.nodes for port in 
node.ports
+        }
+
+        for node_no, node in enumerate(self.nodes):

I could see why we're use enumeration for nodes in validate_node_names,but here we can just use node names in assert messages. At least thatshould be the case if nodes get validated before this model validatorruns - it that the case?

+            for port_no, port in enumerate(node.ports):
+                peer_port_identifier = (port.peer_node, port.peer_pci)
+                peer_port = port_links.get(peer_port_identifier, None)
+                assert peer_port is not None, (
+                    "invalid peer port specified for " 
f"nodes.{node_no}.ports.{port_no}"
+                )
+                assert peer_port is False, (
+                    f"the peer port specified for nodes.{node_no}.ports.{port_no} 
"
+                    f"is already linked to 
nodes.{peer_port[0]}.ports.{peer_port[1]}"
+                )
+                port_links[peer_port_identifier] = (node_no, port_no)

+    @cached_property
+    def test_runs_with_nodes(self) -> list[TestRunWithNodesConfiguration]:

Let's move the property to be the first member of the class, to unifythe order it with TestSuiteConfig.

+        """List test runs with the associated nodes."""

This doesn't list the test runs. I think the docstring should say a bitmore to make it obvious that this is the main attribute to use with thisclass. Or maybe that could be in the the class's docstring.


We're also missing the Returns: section.

+        test_runs_with_nodes = []

- Returns:

-            The whole configuration instance.
-        """
-        nodes: list[SutNodeConfiguration | TGNodeConfiguration] = list(
-            map(NodeConfiguration.from_dict, d["nodes"])
-        )
-        assert len(nodes) > 0, "There must be a node to test"
+        for test_run_no, test_run in enumerate(self.test_runs):
+            sut_node_name = test_run.system_under_test_node.node_name
+            sut_node = next(filter(lambda n: n.name == sut_node_name, 
self.nodes), None)

There are a number of these instead of a list comprehension (I mentionedthis in a previous patch). I still don't really see a reason to not uselist comprehensions in all these cases.

+
+
+ConfigurationType = TypeAdapter(Configuration)

This new transformed class exists only for validation purposes, right? Ithink we can move this to load_config, as it's not going to be usedanywhere else.

Also I'd rename it to something else, it's not a type. MaybeConfigurationAdapter or PydanticConfiguration or ConfigurationModel (asthe adapter adds some methods from BaseModel). Or something else, butthe type in the name confused me.

diff --git a/dts/framework/runner.py b/dts/framework/runner.py
@@ -231,10 +234,10 @@ def _get_test_suites_with_cases(
          test_suites_with_cases = []

for test_suite_config in test_suite_configs:

-            test_suite_class = 
self._get_test_suite_class(test_suite_config.test_suite)
+            test_suite_class = 
self._get_test_suite_class(test_suite_config.test_suite_name)

We've already done all the validation and importing at this point and weshould be able to use test_suite_config.test_suite_spec, right? The sameis true for TestSuiteWithCases, which holds the same information.

Looks like you removed _get_test_suite_class in a subsequent patch, butwe should think about getting rid of TestSuiteWithCases, as it wasconceived to do what TestSuiteSpec is doing.

Re: [PATCH 3/5] dts: use Pydantic in the configuration

Reply via email to