Anthony Harivel, Mar 27, 2024 at 16:18:
Hi Robin,
Thanks for this patch. I did test it and it works as expected.
Nonetheless, maybe we can improve on some parts.
Hey Anthony, thanks a lot for testing!
In 'class TelemetrySocket', there is:
...
self.sock.connect(path)
data = json.loads(self.sock.recv(1024).decode())
...
Maybe we can improve with something like:
try:
rcv_data = self.sock.recv(1024)
if rcv_data:
data = json.loads(rcv_data.decode())
else:
print("No data received from socket.")
except json.JSONDecodeError as e:
print("Error decoding JSON:", e)
except Exception as e:
print("An error occurred:", e)
So that it handles a bit better the error cases.
This is undesired as it would actually mask the errors from the calling
code. Unless you can do something about the error, printing it should be
left to the calling code. I will handle these errors better in v2.
In the same way to implement more robust error handling mechanisms in:
def load_endpoints
...
except Exception as e:
LOG.error("Failed to load endpoint module '%s' from '%s': %s", name, f, e)
...
For example, you might catch FileNotFoundError, ImportError, or SyntaxError.
That could help to debug!
We could print the whole stack trace but I don't see what special
handling could be done depending on the exception. I will try to make
this better for v2.
About TelemetryEndpoint I would see something like:
class TelemetryEndpoint:
"""
Placeholder class only used for typing annotations.
"""
@staticmethod
def info() -> typing.Dict[MetricName, MetricInfo]:
"""
Mapping of metric names to their description and type.
"""
raise NotImplementedError()
@staticmethod
def metrics(sock: TelemetrySocket) -> typing.List[MetricValue]:
"""
Request data from sock and return it as metric values. Each metric
name must be present in info().
"""
try:
metrics = []
metrics_data = sock.fetch_metrics_data()
for metric_name, metric_value in metrics_data.items():
metrics.append((metric_name, metric_value, {}))
return metrics
except Exception as e:
LOG.error("Failed to fetch metrics data: %s", e)
# If unable to fetch metrics data, return an empty list
return []
With these changes, the metrics method of the TelemetryEndpoint class
could handle errors better and the exporter can continue functioning
even if there are issues with fetching metrics data.
I don't know if all of that makes sens or if it's just nitpicking!
I can also propose an enhanced version of your patch if you prefer.
As indicated in the docstring, this class is merely a placeholder. Its
code is never executed. It only stands to represent what functions must
be implemented in endpoints.
However, your point is valid. I will update my code to handle errors in
endpoints more gracefully and avoid failing the whole request if there
is only one error.
Thanks for the thorough review!