Hi, On 2024/11/29 6:47, Daniel Sahlberg wrote: > I took some time today to try to figure out how it all connects. I made a > minimal client which I believe should be correct: > [[[ > def cc(o): > print("commit " + repr(o)) > > def ec(url, type, o): > print("event on " + repr(url) + ": " + repr(type) + "\n" + repr(o)) > > def mc(o): > print("metadata " + repr(o)) > > import client > c=client.MultiClient(["http://pubsub.apache.org:2069"], cc, ec, mc) > c.run_forever() > ]]] > > Running on Python 3.11.2, I only get the following output: > [[[ > event on 'http://pubsub.apache.org:2069': 'connected' > None > event on 'http://pubsub.apache.org:2069': 'closed' > None > event on 'http://pubsub.apache.org:2069': 'connected' > None > event on 'http://pubsub.apache.org:2069': 'closed' > None > ... > ]]] > 'closed' come immediately after 'connected'. Between each pair there is > approximately 30 seconds > > I also tries settingup svnwcsub.py/svnwcsub.conf and I got it running but > with the same events as above. > > Running curl on the same endpoint print a "stillalive" event every 5 seconds > (with the occational actual event inbetween): > [[[ > $ curl http://pubsub.apache.org:2069 > {"stillalive": 1732781004.3811345} > {"stillalive": 1732781009.3833125} > ]]] > > Comparing the svnpubsub and curl network traffic using tcpdump, I noted that > svnpubsub use HTTP/1.0 while curl use HTTP/1.1. I also saw some differences > between the format actual on-the-wire-format by pubsub.apache.org > (Transfer-Encoding: chunked, so each record is a tuple consisting of [length] > and [json] separated and terminated by \r\n) and the expected formt (record > separator \0 and no length field). > > Checking the code for pypubsub [1] it seems the \0 can be added depending on > the User-Agent header but it seems to always use Transfer-Encoding: chunked. > > I've tested the below changes: > > [[[ > Index: client.py > =================================================================== > --- client.py (revision 1922187) > +++ client.py (working copy) > @@ -97,7 +97,8 @@ > self.handle_error() > return > > - self.push(('GET %s HTTP/1.0\r\n\r\n' % resource).encode('ascii')) > + self.push(('GET %s HTTP/1.1\r\nHost: %s\r\n\r\n' % > + (resource, host + ":" + str(port))).encode('ascii')) > > def handle_connect(self): > self.event_callback('connected', None) > @@ -113,12 +114,20 @@ > def found_terminator(self): > if self.skipping_headers: > self.skipping_headers = False > - # Each JSON record is terminated by a null character > - self.set_terminator(b'\0') > + # pypubsub is using Transfer-Encoding: chunked so each message is > + # 0xLength\r\n{json...}\n\r\n > + # ^^^^^^^^^^ <- This is the Length > + # Set terminator to the trailing \n plus the second \r\n to catch the > + # whole message > + self.set_terminator(b'\n\r\n') > else: > - record = b"".join(self.ibuffer) > + record = b"".join(self.ibuffer).decode() > self.ibuffer = [] > - self.parser.feed(record.decode()) > + # Cut out the length before parsing the json > + index = record.find('\r\n') > + if index: > + record = record[index+2:] > + self.parser.feed(record) > > def collect_incoming_data(self, data): > # Remember the last time we saw activity > ]]] > > With this, the client seems to work. I'd appreciate if someone with Python > knowledge can review it before I commit. > > The next step will be to rewrite for asyncio, but I'd like to have a > functional client as a baseline. > > Cheers, > Daniel > > [1] https://github.com/apache/infrastructure-pypubsub/blob/master/pypubsub.py
I'm not familiar with svnpubsub.py but tried to run the script with the proposed patch. However it doesn't work. Investigating it, 400 Bad Request is received because the client.py sends "GET HTTP/1.1" (missing request uri). After applying the following patch, it seems to work. [[[ diff --git a/tools/server-side/svnpubsub/svnpubsub/client.py b/tools/server-side/svnpubsub/svnpubsub/client.py index 871a5e925..004003512 100644 --- a/tools/server-side/svnpubsub/svnpubsub/client.py +++ b/tools/server-side/svnpubsub/svnpubsub/client.py @@ -75,7 +75,7 @@ class Client(asynchat.async_chat): raise ValueError("URL scheme must be http: '%s'" % url) host = parsed_url.hostname port = parsed_url.port - resource = parsed_url.path + resource = parsed_url.path or '/' if parsed_url.query: resource += "?%s" % parsed_url.query if parsed_url.fragment: ]]] I think it would simplify by the following rather than `host + ":" + str(port)`. [[[ - self.push(('GET %s HTTP/1.0\r\n\r\n' % resource).encode('ascii')) + self.push(('GET %s HTTP/1.1\r\nHost: %s:%d\r\n\r\n' % + (resource, host, port)).encode('ascii')) ]]] In the minimal client, it uses `repr` but `__repr__` method of `Notification` class is not defined. I'd suggest to add the `__repr__` method. [[[ @@ -131,6 +140,8 @@ class Client(asynchat.async_chat): class Notification(object): def __init__(self, data): self.__dict__.update(data) + def __repr__(self): + return '%s(%r)' % (self.KIND, self.__dict__) class Commit(Notification): KIND = 'COMMIT' ]]] Finally, `cc` and `mc` methods in the minimal client should have `url` parameter. [[[ --- svnpubsub-client.py.orig 2024-12-04 06:25:33.083708157 +0900 +++ svnpubsub-client.py 2024-12-02 10:28:30.489176287 +0900 @@ -1,10 +1,10 @@ -def cc(o): +def cc(url, o): print("commit " + repr(o)) def ec(url, type, o): print("event on " + repr(url) + ": " + repr(type) + "\n" + repr(o)) -def mc(o): +def mc(url, o): print("metadata " + repr(o)) import client ]]] -- Jun Omae <jun6...@gmail.com> (大前 潤)