On 2023/07/03 09:37:01 Yasuhito FUTATSUKI wrote:
> Hi,
> 
> On 2023/07/03 3:17, James McCoy wrote:
> > Hi all,
> > 
> > Python 3.6 deprecated the asyncore and asynchat modules in favor of the
> > asyncio module.  Per [PEP 594], that means that the modules
> > will be removed in Python 3.12, thus breaking svnpubsub/client.py.
> > 
> > Is there someone familiar enough with these Python modules to work on
> > updating svnpubsub?  What's our policy on minimum supported Python
> > version for the tools?
> 
> Before it, nobody check (and/or report) that the scripts in
> tools/server-side/svnpubsub can run with any version of Python 3.x.
> 
> https://cwiki.apache.org/confluence/display/SVN/Subversion%27s+Python+3+Support+Status
> 
> It seems that those tools in the section "Not Checked For Python 3 Yet"
> are now unmantained.
> 
> > [PEP 594]: https://peps.python.org/pep-0594/#deprecated-modules
> > 
> > Cheers,
> 
> Cheers,
> -- 
> Yasuhito FUTATSUKI <futat...@yf.bsdclub.org>
> 

I took some time today to try to figure out how it all connects. I made a 
minimal client which I believe should be correct:
[[[
def cc(o):
  print("commit " + repr(o))

def ec(url, type, o):
  print("event on " + repr(url) + ": " + repr(type) + "\n" + repr(o))

def mc(o):
  print("metadata " + repr(o))

import client
c=client.MultiClient(["http://pubsub.apache.org:2069";], cc, ec, mc)
c.run_forever()
]]]

Running on Python 3.11.2, I only get the following output:
[[[
event on 'http://pubsub.apache.org:2069': 'connected'
None
event on 'http://pubsub.apache.org:2069': 'closed'
None
event on 'http://pubsub.apache.org:2069': 'connected'
None
event on 'http://pubsub.apache.org:2069': 'closed'
None
...
]]]
'closed' come immediately after 'connected'. Between each pair there is 
approximately 30 seconds

I also tries settingup svnwcsub.py/svnwcsub.conf and I got it running but with 
the same events as above.

Running curl on the same endpoint print a "stillalive" event every 5 seconds 
(with the occational actual event inbetween):
[[[
$ curl http://pubsub.apache.org:2069
{"stillalive": 1732781004.3811345}
{"stillalive": 1732781009.3833125}
]]]

Comparing the svnpubsub and curl network traffic using tcpdump, I noted that 
svnpubsub use HTTP/1.0 while curl use HTTP/1.1. I also saw some differences 
between the format actual on-the-wire-format by pubsub.apache.org 
(Transfer-Encoding: chunked, so each record is a tuple consisting of [length] 
and [json] separated and terminated by \r\n) and the expected formt (record 
separator \0 and no length field).

Checking the code for pypubsub [1] it seems the \0 can be added depending on 
the User-Agent header but it seems to always use Transfer-Encoding: chunked.

I've tested the below changes:

[[[
Index: client.py
===================================================================
--- client.py   (revision 1922187)
+++ client.py   (working copy)
@@ -97,7 +97,8 @@
       self.handle_error()
       return

-    self.push(('GET %s HTTP/1.0\r\n\r\n' % resource).encode('ascii'))
+    self.push(('GET %s HTTP/1.1\r\nHost: %s\r\n\r\n' %
+              (resource, host + ":" + str(port))).encode('ascii'))

   def handle_connect(self):
     self.event_callback('connected', None)
@@ -113,12 +114,20 @@
   def found_terminator(self):
     if self.skipping_headers:
       self.skipping_headers = False
-      # Each JSON record is terminated by a null character
-      self.set_terminator(b'\0')
+      # pypubsub is using Transfer-Encoding: chunked so each message is
+      # 0xLength\r\n{json...}\n\r\n
+      #             ^^^^^^^^^^ <- This is the Length
+      # Set terminator to the trailing \n plus the second \r\n to catch the
+      # whole message
+      self.set_terminator(b'\n\r\n')
     else:
-      record = b"".join(self.ibuffer)
+      record = b"".join(self.ibuffer).decode()
       self.ibuffer = []
-      self.parser.feed(record.decode())
+      # Cut out the length before parsing the json
+      index = record.find('\r\n')
+      if index:
+        record = record[index+2:]
+      self.parser.feed(record)

   def collect_incoming_data(self, data):
     # Remember the last time we saw activity
]]]

With this, the client seems to work. I'd appreciate if someone with Python 
knowledge can review it before I commit.

The next step will be to rewrite for asyncio, but I'd like to have a functional 
client as a baseline.

Cheers,
Daniel

[1] https://github.com/apache/infrastructure-pypubsub/blob/master/pypubsub.py

Reply via email to