On Nov 20, 9:03 am, Krzysztof Retel <[EMAIL PROTECTED]> wrote:
> Hi guys,
>
> I am struggling writing fast UDP server. It has to handle around 10000
> UDP packets per second. I started building that with non blocking
> socket and threads. Unfortunately my approach does not work at all.
> I wrote a simple case test: client and server. The client sends 2200
> packets within 0.137447118759 secs. The tcpdump received 2189 packets,
> which is not bad at all.
> But the server only handles 700 -- 870 packets, when it is non-
> blocking, and only 670 – 700 received with blocking sockets.
> The client and the server are working within the same local network
> and tcpdump shows pretty correct amount of packets received.
>
> I included a bit of the code of the UDP server.
>
> class PacketReceive(threading.Thread):
>     def __init__(self, tname, socket, queue):
>         self._tname = tname
>         self._socket = socket
>         self._queue = queue
>         threading.Thread.__init__(self, name=self._tname)
>
>     def run(self):
>         print 'Started thread: ', self.getName()
>         cnt = 1
>         cnt_msgs = 0
>         while True:
>             try:
>                 data = self._socket.recv(512)
>                 msg = data
>                 cnt_msgs += 1
>                 total += 1
>                 # self._queue.put(msg)
>                 print 'thread: %s, cnt_msgs: %d' % (self.getName(), cnt_msgs)
>             except:
>                 pass
>
> I was also using Queue, but this didn't help neither.
> Any idea what I am doing wrong?
>
> I was reading that Python socket modules was causing some delays with
> TCP server. They recomended to set up socket option for nondelays:
> "sock.setsockopt(SOL_TCP, TCP_NODELAY, 1) ". I couldn't find any
> similar option for UDP type sockets.
> Is there anything I have to change in socket options to make it
> working faster?
> Why the server can't process all incomming packets? Is there a bug in
> the socket layer? btw. I am using Python 2.5 on Ubuntu 8.10.
>
> Cheers
> K
First and foremost, you are not being realistic here. Attempting to squeeze 10,000 packets per second out of 10Mb/s (assumed) Ethernet is not realistic. The theoretical maximum is 14,880 frames per second, and that assumes each frame is only 84 bytes on the wire, making it useless for data transport. Using your numbers, each frame requires (90B + 84B) 174B, which works out to a theoretical maximum of ~7200 frames per second. These are obviously rough numbers, but I believe you get the point. It's late here, so I'll double check my numbers tomorrow.

In your case you would not want TCP_NODELAY, even if you were to use TCP, as it would actually limit your throughput. UDP has no such option because each datagram (up to the MTU) goes out as its own Ethernet frame - which is not true for TCP, since TCP is a stream. Use of TCP may therefore significantly reduce the number of frames required for transport - assuming TCP_NODELAY is NOT used.

If you want to increase your throughput, use larger datagrams. If you are on a reliable link, which we can safely assume since you are currently using UDP, TCP without TCP_NODELAY may yield better performance because of its buffering strategy.

Assuming you are on 10Mb Ethernet, you are nearing its frame-saturation limit. If you are on 100Mb Ethernet, you obviously have a lot more elbow room, but not nearly as much as one would hope, because 100Mb is only achievable when frames are completely filled. It's been a while since I last looked at 100Mb numbers, but it's not likely most people will see numbers near the theoretical limit simply because that number has so many caveats associated with it - and small frames are its nemesis. Since you are using very small datagrams, you are wasting a lot of potential throughput. If there are other computers on your network, the situation is made yet more difficult. Additionally, many switches and/or routers have bandwidth limits of their own, which may or may not pose a wall for your application. And to make matters worse, you are allocating large buffers (4K) to send/receive 90 bytes of data, creating yet more work for your computer.

Options to try:

- See how TCP measures up for you (first sketch below).
- Attempt to place multiple data objects within a single datagram, thereby making better use of the available Ethernet bandwidth (second sketch below).
- You didn't say whether you are CPU-bound, but you are creating a tuple and appending it to a list on every datagram. If you are CPU-bound, allocating smaller buffers and streamlining your history accounting may help.
- Don't forget, localhost does not suffer from frame limits - it's basically testing your memory/bus speed.
- If this is for local use only, consider using a different IPC mechanism - Unix domain sockets (third sketch below) or memory-mapped files.
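Here's a rough, untested sketch of the TCP idea - a plain blocking sender that leaves Nagle's algorithm on (no TCP_NODELAY), so the kernel coalesces the small writes into fewer, fuller frames. The host, port and 90-byte record size are placeholders, not your values:

import socket

HOST, PORT = '192.168.0.10', 9999     # placeholders

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((HOST, PORT))
# Deliberately NOT doing this -- it would defeat the coalescing:
# sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
for i in xrange(2200):
    sock.sendall('x' * 90)            # one ~90-byte record per write
sock.close()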
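And a sketch of packing several records into one datagram - length-prefix each record and combine a handful per sendto(), so each frame carries far more payload. BATCH, HOST and PORT are made-up values:

import socket
import struct

BATCH = 10                            # records per datagram (assumption)
HOST, PORT = '192.168.0.10', 9999     # placeholders

def send_batched(records, sock):
    # Sender side: accumulate BATCH length-prefixed records, then ship
    # them as a single datagram.
    buf = []
    for rec in records:
        buf.append(struct.pack('!H', len(rec)) + rec)
        if len(buf) == BATCH:
            sock.sendto(''.join(buf), (HOST, PORT))
            buf = []
    if buf:
        sock.sendto(''.join(buf), (HOST, PORT))

def unpack_batch(datagram):
    # Receiver side: undo the length-prefix framing.
    records, offset = [], 0
    while offset < len(datagram):
        (n,) = struct.unpack_from('!H', datagram, offset)
        records.append(datagram[offset + 2:offset + 2 + n])
        offset += 2 + n
    return records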
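If everything stays on one box, something along these lines keeps your sendto()/recv() model but skips the Ethernet layer entirely (the socket path is a placeholder):

import os
import socket

PATH = '/tmp/udp_test.sock'           # placeholder path

def make_receiver():
    # Bind a Unix-domain datagram socket; remove any stale socket file first.
    if os.path.exists(PATH):
        os.unlink(PATH)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(PATH)
    return sock

def make_sender():
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.connect(PATH)
    return sock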