Ed,

It depends on which IM protocol the company is using. If there is more than one, your job might end up being quite complicated. You mentioned port 5190 in your post; does that mean the company is using only AOL IM? In general, it seems like you would have to:
1) Capture the traffic
2) Decode the IM protocol
3) Record the captured text

1) For capturing the traffic, I would use a specific tool like tcpick (a cousin of tcpdump, but it actually dumps the data to the console, not just the headers, and recreates the TCP streams -- good stuff!). Again, if you know the exact port number and the exact protocol this might be very easy, because you can set up your capturing program to capture traffic from only one port. Let's assume that for now.

Here is my quick and dirty attempt. First install tcpick (http://tcpick.sourceforge.net/) if you don't have it, then become root and open a Python prompt. (Use ipython... because my mom says it's better ;).

In [1]: from subprocess import *   # don't do this in your final script, always use 'import subprocess'
In [2]: cmd = '/usr/sbin/tcpick -i eth0 -bR tcp port 80'   # use your IM port here instead of 80
        # -bR means reconstruct the TCP stream and dump data in raw mode to the console (good for ASCII stuff)
In [3]: p = Popen(cmd, shell=True, bufsize=0, stdout=PIPE, stderr=PIPE)   # start a subprocess w/ NO_WAIT
In [4]: p.pid   # check the process pid, can use this to issue a 'kill' command later...
Out[4]: 7100
In [5]: p.poll()
In [6]: # Actually it is None, which means the process is not finished
In [7]: # Read some lines one by one from the output
In [8]: p.stdout.readline()   # might block here, if so start a browser and load a page
Out[8]: 'Starting tcpick 0.2.1 at 2006-XX-XX XX:XX EDT\n'
In [9]: #
In [10]: # Print some lines from the output, one by one:
In [11]: p.stdout.readline()
Out[11]: 'Timeout for connections is 600\n'   # first line, tcpick prompt stuff
In [12]: p.stdout.readline()
Out[12]: 'tcpick: listening on eth0\n'
In [13]: p.stdout.readline()
Out[13]: 'setting filter: "tcp"\n'
In [14]: p.stdout.readline()
Out[14]: '1 SYN-SENT 192.168.0.106:53498 > 64.233.167.104:www\n'
In [15]: p.stdout.readline()
Out[15]: '1 SYN-RECEIVED 192.168.0.106:53498 > 64.233.167.104:www\n'
In [16]: p.stdout.readline()
Out[16]: '1 ESTABLISHED 192.168.0.106:53498 > 64.233.167.104:www\n'
In [17]: p.stdout.readline()   # the good stuff should start right here
Out[17]: 'GET /search?hl=en&q=42&btnG=Google+Search HTTP/1.1\r\n'
In [18]: p.stdout.readline()
Out[18]: 'Host: www.google.com\r\n'
In [19]: p.stdout.readline()
Out[19]: 'User-Agent: blah blah...\r\n'
In [20]: p.stdout.read()   # try a read() -- will block, press Ctrl-C
exceptions.KeyboardInterrupt
In [21]: p.poll()
Out[21]: 0   # process is finished, return errcode = 0
In [22]: p.stderr.read()
Out[22]: ''   # no error messages
In [23]: p.stdout.read()
Out[23]: '\n257 packets captured\n7 tcp sessions detected\n'
In [24]: # those were the last stats before tcpick was terminated

Well anyway, your readline() calls will block on process I/O whenever tcpick has no data to supply. You might have to start a separate thread in Python to manage the capture process. But in the end the readline() calls will get you the raw data from the network (in this case it was just one way, from my IP to Google; of course you will need it both ways).
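The threading idea could look roughly like the sketch below. Treat it as an illustration, not a finished tool: start_capture and drain are names I made up, it is written for Python 3, and it has only been thought through against an ordinary child process, not a live tcpick capture. A background thread does the blocking readline() calls and pushes each line onto a queue, so the main program can poll the queue with a timeout instead of hanging.

```python
import queue
import subprocess
import threading


def start_capture(cmd):
    """Spawn cmd (a list of arguments) and pump its stdout into a queue.

    Returns the Popen object and the queue. Hypothetical helper, not part
    of tcpick or the standard library.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL)
    lines = queue.Queue()

    def pump():
        # readline() blocks here, in the worker thread, not in the caller.
        for raw in iter(proc.stdout.readline, b""):
            lines.put(raw)
        proc.stdout.close()
        lines.put(None)  # sentinel: the child closed its output

    threading.Thread(target=pump, daemon=True).start()
    return proc, lines


def drain(lines, timeout=1.0):
    """Collect lines until the sentinel arrives or `timeout` seconds pass
    with nothing new on the queue."""
    collected = []
    while True:
        try:
            item = lines.get(timeout=timeout)
        except queue.Empty:
            break
        if item is None:
            break
        collected.append(item)
    return collected
```

To try it with the capture from the session, you would pass the tcpick command as a list, something like start_capture(['/usr/sbin/tcpick', '-i', 'eth0', '-bR', 'tcp port 5190']) as root, then call drain(lines) periodically. Passing the filter as a single argument is my assumption about how tcpick wants it; check its man page. The lines come back as bytes, which is probably what you want for a binary IM protocol anyway.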
2) The decoding will depend on your protocol. If you have more than one IM protocol, the single-port capture idea from above won't work too well; you will have to capture all the traffic, then decode each stream, for each side, for each protocol.

3) Recording or replay is easy. Save to files or dump to a MySQL table indexed by user id, timestamp, IP, etc.

Because of buffering issues you will probably not get a very accurate real-time monitoring system with this setup.

Hope this helps,
Nick Vatamaniuc

Ed Leafe wrote:
> I've been approached by a local business that has been advised that
> they need to start capturing and archiving their instant messaging in
> order to comply with Sarbanes-Oxley. The company is largely PC, but
> has a significant number of Macs running OS X, too.
>
> Googling around quickly turns up IM Grabber for the PC, which would
> seem to be just what they need. But there is no equivalent to be
> found for OS X. So if anyone knows of any such product, please let me
> know and there will be no need for the rest of this post.
>
> But assuming that there is no such product, would it be possible to
> create something in Python, using the socket or a similar module?
> They have a number of servers that provide NAT for each group of
> machines; I was thinking that something on those servers could
> capture all traffic on port 5190 and write it to disk. Is this
> reasonable, or am I being too simplistic in my approach?
>
> -- Ed Leafe
> -- http://leafe.com
> -- http://dabodev.com