Séb wrote: > 1) I have a list of connexion between some computers. This list has > this format :
It looks like you want graph theory. > Ip A Date Ip B > ... ... ... > 192.168.0.1 19.10.2005 192.168.0.2 > 192.168.0.3 19.10.2005 192.168.0.1 > 192.168.0.4 19.10.2005 192.168.0.6 That looks like a list of edges between graph nodes, you see. Each node corresponds to an address and each edge is a connection between two nodes - ip addresses, in your case. > 2) I want to find if there are unknown sequences of connexions in my > data and if these sequences are repeated along the file : > > For example : > > Computer A connects to Computer B then > Computer B connects to Computer C then > Computer C connects to Computer A That looks like finding a path between Node A and Node C. This is a common application of graph theory, and especially when finding routes (eg. for train journeys, or for AI characters in computer games). > 3) Then, the software gives the sequences it has found and how many > times they appear... You can find all the routes between 1 node and others by using depth-first search (or indeed, any other simple graph search algorithm). Basically, this says that, for any node, examine all the nodes it leads to. Then examine those nodes... repeat until you run out of nodes or find where you're looking for. The only complication is remembering the route you took. > I hope this is clear, point 2) is where I have my main problem. Has > someone an idea where to start and if there's already something coded ? I found a couple of Python graph libraries on Google but I can't vouch for their quality. However, writing your own simple graph traversal algorithm should be quite trivial. I would start by populating a dictionary where the keys are instances of address A and the values are lists of address B values for address A. Add each link again with address B as the key and address A as the value if you need to represent bidirectional connections. Then you can perform search on that as required. The only slight complication is avoiding infinite loops - I might use a dictionary of address->boolean values here and check off each address as the algorithm progresses. -- Ben Sizer -- http://mail.python.org/mailman/listinfo/python-list