Sequence and/or pattern matching
Hi everyone,

I'm relatively new to Python, and I want to write a piece of code that does the following, for data-mining purposes:

1) I have a list of connections between some computers. The list has this format:

Ip A          Date          Ip B
...           ...           ...
192.168.0.1   19.10.2005    192.168.0.2
192.168.0.3   19.10.2005    192.168.0.1
192.168.0.4   19.10.2005    192.168.0.6
...           ...           ...

2) I want to find out whether there are unknown sequences of connections in my data, and whether these sequences are repeated throughout the file. For example: Computer A connects to Computer B, then Computer B connects to Computer C, then Computer C connects to Computer A.

3) Then the software reports the sequences it has found and how many times each one appears.

I hope this is clear; point 2) is where my main problem lies. Does anyone have an idea where to start, and whether something like this has already been written?

Thanks,
Séb

-- 
http://mail.python.org/mailman/listinfo/python-list
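A minimal sketch of step 1 in Python, assuming the log is plain text with whitespace-separated Ip A, Date and Ip B columns, dates written as DD.MM.YYYY, and lines already in chronological order. Connection and parse_log are hypothetical names used only for illustration.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple


class Connection(NamedTuple):
    src: str        # "Ip A": the computer that opens the connection
    when: datetime  # "Date", assumed to be written as DD.MM.YYYY
    dst: str        # "Ip B": the computer being connected to


def parse_log(lines: Iterable[str]) -> List[Connection]:
    """Parse 'IpA Date IpB' lines in file order, skipping headers and blanks."""
    connections = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # blank line, header, or not in the assumed three-column layout
        src, date_text, dst = fields
        try:
            when = datetime.strptime(date_text, "%d.%m.%Y")
        except ValueError:
            continue  # second column is not a DD.MM.YYYY date
        connections.append(Connection(src, when, dst))
    return connections
```

Keeping the date on every record means the later search can be restricted to one time window at a time, if that turns out to be useful.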
Re: Sequence and/or pattern matching
> Essentially, if I understand correctly, you want to detect LOOPS given a
> sequence of directed connections A->B. "loop detection" and "graph"
> would then be the keywords to search for, in this case.

Exactly, but the sequence has to be discovered by the code!

> Does this "then" imply you're only interested in loops occurring in this
> *sequence*, i.e., is order of connections important? If the sequence of
> directed connections was, say, in the different order:
>
> B->C
> A->B
> C->A
>
> would you want this detected as a loop, or not?

Yes, it would be nice to detect it as a loop, perhaps with a threshold. By the way, it would also be nice to ignore additional connections, like this:

B->C   # Normal connection
D->E   # Additional connection to ignore
A->B   # Normal connection
C->A   # Normal connection

Would that be possible?

Thank you very much
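Following the "loop detection" and "graph" suggestion, here is a minimal sketch that ignores the order of the connections: it builds a directed graph from (source, destination) pairs and lists every simple loop of up to max_len hosts, using a plain depth-first search. find_loops and max_len are hypothetical names; connections that belong to no loop, such as D->E, simply never show up in the result.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (source host, destination host)


def find_loops(edges: Iterable[Edge], max_len: int = 4) -> List[Tuple[str, ...]]:
    """Return every simple directed loop of 2..max_len hosts, each reported once.

    The order of the connections is ignored, and connections that belong to
    no loop (such as D->E) never appear in the result.
    """
    graph: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)

    loops: List[Tuple[str, ...]] = []

    def extend(path: List[str]) -> None:
        start, last = path[0], path[-1]
        for nxt in sorted(graph.get(last, set())):
            if nxt == start and len(path) > 1:
                loops.append(tuple(path))  # the path closes back on its start
            elif nxt > start and nxt not in path and len(path) < max_len:
                extend(path + [nxt])

    # Start each search at a loop's smallest host, so rotations are not repeated.
    for start in sorted(graph):
        extend([start])
    return loops
```

For the example above, find_loops([("B", "C"), ("D", "E"), ("A", "B"), ("C", "A")]) returns [("A", "B", "C")]. The number of loops can grow quickly in a busy network, which is why max_len bounds the search; a threshold of the kind mentioned could be applied by passing only the connections from one time window at a time (not shown here).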
Re: Sequence and/or pattern matching
Hi everybody,

Thanks for taking the time to answer my question. Unfortunately, there seems to be some confusion about what I want to do. I don't want to search for a particular path between computers. What I really want is to detect sequences of connections that are repeated throughout the log. Is that clearer? If not, I will post another example ;-)

Thank you!

PS: The Python community is very nice; I'm glad I'm learning this language!
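One naive reading of "sequences of connections that are repeated throughout the log" is to count every run of n consecutive connections and keep the runs that occur more than once. This is a minimal sketch under that assumption, taking the log as a list of (source, destination) pairs in file order; repeated_runs, n and min_count are hypothetical names.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Edge = Tuple[str, str]  # (source host, destination host)


def repeated_runs(edges: Sequence[Edge], n: int = 3,
                  min_count: int = 2) -> Dict[Tuple[Edge, ...], int]:
    """Count every run of n consecutive connections; keep runs seen min_count+ times."""
    # Slide a window of width n over the log (no windows if the log is shorter than n).
    windows = (tuple(edges[i:i + n]) for i in range(len(edges) - n + 1))
    counts = Counter(windows)
    return {run: c for run, c in counts.items() if c >= min_count}
```

The limitation is that the connections must be adjacent in the log: a single unrelated connection such as D->E in the middle breaks the run, so this does not yet ignore additional connections.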
Re: Sequence and/or pattern matching
Sorry for the confusion; I think my example was unclear. Thank you, Mike, for this piece of code, which solves part of my problem. In fact, the sequences are unknown at the beginning, so the first part of the code has to find possible sequences and then, if those sequences are repeated, count how many times they appear (as your code does).

This morning I found that there's a program made by i2 that does this kind of job, but for telephone call analysis. Maybe its description will help you understand my goal better:
http://www.i2.co.uk/products/Pattern_Tracer/default.asp

Séb
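The "discover first, then count" idea can be sketched as follows, under stated assumptions: a candidate sequence is a chain of connections in log order where each connection starts at the host the previous one reached, with at most max_gap log entries between hops, so unrelated connections in between are skipped. chained_sequences, length and max_gap are hypothetical names; greedily taking the first matching hop is a simplification that can miss alternative chains, and the sketch makes no claim about how the i2 tool works.

```python
from collections import Counter
from typing import Sequence, Tuple

Edge = Tuple[str, str]  # (source host, destination host)


def chained_sequences(edges: Sequence[Edge], length: int = 3, max_gap: int = 5) -> Counter:
    """Discover chains of `length` connections, each starting where the previous
    one ended, and count how often each host path occurs.

    The next hop must appear within max_gap log entries after the current one;
    unrelated connections in between are skipped.
    """
    counts: Counter = Counter()
    for i, (src, dst) in enumerate(edges):
        hosts, pos = [src, dst], i
        while len(hosts) < length + 1:  # `length` connections link length+1 hosts
            window_end = min(pos + 1 + max_gap, len(edges))
            # Greedily take the first later connection that leaves the current host.
            nxt = next((j for j in range(pos + 1, window_end)
                        if edges[j][0] == hosts[-1]), None)
            if nxt is None:
                break  # this chain cannot be extended inside the gap
            hosts.append(edges[nxt][1])
            pos = nxt
        if len(hosts) == length + 1:
            counts[tuple(hosts)] += 1
    return counts
```

On the log A->B, D->E, B->C, C->A, A->B, B->C, X->Y, C->A, this counts the host path A, B, C, A twice, and the rotations B, C, A, B and C, A, B, C once each; if only the loop itself matters, paths that are rotations of one another would need to be merged afterwards.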