James Stroud wrote:
> [EMAIL PROTECTED] wrote:
> > Hello
> >
> > I am looking for python code that takes as input a list of strings
> > (mostly similar, but not necessarily, and rather short: say not longer
> > than 50 chars) and that computes and outputs the python regular
> > expression that matches these string values (not necessarily strictly,
> > perhaps the code is able to determine patterns, i.e. families of
> > strings...).
> >
> > Thanks for any idea
>
> I'm not sure of your application, but Genomicists and Proteomicists have
> found that Hidden Markov Models can be very powerful for developing
> pattern models. Perhaps have a look at "Biological Sequence Analysis" by
> Durbin et al.
>
> Also, a very cool regex based algorithm was developed at IBM:
>
> http://cbcsrv.watson.ibm.com/Tspd.html

Indeed, this seems cool! Thanks for the suggestion.
I have tried their online Text-symbol Pattern Discovery with these input values:

cpkg-30000
cpkg-31008
cpkg-3000A
cpkg-30006
nsug-300AB
nsug-300A2
cpdg-30001
nsug-300A3

> But I think HMMs are the way to go. Check out HMMER at WUSTL by Sean
> Eddy and colleagues:
>
> http://hmmer.janelia.org/
>
> http://selab.janelia.org/people/eddys/

I will look at that more closely, but at first glance it seems more
specialized and less accessible to common mortals...

> James

Thanks. This may help me. In addition, I will keep looking for other ideas,
notably because I want code that I can change myself, and exclusively
Python code.

> --
> James Stroud
> UCLA-DOE Institute for Genomics and Proteomics
> Box 951570
> Los Angeles, CA 90095
>
> http://www.jamesstroud.com/
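
For anyone who wants a pure-Python starting point to modify, here is a minimal sketch of the simplest thing that could work on values like the ones above. It is neither the IBM pattern-discovery algorithm nor an HMM: it just walks the strings column by column and emits a literal where all samples agree and a character class where they differ. The function name infer_regex is made up for illustration, and the sketch assumes all input strings have the same length (it raises ValueError otherwise).

```python
import re


def infer_regex(strings):
    """Build a position-wise regex from equal-length sample strings.

    Hypothetical helper: a naive column-by-column generalization,
    not a general pattern-discovery algorithm.
    """
    samples = list(strings)
    if not samples:
        raise ValueError("need at least one sample string")
    if len({len(s) for s in samples}) != 1:
        raise ValueError("this sketch only handles strings of equal length")

    parts = []
    # zip(*samples) yields one tuple of characters per column position.
    for column in zip(*samples):
        chars = sorted(set(column))
        if len(chars) == 1:
            # Every sample agrees here: emit an escaped literal.
            parts.append(re.escape(chars[0]))
        else:
            # Samples differ: emit a class of the characters actually seen.
            parts.append("[" + "".join(re.escape(c) for c in chars) + "]")
    return "".join(parts)


samples = [
    "cpkg-30000", "cpkg-31008", "cpkg-3000A", "cpkg-30006",
    "nsug-300AB", "nsug-300A2", "cpdg-30001", "nsug-300A3",
]
pattern = infer_regex(samples)
print(pattern)

# Sanity check: the inferred pattern matches every sample in full.
assert all(re.fullmatch(pattern, s) for s in samples)
```

The resulting pattern also accepts combinations that never appeared in the input (any mix of the characters seen in each column), so it describes a family of strings rather than the exact list. Handling strings of different lengths would need an alignment step first, which is where the HMM and pattern-discovery approaches mentioned above come in.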