Aah! I understand now. Thank you.

Regards,
Krishna Mohan

On Mon, Jan 20, 2014 at 4:48 PM, Ben Finney <ben+pyt...@benfinney.id.au> wrote:

> km <srikrishnamo...@gmail.com> writes:
>
> > I am trying to find sub sequence patterns but constrained by the order
> > in which they occur
>
> There are also specific resources for understanding and testing regex
> patterns, such as <URL:http://www.pythonregex.com/>.
>
> > For example
> >
> > >>> p = re.compile('(CAA)+?(TCT)+?(TA)+?')
> > >>> p.findall('CAACAACAATCTTCTTCTTCTTATATA')
> > [('CAA', 'TCT', 'TA')]
> >
> > But I instead find only one instance of the CAA/TCT/TA in that order.
>
> Yes, because the grouping operator (the parens ‘()’) in each case
> contains exactly “CAA”, “TCT”, “TA”. If you want the repetitions to be
> part of the group, you need the repetition operator (in your case, ‘+’)
> to be part of the group.
>
> > How can I get 3 matches of CAA, followed by four matches of TCT followed
> > by 2 matches of TA ?
>
> With a little experimenting I get:
>
> >>> p = re.compile('((?:CAA)+)?((?:TCT)+)?((?:TA)+)?')
> >>> p.findall('CAACAACAATCTTCTTCTTCTTATATA')
> [('CAACAACAA', 'TCTTCTTCTTCT', 'TATATA'), ('', '', '')]
>
> Remember that you'll get no more than one group returned for each group
> you specify in the pattern.
>
> > Well these patterns (CAA/TCT/TA) can occur any number of times and
> > at least once so I have to use + in the regex.
>
> Be aware that regex is not the solution to all parsing problems; for
> many parsing problems it is an attractive but inappropriate tool. You
> may need to construct a more specific parser for your needs. Even if
> it's possible with regex, the resulting pattern may be so complex that
> it's better to write it out more explicitly.
>
> --
>  \     “To punish me for my contempt of authority, Fate has made me an |
>   `\   authority myself.” —Albert Einstein, 1930-09-18 |
> _o__)  |
> Ben Finney
>
> --
> https://mail.python.org/mailman/listinfo/python-list
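
For the archive, here is a minimal sketch of how I read the suggestion above: put the repetition inside each capturing group, then divide the length of each captured run by the motif length to get the repeat count. The names count_ordered_repeats and MOTIFS are my own, not from Ben's reply. I have also assumed each motif must occur at least once, so I dropped the trailing '?' after each group; that should also avoid the empty ('', '', '') match at the end of the string.

```python
import re

# Hypothetical names for illustration; the motifs come from the thread.
MOTIFS = ('CAA', 'TCT', 'TA')


def count_ordered_repeats(seq, motifs=MOTIFS):
    """Return, for each match, a list of (motif, repeat_count) in order."""
    # One capturing group per motif, with the repetition inside the group:
    # ((?:CAA)+)((?:TCT)+)((?:TA)+)
    pattern = re.compile(''.join('((?:%s)+)' % re.escape(m) for m in motifs))
    results = []
    for match in pattern.finditer(seq):
        # Each group holds the whole run, so run length / motif length
        # gives the number of repeats.
        counts = [(m, len(run) // len(m))
                  for m, run in zip(motifs, match.groups())]
        results.append(counts)
    return results


print(count_ordered_repeats('CAACAACAATCTTCTTCTTCTTATATA'))
```

On the sample string this reports 3 repeats of CAA, 4 of TCT and 3 of TA (the string ends in 'TATATA', so three TA repeats rather than the two I mentioned earlier).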