On Dec 27, 10:01 am, Fredrik Tolf <fred...@dolda2000.com> wrote:
> On Mon, 26 Dec 2011, mauricel...@acm.org wrote:
> > I've tried
> >
> > re.sub('@\S\s[1-9]:[A-N]:[0-9]', '@\S\s', '@HWI-ST115:568:B08LLABXX:
> > 1:1105:6465:151103 1:N:0:')
> >
> > but it does not seem to work.
>
> Indeed, for several reasons. First of all, your backslash sequences are
> interpreted by Python as string escapes. You'll need to write either "\\S"
> or r"\S" (the r, for raw, turns off backslash escapes).
>
> Second, when you use only "\S", that matches a single non-space character,
> not several; you'll need to quantify them. "\S*" will match zero or more,
> "\S+" will match one or more, "\S?" will match zero or one, and there are
> a couple of other possibilities as well (see the manual for details). In
> this case, you probably want to use "+" for most of those.
>
> Third, you're not marking the groups that you want to use in the
> replacement. Since you want to retain the entire string before the space,
> and the numeric element, you'll want to enclose them in parentheses to
> mark them as groups.
>
> Fourth, your replacement string is entirely wacky. You don't use sequences
> such as "\S" and "\s" to refer back to groups in the original text, but
> numbered references, to refer back to parenthesized groups in the order
> they appear in the regex. In accordance with what you seemed to want, you
> should probably use "@\1/\2" in your case ("\1" refers back to the first
> parenthesized group, which would be the first "\S+" part, and "\2" to the
> second group, which should be the "[1-9]+" part; the at-mark and slash
> are inserted as they are into the result string).
>
> Fifth, you'll probably want to match the last colon as well, in order not
> to retain it in the result string.
>
> All in all, you will probably want to use something like this to correct
> that regex:
>
> re.sub(r'@(\S+)\s([1-9]+):[A-N]+:[0-9]+:', r'@\1/\2',
>        '@HWI-ST115:568:B08LLABXX:1:1105:6465:151103 1:N:0:')
>
> Also, you may be interested to know that you can use "\d" instead of
> "[0-9]".
>
> --
>
> Fredrik Tolf
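
To see the corrected pattern from the quoted message in action, here is a minimal sketch that runs it on the sample header. The variable names (header, pattern, result, unquantified) are my own and not from the thread, and the 'X' replacement in the second call is only a placeholder to show that the unquantified pattern never matches.

```python
import re

# Sample header from the original question.
header = '@HWI-ST115:568:B08LLABXX:1:1105:6465:151103 1:N:0:'

# Raw strings pass the backslashes through to the regex engine untouched.
# Group 1: everything between '@' and the space.
# Group 2: the digits that follow the space.
pattern = r'@(\S+)\s([1-9]+):[A-N]+:[0-9]+:'

# \1 and \2 in the replacement refer back to the two groups above.
result = re.sub(pattern, r'@\1/\2', header)
print(result)  # @HWI-ST115:568:B08LLABXX:1:1105:6465:151103/1

# Without quantifiers, \S matches exactly one character, so the pattern
# cannot span the long identifier and the string comes back unchanged.
unquantified = re.sub(r'@\S\s[1-9]:[A-N]:[0-9]', 'X', header)
print(unquantified == header)  # True
```

Note that re.sub returns the input unchanged when nothing matches, which is why the original attempt appeared to do nothing rather than raise an error.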
For practical 'get-the-hands-dirty' experience, look at:

python-specific: http://kodos.sourceforge.net/
Online: http://gskinner.com/RegExr/
emacs-specific: re-builder and regex-tool
http://bc.tech.coop/blog/071103.html