RE: [Tutor] How to substitute an element of a list as a pattern for re.compile()

Rich Krauter Wed, 29 Dec 2004 22:06:14 -0800

kumar s wrote:

I have a question: how can I use an object (for example, an element of a list) as the pattern passed to re.compile()?
x = 30
pattern = re.compile(x)

Kumar,

You can use string interpolation to insert x into a string, which can then be compiled into a pattern:

x = 30
pat = re.compile('%s'%x)
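As a minimal sketch of the interpolation above: str(x) gives the same string as '%s' % x, and re.escape is worth adding when the value may contain characters that are special in a regular expression. This uses current Python 3 syntax, which is my assumption and not something from the original post, and the sample strings are made up for illustration.

```python
import re

x = 30

# Both forms turn the integer into the string '30' before compiling.
pat_interp = re.compile('%s' % x)
pat_str = re.compile(str(x))

# re.escape protects values that contain regex metacharacters such as '.'.
value = '1.5'
pat_escaped = re.compile(re.escape(value))

print(pat_interp.pattern)                    # 30
print(bool(pat_str.search('probe 1300')))    # True: '30' occurs inside '1300'
print(bool(pat_escaped.search('105')))       # False: the '.' is matched literally
```

Note that an unanchored pattern such as '30' matches anywhere in the line, which matters if the values you search for can be substrings of one another.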

I really doubt regular expressions will speed up your current searching algorithm. You probably need to reconsider the data structures you are using to represent your data.

I have a list of numbers that I have to match in
another list and write them to a new file:

List 1: range_cors

range_cors[1:5]

['161:378', '334:3', '334:4', '65:436']

List 2: seq

seq[0:2]

['>probe:HG-U133A_2:1007_s_at:416:177; Interrogation_Position=3330; Antisense;',
 'CACCCAGCTGGTCCTGTGGATGGGA']

Can you re-process your second list? One option might be to store that list instead as a dict, where the keys are what you want to search by (maybe a string like '12:34' or a tuple like (12,34)).

Maybe something like the following:

>>> range_cors = ['12:34','34:56']
>>> seq = {'12:34': ['some 12:34 data'],
...        '34:56': ['some 34:56 data','more 34:56 data']}
>>> for item in range_cors:
...     print seq[item]
...     
['some 12:34 data']
['some 34:56 data','more 34:56 data']
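If seq really is a flat list in which each '>probe:...' header line is followed by its sequence line, a dict like that could be built in one pass, roughly as follows. This is only a sketch. The idea that the 'x:y' pair is the fourth and fifth colon-separated field of the header is a guess based on the single sample shown, the second record is invented, and build_index is a hypothetical helper name.

```python
def build_index(seq):
    """Map an 'x:y' key to a list of (header, sequence) pairs.

    Assumes seq alternates header, sequence, header, sequence, ...
    and that headers look like
    '>probe:HG-U133A_2:1007_s_at:416:177; Interrogation_Position=...'.
    """
    index = {}
    for header, sequence in zip(seq[0::2], seq[1::2]):
        # The text before the first ';' holds the colon-separated fields.
        fields = header.split(';')[0].split(':')
        key = '%s:%s' % (fields[3], fields[4])   # e.g. '416:177'
        index.setdefault(key, []).append((header, sequence))
    return index


seq = [
    '>probe:HG-U133A_2:1007_s_at:416:177; Interrogation_Position=3330; Antisense;',
    'CACCCAGCTGGTCCTGTGGATGGGA',
    # Second record is invented for illustration.
    '>probe:HG-U133A_2:1007_s_at:334:3; Interrogation_Position=3340; Antisense;',
    'ACGTACGTACGTACGTACGTACGTA',
]
range_cors = ['334:3', '65:436']

index = build_index(seq)
for cor in range_cors:
    # .get returns an empty list for coordinates with no matching record.
    for header, sequence in index.get(cor, []):
        print(header)
        print(sequence)
```

Keying on the exact 'x:y' string also avoids a substring pitfall: '334:3' can no longer accidentally match a header containing '334:30'.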

Why is this better?

If you have m lines of data and n patterns to search for, then with either of your methods you perform n searches per line, for a total of approximately m*n operations. That cost is the same whether you use the string-searching version or the re version.

If you pre-process the data so that it can be stored in and retrieved from a dict, pre-processing to get your data into that dict costs you roughly m operations, but your n pattern lookups into that dict cost you only n operations, so you only have to complete approx. m+n operations.
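To make the m*n versus m+n estimate concrete, here is a rough sketch that counts the basic steps each approach takes on made-up data (m = 1000 records, n = 50 keys). The records, the key format, and the step counters are all invented for illustration, and real timings depend on much more than these counts.

```python
m, n = 1000, 50

# Made-up records: each header contains a unique 'i:i+1' key.
records = ['>probe:%d:%d; data' % (i, i + 1) for i in range(m)]
wanted = ['%d:%d' % (i, i + 1) for i in range(n)]

# Nested scan: every wanted key is tested against every record.
scan_steps = 0
scan_hits = []
for key in wanted:
    for rec in records:
        scan_steps += 1
        if ':%s;' % key in rec:
            scan_hits.append(rec)

# Dict approach: one pass to build the index, then one lookup per key.
dict_steps = 0
index = {}
for rec in records:
    dict_steps += 1
    key = rec.split(';')[0].split(':', 1)[1]   # text after '>probe:'
    index[key] = rec
dict_hits = []
for key in wanted:
    dict_steps += 1
    if key in index:
        dict_hits.append(index[key])

print(scan_steps)               # 50000, i.e. m * n
print(dict_steps)               # 1050, i.e. m + n
print(scan_hits == dict_hits)   # True: both find the same records
```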

A slow method:

sequences = []
for elem1 in range_cors:
    for index,elem2 in enumerate(seq):
        if elem1 in elem2:
            sequences.append(elem2)
            sequences.append(seq[index+1])

A faster method (probably):

for i in range(len(range_cors)):
    for index,m in enumerate(seq):
        pat = re.compile(i)
        if re.search(pat,seq[m]):
            p.append(seq[m])
            p.append(seq[index+1])

I am getting errors, because I am trying to create an element as a pattern in re.compile().

pat = re.compile('%s'%i) would probably get rid of the error message, but that's probably still not what you want.
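One reason it is probably not what you want: i in that loop is an index (0, 1, 2, ...) and not the 'x:y' string, and seq[m] indexes the list with a string. A hedged sketch of what the regex version might look like once those are fixed follows. It assumes seq alternates header and sequence lines and that the key is preceded by ':' and followed by ';' in the header, both guesses from the sample data. The second record is made up.

```python
import re

range_cors = ['334:3', '416:177']
seq = [
    '>probe:HG-U133A_2:1007_s_at:416:177; Interrogation_Position=3330; Antisense;',
    'CACCCAGCTGGTCCTGTGGATGGGA',
    # Invented record: '334:30' must not be matched by '334:3'.
    '>probe:HG-U133A_2:1007_s_at:334:30; Interrogation_Position=3340; Antisense;',
    'ACGTACGTACGTACGTACGTACGTA',
]

p = []
for cor in range_cors:
    # Compile once per pattern, not once per line, and anchor the key
    # between ':' and ';' so that it cannot match part of a longer number.
    pat = re.compile(':' + re.escape(cor) + ';')
    # Step through the header lines only (every second element).
    for index in range(0, len(seq) - 1, 2):
        if pat.search(seq[index]):
            p.append(seq[index])
            p.append(seq[index + 1])

print(p)
```

This still does roughly m*n searches, so it fixes the errors without addressing the speed problem.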

Questions:
1. Is it possible to do this? If so, how can I do it?

You can try, but I doubt regular expressions will help; that approach will probably be even slower.

Can anyone help correct my code and suggest where I went wrong?

I would scrap what you have and try using a better data structure. I don't know enough about your data to make more specific processing recommendations, but you can probably avoid those nested loops with some careful data pre-processing.

You'll likely get better suggestions if you post a more representative sample of your data, and explain exactly what you want as output.

Good luck.

Rich

_______________________________________________
Tutor maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/tutor
