On Sep 6, 1:23 am, 7stud <bbxx789_0...@yahoo.com> wrote:
> On Sep 6, 1:14 am, 7stud <bbxx789_0...@yahoo.com> wrote:
> > On Sep 5, 5:29 pm, per <perfr...@gmail.com> wrote:
> > > On Sep 5, 7:07 pm, "Rhodri James" <rho...@wildebst.demon.co.uk> wrote:
> > > > On Sat, 05 Sep 2009 23:54:08 +0100, per <perfr...@gmail.com> wrote:
> > > > > On Sep 5, 6:42 pm, "Rhodri James" <rho...@wildebst.demon.co.uk> wrote:
> > > > >> On Sat, 05 Sep 2009 22:54:41 +0100, per <perfr...@gmail.com> wrote:
> > > > >> > I'm trying to efficiently "split" strings based on what substrings
> > > > >> > they are made up of.
> > > > >> >
> > > > >> > I have a set of strings that are composed of known substrings.
> > > > >> > For example, a, b, and c are substrings that are not identical to
> > > > >> > each other, e.g.:
> > > > >> >
> > > > >> > a = "0" * 5
> > > > >> > b = "1" * 5
> > > > >> > c = "2" * 5
> > > > >> >
> > > > >> > Then my_string might be:
> > > > >> >
> > > > >> > my_string = a + b + c
> > > > >> >
> > > > >> > I am looking for an efficient way to solve the following problem.
> > > > >> > Suppose I have a short string x that is a substring of my_string.
> > > > >> > I want to "split" the string x into blocks based on which of the
> > > > >> > substrings (i.e. a, b, or c) each chunk of x falls into.
> > > > >> >
> > > > >> > To illustrate this, suppose x = "00111". Then I can detect where x
> > > > >> > starts in my_string using my_string.find(x). But I don't know how
> > > > >> > to partition x into blocks depending on the substrings. What I want
> > > > >> > to get out in this case is: "00", "111". If x were "001111122",
> > > > >> > I'd want to get out "00", "11111", "22".
> > > > >> >
> > > > >> > Is there an easy way to do this? I can't simply split x on a, b,
> > > > >> > or c because these might not be contained in x. I want to avoid
> > > > >> > doing something inefficient like looking at all substrings of
> > > > >> > my_string etc.
> > > > >> >
> > > > >> > I wouldn't mind using regular expressions for this, but I cannot
> > > > >> > think of an easy regular expression for this problem. I looked at
> > > > >> > the string module in the library but did not see anything that
> > > > >> > seemed related, but I might have missed it.
> > > > >>
> > > > >> I'm not sure I understand your question exactly. You seem to imply
> > > > >> that the order of the substrings of x is consistent. If that's the
> > > > >> case, this ought to help:
> > > > >>
> > > > >> >>> import re
> > > > >> >>> x = "001111122"
> > > > >> >>> m = re.match(r"(0*)(1*)(2*)", x)
> > > > >> >>> m.groups()
> > > > >> ('00', '11111', '22')
> > > > >> >>> y = "00111"
> > > > >> >>> m = re.match(r"(0*)(1*)(2*)", y)
> > > > >> >>> m.groups()
> > > > >> ('00', '111', '')
> > > > >>
> > > > >> You'll have to filter out the empty groups for yourself, but that's
> > > > >> no great problem.
> > > > >
> > > > > The order of the substrings is consistent, but what if it's not 0, 1,
> > > > > 2 but a more complicated string? e.g.
> > > > >
> > > > > a = 1030405, b = 1babcf, c = fUUIUP
> > > > >
> > > > > then the substring x might be 4051ba, in which case using a regexp
> > > > > with (1*) will not work, since both the a and b substrings begin with
> > > > > the character 1.
> > > >
> > > > Right. This looks approximately nothing like what I thought your
> > > > problem was. Would I be right in thinking that you want to match
> > > > substrings of your potential "substrings" against the string x?
> > > >
> > > > I'm sufficiently confused that I think I'd like to see what your
> > > > use case actually is before I make more of a fool of myself.
> > > >
> > > > --
> > > > Rhodri James *-* Wildebeest Herder to the Masses
> > >
> > > It's exactly the same problem, except there are no constraints on the
> > > strings. So the problem is, like you say, matching the substrings
> > > against the string x.
> > > In other words, I need to find out where x "aligns" to the ordered
> > > substrings abc, and then determine what chunk of x belongs to a, what
> > > chunk belongs to b, and what chunk belongs to c.
> > >
> > > So in the example I gave above, the substrings are: a = 1030405,
> > > b = 1babcf, c = fUUIUP, so abc = 10304051babcffUUIUP.
> > >
> > > Given a substring like 4051ba, I'd want to split it into the chunks a,
> > > b, and c. In this case, I'd want the result to be: ["405", "1ba"],
> > > i.e. "405" is the chunk of x that belongs to a, and "1ba" the chunk
> > > that belongs to b. In this case, there are no chunks of c. If x
> > > instead were "4051babcffUU", the right output is: ["405", "1babcf",
> > > "fUU"], which are the corresponding chunks of a, b, and c that make up
> > > x respectively.
> > >
> > > I'm not sure how to approach this. Any ideas/tips would be greatly
> > > appreciated. Thanks again.
> >
> > a = "1030405"
> > b = "1babcf"
> > c = "fUUIUP"
> > abc = "10304051babcffUUIUP"
> > data = "4051babcffU"
> >
> > data_start = abc.find(data)
> > b_start = abc.find(b) - data_start
> > c_start = abc.find(c) - data_start
> >
> > print(data[:b_start])
> > print(data[b_start:c_start])
> > print(data[c_start:])
> >
> > --output:--
> > 405
> > 1babcf
> > fU
>
> ...or maybe this is easier to follow:
>
> a = "1030405"
> b = "1babcf"
> c = "fUUIUP"
> abc = "10304051babcffUUIUP"
> data = "4051babcffU"
>
> data_start = abc.find(data)
> new_abc = abc[data_start:]
> print(new_abc)
> print(data)
> print("-" * 10)
>
> --output:--
> 4051babcffUUIUP
> 4051babcffU
> ----------
>
> b_start = new_abc.find(b)
> c_start = new_abc.find(c)
>
> print(data[:b_start])
> print(data[b_start:c_start])
> print(data[c_start:])
>
> --output:--
> 405
> 1babcf
> fU
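Here is why the quoted steps break once data starts partway through b. str.find returns -1 when b is not wholly contained in new_abc, and Python treats a negative slice bound as counting back from the end of data. The sketch below wraps the quoted 1:23 steps, without any -1 check, in a hypothetical helper named split_three, purely for illustration:

```python
def split_three(a, b, c, data):
    """Hypothetical wrapper around the quoted 1:23 steps (no -1 checks)."""
    abc = a + b + c
    new_abc = abc[abc.find(data):]
    b_start = new_abc.find(b)  # -1 when data begins partway through b
    c_start = new_abc.find(c)
    return [data[:b_start], data[b_start:c_start], data[c_start:]]

# Works when data starts inside a:
print(split_three("1030405", "1babcf", "fUUIUP", "4051babcffU"))
# -> ['405', '1babcf', 'fU']

# Mis-splits when data starts inside b: data[:-1] drops only the last
# character, so one 'f' lands in two chunks and b's chunk comes out empty.
print(split_three("1030405", "1babcf", "fUUIUP", "cffU"))
# -> ['cff', '', 'fU']
```

For "cffU" the intended split is ['', 'cf', 'fU']: nothing from a, "cf" from b, and "fU" from c.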
Nope. My solutions have problems with:

data = "cffU"

To handle that data, it gets messier:

a = "1030405"
b = "1babcf"
c = "fUUIUP"
abc = "10304051babcffUUIUP"
data = "cffU"

data_start = abc.find(data)
new_abc = abc[data_start:]
print(new_abc)
print(data)
print("-" * 10)

# find() returns -1 when b is not wholly inside new_abc, i.e. data
# begins partway through b (or in c); b's chunk then starts at index 0.
b_start = new_abc.find(b)
if b_start == -1:
    b_start = 0

# Likewise for c: -1 means data begins partway through c.
c_start = new_abc.find(c)
if c_start == -1:
    c_start = 0

print(data[:b_start])
print(data[b_start:c_start])
print(data[c_start:])

If data is not a substring of abc, then that last line will select the
whole data string.

--
http://mail.python.org/mailman/listinfo/python-list
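The same offset idea extends to any number of pieces without special-casing -1. Record where each piece starts and ends in the joined string, then intersect each piece's span with the span that data occupies. The following is a minimal sketch that assumes the pieces are given as an ordered list and that the first occurrence of data is the one wanted. split_by_pieces is a hypothetical name, not a library function:

```python
def split_by_pieces(pieces, data):
    """Return the chunk of data that falls inside each piece, in order.

    Assumes the pieces are concatenated in the given order and uses the
    first occurrence of data in the joined string. Pieces that data does
    not touch get an empty string. Returns None if data does not occur.
    """
    whole = "".join(pieces)
    start = whole.find(data)
    if start == -1:
        return None
    end = start + len(data)

    chunks = []
    piece_start = 0
    for piece in pieces:
        piece_end = piece_start + len(piece)
        # Overlap of [piece_start, piece_end) with [start, end),
        # shifted so that it indexes into data rather than whole.
        lo = max(piece_start, start) - start
        hi = min(piece_end, end) - start
        chunks.append(data[lo:hi] if lo < hi else "")
        piece_start = piece_end
    return chunks


pieces = ["1030405", "1babcf", "fUUIUP"]
print(split_by_pieces(pieces, "4051ba"))        # ['405', '1ba', '']
print(split_by_pieces(pieces, "4051babcffUU"))  # ['405', '1babcf', 'fUU']
print(split_by_pieces(pieces, "cffU"))          # ['', 'cf', 'fU']
print(split_by_pieces(pieces, "xyz"))           # None
```

Keeping one entry per piece preserves which chunk belongs to a, b, or c. To get per's format with no empty chunks, filter with [ch for ch in chunks if ch]. Unlike the -1 checks, this returns None when data is absent instead of assigning the whole string to c.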