Library similar to UserAgent (perl)
Hi everybody, I'm looking for an easy way to put data into a form, then click the button and follow the redirect. I also need to use cookies. I read that in Perl this can be done with the UserAgent library, which also provides browser functionality that lets the site believe you are getting the pages from a normal browser like Firefox. Any suggestions? I think the main concern is to easily put the data into the numerous text fields provided and then magically simulate the 'click me' button.
-- http://mail.python.org/mailman/listinfo/python-list
Re: Library similar to UserAgent (perl)
On Thu, 12 Feb 2009 13:47:09 +0100, Diez B. Roggisch wrote:
> mattia wrote:
>
>> Hi everybody, I'm looking for an easy way to put data in a form, then
>> click the button and follow the redirect. Also I need to use cookies.
>> [...]
>
> Python mechanize.
>
> Diez

Thanks a lot.
Mattia
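mechanize is a third-party package, and its API is not shown here. As a rough alternative that uses only the standard library (a swapped-in technique, not mechanize), urllib.request can keep cookies across requests, and its default handlers follow HTTP redirects. It does not parse HTML forms, so you have to read the form's action URL and field names from the page yourself. The field names and User-Agent string below are hypothetical examples.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Browser-like User-Agent string (an arbitrary example value)
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"

def make_opener():
    """Return an opener that stores cookies between requests, plus its jar.

    build_opener() also installs HTTPRedirectHandler, so redirects are followed.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

def build_form_request(action_url, fields):
    """Encode form fields the way a browser submits a POST form."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(action_url, data=data,
                                  headers={"User-Agent": USER_AGENT})

# Usage (performs network I/O, so it is left commented out):
# opener, jar = make_opener()
# req = build_form_request("https://example.com/login", {"user": "me", "password": "secret"})
# with opener.open(req) as resp:
#     print(resp.geturl())  # final URL after any redirects
```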
Roulette wheel
Hi everyone, I'm new to Python and I want to write some simple code for the classic genetic algorithm example: given a population of chromosomes encoded with 1s and 0s, find the chromosome with the maximum number of 1s. Leaving the rest of the implementation aside, I'm wondering if there is a better way to implement the so-called roulette wheel selection for this problem. Here is the code of my solution; any advice will be helpful:

    from random import randint, random

    def create_chromosome(min, max, length):
        chromosome = []
        for i in range(length):
            chromosome.append(randint(min, max))
        return chromosome

    def fitness(chrm, ffunc=sum):
        return ffunc(chrm)

    def create_population(nelem, min, max, length):
        return [create_chromosome(min, max, length) for i in range(nelem)]

    def get_fitness_and_population(population):
        return [(fitness(x), x) for x in population]

    def get_roulette_wheel(population):
        roulette_wheel = []
        index = 0
        for x in get_fitness_and_population(population):
            for j in range(x[0]):
                roulette_wheel.append(index)
            index += 1
        return roulette_wheel

    pop = create_population(5, 0, 1, 10)
    rw = get_roulette_wheel(pop)
    print(rw)
    print(len(rw))
    ri = randint(0, len(rw) - 1)
    print("Random index:", rw[ri], ", value:", pop[rw[ri]])
Re: Roulette wheel
On Wed, 04 Mar 2009 21:30:54 +0100, Peter Otten wrote:
> mattia wrote:
>
>> Hi everyone, I'm new to python and I want to create some simple code in
>> order to code the classical genetic algorithm example: [...] Here I paste
>> the code of my solution, any advice will be helpful:
>
> Your code looks good to me.
>
>> from random import randint, random
>>
>> def create_chromosome(min, max, length):
>>     chromosome = []
>>     for i in range(length):
>>         chromosome.append(randint(min, max))
>>     return chromosome
>>
>> def fitness(chrm, ffunc=sum):
>>     return ffunc(chrm)
>
> fitness = sum
>
> has the same effect, without the extra indirection.
>
>> def create_population(nelem, min, max, length):
>>     return [create_chromosome(min, max, length) for i in range(nelem)]
>>
>> def get_fitness_and_population(population):
>>     return [(fitness(x), x) for x in population]
>>
>> def get_roulette_wheel(population):
>>     roulette_wheel = []
>>     index = 0
>>
>>     for x in get_fitness_and_population(population):
>>         for j in range(x[0]):
>>             roulette_wheel.append(index)
>>         index += 1
>
> Make that
>
>     for index, x in enumerate(get_fitness_and_population(population)):
>         ...
>
> I'd also pass the fitness function around explicitly instead of
> making it a global.
>
>>     return roulette_wheel
>>
>> pop = create_population(5, 0, 1, 10)
>> rw = get_roulette_wheel(pop)
>> print(rw)
>> print(len(rw))
>> ri = randint(0, len(rw) - 1)
>> print("Random index:", rw[ri], ", value:", pop[rw[ri]])
>
> But these are minor nits :)
>
> Here's a slightly different approach:
>
> from random import randint, choice
>
> def create_chromosome(min, max, length):
>     return [randint(min, max) for i in range(length)]
>
> def create_population(nelem, min, max, length):
>     return [create_chromosome(min, max, length) for i in range(nelem)]
>
> def get_fitness_and_population(population, fitness):
>     return [(fitness(x), x) for x in population]
>
> def get_roulette_wheel(weight_value_pairs):
>     roulette_wheel = []
>     for weight, value in weight_value_pairs:
>         roulette_wheel += [value]*weight
>     return roulette_wheel
>
> if __name__ == "__main__":
>     pop = create_population(5, 0, 1, 10)
>     fap = get_fitness_and_population(pop, sum)
>     rw = get_roulette_wheel(fap)
>     print("Random value:", choice(rw))
>
> Note how get_roulette_wheel() is now completely independent of the
> concrete problem you are using it for.
>
> Peter

Well, thank you very much. I'm new to Python, and every day I discover how much you can express in a single line. I loved the line roulette_wheel += [value]*weight; another useful thing learned today.
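Peter's two nits can be folded into the original index-based wheel: use enumerate() instead of a manual counter, and pass the fitness function in instead of relying on a global. Below is a minimal sketch. The (population, fitness) parameter order is an arbitrary choice.

```python
def get_roulette_wheel(population, fitness):
    """Return a list with one entry per unit of fitness, holding population indices."""
    roulette_wheel = []
    for index, chromosome in enumerate(population):
        # A chromosome with fitness k occupies k slots; fitness 0 means no slot.
        roulette_wheel.extend([index] * fitness(chromosome))
    return roulette_wheel

pop = [[1, 0, 1], [0, 0, 0], [1, 1, 1]]
print(get_roulette_wheel(pop, sum))  # [0, 0, 2, 2, 2]
```

Calling random.choice() on the wheel then gives an index into pop, as in the original code.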
Re: Roulette wheel
> Note how get_roulette_wheel() is now completely independent of the
> concrete problem you are using it for.

OK, but it also consumes a lot more memory ;-)
Re: Roulette wheel
On Thu, 05 Mar 2009 10:46:58 +0100, Peter Otten wrote:
> mattia wrote:
>
>>> Note how get_roulette_wheel() is now completely independent of the
>>> concrete problem you are using it for.
>>
>> Ok, but also a lot more memory consuming ;-)
>
> I don't think so. Python references objects; therefore the list
>
> [tiny_little_thing]*N
>
> does not consume more memory than
>
> [big_fat_beast]*N
>
> Or did you have something else in mind?
>
> Peter

OK, understood. So if I have e.g. [[200 elements]]*N, then I'll have N references to the same list object, right?
Re: Roulette wheel
On Thu, 05 Mar 2009 12:54:39 +0100, Peter Otten wrote:
> mattia wrote:
>
>> On Thu, 05 Mar 2009 10:46:58 +0100, Peter Otten wrote:
>>
>>> mattia wrote:
>>> [...]
>>> I don't think so. Python references objects; therefore the list
>>>
>>> [tiny_little_thing]*N
>>>
>>> does not consume more memory than
>
> Oops, should have been less.
>
>>> [big_fat_beast]*N
>>>
>>> Or did you have something else in mind?
>>>
>>> Peter
>>
>> Ok, understood. So if I have e.g. [[200 elements]]*N, then I'll have N
>> pointers to the same location of my seq, right?
>
> Right. You can verify this with
>
>>>> v = [0]
>>>> items = [v]*5
>>>> v[0] = 42
>>>> items
> [[42], [42], [42], [42], [42]]
>
> which often surprises newbies.
>
> Peter

Great explanation, thanks.
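Sometimes the repeated element is mutable and each slot needs its own copy, as with chromosomes that are later modified. In that case a list comprehension helps, because it evaluates the inner expression once per slot. Here is a small sketch contrasting the two forms:

```python
def shared_rows(n):
    # [[0]] * n repeats one reference: all n entries are the same list object
    return [[0]] * n

def independent_rows(n):
    # The comprehension builds a fresh [0] on every iteration
    return [[0] for _ in range(n)]

shared = shared_rows(3)
shared[0][0] = 42
print(shared)        # [[42], [42], [42]]

independent = independent_rows(3)
independent[0][0] = 42
print(independent)   # [[42], [0], [0]]
```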
Re: Roulette wheel
On Wed, 04 Mar 2009 21:30:54 +0100, Peter Otten wrote:
> mattia wrote:
> [...]
> Note how get_roulette_wheel() is now completely independent of the
> concrete problem you are using it for.
>
> Peter

One last question: how can I improve the readability of this piece of code?

    def crossover(pop, prob=0.6):
        """
        With a crossover probability cross over the parents to form new
        offspring. If no crossover was performed, offspring is the exact
        copy of parents.
        """
        cpop = []
        for i in range(0, len(pop), 2):
            # crossover
            if prob > random():
                crossover_point = randint(0, len(pop[i])-1)
                nchromosome1 = pop[i][:crossover_point] + pop[i+1][crossover_point:]
                nchromosome2 = pop[i+1][:crossover_point] + pop[i][crossover_point:]
            else:
                nchromosome1 = pop[i][:]
                nchromosome2 = pop[i+1][:]
            cpop += [nchromosome1] + [nchromosome2]
        return cpop

And with this one my example is complete!
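The sketch below is one possible restructuring of this function. It pairs parents with zip(pop[::2], pop[1::2]) instead of index arithmetic, moves the cut into a helper, and uses append(). The rng parameter is an addition for reproducible testing. Like the original, it assumes an even-sized population; with an odd size, zip() silently drops the last individual.

```python
import random

def single_point_crossover(a, b, rng=random):
    """Swap the tails of a and b at a random cut point."""
    cut = rng.randrange(len(a))  # same range as randint(0, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def crossover(pop, prob=0.6, rng=random):
    """Cross over consecutive pairs (pop[0], pop[1]), (pop[2], pop[3]), ..."""
    offspring = []
    for a, b in zip(pop[::2], pop[1::2]):
        if rng.random() < prob:
            a, b = single_point_crossover(a, b, rng)
        else:
            a, b = a[:], b[:]  # copy so children never alias their parents
        offspring.append(a)
        offspring.append(b)
    return offspring
```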
Re: Roulette wheel
On Thu, 05 Mar 2009 15:48:01 +, mattia wrote:
> On Wed, 04 Mar 2009 21:30:54 +0100, Peter Otten wrote:
> [...]
> The last question: how can I improve readability in this piece of code?
> [...]

Wow, lots of things to learn here (remember that I'm new to Python). You both used zip and yield; I'll go check them out in the Python documentation ;-)
Re: Roulette wheel
> for a, b in zip(*[iter(pop)]*2):

There is a similar example in the Python documentation. The obscure part here is the use of *[iter(pop)]! If I understand correctly, you iterate over items 0 and 1, then 2 and 3, etc., because zip advances the iterator. Since both arguments are the same iterator, the position is shared, and the second argument continues from where the first one left off. Am I right?
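The idiom can be checked directly. [iter(pop)]*2 is a list that contains the same iterator object twice, and the * unpacks it into zip(it, it). At each step zip asks that one iterator for two items in turn, so the pairs come out as (0, 1), (2, 3), and so on. A trailing odd item is dropped. Here is a small demonstration, contrasted with two separate iterators:

```python
def pairs_shared(seq):
    it = iter(seq)
    # zip(*[it] * 2) is zip(it, it): both arguments are one iterator,
    # so each tuple takes two consecutive items from it
    return list(zip(*[it] * 2))

def pairs_separate(seq):
    # Two independent iterators each start from the beginning
    return list(zip(iter(seq), iter(seq)))

print(pairs_shared([0, 1, 2, 3, 4, 5]))  # [(0, 1), (2, 3), (4, 5)]
print(pairs_shared([0, 1, 2, 3, 4]))     # [(0, 1), (2, 3)]
print(pairs_separate([0, 1, 2]))         # [(0, 0), (1, 1), (2, 2)]
```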
Re: Roulette wheel
On Thu, 05 Mar 2009 18:07:29 +0100, Peter Otten wrote:
> mattia wrote:
>
>> The last question: how can I improve readability in this piece of code?
>>
>> def crossover(pop, prob=0.6):
>>     [...]
>>         else:
>>             nchromosome1 = pop[i][:]
>>             nchromosome2 = pop[i+1][:]
>
>>         cpop += [nchromosome1] + [nchromosome2]
>
> I'd write that as
>
>     cpop.append(nchromosome1)
>     cpop.append(nchromosome2)
>
> thus avoiding the intermediate lists.
>
>>     return cpop
>>
>> And with this one my example is complete!
>
> Just for fun here's an alternative version of your function
>
> def generate_crossover(pop, prob):
>     for a, b in zip(*[iter(pop)]*2):
>         if prob > random():
>             cut = randrange(len(a))
>             a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
>         yield a
>         yield b
>
> def crossover(pop, prob=0.6):
>     return list(generate_crossover(pop, prob))
>
> but as the original is a bit clearer I recommend that you stick with it.
>
> Peter

OK, yield gives me a generator function, so I can use for x in my_func(); good.
One last readability snippet:

    def selection(fitness, population):
        """
        Select the parent chromosomes from a population according to their
        fitness (the better fitness, the bigger chance to be selected)
        """
        selected_population = []
        fap = get_fitness_and_population(fitness, population)
        pop_len = len(population)
        # elitism (it prevents a loss of the best found solution)
        elite_population = sorted(fap)
        selected_population += [elite_population[pop_len-1][1]] + [elite_population[pop_len-2][1]]

    def get_fitness_and_population(fitness, population):
        return [(fitness(x), x) for x in population]
Is there a better way of doing this?
Hi, I'm new to Python, and as the title says: can I improve this snippet (readability, speed, tricks)?

    def get_fitness_and_population(fitness, population):
        return [(fitness(x), x) for x in population]

    def selection(fitness, population):
        '''
        Select the parent chromosomes from a population according to their
        fitness (the better fitness, the bigger chance to be selected)
        '''
        selected_population = []
        fap = get_fitness_and_population(fitness, population)
        pop_len = len(population)
        # elitism (it prevents a loss of the best found solution)
        # take only the 2 best solutions
        elite_population = sorted(fap)
        selected_population += [elite_population[pop_len-1][1]] + [elite_population[pop_len-2][1]]
        # go on with the rest of the elements
        for i in range(pop_len-2):
            pass  # do something
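For the elitism step alone, the standard library's heapq.nlargest() accepts a key function. That lets you take the best individuals without building (fitness, individual) tuples or indexing from the end of a sorted list. A minimal sketch follows; the function name split_elite and the default of two elites are illustrative.

```python
import heapq

def split_elite(fitness, population, n_elite=2):
    """Return the n_elite fittest individuals, best first."""
    # Same result as sorted(population, key=fitness, reverse=True)[:n_elite]
    return heapq.nlargest(n_elite, population, key=fitness)

population = [[0, 1, 0], [1, 1, 1], [0, 0, 0], [1, 1, 0]]
print(split_elite(sum, population))  # [[1, 1, 1], [1, 1, 0]]
```

With a key function, only fitness values are compared. sorted(fap) is different: when two fitness values tie, it falls back to comparing the chromosome lists themselves.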
Re: Is there a better way of doing this?
On Fri, 06 Mar 2009 10:19:22 +, mattia wrote:
> Hi, I'm new to python, and as the title says, can I improve this snippet
> (readability, speed, tricks):
> [...]

Great, the for statement no longer has to deal with fap, but with another sequence, like this:

    def get_roulette_wheel(weight_value_pairs):
        roulette_wheel = []
        for weight, value in weight_value_pairs:
            roulette_wheel += [value]*weight
        return roulette_wheel

    def selection(fitness, population):
        ...
        rw = get_roulette_wheel(fap)
        for i in range(pop_len-2):
            selected_population += [choice(rw)]
        return selected_population

I think that using [choice(rw)]*len(fap) would just repeat a single randomly chosen element len(fap) times...
Re: Is there a better way of doing this?
On Fri, 06 Mar 2009 03:43:22 -0800, Chris Rebert wrote:
> On Fri, Mar 6, 2009 at 3:07 AM, mattia wrote:
>> Great, the for statement has not to deal with fap anymore, but with
>> another sequence, like this:
>> [...]
>
> Revision to this new code:
>
> def get_roulette_wheel(weight_value_pairs):
>     return [[value]*weight for weight, value in weight_value_pairs]
>
> def selection(fitness, population):
>     ...
>     rw = get_roulette_wheel(fap)
>     for i in range(len(fap)):
>         selected_population.append(choice(rw))
>     return selected_population
>
> Cheers,
> Chris

Great, so append is equivalent to +=, right? Or is it more efficient?
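Two details from this exchange can be checked directly. First, a comprehension of the form [[value]*weight for ...] builds one sublist per individual. choice() on that list would pick among individuals uniformly rather than by weight, so the flat wheel needs a nested loop inside the comprehension. Second, lst.append(x) leaves the list in the same state as lst += [x], but skips building the temporary one-element list. A small sketch with made-up pairs:

```python
import random

def nested_wheel(weight_value_pairs):
    # One sublist per individual
    return [[value] * weight for weight, value in weight_value_pairs]

def flat_wheel(weight_value_pairs):
    # One entry per unit of weight
    return [value for weight, value in weight_value_pairs for _ in range(weight)]

pairs = [(2, "a"), (1, "b")]
print(nested_wheel(pairs))  # [['a', 'a'], ['b']]
print(flat_wheel(pairs))    # ['a', 'a', 'b']

# append() and += [x] give the same list; append skips the temporary list
via_append, via_iadd = [], []
via_append.append("x")
via_iadd += ["x"]
print(via_append == via_iadd)  # True

# [choice(rw)] * n draws once and repeats that single result n times,
# whereas a comprehension draws n separate times
rng = random.Random(0)
wheel = flat_wheel(pairs)
repeated = [rng.choice(wheel)] * 5
drawn = [rng.choice(wheel) for _ in range(5)]
```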
Re: Is there a better way of doing this?
On Fri, 06 Mar 2009 14:06:14 +0100, Peter Otten wrote:
> mattia wrote:
>
>> Hi, I'm new to python, and as the title says, can I improve this
>> snippet (readability, speed, tricks):
>> [...]
>
> def selection1(fitness, population, N=2):
>     rest = sorted(population, key=fitness, reverse=True)
>     best = rest[:N]
>     del rest[:N]
>     # work with best and rest
>
>
> def selection2(fitness, population, N=2):
>     decorated = [(-fitness(p), p) for p in population]
>     heapq.heapify(decorated)
>
>     best = [heapq.heappop(decorated)[1] for _ in range(N)]
>     rest = [p for f, p in decorated]
>     # work with best and rest
>
> Both implementations assume that you are no longer interested in the
> individuals' fitness once you have partitioned the population in two
> groups.
>
> In theory the second is more efficient for "small" N and "large"
> populations.
>
> Peter

OK, but the point is that I keep the best individuals of the current population, and then I have to choose the other elements of the new population (the population size minus 2) at random. The common way is roulette wheel selection, based on the fitness of the individuals. If the total fitness is 200 and one individual has a fitness of 10, that individual has a probability of 10/200 = 0.05 of being selected for the new population. So the fitness is used twice: to pick the best individuals for elitism, and to give each of the other individuals its chance of being selected. Obviously the old population and the new population must have the same number of individuals.
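The scheme described here keeps the two best individuals and fills the rest of the new population by fitness-proportional draws. It can be sketched with random.choices(), available since Python 3.6, which takes per-item weights and draws with replacement, so no wheel list is needed. This is a hedged sketch under those assumptions, not the thread's final code. n_elite and rng are illustrative parameters, and at least one individual must have positive fitness.

```python
import random

def selection(fitness, population, n_elite=2, rng=random):
    """Elitism plus fitness-proportional (roulette wheel) selection.

    Returns a new population of the same size: the n_elite best individuals,
    followed by len(population) - n_elite draws weighted by fitness.
    """
    elite = sorted(population, key=fitness, reverse=True)[:n_elite]
    weights = [fitness(x) for x in population]
    # Each draw picks x with probability fitness(x) / sum(weights)
    rest = rng.choices(population, weights=weights, k=len(population) - n_elite)
    return elite + rest
```

With a total fitness of 200, an individual of fitness 10 has a probability of 0.05 on each draw. The result holds references to the existing chromosome lists, so copy them before mutating any of them in place.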
Re: Is there a better way of doing this?
On Fri, 06 Mar 2009 22:28:00 +0100, Peter Otten wrote:
> mattia wrote:
>
>> On Fri, 06 Mar 2009 14:06:14 +0100, Peter Otten wrote:
>> [...]
>> Ok, but the fact is that I save the best individuals of the current
>> population, than I'll have to choose the others elements of the new
>> population (than will be N-2) in a random way. [...]
>
> You're right, it was a bad idea.
>
> Peter

Here is my last shot, where I get rid of all the old intermediate functions:

    def selection(fitness, population):
        lp = len(population)
        roulette_wheel = []
        for x in population:
            roulette_wheel += [x]*fitness(x)
        selected_population = [[]]*lp
        selected_population[:2] = sorted(population, key=fitness, reverse=True)[:2]
        selected_population[2:] = [choice(roulette_wheel) for _ in range(lp-2)]
        return selected_population
Re: Is there a better way of doing this?
On Fri, 06 Mar 2009 18:46:44 -0300, andrew cooke wrote:
> i have not been following this discussion in detail, so someone may have
> already explained this, but it should not be necessary to actually
> construct the roulette wheel to select values from it. what you are
> doing is selecting from a list where the there are different
> probabilities of selecting different entries. i am pretty sure that can
> be done more efficiently than by constructing a new list with many more
> entries whose aim is to simulate that (which is what the roulette wheel
> seems to be in your code, if i have understood correctly).
>
> more precisely, i think you can adapt the trick used to select a line at
> random from a file by scanning the file just once.
>
> sorry if i have misunderstood,
> andrew

Well, I believe that with the right distribution I can certainly find a better way to do roulette wheel selection. When I have enough time I'll pick up my statistics book.
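The single-pass trick mentioned here is usually called reservoir sampling. You keep a running total of the weights and replace the current pick with probability weight/total. By induction, each item ends up chosen with probability weight/total_weight, and no wheel is ever built. A minimal sketch for choosing one item; the function name and the (weight, value) input format are assumptions:

```python
import random

def weighted_choice_one_pass(weighted_items, rng=random):
    """Pick one value from (weight, value) pairs in a single pass.

    Each value is returned with probability weight / total_weight.
    """
    total = 0
    chosen = None
    for weight, value in weighted_items:
        if weight <= 0:
            continue  # zero-weight items can never be picked
        total += weight
        # Replace the current pick with probability weight / total so far
        if rng.random() * total < weight:
            chosen = value
    if total == 0:
        raise ValueError("total weight must be positive")
    return chosen
```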
Re: Is there a better way of doing this?
On Fri, 06 Mar 2009 14:13:47 -0800, Scott David Daniels wrote:
> mattia wrote:
>> Here is my last shot, where I get rid of all the old intermediate
>> functions:
>> [...]
> Try something like this to choose likely couples:
>
> import random
> import bisect
>
> def choose_pairs(fitness_population, decider=random):
>     '''Pick and yield pairs weighted by fitness for crossing.
>
>     We assume weighted_population has fitness already calculated.
>     decide is a parameter to allow testing. '''
>     total = 0
>     cumulative = []
>     candidates = []
>     for fitness, individual in set(fitness_population):
>         # calculate total weights, extract real candidates
>         if fitness > 0:
>             total += fitness
>             cumulative.append(total)
>             candidates.append(individual)
>     assert len(candidates) > 1
>     while True:
>         # pick a candidate by weight
>         c0 = decider.random() * total
>         first = bisect.bisect_left(cumulative, c0)
>         if first:
>             weighting = cumulative[first] - cumulative[first - 1]
>         else:
>             weighting = cumulative[0]
>         # pick another distinct candidate by fitness
>         c1 = choice = decider.random() * (total - weighting)
>         if choice >= cumulative[first] - weighting:
>             choice += weighting # adjust to avoid selecting first
>         second = bisect.bisect_left(cumulative, choice)
>         yield candidates[first], candidates[second]
>
> --Scott David Daniels
> scott.dani...@acm.org

Thanks, I've found another solution here: http://www.obitko.com/tutorials/genetic-algorithms/selection.php
so here is my implementation:

    def create_chromosome(min, max, length):
        return [randint(min, max) for i in range(length)]

    def create_population(nelem, min, max, length):
        # preconditions: nelem > 1 and nelem is even
        if not nelem > 1:
            nelem = 2
        if not nelem%2 == 0:
            print("The population must have an even number of elements. Correcting...")
            nelem += 1
        return [create_chromosome(min, max, length) for i in range(nelem)]

    def get_fap(fitness, population):
        fap = []
        total = 0
        for x in population:
            f = fitness(x)
            fap += [(f, x)]
            total += f
        return sorted(fap, reverse=True), total

    def my_rw():
        list, tot = get_fap(sum, pop)
        r = randint(0, tot-1)
        i = 0
        print(r)
        for f, e in list:
            i += f
            print(i)
            if i > r:
                return e
        return [] # never reached

    if __name__ == "__main__":
        pop = create_population(5, 0, 1, 10)
        # selection_mat(sum, pop)
        #print(rw(sum, pop))
        list, tot = get_fap(sum, pop)
        print(list)
        print(tot)
        for i in range(6):
            print(my_rw())
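When many draws come from the same population, another option is to compute the running totals once and then locate each random number with a binary search. This is the idea behind Scott's cumulative list, and CPython's random.choices() uses the same approach. A minimal sketch using itertools.accumulate and bisect.bisect_right; make_sampler is a hypothetical name:

```python
import bisect
import itertools
import random

def make_sampler(weighted_items, rng=random):
    """Precompute cumulative weights once; each draw is then O(log n).

    weighted_items is a non-empty sequence of (weight, value) pairs.
    """
    weights = [w for w, _ in weighted_items]
    values = [v for _, v in weighted_items]
    cumulative = list(itertools.accumulate(weights))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("total weight must be positive")

    def draw():
        x = rng.random() * total
        # bisect_right skips zero-weight entries (their cumulative value repeats)
        i = bisect.bisect_right(cumulative, x)
        return values[min(i, len(values) - 1)]  # guard against float rounding

    return draw
```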
Re: Is there a better way of doing this?
On Sat, 07 Mar 2009 00:05:53 -0200, Gabriel Genellina wrote:
> On Fri, 06 Mar 2009 21:31:01 -0200, mattia wrote:
>
>> Thanks, I've found another solution here:
>> http://www.obitko.com/tutorials/genetic-algorithms/selection.php
>> so here is my implementation:
>>
>>
>> def get_fap(fitness, population):
>>     fap = []
>>     total = 0
>>     for x in population:
>>         f = fitness(x)
>>         fap += [(f, x)]
>>         total += f
>>     return sorted(fap, reverse=True), total
>
> Imagine you're working with someone side by side. You write a note in a
> piece of paper, put it into an envelope, and hand it to your co-worker.
> He opens the envelope, throws it away, takes the note and files it
> inside a folder right at the end. And you do this over and over. What's
> wrong in this story?
>
> Please save our trees! Don't waste so many envelopes - that's just what
> this line does:
>
>     fap += [(f, x)]
>
> Environmentally friendly Pythoneers avoid using discardable intermediate
> envelopes:
>
>     fap.append((f, x))
>
> Please recycle!

Yes, sorry, I have to recycle! But how about this:

    >>> rw = [[2,4], [4,5,6],[5,5]]
    >>> rw += [[1,1]]*2
    >>> rw
    [[2, 4], [4, 5, 6], [5, 5], [1, 1], [1, 1]]
    >>> rw = [[2,4], [4,5,6],[5,5]]
    >>> rw.append([1,1]*2)
    >>> rw
    [[2, 4], [4, 5, 6], [5, 5], [1, 1, 1, 1]]
    >>> rw = [[2,4], [4,5,6],[5,5]]
    >>> rw.append([[1,1]]*2)
    >>> rw
    [[2, 4], [4, 5, 6], [5, 5], [[1, 1], [1, 1]]]

How can I recycle in this way using append?
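The list method that appends every element of another iterable is extend(), and rw.extend(items) leaves rw in the same state as rw += items. extend() accepts any iterable, so a generator expression also works, and it gives independent copies instead of repeated references. A short sketch:

```python
rw = [[2, 4], [4, 5, 6], [5, 5]]
rw.extend([[1, 1]] * 2)
print(rw)  # [[2, 4], [4, 5, 6], [5, 5], [1, 1], [1, 1]]

# As with any [x] * n, both new entries are the same list object
print(rw[-1] is rw[-2])  # True

# A generator expression builds a fresh [1, 1] for each entry
rw2 = [[2, 4]]
rw2.extend([1, 1] for _ in range(2))
print(rw2[-1] is rw2[-2])  # False
```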
String to sequence
How can I convert the following string:

'AAR','ABZ','AGA','AHO','ALC','LEI','AOC','EGC','SXF','BZR','BIQ','BLL','BHX','BLQ'

into this sequence:

['AAR','ABZ','AGA','AHO','ALC','LEI','AOC','EGC','SXF','BZR','BIQ','BLL','BHX','BLQ']

Thanks a lot,
Mattia
Re: String to sequence
On Sat, 14 Mar 2009 10:24:38 +0100, Ulrich Eckhardt wrote:
> mattia wrote:
>> How can I convert the following string:
>> [...]
>
> import string
> string.split("a,b,c", ',')
>
> Now, I'm not 100% clear if this fits your above example because it's not
> clear what of the above is Python code and what is actual string
> content, but I hope this will get you started.
>
> cheers!
>
> Uli

Well, it was quite easy (although I'm new to Python ;-)). I scrape a list of flights by their three-letter code, [A-Z]{3}, and then save them into a sequence. After reading the documentation I've got:

    dests = dests.replace("'", "")
    dests = dests.split(",")

and everything is OK.
Re: String to sequence
Il Sat, 14 Mar 2009 10:30:43 +0100, Vlastimil Brom ha scritto: > 2009/3/14 mattia : >> How can I convert the following string: >> >> 'AAR','ABZ','AGA','AHO','ALC','LEI','AOC', >> EGC','SXF','BZR','BIQ','BLL','BHX','BLQ' >> >> into this sequence: >> >> ['AAR','ABZ','AGA','AHO','ALC','LEI','AOC', >> EGC','SXF','BZR','BIQ','BLL','BHX','BLQ'] >> >> Thanks a lot, >> Mattia >> -- >> http://mail.python.org/mailman/listinfo/python-list >> >> > Apart from the "obvious" and rather discouraged >>>> list(eval("'AAR','ABZ','AGA','AHO'")) > ['AAR', 'ABZ', 'AGA', 'AHO'] Why discouraged? > > you may try e.g.: > >>>> [item[1:-1] for item in "'AAR','ABZ','AGA','AHO'".split(",")] > ['AAR', 'ABZ', 'AGA', 'AHO'] > > hth, > vbr -- http://mail.python.org/mailman/listinfo/python-list
Re: String to sequence
Il Sat, 14 Mar 2009 10:35:59 +0100, Peter Otten ha scritto: > mattia wrote: > >> How can I convert the following string: >> >> 'AAR','ABZ','AGA','AHO','ALC','LEI','AOC', >> EGC','SXF','BZR','BIQ','BLL','BHX','BLQ' >> >> into this sequence: >> >> ['AAR','ABZ','AGA','AHO','ALC','LEI','AOC', >> EGC','SXF','BZR','BIQ','BLL','BHX','BLQ'] > >>>> s = "'AAR','ABZ','AGA','AHO','ALC','LEI','AOC'" >>>> csv.reader(StringIO.StringIO(s), quotechar="'").next() > ['AAR', 'ABZ', 'AGA', 'AHO', 'ALC', 'LEI', 'AOC'] > > or > >>>> s = "'AAR','ABZ','AGA','AHO','ALC','LEI','AOC'" >>>> list(compile(s, "nofile", "eval").co_consts[-1]) > ['AAR', 'ABZ', 'AGA', 'AHO', 'ALC', 'LEI', 'AOC'] > > Peter Ok, and what if the string is "['AAR', 'ABZ', 'AGA', 'AHO', 'ALC']"? I wanted to use eval(string), but it is discouraged, they say. -- http://mail.python.org/mailman/listinfo/python-list
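`eval()` is discouraged because it will evaluate *any* expression, including calls that delete files or import modules. The standard library's `ast.literal_eval()` accepts only Python literals (strings, numbers, tuples, lists, dicts, sets, booleans, None), so it can safely handle both string shapes in this thread. The helper name `parse_codes` below is hypothetical.

```python
import ast

def parse_codes(text):
    """Parse a literal such as "['AAR', 'ABZ']" or "'AAR','ABZ'" into a list."""
    value = ast.literal_eval(text)  # raises ValueError for non-literals like calls
    # "'AAR','ABZ'" evaluates to a tuple; a single "'AAR'" evaluates to a str.
    if isinstance(value, str):
        value = [value]
    return list(value)

print(parse_codes("['AAR', 'ABZ', 'AGA', 'AHO', 'ALC']"))
print(parse_codes("'AAR','ABZ','AGA'"))
```

Both calls print a plain list of codes. Something like `"__import__('os')"` is rejected with a `ValueError` instead of being executed.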
Re: String to sequence
Il Sat, 14 Mar 2009 12:13:31 +0100, Peter Otten ha scritto: > mattia wrote: > >> Il Sat, 14 Mar 2009 10:35:59 +0100, Peter Otten ha scritto: >> >>> mattia wrote: >>> >>>> How can I convert the following string: >>>> >>>> 'AAR','ABZ','AGA','AHO','ALC','LEI','AOC', >>>> EGC','SXF','BZR','BIQ','BLL','BHX','BLQ' >>>> >>>> into this sequence: >>>> >>>> ['AAR','ABZ','AGA','AHO','ALC','LEI','AOC', >>>> EGC','SXF','BZR','BIQ','BLL','BHX','BLQ'] >>> >>>>>> s = "'AAR','ABZ','AGA','AHO','ALC','LEI','AOC'" >>>>>> csv.reader(StringIO.StringIO(s), quotechar="'").next() >>> ['AAR', 'ABZ', 'AGA', 'AHO', 'ALC', 'LEI', 'AOC'] >>> >>> or >>> >>>>>> s = "'AAR','ABZ','AGA','AHO','ALC','LEI','AOC'" >>>>>> list(compile(s, "nofile", "eval").co_consts[-1]) >>> ['AAR', 'ABZ', 'AGA', 'AHO', 'ALC', 'LEI', 'AOC'] >>> >>> Peter >> >> Ok, and what about if the string is "['AAR', 'ABZ', 'AGA', 'AHO', >> 'ALC']" I wanted to use eval(string) but it is discouraged, they say. > > If you use the csv module you can remove the [] manually: > > assert s.startswith("[") > assert s.endswith("]") > s = s[1:-1] > > compile() will work without the enclosing list(...) call. > > Yet another one is > > flights = re.compile("'([A-Z]+)'").findall(s) > if any(len(f) != 3 for f in flights): > raise ValueError > > Peter Yeah, I'll also have to handle some simple cases; for now I just used:
c2 = re.compile("a(?P<from>[A-Z]{3})=\[(?P<seqto>[^\]]+)\]")
from_to = dict((x.group("from"), str_to_seq(x.group("seqto"))) for x in c2.finditer(rest))
Thanks a lot (I also didn't know about the any() function)! -- http://mail.python.org/mailman/listinfo/python-list
Re: String to sequence
Il Sat, 14 Mar 2009 15:30:29 -0500, Tim Chase ha scritto: >> How can I convert the following string: >> >> 'AAR','ABZ','AGA','AHO','ALC','LEI','AOC', >> EGC','SXF','BZR','BIQ','BLL','BHX','BLQ' >> >> into this sequence: >> >> ['AAR','ABZ','AGA','AHO','ALC','LEI','AOC', >> EGC','SXF','BZR','BIQ','BLL','BHX','BLQ'] > > Though several other options have come through: > > >>> s = "'EGC','SXF','BZR','BIQ','BLL','BHX','BLQ'" > >>> import re > >>> r = re.compile("'([^']*)',?") > >>> r.findall(s) > ['EGC', 'SXF', 'BZR', 'BIQ', 'BLL', 'BHX', 'BLQ'] > > If you want to get really fancy, you can use the built-in csv parser: > > >>> import cStringIO > >>> st = cStringIO.StringIO(s) > >>> import csv > >>> class SingleQuoteDialect(csv.Dialect): > ... quotechar = "'" > ... quoting = csv.QUOTE_MINIMAL > ... delimiter = "," > ... doublequote = True > ... escapechar = "\\" > ... lineterminator = '\r\n' > ... skipinitialspace = True > ... > >>> r = csv.reader(st, dialect=SingleQuoteDialect) > >>> r.next() > ['EGC', 'SXF', 'BZR', 'BIQ', 'BLL', 'BHX', 'BLQ'] > > > This gives you control over how any craziness gets handled, prescribing > escaping, and allowing you to stream in the data from a file if you > need. However, if they're airport codes, I suspect the easy route of > just using a regex will more than suffice. > > -tkc Yes, I'll use a regex to rule them all. -- http://mail.python.org/mailman/listinfo/python-list
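The snippets in this thread use Python 2 names (`cStringIO`, `StringIO.StringIO`, `reader.next()`). A rough Python 3 equivalent of the regex and csv approaches, under the assumption that the input always looks like the sample string, could be:

```python
import csv
import io
import re

s = "'EGC','SXF','BZR','BIQ','BLL','BHX','BLQ'"

# Regex: capture whatever sits between each pair of single quotes.
codes_re = re.findall(r"'([^']*)'", s)

# Stricter regex, if every code is known to be three capital letters.
codes_strict = re.findall(r"'([A-Z]{3})'", s)

# csv: Python 3 uses io.StringIO and the built-in next() on the reader.
codes_csv = next(csv.reader(io.StringIO(s), quotechar="'"))

print(codes_re)
```

All three produce the same list for this input. The csv version is the one to prefer if the fields could ever contain commas or escaped quotes.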
List the modules of a package
Hi all, how can I list the modules provided by a package? -- http://mail.python.org/mailman/listinfo/python-list
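One way to do this with the standard library is `pkgutil.iter_modules()`, passing it the package's `__path__`. The sketch below uses the `json` package only as a convenient example; `list_submodules` is a hypothetical helper name.

```python
import json  # any importable package works; json is just an example
import pkgutil

def list_submodules(package):
    """Return the names of the modules and subpackages directly inside *package*."""
    # Only packages have __path__; a plain module would raise AttributeError here.
    return sorted(info.name for info in pkgutil.iter_modules(package.__path__))

print(list_submodules(json))
```

This lists only the top level of the package. `pkgutil.walk_packages()` can be used instead to recurse into subpackages.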
Set overlapping
How can I determine the common values found in two different sets and then assign these values? E.g. dayset = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"] monthset = ["Jan", "Feb", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"] These are the "fixed and standard" sets. Then I have other sets that each contain one value from each of the previous sets. exset1 = ["SS" ,"33" ,"err" ,"val" ,"Tue" ,"XYZ" ,"40444" ,"Jan" ,"2008"] exset2 = ["53" ,"hello" ,"YY" ,"43" ,"2009" ,"Sun" ,"Feb"] I want to find: day_guess1 = value in dayset that is in exset1 day_guess2 = value in dayset that is in exset2 and the same with monthset. For now, I've got: for d in dayset: if d in exset1: guess_day = d break Is there something better? -- http://mail.python.org/mailman/listinfo/python-list
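If the reference collections are real `set` objects, `set.intersection()` finds the shared values directly. It accepts any iterable, so the `exset` lists can stay lists. The sketch below adds "Mar" to `monthset`, because the list in the post appears to leave it out. `guess` is a hypothetical helper name.

```python
dayset = {"Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"}
# "Mar" added here; the list in the post appears to omit it.
monthset = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"}

exset1 = ["SS", "33", "err", "val", "Tue", "XYZ", "40444", "Jan", "2008"]
exset2 = ["53", "hello", "YY", "43", "2009", "Sun", "Feb"]

def guess(reference, values):
    """Return the element of *values* that also appears in *reference*, or None.

    If more than one element matches, pop() returns an arbitrary one of them.
    """
    common = reference.intersection(values)
    return common.pop() if common else None

day_guess1 = guess(dayset, exset1)
day_guess2 = guess(dayset, exset2)
month_guess1 = guess(monthset, exset1)
month_guess2 = guess(monthset, exset2)
print(day_guess1, day_guess2, month_guess1, month_guess2)  # Tue Sun Jan Feb
```

When the first match in list order matters, `next((v for v in values if v in reference), None)` is a deterministic alternative.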
Correct URL encoding
I'm using urlopen in order to download some web pages. I always have to replace some characters in the URL, so I've come up with: url.replace("|", "%7C").replace("/", "%2F").replace(" ", "+").replace(":", "%3A") Isn't there a better way of doing this? -- http://mail.python.org/mailman/listinfo/python-list
Re: Correct URL encoding
Il Sun, 15 Mar 2009 17:05:16 -0700, Chris Rebert ha scritto: > On Sun, Mar 15, 2009 at 4:54 PM, gervaz wrote: >> On Mar 16, 12:38 am, Graham Breed wrote: >>> mattia wrote: >>> > I'm using urlopen in order to download some web pages. I've always >>> > to replace some characters that are in the url, so I've come up >>> > with: url.replace("|", "%7C").replace("/", "%2F").replace(" ", >>> > "+").replace (":", "%3A") >>> > There isn't a better way of doing this? >>> >>> Yeah, shame there's no function -- called "urlencode" say -- that does >>> it all for you. >>> >>> Graham >> >> Sorry, but using Python 2.6 and urlencode I've got this error: >> TypeError: not a valid non-string sequence or mapping object What I was >> looking for (found in Python3) is: from urllib.parse import quote >> urlopen(quote(url)).read() >> but seems there is nothing similar in py2.6 > > (*cough*) [Python v2.6.1 documentation] urllib.quote() - > http://docs.python.org/library/urllib.html#urllib.quote (*cough*) > > - Chris Ouch! It's time for me to go to sleep ;-) Thanks. -- http://mail.python.org/mailman/listinfo/python-list
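In Python 3 these helpers live in `urllib.parse`. In Python 2.6 the equivalents are `urllib.quote` and `urllib.quote_plus`, as the reply points out. A short sketch of how each one treats the characters from the original `replace()` chain:

```python
from urllib.parse import quote, quote_plus, urlencode

raw = "a|b/c d:e"

# quote() leaves "/" alone by default; safe="" encodes it too.
encoded = quote(raw, safe="")           # 'a%7Cb%2Fc%20d%3Ae'

# quote_plus() also turns spaces into "+", like the replace() chain did.
encoded_plus = quote_plus(raw)          # 'a%7Cb%2Fc+d%3Ae'

# urlencode() wants a mapping or a sequence of pairs, not a plain string,
# which is why calling it on a string raised the TypeError in the thread.
query = urlencode({"q": raw})           # 'q=a%7Cb%2Fc+d%3Ae'

print(encoded, encoded_plus, query)
```

These functions are meant for individual URL components (a path segment or a query value). Quoting a whole URL with `safe=""` would also encode the `:` and `/` in `http://`.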
Lists aggregation
I have 2 lists, like: l1 = [1,2,3] l2 = [4,5] Now I want to obtain this new list: l = [(1,4),(1,5),(2,4),(2,5),(3,4),(3,5)] Then I'll have to transform the values found in the new list. Now, some ideas (apart from the double loop to aggregate each element of l1 with each element of l2): - I wanted to use the zip function, but the new list will not aggregate (3,4) and (3,5) - Once I have the new list, I'll apply a map function (e.g. the exp of the values) to speed up the process Some help? Thanks, Mattia -- http://mail.python.org/mailman/listinfo/python-list
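The list of pairs described here is a Cartesian product, which `itertools.product()` generates directly. The sketch also shows applying a transformation to each pair without storing the intermediate list. `math.exp(a + b)` is only a stand-in, because the post does not say exactly which function will be applied.

```python
import math
from itertools import product

l1 = [1, 2, 3]
l2 = [4, 5]

# product() yields every (a, b) pair, with the first list varying slowest.
pairs = list(product(l1, l2))
# [(1, 4), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5)]

# The same pairs from a nested list comprehension.
pairs_lc = [(a, b) for a in l1 for b in l2]

# Transform each pair on the fly; exp(a + b) is just an example function.
values = [math.exp(a + b) for a, b in product(l1, l2)]
```

Unlike `zip()`, which stops pairing after the shorter list runs out, `product()` combines every element of `l1` with every element of `l2`.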
urllib2 (py2.6) vs urllib.request (py3)
Hi all, can you tell me why the module urllib.request (py3) adds extra characters (b'fef\r\n and \r\n0\r\n\r\n') in a simple example like the following, while urllib2 (py2.6) correctly does not?
py2.6
>>> import urllib2
>>> f = urllib2.urlopen("http://www.google.com").read()
>>> fd = open("google26.html", "w")
>>> fd.write(f)
>>> fd.close()
py3
>>> import urllib.request
>>> f = urllib.request.urlopen("http://www.google.com").read()
>>> with open("google30.html", "w") as fd:
...     print(f, file=fd)
...
>>>
Opening the two HTML pages with ff I get different results (the extra characters mentioned earlier). Why? -- http://mail.python.org/mailman/listinfo/python-list
Re: Lists aggregation
Il Tue, 17 Mar 2009 08:18:08 +0100, Peter Otten ha scritto: > Mensanator wrote: > >> On Mar 16, 1:40 pm, Peter Otten <__pete...@web.de> wrote: >>> mattia wrote: >>> > I have 2 lists, like: >>> > l1 = [1,2,3] >>> > l2 = [4,5] >>> > now I want to obtain a this new list: l = >>> > [(1,4),(1,5),(2,4),(2,5),(3,4),(3,5)] Then I'll have to transform >>> > the values found in the new list. Now, some ideas (apart from the >>> > double loop to aggregate each element of l1 with each element of >>> > l2): >>> > - I wanted to use the zip function, but the new list will not >>> > aggregate (3,4) and (3,5) >>> > - Once I've the new list, I'll apply a map function (e.g. the exp of >>> > the values) to speed up the process >>> > Some help? >>> >>> Why would you keep the intermediate list? >>> >>> With a list comprehension: >>> >>> >>> a = [1,2,3] >>> >>> b = [4,5] >>> >>> [x**y for x in a for y in b] >>> >>> [1, 1, 16, 32, 81, 243] >>> >>> With itertools: >>> >>> >>> from itertools import product, starmap >>> >>> from operator import pow >>> >>> list(starmap(pow, product(a, b))) >>> >>> [1, 1, 16, 32, 81, 243] >> >> That looks nothing like [(1,4),(1,5),(2,4),(2,5),(3,4),(3,5)]. > > The point of my post was that you don't have to calculate that list of > tuples explicitly. > > If you read the original post again you'll find that Mattia wanted that > list only as an intermediate step to something else. He gave "the exp of > values" as an example. As math.exp() only takes one argument I took this > to mean "exponentiation", or **/pow() in Python. > > Peter Correct, and thanks for all the help. -- http://mail.python.org/mailman/listinfo/python-list
Re: urllib2 (py2.6) vs urllib.request (py3)
Il Tue, 17 Mar 2009 10:55:21 +, R. David Murray ha scritto: > mattia wrote: >> Hi all, can you tell me why the module urllib.request (py3) add extra >> characters (b'fef\r\n and \r\n0\r\n\r\n') in a simple example like the >> following and urllib2 (py2.6) correctly not? >> >> py2.6 >> >>> import urllib2 >> >>> f = urllib2.urlopen("http://www.google.com").read() >> >>> fd = open("google26.html", "w") >> >>> fd.write(f) >> >>> fd.close() >> >> py3 >> >>> import urllib.request >> >>> f = urllib.request.urlopen("http://www.google.com").read() >> >>> with open("google30.html", "w") as fd: >> ... print(f, file=fd) >> ... >> >>> >> >>> >> Opening the two html pages with ff I've got different results (the >> extra characters mentioned earlier), why? > > The problem isn't a difference between urllib2 and urllib.request, it is > between fd.write and print. This produces the same result as your first > example: > > >>>> import urllib.request >>>> f = urllib.request.urlopen("http://www.google.com").read() >>>> with open("temp3.html", "wb") as fd: > ... fd.write(f) > > > The "b''" is the stringified representation of a bytes object, which > is what urllib.request returns in python3. Note the 'wb', which is a > critical difference from the python2.6 case. If you omit the 'b' in > python3, it will complain that you can't write bytes to the file object. > > The thing to keep in mind is that print converts its argument to string > before writing it anywhere (that's the point of using it), and that > bytes (or buffer) and string are very different types in python3. Well... now in the saved file I've got extra characters "fef" at the beginning and "0" at the end... -- http://mail.python.org/mailman/listinfo/python-list
Re: urllib2 (py2.6) vs urllib.request (py3)
Il Tue, 17 Mar 2009 10:55:21 +, R. David Murray ha scritto: > mattia wrote: >> Hi all, can you tell me why the module urllib.request (py3) add extra >> characters (b'fef\r\n and \r\n0\r\n\r\n') in a simple example like the >> following and urllib2 (py2.6) correctly not? >> >> py2.6 >> >>> import urllib2 >> >>> f = urllib2.urlopen("http://www.google.com").read() >> >>> fd = open("google26.html", "w") >> >>> fd.write(f) >> >>> fd.close() >> >> py3 >> >>> import urllib.request >> >>> f = urllib.request.urlopen("http://www.google.com").read() >> >>> with open("google30.html", "w") as fd: >> ... print(f, file=fd) >> ... >> >>> >> >>> >> Opening the two html pages with ff I've got different results (the >> extra characters mentioned earlier), why? > > The problem isn't a difference between urllib2 and urllib.request, it is > between fd.write and print. This produces the same result as your first > example: > > >>>> import urllib.request >>>> f = urllib.request.urlopen("http://www.google.com").read() >>>> with open("temp3.html", "wb") as fd: > ... fd.write(f) > > > The "b''" is the stringified representation of a bytes object, which > is what urllib.request returns in python3. Note the 'wb', which is a > critical difference from the python2.6 case. If you omit the 'b' in > python3, it will complain that you can't write bytes to the file object. > > The thing to keep in mind is that print converts its argument to string > before writing it anywhere (that's the point of using it), and that > bytes (or buffer) and string are very different types in python3. In order to get the correct encoding I've come up with this:
>>> response = urllib.request.urlopen("http://www.google.com")
>>> print(response.read().decode(response.headers.get_charsets()[0]))
-- http://mail.python.org/mailman/listinfo/python-list
Re: urllib2 (py2.6) vs urllib.request (py3)
Il Tue, 17 Mar 2009 15:40:02 +, R. David Murray ha scritto: > mattia wrote: >> Il Tue, 17 Mar 2009 10:55:21 +, R. David Murray ha scritto: >> >> > mattia wrote: >> >> Hi all, can you tell me why the module urllib.request (py3) add >> >> extra characters (b'fef\r\n and \r\n0\r\n\r\n') in a simple example >> >> like the following and urllib2 (py2.6) correctly not? >> >> >> >> py2.6 >> >> >>> import urllib2 >> >> >>> f = urllib2.urlopen("http://www.google.com").read() >> >> >>> fd = open("google26.html", "w") >> >> >>> fd.write(f) >> >> >>> fd.close() >> >> >> >> py3 >> >> >>> import urllib.request >> >> >>> f = urllib.request.urlopen("http://www.google.com").read() >> >> >>> with open("google30.html", "w") as fd: >> >> ... print(f, file=fd) >> >> ... >> >> >>> >> >> >>> >> >> Opening the two html pages with ff I've got different results (the >> >> extra characters mentioned earlier), why? >> > >> > The problem isn't a difference between urllib2 and urllib.request, it >> > is between fd.write and print. This produces the same result as your >> > first example: >> > >> > >> >>>> import urllib.request >> >>>> f = urllib.request.urlopen("http://www.google.com").read() >> >>>> with open("temp3.html", "wb") as fd: >> > ... fd.write(f) >> > >> > >> > The "b''" is the stringified representation of a bytes object, >> > which is what urllib.request returns in python3. Note the 'wb', >> > which is a critical difference from the python2.6 case. If you omit >> > the 'b' in python3, it will complain that you can't write bytes to >> > the file object. >> > >> > The thing to keep in mind is that print converts its argument to >> > string before writing it anywhere (that's the point of using it), and >> > that bytes (or buffer) and string are very different types in >> > python3. >> >> Well... now in the saved file I've got extra characters "fef" at the >> begin and "0" at the end... > > The 'fef' is reminiscent of a BOM.
I don't see any such thing in the > data file produced by my code snippet above. Did you try running that, > or did you modify your code? If the latter, maybe if you post your > exact code I can try to run it and see if I can figure out what is going > on. > > I'm far from an expert in unicode issues, by the way :) Oh, and I'm > running 3.1a1+ from svn, by the way, so it is also possible there's been > a bug fix of some sort. The extra characters were produced using Python 3.0. This afternoon I downloaded version 3.0.1, and everything works fine for the previous example using the "wb" mode. And now that I know urlopen returns bytes, I've also figured out how to decode the result (I deal only with HTML pages, no jpg, pdf, etc.), so I just have to know the charset of the page (if available). -- http://mail.python.org/mailman/listinfo/python-list
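A sketch of the decoding step described here, written as a small helper (the name `decode_body` and the UTF-8 fallback are assumptions). In Python 3, `response.headers` is an `http.client.HTTPMessage`, which subclasses `email.message.Message`, so `get_content_charset()` returns the declared charset or `None`. The demonstration builds the header object by hand, so it runs without a network connection.

```python
from email.message import Message

def decode_body(body, headers, default="utf-8"):
    """Decode response bytes using the charset declared in *headers*.

    Falls back to *default* when no charset is declared; undecodable bytes
    are replaced rather than raising UnicodeDecodeError.
    """
    charset = headers.get_content_charset() or default
    return body.decode(charset, errors="replace")

# Offline demonstration with a hand-built header object.
headers = Message()
headers["Content-Type"] = "text/html; charset=ISO-8859-1"
html = decode_body("caff\u00e8".encode("latin-1"), headers)

# With a live response it would be used like this (not executed here):
# import urllib.request
# with urllib.request.urlopen("http://www.google.com") as response:
#     html = decode_body(response.read(), response.headers)
```

If the page only needs to be saved, writing `response.read()` to a file opened with `"wb"` skips decoding altogether.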
Re: Roulette wheel
Il Wed, 18 Mar 2009 09:34:57 -0700, Aahz ha scritto: > In article , Peter Otten > <__pete...@web.de> wrote: >>mattia wrote: >>> >>> cpop += [nchromosome1] + [nchromosome2] >> >>I'd write that as >> >>cpop.append(nchromosome1) >>cpop.append(nchromosome2) >> >>thus avoiding the intermediate lists. > > You could also write it as > > cpop += [nchromosome1, nchromosome2] > > which may or may not be faster, substituting one attribute lookup, one > list creation, and one method call for two attribute lookups and two > method calls. I shan't bother running timeit to check, but I certainly > agree that either your version or mine should be substituted for the > original, depending on one's esthetics (meaning that I doubt there's > enough performance difference either way to make that the reason for > choosing one). Yeah, and I believe that we can say the same for: 1 - t = [x*2 for x in range(10)] 2 - t = list(x*2 for x in range(10)) or not? -- http://mail.python.org/mailman/listinfo/python-list
Re: Roulette wheel
Il Wed, 18 Mar 2009 13:20:14 -0700, Aahz ha scritto: > In article <49c1562a$0$1115$4fafb...@reader1.news.tin.it>, mattia > wrote: >> >>Yeah, and I believe that we can say the same for: 1 - t = [x*2 for x in >>range(10)] >>2 - t = list(x*2 for x in range(10)) >>or not? > > The latter requires generator expressions, which means it only works > with Python 2.4 or higher. Personally, I think that if the intent is to > create a list you should just use a listcomp instead of list() on a > genexp. Ok, so list(x*2 for x in range(10)) actually means list((x*2 for x in range(10))), so a generator is created and then the list function is called? Also, regarding memory: [...] will be deleted when the reference is no longer needed, but what about list(...)? Well, I don't know. I'm new to Python, so sorry if these are nonsense questions. -- http://mail.python.org/mailman/listinfo/python-list
Re: Roulette wheel
Il Wed, 18 Mar 2009 23:31:09 -0200, Gabriel Genellina ha scritto: > En Wed, 18 Mar 2009 18:49:19 -0200, mattia escribió: >> Il Wed, 18 Mar 2009 13:20:14 -0700, Aahz ha scritto: >>> In article <49c1562a$0$1115$4fafb...@reader1.news.tin.it>, mattia >>> wrote: >>>> >>>> Yeah, and I believe that we can say the same for: 1 - t = [x*2 for x >>>> in range(10)] >>>> 2 - t = list(x*2 for x in range(10)) >>>> or not? >>> The latter requires generator expressions, which means it only works >>> with Python 2.4 or higher. Personally, I think that if the intent is >>> to create a list you should just use a listcomp instead of list() on a >>> genexp. >> Ok, so list(x*2 for x in range(10)) actually means: list((x*2 for x in >> range(10)) --> so a generator is created and then the list function is >> called? > > Exactly. The (()) were considered redundant in this case. > >> Also, dealing with memory, [...] will be deleted when the reference >> will be no longer needed and with list(...)... well, I don't know? I'm >> new to python so sorry if this are nonsense. > > I don't completely understand your question, but *any* object is > destroyed when the last reference to it is gone (in CPython, the > destructor is called at the very moment the reference count reaches > zero; other implementations may behave differently). OK, understood. Now, as a general rule, is it correct to say: - use generator expression when I just need to iterate over the list or call a function that involve an iterator (e.g. sum) and get the result, so the list is not necessary anymore - use list comprehensions when I actually have to use the list (e.g. I need to swap some values or I need to use sorted() etc.) Am I right? -- http://mail.python.org/mailman/listinfo/python-list
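A small illustration of that rule of thumb, under the assumption that "need the list" means indexing, mutating or reusing it. One nuance: `sorted()` accepts any iterable, including a generator, and always returns a new list. So a generator is enough when the only goal is to get a sorted result.

```python
values = range(10)

# Generator expression: items are produced one at a time and consumed by
# sum(), so no intermediate list is built.
total = sum(x * 2 for x in values)

# List comprehension: a real list, needed for indexing or in-place changes.
doubled = [x * 2 for x in values]
doubled[0], doubled[-1] = doubled[-1], doubled[0]   # swap two items

# sorted() takes any iterable, a generator included, and returns a new list.
ranked = sorted((x * 2 for x in values), reverse=True)

# A generator can only be consumed once.
gen = (x * 2 for x in values)
first_pass = list(gen)
second_pass = list(gen)   # [] because gen is already exhausted
```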
Simple question about yyyy/mm/dd
Hi all, I need to receive as input a date represented by a string in the form "yyyy/mm/dd" (or reversed), then I need to ensure that the date is >= the current date, and then split the date into variables like year, month and day. Is there some module to do this quickly? -- http://mail.python.org/mailman/listinfo/python-list
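The standard `datetime` module covers all three steps: `datetime.strptime()` parses the string against a format, `date.today()` gives the current date for the comparison, and the resulting `date` exposes `year`, `month` and `day`. The helper name `parse_future_date` is hypothetical, and so is the choice to raise `ValueError` for past dates.

```python
from datetime import date, datetime

def parse_future_date(text, fmt="%Y/%m/%d"):
    """Parse *text* with *fmt* and return (year, month, day).

    Raises ValueError if the text does not match the format or if the
    date is earlier than today.
    """
    d = datetime.strptime(text, fmt).date()
    if d < date.today():
        raise ValueError(f"{d} is in the past")
    return d.year, d.month, d.day

# For the reversed form (dd/mm/yyyy), pass a different format:
# parse_future_date("31/12/2099", fmt="%d/%m/%Y")
```

`strptime()` also rejects impossible dates such as "2009/02/30", so a separate validity check is not needed.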
Generator
Can you explain this behaviour:
>>> s = [1,2,3,4,5]
>>> g = (x for x in s)
>>> next(g)
1
>>> s
[1, 2, 3, 4, 5]
>>> del s[0]
>>> s
[2, 3, 4, 5]
>>> next(g)
3
>>>
Why doesn't next(g) give me 2? -- http://mail.python.org/mailman/listinfo/python-list
Re: Generator
Il Sun, 22 Mar 2009 16:52:02 +, R. David Murray ha scritto: > mattia wrote: >> Can you explain me this behaviour: >> >> >>> s = [1,2,3,4,5] >> >>> g = (x for x in s) >> >>> next(g) >> 1 >> >>> s >> [1, 2, 3, 4, 5] >> >>> del s[0] >> >>> s >> [2, 3, 4, 5] >> >>> next(g) >> 3 >> >>> >> >>> >> Why next(g) doesn't give me 2? > > Think of it this way: the generator is exactly equivalent to the > following generator function: > > def g(s): > for x in s: > yield x > > Now, if you look at the documentation for the 'for' statement, there is > a big "warning" box that talks about what happens when you mutate an > object that is being looped over: > > There is a subtlety when the sequence is being modified by the loop > (this can only occur for mutable sequences, i.e. lists). An > internal counter is used to keep track of which item is used next, > and this is incremented on each iteration. When this counter has > reached the length of the sequence the loop terminates. This means > that if the suite deletes the current (or a previous) item from the > sequence, the next item will be skipped (since it gets the index of > the current item which has already been treated). Likewise, if the > suite inserts an item in the sequence before the current item, the > current item will be treated again the next time through the loop. > > As you can see, your case is covered explicitly there. > > If you want next(g) to yield 2, you'd have to do something like: > > g = (x for x in s[:]) > > where s[:] makes a copy of s that is then iterated over. Ok, thanks. Yes, I had guessed that an internal counter would explain my problem, and now I know my intuition was correct. Thanks. -- http://mail.python.org/mailman/listinfo/python-list
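A short script that reproduces both cases from the thread. Iterating over the live list skips 2 after the deletion, while iterating over a copy does not. The copy is made only once, because the outermost iterable of a generator expression (`s[:]` here) is evaluated when the generator is created.

```python
# Case 1: iterate over the live list.
s = [1, 2, 3, 4, 5]
g = (x for x in s)
first = next(g)            # 1; the internal index now points at position 1
del s[0]                   # s becomes [2, 3, 4, 5], so 3 is now at position 1
after_delete = next(g)     # 3: the value 2 is skipped

# Case 2: iterate over a copy taken when the generator is created.
s = [1, 2, 3, 4, 5]
g_copy = (x for x in s[:])
first_copy = next(g_copy)      # 1
del s[0]                       # does not affect the copy
after_delete_copy = next(g_copy)  # 2

print(after_delete, after_delete_copy)  # 3 2
```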
Code annotations (copyright, author, etc.) in your code
Hi all, what are the usual comments that you put at the beginning of your code to explain e.g. the author, the usage, the license, etc.? I've found something like this useful:
#-
# Name:      About.py
# Purpose:
#
# Author:
#
# Created:   2009
# Copyright: (c) 2009
# Licence:   GPL
#-
Others put something like:
__author__ = "Name Surname"
__year__ = 2009
What do you use? -- http://mail.python.org/mailman/listinfo/python-list
Help with dict and iter
Hi all, I have a list of jobs, and each job has to be processed in a particular order by a list of machines. A simple representation is: # Ordering of machines JOB1 = [3, 1, 2, 4] JOB2 = [2, 3, 1, 4] JOBS = [JOB1, JOB2] NJOBS = len(JOBS) Now, I have a list of jobs and I want to get the associated list of machines, e.g.: [JOB1, JOB1, JOB2] --> [3, 1, 2] My original idea was to have a dict mapping each job number to an iterator over its list of machines: job_machine = dict((x+1, iter(JOBS[x])) for x in range(NJOBS)) Now, something like: for x in job_list: print(next(job_machine[x])) works well, but imagine I have a list of job_lists: obviously I get a StopIteration exception after the first list. So, I'm looking for a way to "reset" the next() value every time I complete the scan of a list. Is it possible? Another solution could be: empty = dict((x+1, (0, JOBS[x])) for x in range(NJOBS)) job_machine = dict((x+1, (0, JOBS[x])) for x in range(NJOBS)) and then every time do: for job_list in p: for x in job_list: print(job_machine[x][1][job_machine[x][0]]) job_machine.update({x:(job_machine[x][0]+1, JOBS[x-1])}) job_machine = empty.copy() Can you suggest a more Pythonic way? Ciao, Mattia -- http://mail.python.org/mailman/listinfo/python-list
Re: Help with dict and iter
Il Sun, 29 Mar 2009 11:17:50 -0400, andrew cooke ha scritto: > mattia wrote: >> Hi all, I a list of jobs and each job has to be processed in a >> particular order by a list of machines. >> A simple representation is: >> # Ordering of machines >> JOB1 = [3, 1, 2, 4] >> JOB2 = [2, 3, 1, 4] >> JOBS = [JOB1, JOB2] >> NJOBS = len(JOBS) >> Now, I have a list of jobs and I want to have the associated list of >> machines, e.g: >> [JOB1, JOB1, JOB2] --> [3, 1, 2] >> My original idea was to have a dict with associated the job number and >> an iterator associated by the list of machines: job_machine = >> dict((x+1, iter(JOBS[x])) for x in range(NJOBS)) Now, something like: >> for x in job_list: >> print(next(job_machine[x])) >> Works good, but imagine I have a list of job_list, now obviously I have >> a StopIteration exception after the >> first list. So, I'm looking for a way to "reset" the next() value every >> time i complete the scan of a list. > > don't you just want to have a new job machine? > > for job_list in job_list_list: > job_machine = dict((x+1, iter(JOBS[x])) for x in range(NJOBS)) > for x in job_list: > print(next(job_machine[x])) > > you can certainly write a generator that resets itself - i've done it by > adding a special reset exception and then using generator.throw(Reset()) > - but it's a lot more complicated and i don't think you need it. if you > do, you need to look at writing generators explicitly using yield, and > then at "enhanced generators" for using throw(). > > andrew Well, you are right: just creating a new dict every time is a good solution. I was just adding complexity to a simple problem, thanks. -- http://mail.python.org/mailman/listinfo/python-list
Re: Help with dict and iter
Il Sun, 29 Mar 2009 12:00:38 -0400, andrew cooke ha scritto: > mattia wrote: >>[i wrote]: >>> don't you just want to have a new job machine? >>> >>> for job_list in job_list_list: >>> job_machine = dict((x+1, iter(JOBS[x])) for x in range(NJOBS)) >>> for x in job_list: >>> print(next(job_machine[x])) > > ok - btw you can probably simplify the code. > > this might work: > > job_machine = list(map(iter, JOBS)) > > andrew > > [...] >> Well, you are right, just creating every time a new dict is a good >> solution, I was just adding complexity to a simple problem, thanks. Yes, it works perfectly; the only change is next(job_machine[x-1]). Thanks. -- http://mail.python.org/mailman/listinfo/python-list
Re: Help with dict and iter
Il Thu, 02 Apr 2009 13:44:38 +, Sion Arrowsmith ha scritto: > mattia wrote: >> So, I'm looking for a way to "reset" the next() value every >>time i complete the scan of a list. > > itertools.cycle ? Perfect, thanks. -- http://mail.python.org/mailman/listinfo/python-list
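Combining the suggestions in this thread, one possible shape is a function that builds a fresh list of iterators, as in `list(map(iter, JOBS))`, for every job list. That way each scan starts again from each job's first machine and no "reset" is needed. `machines_for` is a hypothetical name. If each job should instead wrap around to its first machine once its list is used up, `itertools.cycle` (suggested above) can replace `iter`.

```python
# Machine ordering for each job (job numbers start at 1, as in the post).
JOB1 = [3, 1, 2, 4]
JOB2 = [2, 3, 1, 4]
JOBS = [JOB1, JOB2]

def machines_for(job_list):
    """Map a list of job numbers to the next machine each job needs.

    Fresh iterators are created on every call, so each job_list starts
    from the first machine of every job.
    """
    job_machine = list(map(iter, JOBS))
    return [next(job_machine[job - 1]) for job in job_list]

for job_list in [[1, 1, 2], [2, 1, 2]]:
    print(machines_for(job_list))
# [3, 1, 2]
# [2, 3, 3]
```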
Screenshot of a web page
Are you aware of any python module that automatically gives you a screenshot of a web page? -- http://mail.python.org/mailman/listinfo/python-list
Re: KeyboardInterrupt
Il Thu, 10 Dec 2009 04:56:33 +, Brad Harms ha scritto: > On Thu, 10 Dec 2009 00:29:45 +0000, mattia wrote: > >> Il Wed, 09 Dec 2009 16:19:24 -0800, Jon Clements ha scritto: >> >>> On Dec 9, 11:53 pm, mattia wrote: >>>> Hi all, can you provide me a simple code snippet to interrupt the >>>> execution of my program catching the KeyboardInterrupt signal? >>>> >>>> Thanks, >>>> Mattia >>> >>> Errr, normally you can just catch the KeyboardInterrupt exception -- >>> is that what you mean? >>> >>> Jon. >> >> Ouch, so the simplest solution is just insert in the 'main' function a >> try/catch? I believed there was the necessity to create a signal and >> than attach the KeyboardInterrupt to it... > > > KeyboardInterrupt is just an exception that gets raised when CTLR+C (or > the OS's equivalent keyboard combo) gets pressed. It can occur at any > point in a script since you never know when the user will press it, > which is why you put the try: except KeyboardInterrupt: around as much > of your script as possible. The signal that the OS sends to the Python > interpreter is irrelevant. Ok, so can you tell me why this simple script doesn't work (i.e. I'm not able to catch the keyboard interrupt)?
import time
import sys
from threading import Thread

def do_work():
    for _ in range(1000):
        try:
            time.sleep(1)
            print(".", end="")
            sys.stdout.flush()
        except KeyboardInterrupt:
            sys.exit()

def go():
    threads = [Thread(target=do_work, args=()) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

go()
-- http://mail.python.org/mailman/listinfo/python-list
Re: KeyboardInterrupt
Il Thu, 10 Dec 2009 23:10:02 +, Matthew Barnett ha scritto: > mattia wrote: >> Il Thu, 10 Dec 2009 04:56:33 +, Brad Harms ha scritto: >> >>> On Thu, 10 Dec 2009 00:29:45 +, mattia wrote: >>> >>>> Il Wed, 09 Dec 2009 16:19:24 -0800, Jon Clements ha scritto: >>>> >>>>> On Dec 9, 11:53 pm, mattia wrote: >>>>>> Hi all, can you provide me a simple code snippet to interrupt the >>>>>> execution of my program catching the KeyboardInterrupt signal? >>>>>> >>>>>> Thanks, >>>>>> Mattia >>>>> Errr, normally you can just catch the KeyboardInterrupt exception -- >>>>> is that what you mean? >>>>> >>>>> Jon. >>>> Ouch, so the simplest solution is just insert in the 'main' function >>>> a try/catch? I believed there was the necessity to create a signal >>>> and than attach the KeyboardInterrupt to it... >>> >>> KeyboardInterrupt is just an exception that gets raised when CTLR+C >>> (or the OS's equivalent keyboard combo) gets pressed. It can occur at >>> any point in a script since you never know when the user will press >>> it, which is why you put the try: except KeyboardInterrupt: around as >>> much of your script as possible. The signal that the OS sends to the >>> Python interpreter is irrelevant. >> >> Ok, so can you tell me why this simple script doesn't work (i.e. I'm >> not able to catch the keyboard interrupt)? >> >> import time >> import sys >> from threading import Thread >> >> def do_work(): >> for _ in range(1000): >> try: >> time.sleep(1) >> print(".", end="") >> sys.stdout.flush() >> except KeyboardInterrupt: >> sys.exit() >> >> def go(): >> threads = [Thread(target=do_work, args=()) for _ in range(2)] >> for t in threads: >> t.start() >> for t in threads: >> t.join() >> >> go() > > Only the main thread can receive the keyboard interrupt. Ok, so is there any way to stop all the threads if the keyboard interrupt is received? -- http://mail.python.org/mailman/listinfo/python-list
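Because only the main thread receives `KeyboardInterrupt`, one common pattern is to let the main thread catch it and signal the workers through a shared `threading.Event`. In the workers, `Event.wait(timeout)` stands in for `time.sleep()` and returns early as soon as the event is set. This is a sketch rather than the only approach: marking the threads `daemon=True` is another option, but the workers are then killed abruptly when the program exits. The `stop` parameter of `go()` is an addition that lets the shutdown be triggered without a keyboard.

```python
import threading

def do_work(stop, n_steps=1000):
    """Worker: one unit of work per step, until *stop* is set."""
    for _ in range(n_steps):
        if stop.is_set():
            return
        print(".", end="", flush=True)
        stop.wait(1)  # like time.sleep(1), but wakes up as soon as stop is set

def go(n_threads=2, stop=None):
    if stop is None:
        stop = threading.Event()
    threads = [threading.Thread(target=do_work, args=(stop,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    try:
        # Joining with a short timeout keeps the main thread looping, so it
        # gets a chance to handle Ctrl+C between waits.
        while any(t.is_alive() for t in threads):
            for t in threads:
                t.join(timeout=0.1)
    except KeyboardInterrupt:
        print("\nInterrupted, stopping workers...")
        stop.set()
        for t in threads:
            t.join()
    return threads

# go()  # run from a terminal and press Ctrl+C to stop all workers
```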
Re: KeyboardInterrupt
Il Wed, 09 Dec 2009 16:19:24 -0800, Jon Clements ha scritto: > On Dec 9, 11:53 pm, mattia wrote: >> Hi all, can you provide me a simple code snippet to interrupt the >> execution of my program catching the KeyboardInterrupt signal? >> >> Thanks, >> Mattia > > Errr, normally you can just catch the KeyboardInterrupt exception -- is > that what you mean? > > Jon. Ouch, so the simplest solution is just to put a try/except in the 'main' function? I thought it was necessary to create a signal handler and then attach the KeyboardInterrupt to it... -- http://mail.python.org/mailman/listinfo/python-list
insert unique data in a list
How can I insert non-duplicate data in a list? I mean, is there a particular option in the creation of a list that permit me not to use something like: def append_unique(l, val): if val not in l: l.append(val) Thanks, Mattia -- http://mail.python.org/mailman/listinfo/python-list
Re: insert unique data in a list
Il Sun, 13 Dec 2009 16:37:20 +, mattia ha scritto: > How can I insert non-duplicate data in a list? I mean, is there a > particular option in the creation of a list that permit me not to use > something like: > def append_unique(l, val): > if val not in l: > l.append(val) > > Thanks, > Mattia Ok, so you all suggest using a set. Now the second question, which is more interesting: why can't I insert a list into a set? I mean, I have a function that returns a list. I call this function several times, and the list returned may be the same as one already returned. I usually put all these lists into another list. How can I ensure that this outer list contains only unique lists? Using a set doesn't work (i.e. the Python interpreter tells me: TypeError: unhashable type: 'list')... -- http://mail.python.org/mailman/listinfo/python-list
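Lists are mutable and therefore unhashable. A tuple with the same items is hashable, as long as those items are hashable too, so it can stand in for the list when checking for duplicates. A minimal sketch that keeps the first-seen order (`unique_lists` is a hypothetical name):

```python
def unique_lists(lists):
    """Return the distinct lists from *lists*, keeping first-seen order."""
    seen = set()
    result = []
    for lst in lists:
        key = tuple(lst)          # hashable stand-in for the list
        if key not in seen:
            seen.add(key)
            result.append(lst)
    return result

print(unique_lists([[1, 2], [3], [1, 2], [3, 1]]))  # [[1, 2], [3], [3, 1]]
```

If order does not matter, `{tuple(x) for x in lists}` gives the distinct contents directly. If the order of items *inside* each list should not matter either, `frozenset(x)` can be used as the key instead of `tuple(x)`.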
Re: insert unique data in a list
On Sun, 13 Dec 2009 21:17:28 -0800, knifenomad wrote:

> On Dec 14, 12:42 PM, Steven D'Aprano
> wrote:
>> On Sun, 13 Dec 2009 17:19:17 -0800, knifenomad wrote:
>> > this makes the set type hashable.
>>
>> > class Set(set):
>> >     __hash__ = lambda self: id(self)
>>
>> That's a *seriously* broken hash function.
>>
>> >>> key = "voila"
>> >>> d = { Set(key): 1 }
>> >>> d
>> {Set(['i', 'a', 'l', 'o', 'v']): 1}
>> >>> d[ Set(key) ]
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> KeyError: Set(['i', 'a', 'l', 'o', 'v'])
>>
>> --
>> Steven
>
> of course it is broken as long as it uses its instance id. i added this
> to notify that unhashable can become hashable implementing __hash__
> inside the class. which probably set to None by default.

Ok, nice example, but I believe that using id() as the hash function can lead to unexpected collisions.
print format
Hi all, I want to print just the first 5 characters of a string. Why doesn't this work (py3.1)?

>>> print("{0:5}".format("123456789"))
123456789

I know I could use print("123456789"[:5]), yeah it's a stupid example, but isn't format meant for string formatting?

Thanks,
Mattia
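In the format specification mini-language, the number after the colon is a *minimum* field width, so a longer string is never cut. For strings, the *precision* (the part after a dot) acts as a maximum width instead.

This short sketch shows the difference.

```python
s = "123456789"

# "5" is a minimum width: longer strings are printed in full.
print("{0:5}".format(s))     # 123456789
# ".5" is a precision: for strings it truncates to at most 5 characters.
print("{0:.5}".format(s))    # 12345
# Both together: at most 3 characters, left-aligned in a field of width 5.
print("{0:5.3}|".format(s))  # prints "123  |"
```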
Re: insert unique data in a list
On Mon, 14 Dec 2009 21:53:38, Steven D'Aprano wrote:

> On Mon, 14 Dec 2009 17:13:24, mattia wrote:
>
>> On Sun, 13 Dec 2009 21:17:28 -0800, knifenomad wrote:
>>
>>> On Dec 14, 12:42 PM, Steven D'Aprano
>>> wrote:
>>>> On Sun, 13 Dec 2009 17:19:17 -0800, knifenomad wrote:
>>>> > this makes the set type hashable.
>>>>
>>>> > class Set(set):
>>>> >     __hash__ = lambda self: id(self)
>>>>
>>>> That's a *seriously* broken hash function.
>>>>
>>>> >>> key = "voila"
>>>> >>> d = { Set(key): 1 }
>>>> >>> d
>>>> {Set(['i', 'a', 'l', 'o', 'v']): 1}
>>>> >>> d[ Set(key) ]
>>>> Traceback (most recent call last):
>>>>   File "<stdin>", line 1, in <module>
>>>> KeyError: Set(['i', 'a', 'l', 'o', 'v'])
>>>>
>>>> --
>>>> Steven
>>>
>>> of course it is broken as long as it uses its instance id. i added
>>> this to notify that unhashable can become hashable implementing
>>> __hash__ inside the class. which probably set to None by default.
>>
>> Ok, nice example, but I believe that using id() as the hash function
>> can lead to unexpected collisions.
>
> No, you have that backwards. Using id() as the hash function means that
> equal keys will hash unequal -- rather than unexpected collisions, it
> leads to unexpected failure-to-collide-when-it-should-collide.
>
> And it isn't a "nice example", it is a terrible example.
>
> Firstly, the example fails to behave correctly. It simply doesn't work
> as advertised.
>
> Secondly, and much worse, it encourages people to do something dangerous
> without thinking about the consequences. If it is so easy to hash
> mutable objects, why don't built-in lists and sets have a __hash__
> method? Do you think that the Python developers merely forgot?
>
> No, there is good reason why mutable items shouldn't be used as keys in
> dicts or in sets, and this example simply papers over the reasons why
> and gives the impression that using mutable objects as keys is easy and
> safe when it is neither.
>
> Using mutable objects as keys or set elements leads to surprising,
> unexpected behaviour and hard-to-find bugs. Consider the following set
> with lists as elements:
>
> L = [1, 2]
> s = Set()  # set that allows mutable elements
> s.add(L)
> s.add([1, 2, 3])
>
> So far so good. But what should happen now?
>
> L.append(3)
>
> The set now has two equal elements, which breaks the set invariant that
> it has no duplicates.
>
> Putting the problem of duplicates aside, there is another problem:
>
> L = [1, 2]
> s = Set([L])
> L.append(3)
>
> There are two possibilities: either the hash function of L changes when
> the object changes, or it doesn't. Suppose it changes. Then the hash of
> L after the append will be different from the hash of L before the
> append, and so membership testing (L in s) will fail.
>
> Okay, suppose we somehow arrange matters so that the hash of the object
> doesn't change as the object mutates. This will solve the problem above:
> we can mutate L as often as we like, and L in s will continue to work
> correctly. But now the hash of an object doesn't just depend on its
> value, but on its history. That means that two objects which are
> identical can hash differently, and we've already seen this is a
> problem.
>
> There is one final approach which could work: we give the object a
> constant hash function, so that all objects of that type hash
> identically. This means that the performance of searches and lookups in
> sets and dicts will fall to that of lists. There is no point in paying
> all the extra overhead of a dict to get behaviour as slow, or slower,
> than a list.
>
> In other words, no matter what you do, using mutable objects as keys or
> set elements leads to serious problems that need to be dealt with. It
> simply isn't true that all you need to do to make mutable objects usable
> in dicts or sets is to add a hash function. That part is trivial.

I agree with you, and in fact I'm inserting tuples in my set.
All the workarounds for somehow using mutable objects are poor attempts to solve a difficult problem like hashing in a quick-and-dirty way. But I think that during the discussion we have slowly forgotten the main topic of my question, which was inserting unique objects into a container. Hashes are good for retrieving items in constant time, and when we deal with collisions we have to provide some solution, like chaining or open addressing. Note also that in hashing, collisions do happen, and the hash function is used to retrieve items, not to insert unique items.
Sort the values of a dict
Hi all, I have a dictionary that uses dates as keys and tuples as values. I need to sort the values of the dict and put everything into a list of (key, value) tuples. The additional problem is that I need to sort the values by their i-th element. I'm not that good at Python (v3.1), but this is my solution:

>>> d = {1:('a', 1, 12), 5:('r', 21, 10), 2:('u', 9, 8)}
>>> t = [x for x in d.values()]
>>> def third(mls):
...     return mls[2]
...
>>> s = sorted(t, key=third)
>>> pres = []
>>> for x in s:
...     for k in d.keys():
...         if d[k] == x:
...             pres.append(k)
...             break
...
>>> res = []
>>> for x in pres:
...     res.append((x, d[x]))
...
>>> res
[(2, ('u', 9, 8)), (5, ('r', 21, 10)), (1, ('a', 1, 12))]
>>>

Can you provide a more pythonic solution (with comments if possible, so I can actually learn something)?

Thanks,
Mattia
Re: Sort the values of a dict
Actually, in order to handle duplicate values I need something like:

>>> import copy
>>> d = {1:('a', 1, 12), 5:('r', 21, 10), 2:('u', 9, 8), 3:('u', 9, 8) }
>>> dc = copy.deepcopy(d)
>>> t = [x for x in d.values()]
>>> def third(mls):
...     return mls[2]
...
>>> s = sorted(t, key=third)
>>> pres = []
>>> for x in s:
...     for k in d.keys():
...         if d[k] == x:
...             pres.append(k)
...             del d[k]  # speedup and use duplicate values
...             break
...
>>> res = []
>>> for x in pres:
...     res.append((x, dc[x]))
...
>>> res
[(2, ('u', 9, 8)), (3, ('u', 9, 8)), (5, ('r', 21, 10)), (1, ('a', 1, 12))]
>>>
Re: Sort the values of a dict
On Fri, 18 Dec 2009 18:00:42 -0500, David Robinow wrote:

> On Fri, Dec 18, 2009 at 5:34 PM, mattia wrote:
>> Hi all, I have a dictionary that uses dates and a tuples ad key, value
>> pairs. I need to sort the values of the dict and insert everything in a
>> tuple. The additional problem is that I need to sort the values looking
>> at the i-th element of the list. I'm not that good at python (v3.1),
>> but this is my solution:
>>
>>>>> d = {1:('a', 1, 12), 5:('r', 21, 10), 2:('u', 9, 8)}
>>>>> t = [x for x in d.values()]
>>>>> def third(mls):
>> ...     return mls[2]
>> ...
>>>>> s = sorted(t, key=third)
>>>>> pres = []
>>>>> for x in s:
>> ...     for k in d.keys():
>> ...         if d[k] == x:
>> ...             pres.append(k)
>> ...             break
>> ...
>>>>> res = []
>>>>> for x in pres:
>> ...     res.append((x, d[x]))
>> ...
>>>>> res
>> [(2, ('u', 9, 8)), (5, ('r', 21, 10)), (1, ('a', 1, 12))]
>>
>> Can you provide me a much pythonic solution (with comments if possible,
>> so I can actually learn something)?
>>
>> Thanks, Mattia
>
> I won't engage in any arguments about pythonicity but it seems simpler
> if you convert to a list of tuples right away.
>
> d = {1:('a', 1, 12), 5:('r',21,10), 2:('u',9,8)}
> l = [(x, d[x]) for x in d.keys()]
> def third(q):
>     return q[1][2]
>
> s = sorted(l, key=third)
> print(s)

Thanks, I'm not yet aware of all the wonderful conversions Python can do. Amazing.
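The same idea can be taken one step further. `d.items()` already yields (key, value) pairs, so the list does not need to be built by hand. A key function then picks element `i` of the value.

This is a minimal sketch, assuming Python 3. The helper name `sort_by_element` is hypothetical. Because `sorted()` is stable, duplicate values need no special handling: they simply keep the dict's iteration order.

```python
d = {1: ('a', 1, 12), 5: ('r', 21, 10), 2: ('u', 9, 8), 3: ('u', 9, 8)}


def sort_by_element(d, i):
    """Return (key, value) pairs sorted by the i-th element of each value."""
    # kv is a (key, value) tuple, so kv[1][i] is element i of the value.
    return sorted(d.items(), key=lambda kv: kv[1][i])


print(sort_by_element(d, 2))
# [(2, ('u', 9, 8)), (3, ('u', 9, 8)), (5, ('r', 21, 10)), (1, ('a', 1, 12))]
```

`operator.itemgetter` could replace the lambda for the single-level case. The lambda is used here because the element is nested inside the value.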
Re: Sort the values of a dict
On Sat, 19 Dec 2009 17:30:27 +1100, Lie Ryan wrote:

> On 12/19/2009 9:34 AM, mattia wrote:
>> Can you provide me a much pythonic solution (with comments if possible,
>> so I can actually learn something)?
>
> If you only need to get i'th element sometimes, sorting the dict is
> fine. Otherwise, you might want to use collections.OrderedDict.

Well, in the Python docs OrderedDict is described as a dict that remembers the order in which keys were first inserted, and I don't need that. The fact is that I use a structure composed of a date and a list of possible solutions found, like

(2009/12/21, (('e', 12, 33), ('r', 4, 11), ('r', 1, 33)))

and every solution is inserted concurrently into a dictionary. I want to sort the solutions found in order to provide, e.g., the first 10 dates found and the best result for every date, based on the i-th element of that date's list.
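For the goal described here (the first n dates plus the best solution for each date), sorting the whole dict is not necessary. Taking `min()` per date and then sorting only the date strings is enough.

This is a minimal sketch. It assumes "best" means the smallest i-th element (for example, a cost), and that dates are zero-padded 'YYYY/MM/DD' strings, which sort correctly as plain text. `best_per_date` and the sample data are hypothetical.

```python
def best_per_date(solutions, i, n=10):
    """Return the first n dates with their best (smallest i-th element) solution.

    `solutions` maps a 'YYYY/MM/DD' date string to a list of tuples.
    """
    # min() with a key picks the tuple whose i-th element is smallest;
    # dates with no solutions are skipped.
    best = {date: min(sols, key=lambda s: s[i])
            for date, sols in solutions.items() if sols}
    return [(date, best[date]) for date in sorted(best)[:n]]


found = {
    '2009/12/21': [('e', 12, 33), ('r', 4, 11), ('r', 1, 33)],
    '2009/12/19': [('d', 2, 3), ('d', 1, 2)],
}
print(best_per_date(found, 2))
# [('2009/12/19', ('d', 1, 2)), ('2009/12/21', ('r', 4, 11))]
```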
py itertools?
Hi all, I need to create the permutations of two strings without repeating values, e.g. 'ab' for me is equal to 'ba'. Here is my solution (mcd and mcm are the Italian names for gcd and lcm), but maybe the Python library provides something better:

>>> def mcd(a, b):
...     if b == 0:
...         return a
...     else:
...         return mcd(b, a % b)
...
>>> def mcm(a, b):
...     return int((a * b) / mcd(a, b))
...
>>> s1 = 'abc'
>>> s2 = 'wt'
>>> m = mcm(len(s1), len(s2))
>>> set(zip(s1*m, s2*m))
{('a', 'w'), ('a', 't'), ('b', 'w'), ('c', 't'), ('b', 't'), ('c', 'w')}

Any help?

Thanks,
Mattia
Re: py itertools?
On Sat, 19 Dec 2009 10:54:58, mattia wrote:

> Hi all, I need to create the permutation of two strings but without
> repeat the values, e.g. 'ab' for me is equal to 'ba'. Here is my
> solution, but maybe the python library provides something better:
>
>>>> def mcd(a, b):
> ...     if b == 0:
> ...         return a
> ...     else:
> ...         return mcd(b, a % b)
> ...
>>>> def mcm(a, b):
> ...     return int((a * b) / mcd(a, b))
> ...
>>>> s1 = 'abc'
>>>> s2 = 'wt'
>>>> m = mcm(len(s1), len(s2))
>>>> set(zip(s1*m, s2*m))
> {('a', 'w'), ('a', 't'), ('b', 'w'), ('c', 't'), ('b', 't'), ('c', 'w')}
>
> Any help?
>
> Thanks, Mattia

Well, this is the code I'm using:

import itertools

def cheapest_travel(l):
    """Given lists of departure and return options, return all
    (departure, return) pairs sorted by total cost, cheapest first."""
    s = set(itertools.product(l[0], l[1]))
    return sorted(s, key=lambda pair: pair[0][2] + pair[1][2])

# example using a dict
d = {
    '2009/12/21': [[('d', 1, 2), ('d', 3, 4), ('d', 2, 3)], [('r', 3, 5), ('r', 3, 8)]],
    '2009/12/19': [[('d', 1, 2), ('d', 2, 3)], [('r', 1, 4), ('r', 6, 4), ('r', 3, 5), ('r', 3, 8)]],
    '2009/12/23': [[('d', 2, 5), ('d', 2, 4)], [('r', 4, 5)]],
    '2009/12/26': [[('d', 2, 5), ('d', 1, 4)], [('r', 3, 6)]],
    '2009/12/28': [[('d', 2, 5)], [('r', 4, 4)]],
}

for k, v in d.items():
    print(k)
    res = cheapest_travel(v)
    for x in res:
        print(x[0], "-->", x[1], "cost", x[0][2] + x[1][2], "EUR")
console command to get the path of a function
Hi all, is there a way in the Python shell to list the path of a library function (in order to look at its source code)?

Thanks,
Mattia
Re: py itertools?
On Sun, 20 Dec 2009 03:49:35 -0800, Chris Rebert wrote:

>> On Dec 19, 12:48 pm, Chris Rebert wrote:
>>> On Sat, Dec 19, 2009 at 2:54 AM, mattia wrote:
>>> > Hi all, I need to create the permutation of two strings but without
>>> > repeat the values, e.g. 'ab' for me is equal to 'ba'. Here is my
>>> > solution, but maybe the python library provides something better:
>>>
>>> >>>> def mcd(a, b):
>>> > ...     if b == 0:
>>> > ...         return a
>>> > ...     else:
>>> > ...         return mcd(b, a % b)
>>> > ...
>>> >>>> def mcm(a, b):
>>> > ...     return int((a * b) / mcd(a, b))
>>> > ...
>>> >>>> s1 = 'abc'
>>> >>>> s2 = 'wt'
>>> >>>> m = mcm(len(s1), len(s2))
>>> >>>> set(zip(s1*m, s2*m))
>>> > {('a', 'w'), ('a', 't'), ('b', 'w'), ('c', 't'), ('b', 't'), ('c',
>>> > 'w')}
>>>
>>> > Any help?
>>>
>>> Surprised you didn't think of the seemingly obvious approach:
>>>
>>> def permute_chars(one, two):
>>>     for left in set(one):
>>>         for right in set(two):
>>>             yield (left, right)
>>>
>>> >>> list(permute_chars('abc', 'wt'))
>>> [('a', 'w'), ('a', 't'), ('b', 'w'), ('b', 't'), ('c', 'w'), ('c',
>>> 't')]
>
> On Sun, Dec 20, 2009 at 3:21 AM, Parker wrote:
>>>>> a = 'qwerty'
>>>>> b = '^%&$#'
>>>>> c = [(x,y) for x in a for y in b]
>>>>> c
>> [('q', '^'), ('q', '%'), ('q', '&'), ('q', '$'), ('q', '#'), ('w',
>> '^'), ('w', '%'), ('w', '&'), ('w', '$'), ('w', '#'), ('e', '^'), ('e',
>> '%'), ('e', '&'), ('e', '$'), ('e', '#'), ('r', '^'), ('r', '%'), ('r',
>> '&'), ('r', '$'), ('r', '#'), ('t', '^'), ('t', '%'), ('t', '&'), ('t',
>> '$'), ('t', '#'), ('y', '^'), ('y', '%'), ('y', '&'), ('y', '$'), ('y',
>> '#')]
>>
>>
>> This one is better and simple.
>
> But fails if either of the input strings has repeated characters.
> (Although writing it as a comprehension is indeed much briefer.)
>
> Whether this matters, who knows, since the OP's spec for the function
> was rather vague...
>
> Cheers,
> Chris

Having non-repeating values matters.
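What this thread builds by hand is the Cartesian product of the two strings. The standard library provides it as `itertools.product` (available since Python 2.6). Deduplicating each input first keeps the pairs unique even when a string has repeated characters.

This is a minimal sketch. The name `unique_pairs` is hypothetical, and the `sorted()` calls are only there to make the output order deterministic.

```python
import itertools


def unique_pairs(one, two):
    """All (left, right) pairs from the two strings, with no repeated pair."""
    # set() drops repeated characters up front; sorted() fixes the order,
    # since sets are unordered.
    return list(itertools.product(sorted(set(one)), sorted(set(two))))


print(unique_pairs('abc', 'wt'))
# [('a', 't'), ('a', 'w'), ('b', 't'), ('b', 'w'), ('c', 't'), ('c', 'w')]
print(unique_pairs('aab', 'w'))
# [('a', 'w'), ('b', 'w')]
```

Unlike the zip/lcm approach, this does not depend on the lengths of the two strings sharing no common factor.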
Re: console command to get the path of a function
On Sun, 20 Dec 2009 13:53:18 +0100, Irmen de Jong wrote:

> On 12/20/2009 1:45 PM, mattia wrote:
>> Hi all, is there a way in the python shell to list the path of a
>> library function (in order to look at the source code?).
>>
>> Thanks, Mattia
>
> something like this?
>
> >>> import inspect
> >>> import os
> >>> inspect.getsourcefile(os.path.split)
> 'C:\\Python26\\lib\\ntpath.py'
> >>> print inspect.getsource(os.path.split)
> def split(p):
>     """Split a pathname.
> ...
> ...
>
>
> --irmen

Perfect, thank you.
Re: console command to get the path of a function
On Sun, 20 Dec 2009 13:53:18 +0100, Irmen de Jong wrote:

> On 12/20/2009 1:45 PM, mattia wrote:
>> Hi all, is there a way in the python shell to list the path of a
>> library function (in order to look at the source code?).
>>
>> Thanks, Mattia
>
> something like this?
>
> >>> import inspect
> >>> import os
> >>> inspect.getsourcefile(os.path.split)
> 'C:\\Python26\\lib\\ntpath.py'
> >>> print inspect.getsource(os.path.split)
> def split(p):
>     """Split a pathname.
> ...
> ...
>
>
> --irmen

Ok, but how can I retrieve information about built-in functions (if any)?

>>> inspect.getsourcefile(itertools.product)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python31\lib\inspect.py", line 439, in getsourcefile
    filename = getfile(object)
  File "C:\Python31\lib\inspect.py", line 406, in getfile
    raise TypeError('arg is a built-in class')
TypeError: arg is a built-in class
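Objects implemented in C, such as `itertools.product` or `len`, have no Python source file. For them `inspect` raises `TypeError`, as the traceback shows. For itertools, the C source lives in `Modules/itertoolsmodule.c` in the CPython source tree.

This is a minimal sketch of a helper that returns the source path, or `None` for C-implemented objects. The name `source_location` is hypothetical.

```python
import inspect


def source_location(obj):
    """Return the Python source file defining obj, or None if there is none."""
    try:
        return inspect.getsourcefile(obj)
    except TypeError:
        # Built-in (C-implemented) functions and classes end up here.
        return None
```

For example, `source_location(json.dumps)` points into the `json` package, while `source_location(itertools.product)` gives `None`.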
dict initialization
Is there a function to initialize a dictionary? Right now I'm using:

d = {x+1: [] for x in range(50)}

Is there a better solution?
Re: dict initialization
On Tue, 22 Dec 2009 23:09:04 +0100, Peter Otten wrote:

> mattia wrote:
>
>> Is there a function to initialize a dictionary? Right now I'm using:
>> d = {x+1:[] for x in range(50)}
>> Is there any better solution?
>
> There is a dictionary variant that you don't have to initialize:
>
> from collections import defaultdict
> d = defaultdict(list)
>
> Peter

Great, thanks. Now accessing a missing key also initializes its value, so I can simply write d[n].append(val) without any check.
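With `defaultdict(list)`, the factory `list()` is called the first time a missing key is accessed. Appending therefore needs no prior initialization or membership test.

This is a minimal sketch with made-up sample pairs.

```python
from collections import defaultdict

d = defaultdict(list)
for n, val in [(1, 'a'), (2, 'b'), (1, 'c')]:
    # d[n] creates an empty list on first access, then append() fills it.
    d[n].append(val)

print(dict(d))  # {1: ['a', 'c'], 2: ['b']}
```

One consequence: merely reading `d[n]` inserts an empty list. A test like `if d[n]: d[n].append(val)` would therefore create the key but never append to it.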
Re: dict initialization
On Tue, 22 Dec 2009 23:09:04 +0100, Peter Otten wrote:

> mattia wrote:
>
>> Is there a function to initialize a dictionary? Right now I'm using:
>> d = {x+1:[] for x in range(50)}
>> Is there any better solution?
>
> There is a dictionary variant that you don't have to initialize:
>
> from collections import defaultdict
> d = defaultdict(list)
>
> Peter

...and it also makes it easy to do something like:

>>> def zero():
...     return 0
...
>>> d = defaultdict(zero)
>>> s = ['one', 'two', 'three', 'four', 'two', 'two', 'one']
>>> for x in s:
...     d[x] += 1
...
>>> d
defaultdict(<function zero at 0x...>, {'four': 1, 'three': 1, 'two': 3, 'one': 2})
>>>
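The hand-written `zero()` factory is not needed, because `int()` already returns 0. For pure counting, `collections.Counter` (available since Python 3.1) is a dict subclass built for exactly this job.

This short sketch shows both, using the word list from the message.

```python
from collections import Counter, defaultdict

words = ['one', 'two', 'three', 'four', 'two', 'two', 'one']

# int() returns 0, so it can serve as the default factory.
counts = defaultdict(int)
for w in words:
    counts[w] += 1

# Counter does the same tally in one call and adds helpers like most_common().
tally = Counter(words)
print(tally.most_common(1))  # [('two', 3)]
```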
Join a thread and get the return value of a function
Hi all, is there a way in Python to get back the return value of the function passed to a thread once the thread has finished? Something like pthread_join() in C?

Thanks,
Mattia
Re: Join a thread and get the return value of a function
On Fri, 25 Dec 2009 00:35:55 +1100, Lie Ryan wrote:

> On 12/25/2009 12:23 AM, mattia wrote:
>> Hi all, is there a way in python to get back the value of the function
>> passed to a thread once the thread is finished? Something like
>> pthread_join() in C?
>>
>> Thanks, Mattia
>
> use a Queue to pass the value out?

Yes, that can be a solution, but are you indirectly telling me that there is no other way?
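Lie Ryan's Queue suggestion can be sketched in a few lines. Each thread puts its result on a thread-safe `queue.Queue`, and the main thread collects the results after joining.

This is a minimal sketch, assuming Python 3. `worker` squaring a number is a hypothetical stand-in for real work, and `run_all` is likewise illustrative.

```python
import queue
import threading


def worker(n, results):
    # Queue is thread-safe, so no extra lock is needed around put().
    results.put((n, n * n))  # stand-in for real work


def run_all(inputs):
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(n, results))
               for n in inputs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # waits like pthread_join(), but join() itself returns None
    out = {}
    while not results.empty():  # safe here: every producer has finished
        n, value = results.get()
        out[n] = value
    return out
```

Since Python 3.2, `concurrent.futures.ThreadPoolExecutor` offers the same thing more directly: `submit()` returns a Future whose `result()` method waits for and returns the function's value.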
Re: Join a thread and get the return value of a function
On Fri, 25 Dec 2009 05:19:46 +1100, Lie Ryan wrote:

> import threading
>
> class MyThread(threading.Thread):
>     def join(self):
>         super(MyThread, self).join()
>         return self.result
>
> class Worker(MyThread):
>     def run(self):
>         total = 0
>         for i in range(random.randrange(1, 10)):
>             total += i
>         self.result = total
>
> import random
> ts = [Worker() for i in range(100)]
> for t in ts:
>     t.start()
>
> for t in ts:
>     print(t.join())

Thank you. And merry Christmas!
() vs []
Is there any particular difference between using () and [] for a simple collection of elements?

Thanks,
Mattia
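The practical difference is mutability. A list `[]` can be changed in place. A tuple `()` cannot, and as a result a tuple is hashable when all its items are. That makes a tuple usable as a dict key or set element, as came up in the "insert unique data in a list" thread.

This is a minimal sketch. The helper `can_be_key` is hypothetical.

```python
def can_be_key(obj):
    """True if obj can be used as a dict key or set element."""
    try:
        {obj: None}
    except TypeError:  # raised for unhashable objects
        return False
    return True


point = (3, 4)   # tuple: fixed once created
items = [3, 4]   # list: can grow, shrink and change

items.append(5)  # fine for a list; tuples have no append() method

print(can_be_key(point))   # True
print(can_be_key(items))   # False
print(can_be_key(([1],)))  # False: a tuple is only hashable if its items are
```

By convention, tuples are often used for small fixed records and lists for sequences of similar items, but that is style rather than a rule.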
print()
Is there a way to print to an unbuffered output (like stdout)? I've seen that something like sys.stdout.write("hello") works, but it also prints the number of characters!
Re: print()
On Fri, 16 Oct 2009 22:40:34 -0700, Dennis Lee Bieber wrote:

> On Fri, 16 Oct 2009 23:39:38 -0400, Dave Angel
> declaimed the following in gmane.comp.python.general:
>
>
>> You're presumably testing this in the interpreter, which prints extra
>> stuff. In particular, it prints the result value of any expressions
>> entered at the interpreter prompt. So if you type
>>
>> sys.stdout.write("hello")
>>
>> then after the write() method is done, the return value of the method
>> (5) will get printed by the interpreter.
>>
> I was about to respond that way myself, but before doing so I wanted
> to produce an example in the interpreter window... But no effect?
>
> C:\Documents and Settings\Dennis Lee Bieber>python
> ActivePython 2.5.2.2 (ActiveState Software Inc.) based on
> Python 2.5.2 (r252:60911, Mar 27 2008, 17:57:18) [MSC v.1310 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import sys
> >>> sys.stdout.write("hello")
> hello>>>
>
>
> PythonWin 2.5.2 (r252:60911, Mar 27 2008, 17:57:18) [MSC v.1310 32 bit
> (Intel)] on win32.
> Portions Copyright 1994-2006 Mark Hammond - see 'Help/About PythonWin'
> for further copyright information.
> >>> import sys
> >>> sys.stdout.write("This is a test")
> This is a test
> >>> print sys.stdout.write("Hello")
> HelloNone
>
> No count shows up... neither PythonWin or Windows command line/ shell

Indeed, I'm using py3. But now everything is fine.
All I wanted was to run this simple script (I've also sent the message 'putchar(8)' to the newsgroup):

import time
import sys

val = ("|", "/", "-", "\\", "|", "/", "-", "\\")
for i in range(100+1):
    print("=", end="")
    # print("| ", end="")
    print(val[i%len(val)], " ", sep="", end="")
    print(i, "%", sep="", end="")
    sys.stdout.flush()
    time.sleep(0.1)
    if i > 9:
        print("\x08"*5, " "*5, "\x08"*5, sep="", end="")
    else:
        print("\x08"*4, " "*4, "\x08"*4, sep="", end="")
print(" 100%\nDownload complete!")
Re: putchar(8)
On Sat, 17 Oct 2009 06:48:10 -0400, Dave Angel wrote:

> Dave Angel wrote:
>>
>> Jason Tackaberry wrote:
>>> On Fri, 2009-10-16 at 12:01 -0700, gervaz wrote:
>>>> Hi all, is there in python the equivalent of the C function int
>>>> putchar(int c)? I need to print putchar(8).
>>>
>>> print '\x08'
>>>
>>> or:
>>>
>>> print chr(8)
>>>
>> If I recall correctly, putchar() takes an int value 0-255 and outputs a
>> single character to stdout. So the equivalent would be:
>>
>> sys.stdout.write(char(c))
>>
>> print does other stuff, which you presumably do not want.
>>
>> DaveA
>>
> Oops. Instead of char(), I meant to type chr().
>
> sys.stdout.write(chr(c))
>
> chr() is a built-in that converts an integer to a single character.
>
> DaveA

Yes, noticed ;-)
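Dave Angel's corrected one-liner can be wrapped into a small `putchar()` analogue. It writes exactly one character with no newline or separator, and flushes so the character appears immediately. In the Python 3 interactive prompt, `write()` returns the number of characters written, and the prompt echoes that value. This explains the "number of characters" seen earlier in the thread.

This is a minimal sketch. The optional `stream` parameter is an assumption added so the function can write somewhere other than stdout (for example, for testing).

```python
import sys


def putchar(c, stream=None):
    """Rough analogue of C's putchar(): write chr(c) and return c."""
    # Look up sys.stdout at call time so redirection is respected.
    stream = sys.stdout if stream is None else stream
    stream.write(chr(c))  # write() adds no newline, unlike print()
    stream.flush()        # make the character appear immediately
    return c
```

For example, `putchar(8)` emits a backspace, which moves the cursor back one column on most terminals.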
Re: print()
On Fri, 16 Oct 2009 21:04:08 +0000, mattia wrote:

> Is there a way to print to an unbuffered output (like stdout)? I've seen
> that something like sys.stdout.write("hello") works but it also prints
> the number of characters!

Another question (still py3). How can I print only the first digit after the decimal point of a division? E.g. print(8/3) --> 2.6666666666666665, but I just want 2.6 (or 2.66).

Thanks,
Mattia
Re: print()
On Sat, 17 Oct 2009 10:38:55 -0400, Dave Angel wrote:

> mattia wrote:
>> On Fri, 16 Oct 2009 22:40:34 -0700, Dennis Lee Bieber wrote:
>>
>>> On Fri, 16 Oct 2009 23:39:38 -0400, Dave Angel
>>> declaimed the following in gmane.comp.python.general:
>>>
>>>> You're presumably testing this in the interpreter, which prints extra
>>>> stuff. In particular, it prints the result value of any expressions
>>>> entered at the interpreter prompt. So if you type
>>>>
>>>> sys.stdout.write("hello")
>>>>
>>>> then after the write() method is done, the return value of the method
>>>> (5) will get printed by the interpreter.
>>>>
>>> I was about to respond that way myself, but before doing so I wanted
>>> to produce an example in the interpreter window... But no effect?
>>>
>>> C:\Documents and Settings\Dennis Lee Bieber>python
>>> ActivePython 2.5.2.2 (ActiveState Software Inc.) based on
>>> Python 2.5.2 (r252:60911, Mar 27 2008, 17:57:18) [MSC v.1310 32 bit (Intel)] on win32
>>> Type "help", "copyright", "credits" or "license" for more information.
>>> >>> import sys
>>> >>> sys.stdout.write("hello")
>>> hello>>>
>>>
>>>
>>> PythonWin 2.5.2 (r252:60911, Mar 27 2008, 17:57:18) [MSC v.1310 32 bit
>>> (Intel)] on win32.
>>> Portions Copyright 1994-2006 Mark Hammond - see 'Help/About PythonWin'
>>> for further copyright information.
>>> >>> import sys
>>> >>> sys.stdout.write("This is a test")
>>> This is a test
>>> >>> print sys.stdout.write("Hello")
>>> HelloNone
>>>
>>> No count shows up... neither PythonWin or Windows command line/ shell
>>
>> Indeed I'm using py3. But now everything is fine.
>> Everything I just wanted to know was just to run this simple script
>> (I've also sent the msg 'putchar(8)' to the newsgroup):
>>
>> import time
>> import sys
>>
>> val = ("|", "/", "-", "\\", "|", "/", "-", "\\")
>> for i in range(100+1):
>>     print("=", end="")
>>     # print("| ", end="")
>>     print(val[i%len(val)], " ", sep="", end="")
>>     print(i, "%", sep="", end="")
>>     sys.stdout.flush()
>>     time.sleep(0.1)
>>     if i > 9:
>>         print("\x08"*5, " "*5, "\x08"*5, sep="", end="")
>>     else:
>>         print("\x08"*4, " "*4, "\x08"*4, sep="", end="")
>> print(" 100%\nDownload complete!")
>>
> Seems to me you're spending too much energy defeating the things that
> print() is automatically doing for you. The whole point of write() is
> that it doesn't do anything but ship your string to the file/device. So
> if you want control, do your own formatting.
>
> Consider:
>
> import time, sys, itertools
>
> val = ("|", "/", "-", "\\", "|", "/", "-", "\\")
> sys.stdout.write(" ")
> pattern = "\x08"*8 + " {0}{1:02d}%"
> for percentage, string in enumerate(itertools.cycle(val)):
>     if percentage>99 : break
>     paddednum = pattern.format(string, percentage)
>     sys.stdout.write(paddednum)
>     sys.stdout.flush()
>     time.sleep(0.1)
> print("\x08\x08\x08\x08 100%\nDownload complete!")
>
>
> Note the use of cycle() which effectively repeats a list indefinitely.
> And enumerate, which makes an index for you automatically when you're
> iterating through a list. And str.format() that builds our string,
> including using 0 padding so the percentages are always two digits.
>
>
> DaveA

It's always good to learn something new, thanks!
Re: print()
On Sat, 17 Oct 2009 10:02:27 -0400, Dave Angel wrote:

> mattia wrote:
>> On Fri, 16 Oct 2009 21:04:08 +0000, mattia wrote:
>>
>>> Is there a way to print to an unbuffered output (like stdout)? I've
>>> seen that something like sys.stdout.write("hello") works but it also
>>> prints the number of characters!
>>>
>> Another question (always py3). How can I print only the first number
>> after the comma of a division?
>> e.g. print(8/3) --> 2.667
>> I just want 2.6 (or 2.66)
>>
>> Thanks, Mattia
>>
> Just as sys.stdout.write() is preferable to print() for your previous
> question, understanding str.format() is important to having good control
> over what your output looks like. It's certainly not the only way, but
> the docs seem to say it's the preferred way in version 3.x. It was
> introduced in 2.6, so there are other approaches you might want if you
> need to work in 2.5 or earlier.
>
> x = 8/3
> dummy0=dummy1=dummy2=42
> s = "The answer is approx. {3:07.2f} after rounding".format(dummy0,
>     dummy1, dummy2, x)
> print(s)
>
>
> will print out the following:
>
> The answer is approx. 0002.67 after rounding
>
> A brief explanation of the format string {3:07.2f} is as follows:
>     3 selects argument 3 of the function, which is x
>     0 means to zero-fill the value after conversion
>     7 means 7 characters total width (this helps determine how many
>       zeroes are inserted)
>     2 means 2 digits after the decimal
>     f means fixed point format
>
> You can generally leave out the parts you don't need, but this gives you
> lots of control over what things should look like. There are lots of
> other parts, but this is most of what you might need for controlled
> printing of floats.
>
> The only difference from what you asked is that this rounds, where you
> seemed (!) to be asking for truncation of the extra columns. If you
> really need to truncate, I'd recommend using str() to get a string, then
> use index() to locate the decimal separator, and then slice it yourself.
>
> DaveA

Yes, reading the docs I've come up with

s = "%(0)03.02f%(1)s done" % {"0": 100.0-100.0*(size/tot), "1": "%"}

but I don't think it is a good idea to use a dict here...
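As Dave notes, `.1f` and `.2f` round rather than truncate. As an arithmetic alternative to slicing the string, the value can be floored at the wanted number of decimals before formatting.

This is a minimal sketch. `truncate` is a hypothetical helper and only handles non-negative numbers. Because binary floating point is inexact, `x * factor` can occasionally land just below a whole number and cut one unit too low. If that matters, `decimal.Decimal` with `ROUND_DOWN` is the safer tool.

```python
import math


def truncate(x, digits):
    """Cut x to `digits` decimal places without rounding (for x >= 0)."""
    factor = 10 ** digits
    return math.floor(x * factor) / factor


x = 8 / 3
print("{0:.1f}".format(x))               # 2.7  (rounded)
print("{0:.1f}".format(truncate(x, 1)))  # 2.6  (truncated)
print("{0:.2f}".format(truncate(x, 2)))  # 2.66 (truncated)
```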
Re: print()
On Sun, 18 Oct 2009 20:04:11 -0200, Gabriel Genellina wrote:

> On Sun, 18 Oct 2009 10:35:34 -0200, mattia wrote:
>
>> On Sat, 17 Oct 2009 10:02:27 -0400, Dave Angel wrote:
>>> mattia wrote:
>>>> On Fri, 16 Oct 2009 21:04:08 +0000, mattia wrote:
>>>>
>>>> Another question (always py3). How can I print only the first number
>>>> after the comma of a division?
>>>> e.g. print(8/3) --> 2.667
>>>> I just want 2.6 (or 2.66)
>>>>
>>> x = 8/3
>>> dummy0=dummy1=dummy2=42
>>> s = "The answer is approx. {3:07.2f} after rounding".format(dummy0,
>>>     dummy1, dummy2, x)
>>> print(s)
>>>
>>> will print out the following:
>>>
>>> The answer is approx. 0002.67 after rounding
>>
>> Yes, reading the doc I've come up with s = "%(0)03.02f%(1)s done" %
>> {"0": 100.0-100.0*(size/tot), "1": "%"} but to it is not a good idea to
>> use a dict here..
>
> No need for a dict, you could use instead:
>
> s = "%03.02f%s done" % (100.0-100.0*(size/tot), "%")
>
> or (%% is the way to embed a single %):
>
> s = "%03.02f%% done" % (100.0-100.0*(size/tot),)
>
> or even:
>
> s = "%03.02f%% done" % (100.0-100.0*(size/tot))
>
> but the new str.format() originally suggested by Dave Angel is better:
>
> s = "{0:03.02f}% done".format(100.0-100.0*(size/tot))
>
> (BTW, why 03.02f? The output will always have at least 4 chars, so 03
> doesn't mean anything... Maybe you want {0:06.2f} (three places before
> the decimal point, two after it, filled with 0's on the left)?)

No need for 03, you're right, thanks.
Working threads progress
Hi all, I have a simple program that uses multiple threads to carry out some work. All the threads share the same queue.Queue() (which is synchronized) in order to get some data and then carry out the work. Suppose I have something like:

q = queue.Queue()
for x in range(100):
    q.put(x)

then I have:

lock = threading.Lock()
d = {}
threads = [Thread(target=do_work, args=(q, lock, d)) for _ in range(5)]

Every thread takes an item from the queue (using .get()) and carries out some expensive computation. Then it updates the dictionary, using the queue item as the key and the result as the value (acquiring the lock before the dict update and releasing it after). Now I would like to monitor progress (e.g. every two seconds), so I create another thread that checks the queue size (using .qsize()). Do you have any suggestions to improve the code?
Re: Working threads progress
On Wed, 28 Oct 2009 20:04:45 -0700, ryles wrote:

> On Oct 28, 7:02 pm, mattia wrote:
>> Now, I would like to know the activity done (e.g. every two seconds) so
>> I create another thread that checks the queue size (using .qsize()).
>> Have you any suggestion to improve the code?
>
> It's not uncommon to pass each thread a second queue for output, which
> in your case might be tuples of (item, result). You can read from this
> queue in a single thread, and since the results now accumulate
> incrementally, you can easily collect more detailed status/progress
> information (if that's what you're looking for).

OK, but what do you mean by 'read from this queue'? I mean, I can have a variable initialized when the work queue is full (i.e. ready to be used) and then decrement it every time I call queue.get() (using a lock to prevent concurrent access). Is that what you had in mind?
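One reading of ryles' suggestion works without a shared counter or a lock. Workers put (item, result) tuples on a second queue. A single consumer (here the main thread) is the only code that touches the results dict, so the number of collected results is itself the progress.

This is a minimal sketch, assuming Python 3. `do_work` doubling a number stands in for the expensive computation, and `run_with_progress` is hypothetical.

```python
import queue
import threading


def do_work(tasks, results):
    # Take items until the task queue is empty, reporting (item, result).
    while True:
        try:
            item = tasks.get_nowait()
        except queue.Empty:
            return
        results.put((item, item * 2))  # stand-in for the expensive work


def run_with_progress(n_items=100, n_threads=5):
    tasks, results = queue.Queue(), queue.Queue()
    for x in range(n_items):
        tasks.put(x)
    threads = [threading.Thread(target=do_work, args=(tasks, results))
               for _ in range(n_threads)]
    for t in threads:
        t.start()

    # Only this thread reads the output queue and writes the dict,
    # so neither needs a lock.
    d = {}
    while len(d) < n_items:
        item, result = results.get()  # blocks until a result arrives
        d[item] = result
        if len(d) % 25 == 0:
            print("done {0}/{1}".format(len(d), n_items))
    for t in threads:
        t.join()
    return d
```

For a report every two seconds instead of every 25 items, `results.get(timeout=2)` could be used, printing the progress whenever it raises `queue.Empty`.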
Web page special characters encoding
Hi all, I'm using py3k and the urllib package to download web pages. Can you suggest a package that can translate reserved characters in HTML like "&egrave;", "&ograve;", "&eacute;" into the corresponding correct encoding?

Thanks,
Mattia
Re: Web page special characters encoding
On Sat, 10 Jul 2010 18:09:12 +0100, MRAB wrote:

> mattia wrote:
>> Hi all, I'm using py3k and the urllib package to download web pages.
>> Can you suggest me a package that can translate reserved characters in
>> html like "&egrave;", "&ograve;", "&eacute;" in the corresponding
>> correct encoding?
>>
> import re
> from html.entities import entitydefs
>
> # The downloaded web page will be bytes, so decode it to a string.
> webpage = downloaded_page.decode("iso-8859-1")
>
> # Then decode the HTML entities.
> webpage = re.sub(r"&(\w+);", lambda m: entitydefs[m.group(1)], webpage)

Thanks, very useful. I didn't know about the entitydefs dictionary.
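In later Python 3 versions (3.4 and up), the standard library does this in a single call. Unlike the regex-plus-`entitydefs` approach, `html.unescape` also decodes numeric character references. Unknown names do not raise `KeyError`, although an unrecognized name may be left in the text as-is.

This short sketch uses a made-up sample string. The input must already be a decoded `str`, not the raw bytes from urlopen().

```python
import html

text = "Perch&eacute; &egrave; cos&igrave; &#232; &#xe8; &amp;"
# Named (&egrave;), decimal (&#232;) and hex (&#xe8;) references all decode.
print(html.unescape(text))  # Perché è così è è &
```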
Re: Web page special characters encoding
On Sat, 10 Jul 2010 16:24:23 +, mattia wrote:
> Hi all, I'm using py3k and the urllib package to download web pages. Can
> you suggest me a package that can translate reserved characters in html
> like "&egrave;", "&ograve;", "&eacute;" in the corresponding correct
> encoding?
>
> Thanks,
> Mattia

Basically I'm trying to fetch an HTML page and strip out all the tags to obtain just the plain text. John Nagle and Christian Heimes somehow figured out what I'm trying to do ;-) Here's what I've done so far, thanks to your suggestions:

    import lxml.html
    import lxml.html.clean
    import urllib.request
    from html.entities import entitydefs
    import re
    import sys

    HEADERS = {"User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; "
                             "rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3"}

    def replace(m):
        # Known entity names become their character; unknown ones are kept as-is.
        if m.group(1) in entitydefs:
            return entitydefs[m.group(1)]
        else:
            return m.group(0)

    def test(page):
        req = urllib.request.Request(page, None, HEADERS)
        page = urllib.request.urlopen(req)
        charset = page.info().get_content_charset()
        if charset is not None:
            html = page.read().decode(charset)
        else:
            html = page.read().decode("iso-8859-1")
        # lxml decodes entities itself while parsing; decoding them here first
        # can turn escaped text such as "&lt;b&gt;" into real markup.
        html = re.sub(r"&(\w+);", replace, html)
        cleaner = lxml.html.clean.Cleaner(safe_attrs_only=True, style=True)
        html = cleaner.clean_html(html)
        # create the element tree
        tree = lxml.html.document_fromstring(html)
        txt = tree.text_content()
        for x in txt.split():
            # DOS shell is not able to print characters like u'\u20ac' - why???
            try:
                print(x)
            except UnicodeEncodeError:
                continue

    if __name__ == "__main__":
        if len(sys.argv) < 2:
            print("Usage:", sys.argv[0], "")
            print("Example:", sys.argv[0], "http://www.bing.com")
            sys.exit()
        test(sys.argv[1])

Any new tips will be appreciated.
Ciao,
Mattia
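For comparison, a minimal sketch of the same tag stripping using only the standard library's html.parser instead of lxml; TextExtractor and html_to_text() are hypothetical names, and this does not reproduce everything lxml's Cleaner does. With convert_charrefs=True (the default since Python 3.5) the parser decodes named and numeric character references itself, so no separate regex pass is needed, and the contents of script/style elements are skipped.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, skipping script/style."""

    def __init__(self):
        # convert_charrefs=True: &egrave;, &#232;, &amp; ... are already
        # decoded when handle_data() receives the text.
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    # Join text pieces with spaces, then collapse runs of whitespace.
    return " ".join(" ".join(parser.chunks).split())
```

For example, `html_to_text(page_bytes.decode(charset))` would replace the re.sub/Cleaner/text_content steps, where `page_bytes` and `charset` come from the urlopen() response as in the code above.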
lxml question
I would like to click on an image in a web page that I retrieve using urllib, in order to trigger an event. Here is the piece of code with the image that I want to click: I don't know how to do it (I'm trying with lxml, but any suggestion would help). Thanks, Mattia
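Without the page's markup, this is only a sketch under an assumption: that the image is an image submit button (an `<input type="image" name="...">`) inside a form using method="post". In that case a browser "click" just submits the form's other fields plus the click coordinates as `name.x` and `name.y`, which urllib can reproduce directly. The helper image_submit_request(), the form URL, field names and coordinates are all hypothetical; if the image instead triggers JavaScript, urllib cannot run it.

```python
import urllib.parse
import urllib.request


def image_submit_request(action_url, fields, image_name, x=1, y=1):
    # Clicking <input type="image" name="..."> sends the form's fields plus
    # the click coordinates as "<name>.x" and "<name>.y".
    data = dict(fields)
    data[image_name + ".x"] = str(x)
    data[image_name + ".y"] = str(y)
    body = urllib.parse.urlencode(data).encode("ascii")
    # Supplying data= makes urllib send a POST request.
    return urllib.request.Request(action_url, data=body)
```

The resulting request would then be sent with `urllib.request.urlopen(req)` (or a cookie-aware opener) to the form's action URL.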
Re: Download and save a picture - urllib
You were right: the problem was the print function. Using a normal write, everything works fine.

On Thu, 10 Sep 2009 18:56:07 +0200, Diez B. Roggisch wrote:
> mattia wrote:
>
>> Hi all, in order to download an image. In order to correctly retrieve
>> the image I need to set the referer and handle cookies.
>>
>> opener = urllib.request.build_opener(urllib.request.HTTPRedirectHandler(),
>>     urllib.request.HTTPCookieProcessor())
>> urllib.request.install_opener(opener)
>> req = urllib.request.Request("http://myurl/image.jpg")
>> req.add_header("Referer", "http://myulr/referer.jsp")
>> r = urllib.request.urlopen(req)
>> with open("image.jpg", "w" ) as fd:
>>     print(r.read(), file=fd)
>>
>> I'm not able to correctly save the image. In fact it seems that it it
>> saved in hex format. Any suggestion?
>
> How do you come to the conclusion that it's saved as "hex"? It sure
> isn't - either the request fails because the website doesn't allow it
> due to missing cookies or similar stuff - or you get the binary data.
>
> But you should be aware that in the interpreter, strings are printed out
> with repr() - which will convert non-printable characters to their
> hex-representation to prevent encoding/binary-data-on-teriminal-issues.
>
> Diez
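Putting Diez's explanation and the fix together: print() converts its argument with str(), and str() of a bytes object is its repr (something like `b'\xff\xd8...'`), which is why the file looked like "hex". A minimal sketch of a working version, assuming Python 3: open the file in binary mode ("wb") and copy the raw bytes. The download() helper is hypothetical; HTTPRedirectHandler is omitted because build_opener() already installs it by default.

```python
import http.cookiejar
import shutil
import urllib.request


def download(url, referer, filename):
    # Cookie-aware opener; redirects are followed by the default handlers.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
    req = urllib.request.Request(url, headers={"Referer": referer})
    with opener.open(req) as resp, open(filename, "wb") as fd:
        # Binary mode and raw bytes: no str()/repr() conversion happens.
        shutil.copyfileobj(resp, fd)
```

Calling `download("http://myurl/image.jpg", "http://myulr/referer.jsp", "image.jpg")` would mirror the original request; shutil.copyfileobj() copies in chunks, so large images are not read into memory at once.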
Download and save a picture - urllib
Hi all, I'm trying to download an image. In order to retrieve it correctly I need to set the referer and handle cookies.

    import urllib.request

    opener = urllib.request.build_opener(
        urllib.request.HTTPRedirectHandler(),
        urllib.request.HTTPCookieProcessor())
    urllib.request.install_opener(opener)
    req = urllib.request.Request("http://myurl/image.jpg")
    req.add_header("Referer", "http://myulr/referer.jsp")
    r = urllib.request.urlopen(req)
    with open("image.jpg", "w") as fd:
        print(r.read(), file=fd)

I'm not able to save the image correctly. In fact, it seems that it is saved in hex format. Any suggestion?
c/c++ and python
Hi all! I have a little problem. I want to develop an application in C/C++ that creates a window with GTK+ according to the information in an XML file. The functions that are called to handle the events should be written in Python. I don't know how to do it; can you help me? Is it possible? Thanks a lot!
Re: c/c++ and python
Thanks a lot, a very clear and useful answer! Yes, I know PyGTK and wxPython, but I want to develop a plugin for another application that requires C++. I guess that embedding is the appropriate way. Thanks.
security
Hi all. I'm interested in writing a browser plugin that can execute Python code. I know the main problem is security; many threads about this have been opened in the newsgroup. I would like to know whether forking Python and rewriting some libraries could avoid the problems. For example, one problem is the ability to access files: what if I rewrite the open() function so that it raises an exception when the program tries to access a file outside a defined directory? I'm certainly not a security expert, so please be patient if my question is stupid. Thanks to all.
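A minimal sketch of the path check described above, using a hypothetical restricted_open() helper: the requested path is resolved with os.path.realpath() (which collapses ".." and follows symlinks) and must stay inside the allowed directory. On its own this is not a sandbox: untrusted code could still reach the real file functions through other routes such as io.open() or os.open(), which is part of why replacing open() alone is generally not considered sufficient.

```python
import os


def restricted_open(base_dir, path, mode="r"):
    """Open `path` only if it resolves to a location inside `base_dir`."""
    base = os.path.realpath(base_dir)
    # realpath() resolves ".." and symlinks, so "a/../../etc/passwd" is caught;
    # an absolute `path` replaces `base` in join() and fails the check below.
    target = os.path.realpath(os.path.join(base, path))
    if os.path.commonpath([base, target]) != base:
        raise PermissionError("access outside %s denied: %s" % (base, path))
    return open(target, mode)
```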
Re: Python on Vista installation issues
will wrote:
> Vista is a 64 bit OS and there is no port of pywin32 for either Vista
> or 64-bit XP

Vista exists in BOTH 32-bit and 64-bit versions.
--
|\/|55: Mattia Gentilini and 55 turns in a row on skis
|/_| ETICS project at CNAF, INFN, Bologna, Italy
|\/| www.getfirefox.com www.getthunderbird.com
* Using Mac OS X 10.4.9 powered by Cerebros (Core 2 Duo) *
Re: just a bug
Richard Brodie wrote:
> For HTML, yes. it accepts all sorts of garbage, like most
> browsers; I've never, before now, seen it accept an invalid
> XML document though.

It *could* depend on the Content-Type. I've seen that Firefox treats XHTML as HTML (i.e. it doesn't try to validate it) if you set Content-Type to text/html. However, the same document served as application/xhtml+xml is checked for well-formedness (if the DOM Inspector is installed). So Firefox probably treats that badly encoded document as text/html (maybe as a failsafe setting), which could explain why it accepts it.
--
Mattia Gentilini
Re: How to do this in python with regular expressions
Thorsten Kampe wrote:
>> I'm trying to parsing html with re module.
> Just don't. Use an HTML parser like BeautifulSoup

Or HTMLParser/htmllib.
--
Mattia Gentilini
Re: How to do this in python with regular expressions
Thorsten Kampe wrote:
>> I'm trying to parsing html with re module.
> Just don't. Use an HTML parser like BeautifulSoup

Or HTMLParser/htmllib. Of course you can mix those with re; it'll be easier than using re alone.
--
Mattia Gentilini
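A small sketch of mixing the two, assuming Python 3, where HTMLParser lives in html.parser (htmllib was Python 2 only): the parser deals with tag structure, attribute quoting and case, and re only filters the extracted href values. LinkCollector and pdf_links() are hypothetical names.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; quoting is already handled.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def pdf_links(markup):
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    # A regex is fine for matching inside a single, already-extracted value.
    return [h for h in collector.links if re.search(r"\.pdf$", h, re.IGNORECASE)]
```

For instance, uppercase tags, unquoted or single-quoted attributes all go through the parser unchanged, which is exactly where a hand-written regex over raw HTML tends to break.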
Re: 32 OS on 64-bit machine
SamG wrote:
> If anyone has a x86_64 machine and is running a 32bit OS on top of
> that could you tell me what output would you get for the following
> program

I have an Athlon64 X2 with Debian unstable i386:

[EMAIL PROTECTED] pts/0 ~]$ python
Python 2.4.4 (#2, Apr 26 2007, 00:02:45)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform
>>> print platform.processor()
>>> print platform.architecture()
('32bit', '')
>>>

I also have a MacBook with a Core 2 Duo and Mac OS X 10.4.9:

[EMAIL PROTECTED] ttyp4 ~/Desktop/ETICS]$ python
Python 2.3.5 (#1, Aug 19 2006, 21:31:42)
[GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform
>>> print platform.processor()
i386
>>> print platform.architecture()
('32bit', '')
>>>
--
Mattia Gentilini
Collaborator for ETICS project - http://eu-etics.org/
INFN - CNAF - R&D Division - Bologna, Italy
* Using Mac OS X 10.4.9 powered by Cerebros (Core 2 Duo) *
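For a Python 3 version of the same test, a small sketch with a hypothetical describe_platform() helper. platform.architecture() reports what the interpreter binary was built for, not what the CPU supports, which is why both 64-bit-capable machines above report 32bit. platform.machine() gives the machine type reported by the OS, and the platform documentation recommends checking sys.maxsize > 2**32 to tell whether the running interpreter is 64-bit.

```python
import platform
import struct
import sys


def describe_platform():
    return {
        # Pointer size of the running interpreter, not of the CPU.
        "interpreter_bits": struct.calcsize("P") * 8,
        # The check the platform docs recommend for interpreter bitness.
        "is_64bit_interpreter": sys.maxsize > 2**32,
        # What the interpreter binary was built for, e.g. ('32bit', 'ELF').
        "architecture": platform.architecture(),
        # Machine type reported by the OS, e.g. 'x86_64' or 'i686'.
        "machine": platform.machine(),
        # May be an empty string on Linux, as in the Debian session above.
        "processor": platform.processor(),
    }


if __name__ == "__main__":
    for key, value in describe_platform().items():
        print(key, "=", value)
```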