multiprocessing
Hi guys,

I want to try out some pooling of processors, but I'm not sure if it is possible to do what I want to do. Basically, I want to have a global object that is updated during the execution of a function, and I want to be able to run this function several times on parallel processors. The order in which the function runs doesn't matter, and the value of the object doesn't matter to the function, but I do want the processors to take turns 'nicely' when updating the object, so that there are no collisions.

Here is an extremely simplified and trivial example of what I have in mind:

    from multiprocessing import Pool
    import random

    p=Pool(4)
    myDict={}

    def update(value):
        global myDict
        index=random.random()
        myDict[index]+=value

    total=1000
    p.map(update,range(total))

Afterwards, I would also like to be able to use several processors to access the global object (but not modify it). Again, order doesn't matter:

    p1=Pool(4)

    def getValues(index):
        global myDict
        print myDict[index]

    p1.map(getValues,myDict.keys())

Is there a way to do this?

Thanks,
Elsa.
-- 
http://mail.python.org/mailman/listinfo/python-list
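One way to sidestep collisions entirely is to keep the workers from touching shared state: each call returns a (key, value) pair and the parent process alone does the updating. The following is only a sketch under stated assumptions: it targets Python 3, it swaps random.random() for a small hypothetical key space (random.randrange(10)) so that several values land on the same key, and the names update_pair, merge_pairs and parallel_totals are illustrative, not from the post above.

```python
import random
from collections import defaultdict
from multiprocessing import Pool


def update_pair(value):
    """Worker: pick a key and return (key, value); no shared state is touched."""
    index = random.randrange(10)  # assumption: a small, hypothetical key space
    return index, value


def merge_pairs(pairs):
    """Parent side: fold the (key, value) pairs into one dictionary of sums."""
    totals = defaultdict(int)
    for index, value in pairs:
        totals[index] += value
    return dict(totals)


def parallel_totals(values, processes=4):
    """Run update_pair over values in a pool, then merge in the parent."""
    with Pool(processes) as pool:
        pairs = pool.map(update_pair, values)
    return merge_pairs(pairs)
```

parallel_totals(range(1000)) would normally be called from under an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module. Because only the parent writes to the dictionary, no locking is needed; the cost is that every result is sent back to the parent.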
For loop searching takes too long!
Hi guys,

I've got a problem with my program: the code just takes too long to run. Here's what I'm doing. If anyone has any tips, they'd be much appreciated!

So, say I have a list of lists that looks something like this (I'm using a list of lists, rather than a list of tuples, as I need it to be mutable):

    myList = [[786,0],[45, 1],[673,1],...[23,46]]

There are enough entries in the outer list that the sum of myList[i][0] across all i could be as high as 10^7.

Now, what I need to do is randomly choose one myList[i]; however, the distribution of my random choice needs to be proportional to the values of myList[i][0]. So, for the list above, I'd have a much higher chance of choosing myList[0] than myList[1].

Here is how I'm doing it at the moment:

    def chooseI(myList):
        mySum=0
        choice = random.choice(range(1,sum([i[0] for i in myList])+1))
        for i in range(len(myList)):
            mySum+=myList[i][0]
            if mySum>=choice:
                return i
                break

This works just fine if sum([i[0] for i in myList]) is less than 10,000, say. However, if it's up around 10^7, the whole thing crashes. Is there a faster way of doing this that doesn't involve as many computational steps?

Thanks!
elsa
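The expensive step above is building a range of up to 10^7 numbers just to pick one of them. A possible alternative, sketched here assuming Python 3, draws the number directly with random.randint and then finds the matching entry by binary search over the running totals. The name choose_index and the rng parameter are illustrative choices, not part of the original program.

```python
import bisect
import itertools
import random


def choose_index(my_list, rng=random):
    """Return an index i with probability proportional to my_list[i][0]."""
    # Running totals of the weights, e.g. [786, 45, 673] -> [786, 831, 1504].
    cumulative = list(itertools.accumulate(item[0] for item in my_list))
    if not cumulative or cumulative[-1] <= 0:
        raise ValueError("need at least one entry with a positive weight")
    # randint includes both endpoints, so no list of candidates is built.
    pick = rng.randint(1, cumulative[-1])
    # First position whose running total is >= pick, as in the original loop.
    return bisect.bisect_left(cumulative, pick)
```

If many choices are made from an unchanged list, the running totals could be computed once and reused, leaving only the binary search per draw.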
Still too slow
Hello again,

Thanks for the tips re random.randint(). This improved matters somewhat; however, my program is still too slow. If anyone has any further tips on how to speed it up, they would be much appreciated!

So, I'm calling evolve(L,limit) from the interactive prompt. L is initially [[100],['NA']]. Ideally limit would be 10^7.

Here is my program:

    import random
    n=100

    def evolve(L,limit):
        global n
        while n
Re: Still too slow
Hi John and others,

Sorry about my etiquette errors. As you can tell, I'm a newbie, and I appreciate all the help I can get. I'm trying to master this thing with only the net and a couple of books as tutors.

Here is what I'm running at the interactive prompt:

    >>> import myBDM
    >>> L=[[100,'NA']]
    >>> myBDM.evolve(L,10)

Ideally, I'd like to bump it up to myBDM.evolve(L,1000). L keeps track of the number of subpopulations, how many individuals are in each subpopulation, and the parent subpopulation each subpopulation arose from.

Here is my code again:

    import random
    n=100

    def evolve(L,limit):
        """
        evolves the population until the population size reaches limit, by
        choosing an individual of a particular subpopulation type, then
        randomly performing a birth, death, or mutation on this individual
        """
        global n
        while n
find integers in f.readline()
Hi people,

I'm having a problem getting the info I need out of a file. I've opened the file with f=open('myFile','r'). Next, I take out the first line with line=f.readline(). line looks like this:

    '83927 300023_25_5_09_FL 9086 9134 F3LQ2BE01AQLXF 1 49 + 80 ZA8Z89HIB7M'

I then split it into parts with parts = line.split():

    ['83927', '300023_25_5_09_FL', '9086', '9134', 'F3LQ2BE01AQLXF', '1', '49', '+', '80', 'ZA8Z89HIB7M']

Now, I need to test whether I can call int(parts[0]) or not. Some of the lines in my file start with a value which represents an integer (as above), while others are just strings of characters. I want to extract just the lines like the one above that start with an integer. Any suggestions?

Thanks,
Elsa.
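A common way to test whether int() can be called on a value is simply to call it and catch the ValueError. The sketch below assumes Python 3; the helper name starts_with_int and the second sample line are made up for illustration.

```python
def starts_with_int(line):
    """Return True if the first whitespace-separated field parses as an int."""
    parts = line.split()
    if not parts:          # blank line: nothing to test
        return False
    try:
        int(parts[0])
    except ValueError:     # e.g. int('F3LQ2BE01AQLXF')
        return False
    return True


lines = [
    "83927 300023_25_5_09_FL 9086 9134 F3LQ2BE01AQLXF 1 49 + 80 ZA8Z89HIB7M",
    "F3LQ2BE01AQLXF some other kind of line",  # hypothetical non-integer line
]
wanted = [line for line in lines if starts_with_int(line)]
```

Note that int() also accepts a leading sign, so a line starting with '-12' would count as starting with an integer here.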
parsing tab and newline delimited text
Hi,

I have a large file of text I need to parse. Individual 'entries' are separated by newline characters, while fields within each entry are separated by tab characters.

So, an individual entry might have this form (in printed form):

    Title date position data

with each field separated by tabs, and a newline at the end of data. So, I thought I could simply open a file, read each line in turn, and parse it:

    f=open('MyFile')
    line=f.readline()
    parts=line.split('\t')

etc...

However, 'data' is a fairly random string of characters. Because the files I'm processing are large, there is a good chance that in every file there is a data field that might look like this:

    88dlKKlS\lk3#kdf\nKK99

or like this:

    LLLSDKJJJdkkf334$\ks)))K99

So, you see, the random strings '\n' and '\t' are stopping me from being able to parse my file correctly. Any suggestions on how to overcome this problem would be greatly appreciated.

Many thanks,
Elsa
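Whether this is a problem at all depends on what the file really contains. If the data field holds a backslash followed by the letter n (two ordinary characters) rather than an actual newline character, splitting on real tabs and newlines is unaffected. The sketch below, assuming Python 3 and a made-up entry built from the example field above, shows that case; parse_entry is an illustrative name.

```python
def parse_entry(line):
    """Split one entry into its tab-separated fields, dropping the final newline."""
    return line.rstrip("\n").split("\t")


# Hypothetical entry. In this string literal '\\n' is a backslash followed
# by 'n', while '\t' and the final '\n' are a real tab and a real newline.
raw = "Title\tdate\tposition\t88dlKKlS\\lk3#kdf\\nKK99\n"
fields = parse_entry(raw)

print(len(fields))      # number of fields found
print(repr(fields[3]))  # repr shows the backslashes doubled
```

If the field turns out to contain genuine newline or tab characters, this approach would not be enough and the file format itself would need some form of quoting or escaping.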
Re: parsing tab and newline delimited text
On Aug 4, 12:49 pm, Tim Chase wrote:
> On 08/03/10 21:14, elsa wrote:
> > I have a large file of text I need to parse. Individual 'entries' are
> > separated by newline characters, while fields within each entry are
> > separated by tab characters.
> >
> > So, an individual entry might have this form (in printed form):
> >
> > Title date position data
> >
> > with each field separated by tabs, and a newline at the end of data.
> > So, I thought I could simply open a file, read each line in in turn,
> > and parse it
> >
> > f=open('MyFile')
> > line=f.readline()
> > parts=line.split('\t')
> >
> > etc...
> >
> > However, 'data' is a fairly random string of characters. Because the
> > files I'm processing are large, there is a good chance that in every
> > file, there is a data field that might look like this:
> >
> > 88dlKKlS\lk3#kdf\nKK99
>
> My first question is whether the line contains actual newline/tab
> characters within the field data, or the string-representation of
> the line. For one of the lines in question, what does
>
> print repr(line)

Here is what I get at the interactive prompt:

>>> line = >>> """IG=4448>IG666HII;;;IIE??55 ... :E?IFHGCACI699;66IG11G???G???GGGIIGG?;; 9>CCIIIGHHIIIGEEDBB?9951//6=ABB=EEGII98AEIECCC>>;A=F@;; 44//11::=<>> line 'IG=4448>IG666HII;;;IIE?? 55\n:E?IFHGCACI699;66IG11G???G???GGGIIGG?;; 9>CCIIIGHHIIIGEEDBB?9951//6=ABB=EEGII98AEIECCC>>;A=F@;; 44//11::=<>> print repr(line) 'IG=4448>IG666HII;;;IIE?? 55\n:E?IFHGCACI699;66IG11G???G???GGGIIGG?;; 9>CCIIIGHHIIIGEEDBB?9951//6=ABB=EEGII98AEIECCC>>;A=F@;; 44//11::=<
sgmllib.py
Hi all,

I'm new to both this forum and Python, and I've got a bit stuck trying to learn how to parse HTML. Here is my problem:

I'm using a textbook that uses sgmllib.py for all its examples. I'm aware that sgmllib is not in the current release; however, I want to get it to work, as I have Python 2.5 and the textbook uses it.

So, the first example says to type something like this (to test the sgmllib):

    python sgmllib.py "path/to/my/file.html"        example (1)

This doesn't work for me. I think I have figured out the problem. The error says:

    "/System/Library/Frameworks/Python.framework/Versions/2.5/Resources/Python.app/Contents/MacOS/Python: can't open file 'sgmllib.py': [Errno 2] No such file or directory"

The problem is that this path is wrong. My sgmllib.py is in:

    "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/sgmllib.py"

If I substitute this path for sgmllib.py in example (1), everything works fine. However, I don't want to do all that typing every time I want to use sgmllib.py. So, I thought maybe the problem was with PYTHONPATH. I executed the following command:

    export PYTHONPATH=/System/Library/Frameworks/Python.framework/Versions/2.5/li/python2.5:$PYTHONPATH

This seemed to work: no errors were raised. However, when I retyped example (1), I got the same original error.

Any assistance would be much appreciated. I'm working on Mac OS X Leopard.

Thanks,
elsa
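PYTHONPATH extends the search path used by import (sys.path); it does not change how the interpreter resolves a script filename given on the command line, which may be why the export appeared to have no effect. Running a module by name with the interpreter's -m switch, for example python -m sgmllib "path/to/my/file.html", looks the module up on sys.path instead and should avoid typing the full path. As a small sketch, assuming Python 3 (where sgmllib is no longer included, so the standard string module stands in for it), the following shows where the import system finds a module; module_file is an illustrative name.

```python
import importlib.util


def module_file(name):
    """Return the file the import system would load for a top-level module."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# 'string' is only a stand-in for sgmllib, which Python 3 no longer ships.
print(module_file("string"))
```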
elementtree
I have a question about elementtree:

I know how to turn HTML into an ElementTree object, but I don't know how to then view the structure of this object. Is there a method or module that you can give an ElementTree object to, and it returns some kind of graphical or printed representation of the tree?

Otherwise, if you can't see your tree's structure, how do you know what is a sensible way of iterating over the tree to access the info you need?

Thanks!
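Since iterating over an element yields its children, a short recursive function is enough to print an indented outline of the tree. This is a sketch assuming Python 3 and the standard xml.etree.ElementTree module; the function name outline and the tiny sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return one line per element, indented by its depth in the tree."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical, well-formed sample document.
root = ET.fromstring(
    "<html><head><title>t</title></head>"
    "<body><p>a</p><p>b</p></body></html>"
)
print("\n".join(outline(root)))
```

The same idea extends to showing attributes or text by adding element.attrib or element.text to each line.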
map
Hi,

I have a question about the built-in map function. Here 'tis:

Say I have a list, myList. Now say I have a function with more than one argument:

    myFunc(a, b='None')

Now, say I want to map myFunc onto myList, with always the same argument for b, but iterating over a:

    map(myFunc(b='booHoo'), myList)

Why doesn't this work? Is there a way to make it work?

(Ultimately, I want to call myFunc(myList[0], 'booHoo'), myFunc(myList[1], 'booHoo'), myFunc(myList[2], 'booHoo') etc. However, I might want to call myFunc(myList[0], 'woo'), myFunc(myList[1], 'woo'), myFunc(myList[2], 'woo') some other time).

Thanks,
Elsa
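The expression myFunc(b='booHoo') calls the function straight away (without a value for a), so map never receives a function to apply. map needs a callable that takes one argument; functools.partial can build one with b already filled in, and a list comprehension does the same job without map. The sketch below assumes Python 3, where map returns an iterator, and uses a stand-in body for my_func since the original function is not shown.

```python
from functools import partial


def my_func(a, b='None'):
    """Stand-in for myFunc: just report the arguments it was called with."""
    return (a, b)


my_list = [1, 2, 3]

# partial(...) returns a new callable with b fixed; map supplies a.
boo = list(map(partial(my_func, b='booHoo'), my_list))

# The same call pattern written as a list comprehension.
woo = [my_func(item, 'woo') for item in my_list]
```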
BeautifulSoup
Hi all,

If I have some HTML that looks like this:

http://BioCyc.org/ECOLI/NEW-IMAGE?type=GENE-IN-CHROM-BROWSER&object=EG12309" onmouseover="return overlib('<b>Gene:</b> yjtD<BR><b>Product:</b> predicted rRNA methyltransferase, subunit of predicted rRNA methyltransferase<BR><b>Intergenic distances (bp):</b> yjjY< +400 yjtD +214 >thrL');">Gene: yjtDProduct: predicted rRNA methyltransferase, subunit of predicted rRNA methyltransferaseIntergenic distances (bp): yjjY< +400 yjtD +214 >thrL');" onmouseout="return nd();">

is there an easy way to use BeautifulSoup to extract just the value of the href attribute?

Thanks,
elsa
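In BeautifulSoup, a tag's attributes can be read by subscripting the tag, as in tag['href']. So that the example runs without any third-party package, the sketch below swaps in the standard library's html.parser to show the same idea of collecting href values. It assumes Python 3, and it assumes the snippet above was an a element; the sample markup and the class name HrefCollector are made up for illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the value of every href attribute seen in start tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        for name, value in attrs:
            if name == "href":
                self.hrefs.append(value)


# Hypothetical sample; the tag name <a> is an assumption.
sample = (
    '<a href="http://BioCyc.org/ECOLI/NEW-IMAGE?'
    'type=GENE-IN-CHROM-BROWSER&amp;object=EG12309" '
    'onmouseout="return nd();">yjtD</a>'
)
collector = HrefCollector()
collector.feed(sample)
print(collector.hrefs)
```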
Re: map
On Aug 31, 11:44 pm, Hendrik van Rooyen wrote:
> On Monday 31 August 2009 11:31:34 Piet van Oostrum wrote:
>
> > But ultimately it is also very much a matter of taste, preference and
> > habit.
>
> This is true, but there is another reason that I posted - I have noticed that
> there seems to be a tendency amongst newcomers to the group to go to great
> lengths to find something that will do exactly what they want, irrespective
> of the inherent complexity or lack thereof of that which they are trying to
> do.
>
> Now I cannot understand why this is - one could say that it is caused by an
> eagerness to understand all the corners of the tool that is python, but
> somehow it does not feel like that to me - I see it almost as a crisis of
> confidence - as if the newbie lacks the self confidence to believe that he or
> she is capable of doing anything independently.
>
> So whenever I can, I try to encourage people to just do it their way, and to
> see what happens, and to hack using the interactive interpreter, to build
> confidence by experimenting and making mistakes, and realizing that when you
> have made a mistake, it is not the end of the world, - you can fix it and
> move on.
>
> Don't know if this rant makes any sense...
>
> - Hendrik

In my own defense - firstly, I was able to implement what I wanted to do with loops, and I used this to solve the problem I needed to. However, by asking *why* map didn't work, I now understand how map works, what contexts it may indeed be useful for, and what the alternatives are. To boot, you have all given me about 10 different ways of solving the problem, some of which use prettier (and probably faster) code than the loops I wrote...
choose value from custom distribution
Hello,

I'm trying to find a way to collect a set of values from real data, and then sample values randomly from this data, so that the data I'm collecting becomes a kind of probability distribution. For instance, I might have age data for some children. It's very easy to collect this data using a list, where the index gives the value of the data, and the number in the list gives the number of times that value occurs:

    [0,0,10,20,5]

could mean that there are no people aged 0, no people aged 1, 10 people aged 2, 20 people aged 3, and 5 people aged 4 in my data collection.

I then want to make a random sample that would be representative of these proportions. Is there any easy and fast way to select an entry weighted by its value? Or are there any Python packages that allow you to easily create your own distribution based on collected data?

Two other things to bear in mind: in reality I'm collating data from up to around 5 million individuals, so just making one long list with a new entry for each individual won't work. Also, it would be good if I didn't have to decide beforehand what the possible range of values is (which unfortunately I have to do with the approach I'm currently working on).

Thanks in advance for your help,
elsa.
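One possible approach, sketched here assuming Python 3.6 or later, keeps the counts in a collections.Counter (so the range of values need not be fixed in advance and memory grows with the number of distinct values, not the number of individuals) and draws from it with random.choices, which accepts per-value weights. The name sample_values and the rng parameter are illustrative.

```python
import random
from collections import Counter

# Counts taken from the example above: 10 aged 2, 20 aged 3, 5 aged 4.
ages = Counter({2: 10, 3: 20, 4: 5})
ages[7] += 1  # a new value can be recorded at any time


def sample_values(counts, k, rng=random):
    """Draw k values, each with probability proportional to its count."""
    values = list(counts)
    weights = [counts[value] for value in values]
    return rng.choices(values, weights=weights, k=k)


draws = sample_values(ages, 500)
```

Values with a count of zero are never drawn, and sampling is with replacement, so the counts themselves are left unchanged.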
Python Custom Shapes Question
Hi,

I am using Python for a programming class at school. However, we are being asked to create a function that enables Python to resize an image that we custom made (in this case, a Greek meander). I wasn't sure if you knew how to do this. I am having trouble.

Thank you,
Elsa