Cameron!!! You are 'da man!!
Read your explanation. Good stuff to recheck/test and investigate over time.
In the short term, I'll implement some tests! Thanks!

On Thu, Mar 30, 2017 at 6:51 PM, Cameron Simpson <c...@zip.com.au> wrote:
> I wrote a long description of how .communicate can deadlock.
>
> Then I read the doco more carefully and saw this:
>
>   Warning: Use communicate() rather than .stdin.write, .stdout.read
>   or .stderr.read to avoid deadlocks due to any of the other OS
>   pipe buffers filling up and blocking the child process.
>
> This suggests that .communicate uses Threads to send and to gather data
> independently, and that therefore the deadlock situation may not arise.
>
> See what lsof and strace tell you; all my other advice stands regardless,
> and the deadlock description may or may not be relevant. Still worth
> reading and understanding it when looking at this kind of problem.
>
> Cheers,
> Cameron Simpson <c...@zip.com.au>
>
>
> On 31Mar2017 09:43, Cameron Simpson <c...@zip.com.au> wrote:
>>
>> On 30Mar2017 13:51, bruce <badoug...@gmail.com> wrote:
>>>
>>> Trying to understand the "correct" way to run a sys command ("curl")
>>> and to get the potential stderr. Checking Stackoverflow (SO), implies
>>> that I should be able to use a raw/text cmd, with "shell=true".
>>
>>
>> I strongly recommend avoiding shell=True if you can. It has many problems.
>> All stackoverflow advice needs to be considered with caution. However, that
>> is not the source of your deadlock.
>>
>>> If I leave the stderr out, and just use
>>>   s=proc.communicate()
>>> the test works...
>>>
>>> Any pointers on what I might inspect to figure out why this hangs on
>>> the proc.communicate process/line??
>>
>>
>> When it is hung, run "lsof" on the processes from another terminal i.e.
>> lsof the python process and also lsof the curl process. That will make clear
>> the connections between them, particularly which file descriptors ("fd"s)
>> are associated with what.
>>
>> Then run "strace" on the processes. That should show you what system calls
>> are in progress in each process.
>>
>> My expectation is that you will see Python reading from one file
>> descriptor and curl writing to a different one, and neither progressing.
>>
>> Personally I avoid .communicate and do more work myself, largely to know
>> precisely what is going on with my subprocesses.
>>
>> The difficulty with .communicate is that Python must read both stderr and
>> stdout separately, but it will be doing that sequentially: read one, then
>> read the other. That is just great if the command is "short" and writes a
>> small enough amount of data to each. The command runs, writes, and exits.
>> Python reads one and sees EOF after the data, because the command has
>> exited. Then Python reads the other and collects the data and sees EOF
>> because the command has exited.
>>
>> However, if the output of the command is large on whatever stream Python
>> reads _second_, the command will stall writing to that stream. This is
>> because Python is not reading the data, and therefore the buffers fill
>> (stdio in curl plus the buffer in the pipe). So the command ("curl") stalls
>> waiting for data to be consumed from the buffers. And because it has
>> stalled, the command does not exit, and therefore Python does not see EOF on
>> the _first_ stream. So it sits waiting for more data, never reading from the
>> second stream.
>>
>> [...snip...]
>>>
>>> cmd='[r" curl -sS '
>>> #cmd=cmd+'-A "Mozilla/5.0 (X11; Linux x86_64; rv:38.0)
>>> Gecko/20100101 Firefox/38.0"'
>>> cmd=cmd+"-A '"+user_agent+"'"
>>> ##cmd=cmd+' --cookie-jar '+cname+' --cookie '+cname+' '
>>> cmd=cmd+' --cookie-jar '+ff+' --cookie '+ff+' '
>>> #cmd=cmd+'-e "'+referer+'" -d "'+tt+'" '
>>> #cmd=cmd+'-e "'+referer+'" '
>>> cmd=cmd+"-L '"+url1+"'"+'"]'
>>> #cmd=cmd+'-L "'+xx+'" '
>>
>>
>> Might I recommend something like this:
>>
>>   cmd_args = [ 'curl', '-sS' ]
>>   cmd_args.extend( [ '-A', user_agent ] )
>>   cmd_args.extend( [ '--cookie-jar', ff, '--cookie', ff ] )
>>   cmd_args.extend( [ '-L', url1 ] )
>>
>> and using shell=False. This totally avoids any need to "quote" strings in
>> the command, because the shell is not parsing the string - you're invoking
>> "curl" directly instead of asking the shell to read a string and invoke
>> "curl" for you.
>>
>> Constructing shell commands is tedious and fiddly; avoid it when you don't
>> need to.
>>
>>> try_=1
>>
>>
>> It is preferable to say:
>>
>>   try_ = True
>>
>>> while(try_):
>>
>>
>> You don't need any brackets here:
>>
>>   while try_:
>>
>> More readable, because less punctuation.
>>
>>>   proc=subprocess.Popen(cmd,
>>> shell=True,stdout=subprocess.PIPE,stderr=subprocess.PIPE)
>>
>>
>>   proc = subprocess.Popen(cmd_args,
>>                           stdout=subprocess.PIPE,
>>                           stderr=subprocess.PIPE)
>>
>>>   s,err=proc.communicate()
>>>   s=s.strip()
>>>   err=err.strip()
>>>   if(err==0):
>>>     try_=''
>>
>>
>> It is preferable to say:
>>
>>   try_ = False
>>
>> Also, you should be looking at proc.returncode, _not_ err. Many programs
>> write informative messages to stderr, and a nonempty stderr does not imply
>> failure.
>>
>> Instead, all programs set their exit status to 0 for success and to
>> various nonzero values for failure. So check:
>>
>>   if proc.returncode == 0:
>>     try_ = False
>>
>> Or you could bypass try_ altogether and go:
>>
>>   while True:
>>     ... subprocess ...
>>     if proc.returncode == 0:
>>       break
>>
>> That may not fit your larger scheme.
>>
>> Cheers,
>> Cameron Simpson <c...@zip.com.au>
>> _______________________________________________
>> Tutor maillist - Tutor@python.org
>> To unsubscribe or change subscription options:
>> https://mail.python.org/mailman/listinfo/tutor
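
For reference, here is a minimal sketch that pulls together the suggestions in the quoted reply: build the command as an argument list, run it without a shell, let communicate() collect both streams, and judge success by proc.returncode rather than by whether stderr is empty. The function name run_command, the max_tries limit, and the stand-in child process (the current Python interpreter in place of curl) are assumptions made for illustration; none of them come from the thread.

```python
import subprocess
import sys


def run_command(cmd_args, max_tries=3):
    """Run cmd_args (a list, no shell) until it exits with status 0.

    Returns (returncode, stdout_text, stderr_text) from the last attempt.
    max_tries is a hypothetical safety limit so the loop cannot spin forever.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for _ in range(max_tries):
        proc = subprocess.Popen(
            cmd_args,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        # communicate() reads both pipes to EOF and waits for the child.
        out, err = proc.communicate()
        # Exit status 0 means success; stderr content alone does not mean failure.
        if proc.returncode == 0:
            break
    return proc.returncode, out.decode().strip(), err.decode().strip()


# Stand-in for curl: a child that writes to both streams and exits with 0.
demo_args = [
    sys.executable, "-c",
    "import sys; print('body'); print('note', file=sys.stderr)",
]
status, body, notes = run_command(demo_args)
```

With curl, the list would be the cmd_args built in the quoted reply. Note that the demo child writes to stderr and still counts as a success, which is why the check is on the exit status.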