Re: Fastest way to retrieve and write html contents to file

2016-05-02 Thread DFS
On 5/3/2016 12:06 AM, Michael Torrie wrote: Now if you want to talk about processing the data once you have it, there we can talk about speeds and optimization. Be glad to. Helps me learn python, so bring whatever challenge you want and I'll try to keep up. One small comparison I was able

Re: Fastest way to retrieve and write html contents to file

2016-05-02 Thread Michael Torrie
On 05/02/2016 01:37 AM, DFS wrote: > So python matches or beats VBScript at this much larger file. Kewl. If you download something large enough to be meaningful, you'll find the runtime speeds should all converge to something showing your internet connection speed. Try downloading a 4 GB file, f

Re: You gotta love a 2-line python solution

2016-05-02 Thread jfong
Stephen Hansen at 2016/5/3 11:49:22AM wrote: > On Mon, May 2, 2016, at 08:27 PM, jf...@ms4.hinet.net wrote: > > But when I try to get this forum page, it does get a html file but can't > > be viewed normally. > > What does that mean? > > -- > Stephen Hansen > m e @ i x o k a i . i o The page

Re: You gotta love a 2-line python solution

2016-05-02 Thread DFS
On 5/2/2016 11:27 PM, jf...@ms4.hinet.net wrote: DFS at 2016/5/3 9:12:24AM wrote: try from urllib.request import urlretrieve http://stackoverflow.com/questions/21171718/urllib-urlretrieve-file-python-3-3 I'm running python 2.7.11 (32-bit) Alright, it works...someway. I try to get a zip fi

Re: You gotta love a 2-line python solution

2016-05-02 Thread Stephen Hansen
On Mon, May 2, 2016, at 08:27 PM, jf...@ms4.hinet.net wrote: > But when I try to get this forum page, it does get a html file but can't > be viewed normally. What does that mean? -- Stephen Hansen m e @ i x o k a i . i o -- https://mail.python.org/mailman/listinfo/python-list

Re: You gotta love a 2-line python solution

2016-05-02 Thread jfong
DFS at 2016/5/3 9:12:24AM wrote: > try > > from urllib.request import urlretrieve > > http://stackoverflow.com/questions/21171718/urllib-urlretrieve-file-python-3-3 > > > I'm running python 2.7.11 (32-bit) Alright, it works...someway. I try to get a zip file. It works, the file can be unzippe

Re: Fastest way to retrieve and write html contents to file

2016-05-02 Thread DFS
On 5/2/2016 10:00 PM, Chris Angelico wrote: On Tue, May 3, 2016 at 11:51 AM, DFS wrote: On 5/2/2016 3:19 AM, Chris Angelico wrote: There's an easier way to test if there's caching happening. Just crank the iterations up from 10 to 100 and see what happens to the times. If your numbers are per

Re: Fastest way to retrieve and write html contents to file

2016-05-02 Thread Chris Angelico
On Tue, May 3, 2016 at 11:51 AM, DFS wrote: > On 5/2/2016 3:19 AM, Chris Angelico wrote: > >> There's an easier way to test if there's caching happening. Just crank >> the iterations up from 10 to 100 and see what happens to the times. If >> your numbers are perfectly fair, they should be perfectl
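The linearity check suggested above can be sketched as plain arithmetic. This is only an illustration: the helper names and the 25% tolerance are assumptions, and the 100-iteration figures in the usage note are hypothetical, not measurements from the thread.

```python
def per_iteration(elapsed_seconds, iterations):
    """Average cost of a single iteration."""
    return elapsed_seconds / iterations


def looks_linear(t_small, n_small, t_large, n_large, tolerance=0.25):
    """True if the per-iteration cost is roughly the same for both runs.

    A much lower per-iteration cost on the larger run hints that something
    (for example a cache) is making later iterations cheaper.
    """
    small = per_iteration(t_small, n_small)
    large = per_iteration(t_large, n_large)
    return abs(large - small) <= tolerance * small
```

For instance, a 1.8 second ten-iteration run followed by an 18 second hundred-iteration run would count as linear, while 1.8 seconds followed by 4 seconds would not.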

Re: Fastest way to retrieve and write html contents to file

2016-05-02 Thread DFS
On 5/2/2016 3:19 AM, Chris Angelico wrote: There's an easier way to test if there's caching happening. Just crank the iterations up from 10 to 100 and see what happens to the times. If your numbers are perfectly fair, they should be perfectly linear in the iteration count; eg a 1.8 second ten-it

Re: Fastest way to retrieve and write html contents to file

2016-05-02 Thread DFS
On 5/2/2016 4:42 AM, Peter Otten wrote: DFS wrote: Is VB using a local web cache, and Python not? I'm not specifying a local web cache with either (wouldn't know how or where to look). If you have Windows, you can try it. I don't have Windows, but if I'm to believe http://stackoverflow.co

Re: You gotta love a 2-line python solution

2016-05-02 Thread DFS
On 5/2/2016 8:45 PM, jf...@ms4.hinet.net wrote: DFS at 2016/5/2 UTC+8 11:39:33AM wrote: To save a webpage to a file: - 1. import urllib 2. urllib.urlretrieve("http://econpy.pythonanywhere.com /ex/001.html","D:\file.html") -

Re: You gotta love a 2-line python solution

2016-05-02 Thread jfong
DFS at 2016/5/2 UTC+8 11:39:33AM wrote: > To save a webpage to a file: > - > 1. import urllib > 2. urllib.urlretrieve("http://econpy.pythonanywhere.com > /ex/001.html","D:\file.html") > - > > That's it! Why my system can
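One detail in the quoted two-liner is worth a sketch: in a normal string literal "\f" is a form feed, so "D:\file.html" is not the path it looks like. The version below assumes Python 3 (where urlretrieve lives in urllib.request) and uses a raw string for the Windows path. The helper name save_page is hypothetical, and this is not claimed to be the cause of the problem reported above.

```python
from urllib.request import urlretrieve  # Python 3 location of urlretrieve

# "\f" inside a normal string literal becomes a single form feed character,
# so the plain literal is one character shorter than the intended path.
plain = "D:\file.html"
raw = r"D:\file.html"  # raw string keeps the backslash and the "f"


def save_page(url, dest):
    """Download url to dest and return the local path that was written."""
    path, _headers = urlretrieve(url, dest)
    return path
```

Usage would look like save_page("http://econpy.pythonanywhere.com/ex/001.html", r"D:\file.html").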

Re: Need help understanding list structure

2016-05-02 Thread Ben Finney
moa47...@gmail.com writes: > Am I correct in assuming that parsing a large text file would be > quicker returning pointers instead of strings? What do you mean by “return a pointer”? Python doesn't have pointers. In the Python language, a container type (such as ‘set’, ‘list’, ‘dict’, etc.) cont
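A small sketch of the point about containers: a Python list holds references to objects rather than copies of them, so there is no separate "return a pointer" option to choose. The variable names here are illustrative only.

```python
record = ["alpha", "beta"]

# The outer list stores two references to the same inner list; nothing is copied.
rows = [record, record]
same_object = rows[0] is rows[1]

# A change made through one name is visible through every other reference.
record.append("gamma")
seen_through_container = rows[1]
```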

Re: Need help understanding list structure

2016-05-02 Thread Michael Torrie
On 05/02/2016 04:33 PM, moa47...@gmail.com wrote: > Yes, that does help. You're right. The author of the library I'm > using didn't implement either a __str__ or __repr__ method. Am I > correct in assuming that parsing a large text file would be quicker > returning pointers instead of strings? I've

Re: What should Python apps do when asked to show help?

2016-05-02 Thread cs
On 02May2016 14:07, Grant Edwards wrote: On 2016-05-01, c...@zip.com.au wrote: Didn't the OP specify that he was writing a command-line utility for Linux/Unix? Discussing command line operation for Windows or OS-X seems rather pointless. OS-X _is_ UNIX. I spent almost all my time on this M

Re: Need help understanding list structure

2016-05-02 Thread moa47401
> When Python's "print" statement/function is invoked, it will print the > textual representation of the object according to its class's __str__ or > __repr__ method. That is, the print function prints out whatever text > the class says it should. > > For classes which don't implement a __str__

Re: Need help understanding list structure

2016-05-02 Thread Erik
On 02/05/16 22:30, moa47...@gmail.com wrote: Can someone help me understand why or under what circumstances a list shows pointers instead of the text data? When Python's "print" statement/function is invoked, it will print the textual representation of the object according to its class's __str
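The effect described above can be reproduced in a few lines. The class names are hypothetical stand-ins for the parsing library's element type, which the thread does not name.

```python
class Token:
    """No __str__ or __repr__: printing falls back to the default form."""

    def __init__(self, text):
        self.text = text


class ReadableToken(Token):
    """Same data, but with a __repr__ that shows the text."""

    def __repr__(self):
        return "ReadableToken({!r})".format(self.text)


# A list shows the repr() of each element it holds.
plain_listing = repr([Token("a")])
readable_listing = repr([ReadableToken("a")])
```

Printing plain_listing shows an entry of the form "<... Token object at 0x...>", which is what can look like a pointer, while readable_listing shows the text itself.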

Need help understanding list structure

2016-05-02 Thread moa47401
I've been using an old text parsing library and have been able to accomplish most of what I wanted to do. But I don't understand the list structure it uses well enough to build additional methods. If I print the list, it has thousands of elements within its brackets separated by commas as I wou

Re: Python3 html scraper that supports javascript

2016-05-02 Thread zljubisic
> Why? As important as it is to show code, you need to show what actually > happens and what error message is produced. If you run the code you will see that the HTML I got doesn't have a link to the flash video. I should somehow do something (press the play video button, maybe) in order to get html

Re: Best way to clean up list items?

2016-05-02 Thread DFS
On 5/2/2016 2:27 PM, Jussi Piitulainen wrote: DFS writes: On 5/2/2016 12:57 PM, Jussi Piitulainen wrote: DFS writes: Have: list1 = ['\r\n Item 1 ',' Item 2 ','\r\n '] Want: list1 = ['Item 1','Item 2'] . . Funny-looking data you have. I know - sadly, it's actual data: ---

Re: Best way to clean up list items?

2016-05-02 Thread Jussi Piitulainen
DFS writes: > On 5/2/2016 12:57 PM, Jussi Piitulainen wrote: >> DFS writes: >> >>> Have: list1 = ['\r\n Item 1 ',' Item 2 ','\r\n '] >>> Want: list1 = ['Item 1','Item 2'] . . >> Funny-looking data you have. > > I know - sadly, it's actual data: > > -

Re: Best way to clean up list items?

2016-05-02 Thread Stephen Hansen
On Mon, May 2, 2016, at 11:09 AM, DFS wrote: > I'd prefer to get clean data in the first place, but I don't know a > better way to extract it from the HTML. Ah, right. I didn't know you were scraping HTML. Scraping HTML is rarely clean so you have to do a lot of cleanup. -- Stephen Hansen m e

Re: Best way to clean up list items?

2016-05-02 Thread DFS
On 5/2/2016 12:57 PM, Jussi Piitulainen wrote: DFS writes: Have: list1 = ['\r\n Item 1 ',' Item 2 ','\r\n '] Want: list1 = ['Item 1','Item 2'] I wrote this, which works fine, but maybe it can be tidier? 1. list2 = [t.replace("\r\n", "") for t in list1] #remove \r\n 2. list3 = [t.stri

Re: Best way to clean up list items?

2016-05-02 Thread DFS
On 5/2/2016 1:25 PM, Stephen Hansen wrote: On Mon, May 2, 2016, at 09:33 AM, DFS wrote: Have: list1 = ['\r\n Item 1 ',' Item 2 ','\r\n '] I'm curious how you got to this point, it seems like you can solve the problem in how this is generated. ---

Re: Python3 html scraper that supports javascript

2016-05-02 Thread Stephen Hansen
On Mon, May 2, 2016, at 08:33 AM, zljubi...@gmail.com wrote: > I tried to use the following code: > > from bs4 import BeautifulSoup > from selenium import webdriver > > PHANTOMJS_PATH = > 'C:\\Users\\Zoran\\Downloads\\Obrisi\\phantomjs-2.1.1-windows\\bin\\phantomjs.exe' > > url = > 'https://hrti

Re: Best way to clean up list items?

2016-05-02 Thread Peter Otten
DFS wrote: > Have: list1 = ['\r\n Item 1 ',' Item 2 ','\r\n '] > Want: list1 = ['Item 1','Item 2'] > > > I wrote this, which works fine, but maybe it can be tidier? > > 1. list2 = [t.replace("\r\n", "") for t in list1] #remove \r\n > 2. list3 = [t.strip(' ') for t in list2]#

Re: Best way to clean up list items?

2016-05-02 Thread Stephen Hansen
On Mon, May 2, 2016, at 09:33 AM, DFS wrote: > Have: list1 = ['\r\n Item 1 ',' Item 2 ','\r\n '] I'm curious how you got to this point, it seems like you can solve the problem in how this is generated. > Want: list1 = ['Item 1','Item 2'] That said: list1 = [t.strip() for t in list1 if t a

Re: Best way to clean up list items?

2016-05-02 Thread justin walters
On May 2, 2016 10:03 AM, "Jussi Piitulainen" wrote: > > DFS writes: > > > Have: list1 = ['\r\n Item 1 ',' Item 2 ','\r\n '] > > Want: list1 = ['Item 1','Item 2'] > > > > > > I wrote this, which works fine, but maybe it can be tidier? > > > > 1. list2 = [t.replace("\r\n", "") for t in list1]

Re: Best way to clean up list items?

2016-05-02 Thread Jussi Piitulainen
DFS writes: > Have: list1 = ['\r\n Item 1 ',' Item 2 ','\r\n '] > Want: list1 = ['Item 1','Item 2'] > > > I wrote this, which works fine, but maybe it can be tidier? > > 1. list2 = [t.replace("\r\n", "") for t in list1] #remove \r\n > 2. list3 = [t.strip(' ') for t in list2]#tr

Re: Python3 html scraper that supports javascript

2016-05-02 Thread DFS
On 5/2/2016 11:33 AM, zljubi...@gmail.com wrote: I tried to use the following code: from bs4 import BeautifulSoup from selenium import webdriver PHANTOMJS_PATH = 'C:\\Users\\Zoran\\Downloads\\Obrisi\\phantomjs-2.1.1-windows\\bin\\phantomjs.exe' url = 'https://hrti.hrt.hr/#/video/show/22036

Best way to clean up list items?

2016-05-02 Thread DFS
Have: list1 = ['\r\n Item 1 ',' Item 2 ','\r\n '] Want: list1 = ['Item 1','Item 2'] I wrote this, which works fine, but maybe it can be tidier? 1. list2 = [t.replace("\r\n", "") for t in list1] #remove \r\n 2. list3 = [t.strip(' ') for t in list2]#trim whitespace 3. list1 =
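One possible tidier form of the cleanup, as a sketch: str.strip() with no argument removes "\r", "\n" and spaces in one go, and entries that end up empty are dropped. The intermediate name stripped is illustrative.

```python
list1 = ['\r\n Item 1 ', ' Item 2 ', '\r\n ']

# strip() with no argument trims all leading and trailing whitespace,
# including the "\r\n" pairs, so no separate replace() step is needed.
stripped = (t.strip() for t in list1)

# Keep only the entries that still contain something after trimming.
cleaned = [t for t in stripped if t]
```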

Re: You gotta love a 2-line python solution

2016-05-02 Thread Manolo Martínez
On 05/02/16 at 11:24am, Larry Martell wrote: > That reminds me of something I heard many years ago. > > Every non-trivial program can be simplified by at least one line of code. > Every non-trivial program has at least one bug. > > Therefore every non-trivial program can be reduced to one line of

Re: Python3 html scraper that supports javascript

2016-05-02 Thread zljubisic
I tried to use the following code: from bs4 import BeautifulSoup from selenium import webdriver PHANTOMJS_PATH = 'C:\\Users\\Zoran\\Downloads\\Obrisi\\phantomjs-2.1.1-windows\\bin\\phantomjs.exe' url = 'https://hrti.hrt.hr/#/video/show/2203605/trebizat-prica-o-jednoj-vodi-i-jednom-narodu-dok

Re: You gotta love a 2-line python solution

2016-05-02 Thread Larry Martell
On Mon, May 2, 2016 at 11:15 AM, DFS wrote: > Of course. Taken to its extreme, I could eventually replace you with one > line of code :) That reminds me of something I heard many years ago. Every non-trivial program can be simplified by at least one line of code. Every non-trivial program has a

Re: You gotta love a 2-line python solution

2016-05-02 Thread DFS
On 5/2/2016 5:26 AM, BartC wrote: On 02/05/2016 04:39, DFS wrote: To save a webpage to a file: - 1. import urllib 2. urllib.urlretrieve("http://econpy.pythonanywhere.com /ex/001.html","D:\file.html") - That's it! Comin

Re: starting docker container messes up terminal settings

2016-05-02 Thread Larry Martell
On Mon, May 2, 2016 at 10:08 AM, Joaquin Alzola wrote: >>I am starting a docker container from a subprocess.Popen and it works, but >>when the script returns, the terminal settings of my shell are messed up. >>Nothing is echoed and return doesn't cause a newline. I can fix this with >>'tset' i

Re: What should Python apps do when asked to show help?

2016-05-02 Thread Grant Edwards
On 2016-05-01, c...@zip.com.au wrote: >>Didn't the OP specify that he was writing a command-line utility for >>Linux/Unix? >> >>Discussing command line operation for Windows or OS-X seems rather >>pointless. > > OS-X _is_ UNIX. I spent almost all my time on this Mac in terminals. It is a > very

RE: starting docker container messes up terminal settings

2016-05-02 Thread Joaquin Alzola
>I am starting a docker container from a subprocess.Popen and it works, but >when the script returns, the terminal settings of my shell are messed up. >Nothing is echoed and return doesn't cause a newline. I can fix this with >'tset' in the terminal, but I don't want to require that. Has anyone

starting docker container messes up terminal settings

2016-05-02 Thread Larry Martell
I am starting a docker container from a subprocess.Popen and it works, but when the script returns, the terminal settings of my shell are messed up. Nothing is echoed and return doesn't cause a newline. I can fix this with 'tset' in the terminal, but I don't want to require that. Has anyone here wo
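One approach that might avoid needing 'tset' afterwards is to save the terminal attributes before starting the child and restore them once it exits. The sketch below assumes a Unix-like system with the standard termios module; the helper name run_preserving_tty is hypothetical, and whether this resolves the docker case described above is untested.

```python
import os
import subprocess
import sys

try:
    import termios  # available on Unix-like systems only
except ImportError:
    termios = None


def run_preserving_tty(cmd):
    """Run cmd, then put the terminal back the way it was found."""
    fd = None
    saved = None
    if termios is not None:
        try:
            fd = sys.stdin.fileno()
            if os.isatty(fd):
                saved = termios.tcgetattr(fd)  # snapshot the current settings
        except (AttributeError, ValueError, OSError, termios.error):
            saved = None  # no usable terminal, nothing to restore
    try:
        return subprocess.call(cmd)
    finally:
        if saved is not None:
            # Restore the snapshot once pending output has drained.
            termios.tcsetattr(fd, termios.TCSADRAIN, saved)
```

The function returns the child's exit status, so it can stand in for a plain subprocess.call.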

Re: Private message regarding: How to prevent the duplication of any value in a column within a CSV file (python)

2016-05-02 Thread Ian Kelly
On Mon, May 2, 2016 at 3:52 AM, Adam Davis wrote: > Hi Ian, > > I'm really struggling to implement a set into my code as I'm a beginner, > it's taking me a while to grasp the idea of it. If I was to show you my code > so you get an idea of my aim/function of the code, would you be able to help > m
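As a rough illustration of the set idea mentioned above: keep a set of the values already seen in the column and skip any row whose value is already in it. The column names and sample data are made up for the example, and unique_by_column is a hypothetical helper, not code from the thread.

```python
import csv
import io


def unique_by_column(rows, column):
    """Keep only the first row for each distinct value in the given column."""
    seen = set()
    kept = []
    for row in rows:
        key = row[column]
        if key in seen:
            continue  # this value has already appeared, skip the duplicate
        seen.add(key)
        kept.append(row)
    return kept


# In-memory stand-in for a CSV file, with a repeated value in "name".
sample = io.StringIO("name,score\nann,3\nbob,5\nann,9\n")
rows = list(csv.DictReader(sample))
deduped = unique_by_column(rows, "name")
```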

Re: Fastest way to retrieve and write html contents to file

2016-05-02 Thread Tim Chase
On 2016-05-02 00:06, DFS wrote: > Then I tested them in loops - the VBScript is MUCH faster: 0.44 for > 10 iterations, vs 0.88 for python. In addition to the other debugging recommendations in sibling threads, a couple other things to try: 1) use a local debugging proxy so that you can compare th

Re: You gotta love a 2-line python solution

2016-05-02 Thread Steven D'Aprano
On Mon, 2 May 2016 08:12 pm, Marko Rauhamaa wrote: > For example, the urlretrieve() function above blocks. You can't use it > with the asyncio or select modules. The urlretrieve function is one of the oldest functions in the std library. It literally only exists because Guido was working on a co

Re: You gotta love a 2-line python solution

2016-05-02 Thread Marko Rauhamaa
BartC : > On 02/05/2016 04:39, DFS wrote: >> 2. urllib.urlretrieve("http://econpy.pythonanywhere.com >> /ex/001.html","D:\file.html") > [...] > > It seems Python provides a higher level solution compared with VBS. > Python presumably also has to do those Opens and Sends, but they are > hidden

loading multiple module with same name using importlib.machinery.SourceFileLoader

2016-05-02 Thread ulf . worsoe
I have observed this behaviour, for some reason only on OS X (and Python 3.5.1): I use importlib.machinery.SourceFileLoader to load a long list of modules. The modules are not located in the loader path, and many of them have the same name, i.e. I would have: m1 = importlib.machinery.SourceFile
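For comparison, here is a sketch of loading identically named files under distinct module names. It swaps in importlib.util.spec_from_file_location rather than calling SourceFileLoader directly, and the helper name load_as is hypothetical. It is not claimed to explain or fix the OS X behaviour reported above.

```python
import importlib.util


def load_as(unique_name, path):
    """Load the source file at path as a module called unique_name.

    Giving every file its own module name keeps the resulting module
    objects separate even when the files on disk share a name.
    """
    spec = importlib.util.spec_from_file_location(unique_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```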

Re: You gotta love a 2-line python solution

2016-05-02 Thread BartC
On 02/05/2016 04:39, DFS wrote: To save a webpage to a file: - 1. import urllib 2. urllib.urlretrieve("http://econpy.pythonanywhere.com /ex/001.html","D:\file.html") - That's it! Coming from VB/A background, some of the

Re: Fastest way to retrieve and write html contents to file

2016-05-02 Thread Peter Otten
DFS wrote: >> Is VB using a local web cache, and Python not? > > I'm not specifying a local web cache with either (wouldn't know how or > where to look). If you have Windows, you can try it. I don't have Windows, but if I'm to believe http://stackoverflow.com/questions/5235464/how-to-make-micr

Re: Fastest way to retrieve and write html contents to file

2016-05-02 Thread Stephen Hansen
On Mon, May 2, 2016, at 12:37 AM, DFS wrote: > On 5/2/2016 2:27 AM, Stephen Hansen wrote: > > I'm again going back to the point of: it's fast enough. When comparing > > two small numbers, "twice as slow" is meaningless. > > Speed is always meaningful. > > I know python is relatively slow, but it's

Re: Code Opinion - Enumerate

2016-05-02 Thread Sayth Renshaw
As a reference, here is a functional implementation of Conway's GOL. http://programmablelife.blogspot.com.au/2012/08/conways-game-of-life-in-clojure.html The author first does it in Clojure and then transliterates it to Python. Just good for a different view. Sayth

Re: Fastest way to retrieve and write html contents to file

2016-05-02 Thread DFS
On 5/2/2016 2:27 AM, Stephen Hansen wrote: On Sun, May 1, 2016, at 10:59 PM, DFS wrote: startTime = time.clock() for i in range(loops): r = urllib2.urlopen(webpage) f = open(webfile,"w") f.write(r.read()) f.close endTime = time.clock() print "Finished urllib2 in %
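A note on the quoted loop: f.close without parentheses only refers to the method and never calls it, so the file is not explicitly closed on each pass. A minimal Python 3 sketch of the same timing loop follows. It uses a with block so the file is always closed, and it takes the download step as a callable so the timing logic can be exercised without a network. The names timed_download and fetch are assumptions, not DFS's code.

```python
import time


def timed_download(fetch, dest_path, loops):
    """Call fetch() loops times, writing each result to dest_path.

    fetch is any callable returning bytes. Returns the elapsed wall-clock
    time in seconds for the whole loop.
    """
    start = time.perf_counter()
    for _ in range(loops):
        data = fetch()
        # The with block closes the file even if the write fails.
        with open(dest_path, "wb") as f:
            f.write(data)
    return time.perf_counter() - start
```

With the standard library, fetch could be something like lambda: urllib.request.urlopen(webpage).read(), after importing urllib.request.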

Re: Fastest way to retrieve and write html contents to file

2016-05-02 Thread Chris Angelico
On Mon, May 2, 2016 at 4:47 PM, DFS wrote: > I'm not specifying a local web cache with either (wouldn't know how or where > to look). If you have Windows, you can try it. > --- > Option Explicit > Dim xmlHTTP, fso, fOut, startTime, e