On Mon, Oct 26, 2009 at 3:05 AM, elca <high...@gmail.com> wrote:
>
> Hello,
> i was open anther new thread ,old thread is too long.

Too long for what?

> first of all,i really appreciate other many people's help in this newsgroup.
> im making webscraper now.
> but still problem with my script source.
> http://elca.pastebin.com/m52e7d8e0
> i was upload my script in here.
> so anybody can modify it.
> main problem is , if you see line number 74 ,75 in my source,
> you can see this line "thepage = urllib.urlopen(theurl).read()".
> i want to change this line to work with Pamie not urllib.

Why?

> if anyone help me,really appreciate.
> thanks in advance.

I just took a look at your code. I don't want to be mean but your code is insane.

1.) You import HTMLParser and fromstring but don't use them.

2.) The page_check() function is useless. All it does is sleep for len("www.naver.com") seconds. Why are you iterating through the characters in that string anyway?

3.) On line 21 you have a pointless pass statement.

4.) The whole "if x:" statement on line 19 is pointless because both branches do exactly the same thing.

5.) For the variables start_line and end_line you're using strings. This is not PHP. Strings are not automatically converted to integers.

6.) Because you never change end_line anywhere, and because you don't use break anywhere in the loop body, the while loop on line 39 will never end.

7.) The while loop on line 39 defines the getit() function (over and over again) but never calls it.

8.) On line 52 you define a list called "results" and then never use it anywhere.

9.) In getit() the default value for howmany is 0, but on line 68 you subtract 1 from it and on the next line you return if not howmany. This means that if you ever forget to call getit() with a value of howmany above zero, howmany goes negative, "not howmany" is never true, and that if statement will never return.

10.) In the for loop on line 54, in the while loop on line 56, you recursively call getit() on line 76. wtf? I suspect lines 73-76 are at the wrong indentation level.

11.) On line 79 you have a "bare" except, which just calls exit(1) on the next line. This replaces the exception you had (which contains important information about the error encountered) with a SystemExit exception (which does not). Note that an uncaught exception will exit your script with a non-zero return code anyway, so all you're doing here is throwing away debugging information.

12.) On line 81 you have 'return()'. This line will never be reached because you just called exit() on the line before. Also, return is a statement, not a function; you do not need '()' after it.

13.) Why do you sleep for half a second on line 83?

I cannot believe that this script does anything useful. I would recommend playing with the interactive interpreter for a while until you understand Python and what you're doing. Then worry about Pamie vs. urllib.

--
http://mail.python.org/mailman/listinfo/python-list
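
P.S. To make the point about the bare except concrete, here is a minimal sketch. It is not taken from your script: fetch_page(), swallow(), propagate() and describe() are hypothetical names I made up for illustration, and fetch_page() always fails on purpose so you can see what information survives in each case.

```python
import sys


def fetch_page(url):
    # Hypothetical stand-in for the real download call; it always fails
    # so that we can compare how the error is reported.
    raise ValueError("could not fetch %s" % url)


def swallow(url):
    # The pattern from the script: a bare except followed by exit(1).
    try:
        return fetch_page(url)
    except:
        sys.exit(1)  # raises SystemExit(1); the original error is discarded


def propagate(url):
    # No try/except at all: the original exception reaches the caller.
    return fetch_page(url)


def describe(func, url):
    # Report which exception the caller ends up seeing, and its message.
    try:
        func(url)
    except BaseException as exc:
        return type(exc).__name__, str(exc)
    return None


print(describe(swallow, "http://example.com"))
print(describe(propagate, "http://example.com"))
```

With the bare except, all the caller learns is that the script exited with code 1. Without it, the exception type and message (and, when uncaught, the full traceback) are still there to debug with.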