On 9/6/2017 9:26 PM, Christopher Reimer wrote:
On Sep 6, 2017, at 9:14 PM, Stefan Ram wrote:
I can run this (your code) without an error here (Python 3.6.0),
from a file named "Scraper1.py":
I'll check tomorrow. I recently switched from 3.5.x to 3.6.1 in the PyCharm
IDE. It's probably FU
On 9/6/2017 7:41 PM, Stefan Ram wrote:
The following code runs here:
Your code runs, but that's not how I have my code set up. Here's the
revised code:
class Requestor(object):
    def __init__(self, user_id, user_name):
        self._page_start = -1
    @property
    def page_start(sel
Greetings,
My web scraper program has a top-level class for managing the other
classes. I went to set up a property for the top-level class that
changes the corresponding property in a different class.
class Scraper(object):
    def __init__(self, user_id, user_name):
        self.requestor
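The delegation described above (a top-level property that changes the matching property on another object) can be sketched as follows. This is a minimal sketch under assumptions: the original code is truncated, so storing user_id/user_name and the setter bodies are illustrative, not Chris's actual implementation.

```python
class Requestor(object):
    """Assumed stand-in for the Requestor class from the thread."""

    def __init__(self, user_id, user_name):
        self.user_id = user_id      # assumption: stored for later requests
        self.user_name = user_name  # assumption: stored for later requests
        self._page_start = -1

    @property
    def page_start(self):
        return self._page_start

    @page_start.setter
    def page_start(self, value):
        self._page_start = value


class Scraper(object):
    """Top-level class that forwards page_start to its Requestor."""

    def __init__(self, user_id, user_name):
        self.requestor = Requestor(user_id, user_name)

    @property
    def page_start(self):
        # Reading the property reads through to the Requestor.
        return self.requestor.page_start

    @page_start.setter
    def page_start(self, value):
        # Assigning here updates the Requestor's property.
        self.requestor.page_start = value


scraper = Scraper(1, 'chris')
scraper.page_start = 5
print(scraper.requestor.page_start)  # 5
```

Because Scraper defines both a getter and a setter, assignments never create a separate attribute on Scraper; the single source of truth stays on the Requestor.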
Greetings,
After reading everyone's comments and doing a little more research, I
re-implemented my function as a callable class.
def __call__(self, key, value):
    if key not in self._methods:
        return value
    return self._methods[key](value)
This behaves like my prev
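A complete version of such a callable class might look like the sketch below. The class name TextFilter and the handlers bound to 'this' and 'that' are hypothetical placeholders; only the __call__ logic comes from the post.

```python
class TextFilter(object):
    """Hypothetical callable class: dispatches on key, passes unknown keys through."""

    def __init__(self):
        # Map each key to the function that handles it (placeholder handlers).
        self._methods = {
            'this': str.upper,
            'that': str.lower,
        }

    def __call__(self, key, value):
        if key not in self._methods:
            return value
        return self._methods[key](value)


filter_text = TextFilter()
print(filter_text('this', 'abc'))  # ABC
print(filter_text('what', 'abc'))  # abc
```

The handler table is built once in __init__, so each call only does a dictionary lookup.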
Greetings,
I was playing around with this piece of example code (written from memory).
def filter_text(key, value):
    def do_nothing(text): return text
    return {'this': call_this,
            'that': call_that,
            'what': do_nothing
            }[key](value)
Is
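A variant of the function-based dispatch uses dict.get() with a default handler, which removes the need for the 'what': do_nothing entry and avoids rebuilding the dictionary on every call. This is a sketch with assumptions: call_this and call_that are undefined in the post, so the bodies below are placeholders, and unlike the original (which raises KeyError for unknown keys) every unrecognized key here falls back to returning the value unchanged.

```python
def call_this(text):
    # Placeholder handler; the post does not show its body.
    return text.upper()


def call_that(text):
    # Placeholder handler; the post does not show its body.
    return text.lower()


def _identity(text):
    return text


# Built once at import time instead of on every call.
_HANDLERS = {'this': call_this, 'that': call_that}


def filter_text(key, value):
    # Unknown keys (including 'what') fall back to returning value unchanged.
    return _HANDLERS.get(key, _identity)(value)


print(filter_text('this', 'abc'))  # ABC
print(filter_text('what', 'Abc'))  # Abc
```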
Ah, shoot me. I had a .join() statement on the output queue but not on
the input queue. So the threads for the input queue got terminated
before BeautifulSoup could get started. I went down that same rabbit
hole with CSVWriter the other day. *sigh*
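Why both joins matter can be shown with a minimal two-stage sketch using queue.Queue.join() and task_done(). The worker functions, the daemon threads, and the .upper() "parsing" are hypothetical stand-ins for the BeautifulSoup and CSVWriter stages, not the poster's code.

```python
import queue
import threading

input_queue = queue.Queue()
output_queue = queue.Queue()
results = []


def parser_worker():
    # Stand-in parser stage: "parses" raw pages from input_queue.
    while True:
        page = input_queue.get()
        output_queue.put(page.upper())
        input_queue.task_done()


def writer_worker():
    # Single writer stage: collects parsed items from output_queue.
    while True:
        item = output_queue.get()
        results.append(item)
        output_queue.task_done()


for _ in range(4):
    threading.Thread(target=parser_worker, daemon=True).start()
threading.Thread(target=writer_worker, daemon=True).start()

for page in ['a', 'b', 'c']:
    input_queue.put(page)

# Wait on the input queue first: until every page has been parsed,
# the output queue may still be empty and its join() could return early.
input_queue.join()
output_queue.join()
print(sorted(results))  # ['A', 'B', 'C']
```

Each parser puts its result on output_queue before calling task_done() on input_queue, so once input_queue.join() returns, every result is already queued for the writer.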
Thanks for everyone's help.
Chris R.
On 8/27/2017 1:50 PM, MRAB wrote:
What if you don't sort the list? I ask because it sounds like you're
changing 2 variables (i.e. list->queue, sorted->unsorted) at the same
time, so you can't be sure that it's the queue that's the problem.
If I'm using a list, I'm using a for loop to input ite
On 8/27/2017 1:31 PM, Peter Otten wrote:
Here's a simple example that extracts titles from generated html. It seems
to work. Does it resemble what you do?
Your example is similar to my code when I'm using a list for the input
to the parser. You have soup_threads and write_threads, but no read_t
On 8/27/2017 1:12 PM, MRAB wrote:
What do you mean by "queue (random order)"? A queue is sequential
order, first-in-first-out.
With 20 threads requesting 20 different pages, they're not going into
the queue in sequential order (i.e., 0, 1, 2, ..., 17, 18, 19) and
coming in at different time
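Both points hold at once: a queue is strictly first-in-first-out, but "first in" means first to finish, so 20 concurrent requests arrive in completion order rather than page order. A minimal sketch of restoring page order is to tag each result with its page number and sort afterwards; the random delay below is an assumption that stands in for network latency.

```python
import queue
import random
import threading
import time

results = queue.Queue()


def fetch(page_number):
    # Simulated fetch: a random delay makes threads finish
    # (and enqueue) in an unpredictable order.
    time.sleep(random.uniform(0, 0.05))
    results.put((page_number, 'html for page %d' % page_number))


threads = [threading.Thread(target=fetch, args=(n,)) for n in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The queue is FIFO by arrival time; sorting on the page number
# recovers sequential order 0, 1, ..., 19.
pages = []
while not results.empty():
    pages.append(results.get())
pages.sort()
print([n for n, _ in pages])
```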
On 8/27/2017 11:54 AM, Peter Otten wrote:
The documentation
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup
says you can make the BeautifulSoup object from a string or file.
Can you give a few more details where the queue comes into play? A small
code sample would be ide
Greetings,
I have a Python 3.6 script on Windows to scrape comment history from a
website. It's currently set up this way:
Requestor (threads) -> list -> Parser (threads) -> queue -> CSVWriter (single thread)
It takes 15 minutes to process ~11,000 comments.
When I replaced the list with a qu
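A queue-only version of that pipeline, with a single CSV writer thread, might look like this sketch. Assumptions: the Requestor stage is simulated by the main thread putting fake pages, each "comment" is one line of text, and shutdown uses None sentinels; the real Requestor, Parser, and CSVWriter classes are not shown in the thread.

```python
import csv
import io
import queue
import threading

NUM_PARSERS = 4
STOP = None  # sentinel telling a worker to exit

page_queue = queue.Queue()     # Requestor -> Parser
comment_queue = queue.Queue()  # Parser -> CSV writer


def parser():
    # Stand-in parser: turns one raw page into comment rows.
    while True:
        page = page_queue.get()
        if page is STOP:
            break
        page_number, text = page
        for line in text.splitlines():
            comment_queue.put((page_number, line))


def csv_writer(out):
    # Single writer thread, so rows are never interleaved mid-write.
    writer = csv.writer(out)
    while True:
        row = comment_queue.get()
        if row is STOP:
            break
        writer.writerow(row)


out = io.StringIO()
parsers = [threading.Thread(target=parser) for _ in range(NUM_PARSERS)]
writer_thread = threading.Thread(target=csv_writer, args=(out,))
for t in parsers:
    t.start()
writer_thread.start()

# Stand-in for the threaded Requestor stage: enqueue fake pages.
for n in range(3):
    page_queue.put((n, 'comment a\ncomment b'))

# One sentinel per parser; wait for every parser to finish
# before telling the writer to stop.
for _ in parsers:
    page_queue.put(STOP)
for t in parsers:
    t.join()
comment_queue.put(STOP)
writer_thread.join()

print(len(out.getvalue().splitlines()))  # 6
```

Stopping the writer only after all parsers have been joined is the ordering that prevents a stage from shutting down while upstream work is still in flight.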
On 5/20/2017 1:19 AM, dieter wrote:
If your (590) pages are linked together (such that you must fetch
a page to get the following one) and page fetching is the limiting
factor, then this would limit the parallelizability.
The pages are not linked together. The URL requires a page number. If I
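Since each URL depends only on a page number, all 590 URLs can be generated up front and fetched concurrently. One option is concurrent.futures.ThreadPoolExecutor.map(), which returns results in input order even though the requests overlap. In this sketch the URL template, 1-based page numbering, and the fetch() body are hypothetical placeholders; a real version would perform an HTTP request.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical URL pattern; the real site's format isn't shown in the thread.
URL_TEMPLATE = 'https://example.com/comments?page={}'


def fetch(url):
    # Placeholder for a real HTTP request (e.g. urllib.request.urlopen).
    return 'contents of ' + url


urls = [URL_TEMPLATE.format(n) for n in range(1, 591)]

with ThreadPoolExecutor(max_workers=20) as executor:
    # map() yields results in the same order as urls, even though
    # the requests themselves run concurrently.
    pages = list(executor.map(fetch, urls))

print(len(pages))  # 590
```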
Greetings,
I was playing around with a piece of code to remove lowercase letters
and leave behind uppercase letters from a string when I got unexpected
results.
>>> string = 'Whiskey Tango Foxtrot'
>>> list(filter((lambda x: not x.islower()), string))
['W', ' ', 'T', ' ', 'F']
Note the
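The spaces survive because str.islower() returns False for a string with no cased characters, so "not x.islower()" keeps them. Testing for uppercase directly keeps only the capitals. The sketch below is one way to do that; it is not necessarily the fix the thread settled on.

```python
string = 'Whiskey Tango Foxtrot'

# "not x.islower()" also keeps spaces, because ' '.islower() is False.
print(list(filter(lambda x: not x.islower(), string)))  # ['W', ' ', 'T', ' ', 'F']

# Testing for uppercase directly drops the spaces.
print(list(filter(str.isupper, string)))  # ['W', 'T', 'F']
print(''.join(c for c in string if c.isupper()))  # WTF
```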
On 4/26/2016 8:56 PM, Random832 wrote:
what exactly do you mean by property decorators? If you're just
accessing them in a dictionary what's the benefit over having the
values be simple attributes rather than properties?
After considering the feedback I got for sanity checking my code, I've
d