Greetings,
I have a Python 3.6 script on Windows to scrape comment history from a
website. It's currently set up this way:
Requestor (threads) -> list -> Parser (threads) -> queue -> CSVWriter
(single thread)
It takes 15 minutes to process ~11,000 comments.
When I replaced the list with a qu
Christopher Reimer via Python-list wrote:
> Greetings,
>
> I have a Python 3.6 script on Windows to scrape comment history from a
> website. It's currently set up this way:
>
> Requestor (threads) -> list -> Parser (threads) -> queue -> CSVWriter
> (single thread)
>
> It takes 15 minutes to proce
On 8/27/2017 11:54 AM, Peter Otten wrote:
The documentation
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup
says you can make the BeautifulSoup object from a string or file.
Can you give a few more details where the queue comes into play? A small
code sample would be ide
On 2017-08-27 20:35, Christopher Reimer via Python-list wrote:
On 8/27/2017 11:54 AM, Peter Otten wrote:
The documentation
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup
says you can make the BeautifulSoup object from a string or file.
Can you give a few more details w
Christopher Reimer via Python-list wrote:
> On 8/27/2017 11:54 AM, Peter Otten wrote:
>
>> The documentation
>>
>> https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup
>>
>> says you can make the BeautifulSoup object from a string or file.
>> Can you give a few more details wher
On 8/27/2017 1:12 PM, MRAB wrote:
What do you mean by "queue (random order)"? A queue is sequential
order, first-in-first-out.
With 20 threads requesting 20 different pages, they're not going into
the queue in sequential order (i.e., 0, 1, 2, ..., 17, 18, 19) and
coming in at different time
Hello,
I am a (self-taught) Python developer and I write a lot of Python code
every day. I try to do as much unit testing as possible, but I want to get
better at it. I want to write more test cases, especially ones that rely on database
insertions and reads and file IO. Here are my use-cases for test
Hi,
I like Python and follow Python discussions in quite a few places.
I can say that, up to now, the Python mailing lists are awesome.
just make a drop on irc ...
Keep it up guys !
Abdur-Rahmaan Janhangeer,
Mauritius
abdurrahmaanjanhangeer.wordpress.com
On 21 Aug 2017 18:38, "Hamish MacDonald" wrote:
I wanted to
On 2017-08-27 21:35, Christopher Reimer via Python-list wrote:
On 8/27/2017 1:12 PM, MRAB wrote:
What do you mean by "queue (random order)"? A queue is sequential
order, first-in-first-out.
With 20 threads requesting 20 different pages, they're not going into
the queue in sequential order (i
On 8/27/2017 1:31 PM, Peter Otten wrote:
Here's a simple example that extracts titles from generated html. It seems
to work. Does it resemble what you do?
Your example is similar to my code when I'm using a list for the input
to the parser. You have soup_threads and write_threads, but no read_t
On 8/27/2017 1:50 PM, MRAB wrote:
What if you don't sort the list? I ask because it sounds like you're
changing 2 variables (i.e. list->queue, sorted->unsorted) at the same
time, so you can't be sure that it's the queue that's the problem.
If I'm using a list, I'm using a for loop to input ite
Christopher Reimer writes:
> I have 20 read_threads requesting and putting pages into the output
> queue that is the input_queue for the parser.
Given how slow parsing is, you probably want to scrape the pages to
disk files, and then run the parser in parallel processes that read from
the disk.
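
A minimal sketch of that second stage, assuming the pages have already been
saved as *.html files in one directory. The names TitleParser, parse_file and
parse_directory are hypothetical, and the standard library's html.parser
stands in for BeautifulSoup only to keep the example free of dependencies:

```python
import tempfile
from concurrent.futures import ProcessPoolExecutor
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_file(path):
    """Parse one saved page and return (file name, title)."""
    parser = TitleParser()
    parser.feed(Path(path).read_text(encoding="utf-8"))
    return Path(path).name, parser.title


def parse_directory(directory, workers=2):
    """Parse every saved page in `directory` using worker processes."""
    paths = sorted(str(p) for p in Path(directory).glob("*.html"))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as `paths`.
        return list(pool.map(parse_file, paths))


if __name__ == "__main__":
    # Demo data: generated pages instead of real scraped ones.
    with tempfile.TemporaryDirectory() as tmp:
        for n in range(5):
            Path(tmp, f"page{n}.html").write_text(
                f"<html><title>page {n}</title></html>", encoding="utf-8")
        for name, title in parse_directory(tmp):
            print(name, title)
```

The `if __name__ == "__main__":` guard matters on Windows, where worker
processes re-import the main module when they start.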
Christopher Reimer via Python-list wrote:
> On 8/27/2017 1:31 PM, Peter Otten wrote:
>
>> Here's a simple example that extracts titles from generated html. It
>> seems to work. Does it resemble what you do?
> Your example is similar to my code when I'm using a list for the input
> to the parser.
Ah, shoot me. I had a .join() statement on the output queue but not on
the input queue. So the threads for the input queue got terminated
before BeautifulSoup could get started. I went down that same rabbit
hole with CSVWriter the other day. *sigh*
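
For anyone hitting the same problem, here is a rough sketch of the fix, not
the actual script: a three-stage pipeline where the main thread joins every
queue, not only the last one. The function run_pipeline, the queue names and
the generated pages are all made up for illustration, and html.parser stands
in for BeautifulSoup so the example runs without extra packages:

```python
import queue
import threading
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def run_pipeline(num_pages=20, num_threads=4):
    todo = queue.Queue()   # page numbers waiting to be requested
    pages = queue.Queue()  # raw HTML: input queue for the parser
    rows = queue.Queue()   # parsed results: input queue for the writer
    written = []           # stand-in for the CSV file

    def requester():
        while True:
            n = todo.get()
            # Stand-in for the real HTTP request.
            pages.put((n, f"<html><title>page {n}</title></html>"))
            todo.task_done()

    def parser():
        while True:
            n, html = pages.get()
            title_parser = TitleParser()
            title_parser.feed(html)
            rows.put((n, title_parser.title))
            pages.task_done()

    def writer():
        while True:
            written.append(rows.get())
            rows.task_done()

    stages = ((requester, num_threads), (parser, num_threads), (writer, 1))
    for target, count in stages:
        for _ in range(count):
            threading.Thread(target=target, daemon=True).start()

    for n in range(num_pages):
        todo.put(n)

    # Join every queue, upstream first. Skipping any of these lets the
    # main thread finish (and the daemon threads die) with work pending.
    todo.join()
    pages.join()
    rows.join()
    # Pages arrive in arbitrary order, so sort by page number at the end.
    return sorted(written)


if __name__ == "__main__":
    for row in run_pipeline():
        print(row)
```

Each worker puts its result on the next queue before calling task_done(), so
by the time one join() returns, everything is already queued for the next
stage.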
Thanks for everyone's help.
Chris R.
--
h
Anubhav Yadav writes:
> I want to write more test cases, especially ones that rely on database
> insertions and reads and file IO.
Thanks for taking seriously the importance of test cases for your code!
One important thing to recognise is that a unit test is only one type of
test. It tests one unit o