The difference, from my modest knowledge of the scheduler, is the following.
The scenario:

- Users have to be able to import a CSV into one of the tables. CSVs may be big: 8 MB, 40k rows in the worst case.
- Users may do this whenever they want, so concurrency will occur.

Implementation problem: I cannot process the rows in my controller, because responses longer than 5 minutes are timed out by the hosting service (this is PythonAnywhere). OK, that makes sense, so I launch a background process.

Scheduler problems:

- Enabling the scheduler adds overhead to the database; a lot, I might say.
- I have to run the scheduler on the server manually. This is bad because after a few days it becomes unresponsive and I have to kill it and restart it, again manually. PythonAnywhere also stops their servers for 30 minutes once a month for maintenance, so I have to watch for that as well.
- The scheduler writes to the database every few seconds for the workers' heartbeats, in order to know how many workers are available. The more workers, the more overhead.
- Since many users should be able to import at the same time, I have to declare multiple workers beforehand. Even if no one is importing anything, the db is continuously doing I/O operations. I don't know how many users will be importing at the same time, so I declare 3 or 4 workers just in case. So only 4 users can import data at the same time, and the db is written to 4 times every few seconds, all the time.

On top of all this, I have to show a progress bar for each process. So I write the task's run output to indicate the percentage done every 5%; this way I am not writing the percentage every time a row is inserted. Still, a lot more overhead on the db. Not only that: the client's browser has to ask the server for the percentage, so the server has to query the db as well, say every 5 seconds (the progress bar update interval), multiplied by the number of clients importing at the same time.

To sum up, while I am performing an intensive db operation:

- The scheduler is writing heartbeats every few seconds for each available worker.
- Running tasks are writing percentages every 5%.
- The browser is asking the db every 5 seconds for each task's progress.

No wonder it is slower in comparison.

With threads, all the statistics are in memory only while the import is running, and I am not limited to a fixed number of users importing at the same time: threads are launched on demand instead of running all the time.

The times were more or less these, with the same importing function of course:

- Thread with DAL: ~4 min
- Thread with mysql.connector: 2-3 min
- Scheduler: 20-30+ min

Of course I'll stick with the DAL. The scheduler might be good for mailing or maintenance operations, but for importing bulk data, not so much.
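For reference, here is a minimal sketch of what I mean by the thread approach. The names (PROGRESS, start_import, import_csv) are my own illustrations, not web2py APIs, and insert_row stands in for whatever does the actual insert (e.g. db.mytable.insert with the DAL, or a mysql.connector cursor). Progress lives in a plain dict, so the browser's polling never touches the database:

```python
import csv
import io
import threading
import uuid

# In-memory progress table: task_id -> percent complete.
# It exists only while the process is running, so there is no
# heartbeat or bookkeeping traffic hitting the database.
PROGRESS = {}

def import_csv(task_id, csv_file, insert_row):
    """Insert every row, but record progress only every 5%
    instead of on each insert."""
    rows = list(csv.reader(csv_file))
    total = len(rows) or 1          # avoid division by zero on empty files
    last_reported = 0
    for i, row in enumerate(rows, start=1):
        insert_row(row)
        percent = i * 100 // total
        if percent >= last_reported + 5:
            PROGRESS[task_id] = percent
            last_reported = percent
    PROGRESS[task_id] = 100

def start_import(csv_file, insert_row):
    """Launch the import on a daemon thread and return immediately,
    so the controller responds long before the 5-minute timeout."""
    task_id = str(uuid.uuid4())
    PROGRESS[task_id] = 0
    t = threading.Thread(target=import_csv,
                         args=(task_id, csv_file, insert_row),
                         daemon=True)
    t.start()
    return task_id, t

def progress(task_id):
    """What the browser polls every ~5 seconds: a dict lookup,
    no database query involved."""
    return PROGRESS.get(task_id, 0)
```

In a real controller you would return task_id to the client, have an Ajax action call progress(task_id) on the polling interval, and commit the db connection appropriately inside the thread; the sketch leaves those web2py-specific details out.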