Re: convert script awk in python

2021-03-29 Thread Loris Bennett
Michael Torrie  writes:

> On 3/25/21 1:14 AM, Loris Bennett wrote:
>> Does any one have a better approach?
>
> Not as such.  Running a command and parsing its output is a relatively
> common task. Years ago I wrote my own simple python wrapper function
> that would make it easier to run a program with arguments, and capture
> its output.  I ended up using that wrapper many times, which saved a lot
> of time.
>
> When it comes to converting a bash pipeline process to Python, it's
> worth considering that most pipelines seem to involve parsing using
> sed or awk (as yours do), which is way easier to do from Python without
> that kind of pipelining. However, there is a fantastic article I read
> years ago about how generators are python's equivalent to a pipe.
> Anyone wanting to replace a bash script with python should read this:
>
> https://www.dabeaz.com/generators/Generators.pdf
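
A minimal sketch of the kind of wrapper Michael describes above, using only the
standard library (the helper name run_cmd is purely illustrative):

import subprocess

def run_cmd(*args, **kwargs):
    """Run a command, raise on failure, and return stdout as a list of lines."""
    result = subprocess.run(
        args,
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit status
        **kwargs,
    )
    return result.stdout.splitlines()

# Rough equivalent of:  df -h | awk '{print $1}'
for line in run_cmd("df", "-h"):
    print(line.split()[0])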

Thanks for the link - very instructive.
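
For the record, the pipeline-as-generators idea from that PDF boils down to
chaining generators the way a shell chains processes. A rough sketch (file names
and patterns are only examples):

import gzip
from pathlib import Path

def read_lines(paths):
    """Yield lines from a series of (possibly gzipped) log files."""
    for path in paths:
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt") as f:
            yield from f

def grep(pattern, lines):
    """Rough equivalent of `grep pattern` in a shell pipeline."""
    return (line for line in lines if pattern in line)

def field(n, lines):
    """Rough equivalent of `awk '{print $n}'` (1-based field number)."""
    return (line.split()[n - 1] for line in lines)

# Roughly:  grep robots.txt access*.log* | awk '{print $1}'
logs = Path(".").glob("access*.log*")
for host in field(1, grep("robots.txt", read_lines(logs))):
    print(host)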

> Also there's an interesting shell scripting language based on Python
> called xonsh which makes it much easier to interact with processes like
> bash does, but still leveraging Python to process the output.
> https://xon.sh/ .

That looks very interesting, too.

Cheers,

Loris

-- 
This signature is currently under construction.


memory consumption

2021-03-29 Thread Alexey
Hello everyone!
I'm experiencing problems with memory consumption.

I have a class which is doing an ETL job. What's happening inside:
 - fetch existing objects from the DB via SQLAlchemy
 - iterate over the raw data
 - create new / update existing objects
 - commit the changes

Before processing the data I create an internal cache (a dictionary) and store
all existing objects in it.
Every 1 items I do a bulk insert and flush. At the end I run commit.
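
Roughly, the shape of that loop is as follows (the model, session, and field
names here are illustrative only, not the real code):

# Illustrative sketch only: model, session and field names are made up.
from myapp.models import Item        # hypothetical SQLAlchemy model
from myapp.database import db        # hypothetical Flask-SQLAlchemy instance

def run(raw_data, batch_size=1000):
    # Internal cache: all existing objects, keyed by their natural key.
    cache = {obj.key: obj for obj in Item.query.all()}

    new_objects = []
    for i, row in enumerate(raw_data, start=1):
        obj = cache.get(row["key"])
        if obj is None:                      # create a new object
            obj = Item(key=row["key"])
            new_objects.append(obj)
            cache[row["key"]] = obj
        obj.value = row["value"]             # update new/existing object

        if i % batch_size == 0:              # periodic bulk insert + flush
            db.session.bulk_save_objects(new_objects)
            db.session.flush()
            new_objects = []

    db.session.bulk_save_objects(new_objects)
    db.session.commit()                      # final commit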

The problem: before executing, my interpreter process weighs ~100 MB; after the
first run memory increases up to 500 MB, and after the second run it weighs 1 GB.
If I continue to run this class, memory won't increase, so I think it's not a
memory leak, but rather that Python won't release the allocated memory back to
the OS. Maybe I'm wrong.

What I tried after executing:
 - gc.collect()
 - created snapshots with tracemalloc and searched for some garbage, diff =
   snapshot_before_run - snapshot_after_run (a rough sketch of this comparison
   appears below)
 - searched for references with the "objgraph" library to the internal cache
   (the dictionary containing elements from the DB)
 - cleared the cache (dictionary)
 - db.session.expire_all()
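
For reference, the tracemalloc comparison mentioned above looks roughly like
this (a sketch, not the exact code):

import gc
import tracemalloc

from myapp.etl import MyClass        # hypothetical import of the ETL class

tracemalloc.start()

snapshot_before_run = tracemalloc.take_snapshot()
MyClass().run()                      # the ETL task described above
gc.collect()                         # force a collection before measuring
snapshot_after_run = tracemalloc.take_snapshot()

# Largest allocation differences first, grouped by source line.
for stat in snapshot_after_run.compare_to(snapshot_before_run, "lineno")[:20]:
    print(stat)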

This class is a periodic Celery task, so once each worker has executed this class
at least twice, all Celery workers need 1 GB of RAM. Before Celery there was a
cron script and this class was executed via an API call, and the problem was the
same. So no matter how I run it, the interpreter consumes 1 GB of RAM after two
runs.

I see a few solutions to this problem:
1. Execute this class in a separate process (see the sketch below). But I had a
   few errors when the same SQLAlchemy connection was being shared between
   different processes.
2. Restart the Celery worker after executing this task by throwing an exception.
3. Use a separate queue for such tasks, but then the worker will stay idle most
   of the time.
All of this looks like a workaround. Do I have any other options?
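
For option 1, a rough sketch of the usual way to avoid the shared-connection
errors: let the child process create its own app/engine/session, so the task's
memory is returned to the OS when the child exits (module and factory names are
hypothetical):

# Sketch for option 1; create_app and MyClass import paths are hypothetical.
import multiprocessing as mp

def _run_task():
    # Build the app, engine and session inside the child process so that no
    # SQLAlchemy connection is shared with the parent.
    from myapp import create_app
    from myapp.etl import MyClass
    app = create_app()
    with app.app_context():
        MyClass().run()

def run_task_in_subprocess():
    ctx = mp.get_context("spawn")      # "spawn" avoids inheriting DB connections
    proc = ctx.Process(target=_run_task)
    proc.start()
    proc.join()
    # Memory allocated by the task is released when the child process exits.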

I'm using:
Python - 3.6.13
Celery - 4.1.0
Flask-RESTful - 0.3.6
Flask-SQLAlchemy - 2.3.2

Thanks in advance!


Re: Code Formatter Questions

2021-03-29 Thread Matt Wheeler


> On 29 Mar 2021, at 04:45, Cameron Simpson  wrote:
> 
> yapf has many tunings. Worth a look. It is my preferred formatter. By 
> comparison, black is both opinionated and has basically no tuning, 
> something I greatly dislike.

This is not a mark or a vote against yapf (I’ve never used it), but I think
Black’s lack of tuning is one of its greatest strengths.
I’ve found in two teams at two very different companies that implementing
`black --check` in CI made the code review process significantly more pleasant
for both reviewers and reviewees.

I don’t know for sure, but I think the opinionated nature of Black was what 
enabled us to actually get the implementation off the ground (in company 1 
several other attempts to get a code formatter running across the board had 
failed over the years). We completely sidestepped any discussion about which 
features of $tool were going to be enabled or not.

Black has a couple of minor opinions that I disagree with, but the net benefit 
of having everyone producing code in the same style (and importantly no-one 
having to *think* about the style while coding) vastly outweighs any annoyance 
I might have with those minor points, to the extent that I can’t actually 
remember what those disagreements are.


Re: memory consumption

2021-03-29 Thread Lars Liedtke
Hello Alexej,

May I ask, perhaps stupidly, why you care about that in general? Please don't
get me wrong, I don't want to criticize you; this is rather meant to be a
thought-provoking question.
Normally your OS kernel and the Python interpreter get along pretty well,
and when there is free memory to be had, or no real necessity to release
allocated memory, then why force it? Python will release memory when needed
by running the gc.

Have you tried running your task over all the data you have? Did it
crash your system or prevent other processes from having enough memory?
If not: why care?

I know that there can be (good) reasons to care, but as long as your
tasks run fine, without clogging your system, in my opinion there might
be nothing to worry about.

Cheers

Lars

On 29.03.21 at 12:12, Alexey wrote:
> Hello everyone!
> I'm experiencing problems with memory consumption.
>
> I have a class which is doing an ETL job. What's happening inside:
>  - fetch existing objects from the DB via SQLAlchemy
>  - iterate over the raw data
>  - create new / update existing objects
>  - commit the changes
>
> Before processing the data I create an internal cache (a dictionary) and store
> all existing objects in it.
> Every 1 items I do a bulk insert and flush. At the end I run commit.
>
> The problem: before executing, my interpreter process weighs ~100 MB; after
> the first run memory increases up to 500 MB, and after the second run it
> weighs 1 GB. If I continue to run this class, memory won't increase, so I
> think it's not a memory leak, but rather that Python won't release the
> allocated memory back to the OS. Maybe I'm wrong.
>
> What I tried after executing:
>  - gc.collect()
>  - created snapshots with tracemalloc and searched for some garbage, diff =
>    snapshot_before_run - snapshot_after_run
>  - searched for references with the "objgraph" library to the internal cache
>    (the dictionary containing elements from the DB)
>  - cleared the cache (dictionary)
>  - db.session.expire_all()
>
> This class is a periodic Celery task, so once each worker has executed this
> class at least twice, all Celery workers need 1 GB of RAM. Before Celery there
> was a cron script and this class was executed via an API call, and the problem
> was the same. So no matter how I run it, the interpreter consumes 1 GB of RAM
> after two runs.
>
> I see a few solutions to this problem:
> 1. Execute this class in a separate process. But I had a few errors when the
>    same SQLAlchemy connection was being shared between different processes.
> 2. Restart the Celery worker after executing this task by throwing an
>    exception.
> 3. Use a separate queue for such tasks, but then the worker will stay idle
>    most of the time.
> All of this looks like a workaround. Do I have any other options?
>
> I'm using:
> Python - 3.6.13
> Celery - 4.1.0
> Flask-RESTful - 0.3.6
> Flask-SQLAlchemy - 2.3.2
>
> Thanks in advance!

-- 
---
punkt.de GmbH
Lars Liedtke
.infrastructure

Kaiserallee 13a 
76133 Karlsruhe

Tel. +49 721 9109 500
https://infrastructure.punkt.de
i...@punkt.de

AG Mannheim 108285
Managing directors: Jürgen Egeling, Daniel Lienert, Fabian Stein



Re: memory consumption

2021-03-29 Thread Alexey
Hello Lars!
Thanks for your interest.

The problem appears when all Celery workers
require 1 GB of RAM each in the idle state. They
hold this memory constantly, and when they do
something useful they grab even more memory. I
think 8 GB+ in the idle state is quite a lot for my
app.

> Did it crash your system or prevent other
> processes from having enough memory?
Yes. Moreover, sometimes the corporate watchdog
just kills my app.


Re: memory consumption

2021-03-29 Thread Julio Oña
It looks like the problem is in Celery.
The mentioned issue is still open, so I'm not sure whether it was corrected.

https://manhtai.github.io/posts/memory-leak-in-celery/

Julio


On Mon, Mar 29, 2021 at 08:31, Alexey (zen.supag...@gmail.com) wrote:

> Hello Lars!
> Thanks for your interest.
>
> The problem appears when all Celery workers
> require 1 GB of RAM each in the idle state. They
> hold this memory constantly, and when they do
> something useful they grab even more memory. I
> think 8 GB+ in the idle state is quite a lot for my
> app.
>
> > Did it crash your system or prevent other
> > processes from having enough memory?
> Yes. Moreover, sometimes the corporate watchdog
> just kills my app.


Re: memory consumption

2021-03-29 Thread Alexey
On Monday, March 29, 2021 at 15:57:43 UTC+3, Julio Oña wrote:
> It looks like the problem is in Celery.
> The mentioned issue is still open, so I'm not sure whether it was corrected.
> 
> https://manhtai.github.io/posts/memory-leak-in-celery/ 

As I mentioned in my first message, I tried to run
this task (class) via Flask API calls, without Celery,
and the results are the same. The Flask worker receives the API call and
executes MyClass().run() inside a view. After a few calls
the worker size increases to 1 GB of RAM. In production I have 8 workers,
so when idle they will hold 8 GB.
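
For context, the view in question has roughly this shape (route and import
names are illustrative only):

# Illustrative only; the route and import paths are not the real ones.
from flask import Flask, jsonify
from myapp.etl import MyClass   # hypothetical import of the ETL class

app = Flask(__name__)

@app.route("/run-etl", methods=["POST"])
def run_etl():
    MyClass().run()             # same ETL class as in the Celery task
    return jsonify(status="done")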


Re: memory consumption

2021-03-29 Thread Stestagg
On Mon, Mar 29, 2021 at 2:32 PM Alexey  wrote:

> On Monday, March 29, 2021 at 15:57:43 UTC+3, Julio Oña wrote:
> > It looks like the problem is in Celery.
> > The mentioned issue is still open, so I'm not sure whether it was corrected.
> >
> > https://manhtai.github.io/posts/memory-leak-in-celery/
>
> As I mentioned in my first message, I tried to run
> this task (class) via Flask API calls, without Celery,
> and the results are the same. The Flask worker receives the API call and
> executes MyClass().run() inside a view. After a few calls
> the worker size increases to 1 GB of RAM. In production I have 8 workers,
> so when idle they will hold 8 GB.
>


Memory statistics in modern OSs are incredibly complex to reason about.
It's /possible/ that, while it certainly looks bad in the monitoring tools,
there actually isn't a problem here.

Some questions here to help understand more:

1. Do you have any actual problems caused by running 8 celery workers
(beyond high memory reports)? What are they?
2. Can you try a test with 16 or 32 active workers (i.e. number of
workers=2x available memory in GB), do they all still end up with 1gb
usage? or do you get any other memory-related issues running this?

Thanks

Steve





Re: memory consumption

2021-03-29 Thread Alexey
On Monday, March 29, 2021 at 17:19:02 UTC+3, Stestagg wrote:
> On Mon, Mar 29, 2021 at 2:32 PM Alexey  wrote: 

> Some questions here to help understand more: 
>
> 1. Do you have any actual problems caused by running 8 celery workers 
> (beyond high memory reports)? What are they? 
No. Everything works fine.

> 2. Can you try a test with 16 or 32 active workers (i.e. number of 
> workers=2x available memory in GB), do they all still end up with 1gb 
> usage? or do you get any other memory-related issues running this? 
Yes. They will consume 1 GB each. It doesn't matter how many workers I have;
they behave exactly the same. We can even forget about Flask and Celery:
if I run this code in a Python console, the behavior remains the same.



Re: memory consumption

2021-03-29 Thread Dieter Maurer
Alexey wrote at 2021-3-29 06:26 -0700:
>On Monday, March 29, 2021 at 15:57:43 UTC+3, Julio Oña wrote:
>> It looks like the problem is in Celery.
>> The mentioned issue is still open, so I'm not sure whether it was corrected.
>>
>> https://manhtai.github.io/posts/memory-leak-in-celery/
>
>As I mentioned in my first message, I tried to run
>this task (class) via Flask API calls, without Celery,
>and the results are the same. The Flask worker receives the API call and
>executes MyClass().run() inside a view. After a few calls
>the worker size increases to 1 GB of RAM. In production I have 8 workers,
>so when idle they will hold 8 GB.

Depending on your system (this works for `glibc` systems),
you can instruct the memory management via the environment variable
`MALLOC_ARENA_MAX` to use a common memory pool (called an "arena")
for all threads.
It is known that this can drastically reduce memory consumption
in multi-threaded systems.
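
A sketch of one way to pass it; the variable must be present in the environment
before the worker process starts, and the value 2 and the worker command line
are only examples:

# Sketch only: the value and the command line are examples, not recommendations.
import os
import subprocess

# glibc reads MALLOC_ARENA_MAX when the process starts, so it must be in the
# environment of the worker process, not set afterwards from inside Python.
env = dict(os.environ, MALLOC_ARENA_MAX="2")
subprocess.run(["celery", "-A", "myapp", "worker"], env=env)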


Re: memory consumption

2021-03-29 Thread Stestagg
> > 2. Can you try a test with 16 or 32 active workers (i.e. number of
> > workers=2x available memory in GB), do they all still end up with 1gb
> > usage? or do you get any other memory-related issues running this?
> Yes. They will consume 1 GB each. It doesn't matter how many workers I have;
> they behave exactly the same. We can even forget about Flask and Celery:
> if I run this code in a Python console, the behavior remains the same.
>
>
Woah, funky, so you got to a situation where your workers were allocating
2x more RAM than your system had available? And they were still working?
Were you hitting lots of swap?

If there was no big swap load, then it looks like there is no problem; it's just
that the metrics you're looking at aren't saying what they appear to be.





Re: Code Formatter Questions

2021-03-29 Thread dn via Python-list
On 29/03/2021 23.15, Matt Wheeler wrote:
>> On 29 Mar 2021, at 04:45, Cameron Simpson  wrote:
>>
>> yapf has many tunings. Worth a look. It is my preferred formatter. By 
>> comparison, black is both opinionated and has basically no tuning, 
>> something I greatly dislike.
> 
> This is not a mark or a vote against yapf (I’ve never used it), but I think
> Black’s lack of tuning is one of its greatest strengths.
> I’ve found in two teams at two very different companies that implementing
> `black --check` in CI made the code review process significantly more pleasant
> for both reviewers and reviewees.
> 
> I don’t know for sure, but I think the opinionated nature of Black was what 
> enabled us to actually get the implementation off the ground (in company 1 
> several other attempts to get a code formatter running across the board had 
> failed over the years). We completely sidestepped any discussion about which 
> features of $tool were going to be enabled or not.
> 
> Black has a couple of minor opinions that I disagree with, but the net 
> benefit of having everyone producing code in the same style (and importantly 
> no-one having to *think* about the style while coding) vastly outweighs any 
> annoyance I might have with those minor points, to the extent that I can’t 
> actually remember what those disagreements are.


+1 (although not for Black - sorry)

Very good point: I'd much rather you spent time helping me with a
design/coding problem, helping debug, and/or reviewing/improving my code
(and I for you), than have no time left over after spending many
hours and much mental energy arguing about whether this format is [more]
right than that!
-- 
Regards,
=dn