Re: Counting Python threads vs C/C++ threads

2019-07-17 Thread Barry Scott



> On 16 Jul 2019, at 20:48, Dan Stromberg  wrote:
> 
> 
> 
> On Tue, Jul 16, 2019 at 11:13 AM Barry Scott wrote:
> I'm going to assume you are on linux.
> Yes, I am.  Ubuntu 16.04.6 LTS sometimes, Mint 19.1 other times.
> 
> On 16 Jul 2019, at 18:35, Dan Stromberg wrote:
> > 
> > I'm looking at a performance problem in a large CPython 2.x/3.x codebase
> > with quite a few dependencies.
> > 
> > I'm not sure what's causing the slowness yet.  The CPU isn't getting hit
> > hard, and I/O on the system appears to be low - but throughput is poor.
> > I'm wondering if it could be CPU-bound Python threads causing the problem
> > (because of the threading+GIL thing).
> 
> Does top show the process using 100% CPU?
> Nope.  CPU utilization and disk use are both low.

Then your problem is latency. You need to find the slow operation.

> We've been going into top, and then hitting '1' to see things broken down by 
> CPU core (there are 32 of them, probably counting hyperthreads as different 
> cores), but the CPU use is in the teens or so.
> 
> I've also tried dstat and csysdig.  The hardware isn't breaking a sweat, but 
> throughput is poor.

> > The non-dependency Python portions don't appear to have much in the way of
> > threading going on based on a quick grep, but csysdig says a process
> > running the code has around 32 threads running - the actual thread count
> > varies, but that's the ballpark.
> > 
> > I'm wondering if there's a good way to find two counts of those threads -
> > how many are from CPython code that could run afoul of the GIL, and how
> > many of them are from C/C++ extension modules that wouldn't be responsible
> > for a GIL issue.
> 
> From the docs on threading:
> 
> threading.active_count()
> 
> Return the number of Thread objects currently alive. The returned count is
> equal to the length of the list returned by enumerate().
> 
> Are you on a Mac?

Oops, that was a file: link. Sorry, I should have searched the online docs.

I use many operating systems: Fedora, macOS, Windows, NetBSD, CentOS and others 
in the past.

> 
> https://docs.python.org/2/library/threading.html appears to have some good
> info. I'll probably try logging threading.active_count()
> 
> A question arises though: Does threading.active_count() only show Python 
> threads created with the threading module?  What about threads created with 
> the thread module?

Only Python's threads. If you think about it, why would Python care about
threads it does not control?


> 
> Try running strace on the process to see what system calls its making.
> I've tried it, but thank you.  It's a good suggestion.
> 
> I often find that when strace'ing a program, there's a bunch of 
> mostly-irrelevant stuff at Initial Program Load (IPL), but then the main loop 
> falls into a small cycle of system calls.

And what are those system calls, and what is their timing?
If you are using select/poll, how long is it before the call returns?
If you are in a read, how long is it before it returns?

> 
> Not with this program.  Its main loop is busy and large.

Does the code log any metrics or telemetry to help you?
I work on a product that produces time-series data to show key information 
about the service.
TPS, cache hit rates etc.

I should have mentioned before that you can run the code under Python's cProfile.

Do a test run against the process, then analyse the data that cProfile
produces to find the elapsed times and CPU times of the code.
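
A minimal sketch of that workflow, using only the standard library. The function names (slow_operation, busy_operation, main) are made up for illustration and stand in for the real workload. Note that cProfile times with a wall-clock timer by default, so time spent waiting shows up too, and that a Profile object only records the thread it was enabled in.

```python
import cProfile
import io
import pstats
import time


def slow_operation():
    # Hypothetical stand-in for a high-latency call: it waits without using CPU.
    time.sleep(0.05)


def busy_operation():
    # Hypothetical stand-in for CPU-bound work.
    return sum(i * i for i in range(200000))


def main():
    slow_operation()
    busy_operation()


# Profile one run of main() in the current thread.
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Sort by cumulative time so the slowest call chains come first.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats('cumulative').print_stats(10)
print(report.getvalue())
```

For a whole program, "python -m cProfile -o out.prof script.py" followed by pstats.Stats('out.prof') gives the same kind of report.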

> 
> You could also connect gdb to the process and find out what code the threads 
> are running.
> 
> I used to use gdb, and wrappers for gdb, when I was doing C code, but I don't 
> have much experience using it on a CPython interpreter.
> 
> Would I be doing a "thread apply all bt" or what?  I'm guessing those 
> backtraces could facilitate identifying the origin of a thread.

Yes, thread apply all bt works great on a Python process. Recent gdb releases
know how to format the stack and show you the Python stack.
I forget the command, but it's easy to google for.

Barry


> 
> Thanks a bunch.
> 

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Counting Python threads vs C/C++ threads

2019-07-17 Thread Thomas Jollans
On 17/07/2019 09.58, Barry Scott wrote:
>
>> On 16 Jul 2019, at 20:48, Dan Stromberg  wrote:
>>
>>
>>  
>> A question arises though: Does threading.active_count() only show Python 
>> threads created with the threading module?  What about threads created with 
>> the thread module?
> Only pythons threads, if you think about it why would python care about 
> threads it does not control?


As the docs say, this counts threading.Thread objects. It does not count
all threads started from Python: threads started with the _thread
module, for instance, are not included.

What is more, threads started in external libraries can acquire the GIL
and run Python code. A great example of this are QThreads in a PyQt5
application: QThreads are started by the Qt runtime, which calls a Qt
slot. This Qt slot then might happen to be implemented in Python. I'm
sure other libraries do similar things.

Example with _thread just to check active_count behaviour:

#!/usr/bin/env python3

import threading
import _thread
import time

def thread_func(i):
    print('Starting thread', i)
    time.sleep(0.5)
    print('Thread done', i)

print('Using threading.Thread')
t1 = threading.Thread(target=thread_func, args=(1,))
t1.start()
time.sleep(0.1)
print('active threads:', threading.active_count())
t1.join()


print('Using threading & _thread')
t1 = threading.Thread(target=thread_func, args=(1,))
t1.start()
t2_id = _thread.start_new_thread(thread_func, (2,))
time.sleep(0.1)
print('active threads:', threading.active_count())
time.sleep(0.6)
print('Done, hopefully')
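
Relating this back to the original question, here is a rough sketch of taking three counts from inside the process. It assumes Linux for the /proc/self/task listing (the third count is None elsewhere), and the helper name count_threads is made up for illustration. sys._current_frames() reports threads that are currently executing Python frames, so it is only a snapshot: a C/C++ thread that occasionally calls into Python may or may not appear in it.

```python
import _thread
import os
import sys
import threading
import time


def count_threads():
    """Return (threading_count, python_frame_count, os_count)."""
    # Only threading.Thread objects (plus the main thread).
    threading_count = threading.active_count()
    # One entry per thread that currently has a Python frame,
    # including threads started through _thread.
    python_frame_count = len(sys._current_frames())
    # Linux-specific assumption: one entry per kernel thread of this process.
    task_dir = '/proc/self/task'
    os_count = len(os.listdir(task_dir)) if os.path.isdir(task_dir) else None
    return threading_count, python_frame_count, os_count


def worker():
    time.sleep(0.5)


# A thread that threading.active_count() does not know about.
_thread.start_new_thread(worker, ())
time.sleep(0.1)

counts = count_threads()
print('threading / Python frames / OS threads:', counts)
```

The gap between the last two numbers is a rough upper bound on threads that were not running Python code at that instant, which is where pure C/C++ worker threads would show up.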





Why an InitVar pseudo field in dataclasses cannot have a default_factory?

2019-07-17 Thread Jacobo de Vera
Hi all,

I was surprised by an error when trying to set a default_factory for an
InitVar pseudo-field in a dataclass. Inspecting the code in dataclasses.py
led me to this:

# Special restrictions for ClassVar and InitVar.
if f._field_type in (_FIELD_CLASSVAR, _FIELD_INITVAR):
    if f.default_factory is not MISSING:
        raise TypeError(f'field {f.name} cannot have a '
                        'default factory')
    # Should I check for other field settings? default_factory
    # seems the most serious to check for.  Maybe add others.  For
    # example, how about init=False (or really,
    # init=)?  It makes no sense for
    # ClassVar and InitVar to specify init=.

So this case is very explicitly prevented but I could not see why. Does
anybody know what problem this is trying to prevent or what is the
rationale behind this restriction?
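
A small sketch that reproduces the error and shows one common workaround. The class and field names (Broken, Workaround, items, total) are made up for illustration; the workaround of defaulting the InitVar to None and building the value in __post_init__ is one possible approach, not something the dataclasses documentation prescribes for this case.

```python
from dataclasses import InitVar, dataclass, field
from typing import List, Optional

# Reproduce the restriction: the TypeError is raised while the
# @dataclass decorator processes the class body.
error_message = None
try:
    @dataclass
    class Broken:
        items: InitVar[List[int]] = field(default_factory=list)
except TypeError as exc:
    error_message = str(exc)

print(error_message)


# Possible workaround: default to None and create the fresh object
# in __post_init__, which receives InitVar values as arguments.
@dataclass
class Workaround:
    total: int = field(init=False)
    items: InitVar[Optional[List[int]]] = None

    def __post_init__(self, items):
        if items is None:
            items = []
        self.total = sum(items)


print(Workaround([1, 2]).total, Workaround().total)
```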

I asked on Stack Overflow[1] and it was suggested that I ask here.

[1] https://stackoverflow.com/questions/57056029

Thanks,
Jacobo de Vera
@jovianjake


Re: OT: Is there a name for this transformation?

2019-07-17 Thread kamaraju kusumanchi
On Wed, Jul 10, 2019 at 3:08 PM Peter J. Holzer  wrote:
>
> On 2019-07-10 08:57:29 -0400, kamaraju kusumanchi wrote:
> > Given a csv file with the following contents
> >
> > 20180701, A
> > 20180702, A, B
> > 20180703, A, B, C
> > 20180704, B, C
> > 20180705, C
> >
> > I would like to transform the underlying data into a dataframe such as
> >
> > date, A, B, C
> > 20180701,  True, False, False
> > 20180702,  True,  True, False
> > 20180703,  True,  True,  True
> > 20180704, False,  True,  True
> > 20180705, False, False,  True
> >
> > the idea is that the first field in each line of the csv is the row
> > index of the dataframe. The subsequent fields will be its column names
> > and the values in the dataframe tell whether that element is present
> > or not in the line.
> >
> > Is there a name for this transformation?
>
> This type of output is usually called a cross table, but I don't know
> whether this specific transformation has a name (if you had only one of
> A, B, and C per line it would be a kind of pivot operation).

Thanks for telling me about cross table. I found out about
cross-tabulation functionality in Pandas using pandas.crosstab() which
is described in
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.crosstab.html

As for my original problem, I solved it as follows:

$ cat data.csv
20180701, A
20180702, A, B
20180703, A, B, C
20180704, B, C
20180705, C


import pandas as pd
import numpy as np

# expand the data into two numpy arrays such as
# a = np.array(['20180701', '20180702', '20180702', '20180703',
#               '20180703', '20180703', '20180704', '20180704', '20180705'])
# b = np.array(['A', 'A', 'B', 'A', 'B', 'C', 'B', 'C', 'C'])

rows = []
cols = []

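with open('data.csv') as fo:
    for line in fo:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # strip each field so ' A' and 'A' count as the same column
        elem = [field.strip() for field in line.split(',')]
        N = len(elem)
        rows += elem[0:1] * (N-1)
        cols += elem[1:]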

a = np.array(rows)
b = np.array(cols)

df = pd.crosstab(a, b, rownames=['date']).astype('bool').reset_index()

which gives

print(df)
col_0  date  A  B  C
0  20180701   True  False  False
1  20180702   True   True  False
2  20180703   True   True   True
3  20180704  False   True   True
4  20180705  False  False   True
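
For comparison, the same presence/absence table can be sketched without pandas. This is only an illustration of the transformation itself: the data is inlined via io.StringIO instead of being read from data.csv, and the names rows, columns and table are arbitrary.

```python
import csv
import io

data = """\
20180701, A
20180702, A, B
20180703, A, B, C
20180704, B, C
20180705, C
"""

# Map each date to the set of labels present on its line.
rows = {}
for record in csv.reader(io.StringIO(data), skipinitialspace=True):
    if record:
        rows[record[0]] = set(record[1:])

# Every label seen anywhere becomes a column.
columns = sorted(set().union(*rows.values()))

# One list of booleans per date, in column order.
table = {date: [name in present for name in columns]
         for date, present in rows.items()}

print('date', *columns)
for date, flags in table.items():
    print(date, *flags)
```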

-- 
Kamaraju S Kusumanchi | http://raju.shoutwiki.com/wiki/Blog


Embedding Python in C

2019-07-17 Thread jesse . ibarra . 1996
I am using Python3.6:

[jibarra@redsky ~]$ python3.6
Python 3.6.8 (default, Apr 25 2019, 21:02:35) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Type "help", "copyright", "credits" or "license" for more information.


I am referencing:
https://docs.python.org/3.6/extending/embedding.html#beyond-very-high-level-embedding-an-overview

Is there a way to call a shared C lib using PyObjects?

Please advise.

Thank you.


Re: Embedding Python in C

2019-07-17 Thread Barry Scott



> On 17 Jul 2019, at 16:57, jesse.ibarra.1...@gmail.com wrote:
> 
> I am using Python3.6:
> 
> [jibarra@redsky ~]$ python3.6
> Python 3.6.8 (default, Apr 25 2019, 21:02:35) 
> [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> 
> 
> I am 
> referencing:https://docs.python.org/3.6/extending/embedding.html#beyond-very-high-level-embedding-an-overview
> 
> Is there a way to call a shared C lib using PyObjects?

If what you want to call is simple enough then you can use the ctypes library
that ships with python.
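
As a rough illustration of the ctypes route, the sketch below calls strlen from the C library. It assumes a POSIX system; strlen is chosen only because it is small and available everywhere, and a real shared library would be loaded by its own path instead.

```python
import ctypes
import ctypes.util

# Assumption: POSIX. If find_library() returns None, CDLL(None) still
# exposes the symbols already loaded into the process, including libc.
libc_path = ctypes.util.find_library('c')
libc = ctypes.CDLL(libc_path)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b'hello')
print(length)
```

Declaring argtypes and restype up front lets ctypes check and convert arguments instead of guessing, which matters once the functions take pointers or return anything other than int.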

If the code you want to call is more complex you will want to use one of a 
number of libraries to help
you create a module that you can import.

I use PyCXX for this purpose. It allows me to write C++ code that can call C++
and C libs and interface easily with Python. Home page:
http://cxx.sourceforge.net/
The source kit contains demo code that shows how to create a module, a class,
a function, etc.

Example code: 
https://sourceforge.net/p/cxx/code/HEAD/tree/trunk/CXX/Demo/Python3/simple.cxx 


Barry
PyCXX maintainer

> 
> Please advise.
> 
> Thank you.



Re: Embedding Python in C

2019-07-17 Thread Jesse Ibarra
On Wednesday, July 17, 2019 at 11:55:28 AM UTC-6, Barry Scott wrote:
> > On 17 Jul 2019, at 16:57,  wrote:
> > 
> > I am using Python3.6:
> > 
> > [jibarra@redsky ~]$ python3.6
> > Python 3.6.8 (default, Apr 25 2019, 21:02:35) 
> > [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
> > Type "help", "copyright", "credits" or "license" for more information.
> > 
> > 
> > I am 
> > referencing:https://docs.python.org/3.6/extending/embedding.html#beyond-very-high-level-embedding-an-overview
> > 
> > Is there a way to call a shared C lib using PyObjects?
> 
> If what you want to call is simple enough then you can use the ctypes library
> that ships with python.
> 
> If the code you want to call is more complex you will want to use one of a 
> number of libraries to help
> you create a module that you can import.
> 
> I use PyCXX for this purpose. It allows me to write C++ code that can call
> C++ and C libs and interface easily with Python. Home page:
> http://cxx.sourceforge.net/
> The source kit contains demo code that shows how to create a module, a
> class, a function, etc.
> 
> Example code: 
> https://sourceforge.net/p/cxx/code/HEAD/tree/trunk/CXX/Demo/Python3/simple.cxx
>  
> 
> 
> Barry
> PyCXX maintainer
> 
> > 
> > Please advise.
> > 
> > Thank you.

My options seem rather limited, I need to make a Pipeline from (Smalltalk -> C 
-> Python) then go back (Smalltalk <- C <- Python). Since Smalltalk does not 
support Python directly I have to settle with the C/Python API 
(https://docs.python.org/3.6/extending/embedding.html#beyond-very-high-level-embedding-an-overview).
 Any suggestions?


Re: Embedding classes' names

2019-07-17 Thread DL Neil

On 16/07/19 10:57 PM, Cameron Simpson wrote:

On 16Jul2019 10:20, Chris Angelico  wrote:
On Tue, Jul 16, 2019 at 10:17 AM DL Neil wrote:
When used, do you embed a class's name within its own code, as a literal?

[...]

So, what about other situations where one might need to access the
class's own name or that of its/a super-class? eg

class C2(C1):
    def __init__(self, fred, barney ):
        super().__init__( fred )
        self.barney = barney

    def __repr__( self ):
        return f"C2( { self.fred }, { self.barney } )"
        ### note: 'common practice' of "C2" embedded as constant


How 'purist' do you go, cf YAGNI?


In the case of __repr__, I would most definitely use
self.__class__.__name__, because that way, a subclass can leave repr
untouched and still get decent behaviour.


Yeah, me too, though I spell it "type(self).__name__" for no totally 
rational reason.
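
A short sketch of why this matters, reusing the C1/C2 names from the example above and adding a hypothetical subclass C3. Because the class name is looked up at run time, C3 inherits __repr__ unchanged and still reports its own name.

```python
class C1:
    def __init__(self, fred):
        self.fred = fred

    def __repr__(self):
        # type(self) is the actual class of the instance, not C1.
        return f"{type(self).__name__}({self.fred!r})"


class C2(C1):
    def __init__(self, fred, barney):
        super().__init__(fred)
        self.barney = barney

    def __repr__(self):
        return f"{type(self).__name__}({self.fred!r}, {self.barney!r})"


class C3(C2):
    # Hypothetical subclass: no __repr__ of its own.
    pass


print(repr(C2(1, 2)))
print(repr(C3(1, 2)))
```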


+1
(looking through my code - perhaps at one time it was 'the old way'?)

--
Regards =dn


Re: Embedding Python in C

2019-07-17 Thread Christian Gollwitzer

Am 17.07.19 um 20:39 schrieb Jesse Ibarra:

My options seem rather limited, I need to make a Pipeline from (Smalltalk -> C -> 
Python) then go back (Smalltalk <- C <- Python). Since Smalltalk does not support 
Python directly I have to settle with the C/Python API 
(https://docs.python.org/3.6/extending/embedding.html#beyond-very-high-level-embedding-an-overview).
 Any suggestions?



Ah, now you finally tell us your problem!

Depending on how complete / advanced / efficient the bridge needs to 
be, it can be easy or hard.


What level of integration do you want to achieve? Do you want

a) to call Python functions from Smalltalk
b) call Smalltalk functions from Python
c) pass callbacks around, e.g. use a Smalltalk function within a Python 
list comprehension, and if so, which way
d) integrate the class systems - derive a Python class from a Smalltalk 
base or the other way round


e) ?


The most basic thing is a), but even getting that right might be 
non-trivial, since both C APIs will have different type systems which 
you need to match. I don't speak Smalltalk, so can't comment in detail 
on this - but in practice it will also depend on the implementation you 
are using.


Best regards,

Christian



Re: super() in Python 3

2019-07-17 Thread DL Neil

On 16/07/19 10:08 PM, אורי wrote:

Hi,

1. When we use super() in Python 3, we don't pass it the first argument
(self). Why?

What happens if the first argument is not self?

def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)

I think it would make more sense to use something like
self.super().__init__(*args, **kwargs) or something like this.


NB folk coming to Python from other (programming) languages are often 
surprised to discover that a sub-class does not automatically execute 
the initialiser/constructor of its parent. The extra control offered is 
(IMHO) both subtle and powerful!



I'm not sure about this (and perhaps better minds will clarify):
isn't self about an instance, whereas super() is about a class?


Another way to look at it is to refer to the super-class as the 
'parent'. Thinking of yourself, do/did you address him as "my father" 
(self.father) or her as "my mother", or is the possessive description 
(the "my"), unnecessary?


Whilst it wouldn't be at all wrong to address him as "my father", in 
English (and/or in other tongues), what happened when you tried it in 
Python?



Perhaps then, we only use the "my" descriptor when we need to 
distinguish between multiple parents, eg yours cf mine? Even with 
multiple-inheritance*, Python's "MRO" saves us from needing to do that!

* yes Python can, even if xyz-other-language couldn't!
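
To make the MRO point concrete, here is a small sketch with made-up class names (Base, Left, Right, Child). Zero-argument super() is shorthand for super(CurrentClass, first_argument): the compiler supplies the enclosing class, and the instance is taken from the method's first parameter, whatever it is called. That is why self is not passed explicitly.

```python
class Base:
    def __init__(self):
        self.calls = ['Base']


class Left(Base):
    def __init__(self):
        super().__init__()              # zero-argument form
        self.calls.append('Left')


class Right(Base):
    def __init__(self):
        super(Right, self).__init__()   # equivalent explicit form
        self.calls.append('Right')


class Child(Left, Right):
    def __init__(self):
        super().__init__()
        self.calls.append('Child')


child = Child()
mro_names = [cls.__name__ for cls in Child.__mro__]
print(mro_names)
print(child.calls)
```

Inside Left.__init__, super() does not necessarily mean Base: for a Child instance the next class in the MRO after Left is Right, so each initialiser runs exactly once.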


(Python's ability to track all this sounds like something from the xkcd 
Comic: with Python you will never again cry, "I've lost my mummy"...)

--
Regards =dn


join and split with empty delimiter

2019-07-17 Thread Irv Kalb
I have always thought that split and join are opposite functions.  For example, 
you can use a comma as a delimiter:

>>> myList = ['a', 'b', 'c', 'd', 'e']
>>> myString = ','.join(myList)
>>> print(myString)
a,b,c,d,e

>>> myList = myString.split(',')
>>> print(myList)
['a', 'b', 'c', 'd', 'e']

Works great. But I've found a case where they don't work that way.  If I join 
the list with the empty string as the delimiter:

>>> myList = ['a', 'b', 'c', 'd']
>>> myString = ''.join(myList)
>>> print(myString)
abcd

That works great.  But attempting to split using the empty string generates an 
error:

>>> myString.split('')
Traceback (most recent call last):
  File "", line 1, in 
myString.split('')
ValueError: empty separator

I know that this can be accomplished using the list function:

>>> myString = list(myString)
>>> print(myString)
['a', 'b', 'c', 'd']

But my question is:  Is there any good reason why the split function should 
give an "empty separator" error?  I think the meaning of trying to split a 
string into a list using the empty string as a delimiter is unambiguous - it 
should just create a list of single-character strings like the list function 
does here.  

My guess is that by definition, the split function attempts to separate the 
string wherever it finds the delimiter between characters, and because in this 
case it's the empty string, it gives an error.  But if it's going to check for 
the empty string anyway, it could just call the list function and return a list 
of characters.

Irv


Re: join and split with empty delimiter

2019-07-17 Thread Chris Angelico
On Thu, Jul 18, 2019 at 7:06 AM Irv Kalb  wrote:
> If I join the list with the empty string as the delimiter:
>
> >>> myList = ['a', 'b', 'c', 'd']
> >>> myString = ''.join(myList)
> >>> print(myString)
> abcd
>
> That works great.  But attempting to split using the empty string generates 
> an error:
>
> >>> myString.split('')
> Traceback (most recent call last):
>   File "", line 1, in 
> myString.split('')
> ValueError: empty separator
>
> But my question is:  Is there any good reason why the split function should 
> give an "empty separator" error?  I think the meaning of trying to split a 
> string into a list using the empty string as a delimiter is unambiguous - it 
> should just create a list of single characters strings like the list function 
> does here.
>

Agreed. There are a number of other languages where splitting on an
empty delimiter simply fractures the string into characters (I checked
Pike, JavaScript, Tcl, and Ruby), and it's a practical and useful
feature. +1.

ChrisA


Re: join and split with empty delimiter

2019-07-17 Thread Tim Daneliuk
On 7/17/19 4:24 PM, Chris Angelico wrote:
> Agreed. There are a number of other languages where splitting on an
> empty delimiter simply fractures the string into characters (I checked
> Pike, JavaScript, Tcl, and Ruby), and it's a practical and useful
> feature. +1.

Not only that, it makes the language more symmetric/consistent. Put
me down for +1 as well.


Re: join and split with empty delimiter

2019-07-17 Thread MRAB

On 2019-07-18 00:24, Tim Daneliuk wrote:

On 7/17/19 4:24 PM, Chris Angelico wrote:

Agreed. There are a number of other languages where splitting on an
empty delimiter simply fractures the string into characters (I checked
Pike, JavaScript, Tcl, and Ruby), and it's a practical and useful
feature. +1.


Not only that, it makes the language more symmetric/consistent. Put
me down for +1 as well.

Since the fix in the re module in Python 3.7, it can split on an empty 
string:


>>> import re
>>> re.split('', 'abc')
['', 'a', 'b', 'c', '']

which gives us the chance to bikeshed on whether str.split should do the 
same.


(In case you're wondering, there _is_ an empty string (the delimiter) 
before the first character and after the last character.)
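
The behaviours discussed in this thread can be put side by side in a short sketch. It assumes Python 3.7 or later for the re.split result; the variable names are arbitrary.

```python
import re

text = 'abcd'

# list() is the existing way to break a string into characters,
# and ''.join() reverses it.
chars = list(text)
round_trip = ''.join(chars)

# str.split rejects an empty separator.
try:
    text.split('')
    message = None
except ValueError as exc:
    message = str(exc)

# re.split (3.7+) also matches the empty string at both ends,
# so the empty pieces are filtered out here.
pieces = [piece for piece in re.split('', text) if piece]

print(chars, round_trip, message, pieces)
```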



Re: Embedding Python in C

2019-07-17 Thread dieter
Jesse Ibarra  writes:
> ...
> My options seem rather limited, I need to make a Pipeline from (Smalltalk -> 
> C -> Python) then go back (Smalltalk <- C <- Python). Since Smalltalk does 
> not support Python directly I have to settle with the C/Python API 
> (https://docs.python.org/3.6/extending/embedding.html#beyond-very-high-level-embedding-an-overview).
>  Any suggestions?

Decades ago, I implemented something similar
(Chomsky <-> C/C++ <-> Python). Thus, it is possible.
Ensure that you acquire the GIL when you enter the Python world
(almost all Python API functions have the implicit precondition
that the GIL is held), and release it when you leave the
Python world. I forget whether it was necessary to initialize
Python as a whole (likely, it is).

You might also consider a looser coupling.
In my scenario above, the various components could be integrated
via CORBA (= "Common Object Request Broker Architecture").
