Re: ctypes, memory mapped files and context manager
Hans-Peter Jansen wrote:

> On Wednesday, 28 December 2016 16:53:53 Hans-Peter Jansen wrote:
>> On Wednesday, 28 December 2016 15:17:22 Hans-Peter Jansen wrote:
>> > On Wednesday, 28 December 2016 13:48:48 Peter Otten wrote:
>> > > Hans-Peter Jansen wrote:
>> > > > Dear Peter,
>> > > >
>> > > > thanks for taking valuable time to look into my issue.
>> > >
>> > > You're welcome!
>> > >
>> > > @contextmanager
>> > > def map_struct(m, n):
>> > >     m.resize(n * mmap.PAGESIZE)
>> > >     keep_me = T.from_buffer(m)
>> > >     yield weakref.proxy(keep_me)
>> >
>> > Hooray, that did the trick. Great solution, thank you very much!
>>
>> Sorry for bothering you again, Peter, but after applying it to the real
>> project, that fails consistently similar to:
>
> $ python3 mmap_test_weakref.py
> Traceback (most recent call last):
>   File "mmap_test_weakref.py", line 32, in <module>
>     assert(bytes(c) == bytes(rest))
> AttributeError: 'c_ubyte_Array_8188' object has no attribute '__bytes__'
>
>
> $ cat mmap_test_weakref.py
> import ctypes
> import mmap
> import weakref
>
> from contextlib import contextmanager
>
> class T(ctypes.Structure):
>     _fields_ = [("foo", ctypes.c_uint32)]
>
>
> @contextmanager
> def map_struct(m, n, struct, offset = 0):
>     m.resize(n * mmap.PAGESIZE)
>     inst = struct.from_buffer(m, offset)
>     yield weakref.proxy(inst)
>
> SIZE = mmap.PAGESIZE
> f = open("tmp.dat", "w+b")
> f.write(b"\0" * SIZE)
> f.seek(0)
> m = mmap.mmap(f.fileno(), mmap.PAGESIZE)
>
> with map_struct(m, 1, T) as a:
>     a.foo = 1
> with map_struct(m, 2, T) as b:
>     b.foo = 2
>
> offset = ctypes.sizeof(T)
> rest = m.size() - offset
> overhang = ctypes.c_ubyte * rest
> with map_struct(m, 2, overhang, offset) as c:
>     assert(bytes(c) == bytes(rest))
>
>
> With weakref and mmap.resize() disabled, this acts as expected.
> BTW: mmapped files need the first page initialized, the rest is done in
> the kernel (filled with zeros on resize).
The minimal example is

>>> import weakref, ctypes
>>> T = ctypes.c_ubyte * 3
>>> t = T()
>>> bytes(t) == b"\0" * 3
True
>>> bytes(weakref.proxy(t)) == b"\0" * 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'c_ubyte_Array_3' object has no attribute '__bytes__'

That looks like a leaky abstraction. While I found a workaround

>>> bytes(weakref.proxy(t)[:]) == b"\0" * 3
True

to me your whole approach is beginning to look very questionable. You know,

"If the implementation is hard to explain, it's a bad idea."

What do you gain from using the mmap/ctypes combo instead of regular file operations and the struct module? Your sample code seems to touch every single byte of the file once so that there are little to no gains from caching. And then your offset is basically a file position managed manually instead of implicitly with read, write, and seek.

--
https://mail.python.org/mailman/listinfo/python-list
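For comparison, here is a minimal sketch of the "regular file operations and the struct module" alternative mentioned above. It is only an illustration: the names RECORD, write_foo and read_foo are made up for this example, the layout (a single native-endian uint32, mirroring the "foo" field of the sample structure) is an assumption, and a temporary file stands in for tmp.dat.

```python
import struct
import tempfile

# Assumption: one native-endian uint32, mirroring the ctypes field
# ("foo", ctypes.c_uint32) from the sample code.
RECORD = struct.Struct("=I")


def write_foo(f, value, offset=0):
    """Overwrite the record stored at byte position `offset`."""
    f.seek(offset)
    f.write(RECORD.pack(value))


def read_foo(f, offset=0):
    """Return the integer stored at byte position `offset`."""
    f.seek(offset)
    return RECORD.unpack(f.read(RECORD.size))[0]


with tempfile.TemporaryFile() as f:        # binary read/write mode by default
    f.write(b"\0" * 4096)                  # stand-in for one zero-filled page
    write_foo(f, 1)
    write_foo(f, 2, offset=RECORD.size)
    values = (read_foo(f), read_foo(f, RECORD.size))
```

Here the file position replaces the manually managed offset, and no buffer export has to be released before the file grows. Whether this is fast enough for very large files is a separate question that the sketch does not answer.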
Just added AnyChart JS Charts integration templates for easier dataviz with Python (+ Flask/Django) and MySQL
Hi all, We at AnyChart JS Charts http://www.anychart.com have just released a series of 20+ integration templates to help web developers add interactive charts, maps, stock and financial graphs, Gantt charts, and dashboards to web apps much easier, no matter what your stack is. In particular, now there are two templates for Python in our Technical Integration collection http://www.anychart.com/integrations/, all distributed under the Apache 2.0 License and forkable on GitHub: 1) Python, Flask and MySQL https://github.com/anychart-integrations/python-flask-mysql-template 2) Python, Django and MySQL https://github.com/anychart-integrations/python-django-mysql-template You are welcome to check those out and ask your questions if any. Thanks. -- https://mail.python.org/mailman/listinfo/python-list
Re: ctypes, memory mapped files and context manager
On Thursday, 29 December 2016 09:33:59 Peter Otten wrote:
> Hans-Peter Jansen wrote:
> > On Wednesday, 28 December 2016 16:53:53 Hans-Peter Jansen wrote:
>
> The minimal example is
>
> >>> import weakref, ctypes
> >>> T = ctypes.c_ubyte * 3
> >>> t = T()
> >>> bytes(t) == b"\0" * 3
> True
> >>> bytes(weakref.proxy(t)) == b"\0" * 3
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'c_ubyte_Array_3' object has no attribute '__bytes__'
>
> That looks like a leaky abstraction. While I found a workaround
>
> >>> bytes(weakref.proxy(t)[:]) == b"\0" * 3
> True

I found a couple of other rough corners already, when working with the ctypes module. Obviously, this module is lacking some love.

> to me your whole approach is beginning to look very questionable. You know,
>
> "If the implementation is hard to explain, it's a bad idea."
>
> What do you gain from using the mmap/ctypes combo instead of regular file
> operations and the struct module? Your sample code seems to touch every
> single byte of the file once so that there are little to no gains from
> caching. And then your offset is basically a file position managed manually
> instead of implicitly with read, write, and seek.

Of course, the real code is a bit more complex... The code presented here is for demonstration purposes only. I'm not allowed to reveal the project's code, but I can state that using this combination allows crawling through huge files (5-25GB) with unbelievable performance (without any further optimization), and updating parts of them. By delegating the whole I/O management to the kernel, one can observe that Python runs at full speed managing the data just by reference and assignment operations, all (mostly) in place. The resource usage is impressively low at the same time. Since the code is meant to be executed with many instances in parallel on a single machine, this is an important design criterion.

While I would love to get rid of these dreaded and unpythonic del statements, I can accept them for now, until a better approach is found. I will dig through the ctypes module again when I find time.

Thanks again for taking your valuable time, Peter. Much appreciated.

I wish you a Happy New Year!

Cheers,
Pete
data
Hi all,

I have a sample data set and would like to summarize it in the following way.

ID,class,y
1,12,10
1,12,10
1,12,20
1,13,20
1,13,10
1,13,10
1,14,20
2,21,20
2,21,20
2,21,10
2,23,10
2,23,20
2,34,20
2,34,10
2,35,10

I want to get the total count by ID and the number of distinct classes by ID. The y variable is either 10 or 20, and I want to count each value by ID.

The result should look as follows.

ID,class,count,10's,20's
1,3,7,4,3
2,4,8,4,4

I can do this in two or more steps. Is there an efficient way of doing it?

I used

pd.crosstab(a['ID'],a['y'],margins=True)

and got

ID,10's,20's,all
1,4,3,7
2,4,4,8

but I also want to get the class count, as follows:

ID,class,10's,20's,all
1,3,4,3,7
2,4,4,4,8

How do I do it in Python?

Thank you in advance
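One possible approach, sketched under a few assumptions: the data is loaded into a DataFrame named a (as in the crosstab call above), y only ever takes the values 10 and 20, and the variable names csv_text and summary are made up for this example. The idea is to build the per-value counts with crosstab and then attach the number of distinct classes per ID from a groupby.

```python
import io

import pandas as pd

csv_text = """ID,class,y
1,12,10
1,12,10
1,12,20
1,13,20
1,13,10
1,13,10
1,14,20
2,21,20
2,21,20
2,21,10
2,23,10
2,23,20
2,34,20
2,34,10
2,35,10
"""
a = pd.read_csv(io.StringIO(csv_text))

# Count of each y value per ID (one column per distinct y value).
summary = pd.crosstab(a["ID"], a["y"])
summary = summary.rename(columns={10: "10's", 20: "20's"})

# Row total per ID.
summary["all"] = summary.sum(axis=1)

# Number of distinct classes per ID, inserted as the first column.
# The Series is aligned with the crosstab on the shared ID index.
summary.insert(0, "class", a.groupby("ID")["class"].nunique())
```

The result is indexed by ID with the columns class, 10's, 20's and all; summary.to_csv() could be used to write it out in the comma-separated layout shown in the question.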
Re: sorting strings numerically while dealing with missing values
On Wed, Dec 28, 2016 at 3:43 PM, Ian Kelly wrote:
> On Wed, Dec 28, 2016 at 2:14 PM, Larry Martell wrote:
>>
>> I have a list containing a list of strings that I want to sort
>> numerically by one of the fields. I am doing this:
>>
>> sorted(rows, key=float(itemgetter(sortby)))
>
> I'm guessing that you left out a lambda here since the key argument
> takes a function.
>
>> Which works fine as long as all the sort keys convert to a float.
>> Problem is that some are blank or None and those throw an exception.
>> How can I handle that case and still sort? I'd want the blank or None
>> fields to come out either at the beginning or end of the sorted list
>> (not sure what the customer wants for this yet).
>
>
> def sort_key(sortby, none_first=False):
>     def key(row):
>         try:
>             value = float(row[sortby])
>         except ValueError:
>             value = None
>         return ((value is None) != none_first, value)
>     return key
>
> sorted(rows, key=sort_key(4, none_first=True))

Thanks. I got this working using a function similar to this.
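A slightly more defensive variant of the key function quoted above, offered as a sketch rather than as the code that was actually used. Two details differ: float(None) raises TypeError rather than ValueError, so both are caught, and missing values get a 0.0 placeholder so that the second tuple element is always a float. The sample rows are invented for illustration.

```python
def sort_key(sortby, none_first=False):
    """Build a key function that sorts rows numerically on column `sortby`.

    Blank, None, or otherwise non-numeric fields are grouped together,
    either before or after the numeric ones.
    """
    def key(row):
        try:
            value = float(row[sortby])
        except (TypeError, ValueError):
            # TypeError covers None, ValueError covers "" and other text.
            # The first element alone decides where these rows go; 0.0 is
            # only a placeholder so the tuple never contains None.
            return (not none_first, 0.0)
        return (none_first, value)
    return key


# Hypothetical sample data: column 1 holds the numeric strings.
rows = [["a", "3"], ["b", ""], ["c", None], ["d", "1.5"]]

missing_last = sorted(rows, key=sort_key(1))
missing_first = sorted(rows, key=sort_key(1, none_first=True))
```

Because sorted() is stable, rows with missing values keep their original relative order within their group.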
Re: ctypes, memory mapped files and context manager
On Thu, Dec 29, 2016 at 12:18 PM, Hans-Peter Jansen wrote:
>> >>> import weakref, ctypes
>> >>> T = ctypes.c_ubyte * 3
>> >>> t = T()
>> >>> bytes(t) == b"\0" * 3
>>
>> True
>>
>> >>> bytes(weakref.proxy(t)) == b"\0" * 3
>>
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> AttributeError: 'c_ubyte_Array_3' object has no attribute '__bytes__'
>>
>> That looks like a leaky abstraction. While I found a workaround
>>
>> >>> bytes(weakref.proxy(t)[:]) == b"\0" * 3
>>
>> True
>
> I found a couple of other rough corners already, when working with the ctypes
> module. Obviously, this module is lacking some love.

That's not the fault of ctypes. There's no requirement for objects that implement the buffer protocol to also implement __bytes__. You'd have the same problem if you tried to proxy a memoryview.

However, using a proxy seems particularly worthless for ctypes. Type checking is integral to the design of ctypes, and a weakproxy won't work:

>>> a = (ctypes.c_ubyte * 3)()
>>> ctypes.addressof(a)
139959173036992
>>> ctypes.pointer(a)
<__main__.LP_c_ubyte_Array_3 object at 0x7f4ac8caba60>
>>> ctypes.string_at(a, 3)
b'\x00\x00\x00'

>>> p = weakref.proxy(a)
>>> type(p)
<class 'weakproxy'>

>>> ctypes.addressof(p)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: invalid type

>>> ctypes.pointer(p)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: _type_ must have storage info

>>> ctypes.string_at(p, 3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/ctypes/__init__.py", line 491, in string_at
    return _string_at(ptr, size)
ctypes.ArgumentError: argument 1: <class 'TypeError'>: wrong type
Re: UTF-8 Encoding Error
On Monday, December 26, 2016 at 3:37:37 AM UTC+5:30, Gonzalo V wrote:
> Try utf-8-sig
> On 25 Dec 2016 2:57 AM, "Grady Martin" wrote:
>
> > On 2016-12-22 22:38, wrote:
> >
> >> I am getting the error:
> >> UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15:
> >> invalid start byte
> >>
> >
> > The following is a reflex of mine, whenever I encounter Python 2 Unicode
> > errors:
> >
> > import sys
> > reload(sys)
> > sys.setdefaultencoding('utf8')
> >
> > A relevant Stack Exchange thread awaits you here:
> >
> > http://stackoverflow.com/a/21190382/2230956

Thank you for your kind time and answers.

I tried to open one file in default ASCII format in MS-Windows 7.

txtf=open("/python27/TestFiles/small_file_1.txt","r").read()

I could write it in UTF-8 using

cd1=codecs.open("/python27/TestFiles/file1.pos","w", "utf-8-sig")
cd1.write(txtf)

Here, I was getting this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x96 in position 150: ordinal not in range(128)

Then I used

>>> import sys
>>> reload(sys)
>>> sys.setdefaultencoding('utf8')

and then wrote

>>> cd1.write(txtf)

and it went fine.

Now in my actual problem I am writing it a bit differently:

with open('textfile.txt') as f:
    for i, g in enumerate(grouper(n, f, fillvalue=''), 1):
        with open('/Python27/TestFiles/small_filing_{0}.pos'.format(i * n), 'w') as fout:
            fout.writelines(g)

I am trying to fix this. Any suggestions would be appreciated.
Re: UTF-8 Encoding Error
On Friday, December 30, 2016 at 3:35:56 AM UTC+5:30, subhaba...@gmail.com wrote:
> On Monday, December 26, 2016 at 3:37:37 AM UTC+5:30, Gonzalo V wrote:
> > Try utf-8-sig
> > On 25 Dec 2016 2:57 AM, "Grady Martin" wrote:
> >
> > > On 2016-12-22 22:38, wrote:
> > >
> > >> I am getting the error:
> > >> UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15:
> > >> invalid start byte
> > >>
> > >
> > > The following is a reflex of mine, whenever I encounter Python 2 Unicode
> > > errors:
> > >
> > > import sys
> > > reload(sys)
> > > sys.setdefaultencoding('utf8')
> > >
> > > A relevant Stack Exchange thread awaits you here:
> > >
> > > http://stackoverflow.com/a/21190382/2230956
>
> Thank you for your kind time and answers.
>
> I tried to open one file in default ASCII format in MS-Windows 7.
> txtf=open("/python27/TestFiles/small_file_1.txt","r").read()
> I could write them in UTF-8 using
> cd1=codecs.open("/python27/TestFiles/file1.pos","w", "utf-8-sig")
> cd1.write(txtf)
>
> Here, I was getting an error as,
> UnicodeDecodeError: 'ascii' codec can't decode byte 0x96 in position 150:
> ordinal not in range(128)
>
> Then I used,
> >>> import sys
> >>> reload(sys)
> >>> sys.setdefaultencoding('utf8')
>
> and then wrote
> >>> cd1.write(txtf)
> it went fine.
>
> Now in my actual problem I am writing it bit differently:
>
> with open('textfile.txt') as f:
>     for i, g in enumerate(grouper(n, f, fillvalue=''), 1):
>         with open('/Python27/TestFiles/small_filing_{0}.pos'.format(i *
> n), 'w') as fout:
>             fout.writelines(g)
>
> I am trying to fix this.
>
> If you may kindly suggest.

The grouper function is:

from itertools import izip_longest

def grouper(n, iterable, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

n = 3
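A hedged sketch of one way to split the file without touching the default encoding: decode explicitly while reading and encode explicitly while writing. The source encoding is an assumption. Byte 0x96 is an en dash in Windows-1252 (cp1252), which is a common guess for text files created on Windows, but the real encoding of the input has to be confirmed. The function name split_file and its parameters are made up for this example; io.open is used because it accepts an encoding argument on Python 2 as well as Python 3.

```python
import io

try:
    from itertools import zip_longest                      # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest      # Python 2


def grouper(n, iterable, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


def split_file(src, dst_pattern, n, src_encoding="cp1252"):
    """Write every n lines of `src` to its own UTF-8 file; return the paths.

    `src_encoding` is an assumption about the input file, not a fact.
    """
    paths = []
    # Decode while reading: the file object yields text, not raw bytes.
    with io.open(src, encoding=src_encoding) as f:
        for i, group in enumerate(grouper(n, f, fillvalue=u""), 1):
            path = dst_pattern.format(i * n)
            # Encode while writing: the text is stored as UTF-8.
            with io.open(path, "w", encoding="utf-8") as fout:
                fout.writelines(group)
            paths.append(path)
    return paths
```

With the paths from the post, a call could look like split_file('textfile.txt', '/Python27/TestFiles/small_filing_{0}.pos', 3). If the input is not actually cp1252, the decoded characters will be wrong even though no exception is raised, so it is worth checking a few lines by eye.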
Re: ctypes, memory mapped files and context manager
Dear Eryk,

thanks for chiming in.

On Thursday, 29 December 2016 21:27:56 eryk sun wrote:
> On Thu, Dec 29, 2016 at 12:18 PM, Hans-Peter Jansen wrote:
> >> >>> import weakref, ctypes
> >> >>> T = ctypes.c_ubyte * 3
> >> >>> t = T()
> >> >>> bytes(t) == b"\0" * 3
> >>
> >> True
> >>
> >> >>> bytes(weakref.proxy(t)) == b"\0" * 3
> >>
> >> Traceback (most recent call last):
> >>   File "<stdin>", line 1, in <module>
> >>
> >> AttributeError: 'c_ubyte_Array_3' object has no attribute '__bytes__'
> >>
> >> That looks like a leaky abstraction. While I found a workaround
> >>
> >> >>> bytes(weakref.proxy(t)[:]) == b"\0" * 3
> >>
> >> True
> >
> > I found a couple of other rough corners already, when working with the
> > ctypes module. Obviously, this module is lacking some love.
>
> That's not the fault of ctypes. There's no requirement for objects
> that implement the buffer protocol to also implement __bytes__. You'd
> have the same problem if you tried to proxy a memoryview.
>
> However, using a proxy seems particularly worthless for ctypes. Type
> checking is integral to the design of ctypes, and a weakproxy won't

Did you follow the discussion?
I'm trying to make context manager work with ctypes.from_buffer on mmapped files:

import ctypes
import mmap
import weakref

NOPROB=False
#NOPROB=True

from contextlib import contextmanager

class T(ctypes.Structure):
    _fields_ = [("foo", ctypes.c_uint32)]


@contextmanager
def map_struct(m, n, struct, offset = 0):
    m.resize(n * mmap.PAGESIZE)
    inst = struct.from_buffer(m, offset)
    yield inst

SIZE = mmap.PAGESIZE * 2
f = open("tmp.dat", "w+b")
f.write(b"\0" * SIZE)
f.seek(0)
m = mmap.mmap(f.fileno(), mmap.PAGESIZE)

with map_struct(m, 1, T) as a:
    a.foo = 1
if NOPROB:
    del a
with map_struct(m, 2, T) as b:
    b.foo = 2
if NOPROB:
    del b

offset = ctypes.sizeof(T)
rest = m.size() - offset
overhang = ctypes.c_ubyte * rest
with map_struct(m, 2, overhang, offset) as c:
    assert(bytes(c) == bytes(rest))
if NOPROB:
    del c

Without these dreaded del statements, this code doesn't work:

$ python3 mmap_test2.py
Traceback (most recent call last):
  File "mmap_test2.py", line 30, in <module>
    with map_struct(m, 2, T) as b:
  File "/usr/lib64/python3.4/contextlib.py", line 59, in __enter__
    return next(self.gen)
  File "mmap_test2.py", line 16, in map_struct
    m.resize(n * mmap.PAGESIZE)
BufferError: mmap can't resize with extant buffers exported.

It will work, if you define NOPROB=True.

The weakref approach was an attempt to make this work.

Do you have an idea how to create the context manager in a way, that obsoletes these ugly dels? Something like ctypes.from_buffer_release() that is able to actively release the mapping is needed here, AFAICS.

This code works with Python2 due to the mmap module not checking for any existing mappings which may lead to segfaults, if the mmap is resized.

Thanks,
Pete
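One conceivable way to avoid the del statements, sketched here as an assumption and not as an established ctypes idiom: hand the with block a small hand-written forwarding object instead of the ctypes instance itself, and drop the only strong reference to the instance when the block exits. MappedProxy and its release method are hypothetical names invented for this sketch. The demo uses an anonymous mapping and omits the resize step so that it runs without a data file. It relies on CPython's reference counting to free the instance (and with it the buffer export) as soon as the reference is dropped.

```python
import ctypes
import mmap
from contextlib import contextmanager


class T(ctypes.Structure):
    _fields_ = [("foo", ctypes.c_uint32)]


class MappedProxy(object):
    """Hypothetical helper: forwards attribute access to a ctypes instance."""
    __slots__ = ("_target",)

    def __init__(self, target):
        object.__setattr__(self, "_target", target)

    def _get(self):
        target = object.__getattribute__(self, "_target")
        if target is None:
            raise ReferenceError("mapping has already been released")
        return target

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself.
        return getattr(self._get(), name)

    def __setattr__(self, name, value):
        setattr(self._get(), name, value)

    def release(self):
        # Drop the only strong reference to the ctypes instance.
        object.__setattr__(self, "_target", None)


@contextmanager
def map_struct(m, struct, offset=0):
    proxy = MappedProxy(struct.from_buffer(m, offset))
    try:
        yield proxy
    finally:
        proxy.release()


m = mmap.mmap(-1, mmap.PAGESIZE)    # anonymous mapping, for the demo only
with map_struct(m, T) as a:
    a.foo = 1
with map_struct(m, T) as b:
    value_seen = b.foo
m.close()                           # would fail if a buffer export were still alive
```

The names a and b still exist after their blocks, but they only refer to released proxies, so the mmap has no exported buffers left. The sketch covers plain attribute access only: functions that type-check their arguments, such as ctypes.addressof, would not accept the proxy, and any object obtained from the instance that keeps it alive (for example a nested structure field) would have to be dropped as well.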
Re: UTF-8 Encoding Error
On Sun, 25 Dec 2016 04:50 pm, Grady Martin wrote:

> On 2016-12-22 22:38, subhabangal...@gmail.com wrote:
>>I am getting the error:
>>UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15:
>>invalid start byte
>
> The following is a reflex of mine, whenever I encounter Python 2 Unicode
> errors:
>
> import sys
> reload(sys)
> sys.setdefaultencoding('utf8')

This is a BAD idea, and doing it by "reflex" without very careful thought is just cargo-cult programming. You should not thoughtlessly change the default encoding without knowing what you are doing -- and if you know what you are doing, you won't change it at all.

The Python interpreter *intentionally* removes setdefaultencoding at startup for a reason. Changing the default encoding can break the interpreter, and it is NEVER what you actually need. If you think you want it because it fixes "Unicode errors", all you are doing is covering up bugs in your code.

Here is some background on why setdefaultencoding exists, and why it is dangerous:

https://anonbadger.wordpress.com/2015/06/16/why-sys-setdefaultencoding-will-break-code/

If you have set the Python 2 default encoding to anything but ASCII, you are now running a broken system with subtle bugs, including in data structures as fundamental as dicts.

The standard behaviour:

py> d = {u'café': 1}
py> for key in d:
...     print key == 'caf\xc3\xa9'
...
False

As we should expect: the key in the dict, u'café', is *not* the same as the byte-string 'caf\xc3\xa9'. But watch how we can break dictionaries by changing the default encoding:

py> reload(sys)
<module 'sys' (built-in)>
py> sys.setdefaultencoding('utf-8')  # don't do this
py> for key in d:
...     print key == 'caf\xc3\xa9'
...
True

So Python now thinks that 'caf\xc3\xa9' is a key. Or does it?

py> d['caf\xc3\xa9']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'caf\xc3\xa9'

By changing the default encoding, we now have something which is both a key and not a key of the dict at the same time.

> A relevant Stack Exchange thread awaits you here:
>
> http://stackoverflow.com/a/21190382/2230956

And that's why I don't trust StackOverflow. It's not bad for answering simple questions, but once the question becomes more complex the quality of accepted answers goes down the toilet. The highest voted answer is *wrong* and *dangerous*.

And then there's this comment:

    Until this moment I was forced to include "# -- coding: utf-8 --" at
    the begining of each document. This is way much easier and works as
    charm

I have no words for how wrong that is. And this comment:

    ty, this worked for my problem with python throwing UnicodeDecodeError
    on var = u"""vary large string"""

No it did not. There is no possible way that Python will throw that exception on assignment to a Unicode string literal.

It is posts like this that demonstrate how untrustworthy StackOverflow can be.

--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse.
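As a small illustration of the alternative to changing the default encoding, the sketch below decodes the byte string explicitly before it is used as a dict key. The variable names are made up for this example, and the assumption that the bytes are UTF-8 is exactly the kind of knowledge that has to come from the data source rather than from a global setting.

```python
# -*- coding: utf-8 -*-
# Bytes as they might arrive from a file or socket (assumed to be UTF-8).
raw = b"caf\xc3\xa9"

# A dict keyed by text; u"caf\xe9" is the same string as u'café'.
d = {u"caf\xe9": 1}

# Decode once, at the boundary, and work with text from then on.
key = raw.decode("utf-8")
found = d[key]
```

The same lines behave identically on Python 2 and Python 3, and the dict lookup no longer depends on any implicit conversion between bytes and text.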
Re: UTF-8 Encoding Error
On Friday, December 30, 2016 at 7:16:25 AM UTC+5:30, Steve D'Aprano wrote:
> On Sun, 25 Dec 2016 04:50 pm, Grady Martin wrote:
>
> > On 2016-12-22 22:38, wrote:
> >>I am getting the error:
> >>UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 15:
> >>invalid start byte
> >
> > The following is a reflex of mine, whenever I encounter Python 2 Unicode
> > errors:
> >
> > import sys
> > reload(sys)
> > sys.setdefaultencoding('utf8')
>
>
> This is a BAD idea, and doing it by "reflex" without very careful thought is
> just cargo-cult programming. You should not thoughtlessly change the
> default encoding without knowing what you are doing -- and if you know what
> you are doing, you won't change it at all.
>
> The Python interpreter *intentionally* removes setdefaultencoding at startup
> for a reason. Changing the default encoding can break the interpreter, and
> it is NEVER what you actually need. If you think you want it because it
> fixes "Unicode errors", all you are doing is covering up bugs in your code.
>
> Here is some background on why setdefaultencoding exists, and why it is
> dangerous:
>
> https://anonbadger.wordpress.com/2015/06/16/why-sys-setdefaultencoding-will-break-code/
>
> If you have set the Python 2 default encoding to anything but ASCII, you are
> now running a broken system with subtle bugs, including in data structures
> as fundamental as dicts.
>
> The standard behaviour:
>
> py> d = {u'café': 1}
> py> for key in d:
> ...     print key == 'caf\xc3\xa9'
> ...
> False
>
>
> As we should expect: the key in the dict, u'café', is *not* the same as the
> byte-string 'caf\xc3\xa9'. But watch how we can break dictionaries by
> changing the default encoding:
>
> py> reload(sys)
> <module 'sys' (built-in)>
> py> sys.setdefaultencoding('utf-8')  # don't do this
> py> for key in d:
> ...     print key == 'caf\xc3\xa9'
> ...
> True
>
>
> So Python now thinks that 'caf\xc3\xa9' is a key. Or does it?
>
> py> d['caf\xc3\xa9']
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> KeyError: 'caf\xc3\xa9'
>
> By changing the default encoding, we now have something which is both a key
> and not a key of the dict at the same time.
>
>
>
> > A relevant Stack Exchange thread awaits you here:
> >
> > http://stackoverflow.com/a/21190382/2230956
>
> And that's why I don't trust StackOverflow. It's not bad for answering
> simple questions, but once the question becomes more complex the quality of
> accepted answers goes down the toilet. The highest voted answer is *wrong*
> and *dangerous*.
>
> And then there's this comment:
>
>     Until this moment I was forced to include "# -- coding: utf-8 --" at
>     the begining of each document. This is way much easier and works as
>     charm
>
> I have no words for how wrong that is. And this comment:
>
>     ty, this worked for my problem with python throwing UnicodeDecodeError
>     on var = u"""vary large string"""
>
> No it did not. There is no possible way that Python will throw that
> exception on assignment to a Unicode string literal.
>
> It is posts like this that demonstrate how untrustworthy StackOverflow can
> be.
>
>
>
> --
> Steve
> “Cheer up,” they said, “things could be worse.” So I cheered up, and sure
> enough, things got worse.

Thanks for your detailed comment. The code sometimes runs fine and sometimes raises errors. I would be grateful if anyone could look at how I am approaching the problem.
Re: Obtain javascript result
Hi,

is there a problem with betexplorer? This PHP page no longer returns anything when I try to get the odds:

http://www.betexplorer.com/soccer/russia/youth-\league/matchdetails.php?matchid=rLu2Xsdi

It has not worked since 24 December.

Thanks

On Sunday, 23 October 2016 20:09:30 UTC+2, epr...@gmail.com wrote:
> Ok, I solved it this way:
>
> from bs4 import BeautifulSoup
> from selenium import webdriver
>
> driver = webdriver.Chrome()
> driver.get('http://www.betexplorer.com/soccer/russia/youth-\league/matchdetails.php?matchid=rLu2Xsdi')
>
> pg_src = driver.page_source
> driver.close()
> soup = BeautifulSoup(pg_src, 'html.parser')
> # start from here I do something with soup ...
>
> Windows 10 / Python 3.5.2
>
> Thanks
python3 - set '\n\n' as the terminator when writing a formatted LogRecord
Is it possible to set '\n\n' as the terminator when writing a formatted LogRecord to a stream by changing the format parameter of logging.basicConfig?

I know this can be done with the terminator attribute of the StreamHandler class. I just wonder whether the same result can be achieved by changing the format parameter. I am not familiar with the format string language.
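A minimal sketch of the format-string route: since StreamHandler writes the formatted message followed by its terminator (a single newline by default), ending the format string with "\n" produces a blank line after each record. The logger name terminator_demo and the in-memory stream are assumptions made so that the example is self-contained. With basicConfig the equivalent call would be logging.basicConfig(format="%(levelname)s:%(message)s\n").

```python
import io
import logging

stream = io.StringIO()                     # stands in for sys.stderr or a file
handler = logging.StreamHandler(stream)

# The trailing "\n" in the format plus the handler's default terminator
# "\n" gives two newlines after every formatted record.
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s\n"))

logger = logging.getLogger("terminator_demo")
logger.setLevel(logging.INFO)
logger.propagate = False                   # keep the demo away from the root logger
logger.addHandler(handler)

logger.info("first")
logger.info("second")

output = stream.getvalue()
```

One caveat: when a record carries exception information, the traceback text is appended after the formatted message, so the extra newline then ends up before the traceback rather than after it. Setting handler.terminator = "\n\n" does not have that limitation.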