Re: Raw Data from Website
in 764257 20160823 081439 Steven D'Aprano wrote:
>On Tuesday 23 August 2016 10:28, adam.j.k...@gmail.com wrote:
>
>> Hi,
>>
>> I am hoping someone is able to help me.
>>
>> Is there a way to pull as much raw data from a website as possible. The
>> webpage that I am looking for is as follows:
>> http://www.homepriceguide.com.au/Research/ResearchSeeFullList.aspx?LocationType=LGA&State=QLD&LgaID=632
>>
>> The main variable that is important is the "632" at the end, by adjusting
>> this it changes the postcodes. Each postcode contains a large amount of data.
>> Is there a way this all able to be exported into an excel document?
>
>Ideally, the web site itself will offer an Excel download option. If it
>doesn't, you may be able to screen-scrape the data yourself, but:
>
>(1) it may be against the terms of service of the website;
>(2) it may be considered unethical or possibly even copyright
>infringement or (worst case) even illegal;
>(3) especially if you're thinking of selling the data;
>(4) at the very least, unless you take care not to abuse the service,
>it may be rude and the website may even block your access.
>
>There are many tutorials and examples of "screen scraping" or "web scraping" on
>the internet -- try reading them. It's not something I personally have any
>experience with, but I expect that the process goes something like this:
>
>- connect to the website;
>- download the particular page you want;
>- grab the data that you care about;
>- remove HTML tags and extract just the bits needed;
>- write them to a CSV file.

wget does the hard part.

--
https://mail.python.org/mailman/listinfo/python-list
Re: Raw Data from Website
On Wednesday 24 August 2016 17:04, Bob Martin wrote:

> in 764257 20160823 081439 Steven D'Aprano wrote:
>>There are many tutorials and examples of "screen scraping" or "web scraping"
>>on the internet -- try reading them. It's not something I personally have any
>>experience with, but I expect that the process goes something like this:
>>
>>- connect to the website;
>>- download the particular page you want;
>>- grab the data that you care about;
>>- remove HTML tags and extract just the bits needed;
>>- write them to a CSV file.
>
> wget does the hard part.

I don't think so. Just downloading a web page is easy. Parsing the potentially invalid HTML (or worse, the content is assembled in the browser by Javascript) to extract the actual data you care about is much harder.

--
Steve
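As a rough illustration of the "extract the bits needed, write them to a CSV file" steps quoted above, here is a minimal sketch using only the standard library. The sample HTML, the class name TableParser and the helper html_table_to_csv are made up for this example; a real page may be far messier, or built by JavaScript, in which case this approach would not be enough.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table cells found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Invented sample data, standing in for a downloaded page.
SAMPLE = """
<table>
  <tr><th>Postcode</th><th>Median</th></tr>
  <tr><td>4000</td><td>450,000</td></tr>
</table>
"""
csv_text = html_table_to_csv(SAMPLE)
```

The download step itself (wget, or urllib.request in Python) is left out on purpose, since the point made above is that fetching the page is the easy part.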
Re: degrees and radians.
For what it's worth, mathematicians naturally work with angles in radians.

The mathematics of the trigonometric functions works naturally when the angle is expressed in radians.

For the older among us, logarithms also have a "natural" base, and that is the number e. Back in those days, however, even the mathematicians would use logarithm tables where the values were shown to base 10.

On Wed, Aug 24, 2016 at 6:26 AM, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:

> On Wednesday 24 August 2016 14:26, Gary Herron wrote:
>
> > Do you really need anything more complex than this?
> >
> > >>> toRadians = math.pi/180.0
> > >>> math.sin(90*toRadians)
> > 1.0
> >
> > Perhaps I'm not understanding what you mean by "clunky", but this seems
> > pretty clean and simple to me.
>
> The math module has two conversion functions, math.radians() and
> math.degrees().
>
> Some other languages (Julia, by memory, and perhaps others) have dedicated
> sind(), cosd(), tand() or possibly dsin(), dcos(), dtan() functions which take
> their argument in degrees and are more accurate than doing a conversion to
> radians first. I'd like to see that.
>
> I've also seen languages with sinp() etc to calculate the sine of x*pi without
> the intermediate calculation.
>
> But if I were designing Python from scratch, I'd make sin(), cos() and tan()
> call dunder methods __sin__ etc:
>
> def sin(obj):
>     if hasattr(type(obj), '__sin__'):
>         y = type(obj).__sin__(obj)
>         if y is not NotImplemented:
>             return y
>     elif isinstance(obj, numbers.Number):
>         return float.__sin__(float(obj))
>     raise TypeError
>
> Likewise for asin() etc.
>
> Then you could define your own numeric types, such as a Degrees type, a
> PiRadians type, etc, with their own dedicated trig function implementations,
> without the caller needing to care about which sin* function they call.
>
> --
> Steve
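To make the quoted __sin__ idea concrete, here is a small runnable sketch. It is only an illustration: Python has no __sin__ protocol, float has no __sin__ method, and the Degrees type and the module-level sin() below are hypothetical names invented for this example. The sketch falls back to math.sin for ordinary real numbers.

```python
import math
import numbers


class Degrees(float):
    """Hypothetical angle type measured in degrees."""

    # Angles whose sine is known exactly; avoids the rounding error that
    # math.sin(math.radians(180)) shows (about 1.2e-16 instead of 0.0).
    _EXACT = {0.0: 0.0, 90.0: 1.0, 180.0: 0.0, 270.0: -1.0}

    def __sin__(self):
        d = float(self) % 360.0          # reduce to [0, 360)
        if d in self._EXACT:
            return self._EXACT[d]
        return math.sin(math.radians(d))


def sin(obj):
    """Dispatch to type(obj).__sin__ if present, else treat obj as radians."""
    method = getattr(type(obj), "__sin__", None)
    if method is not None:
        y = method(obj)
        if y is not NotImplemented:
            return y
    elif isinstance(obj, numbers.Real):
        return math.sin(float(obj))
    raise TypeError("cannot take the sine of %r" % type(obj).__name__)


right_angle = sin(Degrees(90))
half_turn = sin(Degrees(180))
plain = sin(math.pi / 2)
```

With this design the caller writes sin(x) in every case; whether x is in degrees or radians is carried by its type rather than by the function name.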
convert triply-nested array to void pointer using ctypes
I'd like to send an array containing arrays of 2-element float arrays to a foreign function, as the following struct:

class _FFIArray(Structure):
    _fields_ = [("data", c_void_p), ("len", c_size_t)]

    @classmethod
    def from_param(cls, seq):
        """ Allow implicit conversions """
        return seq if isinstance(seq, cls) else cls(seq)

    def __init__(self, seq):
        self.data = cast(
            np.array(seq, dtype=np.float64).ctypes.data_as(POINTER(c_double)),
            c_void_p
        )
        self.len = len(seq)

This works for data such as [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], … ], (.shape is n, 2 where n is an arbitrary length) but I'd like it to work for data such as

[
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], … ],
    [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0], … ],
    …
]

which has .shape p, q, 2 where p and q are of arbitrary length. Is this possible?
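One possible direction, sketched here with plain ctypes and no NumPy: flatten the p x q x 2 data into one contiguous buffer of doubles and pass the three dimensions alongside the pointer. Everything below is an assumption for illustration. The struct name FFIArray3, its dims field and the read_back helper are hypothetical, the foreign function would have to be written to accept this layout, and the data is assumed to be rectangular (every inner list has the same length). The sketch also keeps a reference to the buffer on the instance, since a pointer stored in a c_void_p field does not by itself keep the underlying memory alive.

```python
from ctypes import POINTER, Structure, c_double, c_size_t, c_void_p, cast


class FFIArray3(Structure):
    """Hypothetical struct: flat buffer of doubles plus a (p, q, 2) shape."""

    _fields_ = [("data", c_void_p), ("dims", c_size_t * 3)]

    @classmethod
    def from_param(cls, seq):
        """Allow implicit conversions, as in the original _FFIArray."""
        return seq if isinstance(seq, cls) else cls(seq)

    def __init__(self, seq):
        p = len(seq)
        q = len(seq[0]) if p else 0
        flat = [float(v) for plane in seq for pair in plane for v in pair]
        if len(flat) != p * q * 2:
            raise ValueError("expected rectangular data of shape (p, q, 2)")
        # Keep the buffer referenced so it outlives this constructor.
        self._buf = (c_double * len(flat))(*flat)
        self.data = cast(self._buf, c_void_p)
        self.dims = (c_size_t * 3)(p, q, 2)


def read_back(arr):
    """Read the flat buffer back into a Python list (for checking only)."""
    n = arr.dims[0] * arr.dims[1] * arr.dims[2]
    return cast(arr.data, POINTER(c_double))[:n]


sample = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
    [[9.0, 10.0], [11.0, 12.0]],
]
arr = FFIArray3(sample)
shape = list(arr.dims)
values = read_back(arr)
```

On the foreign side, element (i, j, k) would then sit at index (i * q + j) * 2 + k of the flat buffer.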
The order of iterable de-referencing in assignment?
Hi,

Given a = [1, 2]

a.extend(a) makes a = [1,2, 1,2]

One might guess a.extend(a) would turn into an infinite loop. It turns out here Python first gets all the items of `a' and then append them to `a', so the infinite loop is avoided.

My question is, is there any doc on the behavior of things like this?

Another related example might be:

a[:] = a

Hopefully Python first gets all the items on the *right* side and then assigns them to the left.

Regards.
Re: The order of iterable de-referencing in assignment?
On Wed, Aug 24, 2016 at 8:54 PM, Shiyao Ma wrote:
> Given a = [1, 2]
>
> a.extend(a) makes a = [1,2, 1,2]
>
> One might guess a.extend(a) would turn into an infinite loop. It turns out
> here Python first gets all the items of `a' and then append them to `a', so
> the infinite loop is avoided.

Be careful: doing the same in Python may not behave that way.

>>> a = [1,2]
>>> for x in a: a.append(x)
...
^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyboardInterrupt
>>> len(a)
6370805

That right there, folks, is an infinite loop.

ChrisA
Re: The order of iterable de-referencing in assignment?
Shiyao Ma:

> Given a = [1, 2]
>
> a.extend(a) makes a = [1,2, 1,2]
>
> One might guess a.extend(a) would turn into an infinite loop. It turns
> out here Python first gets all the items of `a' and then append them
> to `a', so the infinite loop is avoided.

Functionally, Python's lists are not linked lists but, rather, vectors.

> My question is, is there any doc on the behavior of things like this?

No. It should be here:

   https://docs.python.org/3/reference/datamodel.html

It only states:

   The items of a list are arbitrary Python objects.

and inexplicably muddies the waters by continuing:

   Lists are formed by placing a comma-separated list of expressions in
   square brackets. (Note that there are no special cases needed to form
   lists of length 0 or 1.)

(which has nothing whatsoever to do with the data model).

Marko
Re: Python 3 tkinter graphical statistical distributions fitter
Thanks, James!

2016-08-13 12:46 GMT+02:00 :

> I created a Python 3 tkinter graphical statistical distributions fitting
> application that will fit a 1D data set to all of the continuous
> statistical distributions in scipy.stats, with graphical display of the
> distributions plotted against normalized histograms of the data. Fitted
> results can be sorted by nnlf, AIC or AIC_BA. The URL on GitHub is:
>
> https://github.com/zunzun/tkInterStatsDistroFit
>
> James Phillips
> --
> https://mail.python.org/mailman/listinfo/python-announce-list
>
> Support the Python Software Foundation:
> http://www.python.org/psf/donations/
Re: Python 3 tkinter graphical statistical distributions fitter
Please accept my apologies for sending the message to the full list 😬

2016-08-24 13:11 GMT+02:00 Daniel Riaño :

> Thanks, James!
>
> 2016-08-13 12:46 GMT+02:00 :
>
>> I created a Python 3 tkinter graphical statistical distributions fitting
>> application that will fit a 1D data set to all of the continuous
>> statistical distributions in scipy.stats, with graphical display of the
>> distributions plotted against normalized histograms of the data. Fitted
>> results can be sorted by nnlf, AIC or AIC_BA. The URL on GitHub is:
>>
>> https://github.com/zunzun/tkInterStatsDistroFit
>>
>> James Phillips
>> --
>> https://mail.python.org/mailman/listinfo/python-announce-list
>>
>> Support the Python Software Foundation:
>> http://www.python.org/psf/donations/
Re: The order of iterable de-referencing in assignment?
On Wed, Aug 24, 2016 at 9:03 PM, Joaquin Alzola wrote:
>>> One might guess a.extend(a) would turn into an infinite loop. It turns out
>>> here Python first gets all the items of `a' and then append them to `a', so
>>> the infinite loop is avoided.
>>
>> >>> a = [1,2]
>> >>> for x in a: a.append(x)
>> ...
>> ^CTraceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> KeyboardInterrupt
>> >>> len(a)
>> 6370805
>>
>> That right there, folks, is an infinite loop.
>
> If I am correct python points out an infinite loop with the "...", just
> pointing to more information.

That's true of self-referential objects and circular references:

>>> a = [1, 2]
>>> a.append(a)
>>> a
[1, 2, [...]]

In this case, it's not an infinite loop or anything; it's simply an object that references itself:

>>> id(a)
139945904550344
>>> [id(x) for x in a]
[9241344, 9241376, 139945904550344]

The list has three elements, one of which is the list itself. Same applies if the list has a reference to something else which has a reference to the original list, or anything along those lines.

But "a.extend(a)" isn't quite like that. In its purest form, it means the same as the 'for' loop that I showed above, but... well, this comment from the CPython sources says exactly what I'm thinking of:

Objects/listobject.c:795

    /* Special cases:
       1) lists and tuples which can use PySequence_Fast ops
       2) extending self to self requires making a copy first
    */

> This email is confidential and may be subject to privilege. If you are not
> the intended recipient, please do not copy or disclose its content but
> contact the sender immediately upon receipt.

This email is confident and may be subject to white privilege, black privilege, and privileges of other colours. If you are not the intended recipient, please contact a lawyer and tell him that email footers are unenforceable.

ChrisA
Re: The order of iterable de-referencing in assignment?
On Wed, Aug 24, 2016, at 07:17, Chris Angelico wrote:
> Objects/listobject.c:795
>
>     /* Special cases:
>        1) lists and tuples which can use PySequence_Fast ops
>        2) extending self to self requires making a copy first
>     */

And, of course, it is a special case - a.extend(iter(a)) is enough to break it.

Frankly I'm not sure why bother to implement it when anyone who legitimately wants to do it can simply do a *= 2.
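The cases discussed in this thread can be put side by side in a short sketch. The behaviour shown is what CPython does, per the source comment quoted above; as noted in the thread, the language documentation does not seem to spell it out. The runaway loop is deliberately bounded with itertools.islice so that the example terminates, and a.extend(iter(a)) is not run at all because it would keep growing the list until memory runs out.

```python
from itertools import islice

# extend() with the list itself: CPython copies the source first.
a = [1, 2]
a.extend(a)

# Slice assignment from the same list: the right-hand side is taken first.
b = [1, 2]
b[:] = b

# The explicit way to double a list in place.
c = [1, 2]
c *= 2

# Iterating over an explicit copy while appending is also safe.
d = [1, 2]
for x in list(d):
    d.append(x)

# Iterating over the list itself while appending never catches up:
# the iterator sees every newly appended item. islice stops it after 5 steps.
e = [1, 2]
for x in islice(e, 5):
    e.append(x)
```

After the last loop e holds seven items: the original two plus the five appended ones, three of which were themselves read from earlier appends.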
Alternatives to XML?
Hi all

I have mentioned in the past that I use XML for storing certain structures 'off-line', and I got a number of comments urging me to use JSON or YAML instead.

In fact XML has been working very well for me, but I am looking into alternatives simply because of the issue of using '>' and '<' in attributes. I can convert them to '&gt;' and '&lt;', but that imposes a cost in terms of readability.

Here is a simple example -

This is equivalent to the following python code -

if _param.auto_party_id is not None:
    if on_insert:
        value = auto_gen(_param.auto_party_id)
    elif not_exists:
        value = ''

The benefit of serialising it is partly to store it in a database and read it in at runtime, and partly to allow non-trusted users to edit it without raising security concerns.

I ran my XML through some online converters, but I am not happy with the results.

Here is a JSON version -

{
  "case": {
    "compare": {
      "-src": "_param.auto_party_id",
      "-op": "is_not",
      "-tgt": "$None",
      "case": {
        "on_insert": {
          "auto_gen": { "-args": "_param.auto_party_id" }
        },
        "not_exists": {
          "literal": { "-value": "" }
        }
      }
    }
  }
}

I can see how it works, but it does not seem as readable to me. It is not so obvious that 'compare' has three arguments which are evaluated, and if true the nested block is executed.

Here is a YAML version -

case:
  compare:
    case:
      on_insert:
        auto_gen:
          _args: "_param.auto_party_id"
      not_exists:
        literal:
          _value: ""
    _src: "_param.auto_party_id"
    _op: is_not
    _tgt: "$None"

This seems even worse from a readability point of view. The arguments to 'compare' are a long way away from the block to be executed.

Can anyone offer an alternative which is closer to my original intention?

Thanks

Frank Millman
Re: Alternatives to XML?
"Frank Millman" : > I have mentioned in the past that I use XML for storing certain > structures 'off-line', and I got a number of comments urging me to use > JSON or YAML instead. JSON is very good. > In fact XML has been working very well for me, but I am looking into > alternatives simply because of the issue of using '>' and '<' in > attributes. I can convert them to '>' and '<', but that imposes > a cost in terms of readability. Precise syntax requires escaping. In fact, XML is not good enough at it. XML is far too much and too little at the same time. > Here is a simple example - > > > > > > > > > > > > > > Can anyone offer an alternative which is closer to my original intention? There's S-expressions: (case (compare #:src "_param.auto_party_id" #:op "is_not" #:tgt #f (case #:on-insert (auto-gen "_param.auto_party_id") #:not-exists (literal "" Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: Alternatives to XML?
On Wed, 24 Aug 2016 16:58:54 +0200, Frank Millman wrote:

> Hi all
>
> I have mentioned in the past that I use XML for storing certain
> structures 'off-line', and I got a number of comments urging me to use
> JSON or YAML instead.
>
> In fact XML has been working very well for me, but I am looking into
> alternatives simply because of the issue of using '>' and '<' in
> attributes.
> I can convert them to '&gt;' and '&lt;', but that imposes a cost in
> terms of readability.

Are these files expected to be read/written by a human being, or are they for your application to save & restore its settings?

If the former, then you probably need to choose a specification/format that was designed to be human readable from the start (such as the old .ini format).

If it is primarily for your app, then you need a format that efficiently & accurately saves the data you require; readability is a secondary (but still desirable) requirement for debugging & the rare case where a manual change is required.

XML is bulky with lots of redundant information & still not readily readable without extra tools.

JSON is quite terse but I find it quite readable & it is well suited for saving most data structures.

pickle can save the data efficiently but is certainly not readable.

--
Even a blind pig stumbles upon a few acorns.
Re: Alternatives to XML?
Frank Millman wrote:

> I have mentioned in the past that I use XML for storing certain structures
> 'off-line', and I got a number of comments urging me to use JSON or YAML
> instead.
>
> In fact XML has been working very well for me, but I am looking into
> alternatives simply because of the issue of using '>' and '<' in
> attributes. I can convert them to '&gt;' and '&lt;', but that imposes a
> cost in terms of readability.
>
> Here is a simple example -
>
> This is equivalent to the following python code -
>
> if _param.auto_party_id is not None:
>     if on_insert:
>         value = auto_gen(_param.auto_party_id)
>     elif not_exists:
>         value = ''

I think we have a winner here ;)

> The benefit of serialising it is partly to store it in a database and read
> it in at runtime, and partly to allow non-trusted users to edit it without
> raising security concerns.

If you store what is basically a script as XML or JSON, why is that safer than Python or Javascript?
Re: Python 3: Launch multiple commands(subprocesses) in parallel (but upto 4 any time at same time) AND store each of their outputs into a variable
lax.cla...@gmail.com wrote:

> Hi,
>
> I've been reading various forums and python documentation on subprocess,
> multithreading, PIPEs, etc. But I cannot seem to mash together several of my
> requirements into working code.
>
> I am trying to:
>
> 1) Use Python 3+ (specifically 3.4 if it matters)
> 2) Launch N commands in background (e.g., like subprocess.call would for
> individual commands)
> 3) But only limit P commands to run at same time
> 4) Wait until all N commands are done
> 5) Have an array of N strings with the stdout+stderr of each command in it.
>
> What is the best way to do this?
> There are literally many variations of things in the Python documentation and
> Stackoverflow that I am unable to see the forest from trees (for my problem).
>
> Thank you very much!

First off, I'm assuming that the stdout+stderr of these commands is of reasonable size rather than hundreds of megabytes.

What you want is a finite pool of threads (or processes) that execute the tasks. multiprocessing.pool.Pool will do it. So will concurrent.futures, which is what I'd personally use just out of more familiarity with it.

In either case your task should wrap a call to subprocess. subprocess.run is your easiest answer if you've got Python 3.5; the task would call it with stdout and stderr=subprocess.PIPE, get the CompletedProcess back, and then store the .stdout and .stderr string results. For older Python, create a subprocess.Popen (again with stdout and stderr=subprocess.PIPE) and call the communicate() method.

There's probably a dozen other ways. That one there, that's your easiest.

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.
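A minimal sketch of the approach described above, written so that it also runs on Python 3.4: a ThreadPoolExecutor caps the number of commands running at once, and each task uses Popen plus communicate(). The helper names run_capture and run_all and the sample commands are made up for this example. Merging stderr into stdout with stderr=subprocess.STDOUT is one way to meet the "stdout+stderr in one string" requirement; keeping them separate with two pipes would work just as well.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_capture(cmd):
    """Run one command and return its combined stdout+stderr as a string."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,      # interleave stderr into stdout
        universal_newlines=True,       # decode bytes to str
    )
    out, _ = proc.communicate()
    return out


def run_all(commands, max_parallel=4):
    """Run every command, at most max_parallel at a time, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_capture, commands))


# Sample commands: six tiny Python child processes, invented for the demo.
commands = [[sys.executable, "-c", "print({})".format(i)] for i in range(6)]
outputs = run_all(commands)
```

Threads are enough here because each worker spends its time waiting on a child process; pool.map returns the results in the same order as the commands, whatever order they finish in.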
Re: degrees and radians.
murdocksgra...@gmail.com wrote:

> On Saturday, May 4, 2002 at 3:37:07 AM UTC-4, Jim Richardson wrote:
>>
>> I am trying to get the math module to deal with degrees rather than
>> radians. (that it deals with radians for the angular functions like
>> sin() isn't mentioned in the docs, which was sort of an eyeopener :) I
>> can't find any info on doing this. I can convert from-to degrees in the
>> code calling the function, but that's a bit clunky. Any pointers to an
>> FM to R? :
>>
>> --
>> Jim Richardson
>> Anarchist, pagan and proud of it
>> http://www.eskimo.com/~warlock
>> Linux, from watches to supercomputers, for grandmas and geeks.
>
> For what it is worth.. Electrical Engineers for the most part work in degrees
> NOT Radians for example try doing polar to rectangular or vice versa in
> polar.
> I have never seen it done.

While I fully admit to thinking in degrees, any time I'm actually doing any mathematical work my units are either radians or cycles. The representation of angle in fixed point cycles actually comes out really nicely; each bit takes you from hemispheres to quadrants to octants, etc.

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.
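The "fixed point cycles" representation mentioned above can be illustrated with a short sketch. The 16-bit width and the function names are assumptions chosen for the example: the whole circle maps onto 65536 steps, so the top bit selects the hemisphere, the top two bits the quadrant, and addition wraps around for free with a bit mask.

```python
BITS = 16                 # assumed word size for the example
FULL = 1 << BITS          # one full turn = 65536 steps
MASK = FULL - 1


def degrees_to_cycles(deg):
    """Convert degrees to a fixed-point fraction of a turn (0..65535)."""
    return round(deg / 360.0 * FULL) % FULL


def cycles_to_degrees(value):
    """Convert a fixed-point fraction of a turn back to degrees."""
    return value * 360.0 / FULL


def quadrant(value):
    """The top two bits give the quadrant number, 0 to 3."""
    return value >> (BITS - 2)


quarter = degrees_to_cycles(90)       # 0x4000
half = degrees_to_cycles(180)         # 0x8000
# 270 degrees + 180 degrees wraps to 90 degrees with a single mask.
wrapped = (degrees_to_cycles(270) + half) & MASK
```

No modulo-360 bookkeeping is needed anywhere: integer overflow of the word does the wrapping, which is a large part of why the representation is convenient in fixed-point hardware and firmware.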
Re: Alternatives to XML?
Frank Millman wrote:

> Hi all
>
> I have mentioned in the past that I use XML for storing certain structures
> 'off-line', and I got a number of comments urging me to use JSON or YAML
> instead.
>
> In fact XML has been working very well for me, but I am looking into
> alternatives simply because of the issue of using '>' and '<' in attributes.
> I can convert them to '&gt;' and '&lt;', but that imposes a cost in terms of
> readability.
>
> Here is a simple example -
>
> This is equivalent to the following python code -
>
> if _param.auto_party_id is not None:
>     if on_insert:
>         value = auto_gen(_param.auto_party_id)
>     elif not_exists:
>         value = ''
>
> The benefit of serialising it is partly to store it in a database and read
> it in at runtime, and partly to allow non-trusted users to edit it without
> raising security concerns.
>
> I ran my XML through some online converters, but I am not happy with the
> results.
>
> Here is a JSON version -
>
> {
>   "case": {
>     "compare": {
>       "-src": "_param.auto_party_id",
>       "-op": "is_not",
>       "-tgt": "$None",
>       "case": {
>         "on_insert": {
>           "auto_gen": { "-args": "_param.auto_party_id" }
>         },
>         "not_exists": {
>           "literal": { "-value": "" }
>         }
>       }
>     }
>   }
> }
>
> I can see how it works, but it does not seem as readable to me. It is not so
> obvious that 'compare' has three arguments which are evaluated, and if true
> the nested block is executed.
>
> Here is a YAML version -
>
> case:
>   compare:
>     case:
>       on_insert:
>         auto_gen:
>           _args: "_param.auto_party_id"
>       not_exists:
>         literal:
>           _value: ""
>     _src: "_param.auto_party_id"
>     _op: is_not
>     _tgt: "$None"
>
> This seems even worse from a readability point of view. The arguments to
> 'compare' are a long way away from the block to be executed.
>
> Can anyone offer an alternative which is closer to my original intention?
>
> Thanks
>
> Frank Millman

You've been staring at that XML too long; you've become familiar with it. It's just as unreadable as the JSON or the YAML unless you already know what it says.

That said, one core difference between XML and the other two is that XML allows for the idea that order matters. JSON and YAML both consider key/value pair objects in the same way Python does a dict -- there is AN order because there has to be one, but it's not expected to be important or preserved. And when you start talking about "close to" you're talking about preserving ordering.

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.
Re: Does This Scare You?
On Mon, Aug 22, 2016 at 5:24 PM, Chris Angelico wrote:
> On Tue, Aug 23, 2016 at 3:13 AM, eryk sun wrote:
>> On Mon, Aug 22, 2016 at 4:18 PM, Chris Angelico wrote:
>>> The CON device should work if the process is attached to a console
>>> (i.e. a conhost.exe instance).
>>>
>>> No, I used Pike (to avoid any specifically-Python issues or
>>> protections) running in a console. Attempting to write to "Logs/con"
>>> wrote to the console, so I know the console device is active.
>>> Attempting to write to "Logs/con.txt" failed as described.
>>
>> What version of Windows is this? If it's Windows 7 I'll have to check
>> that later. If "Logs" is an existing directory, then both "Logs/con"
>> and "Logs/con.txt" should refer to the console. If "Logs" doesn't
>> exist, then both should fail. Virtual DOS devices only exist in
>> existing directories.
>
> Yes, it was Windows 7 (running in a VM under Debian Jessie, though I
> doubt that makes any difference). The Logs directory did exist (that's
> why I used that otherwise-odd choice of name).

I discovered why "Logs/con.txt" isn't working right in Windows 7, while "Logs/nul.txt" does get redirected correctly to r"\\.\nul". Prior to Windows 8 the console doesn't use an NT device, so the base API has a function named BaseIsThisAConsoleName that looks for names such as r"\\.\CON", r"\\.\CONIN$", "CON", or r"C:\Temp\con.txt" and returns either "CONIN$" or "CONOUT$" if there's a match. A match for just "CON" maps to one or the other of the latter depending on whether read or write access is desired.

If there's a match, then a CreateFile call gets routed to OpenConsoleW, which uses the process ConsoleHandle to send the request to the attached instance of conhost.exe, which replies with a console pseudohandle. Note that the handle value is flagged by setting the lower 2 bits, i.e. 3, 7, 11, which allows routing to special console functions, such as WriteFile => WriteConsoleA. In Windows 8+ console handles are regular File handles (e.g. 24, 28, 32).

When debugging this I observed that there's a performance hack in BaseIsThisAConsoleName. It only calls RtlIsDosDeviceName_U, which does a full check for "CON" in the path, if the name starts with '\\', 'c', or 'C', or if it ends with 'n', 'N', ':', or '$'. This means r"C:\whatever\con.txt" works (the base path doesn't even have to exist), but not r"D:\whatever\con.txt".

In your case the name starts with 'L', so BaseIsThisAConsoleName returns false and the code falls through to calling DosPathNameToRelativeNtPathName_U_WithStatus. This returns r"\??\con", which NtCreateFile fails to open.

r"\??" is a virtual directory starting in Windows XP. For this directory the object manager first checks the logon session's DOS devices in r"\Sessions\0\DosDevices\[Logon Id]" and then the global DOS devices in r"\GLOBAL??". This is an improvement over the less flexible way that Windows 2000 managed DOS devices for terminal services and logon sessions.

A DOS 'device' is an object symbolic link to the real NT device in the r"\Device" directory. In Windows 8+, r"\GLOBAL??\CON" is a link to r"\Device\ConDrv\Console", which is why opening "Logs/con.txt" worked 'correctly' for me in Windows 10.
Re: degrees and radians.
murdocksgra...@gmail.com wrote on 2016-08-24 06:08:

> Also Borland C and C++ used Degrees and NOT Radians.. go look at the
> libraries

You have a reference for that? Borland C++ Builder Help from C++ Builder 5 disagrees with you:

"sin, sinl

Header File
math.h

Category
Math Routines

Syntax
#include <math.h>
double sin(double x);
long double sinl(long double x);

Description
Calculates sine.
sin computes the sine of the input value. Angles are specified in radians.
sinl is the long double version; it takes a long double argument and returns a long double result. Error handling for these functions can be modified through the functions _matherr and _matherrl.

Return Value
sin and sinl return the sine of the input value."

The VCL trigonometry functions also use radians. From the same help system:

"SinCos
Returns sine and cosine of an angle.

Unit
Math

Category
Trigonometry routines

extern PACKAGE void __fastcall SinCos(Extended Theta, Extended &Sin, Extended &Cos);

Description
Call SinCos to obtain the sine and cosine of an angle expressed in radians. Theta specifies the angle. The sine and cosine are returned by the Sin and Cos parameters, respectively.
SinCos is twice as fast as calling Sin and Cos separately for the same angle."

I could also copy some entries from the C++Builder 2010 documentation; they all use radians too.

Best regards,
Roel

--
The saddest aspect of life right now is that science gathers knowledge faster than society gathers wisdom.
  -- Isaac Asimov

Roel Schroeven
Re: Does This Scare You?
On Thu, Aug 25, 2016 at 7:00 AM, eryk sun wrote:
> I discovered why "Logs/con.txt" isn't working right in Windows 7,
> while "Logs/nul.txt" does get redirected correctly to r"\\.\nul".
> Prior to Windows 8 the console doesn't use an NT device, so the base
> API has a function named BaseIsThisAConsoleName that looks for names
> such as r"\\.\CON", r"\\.\CONIN$", "CON", or r"C:\Temp\con.txt" and
> returns either "CONIN$" or "CONOUT$" if there's a match. A match for
> just "CON" maps to one or the other of the latter depending on whether
> read or write access is desired.

See? This is why *even after I tested it* I wasn't sure I was right! The rules are... complicated.

ChrisA
RE: The order of iterable de-referencing in assignment?
>> One might guess a.extend(a) would turn into an infinite loop. It turns out
>> here Python first gets all the items of `a' and then append them to `a', so
>> the infinite loop is avoided.

> >>> a = [1,2]
> >>> for x in a: a.append(x)
> ...
> ^CTraceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> KeyboardInterrupt
> >>> len(a)
> 6370805
>
> That right there, folks, is an infinite loop.

If I am correct python points out an infinite loop with the "...", just pointing to more information.

This email is confidential and may be subject to privilege. If you are not the intended recipient, please do not copy or disclose its content but contact the sender immediately upon receipt.
Re: Alternatives to XML?
On Thu, Aug 25, 2016 at 2:50 AM, Peter Otten <__pete...@web.de> wrote:
>> if _param.auto_party_id is not None:
>>     if on_insert:
>>         value = auto_gen(_param.auto_party_id)
>>     elif not_exists:
>>         value = ''
>
> I think we have a winner here ;)

Agreed.

http://thedailywtf.com/articles/The_Enterprise_Rules_Engine

If you create a non-code way to do code, you either have to make it far FAR simpler than code, or... you should just use code.

Use code.

ChrisA
Re: Alternatives to XML?
"Frank Millman" wrote in message news:npkcnf$kq7$1...@blaine.gmane.org... Hi all I have mentioned in the past that I use XML for storing certain structures 'off-line', and I got a number of comments urging me to use JSON or YAML instead. Can anyone offer an alternative which is closer to my original intention? Many thanks for the replies. I will respond to them all in one place. @alister "are these files expected to be read/written by a human being or are they for your application to save & restore its settings?" Good question. My project is a business/accounting system. It provides a number of tables and columns pre-defined, but it also allows users to create their own. I allow them to define business rules to be invoked at various points. Therefore it must be in a format which is readable/writable by humans, but executable at runtime by my program. .ini is an interesting idea - I will look into it @rob "You've been staring at that XML too long; you've become familiar with it. It's just as unreadable as the JSON or the YAML unless you already know what it says." Good comment! I am sure you are right. Whichever format I settle on, I will have to provide some sort of cheat-sheet explaining to non-technical users what the format is. Having said that, I do find the XML more readable in the sense that the attributes are closely linked with their elements. I think it is easier to explain what is doing than the equivalent in JSON or YAML. @Marko I have never heard of S-expressions before, but they look interesting. I will investigate further. @Peter/Chris I don't understand - please explain. If I store the business rule in Python code, how do I prevent untrusted users putting malicious code in there? I presume I would have to execute the code by calling eval(), which we all know is dangerous. Is there another way of executing it that I am unaware of? Frank -- https://mail.python.org/mailman/listinfo/python-list
Re: Alternatives to XML?
On Thu, Aug 25, 2016 at 3:33 PM, Frank Millman wrote:
> @Peter/Chris
> I don't understand - please explain.
>
> If I store the business rule in Python code, how do I prevent untrusted
> users putting malicious code in there? I presume I would have to execute the
> code by calling eval(), which we all know is dangerous. Is there another way
> of executing it that I am unaware of?

The real question is: How malicious can your users be?

If the XML file is stored adjacent to the Python script that runs it, anyone who can edit one can edit the other. Ultimately, that means that (a) any malicious user can simply edit the Python script, and therefore (b) anyone who's editing the other file is not malicious.

If that's not how you're doing things, give some more details of what you're trying to do. How are you preventing changes to the Python script? How frequent will changes be? Can you simply put all changes through a git repository and use a pull request workflow to ensure that a minimum of two people eyeball every change?

ChrisA
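On the eval() question quoted above, one middle ground that is sometimes used is to keep the rule as plain data (JSON here) and walk it with a small interpreter that only understands a fixed set of node types, operators and functions. The sketch below is an illustration under stated assumptions, not the format used in this thread: the node layout ("type", "op", "branches" and so on), the OPS table and the auto_gen stand-in are all invented for the example. It limits what a rule can call, but it does nothing about CPU or memory limits.

```python
import json
import operator

# Whitelisted comparison operators; anything else is rejected with KeyError.
OPS = {
    "is": operator.is_,
    "is_not": operator.is_not,
    "eq": operator.eq,
    "lt": operator.lt,
    "gt": operator.gt,
}


def evaluate(node, env, funcs):
    """Evaluate one rule node against `env`, calling only whitelisted funcs."""
    kind = node["type"]
    if kind == "literal":
        return node["value"]
    if kind == "name":
        return env[node["id"]]
    if kind == "compare":
        op = OPS[node["op"]]
        return op(evaluate(node["src"], env, funcs),
                  evaluate(node["tgt"], env, funcs))
    if kind == "call":
        func = funcs[node["func"]]
        return func(*[evaluate(arg, env, funcs) for arg in node["args"]])
    if kind == "case":
        # First branch whose test is true wins; None if nothing matches.
        for test, result in node["branches"]:
            if evaluate(test, env, funcs):
                return evaluate(result, env, funcs)
        return None
    raise ValueError("unknown node type: %r" % (kind,))


# Hypothetical encoding of the if/elif rule discussed in this thread.
RULE = json.loads("""
{"type": "case", "branches": [
  [{"type": "compare", "op": "is_not",
    "src": {"type": "name", "id": "auto_party_id"},
    "tgt": {"type": "literal", "value": null}},
   {"type": "case", "branches": [
     [{"type": "name", "id": "on_insert"},
      {"type": "call", "func": "auto_gen",
       "args": [{"type": "name", "id": "auto_party_id"}]}],
     [{"type": "name", "id": "not_exists"},
      {"type": "literal", "value": ""}]
   ]}]
]}
""")

# Stand-in for the real auto_gen, which is not shown in the thread.
FUNCS = {"auto_gen": lambda key: "generated-" + key}

value = evaluate(
    RULE, {"auto_party_id": "P", "on_insert": True, "not_exists": False}, FUNCS
)
```

A rule can only reach what the interpreter hands it: names come from env, calls from FUNCS, operators from OPS. That is the property eval() lacks, at the cost of having to define and document the node vocabulary.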
Re: Alternatives to XML?
"Frank Millman" : > If I store the business rule in Python code, how do I prevent > untrusted users putting malicious code in there? I presume I would > have to execute the code by calling eval(), which we all know is > dangerous. Is there another way of executing it that I am unaware of? This is a key question. A couple of days back I stated the principle that a programming language is better than a rule language. That principle is followed by PostScript printers, Java applets, web pages with JavaScript, emacs configuration files etc. The question is how do you get the desired benefits without opening the door to sabotage. You have to shield CPU usage, memory usage, disk access, network access etc. You can google for solutions with search terms such as "python sandbox", "linux sandbox" and "linux container sandbox". Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: Alternatives to XML?
Chris Angelico :

> The real question is: How malicious can your users be?

Oh, yes, the simple way to manage the situation is for the server to call seteuid() before executing the code after authenticating the user.

Marko
Re: Alternatives to XML?
"Chris Angelico" wrote in message news:CAPTjJmq2bcQPmQ9itVvZrBZJPcbYe5z6vDpKGYQj=8h+qkv...@mail.gmail.com... On Thu, Aug 25, 2016 at 3:33 PM, Frank Millman wrote: @Peter/Chris > I don't understand - please explain. > > If I store the business rule in Python code, how do I prevent untrusted > users putting malicious code in there? I presume I would have to execute > the > code by calling eval(), which we all know is dangerous. Is there another > way > of executing it that I am unaware of? The real question is: How malicious can your users be? If the XML file is stored adjacent to the Python script that runs it, anyone who can edit one can edit the other. Ultimately, that means that (a) any malicious user can simply edit the Python script, and therefore (b) anyone who's editing the other file is not malicious. If that's not how you're doing things, give some more details of what you're trying to do. How are you preventing changes to the Python script? How frequent will changes be? Can you simply put all changes through a git repository and use a pull request workflow to ensure that a minimum of two people eyeball every change? All interaction with users is via a gui. The database contains tables that define the database itself - tables, columns, form definitions, etc. These are not purely descriptive, they drive the entire system. So if a user modifies a definition, the changes are immediate. Does that answer your question? I can go into a lot more detail, but I am not sure where to draw the line. Frank -- https://mail.python.org/mailman/listinfo/python-list
Re: Alternatives to XML?
On Thu, Aug 25, 2016 at 4:11 PM, Frank Millman wrote:
> "Chris Angelico" wrote in message
> news:CAPTjJmq2bcQPmQ9itVvZrBZJPcbYe5z6vDpKGYQj=8h+qkv...@mail.gmail.com...
>
> On Thu, Aug 25, 2016 at 3:33 PM, Frank Millman wrote:
>>
>> @Peter/Chris
>> > I don't understand - please explain.
>> >
>> > If I store the business rule in Python code, how do I prevent untrusted
>> > users putting malicious code in there? I presume I would have to execute
>> > the code by calling eval(), which we all know is dangerous. Is there
>> > another way of executing it that I am unaware of?
>
>> The real question is: How malicious can your users be?
>
>> If the XML file is stored adjacent to the Python script that runs it,
>> anyone who can edit one can edit the other. Ultimately, that means that (a)
>> any malicious user can simply edit the Python script, and therefore (b)
>> anyone who's editing the other file is not malicious.
>
>> If that's not how you're doing things, give some more details of what
>> you're trying to do. How are you preventing changes to the Python script?
>> How frequent will changes be? Can you simply put all changes through a git
>> repository and use a pull request workflow to ensure that a minimum of two
>> people eyeball every change?
>
> All interaction with users is via a gui. The database contains tables that
> define the database itself - tables, columns, form definitions, etc. These
> are not purely descriptive, they drive the entire system. So if a user
> modifies a definition, the changes are immediate.
>
> Does that answer your question? I can go into a lot more detail, but I am
> not sure where to draw the line.

Sounds to me like you have two very different concerns, then. My understanding of "GUI" is that it's a desktop app running on the user's computer, as opposed to some sort of client/server system - am I right?

1) Malicious users, as I describe above, can simply mess with your code directly, or bypass it and talk to the database, or anything. So you can ignore them.

2) Non-programmer users, without any sort of malice, want to be able to edit these scripts but not be caught out by a tiny syntactic problem.

Concern #2 is an important guiding principle. You need your DSL to be easy to (a) read/write, and (b) verify/debug. That generally means restricting functionality some, but you don't have to dig into the nitty-gritty of security. If someone figures out a way to do something you didn't intend to be possible, no big deal; but if someone CANNOT figure out how to use the program normally, that's a critical failure.

So I would recommend making your config files as simple and clean as possible. That might mean Python code; it does not mean XML, and probably not JSON either. It might mean YAML. It might mean creating your own DSL. It might be as simple as "Python code with a particular style guide". There are a lot of options.

Here's your original XML and Python code:

if _param.auto_party_id is not None:
    if on_insert:
        value = auto_gen(_param.auto_party_id)
    elif not_exists:
        value = ''

Here's a very simple format, borrowing from RFC822 with a bit of Python added:

if: _param.auto_party_id != None
    if: on_insert
        value: =auto_gen(_param.auto_party_id)
    elif: not_exists
        value:

"if" and "elif" expect either a single boolean value, or two values separated by a known comparison operator (==, !=, <, >, etc). I'd exclude "is" and "is not" for parser simplicity. Any field name that isn't a keyword (in this case "value:") sets something to the given value, or evals the given string if it starts with an equals sign.

You can mess around with the details all you like, but this is a fairly simple format (every line consists of "keyword: value" at some indentation level), and wouldn't be too hard to parse.
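
To give a feel for how little parsing that takes, here is a minimal sketch that turns indented "keyword: value" lines into a nested tree. The function name `parse_rules` and the dict layout of each node are assumptions made for illustration, and the indentation handling (spaces only, any consistent width) is one possible choice among several.

```python
def parse_rules(text):
    """Parse indented 'keyword: value' lines into a nested list of nodes.

    Each node is a dict: {"key": str, "value": str, "children": list}.
    """
    root = {"key": None, "value": None, "children": []}
    stack = [(-1, root)]  # (indent, node) pairs for the currently open blocks
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        indent = len(raw) - len(raw.lstrip(" "))
        # Split on the first colon only, so values may contain colons.
        key, sep, value = raw.strip().partition(":")
        if not sep:
            raise ValueError("expected 'keyword: value', got %r" % raw)
        node = {"key": key.strip(), "value": value.strip(), "children": []}
        # Close every open block indented at least as far as this line.
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((indent, node))
    return root["children"]


RULE = """\
if: _param.auto_party_id != None
    if: on_insert
        value: =auto_gen(_param.auto_party_id)
    elif: not_exists
        value:
"""

tree = parse_rules(RULE)
print(tree[0]["key"], "->", [child["key"] for child in tree[0]["children"]])
```

Note that in this sketch an "elif" simply ends up as the next sibling of its "if"; pairing the two is left to whatever walks the tree.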

The use of eval() means this is assuming non-malicious usage, which is (as mentioned above) essential anyway, as a full security check on this code would be virtually impossible.

And, as mentioned, Python itself makes a fine DSL for this kind of thing.

ChrisA
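
The evaluation rules described in this message might look roughly like the following. This is a sketch under stated assumptions, not a finished implementation: `check_condition`, `field_value` and the `env` dict are hypothetical names, operators must be surrounded by whitespace, and eval() is used exactly as discussed above, so it assumes non-malicious input.

```python
import operator
import re
from types import SimpleNamespace

# Comparison operators the format recognises; "is" and "is not" are
# deliberately left out, as suggested in the message above.
_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
}
# An operator only counts when it has whitespace on both sides.
_OP_RE = re.compile(r"\s(==|!=|<=|>=|<|>)\s")


def check_condition(text, env):
    """Evaluate 'a OP b', or a single boolean value, against env."""
    match = _OP_RE.search(text)
    if match is None:
        return bool(eval(text, {}, env))
    left = eval(text[:match.start()], {}, env)
    right = eval(text[match.end():], {}, env)
    return _OPS[match.group(1)](left, right)


def field_value(text, env):
    """Return the text as-is, or eval() it when it starts with '='."""
    if text.startswith("="):
        return eval(text[1:], {}, env)
    return text


# Hypothetical stand-ins for the names used in the example rule.
env = {
    "_param": SimpleNamespace(auto_party_id=7),
    "on_insert": True,
    "not_exists": False,
    "auto_gen": lambda n: "P%04d" % n,
}

if check_condition("_param.auto_party_id != None", env):
    if check_condition("on_insert", env):
        print(field_value("=auto_gen(_param.auto_party_id)", env))
```

Splitting the comparison out by hand, rather than handing the whole line to eval(), keeps the set of supported operators explicit and makes it easier to report a clear error to a non-programmer.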