Re: Refactoring in a large code base
Ben Finney : > The author points out there are times when a code base is large and > complex enough that refactoring puts the programmer in a state of not > knowing whether they're making progress, because until the whole > refactoring is complete the errors just cascade and it's hard to tell > which ones are relevant. I've been there. I think the root problem is to have a code base that's so large and complex. It *could* be avoided if the engineering director only cared. Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: Refactoring in a large code base
On Friday, January 22, 2016 at 1:59:15 PM UTC+5:30, Marko Rauhamaa wrote:
> Ben Finney :
>
> > The author points out there are times when a code base is large and
> > complex enough that refactoring puts the programmer in a state of not
> > knowing whether they're making progress, because until the whole
> > refactoring is complete the errors just cascade and it's hard to tell
> > which ones are relevant.
>
> I've been there. I think the root problem is to have a code base that's
> so large and complex.

Bizarre comment... Are you saying large and complex code-bases should not exist?

>
> It *could* be avoided if the engineering director only cared.

Some problems are trivially solvable... for those who have the knowhow
Some problems are inherently hard but easily detectable as such... once again for those who have the knowhow
And some are literally (and ironically trivially) unsolvable

The CS-trinity ('normal' problems, problems in NP, undecidable problems) is a classic example of this. However, applying that in real-world practice can be highly non-trivial, requiring anything from specialized knowledge to intelligence to genius.

IOW "engineering director does not care" is likely true but also a gross oversimplification
Re: Refactoring in a large code base
Rustom Mody : > On Friday, January 22, 2016 at 1:59:15 PM UTC+5:30, Marko Rauhamaa wrote: >> I've been there. I think the root problem is to have a code base >> that's so large and complex. > > Bizarre comment... Are you saying large and complex code-bases should > non-exist? Why, yes, I am. >> It *could* be avoided if the engineering director only cared. > > Some problems are trivially solvable... for those who have the knowhow > Some problems are inherently hard but easily detectable as such... > once again for those who have the knowhow And some are literally (and > ironically trivially) unsolvable The knowhow, vision and skill is apparently very rare. On the product management side, we have the famous case of Steve Jobs, who simply told the engineers to go back to the drawing boards when he didn't like the user experience. Most others would have simply surrendered to the mediocre designs and shipped the product. We need similar code sanity management. Developers are given much too much power to mess up the source code. That's why "legacy" is considered a four-letter word among developers. Marko -- https://mail.python.org/mailman/listinfo/python-list
Deprecation warnings for the future async and await keywords
I enabled the deprecation warnings in Python 3.5.1 and Python 3.6 dev, and I noticed that assigning to async or await does not issue any deprecation warning:

$ python -Wd -c "import sys; print(sys.version); async = 33"
3.5.1 (default, Jan 21 2016, 19:59:28)
[GCC 4.8.4]
$ python -Wd -c "import sys; print(sys.version); async = 33"
3.6.0a0 (default:4b434a4770a9, Jan 12 2016, 13:01:29)
[GCC 4.8.4]

Is it normal?

--
Marco Buttu
Re: Refactoring in a large code base
On Fri, Jan 22, 2016 at 9:19 PM, Marko Rauhamaa wrote:
> The knowhow, vision and skill is apparently very rare. On the product
> management side, we have the famous case of Steve Jobs, who simply told
> the engineers to go back to the drawing boards when he didn't like the
> user experience. Most others would have simply surrendered to the
> mediocre designs and shipped the product.
>
> We need similar code sanity management. Developers are given much too
> much power to mess up the source code. That's why "legacy" is considered
> a four-letter word among developers.

So what do you do with a huge program? Do you send it back to the developers and say "Do this in fewer lines of code"?

CPython is a large and complex program. How do you propose doing it "right"?

ChrisA
PyDev 4.5.3 Released
Release Highlights: --- * Debugger * Fixed issue in set next statement (#PyDev 651). * pydevd.settrace was stopping inside the debugger and not in user code (#PyDev 648). * subprocess.Popen could crash when running non python executable (#PyDev 650). * PyUnit view * The last pinned test suite appears as the first entry in the history. * More information is shown on the test run history. * A string representation of the test suite can be saved in the clipboard (last item in the test run history). * Indexing: fixed issue where the indexing and code-analysis could race with each other and one could become corrupt. What is PyDev? --- PyDev is an open-source Python IDE on top of Eclipse for Python, Jython and IronPython development. It comes with goodies such as code completion, syntax highlighting, syntax analysis, code analysis, refactor, debug, interactive console, etc. Details on PyDev: http://pydev.org Details on its development: http://pydev.blogspot.com What is LiClipse? --- LiClipse is a PyDev standalone with goodies such as support for Multiple cursors, theming, TextMate bundles and a number of other languages such as Django Templates, Jinja2, Kivy Language, Mako Templates, Html, Javascript, etc. It's also a commercial counterpart which helps supporting the development of PyDev. Details on LiClipse: http://www.liclipse.com/ Cheers, -- Fabio Zadrozny -- Software Developer LiClipse http://www.liclipse.com PyDev - Python Development Environment for Eclipse http://pydev.org http://pydev.blogspot.com PyVmMonitor - Python Profiler http://www.pyvmmonitor.com/ -- https://mail.python.org/mailman/listinfo/python-list
Re: Refactoring in a large code base
On Fri, 22 Jan 2016 12:19:50 +0200, Marko Rauhamaa wrote: > We need similar code sanity management. Developers are given much too > much power to mess up the source code. That's why "legacy" is considered > a four-letter word among developers. When I started in this business, in the mid-70s, there was the prospect of my working under a "programmer-analyst" - there was, then, a whole hierarchy of programmers. I resisted that bitterly and was "lucky" enough to be at the forefront of changes - I would be able to avoid that until the concept was dead. Now, 40 years later ... it seems like a good idea to me ... but more dead than it's ever been and getting deader all the time. Part of the problem is that the whole profession is dead - now the people doing the programming are the application experts, and just programmers are considered at the level that we used to consider "operators" were at :) -- https://mail.python.org/mailman/listinfo/python-list
Re: Refactoring in a large code base
Chris Angelico : > On Fri, Jan 22, 2016 at 9:19 PM, Marko Rauhamaa wrote: > So what do you do with a huge program? Modularize. Treat each module as a separate product with its own release cycle, documentation, apis, ownership etc. What is a reasonable size of a module? It is something you would consider replacing with a new implementation with a moderate effort (say, in a single quarter). > CPython is a large and complex program. How do you propose doing it > "right"? I don't know CPython specifically to give solid recommendations, but I would imagine the core language engine should be in a repository separate from the standard library, and most standard library modules should be in their respective repositories and have their individual internal release cycles. A CPython release would then weave the package together from the components that were previously (internally) released. Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: Refactoring in a large code base
On Friday, January 22, 2016 at 4:49:19 PM UTC+5:30, Chris Angelico wrote: > On Fri, Jan 22, 2016 at 9:19 PM, Marko Rauhamaa wrote: > > The knowhow, vision and skill is apparently very rare. On the product > > management side, we have the famous case of Steve Jobs, who simply told > > the engineers to go back to the drawing boards when he didn't like the > > user experience. Most others would have simply surrendered to the > > mediocre designs and shipped the product. > > > > We need similar code sanity management. Developers are given much too > > much power to mess up the source code. That's why "legacy" is considered > > a four-letter word among developers. > > So what do you do with a huge program? Do you send it back to the > developers and say "Do this is less lines of code"? > > CPython is a large and complex program. How do you propose doing it "right"? Put thus 'generistically' this is a rhetorical question and makes Marko look like he's making a really foolish point Specifically, what little Ive seen under the CPython hood looked distinctly improvable. egs. 1. My suggestion to have the docs re. generator-function vs generator-objects cleaned up had no takers 2. My students trying to work inside the lexer made a mess because the extant lexer is a mess. I.e. while python(3) *claims* to accept Unicode input, the actual lexer is an ASCII lexer special-cased for unicode rather than pre-lexing utf8 to unicode These are just specific examples that I am familiar with Chris' general point still stands, viz take the large and complex program that is cpython and clean up these messinesses: You will still have a large and complex program -- https://mail.python.org/mailman/listinfo/python-list
Re: Refactoring in a large code base
Rustom Mody : > These are just specific examples that I am familiar with Chris' > general point still stands, viz take the large and complex program > that is cpython and clean up these messinesses: You will still have a > large and complex program No, as long as the ugly parts are compartmentalized, you have a better chance at refactoring them -- or replacing them altogether. Modularization is an obvious, but under-practiced, method of managing complexity. Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: Refactoring in a large code base
On Fri, Jan 22, 2016 at 10:54 PM, Marko Rauhamaa wrote: > Chris Angelico : > >> On Fri, Jan 22, 2016 at 9:19 PM, Marko Rauhamaa wrote: >> So what do you do with a huge program? > > Modularize. Treat each module as a separate product with its own release > cycle, documentation, apis, ownership etc. > > What is a reasonable size of a module? It is something you would > consider replacing with a new implementation with a moderate effort > (say, in a single quarter). > >> CPython is a large and complex program. How do you propose doing it >> "right"? > > I don't know CPython specifically to give solid recommendations, but I > would imagine the core language engine should be in a repository > separate from the standard library, and most standard library modules > should be in their respective repositories and have their individual > internal release cycles. > > A CPython release would then weave the package together from the > components that were previously (internally) released. Okay. So let's suppose we strip out huge slabs of the standard library and make an absolutely minimal "base library", with an "extended library" that can run on its own separate release cycle. (This has already been discussed; the biggest problems with the idea aren't technical, but logistical - not just for the Python devs but for everyone who has to get approval for software upgrades.) 
Let's suppose the base library consists of just the modules necessary for a basic invocation: rosuav@sikorsky:~$ python3 -c 'import sys; print(sorted(sys.modules.keys()))' ['__main__', '_codecs', '_collections_abc', '_frozen_importlib', '_frozen_importlib_external', '_imp', '_io', '_signal', '_sitebuiltins', '_stat', '_sysconfigdata', '_thread', '_warnings', '_weakref', '_weakrefset', 'abc', 'builtins', 'codecs', 'encodings', 'encodings.aliases', 'encodings.latin_1', 'encodings.utf_8', 'errno', 'genericpath', 'io', 'marshal', 'os', 'os.path', 'posix', 'posixpath', 'site', 'stat', 'sys', 'sysconfig', 'zipimport'] Alright. Can you rewrite all of those modules in three months? Not even digging into the language itself, just the base library. This is the bare minimum to get a viable Python execution environment going (you might be able to cut it down a bit, but not much), so it can't be modularized into separate projects. And then there's the language itself. The cpython/Python directory has 58 .c files, many of which are closely tied to each other. The cpython/Objects directory has another 39, representing specific object types (bytes, tuple, range, method) that are implemented in C. And cpython/Parser has 17 more just to handle the language parser. Edits often affect multiple files and must be kept in sync. How would you modularize that out? Which part would you spin off as a separate project with its own release cycle? The garbage collector? The string object? The peephole optimizer? The import machinery? Each of these is already too big to rewrite in three months, plus they're fairly tightly linked to all the other modules. All that code represents the accumulation of hundreds of thousands of fixes to prevent tens of millions of bugs (some of which will be visible on bugs.python.org, but most would have been found and prevented during early testing); throwing the code away means throwing all that away. 
http://www.joelonsoftware.com/articles/fog69.html I don't agree with everything Joel says, but seriously, do not waste your time with a full rewrite - even in theory. And I can say this from hard experience on both sides. I have an active project for a MUD server, which was originally deployed as a byte-oriented service (it took ASCII-compatible bytes from clients and sent those same octets out to other clients). When I decided that the server should work with Unicode text internally (expecting and transmitting UTF-8), I kept on coming across stupid problems where the code had been written with faulty assumptions, and I had to keep on fixing those. Would it have been better to throw the code away and start over? Well, let me tell you, it would certainly have made the Unicode handling a lot easier, so if you're looking at starting your own project, make sure you learn from my hassles and bake in Unicode support from the start! But that would have meant throwing away all the bugfixes for all the bugs that I'd noticed across the years, such as: 1) On login, typing "quit" when prompted for a user name or password would log you out. The "passwd" (change password) command had to also prevent you from *setting* your password to "quit", because that would effectively lock your account against login. 2) Some clients send backspace as 08; others send FF; some send 08 20 08. Cope with them all. 3) If a bug prevents the admin account from working, there needs to be a way to diagnose and fix that code using shell access to the back-end server, without needing the actual admin account. Etcetera, etcetera, etcetera. There's no way to "rewrite but k
Re: Refactoring in a large code base
On Fri, 22 Jan 2016 04:04:44 -0800, Rustom Mody wrote:
> These are just specific examples that I am familiar with Chris' general
> point still stands, viz take the large and complex program that is
> cpython and clean up these messinesses: You will still have a large and
> complex program

I'm not really sure what point we're working on... let me propose these:

- The Unix principle is good: keep things simple and limited in scope. Then leverage that.
- There will always be complexity, but if the complexity is modularized, it's controlled. In particular, the complexity of a program should represent the complexity of the problem. I call that "structural complexity". What is to be avoided, or corrected, is "superficial complexity", where the complexity of a system is squished into a single plane (or a reduced number of planes). Like vomiting a program onto a desk.
- "Advice" that the program needs to be refactored is generally not helpful.
Re: Refactoring in a large code base
On Fri, Jan 22, 2016 at 11:04 PM, Rustom Mody wrote: > On Friday, January 22, 2016 at 4:49:19 PM UTC+5:30, Chris Angelico wrote: >> On Fri, Jan 22, 2016 at 9:19 PM, Marko Rauhamaa wrote: >> > The knowhow, vision and skill is apparently very rare. On the product >> > management side, we have the famous case of Steve Jobs, who simply told >> > the engineers to go back to the drawing boards when he didn't like the >> > user experience. Most others would have simply surrendered to the >> > mediocre designs and shipped the product. >> > >> > We need similar code sanity management. Developers are given much too >> > much power to mess up the source code. That's why "legacy" is considered >> > a four-letter word among developers. >> >> So what do you do with a huge program? Do you send it back to the >> developers and say "Do this is less lines of code"? >> >> CPython is a large and complex program. How do you propose doing it "right"? > > Put thus 'generistically' this is a rhetorical question and makes Marko look > like > he's making a really foolish point > > Specifically, what little Ive seen under the CPython hood looked distinctly > improvable. egs. > > 1. My suggestion to have the docs re. generator-function vs generator-objects > cleaned up had no takers > 2. My students trying to work inside the lexer made a mess because the extant > lexer is a mess. > I.e. while python(3) *claims* to accept Unicode input, the actual lexer is > an ASCII lexer special-cased for unicode rather than pre-lexing utf8 to > unicode > > These are just specific examples that I am familiar with Yes, there are some parts of CPython that can be improved. That's true of every large project (it's said that every program has at least one bug and could be shortened by at least one instruction, from which it can be deduced that every program can be reduced to a single instruction that doesn't work). 
Regarding lexers specifically, I have never seen any full-size language parser that I've wanted to tinker with. They're always highly optimized pieces of code, dealing with innumerable edge and corner cases, and exploring them is always like dipping my toe into something that's either ice-cold water or highly caustic acid, and I can't tell which. > Chris' general point still stands, viz take the large and complex program > that is cpython > and clean up these messinesses: You will still have a large and complex > program Right. You could definitely spin off *some* of CPython into a separate project (flip through the standard library - quite a few of those modules, if proposed for stdlib inclusion today, would be denied "better on PyPI"), but my point isn't that it can't be improved, but that there's an irreducible complexity to it that exceeds the "rewrite in a quarter" mark by a huge margin. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Refactoring in a large code base
Chris Angelico : > Alright. Can you rewrite all of those modules in three months? The point is not to rewrite modules except as a fallback for a hopelessly badly written module. > And then there's the language itself. The cpython/Python directory has > 58 .c files, many of which are closely tied to each other. Putting modularization in place after the spaghetti has been created is very hard. It can be done, though, and good things come out of the effort. > I don't agree with everything Joel says, but seriously, do not waste > your time with a full rewrite - even in theory. And I can say this > from hard experience on both sides. As long as you are happy with your code base, you don't have to change everything just for an abstract principle. However, as a matter of rule, older code bases have been bloated till they can barely be maintained. That's when the management starts to listen to new ideas. Better late than never. > Would it have been better to throw the code away and start over? Again, modularization doesn't entail rewriting -- it simply makes localized rewriting a practical option for desperate situations. > So how can you rewrite *any* large project in three months? Let go of the rewriting already. Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: Refactoring in a large code base
On Sat, Jan 23, 2016 at 12:00 AM, Marko Rauhamaa wrote: > However, as a matter of rule, older code bases have been bloated till > they can barely be maintained. That's when the management starts to > listen to new ideas. Better late than never. Okay. Start persuading "management" (presumably the PSU) that CPython needs to be more modular, with different release cycles for different components. Your first step is to figure out the boundaries between those components. Get started. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Refactoring in a large code base
On Friday, January 22, 2016 at 6:05:02 PM UTC+5:30, Chris Angelico wrote: > On Fri, Jan 22, 2016 at 11:04 PM, Rustom Mody wrote: > > 2. My students trying to work inside the lexer made a mess because the > > extant lexer is a mess. > > I.e. while python(3) *claims* to accept Unicode input, the actual lexer is > > an ASCII lexer special-cased for unicode rather than pre-lexing utf8 to > > unicode > > > > These are just specific examples that I am familiar with > > > Regarding lexers specifically, I have never seen any full-size > language parser that I've wanted to tinker with. They're always highly > optimized pieces of code, dealing with innumerable edge and corner > cases, and exploring them is always like dipping my toe into something > that's either ice-cold water or highly caustic acid, and I can't tell > which. > You just gave a graphic vivid description... of the same thing Marko is describing: ;-) viz. A full-size language parser is something that you - an experienced developer - make a point of avoiding. So then the question comes down to this: Is this the order of nature? Or is it man-made disorder? Jury's out on that one for lexers/parsers specifically. For arbitrary code in general, the problem that it may be arbitrarily and unboundedly complex/complicated is the oldest problem in computer science: the halting problem. IOW anyone who thinks that *arbitrary* complexity can *always* be tamed either has a visa to utopia or needs to re-evaluate (or get) a CS degree -- https://mail.python.org/mailman/listinfo/python-list
Re: Refactoring in a large code base
Rustom Mody : > IOW anyone who thinks that *arbitrary* complexity can *always* be > tamed either has a visa to utopia or needs to re-evaluate (or get) a > CS degree Not all complexity can be tamed, but what you can't tame you shouldn't release, either. Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: Refactoring in a large code base
Chris Angelico : > Okay. Start persuading "management" (presumably the PSU) that CPython > needs to be more modular, with different release cycles for different > components. Your first step is to figure out the boundaries between > those components. Get started. Gladly, I don't need to do anything about CPython. This particular strawman was erected by you: > CPython is a large and complex program. How do you propose doing it > "right"? It's up to you to take my proposal or ignore it. However, I have had my share of windmills to battle; I'm happy to say I've managed to bring down a few of them! Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: Refactoring in a large code base
On Friday, January 22, 2016 at 7:13:49 PM UTC+5:30, Marko Rauhamaa wrote:
> Rustom Mody :
>
> > IOW anyone who thinks that *arbitrary* complexity can *always* be
> > tamed either has a visa to utopia or needs to re-evaluate (or get) a
> > CS degree
>
> Not all complexity can be tamed, but what you can't tame you shouldn't
> release, either.

And how do you propose to legislate that? If you leave it to the (untamed!) wetware boxes atop our shoulders, will not two developers have wildly different complexity thresholds?

And as soon as you suggest an objective (∴ algorithmic) solution to detecting complexity, you have landed on the halting problem (or, more precisely, Rice's theorem).

tl;dr The HP is amazingly deceptive and you just got tripped by it
Question about how to do something in BeautifulSoup?
I hope this is an appropriate mailing list for BeautifulSoup questions, it's been a long time since I've used python-list and I don't remember if third-party modules are on topic. I did try posting to the BeautifulSoup mailing list on Google groups, but I've waited a day or two and my message hasn't been approved yet. Say I have the following HTML (I hope this shows up as plain text here rather than formatting): "Is today the day?" And I want to extract the "Is today the day?" part. There are other places in the document with and , but this is the only place that uses color #00, so I want to extract anything that's within a color #00 style, even if it's nested multiple levels deep within that. - Sometimes the color is defined as RGB(0, 0, 0) and sometimes it's defined as #00 - Sometimes the is within the and sometimes the is within the . - There may be other discrepancies I haven't noticed yet How can I do this in BeautifulSoup (or is this better done in lxml.html)? Thanks -- https://mail.python.org/mailman/listinfo/python-list
Re: Refactoring in a large code base
On Sat, Jan 23, 2016 at 12:30 AM, Rustom Mody wrote: > You just gave a graphic vivid description... > of the same thing Marko is describing: ;-) viz. > A full-size language parser is something that you - an experienced developer - > make a point of avoiding. It's worth noting that "experienced developer" covers a huge range of skills. There are quite a few other areas that I do not tinker with (crypto, CPU-level optimizations, and such), not because they're impossible to understand, but because *I* have not the skill to understand and improve them. This does mean they're complicated (they're beyond the "one weekend of tinkering" barrier that any serious geek should be able to invest), but I'm sure there are language nerds out there who are so familiar with the grammar of that they'll pick up CPython's grammar and make a change with confidence that it'll do what they expect. > So then the question comes down to this: Is this the order of nature? > Or is it man-made disorder? > Jury's out on that one for lexers/parsers specifically. Lexers/parsers are as complicated as the grammars they parse. A lexer for a simple structured text file can be pretty easy to implement; for instance, JSON is pretty straight-forward, with only a handful of cases (insignificant whitespace, three keywords, two recursive structures that start with specific characters ('{' and '['), strings (which start with '"'), and numbers (which start with a digit or a hyphen)), so a parser need only look for those few possibilities and it knows exactly what else to fetch up. I could probably write a JSON parser in a fairly short space of time, and wouldn't be scared of digging into the internals of someone else's. It's when the grammar adds complexities to deal with the real-world issues of full size programming languages that it becomes hairier. The CPython grammar is only ~150 lines of fairly readable directives, but the parser that implements it is ~3500 lines of C code. 
Pike merges the two into a YACC file of nearly 5000 lines of highly optimized code (it has different grammar paths for things a human would consider the same, in order to produce distinct code). That's where I'm ubercautious. > For arbitrary code in general, the problem that it may be arbitrarily and > unboundedly > complex/complicated is the oldest problem in computer science: the halting > problem. > > IOW anyone who thinks that *arbitrary* complexity can *always* be tamed either > has a visa to utopia or needs to re-evaluate (or get) a CS degree Exactly. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
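The JSON case described in the message above (a handful of token kinds, each recognizable from its first character) can be sketched as a small tokenizer. This is only an illustration, not code from the thread: the names `tokenize`, `NUMBER`, `STRING` and `KEYWORDS` are made up here, and the sketch deliberately skips the validation a real JSON parser does (escape sequences, leading zeros in numbers, nesting).

```python
import re

# Hypothetical helper names; simplified patterns, not a full JSON grammar.
KEYWORDS = ("true", "false", "null")
NUMBER = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")
STRING = re.compile(r'"(?:[^"\\]|\\.)*"')


def tokenize(text):
    """Yield (kind, lexeme) pairs, dispatching on the first character."""
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch in " \t\r\n":
            # Insignificant whitespace: skip it.
            pos += 1
        elif ch in "{}[],:":
            # Structural characters, including the two recursive openers.
            yield ("punct", ch)
            pos += 1
        elif ch == '"':
            match = STRING.match(text, pos)
            if match is None:
                raise ValueError("unterminated string at %d" % pos)
            yield ("string", match.group())
            pos = match.end()
        elif ch in "-0123456789":
            match = NUMBER.match(text, pos)
            if match is None:
                raise ValueError("bad number at %d" % pos)
            yield ("number", match.group())
            pos = match.end()
        else:
            # Anything else must be one of the three keywords.
            for word in KEYWORDS:
                if text.startswith(word, pos):
                    yield ("keyword", word)
                    pos += len(word)
                    break
            else:
                raise ValueError("unexpected character at %d" % pos)


tokens = list(tokenize('{"a": [1, -2.5, true]}'))
```

Every branch is chosen by looking at one character, which is why a grammar of this size stays approachable. A full programming-language grammar rarely allows that kind of one-character dispatch.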
Re: Refactoring in a large code base
On Sat, Jan 23, 2016 at 12:48 AM, Marko Rauhamaa wrote: > Chris Angelico : > >> Okay. Start persuading "management" (presumably the PSU) that CPython >> needs to be more modular, with different release cycles for different >> components. Your first step is to figure out the boundaries between >> those components. Get started. > > Gladly, I don't need to do anything about CPython. This particular > strawman was erected by you: > >> CPython is a large and complex program. How do you propose doing it >> "right"? > > It's up to you to take my proposal or ignore it. > > However, I have had my share of windmills to battle; I'm happy to say > I've managed to bring down a few of them! Okay. You have fun releasing nothing that you aren't confident can fit within your definitions (and by the way, you were the one who brought up the three-month rewrite, not me); I'm going to keep on following the principle that "practicality beats purity", and release actual working code. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: How clean/elegant is Python's syntax?
On Friday, May 31, 2013 at 1:06:29 AM UTC+5:30, Steven D'Aprano wrote: > On Thu, 30 May 2013 10:12:22 -0700, rusi wrote: > > > On Thu, May 30, 2013 at 9:34 AM, Ma Xiaojun wrote: > > >> Wait a minute! Isn't the most nature way of doing/thinking "generating > >> 9x9 multiplication table" two nested loop? > > > > Thats like saying that the most natur(al) way of using a car is to > > attach a horse to it. > >[...] > > Likewise in the world of programming, 90% of programmers think > > imperative/OO programming is natural while functional programming is > > strange. Just wait 10 years and see if things are not drastically > > different! > > It won't be. Functional programming goes back to Lisp, which is nearly as > old as Fortran and older than Cobol. There have been many decades for > functional languages to become mainstream, but they've never quite done > it. There's no reason to think that the next decade will see a change to > this. Interesting point... With interesting (counter)examples: http://blog.languager.org/2016/01/how-long.html [With apologies for necroposting] -- https://mail.python.org/mailman/listinfo/python-list
Re: Question about how to do something in BeautifulSoup?
inhahe wrote:

> I hope this is an appropriate mailing list for BeautifulSoup questions,
> it's been a long time since I've used python-list and I don't remember if
> third-party modules are on topic. I did try posting to the BeautifulSoup
> mailing list on Google groups, but I've waited a day or two and my message
> hasn't been approved yet.
>
> Say I have the following HTML (I hope this shows up as plain text here
> rather than formatting):
>
> "Is today the day?"
>
> And I want to extract the "Is today the day?" part. There are other places
> in the document with and , but this is the only place that
> uses color #00, so I want to extract anything that's within a color
> #00 style, even if it's nested multiple levels deep within that.
>
> - Sometimes the color is defined as RGB(0, 0, 0) and sometimes it's
> defined as #00
> - Sometimes the is within the and sometimes the is
> within the .
> - There may be other discrepancies I haven't noticed yet
>
> How can I do this in BeautifulSoup (or is this better done in lxml.html)?
> Thanks

I don't see how to do this without a lot of glue code, but this may get you started:

import re

import bs4


def recursive_attr(elem, path):
    # Follow a "/"-separated chain of child tag names, e.g. "strong/em";
    # stop and return None as soon as one step is missing.
    path = path.split("/")
    for name in path:
        if elem is None:
            break
        elem = getattr(elem, name)
    return elem


def find(soup):
    # Match the style attribute against either spelling of the color.
    for outer in soup.find_all(
            "span",
            style=re.compile(r"color:\s*(RGB\(0,\s*0,\s*0\)|#00)")):
        # Try both nesting orders of the two inner tags.
        for inner in [
                recursive_attr(outer, "strong/em"),
                recursive_attr(outer, "em/strong")]:
            if inner is not None:
                yield inner.string


def normalize_ws(s):
    # Collapse runs of whitespace into single spaces.
    return " ".join(s.split())


html = ...
soup = bs4.BeautifulSoup(html)
for match in find(soup):
    print(normalize_ws(match))
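For the "nested at any depth" part of the question, a standard-library-only sketch is also possible. It is not the bs4 approach from the reply above, just an illustration of the idea of tracking whether we are inside a matching `<span>`. The class name `ColoredText`, the function `extract`, the sample markup and the `#000000` spelling of the color are all assumptions made for this example.

```python
import re
from html.parser import HTMLParser

# Assumed color spellings; adjust to whatever the real document uses.
BLACK = re.compile(r"color:\s*(rgb\(0,\s*0,\s*0\)|#000000)", re.IGNORECASE)


class ColoredText(HTMLParser):
    """Collect text found at any depth inside a <span> whose style matches."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one bool per open <span>: did its style match?
        self.chunks = []  # text pieces seen while inside a matching span

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            style = dict(attrs).get("style") or ""
            self.stack.append(bool(BLACK.search(style)))

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Keep the text if any enclosing span matched, however deep we are.
        if any(self.stack):
            self.chunks.append(data)


def extract(html):
    parser = ColoredText()
    parser.feed(html)
    parser.close()
    # Normalize whitespace the same way normalize_ws() does in the reply.
    return " ".join("".join(parser.chunks).split())


sample_hex = ('<p>before <span style="color: #000000"><strong><em>'
              'Is today the day?</em></strong></span> after</p>')
sample_rgb = ('<span style="color: RGB(0, 0, 0)"><em><strong>'
              'Is today  the day?</strong></em></span>')
result_hex = extract(sample_hex)
result_rgb = extract(sample_rgb)
```

Because the parser only tracks span nesting, the order of the inner tags does not matter, which covers the second bullet in the question without listing every combination.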
one more question on regex
python 3.4.3

>>> import re
>>> re.search('(ab){2}','abzzabab')
<_sre.SRE_Match object; span=(4, 8), match='abab'>
>>> re.findall('(ab){2}','abzzabab')
['ab']

Why is the match 'abab' for search() but 'ab' for findall()?
--
https://mail.python.org/mailman/listinfo/python-list
Re: one more question on regex
mg wrote:

> python 3.4.3
>
> import re
> re.search('(ab){2}','abzzabab')
> <_sre.SRE_Match object; span=(4, 8), match='abab'>
> re.findall('(ab){2}','abzzabab')
> ['ab']
>
> Why for search() the match is 'abab' and for findall the match is 'ab'?

I suppose someone thought it was convenient for findall to return the
explicit groups if there are any. If you want the whole match, aka group(0),
you can get that by making the group non-capturing:

>>> re.findall('(?:ab){2}','abzzabab')
['abab']
--
https://mail.python.org/mailman/listinfo/python-list
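To put the two behaviours side by side, a small sketch using the same pattern and string as the thread (the variable names are only illustrative):

```python
import re

text = "abzzabab"

# search() returns a match object: group(0) is the whole match,
# group(1) is whatever the capturing group matched last.
m = re.search(r"(ab){2}", text)
whole, last_group = m.group(0), m.group(1)
print(whole, last_group)

# With exactly one capturing group, findall() returns that group's content.
with_group = re.findall(r"(ab){2}", text)
print(with_group)

# With a non-capturing group there are no groups left to report,
# so findall() falls back to the whole match.
without_group = re.findall(r"(?:ab){2}", text)
print(without_group)
```

So search() and findall() find the same match; they differ only in which part of it they hand back.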
Re: one more question on regex
On Fri, 22 Jan 2016 15:32:57 +, mg wrote:

> python 3.4.3
>
> import re
> re.search('(ab){2}','abzzabab')
> <_sre.SRE_Match object; span=(4, 8), match='abab'>
> re.findall('(ab){2}','abzzabab')
> ['ab']
>
> Why for search() the match is 'abab' and for findall the match is 'ab'?

finditer seems to be consistent with search:

>>> regex = re.compile('(ab){2}')
>>> for match in regex.finditer('abzzababab'):
...     print("%s: %s" % (match.start(), match.span()))
...
4: (4, 8)
--
https://mail.python.org/mailman/listinfo/python-list
Re: Deprecation warnings for the future async and await keywords
On Fri, Jan 22, 2016 at 4:12 AM, Marco Buttu wrote:
> I enabled the deprecation warnings in Python 3.5.1 and Python 3.6 dev, and I
> noticed that assigning to async or await does not issue any deprecation
> warning:
>
> $ python -Wd -c "import sys; print(sys.version); async = 33"
> 3.5.1 (default, Jan 21 2016, 19:59:28)
> [GCC 4.8.4]
> $ python -Wd -c "import sys; print(sys.version); async = 33"
> 3.6.0a0 (default:4b434a4770a9, Jan 12 2016, 13:01:29)
> [GCC 4.8.4]
>
>
> Is it normal?

They're not reserved words, see
https://www.python.org/dev/peps/pep-0492/#transition-plan
--
https://mail.python.org/mailman/listinfo/python-list
Re: Deprecation warnings for the future async and await keywords
On 22/01/2016 16:59, Ian Kelly wrote:

> On Fri, Jan 22, 2016 at 4:12 AM, Marco Buttu wrote:
>> I enabled the deprecation warnings in Python 3.5.1 and Python 3.6 dev, and I
>> noticed that assigning to async or await does not issue any deprecation
>> warning:
>>
>> $ python -Wd -c "import sys; print(sys.version); async = 33"
>> 3.5.1 (default, Jan 21 2016, 19:59:28)
>> [GCC 4.8.4]
>
> They're not reserved words, see
> https://www.python.org/dev/peps/pep-0492/#transition-plan

Of course not, in fact I wrote "future keywords" in the subject, because
they will be keywords in Python 3.7:

https://www.python.org/dev/peps/pep-0492/#deprecation-plans

--
Marco Buttu
--
https://mail.python.org/mailman/listinfo/python-list
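One way to check how a particular interpreter treats these names, without relying on warnings, is the standard keyword module. This is only a sketch; the helper name is_reserved is made up for illustration, and the result depends on the Python version running it (per the message above, the names become keywords in 3.7).

```python
import keyword
import sys


def is_reserved(name):
    # keyword.iskeyword() reports whether the running interpreter
    # treats the given string as a reserved word.
    return keyword.iskeyword(name)


for name in ("async", "await"):
    print(sys.version_info[:2], name, is_reserved(name))
```

On an interpreter where this prints False, "async = 33" is still an ordinary assignment; where it prints True, the same statement is a syntax error.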
Re: one more question on regex
2016-01-22 16:50 GMT+01:00 mg :
> On Fri, 22 Jan 2016 15:32:57 +, mg wrote:
>
>> python 3.4.3
>>
>> import re
>> re.search('(ab){2}','abzzabab')
>> <_sre.SRE_Match object; span=(4, 8), match='abab'>
>> re.findall('(ab){2}','abzzabab')
>> ['ab']
>>
>> Why for search() the match is 'abab' and for findall the match is 'ab'?
>
> finditer seems to be consistent with search:
> regex = re.compile('(ab){2}')
>
> for match in regex.finditer('abzzababab'):
>     print ("%s: %s" % (match.start(), match.span() ))
> ...
> 4: (4, 8)
>
> --
> https://mail.python.org/mailman/listinfo/python-list

Hi,
as was already pointed out, findall "collects" the content of the
capturing groups (if present), rather than the whole matching text;

for a repeated capturing group, only the content of its last repetition is
kept and the earlier ones are discarded; cf.:

>>> re.findall('(?i)(a)x(b)+','axbB')
[('a', 'B')]
>>>

(for multiple capturing groups in the pattern, a tuple of the captured parts
is collected for each match)

or with your example, with the parts of the string differentiated by
upper/lower case:

>>> re.findall('(?i)(ab){2}','aBzzAbAB')
['AB']
>>>

hth,
vbr
--
https://mail.python.org/mailman/listinfo/python-list
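If the earlier repetitions matter, one possible workaround (a sketch, not something proposed in the thread) is to make the repeated group non-capturing and wrap the whole repetition in an outer capturing group. The variable names are illustrative.

```python
import re

# A repeated capturing group keeps only its last repetition.
last_only = re.findall(r"(?i)(a)x(b)+", "axbB")
print(last_only)

# Wrapping the repetition in an outer group captures the whole run instead.
whole_run = re.findall(r"(?i)(a)x((?:b)+)", "axbB")
print(whole_run)

# Both at once: group 1 is the full repeated text, group 2 the last repetition.
both = re.findall(r"(?i)((ab){2})", "aBzzAbAB")
print(both)
```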
Re: one more question on regex
On Fri, 22 Jan 2016 21:10:44 +0100, Vlastimil Brom wrote:

> 2016-01-22 16:50 GMT+01:00 mg :
>> On Fri, 22 Jan 2016 15:32:57 +, mg wrote:
>>
>>> python 3.4.3
>>>
>>> import re
>>> re.search('(ab){2}','abzzabab')
>>> <_sre.SRE_Match object; span=(4, 8), match='abab'>
>>> re.findall('(ab){2}','abzzabab')
>>> ['ab']
>>>
>>> Why for search() the match is 'abab' and for findall the match is
>>> 'ab'?
>>
>> finditer seems to be consistent with search:
>> regex = re.compile('(ab){2}')
>>
>> for match in regex.finditer('abzzababab'):
>>     print ("%s: %s" % (match.start(), match.span() ))
>> ...
>> 4: (4, 8)
>>
>> -- https://mail.python.org/mailman/listinfo/python-list
>
> Hi,
> as was already pointed out, findall "collects" the content of the
> capturing groups (if present), rather than the whole matching text;
>
> for repeated captures the last content of them is taken discarding the
> previous ones; cf.:
> re.findall('(?i)(a)x(b)+','axbB')
> [('a', 'B')]
> (for multiple capturing groups in the pattern, a tuple of captured parts
> are collected)
>
> or with your example with differenciated parts of the string using
> upper/lower case: re.findall('(?i)(ab){2}','aBzzAbAB')
> ['AB']
> hth,
>    vbr

Your explanation of the re.findall() results is correct. My point is that
the documentation states:

re.findall(pattern, string, flags=0)
    Return all non-overlapping matches of pattern in string, as a list
    of strings

and this is not what re.findall does. IMHO it would be more reasonable to
get back the whole matches, since that seems to me the most useful
information for the user. In any case I'll go with finditer, which returns
match objects carrying all the information anyone could look for.
--
https://mail.python.org/mailman/listinfo/python-list
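For reference, the documentation entry quoted above goes on to say that if one or more groups are present in the pattern, a list of groups is returned. A minimal sketch of the finditer route follows; the helper names whole_matches and match_details are hypothetical, chosen only for this example.

```python
import re


def whole_matches(pattern, string, flags=0):
    # group(0) is always the full matched text, whether or not
    # the pattern contains capturing groups.
    return [m.group(0) for m in re.finditer(pattern, string, flags)]


def match_details(pattern, string, flags=0):
    # Span, whole match and captured groups for every match.
    return [(m.span(), m.group(0), m.groups())
            for m in re.finditer(pattern, string, flags)]


print(whole_matches(r"(ab){2}", "abzzababab"))
print(match_details(r"(ab){2}", "abzzababab"))
```

This keeps the capturing groups available through m.groups() while still giving the whole match, so the pattern does not have to be rewritten with (?:...).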
Re: Question about how to do something in BeautifulSoup?
I think you'd do better using the pyparsing library

On Friday, January 22, 2016 at 9:02:00 AM UTC-5, inhahe wrote:
> I hope this is an appropriate mailing list for BeautifulSoup questions,
> it's been a long time since I've used python-list and I don't remember if
> third-party modules are on topic. I did try posting to the BeautifulSoup
> mailing list on Google groups, but I've waited a day or two and my message
> hasn't been approved yet.
>
> Say I have the following HTML (I hope this shows up as plain text here
> rather than formatting):
>
> "Is
> today the day?"
>
> And I want to extract the "Is today the day?" part. There are other places
> in the document with and , but this is the only place that
> uses color #00, so I want to extract anything that's within a color
> #00 style, even if it's nested multiple levels deep within that.
>
> - Sometimes the color is defined as RGB(0, 0, 0) and sometimes it's defined
> as #00
> - Sometimes the is within the and sometimes the is
> within the .
> - There may be other discrepancies I haven't noticed yet
>
> How can I do this in BeautifulSoup (or is this better done in lxml.html)?
> Thanks
--
https://mail.python.org/mailman/listinfo/python-list