Steven D'Aprano:
> Usability for beginners is a good thing, but not at the expense of
> teaching them the right way to do things. Insisting on explicit requests
> before copying data is a *good* thing. If it's a gotcha for newbies,
> that's just a sign that newbies don't know the Right Way from the Wrong
> Way yet. The solution is to teach them, not to compromise on the Wrong
> Way. I don't want to write code where the following is possible:
> ...
> ... suddenly my code hits an unexpected performance drop
> ... as gigabytes of data get duplicated
I understand your point of view, and I tend to agree. But let me express another point of view.

Computer languages are a way to ask a machine to do some job. As computers become faster, people find it possible to create higher-level languages: languages that are more distant from how the CPU actually performs the job, and that let humans express the job in a way closer to how less trained people talk to each other and get things done. Many years ago a language like Python was probably too costly in terms of CPU, which made it of little use for most non-toy purposes. But there's a need for higher-level computer languages. Today Ruby is a bit higher-level than Python (despite being rather close to it).

So my alternative answers to your problem are mostly these:

1) The code becomes slow if you try to perform that operation? Then the JIT is "broken", and we have to find a smarter JIT (and the user will look for a better language). A higher-level language means that the user is freer to ignore what's under the hood: the user just cares that the machine performs the job, regardless of how. The user focuses on what job to do, and the low-level details of how to do it are left to the machine. It's the job of the JIT writers to make that possible anyway. So the JIT must be even smarter: for example, it could partition the 1 GB of data into blocks, each one managed with copy-on-write, so that maybe it copies just a few megabytes of memory. Such a language needs to be that smart. Even so, I think many people today who have a 3 GHz CPU may accept a language 5 times slower than Python that, for example, uses base-10 floating point numbers (different from Python's Decimal numbers).
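The block-partitioning idea above can be sketched in a few lines of Python 3. This is a minimal illustration under my own assumptions: `ChunkedCOWList` is a hypothetical class, not how any real JIT or Python implementation works. Copying it duplicates only the list of chunk references, and the first write to a shared chunk copies just that one chunk.

```python
class ChunkedCOWList:
    """Hypothetical list-like container whose copies share fixed-size chunks.

    A copy duplicates only the list of chunk references; a chunk is copied
    the first time one of its owners writes to it (copy-on-write).
    """

    def __init__(self, items=(), chunk_size=4):
        items = list(items)
        self.chunk_size = chunk_size
        self._length = len(items)
        self._chunks = [items[i:i + chunk_size]
                        for i in range(0, len(items), chunk_size)]
        # Indices of chunks this object owns exclusively and may mutate in place.
        self._owned = set(range(len(self._chunks)))

    def _locate(self, index):
        # Map a (possibly negative) flat index to (chunk number, offset).
        if index < 0:
            index += self._length
        if not 0 <= index < self._length:
            raise IndexError("ChunkedCOWList index out of range")
        return divmod(index, self.chunk_size)

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        chunk, offset = self._locate(index)
        return self._chunks[chunk][offset]

    def __setitem__(self, index, value):
        chunk, offset = self._locate(index)
        if chunk not in self._owned:
            # First write to a shared chunk: copy only this chunk.
            self._chunks[chunk] = list(self._chunks[chunk])
            self._owned.add(chunk)
        self._chunks[chunk][offset] = value

    def __iter__(self):
        for chunk in self._chunks:
            yield from chunk

    def copy(self):
        new = ChunkedCOWList(chunk_size=self.chunk_size)
        new._length = self._length
        new._chunks = list(self._chunks)  # copies references, not data
        # After copying, every chunk is shared, so neither side owns any.
        self._owned = set()
        new._owned = set()
        return new

    def shared_chunks(self, other):
        # Count chunks that are still the very same list object in both.
        return sum(a is b for a, b in zip(self._chunks, other._chunks))


a = ChunkedCOWList(range(10), chunk_size=4)  # chunks: [0-3], [4-7], [8, 9]
b = a.copy()
b[5] = 99                                    # copies only the middle chunk
print(list(a))             # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(b))             # [0, 1, 2, 3, 4, 99, 6, 7, 8, 9]
print(a.shared_chunks(b))  # 2
```

Operating systems apply the same principle at the level of memory pages (for example after fork() on Unix): you pay for copying only the parts that are actually modified.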
Almost every day on the Python newsgroup a newbie asks if round() is broken, after seeing this:

    >>> round(1/3.0, 2)
    0.33000000000000002

A higher-level language (like Mathematica) must be designed to give more numerically correct answers, even if that requires more CPU. But such a language isn't just for newbies: if I write a 10-line program that has to print 100 lines of numbers, I want it to reduce my coding time and spare me from thinking about base-2 floating point numbers. If the language uses higher-level numbers by default, I can ignore that problem, my coding becomes faster, and the bugs decrease. The same happens with Python integers: they don't overflow, so I can ignore a lot of details (like guarding against possible overflows) that I have to think about when I use C. C is faster, but that speed isn't necessary if I just need to print 100 lines of output on a 3 GHz PC. What I need in such a situation is a language that lets me ignore how numbers are represented by the CPU, and that prints the correct numbers to the file. This is just a silly example, but it may show my point of view (another example is below).

2) You don't process gigabytes of data with this language; it's designed to solve smaller problems with smaller datasets. If you want to solve very big problems you have to use a lower-level language, like Python, or C, or assembly. Computers allow us to solve bigger and bigger problems, but today life is full of little problems too, like processing a single 50-line text file.

3) You buy an even faster computer, where even copying 1 GB of data is fast enough.

Wolfram:
> Have a look at Tools/Scripts/pindent.py

Oh, that's it, almost. Thank you.
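Going back to the two numeric points in 1), here is a small Python 3 sketch. Since Python 2.7 and 3.1, repr() of a float prints the shortest string that round-trips, so the session above now shows 0.33, but the stored base-2 value is unchanged; the decimal module gives exact base-10 rounding. The `wrap_int32` helper is hypothetical: it only simulates the wraparound a 32-bit signed int typically shows on common hardware (strictly, signed overflow is undefined behavior in C).

```python
from decimal import Decimal

# Binary floating point: the rounded result is still a base-2 approximation.
x = round(1 / 3.0, 2)
print(repr(x))     # 0.33 on Python 2.7/3.1+ (shortest round-tripping repr)
print(Decimal(x))  # the exact stored value, which is not exactly 0.33

# Base-10 arithmetic: quantize() rounds to two decimal places exactly.
third = Decimal(1) / Decimal(3)          # 28 significant digits by default
print(third.quantize(Decimal("0.01")))   # 0.33


def wrap_int32(n):
    """Hypothetical helper: two's-complement wraparound to 32 bits.

    Signed overflow is undefined behavior in C; this mimics what typical
    hardware does, purely for illustration.
    """
    return (n + 2**31) % 2**32 - 2**31


big = 2**31 - 1             # INT_MAX for a 32-bit int
print(big + 1)              # 2147483648: Python ints just grow
print(wrap_int32(big + 1))  # -2147483648: what a wrapping 32-bit int gives
```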
Bye,
bearophile

-----------------------

Appendix: another example, a little problem from this page:
http://www.faqs.org/docs/abs/HTML/writingscripts.html

> Find the sum of all five-digit numbers (in the range 10000 - 99999) containing
> exactly two out of the following set of digits: { 4, 5, 6 }. These may repeat
> within the same number, and if so, they count once for each occurrence.

I can solve it in 3.3 seconds on my old PC with Python like this:

    print sum(n for n in xrange(10000, 100000) if len(set(str(n)) & set("456")) == 2)

[Note: that's the second version of the code; the first version was buggy because it contained:

    ... & set([4, 5, 6])

So I used the Python shell to see what set(str(12345)) & set([4, 5, 6]) was, and the result was an empty set. So it's a type bug. A statically typed language like D often can't catch such bugs anyway, because chars are seen as numbers.]

In Python I can write low-level-style code like this, which takes only 0.4 seconds with Psyco (it's backported from the D version, because writing it in D allowed me to think at a lower level.
I was NOT able to reach such a low level and such high speed by writing a program just for Psyco):

    def main():
        digits = [0] * 10  # flags: digits[d] is set to 1 when digit d is seen
        tot = 0
        for n in xrange(10000, 100000):
            i = n
            # Only the flags for 4, 5 and 6 are read, so only they are reset.
            digits[4] = 0
            digits[5] = 0
            digits[6] = 0
            # Unrolled loop: mark each of the five digits of n.
            digits[i % 10] = 1; i //= 10
            digits[i % 10] = 1; i //= 10
            digits[i % 10] = 1; i //= 10
            digits[i % 10] = 1; i //= 10
            digits[i % 10] = 1
            if (digits[4] + digits[5] + digits[6]) == 2:
                tot += n
        print tot

    try:
        import psyco      # Psyco: a Python 2-only specializing JIT
        psyco.bind(main)
    except ImportError:
        pass              # still correct without Psyco, just slower
    main()

Or I can solve it in 0.07 seconds in D (and in about 0.05 seconds with very similar C code compiled with -O3 -fomit-frame-pointer):

    void main() {
        int tot, d, i;
        int[10] digits;
        for (uint n = 10_000; n < 100_000; n++) {
            digits[4] = 0;
            digits[5] = 0;
            digits[6] = 0;
            i = n;
            digits[i % 10] = 1; i /= 10;
            digits[i % 10] = 1; i /= 10;
            digits[i % 10] = 1; i /= 10;
            digits[i % 10] = 1; i /= 10;
            digits[i % 10] = 1;
            if ((digits[4] + digits[5] + digits[6]) == 2)
                tot += n;
        }
        printf("%d\n", tot);
    }

Assembly may suggest slightly lower-level ways to solve the same problem (for example, the x86 DIV instruction computes quotient and remainder at the same time, leaving them in EAX and EDX), etc. But if I just need to solve that "little" problem once, I want to minimize the sum of programming time and running time, so in such a situation the one-line Python version wins (despite the quickly fixed bug). That's why today people often use Python instead of C for small problems. Similar things can be said about a possible language that is a little higher-level than Python.

Bye,
bearophile

--
http://mail.python.org/mailman/listinfo/python-list
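The appendix code targets Python 2 (print statement, xrange) and Psyco, which never supported Python 3. Below is a minimal Python 3 sketch of both approaches; the function names `sum_high_level` and `sum_low_level` are my own. In Python 3, `/` on ints is true division, so the low-level loop uses divmod(), which returns quotient and remainder in one call, much like the x86 instruction mentioned in the post. The sketch also reproduces the empty intersection that exposed the type bug in the first version.

```python
def sum_high_level(lo=10000, hi=100000):
    # The one-liner: keep n when exactly two distinct digits from {4, 5, 6}
    # occur in it. The set must hold characters ("456") because str(n)
    # yields characters; set([4, 5, 6]) holds ints and never matches.
    wanted = set("456")
    return sum(n for n in range(lo, hi) if len(set(str(n)) & wanted) == 2)


def sum_low_level(lo=10000, hi=100000):
    # The flag-array version, with divmod() replacing % and /.
    digits = [0] * 10
    tot = 0
    for n in range(lo, hi):
        # Only the flags for 4, 5 and 6 are read, so only they are reset.
        digits[4] = digits[5] = digits[6] = 0
        i = n
        for _ in range(5):  # five-digit numbers
            i, d = divmod(i, 10)
            digits[d] = 1
        if digits[4] + digits[5] + digits[6] == 2:
            tot += n
    return tot


# The first (buggy) version's intersection is empty; the fixed one is not.
print(set(str(12345)) & set([4, 5, 6]))      # set()
print(sorted(set(str(12345)) & set("456")))  # ['4', '5']

print(sum_high_level())
print(sum_low_level())  # the same total as the line above
```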