"Scott David Daniels" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Christian Stapfer wrote: >> "Steve Holden" <[EMAIL PROTECTED]> wrote in message >> news:[EMAIL PROTECTED] >> >>>Christian Stapfer wrote: >>> >>>>"George Sakkis" <[EMAIL PROTECTED]> wrote in message >>>>news:[EMAIL PROTECTED] >>>> >>>> >>>>>"Christian Stapfer" <[EMAIL PROTECTED]> wrote: >>>>> >>>>>><[EMAIL PROTECTED]> wrote: >>>>>>>try to use set. > A reasonable suggestion. A set must have the trade-offs involved. > The abstraction itself brings to mind the issues, and the performance > can, at least in theory, be handled there. If that is true (that > the "set" abstraction "sees" the problem), then you can rely on the > Python implementation of "set" to either now, or eventually, have > a "good" implementation -- one not too far off efficient.
It is, unfortunately, not infrequently the case that there are
different implementations of a given data type that are efficient in
different ways - but that there is no one single implementation that
is the most efficient in *all* regards. Thus the implementer must make
a trade-off, and that trade-off has consequences, performance-wise,
that he might want to tell the users of his module about...

> The Python gang is good at this stuff;

I'm not denying it. The question is about telling users upfront,
roughly, what the consequences of the trade-offs that have been made
are, so that they can use the modules that have been provided more
effectively.

> a "better" set implementation will win if
> it can show better performance without
> related down-sides.

Is there ever such a thing? I always thought that, most of the time,
there is no such thing as a free lunch in this field.

> As to the "either now, or eventually;" if you _must_ have performance
> now, not in some abstract future, then it behooves you to _test_,
> _test_, _test_!

Well, I might want to test: but *first* I want to design, preferably
in "armchair style" - at least as regards the basic approach that I
want to take. This "armchair stage" not infrequently involves the use
of rough complexity measures to exclude the clearly unusable and to
choose, instead, an approach that is very likely efficient enough for
what I need.

>>>> If the documentation stated the order-of-magnitude
>>>> behavior of those basic operations up front, then
>>>> I (and *anyone* else who ever wanted to use those
>>>> operations on large lists / large sets) could do
>>>> a quick order-of-magnitude estimation of how
>>>> a certain program design will behave, performance-wise.

> And, if the proper documentation is in place, and it
> says "dictionary lookup is O(N)" (and you avoid using
> it for exactly that reason), how surprised will you be
> to discover that the O(N) is only reached if the hash
> values of the keys are all equal?

It's not that difficult to distinguish the *average* case from the
*worst* case. It might even be a good idea to state, if that can
easily be done, what the *best* case happens to be...

> Oh, maybe you expect "O(N)" to really mean "\Theta(N)".
> Then, if you are a dweeb like me, you will respond that
> "This is not possible, a dictionary of size N must take at
> least 'O(lg N)' to read the key, never mind processing it."
> But, it turns out, that given a bound on the size of a
> process, processing an address is "O(1)", not "O(lg N)".
> Is that too practical for you, or not enough?

I do not expect a developer to expend *inordinate* amounts of work to
figure out the computational complexity of what he has implemented.
But, as I wrote, he *must* have thought about the matter, and is thus,
by definition, in a rather good position to provide what he can
frequently simply pull from memory when it comes to documenting the
interface of his module.

>>>> *Experimenting* is not necessarily as easy to
>>>> do as you seem to believe.
>
>>>> How do you, for example, hit upon the worst-case
>>>> behavior with your test data?

> Are you saying the documentation should characterize the
> cases that achieve worst-case behavior? This is a stiff
> burden indeed, not one I am used to in even the most rigorous
> classes I've seen.

If that happens to be particularly difficult for a given
implementation (of whatever abstract data type), then, of course,
state this upfront and leave it at that.
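As it happens, the dictionary case is one where even the worst case is
cheap to state and easy to demonstrate. The sketch below (the class
name, the sizes and the use of wall-clock timing are mine, purely for
illustration) forces every key to hash to the same value - exactly the
"hash values of the keys are all equal" situation mentioned above -
and building the dictionary then degrades from the usual roughly O(N)
to roughly O(N^2):

# Worst-case sketch: a constant __hash__ puts every key into the same
# hash bucket, so each insertion must be compared against all earlier
# colliding keys.  Building a dict of N such keys is then roughly
# quadratic instead of linear.
import time

class BadHash(object):              # hypothetical class, for illustration
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return 42                   # every instance collides
    def __eq__(self, other):
        return self.value == other.value

for n in (1000, 2000, 4000):
    keys = [BadHash(i) for i in range(n)]
    start = time.time()
    d = dict((k, None) for k in keys)
    print(n, time.time() - start)   # time roughly quadruples as n doubles

One sentence in the reference documentation - lookup is O(1) on
average, O(N) when many keys collide - would spare the user both the
surprise and the experiment.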
> If there is such a characterization,
> how burned will you feel if a case is overlooked and that
> case is the one that you sold to your customer?

I am not at all the lawyer type you seem to imagine. I just want to
suggest that some (to the implementer - but not to the average user)
*easily* available information about computational complexity
(especially for the most basic data types) would be a good thing to
have. Nothing more.

> Are you
> willing to provide the same guaranteed performance and
> documentation of performance to your customer that you
> expect of the Python system?

I'm using Python mainly as a very handy language for writing whatever
(usually relatively small) utility programs I happen to require for my
main job. I am not (currently) developing any "serious" programs that
are targeted for sale. (Though the proper functioning and adequate
efficiency of my personal utility programs really does matter, it
mainly matters for *me* - and for my "customers", if I can be said to
have any, which seems doubtful, it only matters indirectly at best.)
Even so, I frequently need to have a reasonable idea of what
computational complexity attaches to certain operations on certain
data types. But I hardly ever have the time to go into an extended
period of first comparing several different approaches by way of
experiment.

> Would you mind if the quality is proportional to
> the price you paid?

There is really no need to indulge in polemics like this. I am just
making a suggestion as to what information might *help* improve the
usability of Python modules. If making such suggestions regularly only
led to negative exchanges of this sort, there would be very little
room for learning in this community indeed...

>>> You are, of course, either assuming that there's a
>>> single implementation of Python,

>> Of course not!

>>> or that all implementations have the same behaviour.

>> Of course not!

>> But it is reasonable, anyway, to ask for information
>> about a specific implementation (that one is *forced*
>> to use).

> You are not _forced_ to use any implementation of Python.

What I wanted to say is *not* that it is a terrible burden to my
inflated consumer-ego to have to use something not *absolutely*
perfect in *all* respects, such as Python. What I wanted to say, here,
is just this: *whenever* a Python program is run, it runs on some
specific implementation of Python. That much should be obvious.

> You are free to implement your own Python system.

Don't be ridiculous.

>> And if the implementer wants to complain that
>> giving such information would break his wonderful
>> abstraction then I can only answer: It is *reality*
>> that *will* break your abstraction as regards
>> performance! It is, therefore, absolutely *no*
>> use to *pretend* you *can* avoid this form of "breaking
>> the abstraction" by simply avoiding to mention it
>> in the documentation...

> I simply repeat: I have never seen a paper characterizing
> the performance of an algorithm as "O(expr(N))" that described
> in detail _all_ cases that reached that limit.

That is what is usually called "throwing out the baby with the
bathwater". It is you (and only you) here who is arguing that if
perfection is not attainable (with reasonable effort), nothing should
be done at all. I am as willing to live with an imperfect module
interface specification as I am to live with imperfect tools more
generally. But that will not stop me from making *suggestions* for
improvement (not arrogant demands, mind you).
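To be concrete about what "comparing several different approaches by
way of experiment" amounts to when the documentation is silent: about
the quickest thing one can do is a doubling test of the kind sketched
below. The operation being timed, the sizes and the repetition count
are all arbitrary choices of mine; and even then the result says
something only about the inputs actually tried - which is precisely
why a documented bound would be preferable.

# A doubling experiment: time an operation at N and at 2N and look at
# the ratio.  A ratio near 2 suggests roughly linear behaviour, near 4
# roughly quadratic, near 1 roughly constant - for these inputs only.
import timeit

def measure(n):
    setup = "s = set(range(%d)); t = set(range(%d, %d))" % (n, n // 2, n + n // 2)
    return timeit.timeit("s & t", setup=setup, number=100)

t1 = measure(10000)
t2 = measure(20000)
print("ratio for doubled input:", t2 / t1)   # close to 2 hints at O(N)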
> At most such
> papers describe _one_ such case. Nor have I ever seen papers
> describing the performance of an algorithm as "\Theta(expr(N))"
> that characterized the cases that broke the "\Theta" performance.

To repeat: I was not asking for the impossible, not even for the
difficult, but simply for the writing down of the (to the implementer,
but not to the average user) obvious and reasonably certain
information about the computational complexity of the operations
exposed in the interface of a module. I have seen *many* books on
algorithms that state computational complexity measures for the
algorithms they describe. Any really good library of algorithms will
at least try to reach that standard, too.

> If you provide me the papers, provide me a C compiler with
> equivalent docs on all C expressions, and provide me the funding
> to update the Python docs, I will be happy to do so for a single
> version of Python and a single version of CPython.

I am not asking you, nor anyone else, to do any amount of boring or
otherwise hard work here.

> I expect I
> will have an easier time of it than the IronPython people will
> have.

>> I consider it the job of the implementer to know
>> about the trade-offs that he has been making in
>> choosing one particular implementation, and to
>> know what computational complexity therefore
>> attaches to the various operations exposed in
>> its interface.

> Am I to take it you provide this to all of your customers?

I have no customers (lucky me). And I did not want to pose as that all
too frequently encountered species, the arrogant (and equally
ignorant) customer who expects nothing but perfection from *others*
(but not from himself, of course). I am not at all like that: you are
attacking a figment of your imagination.

>> How reasonable is it to ask me, or anyone else
>> for that matter, to extract, experiment-wise
>> (if it can be done at all with reasonable effort)
>> information that properly belongs to the implementer
>> and really should have been exposed in the
>> documentation in the first place?

> Not at all reasonable. How reasonable is it to ask
> me to provide you support information for free?

Even providing the most basic information about the interface of a
module - even providing the mere names of its member functions - could
be refused on *that* basis. Thus the argument fails: it fails simply
because it "proves" too much...

Regards,
Christian

--
»In the long run men hit only what they aim at.
 Therefore, though they should fail immediately,
 they had better aim at something high.«
                - Henry David Thoreau: 'Walden'
-- http://mail.python.org/mailman/listinfo/python-list