On Tue, Jul 5, 2011 at 4:32 PM, Alessio Stalla <alessiosta...@gmail.com> wrote:
> On 5 Jul, 18:49, Ken Wesson <kwess...@gmail.com> wrote:
>> 1. A too-large string literal should have a specific error message,
>> rather than generate a misleading one suggesting a different type of
>> problem.
>
> There is no such thing as a too-large string literal in a class file.
That's not what Patrick just said.

>> 2. The limit should not be different from that on String objects in
>> general, namely 2147483647 characters which nobody is likely to hit
>> unless they mistakenly call read-string on that 1080p Avatar blu-ray
>> rip .mkv they aren't legally supposed to possess.
>
> That's a limitation imposed by the Java class file format.

And therefore a bug in the Java class file format, which should allow any size of String that the runtime allows. Using 2 bytes instead of 4 bytes for the length field, as you claim they did, seems to be the specific error. One would have thought that Java of all languages would have learned from the Y2K debacle and near-miss with cyber-armageddon, but limiting a field to 2 of something instead of 4, out of a misguided perception that space was at a premium, was exactly what caused that, too!

>> 3. Though both of the above bugs are in Oracle's Java implementation,
>
> By the above, 1. is a Clojure bug and 2. is not a bug at all.

Oh, 2 is a bug alright. By your definition, Y2K bugs in a piece of software would also not be bugs. The users of such software would beg to differ.

>> it would seem to be a bug in Clojure's compiler if it is trying to
>> make the entire source code of a namespace into a string *literal* in
>> dynamically-generated bytecode somewhere rather than a string
>> *object*.
>
> Actually it seems it's the IDE, rather than Clojure, that is
> evaluating a form containing such a big literal. Since Clojure has no
> interpreter, it needs to compile that form.

The same problem has been reported from multiple IDEs, so it seems to be a problem with eval and/or load-file. The question is not why they might be using String *objects* that exceed 64K, since they'll need to use Strings as large as the file gets*. It's why they'd *generate bytecode* containing String *literals* that large. And it's not IDEs that generate bytecode; it's clojure.lang.Compiler.java that generates bytecode in this scenario.
* There is a way to reduce the size requirements. Crudely, line-seq could be used to implement a lazy seq of top-level forms, built by consuming lines until delimiters are balanced and then emitting a new form string, and then evaluating these forms one by one. This works with typical source files that have short individual top-level forms with at least one line break between any two of them, and it would allow consuming multi-gigabyte source files if anyone ever had need for such a thing (I'd hope never to see it unless it was machine-generated for some purpose). Less crudely, a reader for files could be implemented that didn't just slurp the file and call read-string on it, but instead read from an IO stream and emitted a seq of top-level forms already converted into reader-output data structures (but unevaluated). In fact, read-string could then be implemented in terms of this and a StringInputStream, whose implementation is left as an exercise for the reader but which ought to be nearly trivial.

>> Sensible alternatives are a) get the string to whatever
>> consumes it by some other means than embedding it as a single
>> monolithic constant in bytecode,
>
> This is what we currently do in ABCL (by storing literal objects in a
> thread-local variable and retrieving them later when the compiled code
> is loaded), but it only works for the runtime compiler, not the file
> compiler (in Clojure terms, it won't work with AOT compilation).

Yes, this is the same issue raised in connection with allowing arbitrary objects in code in eval.

>> b) convert long strings into shorter
>> chunks and emit a static initializer into the bytecode to reassemble
>> them with concatenation into a single runtime-computed string constant
>> stored in another static field,
>
> This is what I'd like to have :)

Frankly it seems like a bit of a hack to me, though since it would be used to work around a Y2K-style bug in Java, it might be poetic justice of a sort.
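For concreteness, here is a rough sketch of the splitting half of option b), plus the kind of specific error message point 1 asks for. The class and method names are made up for illustration. It relies on two facts about the class file format: a CONSTANT_Utf8 entry stores its byte length in a 2-byte (u2) field, so one constant holds at most 65535 bytes, and those bytes are "modified UTF-8". That means the limit is in encoded bytes, not characters, so a string of non-ASCII characters hits it sooner. The actual bytecode emission (one ldc per chunk in a static initializer, result stored in a static field) is not shown. Only the chunking and the concatenation that initializer would perform are included.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Clojure or ASM): measures strings the way
// the class file format does and splits them into constant-sized pieces.
public class ConstantChunks {
    // CONSTANT_Utf8_info stores its byte length in a u2, so one constant can
    // hold at most 65535 bytes of modified UTF-8.
    static final int MAX_UTF8_BYTES = 0xFFFF;

    // Bytes one UTF-16 code unit takes in modified UTF-8: U+0001..U+007F take 1,
    // NUL and U+0080..U+07FF take 2, U+0800..U+FFFF (each surrogate half too) take 3.
    static int encodedSize(char c) {
        if (c >= 0x0001 && c <= 0x007F) return 1;
        if (c <= 0x07FF) return 2;
        return 3;
    }

    static long encodedLength(String s) {
        long n = 0;
        for (int i = 0; i < s.length(); i++) n += encodedSize(s.charAt(i));
        return n;
    }

    // Point 1: fail with a message that names the real problem.
    static void requireFitsInConstant(String s) {
        long n = encodedLength(s);
        if (n > MAX_UTF8_BYTES) {
            throw new IllegalArgumentException(
                "String literal too large for a class-file constant: "
                + n + " bytes of modified UTF-8 (limit " + MAX_UTF8_BYTES + ")");
        }
    }

    // Option b): cut s into pieces that each fit in one constant. Modified
    // UTF-8 encodes every code unit on its own, so a cut between any two chars
    // (even inside a surrogate pair) is harmless once the pieces are rejoined.
    static List<String> split(String s) {
        List<String> chunks = new ArrayList<>();
        int start = 0;
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            int size = encodedSize(s.charAt(i));
            if (bytes + size > MAX_UTF8_BYTES) {
                chunks.add(s.substring(start, i));
                start = i;
                bytes = 0;
            }
            bytes += size;
        }
        chunks.add(s.substring(start));
        return chunks;
    }

    // What the emitted static initializer would do when the class loads:
    // load each chunk constant and concatenate them into one String.
    static String reassemble(List<String> chunks) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) sb.append(chunk);
        return sb.toString();
    }

    public static void main(String[] args) {
        String source = "(defn f [] 42)\n".repeat(10000); // 150000 ASCII chars
        List<String> chunks = split(source);
        System.out.println(chunks.size());                     // 3
        System.out.println(reassemble(chunks).equals(source)); // true
        for (String chunk : chunks) requireFitsInConstant(chunk); // all fit
        try {
            requireFitsInConstant(source);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

DataOutputStream.writeUTF uses the same encoding and the same 2-byte length prefix, and it throws UTFDataFormatException for strings over the limit. That makes it a convenient cross-check for the length calculation.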
>> and c) restructure whatever consumes
>> the string to consume a seq, java.util.List, or whatever of strings
>> instead and feed it digestible chunks (e.g. a separate string for each
>> defn or other top-level form, in order of appearance in the input file
>> -- surely nobody has *individual defns* exceeding 64KB).
>
> The problem is not in the consumer, but in the form containing the
> string; to do what you're proposing, the reader, upon encountering a
> big enough string, would have to produce a seq/List/whatever instead,
> the compiler would need to be able to dump such an object to a class,
> and all Clojure code handling strings would have to be prepared to
> handle such an object, too. I think it's a little impractical.

I don't think so. The problem isn't with normal strings but only with strings that get embedded as literals in code; moreover, the problem isn't even with those strings exceeding 64K but with whole .clj files exceeding 64K. The implication is that load-file generates a class that contains the entire contents of the source file as a string constant for some reason. So:

a) What does this class do with this string constant? What code consumes it?

b) Can that particular bit of code be rewritten to digest the same information provided in smaller chunks?

> Regarding the size of individual defns, that's an orthogonal problem;
> anyway, the size of the _bytecode_ for methods is limited to 64KB (see
> <http://java.sun.com/docs/books/jvms/second_edition/html/ClassFile.doc.html#88659>)
> and, while pretty big, it's not impossible to reach it, especially
> when using complex macros to produce a lot of generated code.

Another problem for which we will probably need an eventual fix or workaround.
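As for question b), one way to hand the compiler smaller chunks is the crude line-based approach from the footnote above: read the file line by line and cut it into one string per top-level form by tracking delimiter depth. Below is a rough sketch of that idea. The class name TopLevelForms is hypothetical and this is not Clojure's reader. It counts (, [ and { only outside "..." strings and ; comments, it does not understand character literals such as \(, and it assumes each top-level form starts on its own line. Each resulting string could then be read and evaluated on its own, so no single string ever has to hold the whole file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical helper, not part of Clojure: yields the text of one top-level
// form at a time, reading whole lines (as line-seq would) and staying at most
// one form ahead of the consumer.
public class TopLevelForms implements Iterator<String> {
    private final BufferedReader in;
    private String pending; // next form's text, or null once input is used up

    public TopLevelForms(BufferedReader in) {
        this.in = in;
        this.pending = readForm();
    }

    @Override
    public boolean hasNext() {
        return pending != null;
    }

    @Override
    public String next() {
        if (pending == null) throw new NoSuchElementException();
        String form = pending;
        pending = readForm();
        return form;
    }

    // Consume lines until a delimiter has been opened and every opened
    // delimiter is closed again. Delimiters inside "..." strings and after ;
    // are ignored; character literals such as \( are NOT handled (crude).
    private String readForm() {
        StringBuilder form = new StringBuilder();
        int depth = 0;
        boolean sawOpen = false, inString = false, escaped = false;
        try {
            String line;
            while ((line = in.readLine()) != null) {
                for (int i = 0; i < line.length(); i++) {
                    char c = line.charAt(i);
                    if (inString) {
                        if (escaped) escaped = false;
                        else if (c == '\\') escaped = true;
                        else if (c == '"') inString = false;
                    } else if (c == ';') {
                        break; // rest of the line is a comment
                    } else if (c == '"') {
                        inString = true;
                    } else if (c == '(' || c == '[' || c == '{') {
                        depth++;
                        sawOpen = true;
                    } else if (c == ')' || c == ']' || c == '}') {
                        depth--;
                    }
                }
                // Skip blank lines between forms; keep everything else.
                if (sawOpen || !line.isBlank()) form.append(line).append('\n');
                if (sawOpen && depth == 0 && !inString) return form.toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Leftover text (e.g. a trailing comment) is returned as-is.
        return form.length() > 0 ? form.toString() : null;
    }

    // Convenience for small inputs: collect every form into a list.
    static List<String> all(String source) {
        List<String> forms = new ArrayList<>();
        new TopLevelForms(new BufferedReader(new StringReader(source)))
            .forEachRemaining(forms::add);
        return forms;
    }

    public static void main(String[] args) {
        String src = ";; demo\n(ns demo)\n\n(defn f [x]\n  (str \"a ) b\" x))\n(def y [1 2\n 3])\n";
        for (String form : all(src)) System.out.print("FORM:\n" + form);
    }
}
```

Because the iterator wraps a BufferedReader rather than a slurped string, memory use is bounded by the largest single form rather than the whole file. That is the property the footnote is after.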
If bytecode can contain a JMP-like instruction, it should be possible to have the compiler split long generated methods and chain the pieces together without much loss of runtime efficiency, particularly if it does so at "natural" places such as existing conditional branches and (loop ...) borders. For example, (defn foo [x] (if x (lotta-code-1) (lotta-code-2))) can be trivially converted to (defn foo [x] (if x (lotta-code-1) (jmp bar))) (defn bar [x] (lotta-code-2)), though if you had such a jump instruction I'd have thought implementing real TCO would have been fairly easy, and apparently it was not. Failing such a jmp capability, you'd have to just use (bar x) in that last example and suffer an additional method-call overhead at the break point. Again, the obvious way to do it would be to recognize common branching forms such as (if ...) and (cond ...) that are larger than the threshold but whose individual branches are not, and turn some or all of the branches into their own under-the-hood methods and calls to those methods.

> We used to generate such big methods in ABCL because
> at one point we tried to spell out in the bytecode all the class names
> corresponding to functions in a compiled file, in order to avoid
> reflection when loading the compiled functions. For files with many
> functions (> 1000 iirc) the generated code became too big. It turned
> out that this optimization had a negligible impact on performance, so
> we reverted it.

I wonder if Clojure is using a similar optimization and would benefit from its reversion.

-- 
Protege: What is this seething mass of parentheses?!
Master: Your father's Lisp REPL. This is the language of a true hacker. Not as clumsy or random as C++; a language for a more civilized age.

-- 
You received this message because you are subscribed to the Google Groups "Clojure" group.
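The JVM's goto and goto_w instructions can only branch to an instruction inside the same method. A cross-method "jmp bar" therefore isn't available, and a split compiler would be left with the fallback described above: an ordinary call. Below is a hand-written sketch of what that split looks like at the Java level for the foo/bar example. The branch bodies are arbitrary stand-ins for lotta-code-1 and lotta-code-2, and the names are illustrative only. This is not output from Clojure's compiler.

```java
// Hypothetical illustration of splitting one large method at an if.
public class SplitMethod {
    // Before: one method holding both large branches, as in
    // (defn foo [x] (if x (lotta-code-1) (lotta-code-2))).
    static long foo(boolean x, long n) {
        if (x) {
            long acc = 0;                          // stands in for lotta-code-1
            for (long i = 1; i <= n; i++) acc += i * i;
            return acc;
        } else {
            long acc = 1;                          // stands in for lotta-code-2
            for (long i = 1; i <= n; i++) acc = acc * 3 % 1_000_003;
            return acc;
        }
    }

    // After: the branch test stays in the original method and the else
    // branch moves into its own method (the post's "bar"), reached by an
    // ordinary call because a JVM goto cannot leave the current method.
    static long fooSplit(boolean x, long n) {
        if (x) {
            long acc = 0;
            for (long i = 1; i <= n; i++) acc += i * i;
            return acc;
        }
        return bar(n);                             // the (bar x) fallback: one extra call
    }

    static long bar(long n) {
        long acc = 1;
        for (long i = 1; i <= n; i++) acc = acc * 3 % 1_000_003;
        return acc;
    }

    public static void main(String[] args) {
        for (boolean x : new boolean[] {true, false}) {
            System.out.println(foo(x, 1000) == fooSplit(x, 1000)); // true both times
        }
    }
}
```

Each outlined branch gets its own 64KB code budget. So a compiler applying this at (if ...) and (cond ...) boundaries can keep every method under the limit as long as no single branch exceeds it on its own.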
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en