On Wed, Jul 6, 2011 at 3:28 PM, Alessio Stalla <alessiosta...@gmail.com> wrote:
> On 6 Lug, 09:07, Ken Wesson <kwess...@gmail.com> wrote:
>> On Tue, Jul 5, 2011 at 4:32 PM, Alessio Stalla <alessiosta...@gmail.com> 
>> wrote:
>> > On 5 Lug, 18:49, Ken Wesson <kwess...@gmail.com> wrote:
>> >> 1. A too-large string literal should have a specific error message,
>> >> rather than generate a misleading one suggesting a different type of
>> >> problem.
>>
>> > There is no such thing as a too-large string literal in a class file.
>>
>> That's not what Patrick just said.
>
> Not really;
>> > That's a limitation imposed by the Java class file format.
>>
>> And therefore a bug in the Java class file format, which should allow
>> any size String that the runtime allows. Using 2 bytes instead of 4
>> bytes for the length field, as you claim they did, seems to be the
>> specific error. One would have thought that Java of all languages
>> would have learned from the Y2K debacle and near-miss with
>> cyber-armageddon, but limiting a field to 2 of something instead of 4
>> out of a misguided perception that space was at a premium was exactly
>> what caused that, too!
>
> A bug is a discrepancy between specification and implementation.

Your extremely narrow definition precludes the notion that a
specification can itself be in error, and therefore excludes, among
others, the famous category of Y2K bugs from meeting your definition
of "bug". It also precludes *anything* being considered a bug in any
software that lacks a formal specification distinct from its
implementation (i.e., almost ALL software!), or in the reference
implementation of any software whose formal specification is a
reference implementation rather than a design document of some sort.

> Now, you might argue that the spec is badly designed, and I might
> agree, but it's not a "bug in Oracle's Java implementation" - any
> conforming Java implementation must have that (mis)feature.

Does the JLS specify this limitation, or does it just say a string
literal is a " followed by any number of non-" characters and
\-escaped "s followed by a " or words to that effect? Because if the
latter, a conforming implementation of the Java language (as in not
directly contradicting the JLS anywhere) could permit longer string
literals, even if Oracle's does not.
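
For reference, the 2-byte length can be observed from plain Java
without generating a class file. DataOutputStream.writeUTF uses the
same modified UTF-8 encoding behind an unsigned 16-bit length prefix
and throws UTFDataFormatException once the encoded form exceeds 65535
bytes. A minimal sketch follows; the class and method names are my own
and purely illustrative, and it only demonstrates the encoding limit,
not what any particular compiler does when it hits it. Note that the
limit is on encoded bytes, not characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class Utf8LimitDemo {
    // Largest value an unsigned 2-byte length field can hold.
    static final int MAX_ENCODED_BYTES = 65535;

    // True if s can be written in modified UTF-8 behind a 2-byte length.
    static boolean fitsInTwoByteLength(String s) {
        try (DataOutputStream out =
                 new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s); // throws if the encoded form exceeds 65535 bytes
            return true;
        } catch (UTFDataFormatException tooLong) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Illustrative helper: a string of n copies of c.
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        // ASCII characters take 1 byte each; the euro sign takes 3.
        System.out.println(fitsInTwoByteLength(repeat('a', MAX_ENCODED_BYTES)));
        System.out.println(fitsInTwoByteLength(repeat('a', MAX_ENCODED_BYTES + 1)));
        System.out.println(fitsInTwoByteLength(repeat('\u20AC', 21846)));
    }
}
```

The first string fits; the other two do not, even though the last one
is only 21846 characters long.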

>> Oh, 2 is a bug alright. By your definition, Y2K bugs in a piece of
>> software would also not be bugs. The users of such software would beg
>> to differ.
>
> User perception and bugs are different things.

Not past a certain point they aren't. One user disagreeing with the
developers can be written off as wrong. A large percentage of users
(that know about an issue) disagreeing with the developers means it's
the developers that are wrong, unless you take the extreme step of
rejecting the premise that the developers' ultimate job is serving the
needs of the software's user base. And when the developers are wrong
about something and that wrongness is expressed in the code, that
constitutes a bug, though it may not be a deviation from the
"specification" (assuming in that instance there even is a formal
specification distinct from the implementation).

>> Frankly it seems like a bit of a hack to me, though since it would be
>> used to work around a Y2K-style bug in Java it might be poetic justice
>> of a sort.
>
> It is a sort of hack, yes. An alternative might be to store constants
> in a resource external to the class file, unencumbered with silly size
> limits.

Given the deployment architecture around Java, that doesn't even add
any deployment complexity and, depending on how it is implemented, may
even make i18n easier to accomplish in some cases by freeing the
developers from having to set up a bunch of ResourceBundle
infrastructure in each codebase that needs i18n. +1.

>> I don't think so. The problem isn't with normal strings but only with
>> strings that get embedded as literals in code; and moreover, the
>> problem isn't even with those strings exceeding 64k but with whole
>> .clj files exceeding 64k. The implication is that load-file generates
>> a class that contains the entire contents of the sourcefile as a
>> string constant for some reason; so:
>>
>> a) What does this class do with this string constant? What code consumes it?
>
> Hmm, I don't think it's like you say.

Patrick said:

    Does the file you are evaluating have more than 65535
    characters? As far as I can tell, that is the maximum
    length of a String literal in Java

This clearly implies that the problem is the .clj file exceeding the
maximum length of a string literal, and thus implies more indirectly
that the entire source file is being embedded in some class file as a
string constant. Now you seem to be denying that this is occurring. If
so, your disagreement here is with Patrick, not me.

> Without knowing anything about
> Clojure's internals, it seems to me that the problem is more likely to
> be in a form like the one Patrick posted, (clojure.lang.Compiler/load
> (java.io.StringReader. "the-whole-file-as-a-string")), which is
> compiled in order to be evaluated in order to compile and load the
> file... it is that form, and not the file to be compiled, that
> generates the incorrect class file.

That's not disagreeing, that's agreeing. You're saying that as an
intermediate stage it's compiling a class with the source file as a
literal in order to compile the file, and suggesting a particular
answer to question a, which is that the class just invokes
Compiler.load() on the string constant.

>> b) Can that particular bit of code be rewritten to digest the same
>> information provided in smaller chunks?

And the answer to this then becomes a resounding Yes: don't have
load-file or any of its cousins, or any IDE load-file function, cram
the whole thing into a string and build a form like your example above
to eval. Instead, just point Compiler.load() at a reader open on the
source file on disk if it is unchanged; or, if the focused editor is
dirty, pass it a StringReader or similar open on the editor's buffer
in memory. (Compiler.load(new
StringReader(editorJTextArea.getText())); or whatever.)
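
A rough sketch of that choice, kept free of any Clojure dependency so
it runs on its own. SourceReaders, open and countChars are names I
made up for illustration; countChars is only a stand-in for the real
consumer, which with Clojure on the classpath would be a call to
clojure.lang.Compiler.load(reader). I'm also assuming the source file
is UTF-8.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class SourceReaders {
    // Hypothetical helper: choose what the loader should read from.
    // A dirty editor buffer is read from memory; otherwise the file is
    // streamed from disk. Either way the source text never has to be
    // embedded as a string literal in a generated class.
    static Reader open(Path file, String editorBuffer, boolean dirty) {
        if (dirty) {
            return new StringReader(editorBuffer);
        }
        try {
            return Files.newBufferedReader(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in consumer (an assumption, not Clojure's API): drains the
    // reader in small chunks and reports how many chars it saw.
    static long countChars(Reader reader) {
        char[] buf = new char[8192];
        long total = 0;
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buf)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

Because the consumer only ever sees a Reader, the size of the file
stops mattering: nothing here is bounded by the 65535-byte constant
limit.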

> If Patrick is right, and I think he is, then the compiler has to
> compile (java.io.StringReader. "the-whole-file-as-a-string") in a way
> that "the-whole-file-as-a-string" does not appear literally in the
> class file. It has either to somehow split the string, or load it from
> somewhere else.

Or, to fix the current crop of problems like this, such forms
shouldn't be generated as intermediate steps in loading by the
editor/IDE/load-file/whatever to begin with. See above.

On the other hand, it suggests that (foo "a very long ... string")
would still break, which is not desirable, so ultimately a mechanism
for breaking up large string constants under the hood in Compiler
seems indicated. We just may not need it to solve this specific
instance.
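
To make the "breaking up" idea concrete, here is a minimal sketch of
the splitting half, assuming the compiler would emit each chunk as its
own constant and concatenate them at load time. LiteralChunker and its
methods are hypothetical names, not anything in Clojure's Compiler.
Chunks are measured in modified UTF-8 bytes, since that is what the
2-byte length field counts.

```java
import java.util.ArrayList;
import java.util.List;

final class LiteralChunker {
    // Largest encoded size a single class-file string constant can have.
    static final int MAX_ENCODED_BYTES = 65535;

    // Bytes one UTF-16 code unit occupies in modified UTF-8.
    static int encodedSize(char c) {
        if (c >= 0x0001 && c <= 0x007F) return 1;
        if (c <= 0x07FF) return 2; // includes U+0000, which takes 2 bytes
        return 3;                  // includes each half of a surrogate pair
    }

    // Split s into pieces whose encoded size never exceeds maxBytes.
    static List<String> split(String s, int maxBytes) {
        if (maxBytes < 3) {
            throw new IllegalArgumentException("maxBytes must be at least 3");
        }
        List<String> chunks = new ArrayList<>();
        int start = 0;
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            int size = encodedSize(s.charAt(i));
            if (bytes + size > maxBytes) {
                chunks.add(s.substring(start, i)); // close the current chunk
                start = i;
                bytes = 0;
            }
            bytes += size;
        }
        chunks.add(s.substring(start)); // final (possibly empty) chunk
        return chunks;
    }

    // What the generated code would do at load time: glue chunks together.
    static String join(List<String> chunks) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) {
            sb.append(chunk);
        }
        return sb.toString();
    }
}
```

Splitting between the two halves of a surrogate pair is harmless here,
because modified UTF-8 encodes each half separately and join restores
the original sequence of chars exactly.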

> In fact, no JMP-like instruction exists that can jump to a different
> method.

Seems like a troubling oversight. As I said it would come in very
handy for enabling TCO. It could even be implemented securely -- the
bytecode verifier could require the target to be the start of a method
visible to the method containing the JMP instruction, for instance,
i.e. the same requirements it currently places on the target of
invokevirtual. It would just be non-stack-consuming and otherwise
equivalent to "return otherMethod(args);" (after some suitable storage
of the arguments, where there are any). If "return otherMethod(args);"
is secure, then this limited, tail-optimization-enabling JMP could be
secured as well.

>> Failing such a jmp capability you'd have to just use (bar) in that
>> last example and suffer an additional method call overhead at the
>> break-point. Again, the obvious way to do it would be to recognize
>> common branching construct forms such as (if ...) and (cond ...) that
>> are larger than the threshold but have individual branches that are
>> not and turn some or all of the branches into their own under-the-hood
>> methods and calls to those methods.
>
> The positive thing is that the method call overhead disappears thanks
> to Hotspot for frequently called methods. The negative thing is that
> splicing bytecode is harder than it seems because of jumps and
> exception handlers that might be present.

Hence my suggestion that the compiler work at a higher level. The
post-macroexpansion sexp seems ideal, since it need only look for the
special forms (if ...), (let* ...), and (loop* ...) to find obvious
division points and the nature of functional code is such that very
long functions usually contain these. So it can follow an algorithm
like

1. Try compiling the whole function into a single .invoke method (the
current behavior). If that succeeds, we're done.

2. Catch the exception and try to break the function at an obvious
boundary near the midpoint, particularly on either side of a let* or
loop* that roughly corresponds to the middle third, or at the start of
the "else" of an if where that "else" is about a third to half the
code.

3. If all else fails, just split at arbitrary points in e.g. a long
chain of expressions.

4. Generate a .invoke method and some auxiliary private methods that
are called by it and/or each other.

There's some complication; the sub-methods will need to receive
arguments corresponding to the locals in existence at the start of the
split-off piece of code that are used within that piece, for example.
But it doesn't strike me as infeasible.
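
As a toy model of steps 1 and 3 only, the sketch below treats a
function body as a chain of top-level expressions with estimated
bytecode sizes and keeps halving the chain until every group fits.
Everything in it is an assumption for illustration: MethodSplitter,
plan and the size estimates are invented, the real compiler works on
forms rather than integers, and the cost of the calls that .invoke
makes to the auxiliary methods is ignored. The limit reflects the
JVM's roughly 64 KB cap on a single method's code.

```java
import java.util.ArrayList;
import java.util.List;

final class MethodSplitter {
    // Assumed per-method bytecode budget.
    static final int MAX_METHOD_BYTES = 65535;

    // Returns groups of expression sizes; each group stands for one
    // generated method (one group means no split was needed).
    static List<List<Integer>> plan(List<Integer> sizes, int limit) {
        List<List<Integer>> groups = new ArrayList<>();
        planInto(sizes, limit, groups);
        return groups;
    }

    private static void planInto(List<Integer> sizes, int limit,
                                 List<List<Integer>> groups) {
        int total = 0;
        for (int size : sizes) {
            total += size;
        }
        // Step 1: the whole chain fits, so keep it as one method.
        // A single oversized expression cannot be halved here; it would
        // need splitting at an inner boundary (step 2), not modeled.
        if (total <= limit || sizes.size() <= 1) {
            groups.add(new ArrayList<>(sizes));
            return;
        }
        // Step 3: split the chain at its midpoint and retry each half.
        int mid = sizes.size() / 2;
        planInto(sizes.subList(0, mid), limit, groups);
        planInto(sizes.subList(mid, sizes.size()), limit, groups);
    }
}
```

Four expressions of 30000 bytes each would come out as two groups of
two; a real implementation would then also have to work out which
locals each group needs as arguments.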

-- 
Protege: What is this seething mass of parentheses?!
Master: Your father's Lisp REPL. This is the language of a true
hacker. Not as clumsy or random as C++; a language for a more
civilized age.
