On 02.04.2018 12:11, Uwe Schindler wrote:
[...]
Yes, that script language was designed from scratch with only invokedynamic and MethodHandles in mind.
which I guess means no mop at all, or nothing beyond a missing method
It also uses a pure whitelist-approach (explicit whitelisting what you can
call, so no blacklists needed), but still allows most groovy syntax. As
Elasticsearch is very performance critical (e.g. you may call a script for
every search result which may be up to billion of times per search execution),
we tried to avoid boxing and dynamic lookups by using extensive caching in the
indy call sites.
yes, I can understand that part
On the other hand it still allows the "def" type. The compiler tries to guess
most types during compilation and only if it cannot guess the type it writes an
invokedynamic.
which means semantically your are in between dynamic Groovy and Java,
where def is more a "do this dynamic if in doubt", otherwise it is more
like Java. So I strongly assume in
class A {}
class B extends A {
void foo(String s) {}
}
B b = new B()
b.foo("a")
A a = b
a.foo("a")
you are going to call foo(String)in B in both cases while in
class A {
void foo(Object o) {}
}
class B extends A {
void foo(String s) {}
}
B b = new B()
b.foo("a")
A a = b
a.foo("a")
you are going to call B.foo(String) in the first case and A.foo(Object)
in the second case (static/dynamic Groovy: B.foo(String) in both cases),
and I assume when using def everywhere in this last example you are
going to call B.foo(String) in both cases, because you guess B in both
cases. And in case of
void bar (A a) {
a.foo("")
}
bar(new B())
bar(new A())
you will in both cases call A.foo(Object) (static Groovy the same,
dynamic Groovy both B.foo(String)), while in
void bar (def a) {
a.foo("")
}
bar(new B())
bar(new A())
you will in both cases call first B.foo(String) and then A.foo(Object).
(static Groovy will not compile this, dynamic Groovy will behave the same)
But nevertheless, by using invokedynamic with good caches (we go up to megamorphic if
needed!) the slowdown by using invokedynamic for most scripts is neglectible!!! (less
than 10% when full hotspot compilation is done). The main reason for creating a new
script language was not only security (this was just good for selling it to users at that
time), but also speed. Groovy behaves very bad when types are variying and especially in
the "def" case it uses way too much reflection (the invokedynamic part has no
good caches and still uses reflection all the time on the dynamic lookup in the call site
type handler).
I think the Reflection part is going to go away mostly. We still require
it to query the class for class members though. I assume it is the same
for you. The usage of reflection in the dynamic lookup of the call site
is for once due to the mop and due to using unreflect(Method) to produce
handles and the resulting handle not being cached. That also contributes
to bad invokedynamic performance, because creating handles and
especially MethodType objects is surprisingly expensive.
The script language also supports lambdas and method references, but in
contrast to Groovy, those are lambdas, not closures! So it uses
LambdaMetaFactory (actually it uses it's own implementation, because
LambdaMetaFactory is too strict with types and has some bugs in Java 9 that
make it behave differently than the spec; java code does not see this, but
dynamic languages break heavily with Java 9 when they use LambdaMetaFactory).
oh, any pointers on the bugs you mentioned? Since we are going to use it
in the future it would be very nice to know of these
As said before, the compiler tries to guess all types at compile time and passes the
native types to the invokedynamic, unless it can directly generate a real method call
(everything is known). For dynamic ("def") stuff it delegates everything to
DefBootstrap (https://goo.gl/DdNJJG), which has several modes:
- simple method calls: here it uses a polymorphic cache with up to 5 receiver
items (the usual guardWithTest chain using MethodHandles). Once it sees more
receivers, it reverts the whole cache and goes megamorphic (it builds a
methodhandle that calls ClassValue.get on every call).
which means you guard only on the receiver type, not the parameter
types? So in case of
class P {}
class Q extends P {}
class A {
void foo(P p){}
}
class B extends A {
void foo(Q q){}
}
void bar(A a, def p) {
a.foo(p)
}
bar(new B(), new P())
bar(new B(), new Q())
you are going to call in both cases A.foo(P)? While in
bar(new B(), new Q())
bar(new B(), new P())
you are going to call...no.. even though you have here Q first and will
naturally select B.foo(Q), you have to have to create a new variant for
P or else your call to B.foo(Q) with a P would fail. I cannot say I
looked extensively at DefBootstrap, but I could find only a CHECK_CLASS
for the receiver, and nothing for the parameters.... Where is that check
then?
This is slower, but better than resolving every time. Keep in mind: This one
has no bugs with ClassValue, as all seen types are local and can be cleaned up
by GC correctly. The problems of Groovy with ClassValue are homemade, sorry.
ClassValue works perfectly, if you do not misuse it!!! Unfortunately, Groovy
has a monomorphic cache only, which was one reason why groovy was behaving bad
in Elasticsearch!
Afaik we have no longer problems with ClassValue.
- fields, array loads/stores: same as method calls, it just translates to a
methodhandle like for simple method calls (including all caches), as mentioned
above.
- (binary-)operators: As you have mostly 2 variable types here, it's hard to
build good caches. So it goes for a monomorphic cache here. In most cases this
is not an issue, as operator constructs don't have varying types (unless both
sides are DEF). Still, once seen at runtime it stays constant for most cases.
interesting find.
- references (these are method references): Like in Java, lambdas (with/without
captures) are also compiled to static/instance methods in the script class.
yes, that is where I want to go to. Not only for lambdas, also for Closure.
Lambdas are also tried to be resolved at compile time, but you can also call a method taking a functional
interface on the "def" type! In that case the above "references" variant is used (the
DefBootstrap then delegates after type resolving to the LambdaBootstrap). At the end of an invokedynamic
call, field access, array lookup, map lookup, list lookup, lambda lookup is just a compiled method handle
including all MIC, PIC, Megamorphic caching logic (using those many methods in Java's
"MethodHandles" class for combining MHs)! Except for Bootstrapping method references, no byte code
is generated at runtime (of course java runtime does it, but we do not). We just combine methodhandles!
Yeah... there was recently a post to the mlvm list in this direction
showing where LambdaFactory is actually faster (and runtime bytecode
generation outside the MethodHandles). But I guess the penalty is not so
important for this
Most static definitions that are used in created method handles are part of the "Def"
(https://goo.gl/g7GD5t) and "DefMath" (https://goo.gl/AN4SJ1) classes. If you look at
Def.java, you also see many cases where it backports some MethodHandles stuff only available in
Java 9+, but falls back to Java 9 native code when Java 9 is used. Actually some methods missing in
Java 8's MethodHandles implementation were added to Java 9, because of our feedback (e.g. array
length lookup for implementing array[].length using invokedynamic).
yepp, thanks for that
One thing: Because of the strict types and complexity/slowness with caching,
Painless does not allow method overloading. The above whitelist uses alias
names for methods that use overloading. One famous example are the group()
methods on regex matchers (named, unnamed groups). Some users complained, but
that's under control. The whitelist also adds addon methods (like Groovy) to
some java classes in the same way like aliases.
Can you still define classes at all? I mean if you take that away as
well, then you have of course no reason to check parameter types ;)
Maybe I should then call this to be between Ruby and Java.
In my opinion that project is wrong, because the security manager
mechanisms provide enough protection. The problem is that rarely
anyone can use a security manager properly. Anyway... Groovy won't be
able to do any call Java cannot do in principle in this version. That
is not because of keeping security in mind, that is more because of
the module system, that enforces this
This is partly right, but also not always applicable in that simple form.
Elasticsearch uses SecurityManager to encapsulate all of its plugins, script
engines,... and also itsself. It prevents stuff like deep reflection
(setAccessible is completely disallowed anywhere in Elasticsearch) or
System.exit/halt().
Just to mention it. Groovy can actually handle setAccessible not being
allowed at all.
File system is also restricted to not escape config/index folders. I think
Elasticsearch's implementation of SecurityManager is one ogf those
implementation that really work correctly. It opened a lot of bug reports in
foreign projects to fix their code (e.g, missing doPrivileged or calls to
setAccessible without proper try/catch). Of course this helps to make
Elasticsearch safe for the typical security issues (code escapes sandbox). But
on the other hand it does not help, if code in a script calls some public class
in Elasticsearchs's core and executes a command to delete an index from inside
a query script.
then you ca forbid scripts to make those calls, at least in Groovy that
is possible.
Of course you could also guard all public APIs of Elasticsearch with securitymanager, but
as you know, many stuff is very performance critical, so you cannot really guard
everything. Of course in the future, all access to lower level "Apache Lucene"
or internals can be shielded by the module system, but that's not yet possible
(Elasticsearch has Java 8 as minimum requirement).
Which means you create a module at runtime in which those script classes
reside in?
Groovy has the functionality of blacklists (so you can exclude certain classes from being called by
scripts), but Elasticsearch decided to design it's script language the other way round. It is
solely "whitelist" based. In short, you cannot access anything from the
Elasticsearch/Java code that is not explicitely whitelisted (that's called "Definition"
in Painless): https://goo.gl/ehdjvh with those whitelists: https://goo.gl/adCU88
yes, that makes things more easy, can totally understand that.
[...]
BTW: There are also some goodies in this script engine: If it figures out at runtime,
that Java 9 is used, the byte code generated does not use StringBuilder when concatting
strings, instead it uses invokedynamic with StringConcatFactory. The code to do that is
quite "generic" and applied with a subclass of ASM's GeneratorAdapter:
https://github.com/elastic/elasticsearch/blob/6dadce47613a3c69d928940bcc1b2043e0a0184a/modules/lang-painless/src/main/java/org/elasticsearch/painless/MethodWriter.java#L238-L292
(if I have some time, I will extend it to use the better makeConcatWithConstants() to
improve strings with many constant parts).
maybe we "steal" that ;)
bye Jochen