Hi Aman,
We have tried defining Proxy classes as hidden classes; a previous attempt was put on hold because of issues with serialization. Otherwise, Proxies work great as hidden classes.
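For anyone following along, here is a minimal sketch (not taken from the JDK sources) of how one might check which runtime-generated classes are hidden today. It assumes JDK 15 or later, where `Class::isHidden` exists and lambdas are spun as hidden classes; `HiddenCheck` and `describe` are hypothetical names used only for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class HiddenCheck {
    /** Hypothetical helper: returns the runtime class name plus whether the class is hidden. */
    public static String describe(Object o) {
        Class<?> c = o.getClass();
        return c.getName() + " hidden=" + c.isHidden();
    }

    public static void main(String[] args) {
        // A lambda is backed by a class generated through LambdaMetafactory.
        Runnable lambda = () -> { };

        // A dynamic proxy for the same interface, with a do-nothing handler.
        InvocationHandler noop = (p, m, a) -> null;
        Object proxy = Proxy.newProxyInstance(
                HiddenCheck.class.getClassLoader(),
                new Class<?>[] {Runnable.class},
                noop);

        // The lambda's class name carries a VM-specific suffix after '/'.
        System.out.println(describe(lambda));
        // Whether the proxy is hidden depends on the JDK build in use.
        System.out.println(describe(proxy));
    }
}
```

The lambda line should report hidden=true. As far as I know the proxy line reports hidden=false on current releases, and that would only change once Proxy is actually moved to hidden classes.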
Chen

On Mon, May 20, 2024 at 7:56 AM Aman Sharma <aman...@kth.se> wrote:

> Hi David,
>
> > I would not expect any class load events.
>
> I understand. I also haven't tried to intercept them, but right now I see
> only one approach to include them in an allowlist: 1) statically look for
> invocations of "Lookup::defineHiddenClass", and 2) instrument them so that
> the first argument, "bytes", can be inspected. I haven't looked into it
> much because I did not know much about it, and the fact that they are
> hidden made it worse. 😅 Thanks for sharing the JEP!
>
> > java.lang.reflect.Proxy could define hidden classes to act as the proxy
> > classes which implement proxy interfaces; from JEP 371
>
> It says that Proxy classes will also become hidden classes. Is it
> underway? Right now one can intercept, transform them, and include them in
> an allowlist. What do you think of naming them independent of AtomicLong so
> that a proxy class generated at runtime is easy to look up in the allowlist?
>
> Regards,
> Aman Sharma
>
> PhD Student
> KTH Royal Institute of Technology
> School of Electrical Engineering and Computer Science (EECS)
> Department of Theoretical Computer Science (TCS)
> <http://www.kth.se> <https://www.kth.se/profile/amansha>
> https://algomaster99.github.io/
> ------------------------------
> *From:* David Holmes <david.hol...@oracle.com>
> *Sent:* Monday, May 20, 2024 2:30:37 PM
> *To:* Aman Sharma; liangchenb...@gmail.com
> *Cc:* core-libs-dev@openjdk.org; leyden-...@openjdk.org
> *Subject:* Re: Deterministic naming of subclasses of
> `java/lang/reflect/Proxy`
>
> On 20/05/2024 10:12 pm, Aman Sharma wrote:
> > Hi David,
> >
> > > How did you try to intercept them? Hidden classes are not "loaded" in
> > > the normal sense so won't trigger class load events.
> >
> > I could not intercept them. I only see them when I pass `-verbose:class`
> > in the Java CLI.
>
> Yes that is why I asked how you tried to intercept them.
>
> > I also couldn't intercept them using JVMTI Class File Load Hook
> > <https://docs.oracle.com/en/java/javase/21/docs/specs/jvmti.html#ClassFileLoadHook>
> > event. However JEP 371 suggests that it should be possible to intercept
> > them using JVMTI Class Load
> > <https://docs.oracle.com/en/java/javase/21/docs/specs/jvmti.html#ClassLoad>
> > event, but I won't have the bytecode at this stage. So is there no way to
> > get its bytecode before it is linked and initialized in the JVM?
>
> Hidden classes are not loaded so I would not expect any class load
> events. However the exact nature of the JVMTI class load event is
> unclear as it talks about "class or interface creation" which is neither
> loading nor defining per se. But a class prepare event sounds like it
> should be issued. However neither gives you access to the bytecode of the
> class AFAICS.
>
> David
> -----
>
> > Regards,
> > Aman Sharma
> >
> > ------------------------------------------------------------------------
> > *From:* David Holmes <david.hol...@oracle.com>
> > *Sent:* Monday, May 20, 2024 2:59:17 AM
> > *To:* Aman Sharma; liangchenb...@gmail.com
> > *Cc:* core-libs-dev@openjdk.org; leyden-...@openjdk.org
> > *Subject:* Re: Deterministic naming of subclasses of
> > `java/lang/reflect/Proxy`
> > On 17/05/2024 9:43 pm, Aman Sharma wrote:
> >> Hi Chen,
> >>
> >> > java.lang.invoke.LambdaForm$MH/0x00000200cc000400
> >>
> >> I do see this as output when I pass -verbose:class.
> >> However, based on my experiments, I have seen that neither an agent
> >> passed via 'javaagent' nor an agent passed via 'agentpath' is able to
> >> intercept this hidden class.
> >
> > How did you try to intercept them? Hidden classes are not "loaded" in
> > the normal sense so won't trigger class load events.
> >
> >> Also, I was a bit confused since I saw somewhere that the names of
> >> hidden classes are null. But thanks for clarifying here.
> >
> > The JEP clearly defines the name format for hidden classes - though the
> > final component is VM specific (and typically a hashcode).
> >
> > https://openjdk.org/jeps/371
> >
> > Cheers,
> > David
> > -----
> >
> >> > avoid dynamic class loading
> >>
> >> I don't see dynamic class loading as a problem. I only mind some
> >> unstable generation aspects of them which make it hard to verify them
> >> based on an allowlist.
> >>
> >> For example, if this hidden class were generated with the exact same name
> >> and the exact same bytecode at runtime as well, it would be easy to
> >> verify it. However, I do see that the names are based on some sort of
> >> memory address, and I don't know what bytecode it has, so I don't have
> >> suggestions to make them stable as of now. For Proxy classes, I feel it
> >> can be addressed unless you disagree or someone involved in Project Leyden
> >> does. :) Thank you for forwarding my mail there.
> >>
> >> Regards,
> >> Aman Sharma
> >>
> >> ------------------------------------------------------------------------
> >> *From:* liangchenb...@gmail.com <liangchenb...@gmail.com>
> >> *Sent:* Friday, May 17, 2024 1:23:58 pm
> >> *To:* Aman Sharma <aman...@kth.se>
> >> *Cc:* core-libs-dev@openjdk.org <core-libs-dev@openjdk.org>;
> >> leyden-...@openjdk.org <leyden-...@openjdk.org>
> >> *Subject:* Re: Deterministic naming of subclasses of
> >> `java/lang/reflect/Proxy`
> >>
> >> Hi Aman,
> >> `-verbose:class` is a JVM argument rather than a program argument,
> >> so when you run a Java program like `java Main`, you should invoke it as
> >> `java -verbose:class Main`.
> >> When done correctly, you should see hidden class outputs like:
> >> [0.032s][info][class,load]
> >> java.lang.invoke.LambdaForm$MH/0x00000200cc000400 source:
> >> __JVM_LookupDefineClass__
> >> The loading of java.lang.invoke hidden classes requires your program to
> >> use MethodHandle features, like a lambda.
> >>
> >> I think the problem you are exploring, avoiding dynamic class loading
> >> and effectively closing the Java Platform for security, is also being
> >> addressed by Project Leyden (as I've shared initially). Thus, I am
> >> forwarding this to leyden-dev, so you can see what approach Leyden
> >> uses to accomplish the same goal as yours.
> >>
> >> Regards, Chen Liang
> >>
> >> On Fri, May 17, 2024 at 4:40 AM Aman Sharma <aman...@kth.se> wrote:
> >>
> >>     Hi Roger,
> >>
> >>     Do you have ideas on how to intercept them? Neither my javaagent nor
> >>     a JVMTI agent passed using the `agentpath` option is able to.
> >>     It also does not seem to show up in logs when I pass
> >>     `-verbose:class`.
> >>
> >>     Also, what do you think of renaming the proxy classes as suggested
> >>     below?
> >>
> >>     Regards,
> >>     Aman Sharma
> >>
> >>     ------------------------------------------------------------------------
> >>     *From:* core-libs-dev <core-libs-dev-r...@openjdk.org> on behalf of
> >>     Roger Riggs <roger.ri...@oracle.com>
> >>     *Sent:* Friday, May 17, 2024 4:57:46 AM
> >>     *To:* core-libs-dev@openjdk.org
> >>     *Subject:* Re: Deterministic naming of subclasses of
> >>     `java/lang/reflect/Proxy`
> >>     Hi Aman,
> >>
> >>     You may also run into hidden classes (JEP 371: Hidden Classes) that
> >>     allow classes to be defined, at runtime, without names.
> >>     It has been proposed to use them for generated proxies but that
> >>     hasn't been implemented yet.
> >>     There are benefits to having nameless classes: because they can't be
> >>     referenced by name, only as a capability, they can be better
> >>     encapsulated.
> >>
> >>     fyi, Roger Riggs
> >>
> >>
> >>     On 5/16/24 8:11 AM, Aman Sharma wrote:
> >>>
> >>> Hi,
> >>>
> >>> Thanks for your response, Liang!
> >>>
> >>> > I think you meant CVE-2021-42392 instead of 2022.
> >>>
> >>> Sorry for the error. I indeed meant CVE-2021-42392
> >>> <https://nvd.nist.gov/vuln/detail/cve-2021-42392>.
> >>>
> >>> > Leyden mainly avoids this unstable generation by performing a
> >>> > training run to collect classes loaded
> >>>
> >>> I would love to know the details of Project Leyden and how it has
> >>> worked toward this goal so far. In our case, the training run
> >>> is the test suite.
> >>>
> >>> > GeneratedConstructorAccessor is already retired by JEP 416 [2]
> >>> > in Java 18
> >>>
> >>> I did see them not appearing in my allowlist when I ran my study
> >>> subject (Apache PDFBox) with Java 21. Thanks for letting me know
> >>> about this JEP. I see they are re-implemented with method handles.
> >>>
> >>> > How are you checking the classes?
> >>>
> >>> To detect runtime generated code, we have a javaagent that is hooked
> >>> statically to the test suite execution. It gives us all classes
> >>> that are loaded after the JVM and the javaagent are loaded. So
> >>> we only check the classes loaded for the purpose of running the
> >>> application. This is also why we did not choose -agentlib, as it
> >>> would also give the classes used for setting up the JVM and the
> >>> javaagent, and we the user of our tool must the classes they load.
> >>>
> >>> Next, we have a `ClassFileTransformer` hook in the agent where we
> >>> produce the checksum using the bytecode. And we compare the
> >>> checksum with the one existing in the allowlist. The checksum
> >>> computation algorithm is the same for both steps. Let me describe how
> >>> I compute the checksum.
> >>>
> >>> 1.
> >>>    I get the CONSTANT_Class_info
> >>>    <https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.1>
> >>>    entry corresponding to `this_class` and rewrite the corresponding
> >>>    CONSTANT_Utf8_info
> >>>    <https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.7>
> >>>    to a fixed String constant, say "foo".
> >>> 2. Since the name of the class is used to refer to its
> >>>    members (fields/methods), I get all CONSTANT_Fieldref_info
> >>>    <https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.2>
> >>>    entries, and if the `class_index` of one corresponds to the old
> >>>    `this_class`, we rewrite the UTF8 value of class_index to the same
> >>>    constant "foo".
> >>> 3. Next, since the names of the fields in Proxy classes are
> >>>    also suffixed by numbers, for example, `private static Method
> >>>    m4`, we rewrite the UTF8 value of name in the
> >>>    CONSTANT_NameAndType_info
> >>>    <https://docs.oracle.com/javase/specs/jvms/se11/html/jvms-4.html#jvms-4.4.6>.
> >>> 4. These fields can also have a random order so we simply sort
> >>>    the entire byte code using `Arrays.sort(byte[])` to eliminate
> >>>    any differences due to ordering of fields/methods.
> >>> 5. Simply sorting the byte array still left minute differences. I
> >>>    could not understand why they existed even though the values in
> >>>    the constant pool of the bytecode in the allowlist and at runtime
> >>>    were exactly the same after rewriting. The differences existed in
> >>>    the bytes of the Code attribute of methods. I concluded that
> >>>    the bytes stored some position information.
> >>>    To avoid this, I created a subarray where I considered only the
> >>>    bytes corresponding to `CONSTANT_Utf8_info.bytes`. Computing a
> >>>    checksum for it resulted in the same checksums for both classfiles.
> >>>
> >>> Let's understand the whole approach with an example of a Proxy class.
> >>>
> >>> `
> >>> public final class $Proxy42 extends Proxy implements
> >>> org.apache.logging.log4j.core.config.plugins.Plugin {
> >>> `
> >>>
> >>> This will go in the allowlist as "Proxy_Plugin: <SHA256 checksum>".
> >>>
> >>> When the same class is intercepted at runtime, say "$Proxy10", we
> >>> look for "Proxy_Plugin" in the allowlist and since the checksum
> >>> algorithm is the same in both cases, we get a match and let the class
> >>> load.
> >>>
> >>> This approach has seemed to work well for Proxy classes and Generated
> >>> Constructor Accessor (which is removed as you said). I also looked
> >>> at the species generated by method handles. I did not notice any
> >>> modification in them. Their name generation seemed okay to me. If
> >>> some new Species are generated, it is of course detected since it
> >>> is not in the allowlist.
> >>>
> >>> I have not looked into LambdaMetafactory because I did not
> >>> encounter it as a problem so far, but I am aware its name
> >>> generation is also unstable. I have run my approach on only a few
> >>> projects. And for hidden classes, I assume the agent
> >>> won't be able to intercept them so detecting them would be really
> >>> hard.
> >>>
> >>> Regards,
> >>> Aman Sharma
> >>>
> >>> ------------------------------------------------------------------------
> >>> *From:* liangchenb...@gmail.com <liangchenb...@gmail.com>
> >>> *Sent:* Thursday, May 16, 2024 5:52:03 AM
> >>> *To:* Aman Sharma; core-libs-dev
> >>> *Cc:* Martin Monperrus
> >>> *Subject:* Re: Deterministic naming of subclasses of
> >>> `java/lang/reflect/Proxy`
> >>> Hi Aman,
> >>> I think you meant CVE-2021-42392 instead of 2022.
> >>>
> >>> For your approach of an "allowlist" for Java runtime, project
> >>> Leyden is looking to generate a static image [1], that
> >>> > At run time it cannot load classes from outside the image, nor
> >>> > can it create classes dynamically.
> >>> Leyden mainly avoids this unstable generation by performing a
> >>> training run to collect classes loaded and even object graphs; I
> >>> am not familiar with the details unfortunately.
> >>>
> >>> Otherwise, the Proxy discussion belongs better to core-libs-dev,
> >>> as java.lang.reflect.Proxy is part of Java's core libraries. I am
> >>> sending this reply to core-libs-dev.
> >>>
> >>> For your perceived problem that classes don't have unique names,
> >>> your description sounds dubious: GeneratedConstructorAccessor is
> >>> already retired by JEP 416 [2] in Java 18, and there are many
> >>> other cases in which JDK generates classes without stable names,
> >>> notoriously LambdaMetafactory (Gradle wished for cacheable
> >>> Lambdas); the same applies for the generated classes for
> >>> MethodHandle's LambdaForms (which carry implementation code for
> >>> LambdaForm). How are you checking the classes? It seems you are
> >>> not checking hidden classes. Proxy and Lambda classes are defined
> >>> by the caller's class loader, while LambdaForms are under JDK's
> >>> system class loader I think. We need to ensure you are correctly
> >>> finding all unstable classes before we can proceed.
> >>>
> >>> [1]: https://openjdk.org/projects/leyden/notes/01-beginnings
> >>> [2]: https://openjdk.org/jeps/416
> >>>
> >>> On Wed, May 15, 2024 at 7:00 PM Aman Sharma <aman...@kth.se> wrote:
> >>>
> >>>     Hi,
> >>>
> >>>     My name is Aman and I am a PhD student at KTH Royal Institute
> >>>     of Technology, Stockholm, Sweden. I research as part of the CHAINS
> >>>     <https://chains.proj.kth.se/> project to strengthen the
> >>>     software supply chain of multiple ecosystems. I particularly
> >>>     focus on runtime integrity in Java. In this email, I want to
> >>>     write about an issue I have discovered with /dynamic
> >>>     generation of `java.lang.reflect.Proxy` classes/. I will
> >>>     propose a solution and would love to hear the feedback from
> >>>     the community.
> >>>     Let me know if this is the correct mailing list
> >>>     for such discussions. It seemed the most relevant from this
> >>>     list <https://mail.openjdk.org/mailman/listinfo>.
> >>>
> >>>     *My research*
> >>>
> >>>     Java has features to load classes on the fly - it can either
> >>>     download or generate a class at runtime. These features are
> >>>     useful for the inner workings of the JDK, for example, implementing
> >>>     annotations, reflective access, etc. However, these features
> >>>     have also contributed to critical vulnerabilities in the past
> >>>     - CVE-2021-44228 (log4shell), CVE-2022-33980, CVE-2022-42392.
> >>>     All of these vulnerabilities have one thing in common - /a
> >>>     class that was not known during build time was
> >>>     downloaded/generated at runtime and loaded into JVM./
> >>>
> >>>     To defend against such vulnerabilities, we propose a solution
> >>>     to /allowlist classes for runtime/. This allowlist will
> >>>     contain an exhaustive list of classes that can be loaded by
> >>>     the JVM and it will be enforced at runtime. We build this
> >>>     allowlist from three sources:
> >>>
> >>>     1. All classes of all modules provided by the Java Standard
> >>>        Library. We use ClassGraph
> >>>        <https://github.com/classgraph/classgraph> to scan the JDK.
> >>>     2. We take the source code and all dependencies of an
> >>>        application. We use a software bill of materials to get
> >>>        all the data.
> >>>     3. Finally, we run the test suite to include any runtime
> >>>        downloaded/generated classes.
> >>>
> >>>     Such a list is able to prevent the above 3 CVEs because it
> >>>     does not let the "unknown" bytecode be loaded.
> >>>
> >>>     *Problem with generating such an allowlist*
> >>>
> >>>     The first two parts of the allowlist are easy to get.
> >>>     The problem is with the third step where we want to allowlist all
> >>>     the classes that could be downloaded or generated. Upon
> >>>     running the test suite and hooking to the classes it loads, we
> >>>     observe that the list consists of classes that are called
> >>>     "com/sun/proxy/$Proxy2",
> >>>     "jdk/internal/reflect/GeneratedConstructorAccessor3" among
> >>>     many more. The purpose of these classes can be identified. The
> >>>     proxy class is created to implement an annotation. The
> >>>     accessor gives the JVM access to the constructor of a class.
> >>>
> >>>     When enforcing this allowlist at runtime, we see that the
> >>>     bytecode content for "com/sun/proxy/$Proxy2" differs in the
> >>>     allowlist and at runtime. In our case, we are experimenting
> >>>     with pdfbox <https://github.com/apache/pdfbox> so we created
> >>>     the allowlist using its test suite. Then we enforced this
> >>>     allowlist while running some of its subcommands. However,
> >>>     there was some other proxy class, say "com/sun/proxy/$Proxy5",
> >>>     at runtime that implemented the same interfaces and had the
> >>>     same methods as "com/sun/proxy/$Proxy2" in the allowlist. They
> >>>     only differed in the name of the class, order of fields, and
> >>>     types for field references. This could happen because the
> >>>     order in which classes are loaded is workload dependent, but it
> >>>     causes problems when generating such an allowlist.
> >>>
> >>>     *Solution*
> >>>
> >>>     We propose that naming of subclasses of
> >>>     "java/lang/reflect/Proxy" should not be dependent upon the
> >>>     order of loading. In order to do so, two issues can be fixed:
> >>>
> >>>     1.
> >>>        The naming of the class should not be based on AtomicLong
> >>>        <https://github.com/openjdk/jdk/blob/b687aa550837830b38f0f0faa69c353b1e85219c/src/java.base/share/classes/java/lang/reflect/Proxy.java#L531>.
> >>>        Rather it could be named based on the interfaces it implements.
> >>>        I also wonder why AtomicLong is chosen in the first place.
> >>>     2. Methods of the interfaces must be in a particular order.
> >>>        Right now, they are not sorted in any particular order
> >>>        <https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Class.java#L2178>.
> >>>
> >>>     These fixes will make proxy class generation deterministic
> >>>     with respect to order of loading, so the generated classes won't
> >>>     be flagged at runtime since the test suite would already detect them.
> >>>
> >>>     I would love to hear from the community about these ideas. If
> >>>     in agreement, I would be happy to produce a patch. I have
> >>>     discovered this issue with subclasses of
> >>>     GeneratedConstructorAccessor
> >>>     <https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/reflect/ConstructorAccessor.java>
> >>>     as well and I imagine it will also apply to some other runtime
> >>>     generated classes. If you disagree, please let me know also. It
> >>>     helps with my research.
> >>>
> >>>     I also have PoCs for the above CVEs
> >>>     <https://github.com/chains-project/exploits-for-sbom.exe> and
> >>>     a proof-of-concept tool is being developed under the name
> >>>     sbom.exe <https://github.com/chains-project/sbom.exe> in case
> >>>     anyone wonders about the implementation. I would also be
> >>>     happy to explain more.
> >>>
> >>>     Regards,
> >>>     Aman Sharma
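
As a rough illustration of the allowlist-key idea from the quoted thread ("Proxy_Plugin"), the sketch below derives a key from the interfaces a proxy class implements instead of from its counter-based name. `ProxyAllowlistKey` and `allowlistKey` are hypothetical names, not part of sbom.exe or the JDK, and the key format is only an assumption modelled on the example in the thread.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.stream.Collectors;

public class ProxyAllowlistKey {
    /**
     * Hypothetical helper: builds a key such as "Proxy_Runnable" from the
     * interfaces of a proxy class. Interface order is kept as declared,
     * because Proxy treats different orders as different proxy classes.
     */
    public static String allowlistKey(Class<?> proxyClass) {
        return "Proxy_" + Arrays.stream(proxyClass.getInterfaces())
                .map(Class::getSimpleName)
                .collect(Collectors.joining("_"));
    }

    public static void main(String[] args) {
        InvocationHandler noop = (p, m, a) -> null;
        ClassLoader loader = ProxyAllowlistKey.class.getClassLoader();

        // The generated class names depend on how many proxy classes were
        // created before; the derived key does not.
        Object first = Proxy.newProxyInstance(
                loader, new Class<?>[] {Comparable.class}, noop);
        Object second = Proxy.newProxyInstance(
                loader, new Class<?>[] {Runnable.class}, noop);

        System.out.println(first.getClass().getName()
                + " -> " + allowlistKey(first.getClass()));
        System.out.println(second.getClass().getName()
                + " -> " + allowlistKey(second.getClass()));
    }
}
```

Swapping the two `newProxyInstance` calls should change the printed class names but not the keys. Simple names can collide across packages, so a real tool would probably want fully qualified interface names in the key.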