Hi all,

from some performance tests we learned that a common bottleneck for applications using Hibernate is the amount of memory we allocate at runtime: even the so-called "short-lived" objects, which usually are not a threat, are being allocated in volumes that are too high.
Specifically, the highest consumer is the wildly high number of instances of _org.hibernate.engine.spi.EntityKey_.

I think we have mainly two different strategies to attack this:
 1# reduce the number of EntityKey instances being allocated
 2# reduce the size of each EntityKey instance

While 1# seems a wise move, I'll leave that to the ORM experts to consider for the future, as it probably requires a broader knowledge of how all components fit together. So I'm attacking 2#, especially as I thought I could get a big win in a short time :)

To properly estimate the runtime size of each instance I could simply use the various reference tables of how much each pointer and primitive takes, but there is some more complexity related to the order in which the fields will be organized, and to the requirement of object alignment. The rules for figuring out the cost of a small value object aren't too hard, but I'm actually using a tool which nicely dumps out a reasonable estimate: "java object layout" [1].

So this is the output for the current EntityKey implementation:

Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
Running 64-bit HotSpot VM.
Using compressed references with 3-bit shift.
Objects are 8 bytes aligned.

org.hibernate.engine.spi.EntityKey
 offset  size  type                       description
      0    12                             (assumed to be the object header + first field alignment)
     12     4  int                        EntityKey.hashCode
     16     1  boolean                    EntityKey.isBatchLoadable
     17     3                             (alignment/padding gap)
     20     4  Serializable               EntityKey.identifier
     24     4  String                     EntityKey.entityName
     28     4  String                     EntityKey.rootEntityName
     32     4  String                     EntityKey.tenantId
     36     4  Type                       EntityKey.identifierType
     40     4  SessionFactoryImplementor  EntityKey.factory
     44     4                             (loss due to the next object alignment)
     48                                   (object boundary, size estimate)

So that's 48 bytes per instance, which matches my expectations. In our benchmark we created about 200GB of such instances in 20 minutes, which accounts for approximately 10% of the overall memory pressure of the application.
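For those who want to double-check the 48 bytes by hand, here is a minimal sketch of the arithmetic. It is not how the tool works internally; it is a simplified model that assumes the 12-byte header, 4-byte compressed references and 8-byte object alignment reported in the dump, and takes the field sizes in the order the VM laid them out. The class and method names (LayoutEstimate, alignUp, estimateSize) are made up for illustration.

```java
// Simplified size model; assumes the VM settings shown in the dump
// (12-byte header, compressed 4-byte references, 8-byte object alignment).
final class LayoutEstimate {

    static final int HEADER_BYTES = 12;
    static final int OBJECT_ALIGNMENT = 8;

    // Rounds offset up to the next multiple of alignment.
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) / alignment * alignment;
    }

    // fieldSizes: size in bytes of each field, in layout order.
    // Each field is aligned to its own size, then the whole object
    // is padded up to the object alignment.
    static int estimateSize(int... fieldSizes) {
        int offset = HEADER_BYTES;
        for (int size : fieldSizes) {
            offset = alignUp(offset, size) + size;
        }
        return alignUp(offset, OBJECT_ALIGNMENT);
    }

    public static void main(String[] args) {
        // int hashCode, boolean isBatchLoadable, then six references
        System.out.println(estimateSize(4, 1, 4, 4, 4, 4, 4, 4)); // prints 48
    }
}
```

The 3-byte gap after the boolean and the final 4 bytes of padding both fall out of the two alignUp calls, which is why simply summing the field sizes would underestimate the real cost.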
I have an open pull request [2] which improves things, changing the code to the following layout:

org.hibernate.engine.spi.EntityKey
 offset  size  type              description
      0    12                    (assumed to be the object header + first field alignment)
     12     4  int               EntityKey.hashCode
     16     4  Serializable      EntityKey.identifier
     20     4  String            EntityKey.tenantId
     24     4  EntityEssentials  EntityKey.persister
     28     4                    (loss due to the next object alignment)
     32                          (object boundary, size estimate)

So we get from 48 to 32 bytes per instance; assuming the same allocation pattern that means saving about 33%, so some 60GB of memory bandwidth saved.

I think we could take it as a first step in the right direction, but we're very close to an actual 50% improvement: there are 4 bytes per instance "wasted" just because of alignment needs, so if we could remove just one more field we would save 8 bytes.

In this example I removed the _tenantId_:

org.hibernate.engine.spi.EntityKey
 offset  size  type              description
      0    12                    (assumed to be the object header + first field alignment)
     12     4  int               EntityKey.hashCode
     16     4  Serializable      EntityKey.identifier
     20     4  EntityEssentials  EntityKey.persister
     24                          (object boundary, size estimate)

So I would aim at implementing this, or alternatively, if we really can't remove the tenantId, we could remove the (cached) hashCode:

org.hibernate.engine.spi.EntityKey
 offset  size  type              description
      0    12                    (assumed to be the object header + first field alignment)
     12     4  Serializable      EntityKey.identifier
     16     4  String            EntityKey.tenantId
     20     4  EntityEssentials  EntityKey.persister
     24                          (object boundary, size estimate)

But that hashCode calculation is really hot, so I would much rather remove the _tenantId_.

How can we remove the tenantId?
I was thinking of having the tenantId included in the persister, so we'd have a new interface:

    public interface TenantLocalEntityPersister extends EntityPersister {
        String getTenantId();
    }

Implementors would have two fields:

    EntityPersister ep; // shared across all tenants
    String tenantId;

Then a single instance of such a TenantLocalEntityPersister would be shared across all EntityKey instances associated with it, but also by many other objects which need to shrink: for example org.hibernate.engine.spi.EntityEntry needs to access the same concept, and is the second highest consumer of memory (so next in line for a similar refactoring).

Would it be doable to refactor EntityPersister to have this concept of a per-tenant instance?

As a nice side effect, I suspect it would also bring the cost of managing multi-tenancy down to zero for those applications which aren't doing any multi-tenancy, while now there seems to be a cost paid by all users.

Sanne

[1] https://github.com/jbaruch/java-object-layout
[2] https://github.com/hibernate/hibernate-orm/pull/633

_______________________________________________
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev