#### background ####

Search used to be split in two main components:

engine ---> indexing backend

- there was a contract (public API) between the two so that indexing backends could be replaced
- cluster configuration needed the "--->" invocation to be replaced by an RPC, therefore forcing the parameters required by this public API to be Serializable.

These parameters are essentially a List<LuceneWork>, where a LuceneWork instance contains some simple primitives and an instance of org.apache.lucene.document.Document.

This Document *used* to be Serializable, and so this worked fine, with the minor inconvenience that we could not add more properties to LuceneWork without introducing some painful class incompatibilities in clustered deployments.

The Lucene project decided that maintaining the guarantees of implementing Serializable is too much of a burden, and in fact NumericField has never been Serializable, hence this bug has been open on Search since we introduced NumericField support:

HSEARCH-681 - NotSerializableException when NumericField gets serialized in JMSBackendQueueProcessor

#### new architecture ####

In Hibernate Search 4 there is an additional level of indirection before the actual communication; it looks like

engine ---> index manager ---> backend

and both components are replaceable. In fact you could plug in an IndexManager which deals with backends in a totally different way, so the RPC channel can use a different format which is not mandated by the API (the second indirection still defines an interface, and by using that you can reuse a larger set of provided components, but you don't have to).

Example: an Infinispan IndexManager would not use a standard backend but rather use Infinispan's own communication channels to send write operations to the index. It could still use a JMS backend by assembling the existing components.
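
To make the two levels of indirection concrete, here is a rough sketch of the layering described above. Every name in it (LayeringSketch, Work, Backend, IndexManager, the two implementations) is made up for illustration and is not the actual Hibernate Search 4 API; Work only stands in for the role LuceneWork plays, and the "own channel" is just a list recording what would be sent.

```java
import java.util.ArrayList;
import java.util.List;

public class LayeringSketch {

    /** Hypothetical stand-in for a unit of index work (the role LuceneWork plays). */
    static final class Work {
        final String operation;
        final String documentId;

        Work(String operation, String documentId) {
            this.operation = operation;
            this.documentId = documentId;
        }
    }

    /** Second level of indirection: something that applies or ships the work. */
    interface Backend {
        void apply(List<Work> works);
    }

    /** First level of indirection: the only contract the engine talks to. */
    interface IndexManager {
        void performOperations(List<Work> works);
    }

    /** Default wiring: the index manager hands the work to a replaceable backend. */
    static final class BackendDelegatingIndexManager implements IndexManager {
        private final Backend backend;

        BackendDelegatingIndexManager(Backend backend) {
            this.backend = backend;
        }

        @Override
        public void performOperations(List<Work> works) {
            backend.apply(works);
        }
    }

    /** Alternative wiring: no Backend at all, the manager uses its own channel and format. */
    static final class OwnChannelIndexManager implements IndexManager {
        final List<String> sent = new ArrayList<>();

        @Override
        public void performOperations(List<Work> works) {
            for (Work w : works) {
                // The wire format is private to this manager; the API does not mandate it.
                sent.add(w.operation + ":" + w.documentId);
            }
        }
    }

    /** Trivial backend that records what it was asked to apply. */
    static final class RecordingBackend implements Backend {
        final List<Work> applied = new ArrayList<>();

        @Override
        public void apply(List<Work> works) {
            applied.addAll(works);
        }
    }

    public static void main(String[] args) {
        List<Work> works = new ArrayList<>();
        works.add(new Work("add", "1"));
        works.add(new Work("delete", "2"));

        RecordingBackend backend = new RecordingBackend();
        IndexManager viaBackend = new BackendDelegatingIndexManager(backend);
        viaBackend.performOperations(works);

        OwnChannelIndexManager ownChannel = new OwnChannelIndexManager();
        ownChannel.performOperations(works);

        System.out.println(backend.applied.size() + " works applied via backend");
        System.out.println(ownChannel.sent.size() + " works sent via own channel");
    }
}
```

The point of the sketch is only that the engine sees a single interface, while the choice between "delegate to a standard backend" and "use my own channel and format" is made inside the index manager.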

#### the problem ####

So we still have to find a way to serialize the Document instances, tracked by

HSEARCH-757 - Explicitly control binary format of communication with the backend

I started this mail from the architecture to clarify that we don't need to replace the API making use of LuceneWork instances, which is doing a pretty good job (and is not necessarily the final API for v. 4.0).

We also don't need to mandate a specific binary format, as this could be a detail left to different backends; but certainly all implementations would need to deal with this, so we need a helper service which could be reused by JMS backends, JGroups, Infinispan, possibly others.

As soon as we have such a toy, implementing a new Infinispan IndexManager is going to be pretty easy, so I'm looking forward to this as a great means to simplify configuration (and have it working with NumericFields); it's also possible that other fields in the Lucene implementation might drop Serializable soon.

# Solution option A)

Code a new utility from scratch which provides this bi-directional transformation:

List<LuceneWork> <--> byte[]

Pros:
- flexible, lovely do-it-yourself with no dependencies.

Cons:
- since Lucene doesn't want to care about Serializable, it's possible that they will sneak in new fields / different fields without notice in minor releases. This is going to need excellent tests, as it requires manual code inspection and will become a maintenance overhead (more than usual).

# Solution option B)

Use JBoss Marshaller to implement the same. We will likely still need to write the details of how to externalize specific Lucene classes, but it's supposed to provide many high performance helpers.

Pros:
- via Infinispan we already depend on this, but this applies only to the hibernate-search-infinispan module.
- when Lucene changes class format, it will help to deal with it as it adapts to the class definition (we might notice better).

Cons:
- will add more dependencies to hibernate-search-core, unless we split out all the support for clustering into sub modules.
- while it adapts to the class format, the produced byte[] streams will be incompatible; we can deal with this by storing example streams in constants and using them in tests.

# Solution option C)

Don't serialize the Document at all, but send over only the metadata we need, encoded in a different ad hoc structure.

## Solution option C+JBM)

Even doing so, we could optionally introduce JBoss Marshaller to avoid slow Java serialization.

Pros:
- better isolation from Lucene changes

Cons:
- slower "time to market" to expose new Lucene features: until we add support for a feature, people won't be able to use it.
- we might forget some use case or make wrong assumptions on the data, making it impossible for people to work around it unless they plug in a different backend implementation.

#### WDYT?

[Davide, you're in CC as we were considering upgrading your contributor status from beginner, to do some more hardcore stuff.. how would you feel about getting this one assigned?]

Cheers,
Sanne

_______________________________________________
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev