On 27 April 2017 at 15:11, Yoann Rodiere <yo...@hibernate.org> wrote:
> I wonder, what's the benefit for HSEARCH-2616? Do you want to have that
> field so that we can just use AddLuceneWorks everywhere, and run targeted
> delete operations when we start a partition? If so, is it as a fallback
> solution, if what I proposed cannot be implemented, or as a better
> alternative? Note I don't have strong arguments against that solution, I'm
> just trying to understand the "why".

I had written the "why" on HSEARCH-2616, but to clarify here: I liked your idea of trying to figure out whether the current block of work is being repeated or is a retry. However, while I initially thought of adding such a field as a fallback solution, I now believe it's ultimately the more robust one: otherwise you have to trust such state, which could be lost, wrong or corrupted independently for a number of reasons.

Since the problem being solved is about resuming the process after a failure, we can't make many safe assumptions about what kind of failure we're dealing with. For example, if you run out of disk space you'll have a half-written index but no way to store such batch state. Other problems might involve indexes being backed up, restored or replicated over other technologies (rsync, Infinispan, ...), so a mismatch between the index and other state is yet another problem which might need caution, logs and possibly tooling.

Say an IO operation fails during an index write flush: some admin intervenes, fixes the hardware and then triggers a resume of indexing. In such conditions I wouldn't trust some additional persistent state to be correct, not even if it were cryptographically signed: corruption or signature mismatches could be detected, but there's still the risk of it being valid yet out of date. With IO unavailable at the time it should have been written, you're probably reading the previously written version. Having an out-of-date batch state would likely have the opposite effect of what we need.

On the other hand, what you learn by inspecting the index is inherently coupled with the index state: while indexes could be corrupted, the progress tracking state and the index are one thing, so you're not easily fooled.

Since I agree that having additional fields is not something everyone will like, as I suggested on HSEARCH-2616 we could offer the alternatives as a fallback.
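
To make the "targeted delete when a partition starts" idea concrete, here is a minimal sketch. It is only an illustration under stated assumptions: the index is modelled as an in-memory list rather than through the Lucene or Hibernate Search API, and the class name, the method names and the field name `__HSearch_partition` are all hypothetical. The point it shows is that the progress marker lives in the same store as the documents, so a restarted partition can clean up after itself without consulting any external state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy in-memory stand-in for an index. This is NOT the Lucene or
// Hibernate Search API; all names here are hypothetical.
final class PartitionRestartSketch {

    // Hypothetical name for the hidden, internal-only field.
    static final String PARTITION_FIELD = "__HSearch_partition";

    private final List<Map<String, String>> documents = new ArrayList<>();

    // Plain "add" work: each document is tagged with the partition that wrote it.
    void add(String entityId, String partitionId) {
        documents.add(Map.of("id", entityId, PARTITION_FIELD, partitionId));
    }

    // Targeted delete: removes whatever an earlier, possibly interrupted,
    // attempt of the same partition left behind. Returns the number removed.
    int deletePartition(String partitionId) {
        int before = documents.size();
        documents.removeIf(doc -> partitionId.equals(doc.get(PARTITION_FIELD)));
        return before - documents.size();
    }

    // Starting (or restarting) a partition: delete first, then only add.
    void indexPartition(String partitionId, List<String> entityIds) {
        deletePartition(partitionId);
        for (String entityId : entityIds) {
            add(entityId, partitionId);
        }
    }

    // Number of documents stored for a given entity id.
    long count(String entityId) {
        return documents.stream()
                .filter(doc -> entityId.equals(doc.get("id")))
                .count();
    }

    int size() {
        return documents.size();
    }

    public static void main(String[] args) {
        PartitionRestartSketch index = new PartitionRestartSketch();

        // Partition p1 fails after writing two of its three entities.
        index.add("1", "p1");
        index.add("2", "p1");

        // Partition p2 completes normally.
        index.indexPartition("p2", List.of("4", "5"));

        // Resume: p1 is simply run again from the start.
        index.indexPartition("p1", List.of("1", "2", "3"));

        System.out.println(index.size()); // prints 5
    }
}
```

In this model, rerunning a partition never produces duplicates and never needs to know whether the previous attempt got as far as recording its progress somewhere else.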

> On adding a hidden field, I wonder what this will mean for Elasticsearch; if
> we start doing such things, we should clearly and explicitly state in the
> documentation that targeting existing ES schemas without adapting them to
> Hibernate Search is not supported.
> On top of that, it may hurt users upgrading Hibernate Search: Lucene may
> simply ignore queries against a field that doesn't exist in the index, but
> I'm not sure Elasticsearch behaves that way when the field isn't even
> defined in the mapping. So users may have to upgrade their schema just for
> that. I know Elasticsearch integration is experimental anyway, but what I
> mean is if we do that, it must be *before* we drop the
> "experimental" mention on Elasticsearch integration.

Good point. Such proposals to change some internal field don't happen very often, though. We strive to have a stable encoding, but since the index is not the database, well-documented changes might be worth it. "Private internal" fields especially should not be too hard to manage, as we can deal with them explicitly in some lenient way, and if they don't contain end user state, as in this case, we don't even have to require an index rebuild.

People who don't want this can have a slower mass indexer, or go without recovery support.

Thanks,
Sanne

>
> Yoann Rodière
> Hibernate NoORM Team
> yo...@hibernate.org
>
> On 27 April 2017 at 15:59, Yoann Rodiere <yrodi...@redhat.com> wrote:
>>
>> Yoann Rodière
>> Software Engineer, Hibernate NoORM Team
>> Red Hat
>> yrodi...@redhat.com
>>
>> On 27 April 2017 at 15:23, Sanne Grinovero <sa...@hibernate.org> wrote:
>>>
>>> To better implement recovery operations during MassIndexer
>>> [HSEARCH-2616] - specifically in the context of the upcoming JBatch
>>> based implementation - I'm considering the benefits of adding one more
>>> field to the Lucene index for our internal purposes.
>>>
>>> This new field is only useful for Hibernate Search internals so we
>>> shouldn't allow it to be targeted by queries, etc.
>>>
>>> There is a single precedent: we already encode the entity name, so
>>> "hiding fields" is not a new problem that we have to deal with. It
>>> might be a reason to polish the existing concept and improve the
>>> encapsulation.
>>>
>>> Would anyone have a strong case against this?
>>>
>>> Thanks,
>>> Sanne
>>> _______________________________________________
>>> hibernate-dev mailing list
>>> hibernate-dev@lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/hibernate-dev