[
https://issues.apache.org/jira/browse/LUCENE-7410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Adrien Grand updated LUCENE-7410:
---------------------------------
Attachment: LUCENE-7410.patch
I have a patch which makes the situation better I think:
- the ability to remove a listener is gone
- there is no "core" cache key or listener on {{IndexReader}} anymore, only on
{{LeafReader}}
- cache key and listener registration have moved to the
{{IndexReader.CacheHelper}} class so that it is clear which key relates to
which listener. It also makes it very hard to propagate the cache key from a
filtered reader without propagating the listener registration or vice versa,
you cannot do it by mistake anymore.
- {{IndexReader.addReaderClosedListener}} and
{{IndexReader.getCombinedCoreAndDeletesKey}} have been replaced by
{{IndexReader.getReaderCacheHelper}}, which returns null by default, meaning
that the reader is not suited for caching
- {{IndexReader.addCoreClosedListener}} and {{IndexReader.getCoreCacheKey()}}
have been replaced by {{LeafReader.getCoreCacheHelper}}, which returns null by
default, meaning that this leaf reader has no concept of "core" data
- there is only one impl that actually implements
{{LeafReader.getCoreCacheHelper}}: {{SegmentReader}}. All other impls either
delegate to it or do not support caching on a core key.
- there are only two impls that actually implement
{{IndexReader.getReaderCacheHelper}}: {{SegmentReader}} and
{{StandardDirectoryReader}}. All other impls either delegate to it or do not
support caching.
- the query cache and BitSetProducer for joins skip caching when
{{LeafReader.getCoreCacheHelper}} returns null. Other APIs like segment-based
replication or FieldCache fail with an exception since not being able to cache
is a problem/performance trap in those cases.
- IndexReader.CacheKey is used as a cache key to avoid type safety issues.
On the cons side, I removed insanity checking since I could not implement it
anymore with the new API but maybe it is not that much of an issue since cache
insanity is not really possible anymore unless you really want it? I also found
some usage of cache keys in Solr which can be dangerous since cache keys are
shared between readers that have different content, I *think* it should be fine
given how these readers are used (I left notes in the patch), but that is
something we might still want to look into since it could cause very subtle
bugs.
> Make cache keys and closed listeners less trappy
> ------------------------------------------------
>
> Key: LUCENE-7410
> URL: https://issues.apache.org/jira/browse/LUCENE-7410
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Adrien Grand
> Attachments: LUCENE-7410.patch
>
>
> IndexReader currently exposes getCoreCacheKey(),
> getCombinedCoreAndDeletesKey(), addCoreClosedListener() and
> addReaderClosedListener(). They are typically used to manage resources whose
> lifetime needs to mimic the lifetime of segments/indexes, typically caches.
> I think this is trappy for various reasons:
> h3. Memory leaks
> When maintaining a cache, entries are added to the cache based on the cache
> key and then evicted using the cache key that is given back by the close
> listener, so it is very important that both keys are the same.
> But if a filter reader happens to delegate get*Key() and not
> add*ClosedListener() or vice-versa then there is potential for a memory leak
> since the closed listener will be called on a different key and entries will
> never be evicted from the cache.
> h3. Lifetime expectations
> The expectation of using the core cache key is that it will not change in
> case of deletions, but this is only true on SegmentReader and LeafReader
> impls that delegate to it. Other implementations such as composite readers or
> parallel leaf readers use the same key for "core" and "combined core and
> deletes".
> h3. Throw-away wrappers cause cache trashing
> An application might want to either expose more (with a ParrallelReader or
> MultiReader) or less information (by filtering fields/docs that can be seen)
> depending on the user who is logged in. In that case the application would
> typically maintain a DirectoryReader and then wrap it per request depending
> on the logged user and throw away the wrapper once the request is completed.
> The problem is that these wrappers have their own cache keys and the
> application may build something costly and put it in a cache to throw it away
> a couple milliseconds later. I would rather like for such readers to have a
> way to opt out from caching on order to avoid this performance trap.
> h3. Type safety
> The keys that are exposed are plain java.lang.Object instances, which
> requires caches to look like a {{Map<Object, ?>}} which makes it very easy to
> either try to get, put or remove on the wrong object since any object would
> be accepted.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]