[
https://issues.apache.org/jira/browse/FC-327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18060134#comment-18060134
]
Ben Manes commented on FC-327:
------------------------------
Shawn, thank you for the details. I believe calling this a "cache" is
misleading for the search use-case; what you want is replication.
For terminology, a cache is a hot subset of transient data for a transparent
speedup by reducing lookups to the system-of-record. A cache miss is common and
expected, with the side effect of a slower but correct operation. Since the
cache avoids costly lookups it can return stale data, so consistency skew must
be acceptable, e.g. the TTL you mentioned.
A replica is a complete copy of a data set. Partial or full refers to the
span of data sets covered: a full replica copies, e.g., the entirety of a
single sql table, while partial replication covers only a subset of all
tables in the database. That differs from a cache, which holds only a subset
of an individual data set (e.g. a partial copy of a sql table). A _cold
replica_ is a one-time snapshot, a _warm replica_ is a periodic snapshot,
and a _hot replica_ is a snapshot that is being continuously updated
(streaming).
For simple key-value lookups, a _cache_ is perfectly reasonable. It cannot
perform search queries itself, but it can store the results of previous
searches.
For example a query cache would have the query itself be a cache key, the
matching record ids the cache value, and a cache miss would execute the query
against the external data store. The query cache would avoid expensive
redundant searches and could be combined with a data cache to fetch the records
by a multi-get batch load. The caches would always be key-value maps that speed
up single key lookups.
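A minimal sketch of that query cache + data cache combination, using plain
{{ConcurrentHashMap}} as the cache store; the class names, the String record
type, and the Function-based loaders are hypothetical stand-ins for the
external data store:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// The query is the cache key, the matching record ids the cache value;
// a miss executes the query against the external store.
class QueryCache {
    private final Map<String, List<Long>> cache = new ConcurrentHashMap<>();

    List<Long> search(String query, Function<String, List<Long>> store) {
        return cache.computeIfAbsent(query, store);
    }
}

// Data cache keyed by record id; misses are fetched in one multi-get batch.
class DataCache {
    private final Map<Long, String> cache = new ConcurrentHashMap<>();

    Map<Long, String> getAll(List<Long> ids,
                             Function<List<Long>, Map<Long, String>> batchLoader) {
        List<Long> misses = ids.stream()
            .filter(id -> !cache.containsKey(id))
            .toList();
        if (!misses.isEmpty()) {
            cache.putAll(batchLoader.apply(misses));
        }
        // Return the records in the requested order.
        Map<Long, String> records = new LinkedHashMap<>();
        for (Long id : ids) {
            records.put(id, cache.get(id));
        }
        return records;
    }
}
```

A real implementation would of course add expiration (the TTL you mentioned)
and bounding, which any modern cache library provides out of the box.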
For the actual search query evaluation, a cache cannot fulfill the use-case
because the entire data set is needed for the evaluation. Here replication
is required to build a complete search index to query against. The search
index would contain all records in the data set, but as thin records that
store only the searchable fields to match against, with the full records
materialized from the data store. Think ElasticSearch / Solr vs Memcached /
Redis.
Your usage of Ehcache v2 required the entire data set to be in the cache or
else it gave incorrect responses, as search evaluated only a subset of data.
You can certainly emulate this with any other cache by similarly scanning over
its contents, but it can be very misleading and confusing. A more
straightforward approach is to periodically load all of the searchable
portions of your data set, e.g. [(123, john, doe), (456, dan, smith)], to
evaluate the queries against, and then return the full records from a cache,
e.g. [123 -> (john, doe, admin, sales), 456 -> (dan, smith, member,
engineering)]. The
search index can be an immutable {{List}} that is replaced by a scheduled task,
scanned to match against the query criteria, and then fully materialized by
cache lookups.
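A sketch of that approach, under the assumption of a simple list-based index;
{{ThinRecord}} and its fields are hypothetical names:

```java
import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;
import java.util.function.Supplier;

// A thin record holds only the id and the searchable fields.
record ThinRecord(long id, String firstName, String lastName) {}

class SearchIndex {
    // Immutable snapshot, swapped out wholesale on each refresh.
    private volatile List<ThinRecord> index = List.of();

    // One-shot rebuild from the system of record.
    void refresh(Supplier<List<ThinRecord>> loader) {
        index = List.copyOf(loader.get());
    }

    // Schedule the rebuild so the replica stays warm.
    void scheduleRefresh(Supplier<List<ThinRecord>> loader,
                         ScheduledExecutorService scheduler, long periodSeconds) {
        scheduler.scheduleAtFixedRate(() -> refresh(loader),
            periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    // Scan the full data set against the query criteria, returning the
    // matching ids; the caller then materializes them via cache lookups.
    List<Long> search(Predicate<ThinRecord> criteria) {
        return index.stream().filter(criteria).map(ThinRecord::id).toList();
    }
}
```

Because the {{volatile}} list is replaced atomically, readers always scan a
consistent snapshot without locking.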
(x) {{Search == Cache}}
(/) {{Search + Cache}}
> Upgrade from ehcache v2
> -----------------------
>
> Key: FC-327
> URL: https://issues.apache.org/jira/browse/FC-327
> Project: FORTRESS
> Issue Type: Improvement
> Affects Versions: 3.0.0
> Reporter: Shawn McKinney
> Priority: Major
> Fix For: 4.0.0
>
>
> Fortress core uses ehcache v2. It is getting long in the tooth, has a
> number of CVEs, and needs to be replaced. Here we'll look at alternatives.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)