We are considering using an on-disk - this is planned for later. Even with
an on-disk cache, we still need an eviction policy to ensure that Gandiva
doesn't use up the entire disk.

For now, we are assuming that we can measure the cost accurately - the
assumption is that the query engine would use Gandiva on a thread that is
pinned to a core. For other engines, an alternate estimate of cost can be
the complexity of the expression.

On Tue, Apr 20, 2021 at 2:46 PM Antoine Pitrou <anto...@python.org> wrote:

>
> Hi Projjal,
>
> The main issue here is to compute the cost accurately (is it computation
> runtime? memory footprint? can you measure the computation time
> accurately, regardless of system noise - e.g. other threads and
> processes?).
>
> Intuitively, if the LRU cache shows too many misses, a simple measure is
> to increase its size ;-)
>
> Last question: have you considered a second level on-disk cache?  Numba
> uses such a cache with good results:
> https://numba.readthedocs.io/en/stable/developer/caching.html
>
> Regards
>
> Antoine.
>
>
> Le 20/04/2021 à 06:28, Projjal Chanda a écrit :
> > Hi,
> > We currently have a cache[1] in gandiva that caches the built projector
> or filter module with LRU based eviction policy. However since the cost of
> building different expressions is not uniform it makes sense to have a
> different eviction policy that takes into account an associated cost of a
> cache miss (while also discounting the items which have not been recently
> used). We are planning to use an algorithm called GreedyDual-Size Algorithm
> [2] which seems fit for the purpose. The algorithm is quite simple -
> > Each item has a cost (build time in our case) and item with lowest cost
> (c_min) is evicted. All other items cost are deducted by (c_min)
> > On cache hit, the item cost is restored to the original value
> >
> > This can be implemented using a priority queue and an efficient
> implementation of this can handle both cache hit and eviction in O(logk)
> time.
> >
> > Does anybody have any other suggestions or ideas on this?
> >
> > [1] https://github.com/apache/arrow/blob/master/cpp/src/gandiva/cache.h
> <https://github.com/apache/arrow/blob/master/cpp/src/gandiva/cache.h>
> > [2]
> https://www.usenix.org/legacy/publications/library/proceedings/usits97/full_papers/cao/cao_html/node8.html
> <
> https://www.usenix.org/legacy/publications/library/proceedings/usits97/full_papers/cao/cao_html/node8.html
> >
> >
> > Regards,
> > Projjal
> >
>

Reply via email to