We would love to use Gandiva in Apache Calcite [1] but we are blocked
because the JAR on Maven Central doesn't work on macOS, Linux or
Windows [2], and there seems to be no interest in fixing the problem.
So I doubt whether anyone is using Gandiva in production (unless they
have built the artifacts for themselves).

Once Gandiva is working for us, we will have an opinion about caching.

Julian

[1] https://issues.apache.org/jira/browse/CALCITE-2040

[2] https://issues.apache.org/jira/browse/ARROW-11135

On Tue, Apr 20, 2021 at 2:58 AM Vivekanand Vellanki <vi...@dremio.com> wrote:
>
> We are considering using an on-disk cache - this is planned for later. Even with
> an on-disk cache, we still need an eviction policy to ensure that Gandiva
> doesn't use up the entire disk.
>
> For now, we are assuming that we can measure the cost accurately - the
> assumption is that the query engine would use Gandiva on a thread that is
> pinned to a core. For other engines, an alternate estimate of cost can be
> the complexity of the expression.
>
> On Tue, Apr 20, 2021 at 2:46 PM Antoine Pitrou <anto...@python.org> wrote:
>
> >
> > Hi Projjal,
> >
> > The main issue here is to compute the cost accurately (is it computation
> > runtime? memory footprint? can you measure the computation time
> > accurately, regardless of system noise - e.g. other threads and
> > processes?).
> >
> > Intuitively, if the LRU cache shows too many misses, a simple measure is
> > to increase its size ;-)
> >
> > Last question: have you considered a second level on-disk cache?  Numba
> > uses such a cache with good results:
> > https://numba.readthedocs.io/en/stable/developer/caching.html
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On 20/04/2021 at 06:28, Projjal Chanda wrote:
> > > Hi,
> > > We currently have a cache [1] in Gandiva that caches the built projector
> > > or filter module with an LRU-based eviction policy. However, since the
> > > cost of building different expressions is not uniform, it makes sense to
> > > have a different eviction policy that takes into account the cost
> > > associated with a cache miss (while also discounting items that have not
> > > been used recently). We are planning to use the GreedyDual-Size algorithm
> > > [2], which seems fit for the purpose. The algorithm is quite simple:
> > > - Each item has a cost (build time in our case), and the item with the
> > >   lowest cost (c_min) is evicted. All other items' costs are reduced by
> > >   c_min.
> > > - On a cache hit, the item's cost is restored to its original value.
> > >
> > > This can be implemented using a priority queue, and an efficient
> > > implementation can handle both cache hits and evictions in O(log k) time.
> > >
> > > Does anybody have any other suggestions or ideas on this?
> > >
> > > [1] https://github.com/apache/arrow/blob/master/cpp/src/gandiva/cache.h
> > > [2] https://www.usenix.org/legacy/publications/library/proceedings/usits97/full_papers/cao/cao_html/node8.html
> > >
> > > Regards,
> > > Projjal
> > >
> >
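
For illustration, here is a minimal sketch of the GreedyDual-Size eviction
described above, in C++. The class, key/value types and method names are
hypothetical and are not Gandiva's actual cache API. Instead of literally
deducting c_min from every remaining entry on eviction, it uses the usual
inflation trick: keep a global clock L and store each entry's priority as
L + cost; setting L to the evicted entry's priority is equivalent, and an
ordered set gives O(log k) hits and evictions.

// Illustrative GreedyDual-Size cache sketch (hypothetical API, not Gandiva's).
#include <cstddef>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>

class GreedyDualSizeCache {
 public:
  explicit GreedyDualSizeCache(std::size_t capacity) : capacity_(capacity) {}

  // On a hit, restore the entry's priority to clock + original cost.
  bool Get(const std::string& key, std::string* value) {
    auto it = entries_.find(key);
    if (it == entries_.end()) return false;
    queue_.erase({it->second.priority, key});
    it->second.priority = clock_ + it->second.cost;
    queue_.insert({it->second.priority, key});
    *value = it->second.value;
    return true;
  }

  // Insert a value together with its measured build cost (e.g. build time).
  void Put(const std::string& key, std::string value, double cost) {
    if (entries_.count(key) > 0) return;  // no update path, to keep the sketch short
    while (!entries_.empty() && entries_.size() >= capacity_) Evict();
    const double priority = clock_ + cost;
    queue_.insert({priority, key});
    entries_.emplace(key, Entry{std::move(value), cost, priority});
  }

 private:
  struct Entry {
    std::string value;  // e.g. the built projector/filter module
    double cost;        // original cost, used to restore priority on a hit
    double priority;    // clock + cost at insertion or at the last hit
  };

  void Evict() {
    auto it = queue_.begin();  // entry with the lowest priority (clock + c_min)
    clock_ = it->first;        // advancing the clock == deducting c_min from the rest
    entries_.erase(it->second);
    queue_.erase(it);
  }

  std::size_t capacity_;
  double clock_ = 0.0;  // the global "L" value of GreedyDual-Size
  std::unordered_map<std::string, Entry> entries_;
  std::set<std::pair<double, std::string>> queue_;  // ordered by (priority, key)
};

In Gandiva's case the key would identify the expression being built and the
value would be the built projector or filter module, with the measured build
time as the cost.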
