Hi, We currently have a cache[1] in gandiva that caches the built projector or filter module with LRU based eviction policy. However since the cost of building different expressions is not uniform it makes sense to have a different eviction policy that takes into account an associated cost of a cache miss (while also discounting the items which have not been recently used). We are planning to use an algorithm called GreedyDual-Size Algorithm [2] which seems fit for the purpose. The algorithm is quite simple - Each item has a cost (build time in our case) and item with lowest cost (c_min) is evicted. All other items cost are deducted by (c_min) On cache hit, the item cost is restored to the original value
This can be implemented using a priority queue and an efficient implementation of this can handle both cache hit and eviction in O(logk) time. Does anybody have any other suggestions or ideas on this? [1] https://github.com/apache/arrow/blob/master/cpp/src/gandiva/cache.h <https://github.com/apache/arrow/blob/master/cpp/src/gandiva/cache.h> [2] https://www.usenix.org/legacy/publications/library/proceedings/usits97/full_papers/cao/cao_html/node8.html <https://www.usenix.org/legacy/publications/library/proceedings/usits97/full_papers/cao/cao_html/node8.html> Regards, Projjal