Joe McDonnell has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/22215 )

Change subject: IMPALA-13478: Sync tuple cache files to disk asynchronously
......................................................................

IMPALA-13478: Sync tuple cache files to disk asynchronously

When a tuple cache entry is first written, its contents need
to be synced to disk. Currently, the sync happens on the
fast path and delays query results, sometimes significantly.
This change moves the Sync() call off the fast path by handing
the work to a thread pool. Each thread in the pool opens
the file, syncs it to disk, then closes the file. If any of
these steps fails, the cache entry is evicted.
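
As an illustration of the open / sync / close sequence described
above, here is a minimal sketch using plain POSIX calls. It is not
the code in this patch: SyncFileToDisk, SyncOrEvict, and the evict
callback are hypothetical names chosen for the example, and the real
change presumably goes through Impala's own file and cache
abstractions.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <functional>
#include <string>

// Hypothetical helper (not from the patch): open the file, fsync it,
// and close it. Returns true only if every step succeeded.
bool SyncFileToDisk(const std::string& path) {
  // O_WRONLY without O_TRUNC leaves the existing contents untouched.
  int fd = open(path.c_str(), O_WRONLY);
  if (fd < 0) return false;
  bool ok = (fsync(fd) == 0);
  // A failed close() is also treated as a failed sync.
  if (close(fd) != 0) ok = false;
  return ok;
}

// Hypothetical wrapper: the unit of work a pool thread would run.
// If anything goes wrong, the caller-supplied evict callback is
// invoked with the path so the cache entry can be removed.
bool SyncOrEvict(const std::string& path,
                 const std::function<void(const std::string&)>& evict) {
  if (SyncFileToDisk(path)) return true;
  evict(path);
  return false;
}
```

Passing the eviction step in as a callback keeps the sketch
independent of any particular cache implementation.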

As basic backpressure, if the thread pool is overwhelmed
and its queue is full, the write of the cache entry is
abandoned.

This adds two counters at the daemon level to track the
number of syncs that fail (e.g. due to IO errors) and
the number of syncs that are dropped due to backpressure.
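
The backpressure and the two counters can be sketched with a small
bounded work queue. This is an assumption-laden illustration in
standard C++, not the patch's implementation: BoundedSyncPool,
TryOffer, num_failed, and num_dropped are hypothetical names, and
each task is modeled as a callable that returns false when its sync
fails.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded pool: offers never block, and a full queue
// causes the task to be dropped and counted.
class BoundedSyncPool {
 public:
  BoundedSyncPool(size_t num_threads, size_t max_queue)
    : max_queue_(max_queue) {
    for (size_t i = 0; i < num_threads; ++i) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  ~BoundedSyncPool() {
    {
      std::lock_guard<std::mutex> l(lock_);
      shutdown_ = true;
    }
    cv_.notify_all();
    for (std::thread& t : workers_) t.join();
  }

  // Returns false (and bumps the dropped counter) if the queue is full.
  bool TryOffer(std::function<bool()> sync_fn) {
    {
      std::lock_guard<std::mutex> l(lock_);
      if (queue_.size() >= max_queue_) {
        ++num_dropped_;
        return false;
      }
      queue_.push_back(std::move(sync_fn));
    }
    cv_.notify_one();
    return true;
  }

  long num_failed() const { return num_failed_.load(); }
  long num_dropped() const { return num_dropped_.load(); }

 private:
  void WorkerLoop() {
    for (;;) {
      std::function<bool()> fn;
      {
        std::unique_lock<std::mutex> l(lock_);
        cv_.wait(l, [this] { return shutdown_ || !queue_.empty(); });
        if (queue_.empty()) return;  // Shutting down and fully drained.
        fn = std::move(queue_.front());
        queue_.pop_front();
      }
      // Run outside the lock; a false result counts as a failed sync.
      if (!fn()) ++num_failed_;
    }
  }

  const size_t max_queue_;
  std::mutex lock_;
  std::condition_variable cv_;
  std::deque<std::function<bool()>> queue_;
  bool shutdown_ = false;
  std::atomic<long> num_failed_{0};
  std::atomic<long> num_dropped_{0};
  std::vector<std::thread> workers_;
};
```

In this sketch a caller whose TryOffer returns false would abandon
the cache entry write, which keeps the query's fast path from ever
waiting on the sync queue.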

Testing:
 - Added a unit test to tuple-cache-mgr-test
 - Testing with TPC-DS on a cluster with fast NVMe SSDs showed
   a significant improvement in the first-run times

Change-Id: I646bb56300656d8b8ac613cb8fe2f85180b386d3
---
M be/src/exec/tuple-file-writer.cc
M be/src/runtime/tuple-cache-mgr-test.cc
M be/src/runtime/tuple-cache-mgr.cc
M be/src/runtime/tuple-cache-mgr.h
4 files changed, 145 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/22215/2
--
To view, visit http://gerrit.cloudera.org:8080/22215
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I646bb56300656d8b8ac613cb8fe2f85180b386d3
Gerrit-Change-Number: 22215
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com>
