[ 
https://issues.apache.org/jira/browse/KUDU-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213532#comment-17213532
 ] 

ASF subversion and git services commented on KUDU-3195:
-------------------------------------------------------

Commit 640a84ecff857c3d0447c690c68e2361eb3e9c3b in kudu's branch 
refs/heads/master from Andrew Wong
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=640a84e ]

KUDU-3195: flush when any DMS in the tablet is older than the time threshold

Currently each tablet will wait at least 2 minutes (controlled by
--flush_threshold_secs) between flushing DMSs, even if there are several
DMSs that are older than 2 minutes in a given tablet. This means that
for tablets with several dozen rowsets and updates across the entire
tablet, it could take hours to flush all the deltas.
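
To make the arithmetic concrete, here is a minimal C++ simulation of the pre-patch rule for a single tablet. The function name and the assumption that flushes complete instantly are illustrative only; this is not Kudu code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simulation of the pre-patch rule for one tablet whose DMSs are
// all already older than the threshold. At most one flush is allowed per
// flush_threshold_secs; flushes are treated as instantaneous (assumption).
// Returns the time, in seconds, at which the last stale DMS is flushed.
int64_t SimulateOldPolicyDrain(int num_stale_dms, int64_t flush_threshold_secs) {
  assert(num_stale_dms >= 0 && flush_threshold_secs > 0);
  int64_t last_flush_s = -flush_threshold_secs;  // assume no recent flush
  int64_t finish_s = 0;
  int remaining = num_stale_dms;
  for (int64_t now_s = 0; remaining > 0; ++now_s) {  // one-second ticks
    if (now_s - last_flush_s >= flush_threshold_secs) {
      --remaining;  // flush one DMS
      last_flush_s = now_s;
      finish_s = now_s;
    }
  }
  return finish_s;
}
```

With 60 stale DMSs and the 2-minute threshold, SimulateOldPolicyDrain(60, 120) gives 7080 seconds, so the last DMS is flushed almost two hours after the first, even if the maintenance threads have nothing else to do.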

Rather than waiting for 2 minutes since the last flush time before
considering time-based flushing, this patch tracks the creation time of
every DMS and flushes as long as there is a DMS that is older than 2
minutes in the tablet.
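
A simplified sketch of how the flush predicate changes follows. The struct and function names are hypothetical and only model the rule described above; they are not Kudu's actual maintenance-op code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one tablet; not Kudu's actual classes.
struct TabletFlushState {
  int64_t last_flush_time_s;                  // when the tablet last flushed a DMS
  std::vector<int64_t> dms_creation_times_s;  // creation time of each unflushed DMS
};

// Pre-patch rule: time-based flushing is considered only once
// flush_threshold_secs have elapsed since the tablet's last flush.
bool ShouldFlushOld(const TabletFlushState& t, int64_t now_s,
                    int64_t flush_threshold_secs) {
  assert(flush_threshold_secs > 0);
  return !t.dms_creation_times_s.empty() &&
         now_s - t.last_flush_time_s >= flush_threshold_secs;
}

// Post-patch rule: flush as long as any DMS in the tablet is older than
// flush_threshold_secs, regardless of when the last flush happened.
bool ShouldFlushNew(const TabletFlushState& t, int64_t now_s,
                    int64_t flush_threshold_secs) {
  assert(flush_threshold_secs > 0);
  return std::any_of(t.dms_creation_times_s.begin(),
                     t.dms_creation_times_s.end(),
                     [&](int64_t created_s) {
                       return now_s - created_s >= flush_threshold_secs;
                     });
}
```

For example, if a tablet flushed a DMS at t=200 s but still holds DMSs created at t=0 s and t=10 s, then at t=250 s the old check returns false (only 50 seconds since the last flush) while the new check returns true (the oldest DMS is 250 seconds old).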

Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
Reviewed-on: http://gerrit.cloudera.org:8080/16581
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <aser...@cloudera.com>


> Make DMS flush policy more robust when maintenance threads are idle
> -------------------------------------------------------------------
>
>                 Key: KUDU-3195
>                 URL: https://issues.apache.org/jira/browse/KUDU-3195
>             Project: Kudu
>          Issue Type: Improvement
>          Components: tserver
>    Affects Versions: 1.13.0
>            Reporter: Alexey Serbin
>            Priority: Major
>
> In one scenario I observed very long bootstrap times for tablet servers 
> (somewhere between 45 and 60 minutes) even though the tablet servers had 
> a relatively small amount of data under management (~80 GB).  It turned out 
> the time was spent replaying WAL segments, with {{kudu cluster ksck}} 
> reporting something like the following throughout the bootstrap:
> {noformat}
>   b0a20b117a1242ae9fc15620a6f7a524 (tserver-6.local.site:7050): not running
>     State:       BOOTSTRAPPING
>     Data state:  TABLET_DATA_READY
>     Last status: Bootstrap replaying log segment 21/37 (2.28M/7.85M this segment, stats: ops{read=27374 overwritten=0 applied=25016 ignored=657} inserts{seen=5949247 ignored=0} mutations{seen=0 ignored=0} orphaned_commits=7)
> {noformat}
> The workload I ran before shutting down the tablet servers consisted of many 
> small UPSERT operations, but the cluster had been idle for a long time (a few 
> hours or so) after the workload was terminated.  The workload was generated by
> {noformat}
> kudu perf loadgen \
>   --table_name=$TABLE_NAME \
>   --num_rows_per_thread=800000000 \
>   --num_threads=4 \
>   --use_upsert \
>   --use_random_pk \
>   $MASTER_ADDR
> {noformat}
> The table that the UPSERT workload was running against had been pre-populated 
> by the following:
> {noformat}
> kudu perf loadgen \
>   --table_num_replicas=3 \
>   --keep-auto-table \
>   --table_num_hash_partitions=5 \
>   --table_num_range_partitions=5 \
>   --num_rows_per_thread=800000000 \
>   --num_threads=4 \
>   $MASTER_ADDR
> {noformat}
> As it turned out, the tablet servers had accumulated a huge number of DMSs 
> that required flushing/compaction, but after the memory pressure subsided, 
> the compaction policy scheduled just one operation per tablet every 120 
> seconds (this interval is controlled by {{\-\-flush_threshold_secs}}).  
> In fact, the tablet servers could have flushed those rowsets non-stop, since 
> the maintenance threads were otherwise completely idle and there was no 
> active workload running against the cluster.  Those DMSs had been around for 
> a long time (much longer than 120 seconds) and were anchoring a lot of WAL 
> segments.  So, the operations in those WAL segments had to be replayed once 
> I restarted the tablet servers.
> It would be great to update the flushing/compaction policy to allow tablet 
> servers to run {{FlushDeltaMemStoresOp}} as soon as a DMS becomes older than 
> the threshold specified by {{\-\-flush_threshold_secs}} when the maintenance 
> threads are not otherwise busy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
