[ 
https://issues.apache.org/jira/browse/KUDU-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491207#comment-16491207
 ] 

Andrew Wong commented on KUDU-1861:
-----------------------------------

I've been playing around with the `loadgen` tool recently and its non-random 
workload was kind of surprising to me. The spirit of this Jira seems to be to 
update the loadgen tool to "[exercise] a load pattern which maximizes 
throughput", so I thought I'd update it with some context and some potential 
improvements.

Currently the loadgen supports a couple of insert options: an entirely random 
workload, where values for keys and columns alike are generated at random; and 
a sequential-ish workload, where each thread is assigned its own partition of 
values and inserts rows sequentially within that partition.

In this second workload, these per-thread partitions are orthogonal to the hash 
partitioning used to split up tablets. So while this workload always inserts 
unique keys, it will insert over the entirety of the tablets' keyspaces, 
bounded by `num_rows_per_thread`. This means more bloom filter lookups, which 
slow down the rate at which we can apply ops, and more compactions.
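
As a rough illustration of the effect (a toy simulation, not loadgen code), the 
sketch below gives each thread a contiguous key range and routes every key to a 
tablet by hash. Everything here is an assumption made for illustration: SHA-256 
stands in for Kudu's actual hash function, round-robin interleaving stands in 
for concurrent threads, and all function names are hypothetical.

```python
import hashlib


def hash_bucket(key: int, num_tablets: int) -> int:
    """Map a key to a tablet. SHA-256 is a stand-in, not Kudu's hash."""
    digest = hashlib.sha256(key.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big") % num_tablets


def simulate_arrivals(num_threads: int, num_rows_per_thread: int,
                      num_tablets: int) -> dict:
    """Return, per tablet, the keys in the order they would arrive.

    Thread t owns keys [t * num_rows_per_thread, (t + 1) * num_rows_per_thread)
    and inserts them in order. Threads are interleaved round-robin to mimic
    concurrent progress.
    """
    arrivals = {tablet: [] for tablet in range(num_tablets)}
    for i in range(num_rows_per_thread):
        for thread_idx in range(num_threads):
            key = thread_idx * num_rows_per_thread + i
            arrivals[hash_bucket(key, num_tablets)].append(key)
    return arrivals


def count_out_of_order(keys) -> int:
    """Count keys that arrive after a larger key has already been seen."""
    highest = float("-inf")
    out_of_order = 0
    for key in keys:
        if key < highest:
            out_of_order += 1
        else:
            highest = key
    return out_of_order
```

Calling `simulate_arrivals(4, 1000, 4)` and passing each tablet's list to 
`count_out_of_order` shows that every tablet receives keys from all of the 
thread ranges, so from a tablet's point of view the inserts are not sequential 
even though each thread's are.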

To avoid this, as this Jira points out, we want a workload in which rows are 
inserted into each tablet completely sequentially. The Jira also points out 
that one solution would be to have the partitioning based on 
`num_rows_per_thread` match that of the tablet partitioning, but this seems 
like a rather intrusive change in semantics. An alternate approach would be to 
share a counter across threads and have the load generator increment it in a 
new "actually sequential" mode.
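
A minimal sketch of the shared-counter idea, written in Python for brevity 
rather than as a patch to the tool: all generator threads draw their next key 
from a single lock-protected counter instead of from per-thread ranges. The 
class and function names are hypothetical, and a real implementation in loadgen 
would presumably use an atomic integer instead of a lock.

```python
import threading


class SharedKeyCounter:
    """Hands out globally increasing keys to any number of threads."""

    def __init__(self, start: int = 0):
        self._next = start
        self._lock = threading.Lock()

    def next_key(self) -> int:
        # The lock makes read-and-increment a single atomic step.
        with self._lock:
            key = self._next
            self._next += 1
            return key


def run_generators(num_threads: int, rows_per_thread: int) -> list:
    """Run the generator threads and return the keys each one drew."""
    counter = SharedKeyCounter()
    per_thread = [[] for _ in range(num_threads)]

    def worker(idx: int) -> None:
        for _ in range(rows_per_thread):
            # A real generator would build and send a row here.
            per_thread[idx].append(counter.next_key())

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return per_thread
```

With this scheme the keys are unique and cover one contiguous range regardless 
of the thread count, and each thread's own keys are increasing. The order in 
which rows actually reach a tablet may still vary somewhat, since threads would 
batch and flush independently.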

> kudu test loadgen: change default behavior to avoid compactions on tablet 
> servers 
> ----------------------------------------------------------------------------------
>
>                 Key: KUDU-1861
>                 URL: https://issues.apache.org/jira/browse/KUDU-1861
>             Project: Kudu
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 1.2.0
>            Reporter: Alexey Serbin
>            Assignee: Andrew Wong
>            Priority: Major
>
> In the context of the use case to '...generate as many Kudu blocks as 
> possible...', the 'kudu test loadgen' tool can do a better job by exercising a 
> load pattern which maximizes throughput and avoids compaction activity on 
> tablet servers.
> In short, the default behavior should change for the auto-created table case, 
> so the tool would:
> # create a table with N partitions (where N == number of generator threads)
> # let each worker thread insert sequentially into its own partition
> The current option of having a hash-partitioned auto-created table should be 
> preserved, but turned off by default.  For some test scenarios, it makes 
> sense to exercise data load patterns which involve a lot of compaction 
> activity on the tablet servers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
