xinyiZzz commented on a change in pull request #8476:
URL: https://github.com/apache/incubator-doris/pull/8476#discussion_r829013533
##########
File path: be/src/exec/tablet_info.cpp
##########
@@ -416,7 +416,7 @@
VOlapTablePartitionParam::VOlapTablePartitionParam(std::shared_ptr<OlapTableSche
: _schema(schema),
_t_param(t_param),
_slots(_schema->tuple_desc()->slots()),
- _mem_tracker(MemTracker::create_tracker(-1,
"OlapTablePartitionParam")) {
+ _mem_tracker(MemTracker::create_virtual_tracker(-1,
"OlapTablePartitionParam")) {
Review comment:
Because this tracker needs to manually consume/release
The difference between virutal tracker and non-virutal tracker:
- non-virutal tracker
In order to ensure the absolute accuracy of non-virutal mem tracker tree
statistics, there are only two ways to count: one is to modify the tls mem
tracker through attach or switch, and count in the tcmalloc new/delete hook;
the other is to transfer memory ownership between non-virutal trackers.
-virutal tracker
Manual consume/release as before, the reasons for designing the virutal
tracker: First, to transfer memory ownership between two trackers, it will
release first and then consume, which is slower than calling consume/release
directly on the virutal tracker; second, through parameters After blocking the
virutal tracker, it will prevent the mem tracker tree from becoming more messy,
and it is safer to add or delete the virutal tracker.
The non-virutal tracker is similar to the INFO log level, and the virutal
tracker is similar to the DEBUG log level.
##########
File path: be/src/exec/tablet_info.cpp
##########
@@ -416,7 +416,7 @@
VOlapTablePartitionParam::VOlapTablePartitionParam(std::shared_ptr<OlapTableSche
: _schema(schema),
_t_param(t_param),
_slots(_schema->tuple_desc()->slots()),
- _mem_tracker(MemTracker::create_tracker(-1,
"OlapTablePartitionParam")) {
+ _mem_tracker(MemTracker::create_virtual_tracker(-1,
"OlapTablePartitionParam")) {
Review comment:
Because this tracker needs to manually consume/release
The difference between virutal tracker and non-virutal tracker:
- non-virutal tracker
In order to ensure the absolute accuracy of non-virutal mem tracker tree
statistics, there are only two ways to count: one is to modify the tls mem
tracker through attach or switch, and count in the tcmalloc new/delete hook;
the other is to transfer memory ownership between non-virutal trackers.
- virutal tracker
Manual consume/release as before, the reasons for designing the virutal
tracker: First, to transfer memory ownership between two trackers, it will
release first and then consume, which is slower than calling consume/release
directly on the virutal tracker; second, through parameters After blocking the
virutal tracker, it will prevent the mem tracker tree from becoming more messy,
and it is safer to add or delete the virutal tracker.
The non-virutal tracker is similar to the INFO log level, and the virutal
tracker is similar to the DEBUG log level.
##########
File path: be/src/exec/tablet_sink.cpp
##########
@@ -48,6 +49,7 @@ NodeChannel::NodeChannel(OlapTableSink* parent, IndexChannel*
index_channel, int
if (_parent->_transfer_data_by_brpc_attachment) {
_tuple_data_buffer_ptr = &_tuple_data_buffer;
}
+ _node_channel_tracker = MemTracker::create_tracker(-1, "NodeChannel");
Review comment:
The BE id of a BE mem tracker is the same = _ =, so I added the thread
id.
##########
File path: be/src/exprs/agg_fn_evaluator.cpp
##########
@@ -154,7 +154,7 @@ Status AggFnEvaluator::prepare(RuntimeState* state, const
RowDescriptor& desc, M
_intermediate_slot_desc = intermediate_slot_desc;
_string_buffer_len = 0;
- _mem_tracker = mem_tracker;
+ _mem_tracker = MemTracker::create_virtual_tracker(-1, "AggFnEvaluator",
mem_tracker);
Review comment:
The `memtracker` param is used in Expr::prepare on the next line, which
I modified.
In addition, it is also the parent of the virtual tracker `_mem_tracker`.
##########
File path: be/src/olap/memtable.cpp
##########
@@ -31,14 +31,13 @@ namespace doris {
MemTable::MemTable(int64_t tablet_id, Schema* schema, const TabletSchema*
tablet_schema,
const std::vector<SlotDescriptor*>* slot_descs,
TupleDescriptor* tuple_desc,
- KeysType keys_type, RowsetWriter* rowset_writer,
- const std::shared_ptr<MemTracker>& parent_tracker)
+ KeysType keys_type, RowsetWriter* rowset_writer)
: _tablet_id(tablet_id),
_schema(schema),
_tablet_schema(tablet_schema),
_slot_descs(slot_descs),
_keys_type(keys_type),
- _mem_tracker(MemTracker::create_tracker(-1, "MemTable",
parent_tracker)),
+ _mem_tracker(MemTracker::create_tracker(-1, "MemTable")),
Review comment:
The difference between virtual and non-virtual trackers is explained
above
##########
File path: be/src/olap/rowset/segment_v2/segment.cpp
##########
@@ -51,10 +51,10 @@ Segment::Segment(const FilePathDesc& path_desc, uint32_t
segment_id,
const TabletSchema* tablet_schema)
: _path_desc(path_desc), _segment_id(segment_id),
_tablet_schema(tablet_schema) {
#ifndef BE_TEST
- _mem_tracker = MemTracker::create_tracker(
+ _mem_tracker = MemTracker::create_virtual_tracker(
Review comment:
The difference between virtual and non-virtual trackers is explained
above
##########
File path: be/src/olap/snapshot_manager.h
##########
@@ -99,6 +102,9 @@ class SnapshotManager {
// snapshot
Mutex _snapshot_mutex;
uint64_t _snapshot_base_id;
+
+ // TODO(zxy) used after
+ std::shared_ptr<MemTracker> _mem_tracker = nullptr;
Review comment:
In the next pr, this tracker will be switched to the tls mem tracker in
the public func of `SnapshotManager`.
In this pr, I created all the trackers that will be used in the future, and
built a complete mem tracker tree. (Perhaps it would be better to do this with
a pr alone... = =||| )
##########
File path: be/src/olap/tablet_manager.cpp
##########
@@ -73,7 +73,7 @@ static bool _cmp_tablet_by_create_time(const TabletSharedPtr&
a, const TabletSha
}
TabletManager::TabletManager(int32_t tablet_map_lock_shard_size)
- : _mem_tracker(MemTracker::create_tracker(-1, "TabletManager", nullptr,
+ : _mem_tracker(MemTracker::create_virtual_tracker(-1, "TabletManager",
nullptr,
Review comment:
Because of the cache involved, the memory statistics of `TabletManager`
were skipped before,
This has been implemented in the updated code.
##########
File path: be/src/olap/task/engine_alter_tablet_task.cpp
##########
@@ -25,7 +25,14 @@ namespace doris {
using std::to_string;
EngineAlterTabletTask::EngineAlterTabletTask(const TAlterTabletReqV2& request)
- : _alter_tablet_req(request) {}
+ : _alter_tablet_req(request) {
+ _mem_tracker = MemTracker::create_tracker(
Review comment:
Same as above, // TODO(zxy) used after
##########
File path: be/src/olap/snapshot_manager.h
##########
@@ -99,6 +102,9 @@ class SnapshotManager {
// snapshot
Mutex _snapshot_mutex;
uint64_t _snapshot_base_id;
+
+ // TODO(zxy) used after
+ std::shared_ptr<MemTracker> _mem_tracker = nullptr;
Review comment:
In the next pr, this tracker will be switched to the tls mem tracker in
the public func of `SnapshotManager`.
In this pr, I created all the trackers that will be used in the future, and
built a complete mem tracker tree. (Perhaps it would be better to do this with
a pr alone... Do you think it needs to be deleted first? = =||| )
##########
File path: be/src/runtime/dpp_sink.cpp
##########
@@ -0,0 +1,926 @@
+// Licensed to the Apache Software Foundation (ASF) under one
Review comment:
no, need to be deleted
This file was deleted on the master, I didn't pay attention when I rebase
##########
File path: be/src/runtime/row_batch.cpp
##########
@@ -39,8 +40,8 @@ namespace doris {
const int RowBatch::AT_CAPACITY_MEM_USAGE = 8 * 1024 * 1024;
const int RowBatch::FIXED_LEN_BUFFER_LIMIT = AT_CAPACITY_MEM_USAGE / 2;
-RowBatch::RowBatch(const RowDescriptor& row_desc, int capacity, MemTracker*
mem_tracker)
- : _mem_tracker(mem_tracker),
+RowBatch::RowBatch(const RowDescriptor& row_desc, int capacity)
+ :
_mem_tracker(thread_local_ctx.get()->_thread_mem_tracker_mgr->mem_tracker()),
Review comment:
I understand that modifying the mem tracker of `rowbatch` is not
directly related to the thread used.
My understanding of this question is: whether there is a `rowbatch`, consume
tls A mem tracker when new, release tls B mem tracker when delete, and both
trackers are not equal to 0 when they are finally destructed.
The actual code now avoids the above problem. Through
`RowBatch::acquire_state` and `RowBatch::transfer_resource_ownership`, complete
the mem_tracker update and memory ownership transfer of buffers in two
`row_batch`, avoiding the new and delete of buffers in a `rowbatch` on
different trackers.
For example: In `OlapScanNode::get_next`, the `rowbatch` created by Scanner
will be transferred to the external parameter `row_batch` through
`RowBatch::acquire_state`, and the mem tracker of the buffer will be modified.
Ownership is transferred in two row_batches via `update_mem_tracker`.
But I'm not sure if the buffer mem_tracker is updated in all similar places,
which requires manual maintenance to ensure that the new and delete of a buffer
are in the same tracker. **I will test further. **
Similarly, `mem pool` also has a similar situation, `mem pool` also provides
`MemPool::acquire_data` and `MemPool::exchange_data` to complete the transfer
of chunks. However, I used to add the tls mem tracker when allocate in each
`chunk`, and found that the `chunk` tracker is different from the tls mem
tracker when the `mem pool` is destructed.
##########
File path: be/src/runtime/row_batch.cpp
##########
@@ -39,8 +40,8 @@ namespace doris {
const int RowBatch::AT_CAPACITY_MEM_USAGE = 8 * 1024 * 1024;
const int RowBatch::FIXED_LEN_BUFFER_LIMIT = AT_CAPACITY_MEM_USAGE / 2;
-RowBatch::RowBatch(const RowDescriptor& row_desc, int capacity, MemTracker*
mem_tracker)
- : _mem_tracker(mem_tracker),
+RowBatch::RowBatch(const RowDescriptor& row_desc, int capacity)
+ :
_mem_tracker(thread_local_ctx.get()->_thread_mem_tracker_mgr->mem_tracker()),
Review comment:
I understand that modifying the mem tracker of `rowbatch` is not
directly related to the thread used.
My understanding of this question is: whether there is a `rowbatch`, consume
tls A mem tracker when new, release tls B mem tracker when delete, and both
trackers are not equal to 0 when they are finally destructed.
The actual code now avoids the above problem. Through
`RowBatch::acquire_state` and `RowBatch::transfer_resource_ownership`, complete
the mem_tracker update and memory ownership transfer of buffers in two
`row_batch`, avoiding the new and delete of buffers in a `rowbatch` on
different trackers.
For example: In `OlapScanNode::get_next`, the `rowbatch` created by Scanner
will be transferred to the external parameter `row_batch` through
`RowBatch::acquire_state`, and the mem tracker of the buffer will be modified.
Ownership is transferred in two row_batches via `update_mem_tracker`.
But I'm not sure if the buffer mem_tracker is updated in all similar places,
which requires manual maintenance to ensure that the new and delete of a buffer
are in the same tracker. ** I will test in further. **
Similarly, `mem pool` also has a similar situation, `mem pool` also provides
`MemPool::acquire_data` and `MemPool::exchange_data` to complete the transfer
of chunks. However, I used to add the tls mem tracker when allocate in each
`chunk`, and found that the `chunk` tracker is different from the tls mem
tracker when the `mem pool` is destructed.
##########
File path: be/src/runtime/row_batch.cpp
##########
@@ -39,8 +40,8 @@ namespace doris {
const int RowBatch::AT_CAPACITY_MEM_USAGE = 8 * 1024 * 1024;
const int RowBatch::FIXED_LEN_BUFFER_LIMIT = AT_CAPACITY_MEM_USAGE / 2;
-RowBatch::RowBatch(const RowDescriptor& row_desc, int capacity, MemTracker*
mem_tracker)
- : _mem_tracker(mem_tracker),
+RowBatch::RowBatch(const RowDescriptor& row_desc, int capacity)
+ :
_mem_tracker(thread_local_ctx.get()->_thread_mem_tracker_mgr->mem_tracker()),
Review comment:
I understand that modifying the mem tracker of `rowbatch` is not
directly related to the thread used.
My understanding of this question is: whether there is a `rowbatch`, consume
tls A mem tracker when new, release tls B mem tracker when delete, and both
trackers are not equal to 0 when they are finally destructed.
The actual code now avoids the above problem. Through
`RowBatch::acquire_state` and `RowBatch::transfer_resource_ownership`, complete
the mem_tracker update and memory ownership transfer of buffers in two
`row_batch`, avoiding the new and delete of buffers in a `rowbatch` on
different trackers.
For example: In `OlapScanNode::get_next`, the `rowbatch` created by Scanner
will be transferred to the external parameter `row_batch` through
`RowBatch::acquire_state`, and the mem tracker of the buffer will be modified.
Ownership is transferred in two row_batches via `update_mem_tracker`.
But I'm not sure if the buffer mem_tracker is updated in all similar places,
which requires manual maintenance to ensure that the new and delete of a buffer
are in the same tracker.
- I will test in further.
Similarly, `mem pool` also has a similar situation, `mem pool` also provides
`MemPool::acquire_data` and `MemPool::exchange_data` to complete the transfer
of chunks. However, I used to add the tls mem tracker when allocate in each
`chunk`, and found that the `chunk` tracker is different from the tls mem
tracker when the `mem pool` is destructed.
##########
File path: be/src/runtime/thread_context.h
##########
@@ -50,27 +56,65 @@ class ThreadContext {
};
public:
- ThreadContext() : _thread_id(std::this_thread::get_id()),
_type(TaskType::UNKNOWN) {}
+ ThreadContext() : _thread_id(std::this_thread::get_id()),
_type(TaskType::UNKNOWN) {
+ _thread_mem_tracker_mgr.reset(new ThreadMemTrackerMgr());
+ }
void attach(const TaskType& type, const std::string& task_id,
- const TUniqueId& fragment_instance_id) {
+ const TUniqueId& fragment_instance_id,
+ const std::shared_ptr<MemTracker>& mem_tracker) {
DCHECK(_type == TaskType::UNKNOWN && _task_id == "");
_type = type;
_task_id = task_id;
_fragment_instance_id = fragment_instance_id;
+ _thread_mem_tracker_mgr->attach_task(task_type_string(_type), task_id,
fragment_instance_id,
+ mem_tracker);
}
void detach() {
_type = TaskType::UNKNOWN;
_task_id = "";
_fragment_instance_id = TUniqueId();
+ _thread_mem_tracker_mgr->detach_task();
}
- const std::string type() const;
const std::string& task_id() const { return _task_id; }
const std::thread::id& thread_id() const { return _thread_id; }
const TUniqueId& fragment_instance_id() const { return
_fragment_instance_id; }
+ inline static const std::string task_type_string(ThreadContext::TaskType
type) {
+ switch (type) {
+ case ThreadContext::TaskType::QUERY:
+ return "QUERY";
+ case ThreadContext::TaskType::LOAD:
+ return "LOAD";
+ case ThreadContext::TaskType::COMPACTION:
+ return "COMPACTION";
+ default:
+ return "UNKNOWN";
+ }
Review comment:
I see, it seems that compilers does compile switch constructs into array
of code pointers, as said in the link above.
I changed to array as you said, which seems more concise.
Also, I did a simple test in the benchmark and it seems that switch
constructs are faster, I didn't analyze the assembly code carefully,
performance is not the bottleneck in this case.


##########
File path: be/src/runtime/thread_context.h
##########
@@ -50,27 +56,65 @@ class ThreadContext {
};
public:
- ThreadContext() : _thread_id(std::this_thread::get_id()),
_type(TaskType::UNKNOWN) {}
+ ThreadContext() : _thread_id(std::this_thread::get_id()),
_type(TaskType::UNKNOWN) {
+ _thread_mem_tracker_mgr.reset(new ThreadMemTrackerMgr());
+ }
void attach(const TaskType& type, const std::string& task_id,
- const TUniqueId& fragment_instance_id) {
+ const TUniqueId& fragment_instance_id,
+ const std::shared_ptr<MemTracker>& mem_tracker) {
DCHECK(_type == TaskType::UNKNOWN && _task_id == "");
_type = type;
_task_id = task_id;
_fragment_instance_id = fragment_instance_id;
+ _thread_mem_tracker_mgr->attach_task(task_type_string(_type), task_id,
fragment_instance_id,
+ mem_tracker);
}
void detach() {
_type = TaskType::UNKNOWN;
_task_id = "";
_fragment_instance_id = TUniqueId();
+ _thread_mem_tracker_mgr->detach_task();
}
- const std::string type() const;
const std::string& task_id() const { return _task_id; }
const std::thread::id& thread_id() const { return _thread_id; }
const TUniqueId& fragment_instance_id() const { return
_fragment_instance_id; }
+ inline static const std::string task_type_string(ThreadContext::TaskType
type) {
+ switch (type) {
+ case ThreadContext::TaskType::QUERY:
+ return "QUERY";
+ case ThreadContext::TaskType::LOAD:
+ return "LOAD";
+ case ThreadContext::TaskType::COMPACTION:
+ return "COMPACTION";
+ default:
+ return "UNKNOWN";
+ }
Review comment:
I understand, it seems that compilers does compile switch constructs
into array of code pointers, as said in the link above.
I changed to array as you said, which seems more concise.
Also, I did a simple test in the benchmark and it seems that switch
constructs are faster, I didn't analyze the assembly code carefully,
performance is not the bottleneck in this case.


##########
File path: be/src/runtime/thread_context.h
##########
@@ -50,27 +56,65 @@ class ThreadContext {
};
public:
- ThreadContext() : _thread_id(std::this_thread::get_id()),
_type(TaskType::UNKNOWN) {}
+ ThreadContext() : _thread_id(std::this_thread::get_id()),
_type(TaskType::UNKNOWN) {
+ _thread_mem_tracker_mgr.reset(new ThreadMemTrackerMgr());
+ }
void attach(const TaskType& type, const std::string& task_id,
- const TUniqueId& fragment_instance_id) {
+ const TUniqueId& fragment_instance_id,
+ const std::shared_ptr<MemTracker>& mem_tracker) {
DCHECK(_type == TaskType::UNKNOWN && _task_id == "");
_type = type;
_task_id = task_id;
_fragment_instance_id = fragment_instance_id;
+ _thread_mem_tracker_mgr->attach_task(task_type_string(_type), task_id,
fragment_instance_id,
+ mem_tracker);
}
void detach() {
_type = TaskType::UNKNOWN;
_task_id = "";
_fragment_instance_id = TUniqueId();
+ _thread_mem_tracker_mgr->detach_task();
}
- const std::string type() const;
const std::string& task_id() const { return _task_id; }
const std::thread::id& thread_id() const { return _thread_id; }
const TUniqueId& fragment_instance_id() const { return
_fragment_instance_id; }
+ inline static const std::string task_type_string(ThreadContext::TaskType
type) {
+ switch (type) {
+ case ThreadContext::TaskType::QUERY:
+ return "QUERY";
+ case ThreadContext::TaskType::LOAD:
+ return "LOAD";
+ case ThreadContext::TaskType::COMPACTION:
+ return "COMPACTION";
+ default:
+ return "UNKNOWN";
+ }
Review comment:
I understand, it seems that compilers does compile switch constructs
into array of code pointers, as said in the link above.
I changed to array as you said, which seems more concise. thx~ @dataroaring
Also, I did a simple test in the benchmark and it seems that switch
constructs are faster, I didn't analyze the assembly code carefully,
performance is not the bottleneck in this case.


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]