This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-inlong-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 7288b8b  Automated deployment: Wed Jul 28 09:02:12 UTC 2021 
1a4297276255b99edf003015fc90a2598c2e7361
7288b8b is described below

commit 7288b8b026f4062de5cfc3912815a4a32ac28c2a
Author: dockerzhang <dockerzh...@users.noreply.github.com>
AuthorDate: Wed Jul 28 09:02:12 2021 +0000

    Automated deployment: Wed Jul 28 09:02:12 UTC 2021 
1a4297276255b99edf003015fc90a2598c2e7361
---
 docs/en-us/modules/dataproxy/architecture.md      | 143 ++++++++++++++++++++-
 docs/en-us/modules/dataproxy/img/architecture.png | Bin 0 -> 431999 bytes
 docs/zh-cn/modules/dataproxy/architecture.md      | 146 +++++++++++++++++++++-
 docs/zh-cn/modules/dataproxy/img/architecture.png | Bin 0 -> 431999 bytes
 en-us/docs/modules/dataproxy/architecture.html    | 134 +++++++++++++++++++-
 en-us/docs/modules/dataproxy/architecture.json    |   2 +-
 en-us/docs/modules/dataproxy/architecture.md      | 143 ++++++++++++++++++++-
 en-us/docs/modules/dataproxy/img/architecture.png | Bin 0 -> 431999 bytes
 zh-cn/docs/modules/dataproxy/architecture.html    | 134 +++++++++++++++++++-
 zh-cn/docs/modules/dataproxy/architecture.json    |   2 +-
 zh-cn/docs/modules/dataproxy/architecture.md      | 146 +++++++++++++++++++++-
 zh-cn/docs/modules/dataproxy/img/architecture.png | Bin 0 -> 431999 bytes
 12 files changed, 829 insertions(+), 21 deletions(-)

diff --git a/docs/en-us/modules/dataproxy/architecture.md 
b/docs/en-us/modules/dataproxy/architecture.md
index 2fe2abf..8e6d2ef 100644
--- a/docs/en-us/modules/dataproxy/architecture.md
+++ b/docs/en-us/modules/dataproxy/architecture.md
@@ -1,11 +1,150 @@
 # 1、intro
 
-    Inlong-bus belongs to the inlong proxy layer and is used for data 
collection, reception and forwarding. Through format conversion, the data is 
converted into TDMsg1 format that can be cached and processed by the cache layer
-    The overall architecture of inlong-bus is based on Apache Flume. On the 
basis of this project, inlong-bus expands the source layer and sink layer, and 
optimizes disaster tolerance forwarding, which improves the stability of the 
system.
+    Inlong-dataProxy belongs to the inlong proxy layer and is used for data 
collection, reception and forwarding. Through format conversion, the data is 
converted into TDMsg1 format that can be cached and processed by the cache layer
+    InLong-dataProxy acts as a bridge from the InLong collection end to the 
InLong buffer end. Dataproxy pulls the relationship between the business id and 
the corresponding topic name from the manager module, and internally manages 
the producers of multiple topics
+    The overall architecture of inlong-dataproxy is based on Apache Flume. On 
the basis of this project, inlong-bus expands the source layer and sink layer, 
and optimizes disaster tolerance forwarding, which improves the stability of 
the system.
 
 
 # 2、architecture
 
+![](img/architecture.png)
+
        1. The source layer opens port monitoring, which is realized through 
netty server. The decoded data is sent to the channel layer
        2. The channel layer has a selector, which is used to choose which type 
of channel to go. If the memory is eventually full, the data will be processed.
        3. The data of the channel layer will be forwarded through the sink 
layer. The main purpose here is to convert the data to the TDMsg1 format and 
push it to the cache layer (tube is more commonly used here)
+
+# 3、DataProxy support configuration instructions
+
+DataProxy supports configurable source-channel-sink, and the configuration 
method is the same as the configuration file structure of flume:
+
+Source configuration example and corresponding notes:
+
+    agent1.sources.tcp-source.channels = ch-msg1 ch-msg2 ch-msg3 ch-more1 
ch-more2 ch-more3 ch-msg5 ch-msg6 ch-msg7 ch-msg8 ch-msg9 ch-msg10 ch-transfer 
ch -Back
+    Define the channel used in the source. Note that if the configuration 
below this source uses the channel, it needs to be annotated here
+
+    agent1.sources.tcp-source.type = org.apache.flume.source.SimpleTcpSource
+    tcp resolution type definition, here provide the class name for 
instantiation, SimpleTcpSource is mainly to initialize the configuration and 
start port monitoring
+
+    agent1.sources.tcp-source.msg-factory-name = 
org.apache.flume.source.ServerMessageFactory
+    Handler used for message structure analysis, and set read stream handler 
and write stream handler, netty concept, please refer to: 
https://blog.csdn.net/u013252773/article/details/21195593
+
+
+    agent1.sources.tcp-source.host = 0.0.0.0
+    tcp ip binding monitoring, binding all network cards by default
+
+    agent1.sources.tcp-source.port = 46801
+    tcp port binding, port 46801 is bound by default
+
+    agent1.sources.tcp-source.highWaterMark=2621440
+    The concept of netty, set the netty high water level value
+
+    agent1.sources.tcp-source.enableExceptionReturn=true
+    The new function of v1.7 version, optional, the default is false, used to 
open the exception channel, when an exception occurs, the data is written to 
the exception channel to prevent other normal data transmission (the open 
source version does not add this function), Details: Increase the local disk of 
abnormal data landing
+
+    agent1.sources.tcp-source.max-msg-length = 524288
+    Limit the size of a single package, here if the compressed package is 
transmitted, it is the compressed package size, the limit is 512KB
+
+    agent1.sources.tcp-source.topic = test_token
+    The default topic value, if the mapping relationship between bid and topic 
cannot be found, it will be sent to this topic
+
+    agent1.sources.tcp-source.attr = m=9
+    The default value of m is set, where the value of m is the version of 
inlong's internal TdMsg protocol
+
+    agent1.sources.tcp-source.connections = 5000
+    Concurrent connections go online, new connections will be broken when the 
upper limit is exceeded
+
+    agent1.sources.tcp-source.max-threads = 64
+    Netty thread pool work thread upper limit, generally recommended to choose 
twice the cpu
+
+    agent1.sources.tcp-source.receiveBufferSize = 524288
+    Netty server tcp tuning parameters, please refer to: 
https://stackoverflow.com/questions/4257410/what-are-so-sndbuf-and-so-rcvbuf
+    
+    agent1.sources.tcp-source.sendBufferSize = 524288
+    Netty server tcp tuning parameters, please refer to: 
https://stackoverflow.com/questions/4257410/what-are-so-sndbuf-and-so-rcvbuf
+
+    agent1.sources.tcp-source.custom-cp = true
+    Whether to use the self-developed channel process, the self-developed 
channel process can select the alternate channel to send when the main channel 
is blocked
+
+    agent1.sources.tcp-source.selector.type = 
org.apache.flume.channel.FailoverChannelSelector
+    This channel selector is a self-developed channel selector, which is not 
much different from the official website, mainly because of the channel 
master-slave selection logic
+
+    agent1.sources.tcp-source.selector.master = ch-msg5 ch-msg6 ch-msg7 
ch-msg8 ch-msg9
+    Specify the master channel, these channels will be preferentially selected 
for data push. Those channels that are not in the master, transfer, fileMetric, 
and slaMetric configuration items, but are in
+    There are defined channels in channels, which are all classified as slave 
channels. When the master channel is full, the slave channel will be selected. 
Generally, the file channel type is recommended for the slave channel.
+
+    agent1.sources.tcp-source.selector.transfer = ch-msg5 ch-msg6 ch-msg7 
ch-msg8 ch-msg9
+    Specify the transfer channel to accept the transfer type data. The 
transfer here generally refers to the data pushed to the non-tube cluster, 
which is only for forwarding, and it is reserved for subsequent functions.
+
+    agent1.sources.tcp-source.selector.fileMetric = ch-back
+    Specify the fileMetric channel to receive the metric data reported by the 
agent
+
+
+Channel configuration examples and corresponding annotations
+
+memory channel
+
+    agent1.channels.ch-more1.type = memory
+    memory channel type
+
+    agent1.channels.ch-more1.capacity = 10000000
+    Memory channel queue size, the maximum number of messages that can be 
cached
+
+    agent1.channels.ch-more1.keep-alive = 0
+    
+    agent1.channels.ch-more1.transactionCapacity = 20
+    The maximum number of batches are processed in atomic operations, and the 
memory channel needs to be locked when used, so there will be a batch process 
to increase efficiency
+
+file channel
+
+    agent1.channels.ch-msg5.type = file
+    file channel type
+
+    agent1.channels.ch-msg5.capacity = 100000000
+    The maximum number of messages that can be cached in a file channel
+
+    agent1.channels.ch-msg5.maxFileSize = 1073741824
+    file channel file maximum limit, the number of bytes
+
+    agent1.channels.ch-msg5.minimumRequiredSpace = 1073741824
+    The minimum free space of the disk where the file channel is located. 
Setting this value can prevent the disk from being full
+
+    agent1.channels.ch-msg5.checkpointDir = /data/work/file/ch-msg5/check
+    file channel checkpoint path
+
+    agent1.channels.ch-msg5.dataDirs = /data/work/file/ch-msg5/data
+    file channel data path
+
+    agent1.channels.ch-msg5.fsyncPerTransaction = false
+    Whether to synchronize the disk for each atomic operation, it is 
recommended to change it to false, otherwise it will affect the performance
+
+    agent1.channels.ch-msg5.fsyncInterval = 5
+    The time interval between data flush from memory to disk, in seconds
+
+Sink configuration example and corresponding notes
+
+    agent1.sinks.meta-sink-more1.channel = ch-msg1
+    The upstream channel name of the sink
+
+    agent1.sinks.meta-sink-more1.type = org.apache.flume.sink.MetaSink
+    The sink class is implemented, where the message is implemented to push 
data to the tube cluster
+
+    agent1.sinks.meta-sink-more1.master-host-port-list = 
sk-tegspider-tube-master-1:8609,sk-tegspider-tube-master-2:8609
+    Tube cluster master node list
+
+    agent1.sinks.meta-sink-more1.send_timeout = 30000
+    Timeout limit when sending to tube
+
+    agent1.sinks.meta-sink-more1.stat-interval-sec = 60
+    Sink indicator statistics interval time, in seconds
+
+    agent1.sinks.meta-sink-more1.thread-num = 8
+    Sink class sends messages to the worker thread, 8 means to start 8 
concurrent threads
+
+    agent1.sinks.meta-sink-more1.client-id-cache = true
+    agent id cache, used to check the data reported by the agent to remove 
duplicates
+
+    agent1.sinks.meta-sink-more1.max-survived-time = 300000
+    Maximum cache time
+    
+    agent1.sinks.meta-sink-more1.max-survived-size = 3000000
+    Maximum number of caches
diff --git a/docs/en-us/modules/dataproxy/img/architecture.png 
b/docs/en-us/modules/dataproxy/img/architecture.png
new file mode 100644
index 0000000..bc46026
Binary files /dev/null and b/docs/en-us/modules/dataproxy/img/architecture.png 
differ
diff --git a/docs/zh-cn/modules/dataproxy/architecture.md 
b/docs/zh-cn/modules/dataproxy/architecture.md
index c3a6901..7cf7770 100644
--- a/docs/zh-cn/modules/dataproxy/architecture.md
+++ b/docs/zh-cn/modules/dataproxy/architecture.md
@@ -1,11 +1,147 @@
 # 一、说明
 
-       inlong-bus属于inlong 
proxy层,用于数据的汇集接收以及转发。通过格式转换,将数据转为cache层可以缓存处理的TDMsg1格式
-       inlong-bus整体架构基于Apache 
Flume。inlong-bus在该项目的基础上,扩展了source层和sink层,并对容灾转发做了优化处理,提升了系统的稳定性。
-
-
+       InLong-dataProxy属于inlong 
proxy层,用于数据的汇集接收以及转发。通过格式转换,将数据转为cache层可以缓存处理的TDMsg1格式
+    
InLong-dataProxy充当了InLong采集端到InLong缓冲端的桥梁,dataproxy从manager模块拉取业务id与对应topic名称的关系,内部管理多个topic的生产者
+       当dataproxy收到消息时,会首先缓存到本地的Channel中,并使用本地的producer往后端即cache层发送数据
+    InLong-dataProxy整体架构基于Apache 
Flume。inlong-dataproxy在该项目的基础上,扩展了source层和sink层,并对容灾转发做了优化处理,提升了系统的稳定性。
+    
+    
 # 二、架构
 
+![](img/architecture.png)
+
        1.Source层开启端口监听,通过netty server实现。解码之后的数据发到channel层
        2.channel层有一个selector,用于选择走哪种类型的channel,如果memory最终满了,会对数据做落地处理
-       3.channel层的数据会通过sink层做转发,这里主要是将数据转为TDMsg1的格式,并推送到cache层(这里用的比较多的是tube)
\ No newline at end of file
+       3.channel层的数据会通过sink层做转发,这里主要是将数据转为TDMsg1的格式,并推送到cache层(这里用的比较多的是tube)
+
+
+# 三、DataProxy功能配置说明
+
+DataProxy支持配置化的source-channel-sink,配置方式与flume的配置文件结构相同:
+
+Source配置示例以及对应的注解:
+
+    agent1.sources.tcp-source.channels = ch-msg1 ch-msg2 ch-msg3 ch-more1 
ch-more2 ch-more3 ch-msg5 ch-msg6 ch-msg7 ch-msg8 ch-msg9 ch-msg10 ch-transfer 
ch-back
+    定义source中使用到的channel,注意此source下面的配置如果有使用到channel,均需要在此注释
+
+    agent1.sources.tcp-source.type = org.apache.flume.source.SimpleTcpSource
+    tcp解析类型定义,这里提供类名用于实例化,SimpleTcpSource主要是初始化配置并启动端口监听
+
+    agent1.sources.tcp-source.msg-factory-name = 
org.apache.flume.source.ServerMessageFactory
+    用于构造消息解析的handler,并设置read stream handler和write stream 
handler,netty概念,具体可参考:https://blog.csdn.net/u013252773/article/details/21195593
+
+    agent1.sources.tcp-source.host = 0.0.0.0    
+    tcp ip绑定监听,默认绑定所有网卡
+
+    agent1.sources.tcp-source.port = 46801
+    tcp 端口绑定,默认绑定46801端口
+
+    agent1.sources.tcp-source.highWaterMark=2621440 
+    netty概念,设置netty高水位值,高水位详细解答可参考:https://www.jianshu.com/p/c9e9a5a11943
+
+    agent1.sources.tcp-source.max-msg-length = 524288
+    限制单个包大小,这里如果传输的是压缩包,则是压缩包大小,限制512KB
+
+    agent1.sources.tcp-source.topic = test_token
+    默认topic值,如果bid和topic的映射关系找不到,则发送到此topic中
+
+    agent1.sources.tcp-source.attr = m=9
+    默认m值设置,这里的m值是inlong内部TdMsg协议的版本
+
+    agent1.sources.tcp-source.connections = 5000
+    并发连接上线,超过上限值时会对新连接做断链处理
+
+    agent1.sources.tcp-source.max-threads = 64
+    netty线程池工作线程上限,一般推荐选择cpu的两倍
+
+    agent1.sources.tcp-source.receiveBufferSize = 524288
+    netty server 
tcp调优参数,具体解释可参考:https://stackoverflow.com/questions/4257410/what-are-so-sndbuf-and-so-rcvbuf
+    
+    agent1.sources.tcp-source.sendBufferSize = 524288
+    netty server 
tcp调优参数,具体解释可参考:https://stackoverflow.com/questions/4257410/what-are-so-sndbuf-and-so-rcvbuf
+
+    agent1.sources.tcp-source.custom-cp = true
+    是否使用自研的channel process,自研channel process可在主channel阻塞时,选择备用channel发送
+
+    agent1.sources.tcp-source.selector.type = 
org.apache.flume.channel.FailoverChannelSelector
+    这个channel selector就是自研的channel selector,和官网的差别不大,主要是有channel主从选择逻辑
+
+    agent1.sources.tcp-source.selector.master = ch-msg5 ch-msg6 ch-msg7 
ch-msg8 ch-msg9
+    指定master 
channel,这些channel会被优先选择用于数据推送。那些不在master、transfer、fileMetric、slaMetric配置项里的channel,但在
+    channels里面有定义的channel,统归为slave channel,当master channel都被占满时,就会选择使用slave 
channel,slave channel一般建议使用file channel类型
+
+    agent1.sources.tcp-source.selector.transfer = ch-msg5 ch-msg6 ch-msg7 
ch-msg8 ch-msg9
+    指定transfer 
channel,承接transfer类型的数据,这里的transfer一般是指推送到非tube集群的数据,仅做转发,这里预留出来供后续功能使用
+
+    agent1.sources.tcp-source.selector.fileMetric = ch-back
+    指定fileMetric channel,用于接收agent上报的指标数据
+
+Channel配置示例以及对应的注解
+
+memory channel
+
+    agent1.channels.ch-more1.type = memory
+    memory channel类型
+
+    agent1.channels.ch-more1.capacity = 10000000
+    memory channel 队列大小,可缓存最大消息条数
+
+    agent1.channels.ch-more1.keep-alive = 0
+    
+    agent1.channels.ch-more1.transactionCapacity = 20
+    原子操作时批量处理最大条数,memory channel使用时需要用到加锁,因此会有批处理流程增加效率
+
+file channel
+
+    agent1.channels.ch-msg5.type = file
+    file channel类型
+
+    agent1.channels.ch-msg5.capacity = 100000000
+    file channel最大可缓存消息条数
+
+    agent1.channels.ch-msg5.maxFileSize = 1073741824
+    file channel文件最大上限,字节数
+
+    agent1.channels.ch-msg5.minimumRequiredSpace = 1073741824
+    file channel所在磁盘最小可用空间,设置此值可以防止磁盘写满
+
+    agent1.channels.ch-msg5.checkpointDir = /data/work/file/ch-msg5/check
+    file channel checkpoint路径
+
+    agent1.channels.ch-msg5.dataDirs = /data/work/file/ch-msg5/data
+    file channel数据路径
+
+    agent1.channels.ch-msg5.fsyncPerTransaction = false
+    是否对每个原子操作做同步磁盘,建议改false,否则会对性能有影响
+
+    agent1.channels.ch-msg5.fsyncInterval = 5
+    数据从内存flush到磁盘的时间间隔,单位秒
+
+Sink配置示例以及对应的注解
+
+    agent1.sinks.meta-sink-more1.channel = ch-msg1
+    sink的上游channel名称
+
+    agent1.sinks.meta-sink-more1.type = org.apache.flume.sink.MetaSink
+    sink类实现,此处实现消息向tube集群推送数据
+
+    agent1.sinks.meta-sink-more1.master-host-port-list = 
sk-tegspider-tube-master-1:8609,sk-tegspider-tube-master-2:8609
+    tube集群master节点列表
+
+    agent1.sinks.meta-sink-more1.send_timeout = 30000
+    发送到tube时超时时间限制
+
+    agent1.sinks.meta-sink-more1.stat-interval-sec = 60
+    sink指标统计间隔时间,单位秒
+
+    agent1.sinks.meta-sink-more1.thread-num = 8
+    Sink类发送消息的工作线程,8表示启动8个并发线程
+
+    agent1.sinks.meta-sink-more1.client-id-cache = true
+    agent id缓存,用于检查agent上报数据去重
+
+    agent1.sinks.meta-sink-more1.max-survived-time = 300000
+    缓存最大时间
+    
+    agent1.sinks.meta-sink-more1.max-survived-size = 3000000
+    缓存最大个数
diff --git a/docs/zh-cn/modules/dataproxy/img/architecture.png 
b/docs/zh-cn/modules/dataproxy/img/architecture.png
new file mode 100644
index 0000000..bc46026
Binary files /dev/null and b/docs/zh-cn/modules/dataproxy/img/architecture.png 
differ
diff --git a/en-us/docs/modules/dataproxy/architecture.html 
b/en-us/docs/modules/dataproxy/architecture.html
index 31e70cd..fd8a7ea 100644
--- a/en-us/docs/modules/dataproxy/architecture.html
+++ b/en-us/docs/modules/dataproxy/architecture.html
@@ -13,14 +13,144 @@
 </head>
 <body>
        <div id="root"><div class="documentation-page" 
data-reactroot=""><header class="header-container header-container-normal"><div 
class="header-body"><a href="/en-us/index.html"><a href="//www.apache.org"><img 
class="logo apache" style="width:120px" src="/img/asf_logo.svg"/></a><div 
class="logo-split"></div><img class="logo tube" 
style="width:120px;top:12px;position:absolute" src="/img/Tube 
logo.svg"/></a><div class="search search-normal"><span 
class="icon-search"></span></div><span class= [...]
-<pre><code>Inlong-bus belongs to the inlong proxy layer and is used for data 
collection, reception and forwarding. Through format conversion, the data is 
converted into TDMsg1 format that can be cached and processed by the cache layer
-The overall architecture of inlong-bus is based on Apache Flume. On the basis 
of this project, inlong-bus expands the source layer and sink layer, and 
optimizes disaster tolerance forwarding, which improves the stability of the 
system.
+<pre><code>Inlong-dataProxy belongs to the inlong proxy layer and is used for 
data collection, reception and forwarding. Through format conversion, the data 
is converted into TDMsg1 format that can be cached and processed by the cache 
layer
+InLong-dataProxy acts as a bridge from the InLong collection end to the InLong 
buffer end. Dataproxy pulls the relationship between the business id and the 
corresponding topic name from the manager module, and internally manages the 
producers of multiple topics
+The overall architecture of inlong-dataproxy is based on Apache Flume. On the 
basis of this project, inlong-bus expands the source layer and sink layer, and 
optimizes disaster tolerance forwarding, which improves the stability of the 
system.
 </code></pre>
 <h1>2、architecture</h1>
+<p><img src="img/architecture.png" alt=""></p>
 <pre><code>1. The source layer opens port monitoring, which is realized 
through netty server. The decoded data is sent to the channel layer
 2. The channel layer has a selector, which is used to choose which type of 
channel to go. If the memory is eventually full, the data will be processed.
 3. The data of the channel layer will be forwarded through the sink layer. The 
main purpose here is to convert the data to the TDMsg1 format and push it to 
the cache layer (tube is more commonly used here)
 </code></pre>
+<h1>3、DataProxy support configuration instructions</h1>
+<p>DataProxy supports configurable source-channel-sink, and the configuration 
method is the same as the configuration file structure of flume:</p>
+<p>Source configuration example and corresponding notes:</p>
+<pre><code>agent1.sources.tcp-source.channels = ch-msg1 ch-msg2 ch-msg3 
ch-more1 ch-more2 ch-more3 ch-msg5 ch-msg6 ch-msg7 ch-msg8 ch-msg9 ch-msg10 
ch-transfer ch -Back
+Define the channel used in the source. Note that if the configuration below 
this source uses the channel, it needs to be annotated here
+
+agent1.sources.tcp-source.type = org.apache.flume.source.SimpleTcpSource
+tcp resolution type definition, here provide the class name for instantiation, 
SimpleTcpSource is mainly to initialize the configuration and start port 
monitoring
+
+agent1.sources.tcp-source.msg-factory-name = 
org.apache.flume.source.ServerMessageFactory
+Handler used for message structure analysis, and set read stream handler and 
write stream handler, netty concept, please refer to: 
https://blog.csdn.net/u013252773/article/details/21195593
+
+
+agent1.sources.tcp-source.host = 0.0.0.0
+tcp ip binding monitoring, binding all network cards by default
+
+agent1.sources.tcp-source.port = 46801
+tcp port binding, port 46801 is bound by default
+
+agent1.sources.tcp-source.highWaterMark=2621440
+The concept of netty, set the netty high water level value
+
+agent1.sources.tcp-source.enableExceptionReturn=true
+The new function of v1.7 version, optional, the default is false, used to open 
the exception channel, when an exception occurs, the data is written to the 
exception channel to prevent other normal data transmission (the open source 
version does not add this function), Details: Increase the local disk of 
abnormal data landing
+
+agent1.sources.tcp-source.max-msg-length = 524288
+Limit the size of a single package, here if the compressed package is 
transmitted, it is the compressed package size, the limit is 512KB
+
+agent1.sources.tcp-source.topic = test_token
+The default topic value, if the mapping relationship between bid and topic 
cannot be found, it will be sent to this topic
+
+agent1.sources.tcp-source.attr = m=9
+The default value of m is set, where the value of m is the version of inlong's 
internal TdMsg protocol
+
+agent1.sources.tcp-source.connections = 5000
+Concurrent connections go online, new connections will be broken when the 
upper limit is exceeded
+
+agent1.sources.tcp-source.max-threads = 64
+Netty thread pool work thread upper limit, generally recommended to choose 
twice the cpu
+
+agent1.sources.tcp-source.receiveBufferSize = 524288
+Netty server tcp tuning parameters, please refer to: 
https://stackoverflow.com/questions/4257410/what-are-so-sndbuf-and-so-rcvbuf
+
+agent1.sources.tcp-source.sendBufferSize = 524288
+Netty server tcp tuning parameters, please refer to: 
https://stackoverflow.com/questions/4257410/what-are-so-sndbuf-and-so-rcvbuf
+
+agent1.sources.tcp-source.custom-cp = true
+Whether to use the self-developed channel process, the self-developed channel 
process can select the alternate channel to send when the main channel is 
blocked
+
+agent1.sources.tcp-source.selector.type = 
org.apache.flume.channel.FailoverChannelSelector
+This channel selector is a self-developed channel selector, which is not much 
different from the official website, mainly because of the channel master-slave 
selection logic
+
+agent1.sources.tcp-source.selector.master = ch-msg5 ch-msg6 ch-msg7 ch-msg8 
ch-msg9
+Specify the master channel, these channels will be preferentially selected for 
data push. Those channels that are not in the master, transfer, fileMetric, and 
slaMetric configuration items, but are in
+There are defined channels in channels, which are all classified as slave 
channels. When the master channel is full, the slave channel will be selected. 
Generally, the file channel type is recommended for the slave channel.
+
+agent1.sources.tcp-source.selector.transfer = ch-msg5 ch-msg6 ch-msg7 ch-msg8 
ch-msg9
+Specify the transfer channel to accept the transfer type data. The transfer 
here generally refers to the data pushed to the non-tube cluster, which is only 
for forwarding, and it is reserved for subsequent functions.
+
+agent1.sources.tcp-source.selector.fileMetric = ch-back
+Specify the fileMetric channel to receive the metric data reported by the agent
+</code></pre>
+<p>Channel configuration examples and corresponding annotations</p>
+<p>memory channel</p>
+<pre><code>agent1.channels.ch-more1.type = memory
+memory channel type
+
+agent1.channels.ch-more1.capacity = 10000000
+Memory channel queue size, the maximum number of messages that can be cached
+
+agent1.channels.ch-more1.keep-alive = 0
+
+agent1.channels.ch-more1.transactionCapacity = 20
+The maximum number of batches are processed in atomic operations, and the 
memory channel needs to be locked when used, so there will be a batch process 
to increase efficiency
+</code></pre>
+<p>file channel</p>
+<pre><code>agent1.channels.ch-msg5.type = file
+file channel type
+
+agent1.channels.ch-msg5.capacity = 100000000
+The maximum number of messages that can be cached in a file channel
+
+agent1.channels.ch-msg5.maxFileSize = 1073741824
+file channel file maximum limit, the number of bytes
+
+agent1.channels.ch-msg5.minimumRequiredSpace = 1073741824
+The minimum free space of the disk where the file channel is located. Setting 
this value can prevent the disk from being full
+
+agent1.channels.ch-msg5.checkpointDir = /data/work/file/ch-msg5/check
+file channel checkpoint path
+
+agent1.channels.ch-msg5.dataDirs = /data/work/file/ch-msg5/data
+file channel data path
+
+agent1.channels.ch-msg5.fsyncPerTransaction = false
+Whether to synchronize the disk for each atomic operation, it is recommended 
to change it to false, otherwise it will affect the performance
+
+agent1.channels.ch-msg5.fsyncInterval = 5
+The time interval between data flush from memory to disk, in seconds
+</code></pre>
+<p>Sink configuration example and corresponding notes</p>
+<pre><code>agent1.sinks.meta-sink-more1.channel = ch-msg1
+The upstream channel name of the sink
+
+agent1.sinks.meta-sink-more1.type = org.apache.flume.sink.MetaSink
+The sink class is implemented, where the message is implemented to push data 
to the tube cluster
+
+agent1.sinks.meta-sink-more1.master-host-port-list = 
sk-tegspider-tube-master-1:8609,sk-tegspider-tube-master-2:8609
+Tube cluster master node list
+
+agent1.sinks.meta-sink-more1.send_timeout = 30000
+Timeout limit when sending to tube
+
+agent1.sinks.meta-sink-more1.stat-interval-sec = 60
+Sink indicator statistics interval time, in seconds
+
+agent1.sinks.meta-sink-more1.thread-num = 8
+Sink class sends messages to the worker thread, 8 means to start 8 concurrent 
threads
+
+agent1.sinks.meta-sink-more1.client-id-cache = true
+agent id cache, used to check the data reported by the agent to remove 
duplicates
+
+agent1.sinks.meta-sink-more1.max-survived-time = 300000
+Maximum cache time
+
+agent1.sinks.meta-sink-more1.max-survived-size = 3000000
+Maximum number of caches
+</code></pre>
 </div></section><footer class="footer-container"><div class="footer-body"><img 
src="/img/incubator-logo.svg"/><div class="cols-container"><div class="col 
col-24"><p>Apache InLong (incubating) is an effort undergoing incubation at The 
Apache Software Foundation (ASF), sponsored by Incubator. Incubation is 
required of all newly accepted projects until a further review indicates that 
the infrastructure, communications, and decision making process have stabilized 
in a manner consistent with  [...]
        <script 
src="https://f.alicdn.com/react/15.4.1/react-with-addons.min.js";></script>
        <script 
src="https://f.alicdn.com/react/15.4.1/react-dom.min.js";></script>
diff --git a/en-us/docs/modules/dataproxy/architecture.json 
b/en-us/docs/modules/dataproxy/architecture.json
index fde2db6..78c155f 100644
--- a/en-us/docs/modules/dataproxy/architecture.json
+++ b/en-us/docs/modules/dataproxy/architecture.json
@@ -1,6 +1,6 @@
 {
   "filename": "architecture.md",
-  "__html": "<h1>1、intro</h1>\n<pre><code>Inlong-bus belongs to the inlong 
proxy layer and is used for data collection, reception and forwarding. Through 
format conversion, the data is converted into TDMsg1 format that can be cached 
and processed by the cache layer\nThe overall architecture of inlong-bus is 
based on Apache Flume. On the basis of this project, inlong-bus expands the 
source layer and sink layer, and optimizes disaster tolerance forwarding, which 
improves the stability of t [...]
+  "__html": "<h1>1、intro</h1>\n<pre><code>Inlong-dataProxy belongs to the 
inlong proxy layer and is used for data collection, reception and forwarding. 
Through format conversion, the data is converted into TDMsg1 format that can be 
cached and processed by the cache layer\nInLong-dataProxy acts as a bridge from 
the InLong collection end to the InLong buffer end. Dataproxy pulls the 
relationship between the business id and the corresponding topic name from the 
manager module, and internall [...]
   "link": "/en-us/docs/modules/dataproxy/architecture.html",
   "meta": {}
 }
\ No newline at end of file
diff --git a/en-us/docs/modules/dataproxy/architecture.md 
b/en-us/docs/modules/dataproxy/architecture.md
index 2fe2abf..8e6d2ef 100644
--- a/en-us/docs/modules/dataproxy/architecture.md
+++ b/en-us/docs/modules/dataproxy/architecture.md
@@ -1,11 +1,150 @@
 # 1、intro
 
-    Inlong-bus belongs to the inlong proxy layer and is used for data 
collection, reception and forwarding. Through format conversion, the data is 
converted into TDMsg1 format that can be cached and processed by the cache layer
-    The overall architecture of inlong-bus is based on Apache Flume. On the 
basis of this project, inlong-bus expands the source layer and sink layer, and 
optimizes disaster tolerance forwarding, which improves the stability of the 
system.
+    Inlong-dataProxy belongs to the inlong proxy layer and is used for data 
collection, reception and forwarding. Through format conversion, the data is 
converted into TDMsg1 format that can be cached and processed by the cache layer
+    InLong-dataProxy acts as a bridge from the InLong collection end to the 
InLong buffer end. Dataproxy pulls the relationship between the business id and 
the corresponding topic name from the manager module, and internally manages 
the producers of multiple topics
+    The overall architecture of inlong-dataproxy is based on Apache Flume. On 
the basis of this project, inlong-bus expands the source layer and sink layer, 
and optimizes disaster tolerance forwarding, which improves the stability of 
the system.
 
 
 # 2、architecture
 
+![](img/architecture.png)
+
        1. The source layer opens port monitoring, which is realized through 
netty server. The decoded data is sent to the channel layer
        2. The channel layer has a selector, which is used to choose which type 
of channel to go. If the memory is eventually full, the data will be processed.
        3. The data of the channel layer will be forwarded through the sink 
layer. The main purpose here is to convert the data to the TDMsg1 format and 
push it to the cache layer (tube is more commonly used here)
+
+# 3、DataProxy support configuration instructions
+
+DataProxy supports configurable source-channel-sink, and the configuration 
method is the same as the configuration file structure of flume:
+
+Source configuration example and corresponding notes:
+
+    agent1.sources.tcp-source.channels = ch-msg1 ch-msg2 ch-msg3 ch-more1 
ch-more2 ch-more3 ch-msg5 ch-msg6 ch-msg7 ch-msg8 ch-msg9 ch-msg10 ch-transfer 
ch -Back
+    Define the channel used in the source. Note that if the configuration 
below this source uses the channel, it needs to be annotated here
+
+    agent1.sources.tcp-source.type = org.apache.flume.source.SimpleTcpSource
+    tcp resolution type definition, here provide the class name for 
instantiation, SimpleTcpSource is mainly to initialize the configuration and 
start port monitoring
+
+    agent1.sources.tcp-source.msg-factory-name = 
org.apache.flume.source.ServerMessageFactory
+    Handler used for message structure analysis, and set read stream handler 
and write stream handler, netty concept, please refer to: 
https://blog.csdn.net/u013252773/article/details/21195593
+
+
+    agent1.sources.tcp-source.host = 0.0.0.0
+    tcp ip binding monitoring, binding all network cards by default
+
+    agent1.sources.tcp-source.port = 46801
+    tcp port binding, port 46801 is bound by default
+
+    agent1.sources.tcp-source.highWaterMark=2621440
+    The concept of netty, set the netty high water level value
+
+    agent1.sources.tcp-source.enableExceptionReturn=true
+    The new function of v1.7 version, optional, the default is false, used to 
open the exception channel, when an exception occurs, the data is written to 
the exception channel to prevent other normal data transmission (the open 
source version does not add this function), Details: Increase the local disk of 
abnormal data landing
+
+    agent1.sources.tcp-source.max-msg-length = 524288
+    Limit the size of a single package, here if the compressed package is 
transmitted, it is the compressed package size, the limit is 512KB
+
+    agent1.sources.tcp-source.topic = test_token
+    The default topic value, if the mapping relationship between bid and topic 
cannot be found, it will be sent to this topic
+
+    agent1.sources.tcp-source.attr = m=9
+    The default value of m is set, where the value of m is the version of 
inlong's internal TdMsg protocol
+
+    agent1.sources.tcp-source.connections = 5000
+    Concurrent connections go online, new connections will be broken when the 
upper limit is exceeded
+
+    agent1.sources.tcp-source.max-threads = 64
+    Netty thread pool work thread upper limit, generally recommended to choose 
twice the cpu
+
+    agent1.sources.tcp-source.receiveBufferSize = 524288
+    Netty server tcp tuning parameters, please refer to: 
https://stackoverflow.com/questions/4257410/what-are-so-sndbuf-and-so-rcvbuf
+    
+    agent1.sources.tcp-source.sendBufferSize = 524288
+    Netty server tcp tuning parameters, please refer to: 
https://stackoverflow.com/questions/4257410/what-are-so-sndbuf-and-so-rcvbuf
+
+    agent1.sources.tcp-source.custom-cp = true
+    Whether to use the self-developed channel process, the self-developed 
channel process can select the alternate channel to send when the main channel 
is blocked
+
+    agent1.sources.tcp-source.selector.type = 
org.apache.flume.channel.FailoverChannelSelector
+    This channel selector is a self-developed channel selector, which is not 
much different from the official website, mainly because of the channel 
master-slave selection logic
+
+    agent1.sources.tcp-source.selector.master = ch-msg5 ch-msg6 ch-msg7 
ch-msg8 ch-msg9
+    Specify the master channel, these channels will be preferentially selected 
for data push. Those channels that are not in the master, transfer, fileMetric, 
and slaMetric configuration items, but are in
+    There are defined channels in channels, which are all classified as slave 
channels. When the master channel is full, the slave channel will be selected. 
Generally, the file channel type is recommended for the slave channel.
+
+    agent1.sources.tcp-source.selector.transfer = ch-msg5 ch-msg6 ch-msg7 
ch-msg8 ch-msg9
+    Specify the transfer channel to accept the transfer type data. The 
transfer here generally refers to the data pushed to the non-tube cluster, 
which is only for forwarding, and it is reserved for subsequent functions.
+
+    agent1.sources.tcp-source.selector.fileMetric = ch-back
+    Specify the fileMetric channel to receive the metric data reported by the 
agent
+
+
+Channel configuration examples and corresponding annotations
+
+memory channel
+
+    agent1.channels.ch-more1.type = memory
+    memory channel type
+
+    agent1.channels.ch-more1.capacity = 10000000
+    Memory channel queue size, the maximum number of messages that can be 
cached
+
+    agent1.channels.ch-more1.keep-alive = 0
+    
+    agent1.channels.ch-more1.transactionCapacity = 20
+    The maximum number of batches are processed in atomic operations, and the 
memory channel needs to be locked when used, so there will be a batch process 
to increase efficiency
+
+file channel
+
+    agent1.channels.ch-msg5.type = file
+    file channel type
+
+    agent1.channels.ch-msg5.capacity = 100000000
+    The maximum number of messages that can be cached in a file channel
+
+    agent1.channels.ch-msg5.maxFileSize = 1073741824
+    file channel file maximum limit, the number of bytes
+
+    agent1.channels.ch-msg5.minimumRequiredSpace = 1073741824
+    The minimum free space of the disk where the file channel is located. 
Setting this value can prevent the disk from being full
+
+    agent1.channels.ch-msg5.checkpointDir = /data/work/file/ch-msg5/check
+    file channel checkpoint path
+
+    agent1.channels.ch-msg5.dataDirs = /data/work/file/ch-msg5/data
+    file channel data path
+
+    agent1.channels.ch-msg5.fsyncPerTransaction = false
+    Whether to synchronize the disk for each atomic operation, it is 
recommended to change it to false, otherwise it will affect the performance
+
+    agent1.channels.ch-msg5.fsyncInterval = 5
+    The time interval between data flush from memory to disk, in seconds
+
+Sink configuration example and corresponding notes
+
+    agent1.sinks.meta-sink-more1.channel = ch-msg1
+    The upstream channel name of the sink
+
+    agent1.sinks.meta-sink-more1.type = org.apache.flume.sink.MetaSink
+    The sink class is implemented, where the message is implemented to push 
data to the tube cluster
+
+    agent1.sinks.meta-sink-more1.master-host-port-list = 
sk-tegspider-tube-master-1:8609,sk-tegspider-tube-master-2:8609
+    Tube cluster master node list
+
+    agent1.sinks.meta-sink-more1.send_timeout = 30000
+    Timeout limit when sending to tube
+
+    agent1.sinks.meta-sink-more1.stat-interval-sec = 60
+    Sink indicator statistics interval time, in seconds
+
+    agent1.sinks.meta-sink-more1.thread-num = 8
+    Sink class sends messages to the worker thread, 8 means to start 8 
concurrent threads
+
+    agent1.sinks.meta-sink-more1.client-id-cache = true
+    agent id cache, used to check the data reported by the agent to remove 
duplicates
+
+    agent1.sinks.meta-sink-more1.max-survived-time = 300000
+    Maximum cache time
+    
+    agent1.sinks.meta-sink-more1.max-survived-size = 3000000
+    Maximum number of caches
diff --git a/en-us/docs/modules/dataproxy/img/architecture.png 
b/en-us/docs/modules/dataproxy/img/architecture.png
new file mode 100644
index 0000000..bc46026
Binary files /dev/null and b/en-us/docs/modules/dataproxy/img/architecture.png 
differ
diff --git a/zh-cn/docs/modules/dataproxy/architecture.html 
b/zh-cn/docs/modules/dataproxy/architecture.html
index 7237b96..42d268d 100644
--- a/zh-cn/docs/modules/dataproxy/architecture.html
+++ b/zh-cn/docs/modules/dataproxy/architecture.html
@@ -13,13 +13,141 @@
 </head>
 <body>
        <div id="root"><div class="documentation-page" 
data-reactroot=""><header class="header-container header-container-normal"><div 
class="header-body"><a href="/zh-cn/index.html"><a href="//www.apache.org"><img 
class="logo apache" style="width:120px" src="/img/asf_logo.svg"/></a><div 
class="logo-split"></div><img class="logo tube" 
style="width:120px;top:12px;position:absolute" src="/img/Tube 
logo.svg"/></a><div class="search search-normal"><span 
class="icon-search"></span></div><span class= [...]
-<pre><code>inlong-bus属于inlong 
proxy层,用于数据的汇集接收以及转发。通过格式转换,将数据转为cache层可以缓存处理的TDMsg1格式
-inlong-bus整体架构基于Apache 
Flume。inlong-bus在该项目的基础上,扩展了source层和sink层,并对容灾转发做了优化处理,提升了系统的稳定性。
+<pre><code>InLong-dataProxy属于inlong 
proxy层,用于数据的汇集接收以及转发。通过格式转换,将数据转为cache层可以缓存处理的TDMsg1格式
+InLong-dataProxy充当了InLong采集端到InLong缓冲端的桥梁,dataproxy从manager模块拉取业务id与对应topic名称的关系,内部管理多个topic的生产者
+当dataproxy收到消息时,会首先缓存到本地的Channel中,并使用本地的producer往后端即cache层发送数据
+InLong-dataProxy整体架构基于Apache 
Flume。inlong-dataproxy在该项目的基础上,扩展了source层和sink层,并对容灾转发做了优化处理,提升了系统的稳定性。
 </code></pre>
 <h1>二、架构</h1>
+<p><img src="img/architecture.png" alt=""></p>
 <pre><code>1.Source层开启端口监听,通过netty server实现。解码之后的数据发到channel层
 2.channel层有一个selector,用于选择走哪种类型的channel,如果memory最终满了,会对数据做落地处理
-3.channel层的数据会通过sink层做转发,这里主要是将数据转为TDMsg1的格式,并推送到cache层(这里用的比较多的是tube)</code></pre>
+3.channel层的数据会通过sink层做转发,这里主要是将数据转为TDMsg1的格式,并推送到cache层(这里用的比较多的是tube)
+</code></pre>
+<h1>三、DataProxy功能配置说明</h1>
+<p>DataProxy支持配置化的source-channel-sink,配置方式与flume的配置文件结构相同:</p>
+<p>Source配置示例以及对应的注解:</p>
+<pre><code>agent1.sources.tcp-source.channels = ch-msg1 ch-msg2 ch-msg3 
ch-more1 ch-more2 ch-more3 ch-msg5 ch-msg6 ch-msg7 ch-msg8 ch-msg9 ch-msg10 
ch-transfer ch-back
+定义source中使用到的channel,注意此source下面的配置如果有使用到channel,均需要在此注释
+
+agent1.sources.tcp-source.type = org.apache.flume.source.SimpleTcpSource
+tcp解析类型定义,这里提供类名用于实例化,SimpleTcpSource主要是初始化配置并启动端口监听
+
+agent1.sources.tcp-source.msg-factory-name = 
org.apache.flume.source.ServerMessageFactory
+用于构造消息解析的handler,并设置read stream handler和write stream 
handler,netty概念,具体可参考:https://blog.csdn.net/u013252773/article/details/21195593
+
+agent1.sources.tcp-source.host = 0.0.0.0    
+tcp ip绑定监听,默认绑定所有网卡
+
+agent1.sources.tcp-source.port = 46801
+tcp 端口绑定,默认绑定46801端口
+
+agent1.sources.tcp-source.highWaterMark=2621440 
+netty概念,设置netty高水位值,高水位详细解答可参考:https://www.jianshu.com/p/c9e9a5a11943
+
+agent1.sources.tcp-source.max-msg-length = 524288
+限制单个包大小,这里如果传输的是压缩包,则是压缩包大小,限制512KB
+
+agent1.sources.tcp-source.topic = test_token
+默认topic值,如果bid和topic的映射关系找不到,则发送到此topic中
+
+agent1.sources.tcp-source.attr = m=9
+默认m值设置,这里的m值是inlong内部TdMsg协议的版本
+
+agent1.sources.tcp-source.connections = 5000
+并发连接上线,超过上限值时会对新连接做断链处理
+
+agent1.sources.tcp-source.max-threads = 64
+netty线程池工作线程上限,一般推荐选择cpu的两倍
+
+agent1.sources.tcp-source.receiveBufferSize = 524288
+netty server 
tcp调优参数,具体解释可参考:https://stackoverflow.com/questions/4257410/what-are-so-sndbuf-and-so-rcvbuf
+
+agent1.sources.tcp-source.sendBufferSize = 524288
+netty server 
tcp调优参数,具体解释可参考:https://stackoverflow.com/questions/4257410/what-are-so-sndbuf-and-so-rcvbuf
+
+agent1.sources.tcp-source.custom-cp = true
+是否使用自研的channel process,自研channel process可在主channel阻塞时,选择备用channel发送
+
+agent1.sources.tcp-source.selector.type = 
org.apache.flume.channel.FailoverChannelSelector
+这个channel selector就是自研的channel selector,和官网的差别不大,主要是有channel主从选择逻辑
+
+agent1.sources.tcp-source.selector.master = ch-msg5 ch-msg6 ch-msg7 ch-msg8 
ch-msg9
+指定master 
channel,这些channel会被优先选择用于数据推送。那些不在master、transfer、fileMetric、slaMetric配置项里的channel,但在
+channels里面有定义的channel,统归为slave channel,当master channel都被占满时,就会选择使用slave 
channel,slave channel一般建议使用file channel类型
+
+agent1.sources.tcp-source.selector.transfer = ch-msg5 ch-msg6 ch-msg7 ch-msg8 
ch-msg9
+指定transfer 
channel,承接transfer类型的数据,这里的transfer一般是指推送到非tube集群的数据,仅做转发,这里预留出来供后续功能使用
+
+agent1.sources.tcp-source.selector.fileMetric = ch-back
+指定fileMetric channel,用于接收agent上报的指标数据
+</code></pre>
+<p>Channel配置示例以及对应的注解</p>
+<p>memory channel</p>
+<pre><code>agent1.channels.ch-more1.type = memory
+memory channel类型
+
+agent1.channels.ch-more1.capacity = 10000000
+memory channel 队列大小,可缓存最大消息条数
+
+agent1.channels.ch-more1.keep-alive = 0
+
+agent1.channels.ch-more1.transactionCapacity = 20
+原子操作时批量处理最大条数,memory channel使用时需要用到加锁,因此会有批处理流程增加效率
+</code></pre>
+<p>file channel</p>
+<pre><code>agent1.channels.ch-msg5.type = file
+file channel类型
+
+agent1.channels.ch-msg5.capacity = 100000000
+file channel最大可缓存消息条数
+
+agent1.channels.ch-msg5.maxFileSize = 1073741824
+file channel文件最大上限,字节数
+
+agent1.channels.ch-msg5.minimumRequiredSpace = 1073741824
+file channel所在磁盘最小可用空间,设置此值可以防止磁盘写满
+
+agent1.channels.ch-msg5.checkpointDir = /data/work/file/ch-msg5/check
+file channel checkpoint路径
+
+agent1.channels.ch-msg5.dataDirs = /data/work/file/ch-msg5/data
+file channel数据路径
+
+agent1.channels.ch-msg5.fsyncPerTransaction = false
+是否对每个原子操作做同步磁盘,建议改false,否则会对性能有影响
+
+agent1.channels.ch-msg5.fsyncInterval = 5
+数据从内存flush到磁盘的时间间隔,单位秒
+</code></pre>
+<p>Sink配置示例以及对应的注解</p>
+<pre><code>agent1.sinks.meta-sink-more1.channel = ch-msg1
+sink的上游channel名称
+
+agent1.sinks.meta-sink-more1.type = org.apache.flume.sink.MetaSink
+sink类实现,此处实现消息向tube集群推送数据
+
+agent1.sinks.meta-sink-more1.master-host-port-list = 
sk-tegspider-tube-master-1:8609,sk-tegspider-tube-master-2:8609
+tube集群master节点列表
+
+agent1.sinks.meta-sink-more1.send_timeout = 30000
+发送到tube时超时时间限制
+
+agent1.sinks.meta-sink-more1.stat-interval-sec = 60
+sink指标统计间隔时间,单位秒
+
+agent1.sinks.meta-sink-more1.thread-num = 8
+Sink类发送消息的工作线程,8表示启动8个并发线程
+
+agent1.sinks.meta-sink-more1.client-id-cache = true
+agent id缓存,用于检查agent上报数据去重
+
+agent1.sinks.meta-sink-more1.max-survived-time = 300000
+缓存最大时间
+
+agent1.sinks.meta-sink-more1.max-survived-size = 3000000
+缓存最大个数
+</code></pre>
 </div></section><footer class="footer-container"><div class="footer-body"><img 
src="/img/incubator-logo.svg"/><div class="cols-container"><div class="col 
col-24"><p>Apache InLong (incubating) is an effort undergoing incubation at The 
Apache Software Foundation (ASF), sponsored by Incubator. Incubation is 
required of all newly accepted projects until a further review indicates that 
the infrastructure, communications, and decision making process have stabilized 
in a manner consistent with  [...]
        <script 
src="https://f.alicdn.com/react/15.4.1/react-with-addons.min.js";></script>
        <script 
src="https://f.alicdn.com/react/15.4.1/react-dom.min.js";></script>
diff --git a/zh-cn/docs/modules/dataproxy/architecture.json 
b/zh-cn/docs/modules/dataproxy/architecture.json
index 9016eef..ebb7463 100644
--- a/zh-cn/docs/modules/dataproxy/architecture.json
+++ b/zh-cn/docs/modules/dataproxy/architecture.json
@@ -1,6 +1,6 @@
 {
   "filename": "architecture.md",
-  "__html": "<h1>一、说明</h1>\n<pre><code>inlong-bus属于inlong 
proxy层,用于数据的汇集接收以及转发。通过格式转换,将数据转为cache层可以缓存处理的TDMsg1格式\ninlong-bus整体架构基于Apache 
Flume。inlong-bus在该项目的基础上,扩展了source层和sink层,并对容灾转发做了优化处理,提升了系统的稳定性。\n</code></pre>\n<h1>二、架构</h1>\n<pre><code>1.Source层开启端口监听,通过netty
 
server实现。解码之后的数据发到channel层\n2.channel层有一个selector,用于选择走哪种类型的channel,如果memory最终满了,会对数据做落地处理\n3.channel层的数据会通过sink层做转发,这里主要是将数据转为TDMsg1的格式,并推送到cache层(这里用的比较多的是tube)</code></pre>\n",
+  "__html": "<h1>一、说明</h1>\n<pre><code>InLong-dataProxy属于inlong 
proxy层,用于数据的汇集接收以及转发。通过格式转换,将数据转为cache层可以缓存处理的TDMsg1格式\nInLong-dataProxy充当了InLong采集端到InLong缓冲端的桥梁,dataproxy从manager模块拉取业务id与对应topic名称的关系,内部管理多个topic的生产者\n当dataproxy收到消息时,会首先缓存到本地的Channel中,并使用本地的producer往后端即cache层发送数据\nInLong-dataProxy整体架构基于Apache
 
Flume。inlong-dataproxy在该项目的基础上,扩展了source层和sink层,并对容灾转发做了优化处理,提升了系统的稳定性。\n</code></pre>\n<h1>二、架构</h1>\n<p><img
 src=\"img/architecture.png\" alt=\"\"></p>\n<pre><code>1.Source层开启端口监听 [...]
   "link": "/zh-cn/docs/modules/dataproxy/architecture.html",
   "meta": {}
 }
\ No newline at end of file
diff --git a/zh-cn/docs/modules/dataproxy/architecture.md 
b/zh-cn/docs/modules/dataproxy/architecture.md
index c3a6901..7cf7770 100644
--- a/zh-cn/docs/modules/dataproxy/architecture.md
+++ b/zh-cn/docs/modules/dataproxy/architecture.md
@@ -1,11 +1,147 @@
 # 一、说明
 
-       inlong-bus属于inlong 
proxy层,用于数据的汇集接收以及转发。通过格式转换,将数据转为cache层可以缓存处理的TDMsg1格式
-       inlong-bus整体架构基于Apache 
Flume。inlong-bus在该项目的基础上,扩展了source层和sink层,并对容灾转发做了优化处理,提升了系统的稳定性。
-
-
+       InLong-dataProxy属于inlong 
proxy层,用于数据的汇集接收以及转发。通过格式转换,将数据转为cache层可以缓存处理的TDMsg1格式
+    
InLong-dataProxy充当了InLong采集端到InLong缓冲端的桥梁,dataproxy从manager模块拉取业务id与对应topic名称的关系,内部管理多个topic的生产者
+       当dataproxy收到消息时,会首先缓存到本地的Channel中,并使用本地的producer往后端即cache层发送数据
+    InLong-dataProxy整体架构基于Apache 
Flume。inlong-dataproxy在该项目的基础上,扩展了source层和sink层,并对容灾转发做了优化处理,提升了系统的稳定性。
+    
+    
 # 二、架构
 
+![](img/architecture.png)
+
        1.Source层开启端口监听,通过netty server实现。解码之后的数据发到channel层
        2.channel层有一个selector,用于选择走哪种类型的channel,如果memory最终满了,会对数据做落地处理
-       3.channel层的数据会通过sink层做转发,这里主要是将数据转为TDMsg1的格式,并推送到cache层(这里用的比较多的是tube)
\ No newline at end of file
+       3.channel层的数据会通过sink层做转发,这里主要是将数据转为TDMsg1的格式,并推送到cache层(这里用的比较多的是tube)
+
+
+# 三、DataProxy功能配置说明
+
+DataProxy支持配置化的source-channel-sink,配置方式与flume的配置文件结构相同:
+
+Source配置示例以及对应的注解:
+
+    agent1.sources.tcp-source.channels = ch-msg1 ch-msg2 ch-msg3 ch-more1 
ch-more2 ch-more3 ch-msg5 ch-msg6 ch-msg7 ch-msg8 ch-msg9 ch-msg10 ch-transfer 
ch-back
+    定义source中使用到的channel,注意此source下面的配置如果有使用到channel,均需要在此注释
+
+    agent1.sources.tcp-source.type = org.apache.flume.source.SimpleTcpSource
+    tcp解析类型定义,这里提供类名用于实例化,SimpleTcpSource主要是初始化配置并启动端口监听
+
+    agent1.sources.tcp-source.msg-factory-name = 
org.apache.flume.source.ServerMessageFactory
+    用于构造消息解析的handler,并设置read stream handler和write stream 
handler,netty概念,具体可参考:https://blog.csdn.net/u013252773/article/details/21195593
+
+    agent1.sources.tcp-source.host = 0.0.0.0    
+    tcp ip绑定监听,默认绑定所有网卡
+
+    agent1.sources.tcp-source.port = 46801
+    tcp 端口绑定,默认绑定46801端口
+
+    agent1.sources.tcp-source.highWaterMark=2621440 
+    netty概念,设置netty高水位值,高水位详细解答可参考:https://www.jianshu.com/p/c9e9a5a11943
+
+    agent1.sources.tcp-source.max-msg-length = 524288
+    限制单个包大小,这里如果传输的是压缩包,则是压缩包大小,限制512KB
+
+    agent1.sources.tcp-source.topic = test_token
+    默认topic值,如果bid和topic的映射关系找不到,则发送到此topic中
+
+    agent1.sources.tcp-source.attr = m=9
+    默认m值设置,这里的m值是inlong内部TdMsg协议的版本
+
+    agent1.sources.tcp-source.connections = 5000
+    并发连接上线,超过上限值时会对新连接做断链处理
+
+    agent1.sources.tcp-source.max-threads = 64
+    netty线程池工作线程上限,一般推荐选择cpu的两倍
+
+    agent1.sources.tcp-source.receiveBufferSize = 524288
+    netty server 
tcp调优参数,具体解释可参考:https://stackoverflow.com/questions/4257410/what-are-so-sndbuf-and-so-rcvbuf
+    
+    agent1.sources.tcp-source.sendBufferSize = 524288
+    netty server 
tcp调优参数,具体解释可参考:https://stackoverflow.com/questions/4257410/what-are-so-sndbuf-and-so-rcvbuf
+
+    agent1.sources.tcp-source.custom-cp = true
+    是否使用自研的channel process,自研channel process可在主channel阻塞时,选择备用channel发送
+
+    agent1.sources.tcp-source.selector.type = 
org.apache.flume.channel.FailoverChannelSelector
+    这个channel selector就是自研的channel selector,和官网的差别不大,主要是有channel主从选择逻辑
+
+    agent1.sources.tcp-source.selector.master = ch-msg5 ch-msg6 ch-msg7 
ch-msg8 ch-msg9
+    指定master 
channel,这些channel会被优先选择用于数据推送。那些不在master、transfer、fileMetric、slaMetric配置项里的channel,但在
+    channels里面有定义的channel,统归为slave channel,当master channel都被占满时,就会选择使用slave 
channel,slave channel一般建议使用file channel类型
+
+    agent1.sources.tcp-source.selector.transfer = ch-msg5 ch-msg6 ch-msg7 
ch-msg8 ch-msg9
+    指定transfer 
channel,承接transfer类型的数据,这里的transfer一般是指推送到非tube集群的数据,仅做转发,这里预留出来供后续功能使用
+
+    agent1.sources.tcp-source.selector.fileMetric = ch-back
+    指定fileMetric channel,用于接收agent上报的指标数据
+
+Channel配置示例以及对应的注解
+
+memory channel
+
+    agent1.channels.ch-more1.type = memory
+    memory channel类型
+
+    agent1.channels.ch-more1.capacity = 10000000
+    memory channel 队列大小,可缓存最大消息条数
+
+    agent1.channels.ch-more1.keep-alive = 0
+    
+    agent1.channels.ch-more1.transactionCapacity = 20
+    原子操作时批量处理最大条数,memory channel使用时需要用到加锁,因此会有批处理流程增加效率
+
+file channel
+
+    agent1.channels.ch-msg5.type = file
+    file channel类型
+
+    agent1.channels.ch-msg5.capacity = 100000000
+    file channel最大可缓存消息条数
+
+    agent1.channels.ch-msg5.maxFileSize = 1073741824
+    file channel文件最大上限,字节数
+
+    agent1.channels.ch-msg5.minimumRequiredSpace = 1073741824
+    file channel所在磁盘最小可用空间,设置此值可以防止磁盘写满
+
+    agent1.channels.ch-msg5.checkpointDir = /data/work/file/ch-msg5/check
+    file channel checkpoint路径
+
+    agent1.channels.ch-msg5.dataDirs = /data/work/file/ch-msg5/data
+    file channel数据路径
+
+    agent1.channels.ch-msg5.fsyncPerTransaction = false
+    是否对每个原子操作做同步磁盘,建议改false,否则会对性能有影响
+
+    agent1.channels.ch-msg5.fsyncInterval = 5
+    数据从内存flush到磁盘的时间间隔,单位秒
+
+Sink配置示例以及对应的注解
+
+    agent1.sinks.meta-sink-more1.channel = ch-msg1
+    sink的上游channel名称
+
+    agent1.sinks.meta-sink-more1.type = org.apache.flume.sink.MetaSink
+    sink类实现,此处实现消息向tube集群推送数据
+
+    agent1.sinks.meta-sink-more1.master-host-port-list = 
sk-tegspider-tube-master-1:8609,sk-tegspider-tube-master-2:8609
+    tube集群master节点列表
+
+    agent1.sinks.meta-sink-more1.send_timeout = 30000
+    发送到tube时超时时间限制
+
+    agent1.sinks.meta-sink-more1.stat-interval-sec = 60
+    sink指标统计间隔时间,单位秒
+
+    agent1.sinks.meta-sink-more1.thread-num = 8
+    Sink类发送消息的工作线程,8表示启动8个并发线程
+
+    agent1.sinks.meta-sink-more1.client-id-cache = true
+    agent id缓存,用于检查agent上报数据去重
+
+    agent1.sinks.meta-sink-more1.max-survived-time = 300000
+    缓存最大时间
+    
+    agent1.sinks.meta-sink-more1.max-survived-size = 3000000
+    缓存最大个数
diff --git a/zh-cn/docs/modules/dataproxy/img/architecture.png 
b/zh-cn/docs/modules/dataproxy/img/architecture.png
new file mode 100644
index 0000000..bc46026
Binary files /dev/null and b/zh-cn/docs/modules/dataproxy/img/architecture.png 
differ

Reply via email to