Im-Manshushu commented on issue #11258: URL: https://github.com/apache/doris/issues/11258#issuecomment-1201967334
> > Many users put the canal logs of all tables in a business database into a single topic, which must be distributed before doris-flink-connector can be used. The idea is to write one task that synchronizes the entire database. Currently doris-flink-connector uses an HTTP input stream: each checkpoint opens a stream, and each stream is strongly bound to one Stream Load URL. The current doris-flink-connector architecture is therefore unsuitable for whole-database synchronization, because it would require too many long-lived HTTP connections. In this case we can only fall back to the old Stream Load batch mode: the Flink side caches data, each table gets its own buffer bound to that table's Stream Load URL, and a threshold (such as row count or batch size) triggers the submit, just like doris-datax-writer.
> >
> > However, the old Stream Load batch-write mode may have several problems:
> >
> > 1. A series of problems caused by an unreasonable cached batch size: if it is too small, frequent imports trigger the -235 error; if it is too large, Flink memory comes under pressure.
> > 2. It does not guarantee exactly-once semantics.
>
> So will this ability to dynamically write to Doris tables be added in a future version of flink-connector-doris? If so, in which version will it be added?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
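The per-table buffering scheme described above (one buffer per table, flushed to that table's Stream Load URL once a row-count or byte-size threshold is reached) can be sketched roughly as follows. This is a minimal illustration, not the actual flink-connector-doris API; the class, its method names, and the thresholds are all hypothetical, and the real flush would issue an HTTP Stream Load request instead of just recording the event.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the old batch-write idea: cache rows per table,
// flush a table's buffer when it exceeds a row-count or byte-size threshold.
public class PerTableBatchBuffer {
    private final int maxRows;    // flush when a table's buffer reaches this many rows
    private final long maxBytes;  // ...or this many bytes, whichever comes first
    private final Map<String, List<String>> buffers = new HashMap<>();
    private final Map<String, Long> bufferedBytes = new HashMap<>();
    private final List<String> flushLog = new ArrayList<>(); // records flushes for inspection

    public PerTableBatchBuffer(int maxRows, long maxBytes) {
        this.maxRows = maxRows;
        this.maxBytes = maxBytes;
    }

    // Append one row (e.g. a canal record routed here by its table name).
    public void add(String table, String row) {
        buffers.computeIfAbsent(table, t -> new ArrayList<>()).add(row);
        bufferedBytes.merge(table, (long) row.length(), Long::sum);
        if (buffers.get(table).size() >= maxRows || bufferedBytes.get(table) >= maxBytes) {
            flush(table);
        }
    }

    // In a real connector this would submit one Stream Load for the table's
    // buffered rows; here we only record "table:rowCount" to show the behavior.
    private void flush(String table) {
        flushLog.add(table + ":" + buffers.get(table).size());
        buffers.get(table).clear();
        bufferedBytes.put(table, 0L);
    }

    public List<String> getFlushLog() { return flushLog; }

    public int pendingRows(String table) {
        return buffers.getOrDefault(table, new ArrayList<>()).size();
    }
}
```

Note how the tuning trade-off from the comment shows up directly here: a small `maxRows`/`maxBytes` means frequent flushes (risking -235 from too many imports), while large values mean more rows held in Flink memory between flushes, and none of this gives exactly-once on its own.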