This is an automated email from the ASF dual-hosted git repository.

dataroaring pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/doris.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 32102f792fe [fix](csv reader) fix csv parser incorrect if enclosing line_delimiter (#38347)
32102f792fe is described below

commit 32102f792fea424a1aa1c59369dfafe7dd98a291
Author: hui lai <1353307...@qq.com>
AuthorDate: Fri Jul 26 10:11:27 2024 +0800

    [fix](csv reader) fix csv parser incorrect if enclosing line_delimiter (#38347)

    The CSV reader parses data incorrectly when an enclosed field contains
    the line_delimiter. For example, if line_delimiter is \n and enclose
    is ', and the data is as follows:
    ```
    'aaaaaaaaaaaa
    bbbb'
    ```
    it will be split at the line delimiter and parsed as two columns,
    `'aaaaaaaaaaaa` and `bbbb'`, rather than as one column:
    ```
    'aaaaaaaaaaaa
    bbbb'
    ```

    This happens because the CSV reader does not reset the result when the
    matching enclose is not found within the current `output_buf_read`,
    so the line is truncated at the wrong position.

    Co-authored-by: Xin Liao <liaoxin...@126.com>
---
 be/src/vec/exec/format/file_reader/new_plain_text_line_reader.cpp | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/be/src/vec/exec/format/file_reader/new_plain_text_line_reader.cpp b/be/src/vec/exec/format/file_reader/new_plain_text_line_reader.cpp
index 8dce6e589af..75350890aee 100644
--- a/be/src/vec/exec/format/file_reader/new_plain_text_line_reader.cpp
+++ b/be/src/vec/exec/format/file_reader/new_plain_text_line_reader.cpp
@@ -160,6 +160,11 @@ void EncloseCsvLineReaderContext::_on_pre_match_enclose(const uint8_t* start, si
         if (_idx != _total_len) {
             len = update_reading_bound(start);
         } else {
+            // It needs to set the result to nullptr for matching enclose may not be read
+            // after reading the output buf.
+            // Therefore, if the result is not set to nullptr,
+            // the parser will consider reading a line as there is a line delimiter.
+            _result = nullptr;
             break;
         }
     } while (true);
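The effect of the added `_result = nullptr;` can be illustrated with a small standalone model. The sketch below is not the Doris implementation: `find_line_end`, its parameters, and the `reset_on_unmatched` flag are hypothetical names invented for illustration, and escaped or doubled enclose characters are deliberately ignored. It assumes a scanner that remembers a tentative line-delimiter position seen inside an open enclose, and shows why that tentative result should be cleared when the buffer ends before the matching enclose has been read.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical, simplified model of an enclose-aware line scanner.
// Returns the index of the line delimiter that ends the first record in `buf`,
// or std::nullopt when no complete record can be identified yet.
std::optional<std::size_t> find_line_end(std::string_view buf, char enclose,
                                         char line_delimiter,
                                         bool reset_on_unmatched) {
    std::optional<std::size_t> result;  // tentative delimiter position
    bool in_enclose = false;
    for (std::size_t i = 0; i < buf.size(); ++i) {
        const char c = buf[i];
        if (c == enclose) {
            in_enclose = !in_enclose;
            // The enclose was matched, so delimiters seen inside it were data.
            if (!in_enclose) result.reset();
        } else if (c == line_delimiter) {
            if (!in_enclose) return i;  // a real end of line
            if (!result) result = i;    // only a tentative candidate
        }
    }
    // The buffer ended while the enclose was still open. The matching enclose
    // may arrive with the next read, so the tentative result must be dropped;
    // otherwise the caller would cut the record at the enclosed delimiter.
    if (in_enclose && reset_on_unmatched) result.reset();
    return result;
}

// Small self-check mirroring the example from the commit message.
void check_example() {
    // The buffer ends before the closing enclose has been read.
    constexpr std::string_view partial = "'aaaaaaaaaaaa\nbbbb";
    // Without the reset, the enclosed '\n' is reported as the end of the line.
    assert(find_line_end(partial, '\'', '\n', false) == std::size_t{13});
    // With the reset, the scanner asks for more data instead.
    assert(!find_line_end(partial, '\'', '\n', true).has_value());
}
```

In this model, once the rest of the data is available (`'aaaaaaaaaaaa\nbbbb'\n`), the scanner reports the delimiter after the closing enclose regardless of the flag, so the reset only changes behavior for buffers that end inside an open enclose.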