This is an automated email from the ASF dual-hosted git repository.

dataroaring pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/doris.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 32102f792fe [fix](csv reader) fix csv parser incorrect if enclosing line_delimiter (#38347)
32102f792fe is described below

commit 32102f792fea424a1aa1c59369dfafe7dd98a291
Author: hui lai <1353307...@qq.com>
AuthorDate: Fri Jul 26 10:11:27 2024 +0800

    [fix](csv reader) fix csv parser incorrect if enclosing line_delimiter (#38347)
    
    The CSV reader parses data incorrectly when a line_delimiter appears
    inside an enclosed field. For example, with line_delimiter \n and
    enclose ', the following data:
    ```
    'aaaaaaaaaaaa
    bbbb'
    ```
    is parsed as two columns, `'aaaaaaaaaaaa` and `bbbb'`, rather than as
    one column:
    ```
    'aaaaaaaaaaaa
    bbbb'
    ```
    
    This happens because the CSV reader does not reset the result when the
    matching enclose is not found within the current `output_buf_read`,
    so the data is truncated at the wrong position.
    
    Co-authored-by: Xin Liao <liaoxin...@126.com>
---
 be/src/vec/exec/format/file_reader/new_plain_text_line_reader.cpp | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/be/src/vec/exec/format/file_reader/new_plain_text_line_reader.cpp b/be/src/vec/exec/format/file_reader/new_plain_text_line_reader.cpp
index 8dce6e589af..75350890aee 100644
--- a/be/src/vec/exec/format/file_reader/new_plain_text_line_reader.cpp
+++ b/be/src/vec/exec/format/file_reader/new_plain_text_line_reader.cpp
@@ -160,6 +160,11 @@ void EncloseCsvLineReaderContext::_on_pre_match_enclose(const uint8_t* start, si
         if (_idx != _total_len) {
             len = update_reading_bound(start);
         } else {
+            // It needs to set the result to nullptr for matching enclose may not be read
+            // after reading the output buf.
+            // Therefore, if the result is not set to nullptr,
+            // the parser will consider reading a line as there is a line delimiter.
+            _result = nullptr;
             break;
         }
     } while (true);
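
The behaviour this change aims for can be illustrated with a small standalone sketch. It is not the Doris implementation: `EncloseAwareSplitter`, `feed`, and `in_enclose` are hypothetical names used only for illustration, and escape characters and multi-byte delimiters are left out. The sketch assumes that input arrives one buffer at a time and shows that a buffer ending inside an open enclosure must not report a line, even if a line delimiter was already seen in it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not part of Doris: splits a byte stream into records
// and treats a line delimiter inside an enclosed field as ordinary data.
class EncloseAwareSplitter {
public:
    EncloseAwareSplitter(char line_delim, char enclose)
            : _line_delim(line_delim), _enclose(enclose) {
        // The two characters must differ, otherwise the input is ambiguous.
        assert(line_delim != enclose);
    }

    // Feed one buffer of bytes and return the records completed by it.
    std::vector<std::string> feed(const std::string& chunk) {
        std::vector<std::string> records;
        for (char c : chunk) {
            if (c == _enclose) {
                // Entering or leaving an enclosed field; keep the quote in the data.
                _in_enclose = !_in_enclose;
                _pending.push_back(c);
            } else if (c == _line_delim && !_in_enclose) {
                // A delimiter outside any enclosure ends the current record.
                records.push_back(_pending);
                _pending.clear();
            } else {
                // Ordinary byte, or a delimiter that is part of an enclosed field.
                _pending.push_back(c);
            }
        }
        // If the buffer ended inside an enclosure, no record is reported for
        // the unfinished part; it stays pending until the closing enclose arrives.
        return records;
    }

    // True while an opening enclose has been seen without its closing match.
    bool in_enclose() const { return _in_enclose; }

private:
    char _line_delim;
    char _enclose;
    bool _in_enclose = false;
    std::string _pending;
};
```

With line delimiter `\n` and enclose `'`, feeding `'aaaaaaaaaaaa\n` returns no record because the enclosure is still open, and feeding `bbbb'\n` afterwards returns the single record `'aaaaaaaaaaaa\nbbbb'`.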


