Re: [I] Support non UTF-8 in CSV files [datafusion]

2025-04-22 Thread via GitHub
houseme commented on issue #15756: URL: https://github.com/apache/datafusion/issues/15756#issuecomment-2820310701 > Or fully support utf8? Looking forward to full UTF-8 support [chuckles] -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [I] Support non UTF-8 in CSV files [datafusion]

2025-04-21 Thread via GitHub
jayzhan211 commented on issue #15756: URL: https://github.com/apache/datafusion/issues/15756#issuecomment-2820193611 If the delimiter is `&[u8]`, I think any kind of delimiter should be able to be supported. The only concern is how to design it optimally for all the cases
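
To illustrate the idea of a byte-slice delimiter, here is a minimal sketch in plain Rust. It is not arrow's CSV reader and not a proposed design: `split_on_bytes` is a hypothetical helper invented for this example, and it deliberately ignores quoting and escaping, which a real CSV parser would have to handle.

```rust
/// Hypothetical helper (not an arrow or DataFusion API): splits `line` on
/// every occurrence of the byte sequence `delim`.
/// Quoting and escaping are ignored; this is only a sketch.
fn split_on_bytes<'a>(line: &'a [u8], delim: &[u8]) -> Vec<&'a [u8]> {
    // An empty delimiter cannot separate anything, so return the whole line.
    if delim.is_empty() {
        return vec![line];
    }
    let mut fields = Vec::new();
    let mut start = 0; // start of the field currently being scanned
    let mut i = 0; // current scan position
    while i + delim.len() <= line.len() {
        if &line[i..i + delim.len()] == delim {
            // Delimiter found: close the current field and skip past it.
            fields.push(&line[start..i]);
            i += delim.len();
            start = i;
        } else {
            i += 1;
        }
    }
    // Whatever remains after the last delimiter is the final field.
    fields.push(&line[start..]);
    fields
}
```

Calling `split_on_bytes("a╦b╦c".as_bytes(), "╦".as_bytes())` yields the three fields `a`, `b`, and `c`, and a one-byte delimiter such as `b","` goes through the same code path. The naive scan is one possible answer to the "all the cases" concern; a real reader would likely special-case single-byte delimiters for speed.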

Re: [I] Support non UTF-8 in CSV files [datafusion]

2025-04-21 Thread via GitHub
myrust-go commented on issue #15756: URL: https://github.com/apache/datafusion/issues/15756#issuecomment-2819889539 Or fully support utf8?

Re: [I] Support non UTF-8 in CSV files [datafusion]

2025-04-21 Thread via GitHub
myrust-go commented on issue #15756: URL: https://github.com/apache/datafusion/issues/15756#issuecomment-2819848742 Can it support more character sets? We also need this > > > We need to support delimiter with &[u8] or similar in arrow's csv reader. You only take the first u8 so obvio

Re: [I] Support non UTF-8 in CSV files [datafusion]

2025-04-21 Thread via GitHub
jayzhan211 commented on issue #15756: URL: https://github.com/apache/datafusion/issues/15756#issuecomment-2819773012 > > We need to support delimiter with &[u8] or similar in arrow's csv reader. You only take the first u8 so obviously there is an error. > > ``` > > let r = "╦".as_byte

Re: [I] Support non UTF-8 in CSV files [datafusion]

2025-04-21 Thread via GitHub
guojidan commented on issue #15756: URL: https://github.com/apache/datafusion/issues/15756#issuecomment-2818174774 > We need to support delimiter with &[u8] or similar in arrow's csv reader. You only take the first u8 so obviously there is an error. > > ``` > let r = "╦".as_bytes(

Re: [I] Support non UTF-8 in CSV files [datafusion]

2025-04-21 Thread via GitHub
guojidan commented on issue #15756: URL: https://github.com/apache/datafusion/issues/15756#issuecomment-2818171149 > btw, why do you need this kind of delimiter support? Is converting them to `,` an option for your use-case? In the MinIO test project [mint](https://github.com/minio/mint/

Re: [I] Support non UTF-8 in CSV files [datafusion]

2025-04-21 Thread via GitHub
jayzhan211 commented on issue #15756: URL: https://github.com/apache/datafusion/issues/15756#issuecomment-2818160274 btw, why do you need this kind of delimiter support? Is converting them to `,` an option for your use-case?

Re: [I] Support non UTF-8 in CSV files [datafusion]

2025-04-21 Thread via GitHub
jayzhan211 commented on issue #15756: URL: https://github.com/apache/datafusion/issues/15756#issuecomment-2818157270 We need to support delimiter with &[u8] or similar in arrow's csv reader. You only take the first u8 so obviously there is an error. ``` let r = "╦".as_bytes();
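
The snippet above is cut off in this archive, so the following is only a sketch of the point being made, not a reconstruction of the original code. `╦` (U+2566) encodes to three bytes in UTF-8, so keeping only the first byte leaves a lone lead byte that is neither the delimiter nor valid UTF-8. The helper names `first_byte_delimiter` and `full_delimiter` are hypothetical and exist only for this illustration.

```rust
/// Hypothetical helper: mimics a reader that keeps only the first byte of
/// the delimiter string. Returns `None` for an empty string.
fn first_byte_delimiter(delim: &str) -> Option<u8> {
    delim.as_bytes().first().copied()
}

/// Hypothetical helper: returns the complete UTF-8 encoding of the delimiter.
fn full_delimiter(delim: &str) -> &[u8] {
    delim.as_bytes()
}

/// Returns true when `bytes` is a complete, valid UTF-8 sequence.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}
```

For `╦`, `full_delimiter` returns the bytes `E2 95 A6`, while `first_byte_delimiter` returns only `E2`. Passing that single byte to `is_valid_utf8` returns `false`, which is consistent with the error reported in this thread when the first byte alone is used as the delimiter.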

Re: [I] Support non UTF-8 in CSV files [datafusion]

2025-04-20 Thread via GitHub
guojidan commented on issue #15756: URL: https://github.com/apache/datafusion/issues/15756#issuecomment-2817457411 ``` use std::sync::Arc; use bytes::Bytes; use datafusion::prelude::{CsvReadOptions, SessionContext}; use object_store::{memory::InMemory, path::Path, ObjectStore}

Re: [I] Support non UTF-8 in CSV files [datafusion]

2025-04-18 Thread via GitHub
jayzhan211 commented on issue #15756: URL: https://github.com/apache/datafusion/issues/15756#issuecomment-2815340158 `╦` should be valid UTF-8. What error did you get?