I’m told the relevant code is in QuotedLineRecordReader, which is where CSV/TSV parsing takes place, so you can have a look at what is happening there. There is also an undocumented escape flag there (which we still need to test and document). Others will probably have more details… 🙂
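
As a possible stopgap while the escape flag remains untested, the file could be rewritten before loading so that backslash-escaped quotes (\") become doubled quotes (""), the RFC 4180 convention. This is only a sketch, and it rests on an assumption this thread has not confirmed: that the delimited-text loader accepts doubled quotes inside a quoted field. The helper name normalize_escapes is made up for illustration; it uses only Python's standard csv module.

```python
import csv
import io


def normalize_escapes(text: str) -> str:
    """Re-encode backslash-escaped quotes (\\") as doubled quotes ("")."""
    # Read the input treating backslash as the escape character.
    # skipinitialspace drops the blank after each comma, so a quote
    # that follows ", " is still recognized as opening a quoted field.
    reader = csv.reader(
        io.StringIO(text),
        escapechar="\\",
        doublequote=False,
        skipinitialspace=True,
    )
    out = io.StringIO()
    # The writer's defaults quote only fields that need it and
    # double any embedded quote characters.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# The failing row from the report, with the mail wrapping removed.
row = (
    '13893, 53192, 1376, 1, "(1986) (USA) (VHS) (included in '
    '\\"The Best Of Alfred Hitchcock, Vol. One\\")"\n'
)
print(normalize_escapes(row), end="")
```

Whether the rewritten row then loads would still need to be checked against the actual parser; the sketch only changes the escaping style and strips the space after each comma.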

On Mon, Jun 10, 2024 at 4:18 PM Mehnaz Tabassum Mahin <mehnaztabassum.ma...@email.ucr.edu> wrote:

> Hello everyone,
>
> I am trying to load the IMDb dataset in AsterixDB. It seems that some of
> the rows end up with broken escaping and eventually not being inserted at
> all. For example, I used the syntax as follows:
>
> LOAD DATASET movie_companies using localfs (
> ("path"=asterix_nc1://imdb-data/movie-companies.csv),
> ("format"="delimited-text"),("delimiter"=","), ("null"="")
> );
>
> The schema is movie_companies (id: int, movie_id: int, company_id: int,
> company_type_id: int, note: string) and the CSV file contains the following
> row:
>
> 13893, 53192, 1376, 1, "(1986) (USA) (VHS) (included in \"The Best Of
> Alfred Hitchcock, Vol. One\")"
>
> This row ends up not loading at all. The rest of the row with no such
> string input can be loaded successfully.
>
> Any suggestions?
>
> Thanks,
> Mehnaz