I’m told the relevant code is in QuotedLineRecordReader, which is where
CSV/TSV parsing takes place, so you can have a look at what is happening
there. There’s also an undocumented escape flag there (which we need to
test and document). Others will probably have more details… 🙂
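
As a rough illustration of why this row is awkward, here is a small sketch
using Python's standard csv module (not AsterixDB, and not a claim about how
QuotedLineRecordReader behaves). It assumes the two wrapped lines in the
report are really one line in the file. The names ROW and parse are made up
for the example. It parses the row once treating a doubled quote as the only
escape, and once treating backslash as the escape character, then rewrites
the row with doubled quotes.

```python
import csv
import io

# The row from the report, joined into one line (assumption: the line
# break in the email is only mail wrapping).
ROW = r'13893, 53192, 1376, 1, "(1986) (USA) (VHS) (included in \"The Best Of Alfred Hitchcock, Vol. One\")"'


def parse(line, **kwargs):
    """Parse a single CSV line with Python's csv module (hypothetical helper)."""
    return next(csv.reader(io.StringIO(line), skipinitialspace=True, **kwargs))


# Mode 1: quotes are escaped only by doubling them. The backslash is an
# ordinary character, so the first inner quote ends the quoted field and
# the embedded comma then splits the note.
default_fields = parse(ROW)

# Mode 2: backslash is the escape character, so \" is a literal quote.
escaped_fields = parse(ROW, escapechar="\\", doublequote=False)

# Rewrite the row using doubled quotes, the csv.writer default.
buffer = io.StringIO()
csv.writer(buffer).writerow(escaped_fields)
rewritten = buffer.getvalue().strip()

print(len(default_fields), len(escaped_fields))
print(rewritten)
```

If the loader only understands doubled quotes, rewriting the file this way
before loading might be a workaround, but that is a guess until the escape
flag has been tested.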

On Mon, Jun 10, 2024 at 4:18 PM Mehnaz Tabassum Mahin <
mehnaztabassum.ma...@email.ucr.edu> wrote:

> Hello everyone,
>
> I am trying to load the IMDb dataset in AsterixDB. It seems that some of
> the rows end up with broken escaping and are eventually not inserted at
> all. For example, I used the following syntax:
>
> LOAD DATASET movie_companies using localfs (
> ("path"=asterix_nc1://imdb-data/movie-companies.csv),
> ("format"="delimited-text"),("delimiter"=","), ("null"="")
> );
>
> The schema is movie_companies (id: int, movie_id: int, company_id: int,
> company_type_id: int, note: string) and the CSV file contains the following
> row:
>
> 13893, 53192, 1376, 1, "(1986) (USA) (VHS) (included in \"The Best Of
> Alfred Hitchcock, Vol. One\")"
>
> This row ends up not loading at all. The rest of the rows, which have no
> such string input, load successfully.
>
> Any suggestions?
>
> Thanks,
> Mehnaz
>
