Hi all,

I’m exporting a table with Hive CLI using hive -f query.hql > file.tsv. The resulting tab-separated file won’t read in R: it seems that some of my fields contain the \t separator, and that Hive CLI is neither escaping those characters nor putting the string fields within quotes. The error I get is the following; I include it because it describes the problem very clearly:
Error in fread(input = "/data3/webview_export.tsv", sep = "\t", header = TRUE, : Expecting 54 cols, but line 520137 contains text after processing all cols. It is very likely that this is due to one or more fields having embedded sep=' ' and/or (unescaped) '\n' characters within unbalanced unescaped quotes. fread cannot handle such ambiguous cases and those lines may not have been read in as expected. Please read the section on quotes in ?fread.

Is there any clean way to instruct Hive CLI to
a) Escape special characters
b) Quote fields of type string
c) Use ^A as a separator

Another error I get when skipping this problematic 520137th line is

Read 50.9% of 4260134 rows
Error in fread(input = "/data3/webview_export.tsv", sep = "\t", header = TRUE, : embedded nul in string: 'kaufmann\00400618'

Again, it looks like this character ain’t escaped properly.

Maybe using an alternative SerDe could solve that?

Thanks for your help!
Thomas
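
P.S. In case it helps anyone reproduce this, here is a minimal sketch of how the offending lines could be located before handing the file to fread. It only diagnoses the problem and does not fix the export. The function name find_bad_lines and the sample rows are made up for illustration; with the real file one would open /data3/webview_export.tsv in binary mode and pass expected_cols=54.

```python
import io


def find_bad_lines(stream, expected_cols, sep=b"\t"):
    """Yield (line_number, n_fields, has_nul) for every suspicious line.

    A line is suspicious if its field count differs from expected_cols
    (for example because of an embedded separator) or if it contains a
    NUL byte. The stream must be opened in binary mode.
    """
    for lineno, raw in enumerate(stream, start=1):
        line = raw.rstrip(b"\r\n")
        n_fields = line.count(sep) + 1
        has_nul = b"\x00" in line
        if n_fields != expected_cols or has_nul:
            yield lineno, n_fields, has_nul


# Hypothetical sample standing in for the real export: line 3 has an
# extra tab inside a field, line 4 has an embedded NUL byte.
sample = io.BytesIO(
    b"id\tname\tcity\n"
    b"1\tkaufmann\tBern\n"
    b"2\tfoo\tbar\tbaz\n"
    b"3\tkauf\x00mann\tBasel\n"
)

report = list(find_bad_lines(sample, expected_cols=3))
for lineno, n_fields, has_nul in report:
    print(lineno, n_fields, has_nul)
```

For the real file the call would look like find_bad_lines(open("/data3/webview_export.tsv", "rb"), expected_cols=54). Note that a field containing an embedded newline shows up as two short lines rather than one long one, so the reported line numbers are physical lines, not records.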