Thanks Xiangrui. This file already exists w/o escapes. I could probably preprocess it to add the escaping.
On Fri, Sep 12, 2014 at 9:38 PM, Xiangrui Meng <men...@gmail.com> wrote:
> I wrote an input format for Redshift tables unloaded with the UNLOAD
> ESCAPE option: https://github.com/mengxr/redshift-input-format , which
> can recognize multi-line records.
>
> Redshift puts a backslash before any in-record `\\`, `\r`, `\n`, and
> the delimiter character. You can apply the same escaping before
> calling saveAsTextFile, then use the input format to load them back.
>
> Xiangrui
>
> On Fri, Sep 12, 2014 at 7:43 PM, Mohit Jaggi <mohitja...@gmail.com> wrote:
> > Folks,
> > I think this might be due to the default TextInputFormat in Hadoop. Any
> > pointers to solutions much appreciated.
> >
> >> More powerfully, you can define your own InputFormat implementations to
> >> format the input to your programs however you want. For example, the
> >> default TextInputFormat reads lines of text files. The key it emits for
> >> each record is the byte offset of the line read (as a LongWritable), and
> >> the value is the contents of the line up to the terminating '\n'
> >> character (as a Text object). If you have multi-line records each
> >> separated by a '$' character, you could write your own InputFormat that
> >> parses files into records split on this character instead.
> >
> > Thanks,
> > Mohit
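For reference, the escaping convention Xiangrui describes can be sketched in plain Python. This is only an illustration of the scheme (backslash before any in-field `\\`, `\r`, `\n`, or delimiter), not the API of the linked input format; the function names, the `|` field delimiter, and the newline record separator are assumptions for the example.

```python
def escape_field(field, delimiter="|"):
    """Put a backslash before any in-field backslash, CR, LF, or delimiter,
    so a record with embedded newlines fits on one physical line."""
    return "".join(
        "\\" + ch if ch in ("\\", "\r", "\n", delimiter) else ch
        for ch in field
    )

def parse(data, delimiter="|"):
    """Parse escaped text back into records of fields.

    A backslash makes the next character literal; an *unescaped*
    delimiter ends a field and an *unescaped* '\n' ends a record --
    this is how multi-line records can be recognized after the fact.
    """
    records, fields, cur = [], [], []
    i = 0
    while i < len(data):
        ch = data[i]
        if ch == "\\" and i + 1 < len(data):
            cur.append(data[i + 1])  # escaped char, taken literally
            i += 2
            continue
        if ch == delimiter:
            fields.append("".join(cur)); cur = []
        elif ch == "\n":
            fields.append("".join(cur)); cur = []
            records.append(fields); fields = []
        else:
            cur.append(ch)
        i += 1
    if cur or fields:  # final record without trailing newline
        fields.append("".join(cur))
        records.append(fields)
    return records

if __name__ == "__main__":
    rows = [["a", "multi\nline value", "pipe | inside"],
            ["back\\slash", "plain", "x"]]
    # Escape each field, join fields with '|' and records with '\n';
    # this is the text you would hand to saveAsTextFile.
    text = "\n".join("|".join(escape_field(f) for f in r) for r in rows)
    assert parse(text) == rows  # lossless round trip
```

Applying `escape_field` to each field before `saveAsTextFile` (e.g. via a `map` over the RDD) should make the output safe for a reader that splits on unescaped delimiters, as the linked input format does.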