+1 for this idea. I need it too.

Regards,
Dongjin

On Fri, Dec 9, 2016 at 8:59 AM, Dongjoon Hyun <dongj...@apache.org> wrote:

> Hi, All.
>
> Could you give me your opinions?
>
> There is an old SPARK issue, SPARK-11374, about removing header lines from
> text files.
> Currently, Spark supports removing CSV header lines in the following way.
>
> ```
> scala> spark.read.option("header","true").csv("/data").show
> +---+---+
> | c1| c2|
> +---+---+
> |  1|  a|
> |  2|  b|
> +---+---+
> ```
>
> In the SQL world, we can support that the Hive way, with
> `skip.header.line.count`.
>
> ```
> scala> sql("CREATE TABLE t1 (id INT, value VARCHAR(10)) ROW FORMAT
> DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/data'
> TBLPROPERTIES('skip.header.line.count'='1')")
> scala> sql("SELECT * FROM t1").show
> +---+-----+
> | id|value|
> +---+-----+
> |  1|    a|
> |  2|    b|
> +---+-----+
> ```
>
> Although I made a PR for this based on the JIRA issue, I want to know whether
> this is really a needed feature.
> Is it needed for your use cases? Or is it enough for you to remove the header
> lines in a preprocessing stage?
> If this is too old and no longer appropriate, I'll close the PR and the
> JIRA issue as WON'T FIX.
>
> Thank you all in advance!
>
> Bests,
> Dongjoon.

--
*Dongjin Lee*

Software developer in Line+.
So interested in massive-scale machine learning.

facebook: www.facebook.com/dongjin.lee.kr
linkedin: kr.linkedin.com/in/dongjinleekr
github: github.com/dongjinleekr
twitter: www.twitter.com/dongjinleekr