Hi,

Depending on how you are reading the data in the first place, can you simply treat the header as a header instead of as a row?
http://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html#csv(scala.collection.Seq)

See the header option.

On Wed, Aug 3, 2016 at 10:14 PM, Carlo.Allocca <carlo.allo...@open.ac.uk> wrote:

> Hi All,
>
> I would like to apply a regression to my data. One step of the workflow is
> to prepare my data as a JavaRDD<LabeledPoint> starting from a Dataset<Row>
> with its header. So, what I did was the following:
>
> == Step 1: transform the Dataset<Row> into JavaRDD<Row>
> JavaRDD<Row> dataPointsWithHeader = modelDS.toJavaRDD();
>
> == Step 2: take the first row (I was thinking that it was the header)
> Row header = dataPointsWithHeader.first();
>
> == Step 3: eliminate the row header by
> JavaRDD<Row> dataPointsWithoutHeader = dataPointsWithHeader.filter((Row row) -> {
>     return !row.equals(header);
> });
>
> The issues with the above approach are:
>
> a) the result of Step 2 is not the header row;
> b) Step 3 is very inefficient if there is a way to access the header
>    directly.
>
> My question is:
>
> Is there an efficient way to access the header and eliminate it?
>
> Many thanks in advance for your help and suggestions.
>
> Regards,
> Carlo
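
For what it's worth, here is a minimal sketch of how the header option mentioned at the top could be used, assuming Spark 2.x with Java 8 and that the data comes from a CSV file. The class name HeaderOptionSketch, the helper loadLabeledPoints, and the column layout (label in column 0, numeric features in the remaining columns, no missing values) are assumptions made for illustration; they are not taken from Carlo's code.

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HeaderOptionSketch {

    // Hypothetical helper: reads a CSV whose first line is a header and
    // converts every data row into a LabeledPoint.
    // Assumption: column 0 is the label, all other columns are numeric features.
    static JavaRDD<LabeledPoint> loadLabeledPoints(SparkSession spark, String path) {
        // With header=true the first line supplies the column names and is
        // never returned as a Row, so nothing has to be filtered out later.
        Dataset<Row> modelDS = spark.read()
                .option("header", "true")
                .csv(path);

        // Without schema inference every column is read as a string,
        // so each value is parsed explicitly.
        return modelDS.toJavaRDD().map(row -> {
            double label = Double.parseDouble(row.getString(0));
            double[] features = new double[row.size() - 1];
            for (int i = 1; i < row.size(); i++) {
                features[i - 1] = Double.parseDouble(row.getString(i));
            }
            return new LabeledPoint(label, Vectors.dense(features));
        });
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("header-option-sketch")
                .master("local[*]")
                .getOrCreate();

        // args[0] is the path to the CSV file.
        JavaRDD<LabeledPoint> points = loadLabeledPoints(spark, args[0]);
        System.out.println("Number of labeled points: " + points.count());

        spark.stop();
    }
}
```

Because the header never becomes a data row, Steps 2 and 3 from the original message are no longer needed. The column names are still available through modelDS.columns().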