2.2.0 is the default Hadoop version Spark uses if a specific version is not given at build time. spark-csv uses spark-packages to "link" with Spark; ideally it would not care about any particular Hadoop version, and ideally spark-csv would not have that Hadoop import at all. Your workaround may lead to trouble because spark-csv would then include Hadoop in its assembly, and you would end up with duplicate Hadoop client code when you use that spark-csv assembly jar on a Spark cluster.
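
If you do need to build against a newer Hadoop client locally, a safer shape for the dependency is to mark it "provided" so it never ends up in the assembly. A minimal sketch (hypothetical build.sbt fragment, not the actual spark-csv build file; the version numbers are only examples, use whatever matches your cluster):

// Hypothetical build.sbt sketch: depend on the Hadoop client that matches
// your cluster, but scope it "provided" so sbt-assembly leaves it out of
// the assembly jar; the Spark cluster supplies Hadoop classes at runtime.
libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"    % "1.4.1" % "provided",
  "org.apache.hadoop" %  "hadoop-client" % "2.6.0" % "provided"
)

That way you compile against 2.6.0 APIs without shipping a second copy of the Hadoop client inside the spark-csv jar.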
On Wed, Aug 19, 2015 at 10:53 PM, Gil Vernik <g...@il.ibm.com> wrote:
> It shouldn't?
> This one, com.databricks.spark.csv.util.TextFile, has hadoop imports.
>
> I figured out that the answer to my question is just to add
> libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0".
> But I still wonder where this 2.2.0 default comes from.
>
>
> From: Mohit Jaggi <mohitja...@gmail.com>
> To: Gil Vernik/Haifa/IBM@IBMIL
> Cc: Dev <dev@spark.apache.org>
> Date: 19/08/2015 21:47
> Subject: Re: [spark-csv] how to build with Hadoop 2.6.0?
> ------------------------------
>
> spark-csv should not depend on hadoop
>
> On Sun, Aug 16, 2015 at 9:05 AM, Gil Vernik <g...@il.ibm.com> wrote:
> I would like to build spark-csv with Hadoop 2.6.0.
> I noticed that when I build it with sbt/sbt ++2.10.4 package, it builds it
> with Hadoop 2.2.0 (at least this is what I saw in the .ivy2 repository).
>
> How do I define 2.6.0 during the spark-csv build? By the way, is it possible to
> build spark-csv using a Maven repository?
>
> Thanks,
> Gil.