2.2.0 is the default Hadoop version Spark uses if a specific version is not given at build time. spark-csv uses spark-packages to "link" with Spark; ideally it would not care about any particular Hadoop version, and ideally spark-csv would not have that Hadoop import at all. Your workaround may lead to trouble because spark-csv would then include Hadoop in its assembly, and you would end up with duplicate Hadoop client code when you use that spark-csv assembly jar on a Spark cluster.
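
If you do need to build against a newer Hadoop client locally, a safer shape for the dependency is to mark it "provided" so it never ends up in the assembly. A minimal sketch (hypothetical build.sbt fragment, not the actual spark-csv build file; the version numbers are only examples, use whatever matches your cluster):

// Hypothetical build.sbt sketch: depend on the Hadoop client that matches
// your cluster, but scope it "provided" so sbt-assembly leaves it out of
// the assembly jar; the Spark cluster supplies Hadoop classes at runtime.
libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"    % "1.4.1" % "provided",
  "org.apache.hadoop" %  "hadoop-client" % "2.6.0" % "provided"
)

That way you compile against 2.6.0 APIs without shipping a second copy of the Hadoop client inside the spark-csv jar.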
On Wed, Aug 19, 2015 at 10:53 PM, Gil Vernik <g...@il.ibm.com> wrote:
> It shouldn't?
> This one, com.databricks.spark.csv.util.TextFile, has hadoop imports.
>
> I figured out that the answer to my question is just to add
> libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0".
> But I still wonder where this 2.2.0 default comes from.
>
>
> From: Mohit Jaggi <mohitja...@gmail.com>
> To: Gil Vernik/Haifa/IBM@IBMIL
> Cc: Dev <dev@spark.apache.org>
> Date: 19/08/2015 21:47
> Subject: Re: [spark-csv] how to build with Hadoop 2.6.0?
> ------------------------------
>
> spark-csv should not depend on hadoop
>
> On Sun, Aug 16, 2015 at 9:05 AM, Gil Vernik <g...@il.ibm.com> wrote:
> I would like to build spark-csv with Hadoop 2.6.0.
> I noticed that when I build it with sbt/sbt ++2.10.4 package, it builds it
> with Hadoop 2.2.0 (at least this is what I saw in the .ivy2 repository).
>
> How do I define 2.6.0 during the spark-csv build? By the way, is it possible to
> build spark-csv using a Maven repository?
>
> Thanks,
> Gil.