Hello,

I downloaded the spark-2.4.5 source from
https://mirrors.ocf.berkeley.edu/apache/spark/spark-2.4.5/spark-2.4.5.tgz
After extracting it, I ran:

./dev/make-distribution.sh --name custom-spark --pip --r --tgz \
  -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn \
  -Pkubernetes


The build creates a Spark binary distribution named:
spark-2.4.5-bin-custom-spark.tgz

So this file is supposedly a ready-to-distribute Spark binary file like the
one you can download from
http://mirror.metrocast.net/apache/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz

However, one big difference between this custom build and the official
build is that the custom build has no LICENSE file. I don't know much
about the Apache license, but I would expect a custom build distribution
to include one.

The file is missing because of the following code in
make-distribution.sh:
[image: image.png]

The official spark-2.4.5.tgz source archive contains no LICENSE-binary
file, so the custom build ends up with no LICENSE file.
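
To make the effect concrete, here is a minimal sketch of a conditional
copy of the kind described above. It is not the actual code from
make-distribution.sh: SRC and DIST are hypothetical stand-ins for the
Spark source tree and the distribution directory, and the temporary
directories only imitate a source tree that has LICENSE but no
LICENSE-binary.

```shell
# Sketch only; SRC and DIST are hypothetical names, not taken from
# make-distribution.sh.
set -e

SRC="$(mktemp -d)"
DIST="$(mktemp -d)"

# Imitate the 2.4.5 source archive: LICENSE is present, LICENSE-binary is not.
echo "source license text" > "$SRC/LICENSE"

# The license is copied only when LICENSE-binary exists; otherwise the
# step is skipped without failing the build.
if [ -e "$SRC/LICENSE-binary" ]; then
  cp "$SRC/LICENSE-binary" "$DIST/LICENSE"
  RESULT="copied"
else
  RESULT="skipped"
fi

echo "license step: $RESULT"
```

With this layout the copy step is skipped and the distribution directory
is left without a LICENSE file, which matches what I see in the custom
build.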

I am aware of two pull requests related to this:

https://github.com/apache/spark/pull/22436
which started to use LICENSE-binary instead of just the LICENSE,

and
https://github.com/apache/spark/pull/22840
which avoids a failure when there is no LICENSE-binary in the spark-2.4.5
source directory.

I think we need to change make-distribution.sh to make sure that the
LICENSE file is copied over to its corresponding custom build distribution.
However, I am not ready to do a pull request, so hopefully we can discuss
it here first.
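
As a starting point for that discussion, here is a rough sketch of one
possible behaviour: prefer LICENSE-binary, and fall back to the plain
LICENSE when it is absent. The function name copy_license and the SRC
and DIST variables are hypothetical and not part of make-distribution.sh,
and whether the source LICENSE is the right file to ship in a binary
distribution is exactly the question I would like to settle.

```shell
# Sketch only; copy_license, SRC and DIST are hypothetical names.
set -e

# copy_license SRC_DIR DIST_DIR
# Copies LICENSE-binary if present, otherwise falls back to LICENSE.
# Returns non-zero when neither file exists.
copy_license() {
  src_dir="$1"
  dist_dir="$2"
  if [ -e "$src_dir/LICENSE-binary" ]; then
    cp "$src_dir/LICENSE-binary" "$dist_dir/LICENSE"
  elif [ -e "$src_dir/LICENSE" ]; then
    cp "$src_dir/LICENSE" "$dist_dir/LICENSE"
  else
    echo "No LICENSE or LICENSE-binary found in $src_dir" >&2
    return 1
  fi
}

# Try it on a temporary tree that has only the plain LICENSE file.
SRC="$(mktemp -d)"
DIST="$(mktemp -d)"
echo "source license text" > "$SRC/LICENSE"

copy_license "$SRC" "$DIST"
cat "$DIST/LICENSE"
```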
-- 
Sincerely
Xiangyu Li

<yisky...@gmail.com>
