We made some progress parallelizing our Python code using Beam on Spark.
Following your advice, we are using Spark 3.2.1.
The Spark server and worker are connected OK.
On a third machine, the client machine, I am running the Docker job server:
$ sudo docker run --net=host apache/beam_spark_job_server:
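For reference, a complete invocation of the job server, together with the
Python submission against it, looks roughly like the sketch below; the image
tag, the master URL and the script name are placeholders rather than our exact
values:

# Job server on the client machine, pointed at the Spark main node
# (the image tag should match the Beam SDK version in use).
$ sudo docker run --net=host apache/beam_spark_job_server:<beam-version> \
    --spark-master-url=spark://spark-main:7077

# Python pipeline submitted from the client against the job server,
# which listens on port 8099 by default.
$ python my_pipeline.py \
    --runner=PortableRunner \
    --job_endpoint=localhost:8099 \
    --environment_type=LOOPBACK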
> On 28 Mar 2022, at 20:58, Mihai Alexe wrote:
>
> the jackson runtime dependencies should be updated manually (at least to
> 2.9.2) in case of using Spark 2.x
>
> Yes, that is exactly what we are looking to achieve. Any hints about how to
> do that? We're not Java experts. Do you happen to have a CI recipe or a
> binary list for this particular configuration?
> the jackson runtime dependencies should be updated manually (at least to
> 2.9.2) in case of using Spark 2.x

Yes, that is exactly what we are looking to achieve. Any hints about how to do
that? We're not Java experts. Do you happen to have a CI recipe or a binary
list for this particular configuration?
Well, it is caused by the recent Jackson version update in Beam [1], so the
Jackson runtime dependencies should be updated manually (at least to 2.9.2) if
you are using Spark 2.x.
Alternatively, use Spark 3.x if possible, since it already provides Jackson
jars of version 2.10.0.
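As a rough illustration only, on a standalone Spark 2.x installation that
manual update could amount to swapping the bundled Jackson jars under
$SPARK_HOME/jars for newer builds from Maven Central, for example:

# Sketch only: the exact bundled Jackson version depends on the Spark
# release, and related modules (jackson-annotations, jackson-module-scala, ...)
# may need matching versions as well.
$ cd $SPARK_HOME/jars
$ rm jackson-core-2.6.*.jar jackson-databind-2.6.*.jar
$ wget https://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-core/2.9.2/jackson-core-2.9.2.jar
$ wget https://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.9.2/jackson-databind-2.9.2.jar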
[1]
https://github.
Greetings,
We are setting up an Apache Beam cluster using Spark as a backend to run
Python code. This is currently a toy example with 4 virtual machines running
CentOS (a client, a Spark main node, and two Spark workers).
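Roughly, the standalone cluster is brought up along these lines (the Spark
install path and master hostname are placeholders; the script names are the
Spark 3.x ones, older releases use start-slave.sh instead):

# On the Spark main node:
$ $SPARK_HOME/sbin/start-master.sh

# On each of the two worker machines, pointing at the main node:
$ $SPARK_HOME/sbin/start-worker.sh spark://spark-main:7077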
We are running into version issues (details below) and would need help on
which versions to use.