Hello,
First, make sure that Java is installed, and install it if it is
not. Then install Scala and a build tool (sbt or Maven). In my
opinion, IntelliJ IDEA is a good option for developing your Spark
applications. Finally, you have to install a distributed
file system
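
As a rough illustration of the first steps above, the following Python sketch checks whether the mentioned tools can be found on the PATH. The executable names (`java`, `scala`, `sbt`, `mvn`) and the helper `missing_prerequisites` are assumptions made for this example, not something the message specifies, and finding a tool on the PATH says nothing about whether its version is suitable.

```python
import shutil

# Assumed executable names for the tools named in the message above.
REQUIRED = ["java", "scala"]
BUILD_TOOLS = ["sbt", "mvn"]  # either one is enough


def missing_prerequisites(which=shutil.which):
    """Return the names of prerequisites that cannot be found on PATH.

    `which` is injectable so the check can be exercised without touching
    the real environment; by default it is shutil.which.
    """
    missing = [tool for tool in REQUIRED if which(tool) is None]
    # Only one build tool is needed, so report a gap only if both are absent.
    if all(which(tool) is None for tool in BUILD_TOOLS):
        missing.append("sbt or mvn")
    return missing


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Java, Scala and a build tool were found on PATH.")
```

Running the script prints which of the assumed tools are missing, if any.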
Dear list,
I am very new to Spark, and I am having trouble installing it on my Mac. I have
the following questions; please give me some guidance. Thank you very much.
1. Which software, and how much of it, should I install before installing Spark? I have
been searching online, people discussing their expe
Hi,
I am curious how records are assigned to tasks, since, as you can see in
the photo below, one specific executor has more tasks than
the others.
The setup is this:
- Spark version 2.3.1
- The Spark Streaming job runs on Spark Standalone with the following
configuration:
-
Hi Magnus,
Yes, I was also thinking about a partitioning approach, and I think it is
the best solution in this type of scenario.
My scenario also matches your last paragraph: the dates that
come in are very random. I can get updates from 2012 and from 2019.
Therefore, this strategy mi
Hi all
sorry, tl;dr
I'm working on my first Python Spark Structured Streaming app, which ultimately
joins messages from ~10 different Kafka topics. I recently upgraded to Spark
2.4.3, which has resolved all the issues with time handling (watermarks,
join windows) that I had before with Spark 2.3.2.
My c