Personally I like Jupyter notebooks for my interactive work and then once
I’ve done my exploration I switch back to emacs with either scala-metals or
Python mode.
I think the main takeaway is: do what feels best for you, there is no one
true way to develop in Spark.
On Fri, Oct 1, 2021 at 1:28 AM
> With IntelliJ you are OK with Spark & Scala.
Also, IntelliJ has a nice Python plugin that turns it into PyCharm.
On Thu Sep 30, 2021 at 1:57 PM CEST, Jeff Zhang wrote:
> IIRC, you want an IDE for pyspark on yarn ?
>
> Mich Talebzadeh wrote on Thu, Sep 30, 2021 at 7:00 PM:
>
> > Hi,
> >
> > This may look lik
Are you looking for
https://spark.apache.org/docs/latest/ml-features.html#interaction ? That's
the closest built-in thing I can think of. Otherwise you can write custom
transformations.
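As I understand it, Spark's Interaction transformer takes vector- or double-valued columns and outputs a vector containing the products of all combinations of one element from each input column. A minimal plain-Python sketch of that semantics (the function name `interact` is mine, not Spark's API):

```python
from itertools import product
from functools import reduce

def interact(*cols):
    """Cross several feature columns by multiplying every combination
    of one element taken from each column (the Interaction semantics)."""
    # Treat scalar columns as one-element vectors, as Spark does.
    vecs = [c if isinstance(c, (list, tuple)) else [c] for c in cols]
    return [reduce(lambda a, b: a * b, combo) for combo in product(*vecs)]

# Crossing a 2-element vector, a scalar, and another 2-element vector:
print(interact([1.0, 2.0], 3.0, [4.0, 5.0]))
# -> [12.0, 15.0, 24.0, 30.0]
```

The output dimension is the product of the input dimensions, which is why crossing many wide columns usually gets paired with feature hashing afterwards.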
On Fri, Oct 1, 2021, 8:44 AM David Diebold wrote:
> Hello everyone,
>
> In MLLib, I’m trying to rely essential
Hello everyone,
In MLLib, I’m trying to rely essentially on pipelines to create features
out of the Titanic dataset, and showcase the power of feature hashing. I
want to:
- Apply bucketization on some columns (QuantileDiscretizer is fine)
- Then I want to cross all my columns
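For what it's worth, the hashing trick behind this kind of pipeline can be sketched in plain Python: hash each (column, value) pair, including crossed categorical features, into a fixed-size vector. The column names, the crossed feature, and the bucket count below are made-up illustrations, not from the Titanic schema:

```python
import hashlib

def hash_features(row, num_buckets=16):
    """Hashing trick: map (column, value) pairs -- including crossed
    categorical features -- into a fixed-size vector."""
    vec = [0.0] * num_buckets
    for col, val in row.items():
        # Hash the column/value pair to a bucket index; collisions are
        # the price paid for a fixed dimensionality.
        digest = hashlib.md5(f"{col}={val}".encode()).hexdigest()
        vec[int(digest, 16) % num_buckets] += 1.0
    return vec

# Hypothetical Titanic-style row with a crossed feature Sex x Pclass:
row = {"Sex": "female", "Pclass": 1, "Sex_x_Pclass": "female_1"}
print(hash_features(row))
```

Spark's own FeatureHasher does essentially this at scale; the point of the sketch is just that the output dimensionality stays fixed no matter how many crossed columns you feed in.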
Thanks guys for your comments.
I agree with you, Florian, that opening a terminal, say in VSC, lets you
run a shell script (an sh file) to submit your Spark code. However, this
really only makes sense if your IDE is running on a Linux host submitting a
job to a Kubernetes or YARN cluster.
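As a rough illustration of the kind of sh file meant here, a submit script might look like the following; the master, file names, and options are placeholders, not something from this thread:

```shell
#!/bin/sh
# Hypothetical submit script -- adjust master, deploy mode, and
# file names for your own cluster.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --py-files deps.zip \
  my_job.py
```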
For