Re: Choice of IDE for Spark

2021-10-01 Thread Holden Karau
Personally, I like Jupyter notebooks for my interactive work, and once I’ve done my exploration I switch back to Emacs with either scala-metals or Python mode. I think the main takeaway is: do what feels best for you; there is no one true way to develop in Spark. On Fri, Oct 1, 2021 at 1:28 AM

Re: Choice of IDE for Spark

2021-10-01 Thread Nicolas Paris
> With IntelliJ you are OK with Spark & Scala. IntelliJ also has a nice Python plugin that turns it into PyCharm. On Thu Sep 30, 2021 at 1:57 PM CEST, Jeff Zhang wrote: > IIRC, you want an IDE for PySpark on YARN? > > On Thu, Sep 30, 2021 at 7:00 PM, Mich Talebzadeh wrote: > > > Hi, > > > > This may look lik

Re: Trying to hash cross features with mllib

2021-10-01 Thread Sean Owen
Are you looking for https://spark.apache.org/docs/latest/ml-features.html#interaction ? That's the closest built-in thing I can think of. Otherwise you can write custom transformations. On Fri, Oct 1, 2021, 8:44 AM David Diebold wrote: > Hello everyone, > > In MLLib, I’m trying to rely essential

Trying to hash cross features with mllib

2021-10-01 Thread David Diebold
Hello everyone, In MLlib, I’m trying to rely essentially on pipelines to create features out of the Titanic dataset, and showcase the power of feature hashing. I want to: - Apply bucketization on some columns (QuantileDiscretizer is fine) - Then I want to cross all my columns
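
For readers skimming the archive, here is a rough sketch of the general idea of hashing crossed features, written in plain Python rather than against the MLlib API. It is not the poster's pipeline: the column names, the bucket boundaries, the vector size, and the helper names `bucket_index` and `hash_crossed_features` are all hypothetical, and the boundaries are hand-picked rather than computed from quantiles as QuantileDiscretizer would do.

```python
import hashlib
from itertools import combinations


def bucket_index(value, boundaries):
    """Return the index of the bucket `value` falls into (boundaries sorted ascending)."""
    return sum(value >= b for b in boundaries)


def hash_crossed_features(row, num_features=16):
    """Hash every pairwise cross of the row's columns into a fixed-size count vector."""
    vector = [0] * num_features
    # Sort by column name so the same row always produces the same crosses.
    for (col_a, val_a), (col_b, val_b) in combinations(sorted(row.items()), 2):
        token = f"{col_a}={val_a}&{col_b}={val_b}"
        # md5 is used only as a stable hash; Python's built-in hash() is salted per process.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % num_features
        vector[index] += 1
    return vector


# Hypothetical Titanic-like row; the age boundaries are made up for illustration.
row = {
    "age_bucket": bucket_index(29.0, [18.0, 35.0, 60.0]),
    "pclass": 3,
    "sex": "female",
}
vector = hash_crossed_features(row)
print(vector)
```

Three columns give three pairwise crosses, so the counts in the vector sum to three. Distinct crosses can land in the same slot, which is the usual trade-off of feature hashing: a larger `num_features` lowers the collision rate at the cost of a wider vector.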

Re: Choice of IDE for Spark

2021-10-01 Thread Mich Talebzadeh
Thanks, guys, for your comments. I agree with you, Florian, that opening a terminal in, say, VSC allows you to run a shell script (an sh file) to submit your Spark code. However, this really makes sense if your IDE is running on a Linux host submitting a job to a Kubernetes cluster or YARN cluster. For