We use JupyterHub on Hadoop (https://jupyterhub-on-hadoop.readthedocs.io/en/latest/) 
to develop Spark jobs directly inside the cluster they will run on.




That gives you direct access to YARN and HDFS (fully secured) without any 
migration steps.




You can control the size of your Jupyter YARN container and, of course, of your 
Spark session.
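
As a rough illustration of sizing the Spark session from a notebook, here is a minimal PySpark sketch. The helper names (`yarn_session_conf`, `build_session`) and the default values are made up for this example; only the `spark.*` property names are standard Spark settings. The size of the notebook container itself is set on the JupyterHub side and is not shown here.

```python
def yarn_session_conf(executors=2, cores=2, executor_memory="4g"):
    """Build Spark properties that bound a session's footprint on YARN.

    The defaults are illustrative assumptions, not recommendations.
    """
    return {
        "spark.master": "yarn",
        "spark.executor.instances": str(executors),
        "spark.executor.cores": str(cores),
        "spark.executor.memory": executor_memory,
    }


def build_session(app_name, conf):
    """Create a SparkSession from the given properties (requires pyspark)."""
    # Imported lazily so yarn_session_conf() can be used without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

In a notebook cell this would be used as `spark = build_session("exploration", yarn_session_conf(executors=4))`. If dynamic allocation is enabled on the cluster, `spark.executor.instances` only acts as the initial executor count.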







Regards,

Christian











\-------- Original Message --------
On 2 Oct 2021, 01:21, Holden Karau wrote:

>
>
>
> Personally I like Jupyter notebooks for my interactive work and then once 
> I’ve done my exploration I switch back to emacs with either scala-metals or 
> Python mode.
>
>
>
>
> I think the main takeaway is: do what feels best for you, there is no one 
> true way to develop in Spark.
>
>
>
>
> On Fri, Oct 1, 2021 at 1:28 AM Mich Talebzadeh 
> <[mich.talebza...@gmail.com][mich.talebzadeh_gmail.com]> wrote:
>
>
> > Thanks guys for your comments.
> >
> >
> >
> >
> > I agree with you, Florian, that opening a terminal, say in VSC, allows you to 
> > run a shell script (an sh file) to submit your Spark code. However, this 
> > really makes sense if your IDE is running on a Linux host submitting a job 
> > to a Kubernetes or YARN cluster.
> >
> >
> >
> >
> > For Python, I will go with PyCharm, which is specific to the Python world. 
> > With Spark, I have used IntelliJ with the Spark plug-in on a Mac for development 
> > work. I then created a JAR file, tarred and gzipped the whole project, copied it 
> > with scp to an IBM sandbox, untarred it and ran it with a pre-prepared shell 
> > script with an environment plug-in for dev, test, staging etc.
> >
> >
> >
> >
> > An IDE is also useful for looking at CSV- or TSV-type files or converting JSON from 
> > one form to another. For JSON validation, especially if the file is 
> > large, you may be restricted from loading the file into a web-based JSON validator 
> > because of the risk of exposing proprietary data. There is a tool 
> > called [jq][jq] (a lightweight and flexible command-line JSON processor) 
> > that comes in pretty handy for validating JSON. Download and install it on your OS and 
> > run it as
> >
> >
> >
> >
> > tar -xzOf <json\_file>.tgz \| jq .
> >
> >
> >
> >
> > That will validate the whole tarred and gzipped JSON file. Otherwise, most 
> > of these IDE tools come with add-on plugins for various needs. My 
> > preference would be to use the best available IDE for the job. I would 
> > consider VSC a general-purpose tool. If all else fails, one can always use OS 
> > tools like vi, vim, sed, awk etc 🤔
> >
> >
> >
> >
> >
> >
> >
> > Cheers
> >
> >
> >
> >
> > [View my LinkedIn profile][view my Linkedin profile]
> >
> > **Disclaimer:** Use it at your own risk. Any and all responsibility for any 
> > loss, damage or destruction of data or any other property which may arise 
> > from relying on this email's technical content is explicitly disclaimed. 
> > The author will in no case be liable for any monetary damages arising from 
> > such loss, damage or destruction.
> >
> >
> >
> >
> >
> >
> >
> > On Fri, 1 Oct 2021 at 06:55, Florian CASTELAIN 
> > <[florian.castel...@redlab.io][Florian.CASTELAIN_redlab.io]> wrote:
> >
> >
> > > Hello.
> > >
> > >
> > >
> > >
> > > Any "evolved" code editor allows you to create tasks (or builds, or 
> > > whatever they are called in the IDE you chose). If you do not find 
> > > anything that packages by default all you need, you could just create 
> > > your own tasks.
> > >
> > > *For yarn, one needs to open a terminal and submit from there.*
> > >
> > >
> > >
> > >
> > >
> > > You can create task(s) that launch your yarn commands.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > *With VSC, you get stuff for working with json files but I am not sure 
> > > with a plugin for Python*
> > >
> > >
> > >
> > >
> > >
> > > In your json task configuration, you can launch whatever you want: 
> > > python, shell. I bet you could launch your favorite video game (just make 
> > > a task called "let's have a break" 😉)
> > >
> > >
> > >
> > >
> > > Just to say, if you want everything exactly the way you want, I do not 
> > > think you will find an IDE that does it. You will have to customize it. 
> > > (correct me if wrong, of course).
> > >
> > >
> > >
> > >
> > > Have a good day.
> > >
> > >
> > >
> > >
> > > **Florian CASTELAIN**, Software Engineer
> > >
> > > [72 Rue de la République, 76140 Le Petit-Quevilly][72 Rue de la République_ 76140 Le Petit-Quevilly], 
> > > m: +33 616 530 226, e: [florian.castel...@redlab.io][florian.castelain_redlab.io], 
> > > w: [www.redlab.io][Banner_signature_299490615]
> > >
> > >
> > >
> > >
> > > --------------------
> > >
> > > **From:** Jeff Zhang <[zjf...@gmail.com][zjffdu_gmail.com]>
> > > **Sent:** Thursday, 30 September 2021 13:57
> > > **To:** Mich Talebzadeh 
> > > <[mich.talebza...@gmail.com][mich.talebzadeh_gmail.com]>
> > > **Cc:** user @spark <[user@spark.apache.org][user_spark.apache.org]>
> > > **Subject:** Re: Choice of IDE for Spark
> > >
> > > IIRC, you want an IDE for PySpark on YARN?
> > >
> > >
> > >
> > >
> > > On Thu, 30 Sep 2021 at 7:00 PM, Mich Talebzadeh 
> > > <[mich.talebza...@gmail.com][mich.talebzadeh_gmail.com]> wrote:
> > >
> > >
> > > > Hi,
> > > >
> > > >
> > > >
> > > >
> > > > This may look like a redundant question, but it comes about because of 
> > > > the advent of cloud workstation usage like Amazon WorkSpaces and others.
> > > >
> > > >
> > > >
> > > >
> > > > With IntelliJ you are OK with Spark & Scala. With PyCharm you are fine 
> > > > with PySpark and the virtual environment. Mind you as far as I know 
> > > > PyCharm only executes spark-submit in local mode. For yarn, one needs 
> > > > to open a terminal and submit from there.
> > > >
> > > >
> > > >
> > > >
> > > > However, on an Amazon workstation, you get [Visual Studio Code][Visual Studio Code] 
> > > > (VSC, an MS product) and OpenOffice installed. With VSC, 
> > > > you get stuff for working with JSON files, but I am not sure whether, with a 
> > > > plugin for Python etc., it will be as good as PyCharm. Has anyone used 
> > > > VSC in anger for Spark and, if so, what is the experience?
> > > >
> > > >
> > > >
> > > >
> > > > Thanks
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > \--
> > >
> > >
> > > Best Regards
> > >
> > > Jeff Zhang
>
> \--
>
>
> Twitter: https://twitter.com/holdenkarau
>
>
> Books (Learning Spark, High Performance Spark, etc.): 
> [https://amzn.to/2MaRAG9][https_amzn.to_2MaRAG9]
>
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau

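
For the JSON check discussed in the quoted thread (validating a tarred and gzipped JSON file locally rather than uploading it to a web validator), a standard-library Python sketch is one possible alternative where jq is not installed. The function name `validate_json_members` is invented for this example, and each archive member is read fully into memory, so this is only an approximation for moderately sized files.

```python
import json
import tarfile


def validate_json_members(tgz_path):
    """Map each regular file in a .tgz archive to None (valid JSON) or an error message."""
    results = {}
    with tarfile.open(tgz_path, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links and other special entries
            with archive.extractfile(member) as handle:
                try:
                    # json.loads accepts UTF-8/16/32 encoded bytes directly.
                    json.loads(handle.read())
                    results[member.name] = None
                except ValueError as exc:  # covers JSON syntax and decoding errors
                    results[member.name] = str(exc)
    return results
```

Calling `validate_json_members("data.tgz")` (a placeholder file name) returns a dictionary in which any non-`None` value describes the first problem found in that member.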

[mich.talebzadeh_gmail.com]: mailto:mich.talebza...@gmail.com
[jq]: https://stedolan.github.io/jq/
[uc_export_download_id_1-q7RFGRfLMObPuQPWSd9sl_H1UPNFaIZ_revid_0B1BiUVX33unjMWtVUWpINWFCd0ZQTlhTRHpGckh4Wlg4RG80PQ]:
 
https://docs.google.com/uc?export=download&id=1-q7RFGRfLMObPuQPWSd9sl_H1UPNFaIZ&revid=0B1BiUVX33unjMWtVUWpINWFCd0ZQTlhTRHpGckh4Wlg4RG80PQ
[view my Linkedin profile]: 
https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/
[Florian.CASTELAIN_redlab.io]: mailto:florian.castel...@redlab.io
[signature_299490615]:
[signature_299490615_signature_299490615]: https://www.neuroo.ai/
[Banner_signature_299490615]: http://www.redlab.io/
[72 Rue de la République_ 76140 Le Petit-Quevilly]: 
https://www.google.com/maps/search/72+Rue+de+la+R%C3%A9publique,+76140+Le+Petit-Quevilly?entry=gmail&source=g
[florian.castelain_redlab.io]: mailto:florian.castel...@redlab.io
[zjffdu_gmail.com]: mailto:zjf...@gmail.com
[user_spark.apache.org]: mailto:user@spark.apache.org
[Visual Studio Code]: https://code.visualstudio.com/
[https_amzn.to_2MaRAG9]: https://amzn.to/2MaRAG9
