Hi, I believe both Ayan and Jayesh have made valuable points. From my experience, companies that traditionally have an in-house Python team would like to add Spark capabilities to their inventory, not least because:
1. There is more Python code around that will benefit from modern technologies like Spark, which adds parallel processing capabilities.

2. People are more familiar with Python than, say, Scala. It is more cost-effective to leverage existing skills than to retrain in Scala.

3. There is a lot of Python code used with PySpark, and not necessarily with data science involvement; it is used for general analytics, ETL and so on.

4. The rise of data science has given more impetus to Python, hence Databricks etc. are now keen on pushing Python as well. Note that not all functionality available in Scala is available in Python.

5. If you want to learn Python and PySpark, then you need to put effort into learning the Python language itself. Once you have done so, it is best to learn the extensive Python libraries such as Pandas. However, you ought to aim to be a Python practitioner even without PySpark.

6. Check the web. There are excellent free courses on Python.

HTH

Mich

view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

On Wed, 14 Apr 2021 at 21:05, Lalwani, Jayesh <jlalw...@amazon.com.invalid> wrote:

> There is no good answer to the question "Have I learnt enough". You can
> never learn enough. You have to constantly learn.
>
> Practically, if you want to make a career out of using technology XYZ, you
> only need to learn enough XYZ to get a job doing XYZ. Once you get a job
> doing XYZ, other people are paying you to learn more XYZ. If you want to know
> whether you know enough Python to do PySpark, look at what companies are asking
> for. Go for interviews.
> Just speaking from experience, most jobs that call for Python + Spark tend
> to be data science jobs. These jobs also require you to have a data science
> background.
>
> *From:* ayan guha <guha.a...@gmail.com>
> *Date:* Wednesday, April 14, 2021 at 3:55 PM
> *To:* "ashok34...@yahoo.com.INVALID" <ashok34...@yahoo.com.invalid>
> *Cc:* "user@spark.apache.org" <user@spark.apache.org>
> *Subject:* RE: [EXTERNAL] Python level of knowledge for Spark and PySpark
>
> The answer is always "it depends". At the outset it seems you are in
> pretty good shape and have all the key skills you need. All I can suggest
> is to take advantage of the inherent benefits of the language and hone your
> coding practices.
>
> On Thu, 15 Apr 2021 at 2:25 am, ashok34...@yahoo.com.INVALID
> <ashok34...@yahoo.com.invalid> wrote:
>
> Hi gurus,
>
> I have knowledge of Java and Scala, and good enough knowledge of Spark, Spark
> SQL and Spark functional programming with Scala.
>
> I have started using Python with Spark (PySpark).
>
> I am wondering, in order to be proficient in PySpark, how much knowledge
> of Python programming is needed? I know the answer may be "very good
> knowledge", but in practice how much is good enough? I can write Python in an
> IDE like PyCharm, similar to the way Scala works, and can run the programs.
> Is expert knowledge of Python a prerequisite for PySpark? I also know
> Pandas and am familiar with plotting routines like matplotlib.
>
> Warmest
>
> Ashok
>
> --
>
> Best Regards,
> Ayan Guha
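
PS. To make point 5 concrete: much of the Python you need for PySpark is the plain-Python functional style (lambdas, filter/map, comprehensions), which transfers directly to Spark transformations. A minimal sketch in pure Python, no Spark installation needed (the sample records are hypothetical):

```python
# Hypothetical order records, the kind of data you might later load into Spark.
orders = [
    {"id": 1, "amount": 120.0, "country": "UK"},
    {"id": 2, "amount": 35.5,  "country": "DE"},
    {"id": 3, "amount": 210.0, "country": "UK"},
]

# Filter then aggregate with a generator expression, the same
# filter -> map -> reduce shape you chain on a Spark RDD or DataFrame.
uk_total = sum(o["amount"] for o in orders if o["country"] == "UK")
print(uk_total)  # 330.0
```

In PySpark the same pipeline would look roughly like `sc.parallelize(orders).filter(lambda o: o["country"] == "UK").map(lambda o: o["amount"]).sum()` — the idioms are identical, only the execution is distributed.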