On Dataproc the kafka-python package is not installed as standard. Use sudo su - to switch to root and install it with pip, as in my earlier reply (quoted below).
As root:

pip list|grep kafka

root@ctpcluster-m:~# pip install kafka-python
Collecting kafka-python
  Downloading kafka_python-2.0.2-py2.py3-none-any.whl (246 kB)
     |████████████████████████████████| 246 kB 22.0 MB/s
Installing collected packages: kafka-python
Successfully installed kafka-python-2.0.2

hduser@ctpcluster-m: /home/hduser> pip list|grep kafka
kafka-python 2.0.2

HTH

view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

On Fri, 18 Feb 2022 at 08:39, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Have you installed the correct package kafka-python?
>
> pip install kafka-python
> Collecting kafka-python
>   Downloading kafka_python-2.0.2-py2.py3-none-any.whl (246 kB)
>      |████████████████████████████████| 246 kB 1.9 MB/s
> Installing collected packages: kafka-python
> Successfully installed kafka-python-2.0.2
>
> pip list|grep kafka
> kafka-python 2.0.2
>
> python3
> Python 3.7.3 (default, Apr 3 2021, 20:42:31)
> [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from kafka import KafkaProducer
> >>>
>
> On Fri, 18 Feb 2022 at 07:45, karan alang <karan.al...@gmail.com> wrote:
>
>> Hello All,
>>
>> I have a GCP Dataproc cluster, and I'm running a Spark StructuredStreaming
>> job on it.
>> I'm trying to use KafkaProducer to push aggregated data into a Kafka
>> topic. However, when I import KafkaProducer (from kafka import
>> KafkaProducer), it gives the following error:
>>
>> ```
>> Traceback (most recent call last):
>>   File "/tmp/7e27e272e64b461dbdc2e5083dc23202/StructuredStreaming_GCP_Versa_Sase_gcloud.py", line 14, in <module>
>>     from kafka.producer import KafkaProducer
>>   File "/opt/conda/default/lib/python3.8/site-packages/kafka/__init__.py", line 23, in <module>
>>     from kafka.producer import KafkaProducer
>>   File "/opt/conda/default/lib/python3.8/site-packages/kafka/producer/__init__.py", line 4, in <module>
>>     from .simple import SimpleProducer
>>   File "/opt/conda/default/lib/python3.8/site-packages/kafka/producer/simple.py", line 54
>>     return '<SimpleProducer batch=%s>' % self.async
>> ```
>>
>> As part of the initialization actions, I'm installing the following:
>> ---
>>
>> pip install pypi
>> pip install kafka-python
>> pip install google-cloud-storage
>> pip install pandas
>>
>> ---
>>
>> Additional details on Stack Overflow:
>>
>> https://stackoverflow.com/questions/71169869/gcp-dataproc-getting-error-in-importing-kafkaproducer
>>
>> Any ideas on what needs to be done to fix this?
>> tia!
>>
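
A quick way to confirm which interpreter and which kafka-python version a job actually sees is a small check like the one below. This is only a sketch: describe_install is a hypothetical helper name, and it assumes Python 3.8 or later, where importlib.metadata is in the standard library. The traceback in the original message points at /opt/conda/default/lib/python3.8, so it may be worth running the check with that interpreter rather than the system one. The async_is_keyword field is included because async has been a reserved word since Python 3.7, which is why a line such as self.async fails to compile there.

```python
import keyword
import sys
from importlib import metadata  # standard library from Python 3.8 (assumed here)


def describe_install(dist_name="kafka-python"):
    """Report the running interpreter and the version of dist_name it can see.

    describe_install is a hypothetical helper, not part of kafka-python.
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # the distribution is not installed for this interpreter
    return {
        "interpreter": sys.executable,
        "distribution": dist_name,
        "version": version,
        # True on Python 3.7+, where an attribute named `async` is a syntax error
        "async_is_keyword": keyword.iskeyword("async"),
    }


if __name__ == "__main__":
    for key, value in describe_install().items():
        print(f"{key}: {value}")
```

If the version reported under the conda interpreter is missing or older than 2.0.2, installing with that interpreter's own pip (for example, python -m pip install kafka-python run with the same python) may be what is needed. This has not been verified on the cluster in question.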