Actually, it's JSON with a specific structure returned by the API server. The task is to constantly check whether new data has appeared on the API server and load it into Kafka.

The full pipeline can be presented like this: REST API -> Kafka -> some processing -> Kafka/Mongo -> …
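For the "check constantly and load it into Kafka" step, a plain Python poller outside Spark is often the simplest producer. A minimal sketch, assuming a hypothetical https://api.example.com/events endpoint with a "since" cursor parameter and the confluent-kafka client (kafka-python would look much the same):

import json
import time

import requests
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})
last_seen = None  # cursor/timestamp of the newest record already loaded

while True:
    # Ask the API only for records newer than the last one we loaded.
    # Endpoint and "since" parameter are assumptions; adjust to the real API.
    resp = requests.get("https://api.example.com/events",
                        params={"since": last_seen}, timeout=30)
    resp.raise_for_status()
    for record in resp.json():
        producer.produce("events", value=json.dumps(record).encode("utf-8"))
        last_seen = record.get("updated_at", last_seen)  # hypothetical field
    producer.flush()
    time.sleep(10)  # poll interval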
Best regards,
Stanislav Porotikov

From: Mich Talebzadeh <mich.talebza...@gmail.com>
Sent: Wednesday, December 27, 2023 6:17 PM
To: Поротиков Станислав Вячеславович <s.poroti...@skbkontur.ru.invalid>
Cc: user@spark.apache.org
Subject: Re: Pyspark UDF as a data source for streaming

Ok, so you want to generate some random data and load it into Kafka on a regular interval, and then the rest?

HTH

Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London, United Kingdom

view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
https://en.everybodywiki.com/Mich_Talebzadeh

On Wed, 27 Dec 2023 at 12:16, Поротиков Станислав Вячеславович <s.poroti...@skbkontur.ru.invalid> wrote:

Hello!

Is it possible to have a PySpark UDF generate data for a streaming dataframe? I want to fetch data from REST API requests in real time, save it to a dataframe, and then put it into Kafka. I can't work out how to create a streaming dataframe from generated data. I am new to Spark Streaming. Could you give me some hints?

Best regards,
Stanislav Porotikov
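Once something like the poller above has landed the JSON in Kafka, the Spark side of the pipeline is a standard Structured Streaming job rather than a UDF. A minimal sketch of the read-process-write leg, assuming a local broker, an "events" topic, and placeholder field names in the schema:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("rest-to-kafka-pipeline").getOrCreate()

# Schema of the JSON payload; field names here are placeholders.
schema = StructType([
    StructField("id", StringType()),
    StructField("payload", StringType()),
])

# Read the raw JSON that the poller wrote to the "events" topic.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "events")
       .load())

# Kafka delivers bytes; cast to string and parse with the schema above.
parsed = (raw
          .select(F.from_json(F.col("value").cast("string"), schema).alias("data"))
          .select("data.*"))

# ... some processing goes here ...

# Write back to another Kafka topic; the Kafka sink expects a "value" column.
query = (parsed.select(F.to_json(F.struct("*")).alias("value"))
         .writeStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")
         .option("topic", "events-processed")
         .option("checkpointLocation", "/tmp/checkpoints/rest-to-kafka")
         .start())

query.awaitTermination()

The processed rows are re-serialized with to_json because the Kafka sink only accepts a string or binary "value" column, and the checkpoint location is what lets the query resume from the right offsets after a restart.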