Hi Russell,
This is super helpful. Thank you so much.
Can you elaborate on the differences between Structured Streaming and
DStreams? How would requirements such as the number of receivers change?
On Sat, 8 Aug, 2020, 10:28 pm Russell Spitzer wrote:
> Note, none of this applies to the Direct streaming approach
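For concreteness, here is a minimal sketch (Scala, assuming Spark 3.x and placeholder broker/topic names) of the Structured Streaming Kafka source. Like the direct DStream approach Russell mentions, it is receiver-less: each Kafka topic partition is read directly by an executor task, so no receivers need to be provisioned at all, and parallelism follows the topic's partition count.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("structured-kafka-sketch")
  .getOrCreate()

// "broker:9092" and "events" are placeholders for this example.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()

// Kafka delivers key/value as binary; cast to strings for inspection.
val query = df
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("console")
  .start()

query.awaitTermination()
```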
Hello, Sir!
What about processing and grouping the data first, then writing the grouped
data to Kafka topics A and B? Another Spark application can then read topic A
or B and process it further, much as the term ETL implies.
TianlangStudio
Hi,
I have a scenario where a Kafka topic is being written to with different
types of JSON records.
I have to regroup the records based on their type, then fetch the matching
schema, parse the records, and write them out as Parquet.
I tried Structured Streaming first, but the dynamic schema is a constraint
there, so I have used DStreams instead.
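Since the thread mentions the DStream route, here is a rough sketch of it under stated assumptions: a direct Kafka stream, a top-level JSON "type" field, per-type output directories, and the schema inferred per micro-batch rather than fetched from a registry. Names and paths are placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val conf = new SparkConf().setAppName("dynamic-schema-sketch")
val ssc = new StreamingContext(conf, Seconds(30))
val spark = SparkSession.builder().config(conf).getOrCreate()
import spark.implicits._

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "dynamic-schema-sketch")

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent,
  Subscribe[String, String](Seq("mixed-events"), kafkaParams))

stream.map(_.value).foreachRDD { rdd =>
  if (!rdd.isEmpty) {
    // Infer the schema per micro-batch (it may drift between batches),
    // then write each record type to its own Parquet directory.
    val df = spark.read.json(rdd.toDS())
    df.select("type").distinct.collect.map(_.getString(0)).foreach { t =>
      df.filter($"type" === t).write.mode("append").parquet(s"/data/parquet/$t")
    }
  }
}

ssc.start()
ssc.awaitTermination()
```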
Hi All,
I have the following info in the data column.
<1000> date=2020-08-01 time=20:50:04 name=processing id=123 session=new
packt=20 orgin=null address=null dest=fgjglgl
Here I want to create a separate column for each of the above key=value
pairs that follow the integer <1000> and are separated by spaces.
Is there a straightforward way to do this?
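One possible approach (a sketch, not the only way): strip the leading <1000> token with regexp_replace, then use SQL's str_to_map to turn the space-separated key=value pairs into a map column you can project fields out of. Column and key names below come from the sample row.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("kv-columns-sketch").getOrCreate()
import spark.implicits._

val df = Seq(
  "<1000> date=2020-08-01 time=20:50:04 name=processing id=123 session=new " +
  "packt=20 orgin=null address=null dest=fgjglgl"
).toDF("data")

val parsed = df
  // Drop the leading "<1000>" marker, then split "k=v" pairs on spaces.
  .withColumn("body", trim(regexp_replace($"data", "^<\\d+>\\s*", "")))
  .withColumn("kv", expr("str_to_map(body, ' ', '=')"))

// Project whichever keys you need into ordinary columns.
parsed.select(
  $"kv"("date").as("date"),
  $"kv"("time").as("time"),
  $"kv"("name").as("name"),
  $"kv"("id").as("id")
).show(false)
```

If the key set varies from record to record, you could instead collect map_keys over a sample and build the select list dynamically.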