[ https://issues.apache.org/jira/browse/SPARK-21948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhaP524 updated SPARK-21948:
----------------------------
    Remaining Estimate: 12h
     Original Estimate: 12h

> How to use Spark Streaming to join two tables from one Kafka topic?
> --------------------------------------------------------------------
>
>                 Key: SPARK-21948
>                 URL: https://issues.apache.org/jira/browse/SPARK-21948
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Spark Submit, Structured Streaming
>    Affects Versions: 2.1.1
>         Environment: Kafka: 0.10.0
>                      Spark: 2.1.1
>            Reporter: zhaP524
>        Attachments: QQ图片20170908080946.png
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> I have the following requirement: from a single Kafka topic, consumed by one
> consumer group, I receive a DirectStream that carries the records of two
> tables, A (master) and B (slave). My goal is to join the two tables on some
> fields and write the result to a database.
>
> What I have tried so far:
>
> 1. Batch processing directly on the DirectStream: in each batch I filter out
> the A records and the B records, build a DataFrame from each, join them with
> Spark SQL, and write the result to the database. The problem is that A and B
> have a 1:N relationship, so a batch may contain all of an A row's data while
> only part of the matching B rows have arrived; the result computed for that
> batch is then incomplete. When the remaining B rows arrive in the next batch,
> the matching A rows have already been consumed, so those B rows have nothing
> to join against and data is lost. Perhaps the A rows could be buffered (in a
> queue, say) until all of the matching B rows have been loaded, and the
> complete result produced only then; in essence that is similar to a Spark
> Streaming window operation. (A sketch of this per-batch join appears below.)
>
> 2. Spark Streaming window operations: I can split the A and B records out of
> the DirectStream, but the join has to be carried out inside the window, and
> applying transformations such as map/transform there makes the job fail with:
>
> java.io.NotSerializableException: org.apache.kafka.clients.consumer.ConsumerRecord
> Serialization stack:
>     - object not serializable (class: org.apache.kafka.clients.consumer.ConsumerRecord,
>       value: ConsumerRecord(topic = doc, partition = 0, ...)
>
> (A common workaround, converting each ConsumerRecord to a serializable value
> before windowing, is sketched below as well.)
>
> I don't know what I should do. Is there a problem with my scenario, or is it
> a technical problem on my side? Is there a solution to this business scenario
> that does not introduce other components? Could someone point me in the right
> direction? TKS
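
Below is a minimal sketch of approach 1 (per-batch filter plus DataFrame join),
assuming Spark 2.1.1 with spark-streaming-kafka-0-10. The broker address, group
id, the pipe-delimited record layout with a leading table tag, and the
MasterA/SlaveB case classes are all hypothetical; only the topic name "doc"
comes from the exception in the issue itself. This sketch shows the per-batch
mechanics only; it does not fix the cross-batch 1:N problem described above.

{code:scala}
// Per-batch join sketch, assuming Spark 2.1.1 + spark-streaming-kafka-0-10.
// Record layout ("A|<id>|<payload>" / "B|<aId>|<detail>") is hypothetical.
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

case class MasterA(id: String, payload: String) // master-table row (hypothetical)
case class SlaveB(aId: String, detail: String)  // slave-table row (hypothetical)

object PerBatchJoin {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("PerBatchJoin").getOrCreate()
    val ssc   = new StreamingContext(spark.sparkContext, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",           // assumption
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "doc-group")                // assumption

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("doc"), kafkaParams))

    // Take the value out of each ConsumerRecord right away; the record object
    // itself is not serializable and must not flow into later stages.
    val lines = stream.map(_.value())

    lines.foreachRDD { rdd =>
      import spark.implicits._
      val fields = rdd.map(_.split('|'))
      // Split the mixed topic into the two tables by the leading tag field.
      val dfA = fields.filter(_(0) == "A").map(f => MasterA(f(1), f(2))).toDF()
      val dfB = fields.filter(_(0) == "B").map(f => SlaveB(f(1), f(2))).toDF()
      // Join this batch's A rows with this batch's B rows only; B rows whose
      // A row arrived in an earlier batch find no match -- the data-loss
      // problem described in the issue.
      val joined = dfA.join(dfB, dfA("id") === dfB("aId"))
      joined.show() // the real job would write to the database here, e.g. via JDBC
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
{code}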
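
On approach 2: the NotSerializableException is typically not the window
refusing map/transform. Window operations persist and shuffle their contents,
which forces the records to be serialized, and ConsumerRecord is not
serializable. The common workaround is to map each record to its serializable
value immediately after createDirectStream, before any window. A hedged sketch,
continuing from the `stream` and record layout assumed in the previous sketch
(the window and slide durations are placeholders, not a recommendation):

{code:scala}
// Convert each ConsumerRecord to a plain String *before* windowing; String is
// serializable, ConsumerRecord is not.
val safe = stream.map(_.value())

// Keep A rows visible long enough for late-arriving B rows to find them.
val aPairs = safe.filter(_.startsWith("A|"))
  .window(Seconds(60), Seconds(10))
  .map { line => val f = line.split('|'); (f(1), f(2)) }  // (id, payload)
val bPairs = safe.filter(_.startsWith("B|"))
  .window(Seconds(60), Seconds(10))
  .map { line => val f = line.split('|'); (f(1), f(2)) }  // (aId, detail)

// Pair-DStream join within the aligned windows.
val joined = aPairs.join(bPairs) // DStream[(id, (payload, detail))]
joined.print()                   // the real job would write to the database
{code}

Note that a window only bounds the problem: a B row arriving more than one
window length after its A row still joins nothing. Without introducing other
components, the remaining options in 2.1.1 would be larger windows or stateful
buffering of unmatched rows (e.g. mapWithState).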