[ https://issues.apache.org/jira/browse/SPARK-21948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhaP524 updated SPARK-21948:
----------------------------
    Remaining Estimate: 12h
     Original Estimate: 12h

> How to use Spark Streaming to process two tables from one Kafka topic?
> ----------------------------------------------------------------------
>
>                 Key: SPARK-21948
>                 URL: https://issues.apache.org/jira/browse/SPARK-21948
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Spark Submit, Structured Streaming
>    Affects Versions: 2.1.1
>         Environment: kafka:0.10.0
> Spark:2.1.1
>            Reporter: zhaP524
>         Attachments: QQ图片20170908080946.png
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> I have the following requirement: I receive data from a single Kafka topic 
> with a single consumer group, and the resulting DirectStream contains rows 
> belonging to two tables, A (master) and B (detail), which are in a 1:N 
> relationship. My goal is to join the two tables on some fields and write 
> the result to a database.
> What I have tried so far:
> 1. Process each micro-batch directly on the DirectStream: filter out the A 
> and B rows, build a DataFrame for each, join them with Spark SQL, and write 
> the result to the database (a sketch of this per-batch approach follows 
> after this list). The problem: because the relationship is 1:N, a batch may 
> contain an A row but only part of its B rows, so the join produces an 
> incomplete result; by the time the remaining B rows arrive in the next 
> batch, the matching A row has already been consumed, so those B rows have 
> nothing to join against and data is lost. I wonder whether I could buffer 
> the A rows (in a queue or similar) until all of their B rows have arrived, 
> and only then process them to produce the complete result; in essence this 
> is similar to a Spark Streaming window operation.
> 2. Use Spark Streaming window operations: I can split the DirectStream into 
> separate A and B streams, but the join has to be carried out inside the 
> window, and applying transformations such as map/transform to the windowed 
> stream fails with:
> java.io.NotSerializableException: 
> org.apache.kafka.clients.consumer.ConsumerRecord
> Serialization stack:
>       - object not serializable (class: 
> org.apache.kafka.clients.consumer.ConsumerRecord, value: ConsumerRecord(topic 
> = doc, partition = 0,...)
> I don't know what I should do here (a sketch of a possible workaround 
> appears at the end of this description).
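>
> For reference, a minimal sketch of the per-batch approach (approach 1) in 
> Scala against Spark 2.1.1 and the kafka-0-10 connector, e.g. pasted into 
> spark-shell. The comma-separated wire format ("A,id,name" / "B,aId,detail"), 
> the two schemas, the broker address, and the group id are assumptions for 
> illustration; only the topic name "doc" comes from the stack trace above:
>
>   import org.apache.kafka.common.serialization.StringDeserializer
>   import org.apache.spark.sql.SparkSession
>   import org.apache.spark.streaming.{Seconds, StreamingContext}
>   import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
>
>   case class A(id: String, name: String)     // assumed master schema
>   case class B(aId: String, detail: String)  // assumed detail schema
>
>   val spark = SparkSession.builder.appName("two-tables-one-topic").getOrCreate()
>   val ssc = new StreamingContext(spark.sparkContext, Seconds(10))
>
>   val kafkaParams = Map[String, Object](
>     "bootstrap.servers" -> "localhost:9092",   // assumed broker
>     "key.deserializer" -> classOf[StringDeserializer],
>     "value.deserializer" -> classOf[StringDeserializer],
>     "group.id" -> "doc-group",                 // assumed group id
>     "auto.offset.reset" -> "latest")
>
>   val stream = KafkaUtils.createDirectStream[String, String](
>     ssc, LocationStrategies.PreferConsistent,
>     ConsumerStrategies.Subscribe[String, String](Seq("doc"), kafkaParams))
>
>   stream.foreachRDD { rdd =>
>     import spark.implicits._
>     // project ConsumerRecord down to its value right away;
>     // ConsumerRecord itself is not serializable
>     val lines = rdd.map(_.value)
>     val dfA = lines.filter(_.startsWith("A,"))
>       .map { l => val f = l.split(","); A(f(1), f(2)) }.toDF()
>     val dfB = lines.filter(_.startsWith("B,"))
>       .map { l => val f = l.split(","); B(f(1), f(2)) }.toDF()
>     dfA.createOrReplaceTempView("a")
>     dfB.createOrReplaceTempView("b")
>     val joined = spark.sql(
>       "SELECT a.id, a.name, b.detail FROM a JOIN b ON a.id = b.aId")
>     joined.show()  // in practice, write to the database instead
>   }
>
>   ssc.start()
>   ssc.awaitTermination()
>
> Note that the very first map projects ConsumerRecord down to a plain 
> string; that same projection is what approach 2 is missing (see the second 
> sketch at the end).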
> So I am wondering: is there a problem with my scenario, or is this a 
> technical problem on my side? Is there a solution for this business 
> scenario that does not introduce additional components? Any pointers 
> would be appreciated, thanks.
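>
> As for the NotSerializableException in approach 2: ConsumerRecord is not 
> serializable, so it must be mapped to plain fields before any window or 
> transform is applied. A sketch reusing `stream` and the assumed wire 
> format from the previous sketch; the window lengths are illustrative:
>
>   // extract plain strings from each ConsumerRecord immediately, so the
>   // window never has to serialize ConsumerRecord itself
>   val rows = stream.map(_.value)
>
>   // key each logical table by the join field
>   val aById = rows.filter(_.startsWith("A,"))
>     .map { l => val f = l.split(","); (f(1), f(2)) }   // (id, name)
>     .window(Seconds(60), Seconds(10))
>   val bById = rows.filter(_.startsWith("B,"))
>     .map { l => val f = l.split(","); (f(1), f(2)) }   // (aId, detail)
>     .window(Seconds(60), Seconds(10))
>
>   // pair-DStream join inside the overlapping windows: an A row stays
>   // joinable for the full window length, so B rows arriving a few
>   // batches later can still find their master record
>   val joined = aById.join(bById)   // values are (name, detail) pairs
>   joined.print()
>
> This only bounds the loss rather than eliminating it: a B row arriving more 
> than one window length after its A row is still dropped, and overlapping 
> windows re-emit the same joined rows, so the database write should be 
> idempotent. Exact results would need stateful buffering across batches, 
> e.g. via updateStateByKey.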


