I have a Spark Streaming job that reads from several Kinesis streams and unions 
them together in a single streaming context.

val streams = ingestionStreams.map(streamName => {
  KinesisInputDStream.builder.checkpointAppName(s"${jobName}_$streamName")
    .streamName(streamName)
    .streamingContext(ssc)
    .endpointUrl(endpointUrl)
    .regionName(regionName)
    .initialPositionInStream(InitialPositionInStream.TRIM_HORIZON)
    .checkpointInterval(kinesisCheckpointInterval)
    .storageLevel(StorageLevel.MEMORY_ONLY)
    .buildWithMessageHandler(KinesisRecordHandler.recordHandler)
})

import spark.sqlContext.implicits._
ssc.union(streams)
  .checkpoint(batchInterval)
  .foreachRDD(jsonRdd => ...)


I see the correct number of records in the Streaming tab of the Spark UI. 
However, the number of records actually processed by foreachRDD is lower.

In the executor logs I see many ProvisionedThroughputExceededException errors. 
I assumed these were benign, since the KCL should retry those records.
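
To make the "should retry" assumption concrete, here is a minimal sketch of 
retry-with-backoff in plain Scala. It is a generic illustration only, not the 
KCL's or Spark's actual code; ThrottledException, RetrySketch, 
retryWithBackoff, and its parameters are hypothetical names chosen for this 
example.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for a throttling error such as
// ProvisionedThroughputExceededException.
final class ThrottledException(msg: String) extends RuntimeException(msg)

object RetrySketch {
  // Runs `op`, retrying only on ThrottledException, for at most `maxAttempts`
  // attempts in total. The wait doubles after each throttled attempt.
  // Any other exception, or a throttle on the last attempt, is rethrown.
  @tailrec
  def retryWithBackoff[T](maxAttempts: Int, waitMillis: Long)(op: () => T): T =
    Try(op()) match {
      case Success(value) => value
      case Failure(_: ThrottledException) if maxAttempts > 1 =>
        Thread.sleep(waitMillis)
        retryWithBackoff(maxAttempts - 1, waitMillis * 2)(op)
      case Failure(e) => throw e
    }
}
```

In a scheme like this, a read that is throttled on every attempt ends with 
the exception propagating to the caller. Whether the affected records are 
re-read later then depends on what the caller does with that failure, which 
is the part I cannot confirm from the logs.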

Unfortunately, I am not seeing the missing records processed later. 
Where should I look next?


. . . . . . . . . . . . . . . . . . . . . . . . . . .

Richard Moorhead
Software Engineer
richard.moorh...@c2fo.com