Re: Multiple Kinesis Streams in a single Streaming job

Chris Fregly Thu, 14 May 2015 21:02:46 -0700

another option (not really recommended, but worth mentioning) would be to 
change the region of dynamodb to be separate from the other stream - and even 
separate from the stream itself.


this isn't available right now, but will be in Spark 1.4.

> On May 14, 2015, at 6:47 PM, Erich Ess <[email protected]> wrote:
> 
> Hi Tathagata,
> 
> I think that's exactly what's happening.
> 
> The error message is: 
> "com.amazonaws.services.kinesis.model.InvalidArgumentException: 
> StartingSequenceNumber 
> 49550673839151225431779125105915140284622031848663416866 used in 
> GetShardIterator on shard shardId-000000000002 in stream erich-test under 
> account xxxxxxx is invalid because it did not come from this stream".
> 
> I looked at the DynamoDB table and each job has single table and that table 
> does not contain any stream identification information, only shard 
> checkpointing data.  I think the error is that when it tries to read from 
> stream B, it's using checkpointing data for stream A and errors out.  So it 
> appears, at first glance, that currently you can't read from multiple Kinesis 
> streams in a single job.  I haven't tried this, but it might be possible for 
> this to work if I force each stream to have different shard IDs so there is 
> no ambiguity in the DynamoDB table; however, that's clearly not a feasible 
> production solution.
> 
> Thanks,
> -Erich
> 
>> On Thu, May 14, 2015 at 8:34 PM, Tathagata Das <[email protected]> wrote:
>> A possible problem may be that the kinesis stream in 1.3 uses the 
>> SparkContext app name, as the Kinesis Application Name, that is used by the 
>> Kinesis Client Library to save checkpoints in DynamoDB. Since both kinesis 
>> DStreams are using the Kinesis application name (as they are in the same 
>> StreamingContext / SparkContext / Spark app name), KCL may be doing weird 
>> overwriting checkpoint information of both Kinesis streams into the same 
>> DynamoDB table. Either ways, this is going to be fixed in Spark 1.4. 
>> 
>> On Thu, May 14, 2015 at 4:10 PM, Chris Fregly <[email protected]> wrote:
>>> have you tried to union the 2 streams per the KinesisWordCountASL example 
>>> where 2 streams (against the same Kinesis stream in this case) are created 
>>> and union'd?
>>> 
>>> it should work the same way - including union() of streams from totally 
>>> different source types (kafka, kinesis, flume).
>>> 
>>> 
>>> 
>>>> On Thu, May 14, 2015 at 2:07 PM, Tathagata Das <[email protected]> wrote:
>>>> What is the error you are seeing?
>>>> 
>>>> TD
>>>> 
>>>>> On Thu, May 14, 2015 at 9:00 AM, Erich Ess <[email protected]> 
>>>>> wrote:
>>>>> Hi,
>>>>> 
>>>>> Is it possible to setup streams from multiple Kinesis streams and process
>>>>> them in a single job?  From what I have read, this should be possible,
>>>>> however, the Kinesis layer errors out whenever I try to receive from more
>>>>> than a single Kinesis Stream.
>>>>> 
>>>>> Here is the code.  Currently, I am focused on just getting receivers setup
>>>>> and working for the two Kinesis Streams, as such, this code just attempts 
>>>>> to
>>>>> print out the contents of both streams:
>>>>> 
>>>>> implicit val formats = Serialization.formats(NoTypeHints)
>>>>> 
>>>>> val conf = new SparkConf().setMaster("local[*]").setAppName("test")
>>>>> val ssc = new StreamingContext(conf, Seconds(1))
>>>>> 
>>>>> val rawStream = KinesisUtils.createStream(ssc, "erich-test",
>>>>> "kinesis.us-east-1.amazonaws.com", Duration(1000),
>>>>> InitialPositionInStream.TRIM_HORIZON, StorageLevel.MEMORY_ONLY)
>>>>> rawStream.map(msg => new String(msg)).print
>>>>> 
>>>>> val loaderStream = KinesisUtils.createStream(
>>>>>   ssc,
>>>>>   "dev-loader",
>>>>>   "kinesis.us-east-1.amazonaws.com",
>>>>>   Duration(1000),
>>>>>   InitialPositionInStream.TRIM_HORIZON,
>>>>>   StorageLevel.MEMORY_ONLY)
>>>>> 
>>>>> val loader = loaderStream.map(msg => new String(msg)).print
>>>>> 
>>>>> ssc.start()
>>>>> 
>>>>> Thanks,
>>>>> -Erich
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> View this message in context: 
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Multiple-Kinesis-Streams-in-a-single-Streaming-job-tp22889.html
>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [email protected]
>>>>> For additional commands, e-mail: [email protected]
> 
> 
> 
> -- 
> Erich Ess | CTO
> c. 310-703-6058
> @SimpleRelevance | 130 E Randolph, Ste 1650 | Chicago, IL 60601
> Machine Learning For Marketers
> 
> Named a top startup to watch in Crain's — View the Article.
> 
> SimpleRelevance.com | Facebook | Twitter | Blog

Re: Multiple Kinesis Streams in a single Streaming job

Reply via email to