another option (not really recommended, but worth mentioning) would be to change the region of dynamodb to be separate from the other stream - and even separate from the stream itself.
this isn't available right now, but will be in Spark 1.4. > On May 14, 2015, at 6:47 PM, Erich Ess <[email protected]> wrote: > > Hi Tathagata, > > I think that's exactly what's happening. > > The error message is: > "com.amazonaws.services.kinesis.model.InvalidArgumentException: > StartingSequenceNumber > 49550673839151225431779125105915140284622031848663416866 used in > GetShardIterator on shard shardId-000000000002 in stream erich-test under > account xxxxxxx is invalid because it did not come from this stream". > > I looked at the DynamoDB table and each job has single table and that table > does not contain any stream identification information, only shard > checkpointing data. I think the error is that when it tries to read from > stream B, it's using checkpointing data for stream A and errors out. So it > appears, at first glance, that currently you can't read from multiple Kinesis > streams in a single job. I haven't tried this, but it might be possible for > this to work if I force each stream to have different shard IDs so there is > no ambiguity in the DynamoDB table; however, that's clearly not a feasible > production solution. > > Thanks, > -Erich > >> On Thu, May 14, 2015 at 8:34 PM, Tathagata Das <[email protected]> wrote: >> A possible problem may be that the kinesis stream in 1.3 uses the >> SparkContext app name, as the Kinesis Application Name, that is used by the >> Kinesis Client Library to save checkpoints in DynamoDB. Since both kinesis >> DStreams are using the Kinesis application name (as they are in the same >> StreamingContext / SparkContext / Spark app name), KCL may be doing weird >> overwriting checkpoint information of both Kinesis streams into the same >> DynamoDB table. Either ways, this is going to be fixed in Spark 1.4. >> >> On Thu, May 14, 2015 at 4:10 PM, Chris Fregly <[email protected]> wrote: >>> have you tried to union the 2 streams per the KinesisWordCountASL example >>> where 2 streams (against the same Kinesis stream in this case) are created >>> and union'd? >>> >>> it should work the same way - including union() of streams from totally >>> different source types (kafka, kinesis, flume). >>> >>> >>> >>>> On Thu, May 14, 2015 at 2:07 PM, Tathagata Das <[email protected]> wrote: >>>> What is the error you are seeing? >>>> >>>> TD >>>> >>>>> On Thu, May 14, 2015 at 9:00 AM, Erich Ess <[email protected]> >>>>> wrote: >>>>> Hi, >>>>> >>>>> Is it possible to setup streams from multiple Kinesis streams and process >>>>> them in a single job? From what I have read, this should be possible, >>>>> however, the Kinesis layer errors out whenever I try to receive from more >>>>> than a single Kinesis Stream. >>>>> >>>>> Here is the code. Currently, I am focused on just getting receivers setup >>>>> and working for the two Kinesis Streams, as such, this code just attempts >>>>> to >>>>> print out the contents of both streams: >>>>> >>>>> implicit val formats = Serialization.formats(NoTypeHints) >>>>> >>>>> val conf = new SparkConf().setMaster("local[*]").setAppName("test") >>>>> val ssc = new StreamingContext(conf, Seconds(1)) >>>>> >>>>> val rawStream = KinesisUtils.createStream(ssc, "erich-test", >>>>> "kinesis.us-east-1.amazonaws.com", Duration(1000), >>>>> InitialPositionInStream.TRIM_HORIZON, StorageLevel.MEMORY_ONLY) >>>>> rawStream.map(msg => new String(msg)).print >>>>> >>>>> val loaderStream = KinesisUtils.createStream( >>>>> ssc, >>>>> "dev-loader", >>>>> "kinesis.us-east-1.amazonaws.com", >>>>> Duration(1000), >>>>> InitialPositionInStream.TRIM_HORIZON, >>>>> StorageLevel.MEMORY_ONLY) >>>>> >>>>> val loader = loaderStream.map(msg => new String(msg)).print >>>>> >>>>> ssc.start() >>>>> >>>>> Thanks, >>>>> -Erich >>>>> >>>>> >>>>> >>>>> -- >>>>> View this message in context: >>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Multiple-Kinesis-Streams-in-a-single-Streaming-job-tp22889.html >>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com. >>>>> >>>>> --------------------------------------------------------------------- >>>>> To unsubscribe, e-mail: [email protected] >>>>> For additional commands, e-mail: [email protected] > > > > -- > Erich Ess | CTO > c. 310-703-6058 > @SimpleRelevance | 130 E Randolph, Ste 1650 | Chicago, IL 60601 > Machine Learning For Marketers > > Named a top startup to watch in Crain's — View the Article. > > SimpleRelevance.com | Facebook | Twitter | Blog
