I have no idea if this is the cause, but you should use org.apache.spark:spark-streaming-kinesis-asl_2.10:1.6.0 instead of org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.2, since emr-4.3.0 includes Spark 1.6.0.
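If the version mismatch is indeed the culprit, the fix would be a one-line change to the first notebook paragraph below (same artifact, version matched to the cluster's Spark):

    // load the Kinesis ASL artifact built against Spark 1.6.0, matching emr-4.3.0
    z.load("org.apache.spark:spark-streaming-kinesis-asl_2.10:1.6.0")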
On Wed, Jan 27, 2016 at 3:00 PM Felipe Almeida <falmeida1...@gmail.com> wrote:

> Some of my last message is getting trimmed by Google but the content is
> there.
>
> FA
>
> On 27 January 2016 at 22:59, Felipe Almeida <falmeida1...@gmail.com>
> wrote:
>
>> Alright, here is my full code:
>>
>> *// paragraph one*
>> z.load("org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.2")
>>
>> import com.amazonaws.auth.DefaultAWSCredentialsProviderChain
>> import com.amazonaws.regions.RegionUtils
>> import com.amazonaws.services.kinesis.AmazonKinesisClient
>> import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
>> import com.amazonaws.services.s3.AmazonS3Client
>>
>> import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}
>> import org.apache.spark.streaming.kinesis._
>> import org.apache.spark.storage.StorageLevel
>>
>> import org.json4s.DefaultFormats
>> import org.json4s.jackson.JsonMethods.parse
>>
>> *// paragraph 2*
>> val credentials = new DefaultAWSCredentialsProviderChain().getCredentials
>> val endpointUrl = "kinesis.us-east-1.amazonaws.com"
>> val appName = "StoredashCollector-debug"
>> val streamName = XXXXX
>> val windowDuration = Seconds(10)
>> val rememberDuration = Minutes(7)
>>
>> val kinesisClient = new AmazonKinesisClient(credentials)
>> kinesisClient.setEndpoint(endpointUrl)
>> val numStreams = kinesisClient.describeStream(streamName).getStreamDescription.getShards.size
>> val regionName = RegionUtils.getRegionByEndpoint(endpointUrl).getName()
>>
>> // problematic line here
>> val ssc = new StreamingContext(sc, windowDuration)
>> ssc.remember(rememberDuration)
>>
>> case class LogEvent(
>>   DataType: Option[String]
>>   , RequestType: Option[String]
>>   , SessionId: Option[String]
>>   , HostName: Option[String]
>>   , TransactionAffiliation: Option[String]
>>   , SkuStockOutFromProductDetail: List[String]
>>   , SkuStockOutFromShelf: List[String]
>>   , TransactionId: Option[String]
>>   , ProductId: Option[String]
>> )
>>
>> val streams = (0 until 1).map { i =>
>>   KinesisUtils.createStream(ssc, appName, streamName, endpointUrl, regionName,
>>     InitialPositionInStream.LATEST, windowDuration, StorageLevel.MEMORY_ONLY)
>> }
>>
>> val unionStream = ssc.union(streams).map(byteArray => new String(byteArray))
>>
>> // used by json4s
>> implicit val formats = DefaultFormats
>> val eventsStream = unionStream.map {
>>   str => parse(str).extract[LogEvent]
>> }
>>
>> eventsStream.print(10)
>>
>> ssc.start()
>>
>> *// output of the previous paragraph, including the error message:*
>>
>> credentials: com.amazonaws.auth.AWSCredentials = com.amazonaws.auth.BasicSessionCredentials@1181e472
>> endpointUrl: String = kinesis.us-east-1.amazonaws.com
>> appName: String = StoredashCollector-debug
>> streamName: String = XXXX
>> windowDuration: org.apache.spark.streaming.Duration = 10000 ms
>> rememberDuration: org.apache.spark.streaming.Duration = 420000 ms
>> kinesisClient: com.amazonaws.services.kinesis.AmazonKinesisClient = com.amazonaws.services.kinesis.AmazonKinesisClient@275e9422
>> numStreams: Int = 2
>> regionName: String = us-east-1
>> <console>:45: error: overloaded method constructor StreamingContext with alternatives:
>>   (path: String,sparkContext: org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext)org.apache.spark.streaming.StreamingContext <and>
>>   (path: String,hadoopConf: org.apache.hadoop.conf.Configuration)org.apache.spark.streaming.StreamingContext <and>
>>   (conf: org.apache.spark.SparkConf,batchDuration: org.apache.spark.streaming.Duration)org.apache.spark.streaming.StreamingContext <and>
>>   (sparkContext: org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext,batchDuration: org.apache.spark.streaming.Duration)org.apache.spark.streaming.StreamingContext
>>  cannot be applied to (org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext, org.apache.spark.streaming.Duration)
>>        val ssc = new StreamingContext(sc, windowDuration)
>>
>> That's it; I've sent you my cluster id by e-mail.
>>
>> FA
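As a quick sanity check on the suspected mismatch, one could print the Spark version the Zeppelin interpreter is actually running (sc.version is a standard SparkContext method) and compare it against the version of the artifact loaded via z.load above:

    // on emr-4.3.0 this should print 1.6.0; if the loaded kinesis-asl artifact
    // targets a different Spark version, that mismatch is a likely suspect
    println(sc.version)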
>> On 27 January 2016 at 21:51, Jonathan Kelly <jonathaka...@gmail.com>
>> wrote:
>>
>>> That's a really weird issue! Sorry though, I can't seem to reproduce it,
>>> not in spark-shell nor in Zeppelin (on emr-4.3.0). This is all I'm running:
>>>
>>> import org.apache.spark.streaming._
>>> val ssc = new StreamingContext(sc, Seconds(1))
>>>
>>> Is there anything else you're running? Are you specifying any special
>>> configuration when creating the cluster? Do you have a cluster id I can
>>> take a look at? Also, can you reproduce this in spark-shell or just in
>>> Zeppelin?
>>>
>>> ~ Jonathan
>>>
>>> On Wed, Jan 27, 2016 at 12:46 PM Felipe Almeida <falmeida1...@gmail.com>
>>> wrote:
>>>
>>>> Hi Jonathan.
>>>>
>>>> That issue (using *z.load("mavenproject")*) is indeed solved (thanks),
>>>> but now I can't create a StreamingContext from the current SparkContext
>>>> (variable sc).
>>>>
>>>> When trying to run *val ssc = new StreamingContext(sc, Seconds(1))*, I
>>>> get the following error:
>>>>
>>>> <console>:45: error: overloaded method constructor StreamingContext with alternatives:
>>>>   (path: String,sparkContext: org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext)org.apache.spark.streaming.StreamingContext <and>
>>>>   (path: String,hadoopConf: org.apache.hadoop.conf.Configuration)org.apache.spark.streaming.StreamingContext <and>
>>>>   (conf: org.apache.spark.SparkConf,batchDuration: org.apache.spark.streaming.Duration)org.apache.spark.streaming.StreamingContext <and>
>>>>   (sparkContext: org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext,batchDuration: org.apache.spark.streaming.Duration)org.apache.spark.streaming.StreamingContext
>>>>  cannot be applied to (org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext, org.apache.spark.streaming.Duration)
>>>>        val ssc = new StreamingContext(sc, Seconds(1))
>>>>
>>>> See how the package prefixes leading up to the class name are funny:
>>>> org.apache.spark.org.apache....; it looks like it's in a loop. And one
>>>> can see in the error message that what I did was indeed pass a
>>>> SparkContext and then a Duration, as per the docs.
>>>>
>>>> FA
>>>>
>>>> On 27 January 2016 at 02:14, Jonathan Kelly <jonathaka...@gmail.com>
>>>> wrote:
>>>>
>>>>> Yeah, I saw your self-reply on SO but did not have time to reply there
>>>>> also before leaving for the day. I hope emr-4.3.0 works well for you!
>>>>> Please let me know if you run into any other issues, especially with
>>>>> Spark or Zeppelin, or if you have any feature requests.
>>>>>
>>>>> Thanks,
>>>>> Jonathan
>>>>>
>>>>> On Tue, Jan 26, 2016 at 6:02 PM Felipe Almeida <falmeida1...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thank you very much Jonathan; I'll be sure to try out the new label
>>>>>> as soon as I can!
>>>>>>
>>>>>> I eventually found a workaround, which involved passing the package
>>>>>> via *--packages* to spark-submit, configured under
>>>>>> */usr/lib/zeppelin/conf/zeppelin.sh*:
>>>>>> http://stackoverflow.com/a/35006137/436721
>>>>>>
>>>>>> Anyway, thanks for the reply; I'll let you know if the problem
>>>>>> persists.
>>>>>>
>>>>>> FA
>>>>>>
>>>>>> On 26 January 2016 at 23:19, Jonathan Kelly <jonathaka...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Felipe,
>>>>>>>
>>>>>>> I'm really sorry you had to deal with this frustration, but the good
>>>>>>> news is that we on EMR have fixed this issue and just released the
>>>>>>> fix in emr-4.3.0 today. (At least, it is available in some regions
>>>>>>> today and should be available in the other regions by some time
>>>>>>> tomorrow.)
>>>>>>>
>>>>>>> BTW, this was discussed on another thread last month:
>>>>>>> http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/Zeppelin-notes-version-control-scheduler-and-external-deps-td1730.html
>>>>>>> (Not sure why I show up with the name "Work". Oh well. =P)
>>>>>>>
>>>>>>> ~ Jonathan
>>>>>>>
>>>>>>> On Mon, Jan 25, 2016 at 5:21 PM Felipe Almeida <
>>>>>>> falmeida1...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I cannot, for the love of ***, make `z.load("maven-project")` work
>>>>>>>> for Zeppelin 0.5.5 on Amazon Elastic MapReduce. I keep getting a
>>>>>>>> *NullPointerException* due to, I think, something related to
>>>>>>>> Sonatype.
>>>>>>>>
>>>>>>>> I've also asked a question on SO about this:
>>>>>>>> http://stackoverflow.com/questions/35005455/java-npe-when-loading-a-dependency-from-maven-from-within-zeppelin-on-aws-emr
>>>>>>>>
>>>>>>>> Has anyone gone through this?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> FA
>
> --
> “Every time you stay out late; every time you sleep in; every time you
> miss a workout; every time you don’t give 100% – You make it that much
> easier for me to beat you.” - Unknown author
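For reference, a minimal sketch of the workaround Felipe describes above (passing the dependency to spark-submit via *--packages* from Zeppelin's startup configuration). The exact file and variable name depend on the install, so treat these as assumptions based on the script path he mentions and the linked SO answer:

    # e.g. in /usr/lib/zeppelin/conf/zeppelin.sh (path as reported above);
    # SPARK_SUBMIT_OPTIONS is the environment variable Zeppelin reads to pass
    # extra flags to spark-submit
    export SPARK_SUBMIT_OPTIONS="--packages org.apache.spark:spark-streaming-kinesis-asl_2.10:1.6.0"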