Some of my last message is getting trimmed by Google, but the content is there.
FA

On 27 January 2016 at 22:59, Felipe Almeida <falmeida1...@gmail.com> wrote:

> Alright, here is my full code:
>
> *// paragraph one*
>
> z.load("org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.2")
>
> import com.amazonaws.auth.DefaultAWSCredentialsProviderChain
> import com.amazonaws.regions.RegionUtils
> import com.amazonaws.services.kinesis.AmazonKinesisClient
> import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
> import com.amazonaws.services.s3.AmazonS3Client
>
> import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}
> import org.apache.spark.streaming.kinesis._
> import org.apache.spark.storage.StorageLevel
>
> import org.json4s.DefaultFormats
> import org.json4s.jackson.JsonMethods.parse
>
> *// paragraph two*
>
> val credentials = new DefaultAWSCredentialsProviderChain().getCredentials
> val endpointUrl = "kinesis.us-east-1.amazonaws.com"
> val appName = "StoredashCollector-debug"
> val streamName = XXXXX
> val windowDuration = Seconds(10)
> val rememberDuration = Minutes(7)
>
> val kinesisClient = new AmazonKinesisClient(credentials)
> kinesisClient.setEndpoint(endpointUrl)
> val numStreams = kinesisClient.describeStream(streamName).getStreamDescription.getShards.size
> val regionName = RegionUtils.getRegionByEndpoint(endpointUrl).getName()
>
> // problematic line here
> val ssc = new StreamingContext(sc, windowDuration)
> ssc.remember(rememberDuration)
>
> case class LogEvent(
>     DataType: Option[String]
>   , RequestType: Option[String]
>   , SessionId: Option[String]
>   , HostName: Option[String]
>   , TransactionAffiliation: Option[String]
>   , SkuStockOutFromProductDetail: List[String]
>   , SkuStockOutFromShelf: List[String]
>   , TransactionId: Option[String]
>   , ProductId: Option[String]
> )
>
> val streams = (0 until 1).map { i =>
>   KinesisUtils.createStream(ssc, appName, streamName, endpointUrl, regionName,
>     InitialPositionInStream.LATEST, windowDuration, StorageLevel.MEMORY_ONLY)
> }
>
> val unionStream = ssc.union(streams).map(byteArray => new String(byteArray))
>
> // used by json4s
> implicit val formats = DefaultFormats
> val eventsStream = unionStream.map { str => parse(str).extract[LogEvent] }
>
> eventsStream.print(10)
>
> ssc.start()
>
> *// output of the previous paragraph, including the error message:*
>
> credentials: com.amazonaws.auth.AWSCredentials = com.amazonaws.auth.BasicSessionCredentials@1181e472
> endpointUrl: String = kinesis.us-east-1.amazonaws.com
> appName: String = StoredashCollector-debug
> streamName: String = XXXX
> windowDuration: org.apache.spark.streaming.Duration = 10000 ms
> rememberDuration: org.apache.spark.streaming.Duration = 420000 ms
> kinesisClient: com.amazonaws.services.kinesis.AmazonKinesisClient = com.amazonaws.services.kinesis.AmazonKinesisClient@275e9422
> numStreams: Int = 2
> regionName: String = us-east-1
>
> <console>:45: error: overloaded method constructor StreamingContext with alternatives:
>   (path: String, sparkContext: org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext)org.apache.spark.streaming.StreamingContext <and>
>   (path: String, hadoopConf: org.apache.hadoop.conf.Configuration)org.apache.spark.streaming.StreamingContext <and>
>   (conf: org.apache.spark.SparkConf, batchDuration: org.apache.spark.streaming.Duration)org.apache.spark.streaming.StreamingContext <and>
>   (sparkContext: org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext, batchDuration: org.apache.spark.streaming.Duration)org.apache.spark.streaming.StreamingContext
> cannot be applied to (org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext, org.apache.spark.streaming.Duration)
>        val ssc = new StreamingContext(sc, windowDuration)
>
> That's it; I've sent you my cluster id by e-mail.
>
> FA
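An aside for anyone reading this in the archives: the duplicated
org.apache.spark prefix in the error above suggests two different copies of
the SparkContext class may be visible to the interpreter. A minimal sketch
for checking that (just a guess at the cause; `sc` is the context Zeppelin
injects, and the printed locations will vary by setup):

    import org.apache.spark.SparkContext

    // Where was the SparkContext class visible to this paragraph loaded from?
    // getCodeSource can be null for bootstrap classes, hence the Option guard.
    val src = Option(classOf[SparkContext].getProtectionDomain.getCodeSource)
    println(src.map(_.getLocation).getOrElse("(no code source)"))

    // Runtime class of the `sc` instance Zeppelin provides:
    println(sc.getClass.getName)

If the two disagree (say, a jar pulled in by z.load() versus the cluster's
own Spark install), that would explain the "cannot be applied" mismatch.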
> On 27 January 2016 at 21:51, Jonathan Kelly <jonathaka...@gmail.com> wrote:
>
>> That's a really weird issue! Sorry, though, I can't seem to reproduce it,
>> neither in spark-shell nor in Zeppelin (on emr-4.3.0). This is all I'm
>> running:
>>
>> import org.apache.spark.streaming._
>> val ssc = new StreamingContext(sc, Seconds(1))
>>
>> Is there anything else you're running? Are you specifying any special
>> configuration when creating the cluster? Do you have a cluster id I can
>> take a look at? Also, can you reproduce this in spark-shell, or just in
>> Zeppelin?
>>
>> ~ Jonathan
>>
>> On Wed, Jan 27, 2016 at 12:46 PM Felipe Almeida <falmeida1...@gmail.com> wrote:
>>
>>> Hi Jonathan.
>>>
>>> That issue (using *z.load("maven-project")*) is indeed solved (thanks),
>>> but now I can't create a StreamingContext from the current SparkContext
>>> (the variable sc).
>>>
>>> When I try to run *val ssc = new StreamingContext(sc, Seconds(1))*, I
>>> get the following error:
>>>
>>> <console>:45: error: overloaded method constructor StreamingContext with alternatives:
>>>   (path: String, sparkContext: org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext)org.apache.spark.streaming.StreamingContext <and>
>>>   (path: String, hadoopConf: org.apache.hadoop.conf.Configuration)org.apache.spark.streaming.StreamingContext <and>
>>>   (conf: org.apache.spark.SparkConf, batchDuration: org.apache.spark.streaming.Duration)org.apache.spark.streaming.StreamingContext <and>
>>>   (sparkContext: org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext, batchDuration: org.apache.spark.streaming.Duration)org.apache.spark.streaming.StreamingContext
>>> cannot be applied to (org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext, org.apache.spark.streaming.Duration)
>>>        val ssc = new StreamingContext(sc, Seconds(1))
>>>
>>> See how the packages leading up to the class name are funny:
>>> org.apache.spark.org.apache.... It looks like it's in a loop. And one can
>>> see in the error message that what I did was indeed use a SparkContext
>>> and then a Duration, as per the docs.
>>>
>>> FA
>>>
>>> On 27 January 2016 at 02:14, Jonathan Kelly <jonathaka...@gmail.com> wrote:
>>>
>>>> Yeah, I saw your self-reply on SO but did not have time to reply there
>>>> as well before leaving for the day. I hope emr-4.3.0 works well for you!
>>>> Please let me know if you run into any other issues, especially with
>>>> Spark or Zeppelin, or if you have any feature requests.
>>>>
>>>> Thanks,
>>>> Jonathan
>>>>
>>>> On Tue, Jan 26, 2016 at 6:02 PM Felipe Almeida <falmeida1...@gmail.com> wrote:
>>>>
>>>>> Thank you very much, Jonathan; I'll be sure to try out the new label
>>>>> as soon as I can!
>>>>>
>>>>> I eventually found a workaround, which involved passing the package
>>>>> via the *--packages* option for spark-submit, configured under
>>>>> */usr/lib/zeppelin/conf/zeppelin.sh* (sketched below):
>>>>> http://stackoverflow.com/a/35006137/436721
>>>>>
>>>>> Anyway, thanks for the reply; I'll let you know if the problem
>>>>> persists.
>>>>>
>>>>> FA
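For reference, the workaround in that SO answer boils down to letting
spark-submit resolve the package when Zeppelin starts, instead of calling
z.load() from a notebook. A rough sketch, not a verified recipe: the
variable below is the one stock Zeppelin conf scripts read, but the exact
file and variable name may differ on EMR's Zeppelin 0.5.5 image, so check
the file Felipe mentions (/usr/lib/zeppelin/conf/zeppelin.sh) first.

    # Hypothetical sketch: have Zeppelin's spark-submit pull the Kinesis ASL
    # package (and its transitive deps) from Maven at interpreter startup.
    export SPARK_SUBMIT_OPTIONS="--packages org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.2"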
>>>>> On 26 January 2016 at 23:19, Jonathan Kelly <jonathaka...@gmail.com> wrote:
>>>>>
>>>>>> Felipe,
>>>>>>
>>>>>> I'm really sorry you had to deal with this frustration, but the good
>>>>>> news is that we on EMR have fixed this issue and just released the
>>>>>> fix in emr-4.3.0 today. (At least, it is available in some regions
>>>>>> today and should be available in the other regions by some time
>>>>>> tomorrow.)
>>>>>>
>>>>>> BTW, this was discussed in another thread last month:
>>>>>> http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/Zeppelin-notes-version-control-scheduler-and-external-deps-td1730.html
>>>>>> (Not sure why I show up with the name "Work". Oh well. =P)
>>>>>>
>>>>>> ~ Jonathan
>>>>>>
>>>>>> On Mon, Jan 25, 2016 at 5:21 PM Felipe Almeida <falmeida1...@gmail.com> wrote:
>>>>>>
>>>>>>> I cannot, for the love of ***, make `z.load("maven-project")` work
>>>>>>> in Zeppelin 0.5.5 on Amazon Elastic MapReduce. I keep getting a
>>>>>>> *NullPointerException* due to, I think, something related to
>>>>>>> Sonatype.
>>>>>>>
>>>>>>> I've also asked a question on SO about this:
>>>>>>> http://stackoverflow.com/questions/35005455/java-npe-when-loading-a-dependency-from-maven-from-within-zeppelin-on-aws-emr
>>>>>>>
>>>>>>> Has anyone gone through this?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> FA

--
“Every time you stay out late; every time you sleep in; every time you
miss a workout; every time you don’t give 100% – You make it that much
easier for me to beat you.” - Unknown author
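P.S. For anyone who hits the z.load() NullPointerException on an image they
can't upgrade yet: Zeppelin's dependency-loading docs have z.load() run in
its own %dep paragraph, before the Spark interpreter starts. A sketch using
the coordinates from this thread; the z.addRepo() line is only a guess at a
Sonatype workaround, not something verified here:

    %dep
    z.reset()  // drop artifacts loaded by earlier runs of this paragraph

    // Speculative: if the NPE really is Sonatype-related, pinning an
    // explicit repository might sidestep the default resolvers.
    z.addRepo("central").url("https://repo1.maven.org/maven2/")

    z.load("org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.2")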