I have no idea if this is the cause, but you should use org.apache.spark:spark-streaming-kinesis-asl_2.10:1.6.0 instead of org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.2, since emr-4.3.0 includes Spark 1.6.0.
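If the version mismatch is indeed the culprit, the fix would be a one-line change to the first notebook paragraph below (same artifact, version matched to the cluster's Spark):

    // load the Kinesis ASL artifact built against Spark 1.6.0, matching emr-4.3.0
    z.load("org.apache.spark:spark-streaming-kinesis-asl_2.10:1.6.0")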
On Wed, Jan 27, 2016 at 3:00 PM Felipe Almeida <falmeida1...@gmail.com> wrote:

> Some of my last message is getting trimmed by Google but the content is
> there.
>
> FA
>
> On 27 January 2016 at 22:59, Felipe Almeida <falmeida1...@gmail.com>
> wrote:
>
>> Alright, here is my full code:
>>
>> *// paragraph one*
>> z.load("org.apache.spark:spark-streaming-kinesis-asl_2.10:1.5.2")
>>
>> import com.amazonaws.auth.DefaultAWSCredentialsProviderChain
>> import com.amazonaws.regions.RegionUtils
>> import com.amazonaws.services.kinesis.AmazonKinesisClient
>> import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
>> import com.amazonaws.services.s3.AmazonS3Client
>>
>> import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}
>> import org.apache.spark.streaming.kinesis._
>> import org.apache.spark.storage.StorageLevel
>>
>> import org.json4s.DefaultFormats
>> import org.json4s.jackson.JsonMethods.parse
>>
>> *// paragraph 2*
>> val credentials = new DefaultAWSCredentialsProviderChain().getCredentials
>> val endpointUrl = "kinesis.us-east-1.amazonaws.com"
>> val appName = "StoredashCollector-debug"
>> val streamName = XXXXX
>> val windowDuration = Seconds(10)
>> val rememberDuration = Minutes(7)
>>
>> val kinesisClient = new AmazonKinesisClient(credentials)
>> kinesisClient.setEndpoint(endpointUrl)
>> val numStreams = kinesisClient.describeStream(streamName).getStreamDescription.getShards.size
>> val regionName = RegionUtils.getRegionByEndpoint(endpointUrl).getName()
>>
>> // problematic line here
>> val ssc = new StreamingContext(sc, windowDuration)
>> ssc.remember(rememberDuration)
>>
>> case class LogEvent(
>>   DataType: Option[String]
>>   , RequestType: Option[String]
>>   , SessionId: Option[String]
>>   , HostName: Option[String]
>>   , TransactionAffiliation: Option[String]
>>   , SkuStockOutFromProductDetail: List[String]
>>   , SkuStockOutFromShelf: List[String]
>>   , TransactionId: Option[String]
>>   , ProductId: Option[String]
>> )
>>
>> val streams = (0 until 1).map { i =>
>>   KinesisUtils.createStream(ssc, appName, streamName, endpointUrl, regionName,
>>     InitialPositionInStream.LATEST, windowDuration, StorageLevel.MEMORY_ONLY)
>> }
>>
>> val unionStream = ssc.union(streams).map(byteArray => new String(byteArray))
>>
>> // used by json4s
>> implicit val formats = DefaultFormats
>> val eventsStream = unionStream.map {
>>   str => parse(str).extract[LogEvent]
>> }
>>
>> eventsStream.print(10)
>>
>> ssc.start()
>>
>> *// output of the previous paragraph, including the error message:*
>>
>> credentials: com.amazonaws.auth.AWSCredentials = com.amazonaws.auth.BasicSessionCredentials@1181e472
>> endpointUrl: String = kinesis.us-east-1.amazonaws.com
>> appName: String = StoredashCollector-debug
>> streamName: String = XXXX
>> windowDuration: org.apache.spark.streaming.Duration = 10000 ms
>> rememberDuration: org.apache.spark.streaming.Duration = 420000 ms
>> kinesisClient: com.amazonaws.services.kinesis.AmazonKinesisClient = com.amazonaws.services.kinesis.AmazonKinesisClient@275e9422
>> numStreams: Int = 2
>> regionName: String = us-east-1
>> <console>:45: error: overloaded method constructor StreamingContext with alternatives:
>>   (path: String,sparkContext: org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext)org.apache.spark.streaming.StreamingContext <and>
>>   (path: String,hadoopConf: org.apache.hadoop.conf.Configuration)org.apache.spark.streaming.StreamingContext <and>
>>   (conf: org.apache.spark.SparkConf,batchDuration: org.apache.spark.streaming.Duration)org.apache.spark.streaming.StreamingContext <and>
>>   (sparkContext: org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext,batchDuration: org.apache.spark.streaming.Duration)org.apache.spark.streaming.StreamingContext
>>  cannot be applied to (org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext, org.apache.spark.streaming.Duration)
>>        val ssc = new StreamingContext(sc, windowDuration)
>>
>> That's it; I've sent you my cluster id by e-mail.
>>
>> FA
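As a quick sanity check on the suspected mismatch, one could print the Spark version the Zeppelin interpreter is actually running (sc.version is a standard SparkContext method) and compare it against the version of the artifact loaded via z.load above:

    // on emr-4.3.0 this should print 1.6.0; if the loaded kinesis-asl artifact
    // targets a different Spark version, that mismatch is a likely suspect
    println(sc.version)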
>> On 27 January 2016 at 21:51, Jonathan Kelly <jonathaka...@gmail.com>
>> wrote:
>>
>>> That's a really weird issue! Sorry though, I can't seem to reproduce it,
>>> not in spark-shell nor in Zeppelin (on emr-4.3.0). This is all I'm running:
>>>
>>> import org.apache.spark.streaming._
>>> val ssc = new StreamingContext(sc, Seconds(1))
>>>
>>> Is there anything else you're running? Are you specifying any special
>>> configuration when creating the cluster? Do you have a cluster id I can
>>> take a look at? Also, can you reproduce this in spark-shell or just in
>>> Zeppelin?
>>>
>>> ~ Jonathan
>>>
>>> On Wed, Jan 27, 2016 at 12:46 PM Felipe Almeida <falmeida1...@gmail.com>
>>> wrote:
>>>
>>>> Hi Jonathan.
>>>>
>>>> That issue (using *z.load("mavenproject")*) is indeed solved (thanks),
>>>> but now I can't create a StreamingContext from the current SparkContext
>>>> (variable sc).
>>>>
>>>> When trying to run *val ssc = new StreamingContext(sc, Seconds(1))*, I
>>>> get the following error:
>>>>
>>>> <console>:45: error: overloaded method constructor StreamingContext with alternatives:
>>>>   (path: String,sparkContext: org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext)org.apache.spark.streaming.StreamingContext <and>
>>>>   (path: String,hadoopConf: org.apache.hadoop.conf.Configuration)org.apache.spark.streaming.StreamingContext <and>
>>>>   (conf: org.apache.spark.SparkConf,batchDuration: org.apache.spark.streaming.Duration)org.apache.spark.streaming.StreamingContext <and>
>>>>   (sparkContext: org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext,batchDuration: org.apache.spark.streaming.Duration)org.apache.spark.streaming.StreamingContext
>>>>  cannot be applied to (org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.org.apache.spark.SparkContext, org.apache.spark.streaming.Duration)
>>>>        val ssc = new StreamingContext(sc, Seconds(1))
>>>>
>>>> See how the package prefixes leading up to the class name are funny:
>>>> org.apache.spark.org.apache....; it looks like it's in a loop. And one
>>>> can see in the error message that what I did was indeed pass a
>>>> SparkContext and then a Duration, as per the docs.
>>>>
>>>> FA
>>>>
>>>> On 27 January 2016 at 02:14, Jonathan Kelly <jonathaka...@gmail.com>
>>>> wrote:
>>>>
>>>>> Yeah, I saw your self-reply on SO but did not have time to reply there
>>>>> also before leaving for the day. I hope emr-4.3.0 works well for you!
>>>>> Please let me know if you run into any other issues, especially with
>>>>> Spark or Zeppelin, or if you have any feature requests.
>>>>>
>>>>> Thanks,
>>>>> Jonathan
>>>>>
>>>>> On Tue, Jan 26, 2016 at 6:02 PM Felipe Almeida <falmeida1...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thank you very much Jonathan; I'll be sure to try out the new label
>>>>>> as soon as I can!
>>>>>>
>>>>>> I eventually found a workaround, which involved passing the package
>>>>>> via *--packages* to spark-submit, configured under
>>>>>> */usr/lib/zeppelin/conf/zeppelin.sh*:
>>>>>> http://stackoverflow.com/a/35006137/436721
>>>>>>
>>>>>> Anyway, thanks for the reply; I'll let you know if the problem
>>>>>> persists.
>>>>>>
>>>>>> FA
>>>>>>
>>>>>> On 26 January 2016 at 23:19, Jonathan Kelly <jonathaka...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Felipe,
>>>>>>>
>>>>>>> I'm really sorry you had to deal with this frustration, but the good
>>>>>>> news is that we on EMR have fixed this issue and just released the
>>>>>>> fix in emr-4.3.0 today. (At least, it is available in some regions
>>>>>>> today and should be available in the other regions by some time
>>>>>>> tomorrow.)
>>>>>>>
>>>>>>> BTW, this was discussed on another thread last month:
>>>>>>> http://apache-zeppelin-users-incubating-mailing-list.75479.x6.nabble.com/Zeppelin-notes-version-control-scheduler-and-external-deps-td1730.html
>>>>>>> (Not sure why I show up with the name "Work". Oh well. =P)
>>>>>>>
>>>>>>> ~ Jonathan
>>>>>>>
>>>>>>> On Mon, Jan 25, 2016 at 5:21 PM Felipe Almeida <
>>>>>>> falmeida1...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I cannot, for the love of ***, make `z.load("maven-project")` work
>>>>>>>> for Zeppelin 0.5.5 on Amazon Elastic MapReduce. I keep getting a
>>>>>>>> *NullPointerException* due to, I think, something related to
>>>>>>>> Sonatype.
>>>>>>>>
>>>>>>>> I've also asked a question on SO about this:
>>>>>>>> http://stackoverflow.com/questions/35005455/java-npe-when-loading-a-dependency-from-maven-from-within-zeppelin-on-aws-emr
>>>>>>>>
>>>>>>>> Has anyone gone through this?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> FA
>
> --
> “Every time you stay out late; every time you sleep in; every time you
> miss a workout; every time you don’t give 100% – You make it that much
> easier for me to beat you.” - Unknown author
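For reference, a minimal sketch of the workaround Felipe describes above (passing the dependency to spark-submit via *--packages* from Zeppelin's startup configuration). The exact file and variable name depend on the install, so treat these as assumptions based on the script path he mentions and the linked SO answer:

    # e.g. in /usr/lib/zeppelin/conf/zeppelin.sh (path as reported above);
    # SPARK_SUBMIT_OPTIONS is the environment variable Zeppelin reads to pass
    # extra flags to spark-submit
    export SPARK_SUBMIT_OPTIONS="--packages org.apache.spark:spark-streaming-kinesis-asl_2.10:1.6.0"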