Submit job to spark cluster: Error ErrorMonitor dropping message...

2016-05-02 Thread Tenghuan He
... disassociated, removing it. Can anyone help me? Thanks in advance, Tenghuan He

Re: About nested RDD

2016-04-08 Thread Tenghuan He
Hope I made it clear. On Fri, Apr 8, 2016 at 4:22 PM, Holden Karau wrote: > It seems like the union function on RDDs might be what you are looking > for, or was there something else you were trying to achieve? > > > On Thursday, April 7, 2016, Tenghuan He wrote: > >> Hi ...
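
For reference, a minimal sketch of the union approach Holden suggests; all names and values below are illustrative, not from the thread:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("union-sketch").setMaster("local[*]"))
    val rdd1 = sc.parallelize(Seq(("a", 1), ("b", 2)))
    val rdd2 = sc.parallelize(Seq(("c", 3)))
    // union concatenates the two RDDs' partitions on the driver side; no RDD
    // is referenced inside another RDD's closure, so nothing is nested.
    val combined = rdd1.union(rdd2)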

About nested RDD

2016-04-07 Thread Tenghuan He
Hi all, I know that nested RDDs are not possible, like rdd1.map(x => x + rdd2.count()). I tried to create a custom RDD like the following: class MyRDD[K, V](base: RDD[(K, V)], part: Partitioner) extends RDD[(K, V)] { var rdds = ArrayBuffer.empty[RDD[(K, (V, Int))]] def update(rdd: RDD[_]) { rdds += rdd ...
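
The usual workaround for the rdd1.map(x => x + rdd2.count()) pattern is to materialize the count on the driver first and capture only the resulting number; a minimal sketch, assuming sc is an existing SparkContext:

    val rdd1 = sc.parallelize(Seq(1L, 2L, 3L))
    val rdd2 = sc.parallelize(Seq(10, 20))
    val n = rdd2.count()           // count() runs as a separate job on the driver
    val shifted = rdd1.map(_ + n)  // the closure captures only the Long n, not rdd2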

Re: partition an empty RDD

2016-04-07 Thread Tenghuan He
> > On Thu, Apr 7, 2016 at 5:52 AM, Tenghuan He wrote: > > Hi all, > > I want to create an empty rdd and partition it > > val buffer: RDD[(K, (V, Int))] = base.context.emptyRDD[(K, (V, Int))].partitionBy(new HashPartitioner(5)) > > but got ...

partition an empty RDD

2016-04-06 Thread Tenghuan He
Hi all, I want to create an empty RDD and partition it: val buffer: RDD[(K, (V, Int))] = base.context.emptyRDD[(K, (V, Int))].partitionBy(new HashPartitioner(5)) but got Error: No ClassTag available for K. Scala needs runtime information about K; how can this be solved? Thanks in advance
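
The standard fix is a ClassTag context bound on the type parameters, which gives Scala the runtime class information it asks for; a sketch, assuming the code lives in a generic method or class (at minimum K needs the tag here, since the pair-RDD conversion behind partitionBy requires ClassTag[K]):

    import scala.reflect.ClassTag
    import org.apache.spark.HashPartitioner
    import org.apache.spark.rdd.RDD

    // The context bounds supply the ClassTag evidence that emptyRDD and the
    // implicit pair-RDD conversion behind partitionBy need at compile time.
    def emptyBuffer[K: ClassTag, V: ClassTag](base: RDD[(K, V)]): RDD[(K, (V, Int))] =
      base.context.emptyRDD[(K, (V, Int))].partitionBy(new HashPartitioner(5))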

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Tenghuan He
On Mon, Mar 28, 2016 at 11:01 AM, Tenghuan He wrote: > Thanks very much Ted > > I added MyRDD.scala to the spark source code and rebuilt the whole spark > project, but using myrdd.asInstanceOf[MyRDD] doesn't work. It seems that MyRDD > is not exposed to the spark-shell. ...
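
As an aside, an alternative to rebuilding Spark itself is to compile the custom class into its own jar and put it on the shell's classpath; the jar path below is illustrative:

    # package MyRDD in a separate project, then expose the jar to the REPL
    ./bin/spark-shell --jars /path/to/myrdd.jar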

Re: Re: IntelliJ idea not work well with spark

2016-03-27 Thread Tenghuan He
Hi Wenchao, I used the steps described in this page and it works great; you can give it a try :) http://danielnee.com/2015/01/setting-up-intellij-for-spark/ On Mon, Mar 28, 2016 at 9:38 AM, 吴文超 wrote: > for the simplest word count, > val wordCounts = textFile.flatMap(line => line.split(" ")).map(word = ...
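
The quoted snippet is cut off; the canonical Spark word count it appears to be building looks like this (input path is illustrative):

    val textFile = sc.textFile("input.txt")
    val wordCounts = textFile
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))   // pair each word with a count of 1
      .reduceByKey(_ + _)       // sum the counts per word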

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Tenghuan He
...extend RDD and include your custom logic in the subclass. > > On Sun, Mar 27, 2016 at 10:14 AM, Tenghuan He > wrote: > >> Thanks Ted, >> >> but I have a doubt: as the code above (line 4) in the spark-shell >> shows, myrdd is already a MyRDD; does that not make ...

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Tenghuan He
...cast myrdd as MyRDD in spark-shell. >> >> BTW I don't think it is good practice to add a custom method to the base RDD. >> >> On Sun, Mar 27, 2016 at 9:44 AM, Tenghuan He >> wrote: >>> Hi Ted, >>> >>> The code is running in spark-shell ...

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Tenghuan He
... MyRDD as the return type. > Or, you can cast myrdd as MyRDD in spark-shell. > > BTW I don't think it is good practice to add a custom method to the base RDD. > > On Sun, Mar 27, 2016 at 9:44 AM, Tenghuan He wrote: >> Hi Ted, >> >> The code is running in spark-shell ...
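
Concretely, the two options Ted describes look like this; the type arguments are assumptions, update is the custom method from the MyRDD example elsewhere in these threads, and base, rdd, and other stand in for existing RDDs:

    // Option 1: declare the subclass as the static type
    val myrdd: MyRDD[String, Int] = new MyRDD(base, new HashPartitioner(5))
    myrdd.update(other)

    // Option 2: if a value is statically typed as RDD[(String, Int)],
    // cast it back before calling the subclass-only method
    rdd.asInstanceOf[MyRDD[String, Int]].update(other)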

Re: Custom RDD in spark, cannot find custom method

2016-03-27 Thread Tenghuan He
... On Sun, Mar 27, 2016 at 9:22 AM, Tenghuan He wrote: >> Hi everyone, >> >> I am creating a custom RDD which extends RDD and adds a custom method, >> however the custom method cannot be found. >> The custom RDD looks like the following: ...

Custom RDD in spark, cannot find custom method

2016-03-27 Thread Tenghuan He
Hi everyone, I am creating a custom RDD which extends RDD and adds a custom method; however, the custom method cannot be found. The custom RDD looks like the following: class MyRDD[K, V]( var base: RDD[(K, V)], part: Partitioner ) extends RDD[(K, V)](base.context, Nil) { def ...
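
Since the message is truncated, here is a minimal self-contained version of such a subclass; the bodies of update, compute, and getPartitions are placeholder assumptions that simply delegate to the wrapped RDD:

    import org.apache.spark.{Partition, Partitioner, TaskContext}
    import org.apache.spark.rdd.RDD

    class MyRDD[K, V](var base: RDD[(K, V)], part: Partitioner)
        extends RDD[(K, V)](base.context, Nil) {

      // the custom method; only visible when the static type is MyRDD
      def update(rdd: RDD[(K, V)]): Unit = { base = base.union(rdd) }

      override val partitioner: Option[Partitioner] = Some(part)

      // placeholder overrides that delegate to the wrapped RDD
      override def compute(split: Partition, context: TaskContext): Iterator[(K, V)] =
        base.iterator(split, context)

      override protected def getPartitions: Array[Partition] = base.partitions
    }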

Building spark submodule source code

2016-03-20 Thread Tenghuan He
... to rebuild the whole spark project instead of just the spark-core submodule to make the changes work? Rebuilding the whole project is too time-consuming; is there any better choice? Thanks & Best Regards Tenghuan He
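
Spark's build documentation describes building a single submodule with Maven's -pl flag, which avoids the full rebuild; something like the following, where the artifact suffix depends on the Scala version (e.g. _2.10 or _2.11):

    # build and locally install only the spark-core module
    ./build/mvn -pl :spark-core_2.10 -DskipTests clean install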

Re: RDD[org.apache.spark.sql.Row] filter ERROR

2016-02-21 Thread Tenghuan He
... df0.schema > schema: org.apache.spark.sql.types.StructType = > StructType(StructField(A,StringType,true), StructField(B,StringType,true), > StructField(C,StringType,true), StructField(num,IntegerType,false)) > > scala> val rdd1 = rdd0.filter(r => !idList.contains(r(3))) > rdd1: org.apache.spark.rdd.RDD ...

RDD[org.apache.spark.sql.Row] filter ERROR

2016-02-21 Thread Tenghuan He
... to the closure. If the outer object is not serializable, then RDD DAG serialization would fail. You can simply reference the field member with a separate variable to work around this. Can anyone tell me why? Thanks in advance :) Thanks & Best regards Tenghuan He
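
The workaround quoted above, copying the field into a local variable so the closure no longer captures the enclosing object, looks like this as a sketch; the class and field names are made up, and getInt(3) matches the IntegerType num column from the schema shown in this thread:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.Row

    class Filtering(data: RDD[Row]) {  // the outer object may not be serializable
      val idList = Seq(1, 2, 3)

      def filtered(): RDD[Row] = {
        val ids = idList               // local copy: the closure below captures
        data.filter(r => !ids.contains(r.getInt(3)))  // only ids, not `this`
      }
    }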