Re: SparkOscope: Enabling Spark Optimization through Cross-stack Monitoring and Visualization

2016-02-05 Thread Pete Robbins
Yiannis, I'm interested in what you've done here, as I was looking for ways to allow the Spark UI to display custom metrics in a pluggable way without having to modify the Spark source code. It would be good to see if we could modify your code to add extension points into the UI so we could co

Re: Building Spark with Custom Hadoop Version

2016-02-05 Thread Steve Loughran
> On 4 Feb 2016, at 23:11, Ted Yu wrote:
>
> Assuming your change is based on hadoop-2 branch, you can use 'mvn install'
> command which would put artifacts under 2.8.0-SNAPSHOT subdir in your local
> maven repo.
>
> Here is an example:
> ~/.m2/repository/org/apache/hadoop/hadoop-hdfs/2.8.0-S

Re: Building Spark with Custom Hadoop Version

2016-02-05 Thread Steve Loughran
> On 4 Feb 2016, at 23:11, Ted Yu wrote:
>
> Assuming your change is based on hadoop-2 branch, you can use 'mvn install'
> command which would put artifacts under 2.8.0-SNAPSHOT subdir in your local
> maven repo.
>

+ generally, unless you want to run all the hadoop tests, set the -DskipTes

Spark process failing to receive data from the Kafka queue in yarn-client mode.

2016-02-05 Thread Rachana Srivastava
I am trying to run the following code in yarn-client mode, but I am getting the slow readprocessor error mentioned below; the code works just fine in local mode. Any pointers are really appreciated. Line of code to receive data from the Kafka queue: JavaPairReceiverInputDStream messages = KafkaU

Preserving partitioning with dataframe select

2016-02-05 Thread Matt Cheah
Hi everyone, When using raw RDDs, it is possible to have a map() operation indicate that the partitioning for the RDD would be preserved by the map operation. This makes it easier to reduce the overhead of shuffles by ensuring that RDDs are co-partitioned when they are joined. When I'm using D