Dear all,
I have a question about how to measure the runtime of a Spark application.
Here is an example:
- On the Spark UI: the total duration is 2.0 minutes = 120 seconds, as
shown below
[image: Screen Shot 2016-07-09 at 11.45.44 PM.png]
- However, when I check the jobs launched by
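One way to reconcile the UI's total duration with the individual job times is to sum job durations with a SparkListener; any gap between the two is driver-side time spent outside of jobs (parsing, planning, time between actions). A minimal sketch, assuming the Spark 1.x listener API — the class name `JobTimeListener` is mine:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerJobStart}
import scala.collection.mutable

// Sums wall-clock time per job so it can be compared against the total
// application duration shown on the Spark UI.
class JobTimeListener extends SparkListener {
  private val starts = mutable.Map[Int, Long]()
  @volatile var totalJobMs: Long = 0L

  override def onJobStart(jobStart: SparkListenerJobStart): Unit =
    starts(jobStart.jobId) = jobStart.time

  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit =
    starts.remove(jobEnd.jobId).foreach(s => totalJobMs += jobEnd.time - s)
}

// Usage: register before running any actions.
// sc.addSparkListener(new JobTimeListener)
```

The sum of job durations will usually be smaller than the UI's total, because the total also covers driver work done between jobs.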
Hi All,
I am running some Spark Scala code on Zeppelin on CDH 5.5.1 (Spark version
1.5.0). I customized the Spark interpreter to use
org.apache.spark.serializer.KryoSerializer as spark.serializer. And in the
dependencies I added Kryo 3.0.3 as follows:
com.esotericsoftware:kryo:3.0.3
When I wrot
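For reference, the usual way to wire up Kryo is through SparkConf before the context is created. Note that, as far as I know, Spark 1.5 bundles its own Kryo 2.x (via Twitter chill), so adding kryo 3.0.3 as a separate dependency risks binary conflicts on the classpath. A sketch — `MyRecord` is a placeholder for an application class:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Placeholder for an application class whose instances get serialized.
case class MyRecord(id: Long, payload: String)

val conf = new SparkConf()
  .setAppName("kryo-example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Fail fast on unregistered classes instead of silently writing class names.
  .set("spark.kryo.registrationRequired", "true")
  .registerKryoClasses(Array(classOf[MyRecord]))

// val sc = new SparkContext(conf)
```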
Dear all,
Is there any way to change the host location for a certain partition of an RDD?
"protected def getPreferredLocations(split: Partition)" can be used to
initialize the location, but how to change it after the initialization?
Thanks,
Fei
de the
> getPreferredLocations() to implement the logic for dynamically changing
> the locations.
> > On Dec 30, 2016, at 12:06, Fei Hu wrote:
> >
> > Dear all,
> >
> > Is there any way to change the host location for a certain partition of
> RDD?
> >
> > "
Dear all,
I tried to customize my own RDD. In the getPreferredLocations() function, I
used the following code to query another RDD, which was used as an input to
initialize this customized RDD:
val results: Array[Array[DataChunkPartition]] =
context.runJob(partitionsRDD, (con
Job inside getPreferredLocations().
> You can take a look at the source code of HadoopRDD to help you implement
> getPreferredLocations()
> appropriately.
>
> On Dec 31, 2016, at 09:48, Fei Hu wrote:
>
> That is a good idea.
>
> I tried adding the following code to get ge
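Calling runJob() from inside getPreferredLocations() submits a nested job while the scheduler is still placing tasks for the outer one, which is fragile. Following the HadoopRDD pattern mentioned above, one sketch is to compute the placement metadata once on the driver, before the custom RDD is constructed — `DataChunkRDD` and its constructor here are hypothetical:

```scala
// Compute each input partition's preferred hosts once, on the driver,
// and pass the map to the custom RDD's constructor instead of querying
// it lazily from getPreferredLocations().
val locations: Map[Int, Seq[String]] =
  partitionsRDD.partitions
    .map(p => p.index -> partitionsRDD.preferredLocations(p))
    .toMap

// val rdd = new DataChunkRDD(sc, locations, ...) // hypothetical constructor
```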
Dear all,
I want to equally divide an RDD partition into two partitions. That means,
the first half of elements in the partition will create a new partition,
and the second half of elements in the partition will generate another new
partition. But the two new partitions are required to be at the sa
locality.
Thanks,
Fei
On Sun, Jan 15, 2017 at 2:33 AM, Rishi Yadav wrote:
> Can you provide some more details:
> 1. How many partitions does the RDD have?
> 2. How big is the cluster?
> On Sat, Jan 14, 2017 at 3:59 PM Fei Hu wrote:
>
>> Dear all,
>>
>> I want to equall
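The split described above (each parent partition becomes two children, both at the parent's locality) can be sketched as a custom RDD with an explicit NarrowDependency. This is only an illustration of the technique; `SplitInTwoRDD` and `HalfPartition` are names I invented, and compute() materializes each parent partition to find its midpoint:

```scala
import org.apache.spark.{NarrowDependency, Partition, TaskContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical partition type pairing a parent partition with a half index.
case class HalfPartition(index: Int, parent: Partition, half: Int)
  extends Partition

// Splits every parent partition into two child partitions. The dependency is
// narrow (each child reads exactly one parent partition), and both halves
// inherit the parent's preferred locations, so locality is preserved.
class SplitInTwoRDD[T: ClassTag](parent: RDD[T])
  extends RDD[T](parent.sparkContext, Seq(new NarrowDependency[T](parent) {
    override def getParents(partitionId: Int): Seq[Int] = Seq(partitionId / 2)
  })) {

  override def getPartitions: Array[Partition] =
    parent.partitions.flatMap { p =>
      Seq(HalfPartition(2 * p.index, p, 0), HalfPartition(2 * p.index + 1, p, 1))
    }

  override def compute(split: Partition, context: TaskContext): Iterator[T] = {
    val hp = split.asInstanceOf[HalfPartition]
    // Materialize the parent partition once so the midpoint is known.
    val elems = firstParent[T].iterator(hp.parent, context).toArray
    val mid = elems.length / 2
    if (hp.half == 0) elems.iterator.take(mid) else elems.iterator.drop(mid)
  }

  // Both halves reuse the parent's placement hints.
  override protected def getPreferredLocations(split: Partition): Seq[String] =
    firstParent[T].preferredLocations(split.asInstanceOf[HalfPartition].parent)
}
```

Note that both children of a parent partition recompute (or re-read) that parent partition; caching the parent first avoids doing the work twice.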
iting to HDFS, but it might still be a narrow dependency (satisfying your
> requirements) if you increase the # of partitions.
>
> Best,
> Anastasios
>
> On Sun, Jan 15, 2017 at 12:58 AM, Fei Hu wrote:
>
>> Dear all,
>>
>> I want to equally divide a RDD partition
CoalescedRDD code to implement your
> requirement.
>
> Good luck!
> Cheers,
> Anastasios
>
>
> On Sun, Jan 15, 2017 at 5:39 PM, Fei Hu wrote:
>
>> Hi Anastasios,
>>
>> Thanks for your reply. If I just increase the numPartitions to be twice
>> larger
l be a narrow dependency (satisfying
> > your
> > requirements) if you increase the # of partitions.
> >
> > Best,
> > Anastasios
> >
> > On Sun, Jan 15, 2017 at 12:58 AM, Fei Hu <hufei68@> wrote:
> >
don’t think
> RDD number of partitions will be increased.
>
>
>
> Thanks,
>
> Jasbir
>
>
>
> *From:* Fei Hu [mailto:hufe...@gmail.com]
> *Sent:* Sunday, January 15, 2017 10:10 PM
> *To:* zouz...@cs.toronto.edu
> *Cc:* user @spark ; dev@spark.apache.org
> *Su
need to add some logic in compute() to
> decide which half of the parent partition should be output. And you need
> to get the correct preferred locations for the partitions sharing the same
> parent partition.
>
>
> Fei Hu wrote
> > Hi Liang-Chi,
> >
> > Yes, y
, 2017 at 2:07 PM, Pradeep Gollakota wrote:
> Usually this kind of thing can be done at a lower level in the InputFormat
> by specifying the max split size. Have you looked into that possibility
> with your InputFormat?
>
> On Sun, Jan 15, 2017 at 9:42 PM, Fei Hu wrote:
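If the goal is simply more, smaller input partitions, capping the Hadoop split size is often the least invasive route. A sketch using the new-API property name (the old-API equivalent is `mapred.max.split.size`); the 64 MB cap is an arbitrary example value:

```scala
// Cap splits at 64 MB so FileInputFormat emits more, smaller splits,
// which Spark turns into more partitions with data locality intact.
sc.hadoopConfiguration.set(
  "mapreduce.input.fileinputformat.split.maxsize",
  (64L * 1024 * 1024).toString)
```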