I have replaced the default Java serialization with Kryo.
It does reduce the shuffle size and the overall performance has improved;
however, the shuffle speed remains unchanged.
I am quite new to Spark. Does anyone have an idea about which
direction I should look in to find the root cause?
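For reference, here is a minimal sketch of how Kryo is usually switched on in a Java Spark job; MyRecord is just a placeholder for whatever actually travels through the shuffle, and the app name is illustrative:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf()
        .setAppName("mr-to-spark-poc")   // illustrative name
        // swap the default Java serializer for Kryo
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
// registering the classes that go through the shuffle keeps Kryo from
// writing full class names with every record (MyRecord is a placeholder)
conf.registerKryoClasses(new Class<?>[]{ MyRecord.class });
JavaSparkContext sc = new JavaSparkContext(conf);

Smaller shuffle files with unchanged shuffle time usually points at something other than serialization cost, so it may be worth comparing the shuffle read/write metrics in the web UI before and after the change.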
周千昊 wrote in 2015:
> …because we kinda copied the MR implementation
> into Spark.
>
> Let us know if more info is needed.
>
> On Fri, Oct 23, 2015 at 10:24 AM, 周千昊 wrote:
+kylin dev list
周千昊 wrote on Fri, Oct 23, 2015 at 10:20 AM:
> Hi, Reynold
> We use glom() because it is easy to adapt the calculation logic
> already implemented in MR. And to be clear, we are still in a POC.
> Since the results show there is almost no difference between this
> glom…
> …it seems unnecessarily expensive to materialize each
> partition in memory.
>
>
> On Thu, Oct 22, 2015 at 2:02 AM, 周千昊 wrote:
>
Hi, spark community
I have an application which I am trying to migrate from MR to Spark.
It does some calculations on Hive data and outputs HFiles which will
be bulk loaded into an HBase table. Details as follows:
JavaRDD<…> input = getSourceInputFromHive();
JavaRDD<…> mapSideResult =
    input.glom().mapP…
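Since the generics above were lost by the archive, here is a rough sketch of the two shapes being discussed, with MyRecord, getSourceInputFromHive() and computeBytes() as placeholders; glom() materializes each partition as one List, while mapPartitions() streams it through an iterator:

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;

JavaRDD<MyRecord> input = getSourceInputFromHive();      // placeholder element type

// glom() turns every partition into a single in-memory List<MyRecord>
JavaRDD<List<MyRecord>> perPartition = input.glom();

// mapPartitions() walks the same records through an iterator instead,
// so the partition never has to exist as one big object
JavaRDD<byte[]> mapSideResult = input.mapPartitions(records -> {
    List<byte[]> out = new ArrayList<>();
    while (records.hasNext()) {
        out.add(computeBytes(records.next()));           // stand-in for the ported MR logic
    }
    return out;   // Spark 1.x expects an Iterable here; on 2.x+ return out.iterator()
});

That difference is what the "unnecessarily expensive to materialize each partition in memory" comment above is pointing at.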
I am thinking of creating a shared object outside the closure and using this
object to hold the byte array.
Will this work?
周千昊 wrote on Fri, Aug 14, 2015 at 4:02 PM:
Hi,
All I want to do is:
1. read from some source
2. do some calculation to get a byte array
3. write the byte array to HDFS
In Hadoop, I can share an ImmutableBytesWritable and use
System.arraycopy, which prevents the application from creating a lot of
small objects…
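On the "shared object outside the closure" idea above: anything the closure captures is serialized with it and each task deserializes its own copy, so it is reused across records within a task but is not one truly shared object. A minimal sketch of the equivalent, more explicit pattern, assuming a hypothetical computeInto() helper standing in for the ported MR logic and an arbitrary 64 KB buffer:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;

JavaRDD<byte[]> toWrite = input.mapPartitions(records -> {
    // one reusable buffer per partition plays the role of the shared
    // ImmutableBytesWritable in the MR job: one allocation per task
    // instead of one per record
    byte[] buffer = new byte[64 * 1024];
    List<byte[]> results = new ArrayList<>();
    while (records.hasNext()) {
        int len = computeInto(records.next(), buffer);   // fill the reused buffer
        results.add(Arrays.copyOf(buffer, len));         // copy out only what is kept
    }
    return results;   // Spark 1.x Iterable; use results.iterator() on 2.x+
});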
…using Spark in production? Spark 1.3 is better than Spark 1.4.
>
> ------ Original Message ------
> From: "周千昊"
> Sent: Friday, Aug 14, 2015 11:14 AM
> To: "Sea" <261810...@qq.com>; dev@spark.apache.org
Hi Sea,
I have updated Spark to 1.4.1; however, the problem still exists. Any
idea?
Sea <261810...@qq.com> wrote on Fri, Aug 14, 2015 at 12:36 AM:
> Yes, I guess so. I have seen this bug before.
>
>
> ------ Original Message ------
> From: "周千昊"
> Sent: Aug 13, 2015
Hi Sea,
Is it the same issue as https://issues.apache.org/jira/browse/SPARK-8368?
Sea <261810...@qq.com> wrote on Thu, Aug 13, 2015 at 6:52 PM:
> Are you using 1.4.0? If yes, use 1.4.1
>
>
> ------ Original Message ------
> From: "周千昊"
> Sent: Aug 13, 2015
Hi,
I am using Spark 1.4 and have run into an issue.
I am trying to use the aggregate function:
JavaRDD<…> rdd = someRdd;
HashMap<…, …> zeroValue = new HashMap<>();
// add initial key-value pair for zeroValue
rdd.aggregate(zeroValue,
    new Function2<…>,
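For what it's worth, here is a minimal sketch of that call with the lost generics filled in with assumed types: a JavaRDD<String> named rdd and a HashMap<String, Long> accumulator.

import java.util.HashMap;
import org.apache.spark.api.java.function.Function2;

HashMap<String, Long> zeroValue = new HashMap<>();
zeroValue.put("initial", 0L);   // the initial key-value pair mentioned above

HashMap<String, Long> result = rdd.aggregate(
    zeroValue,
    // seqOp: fold one element of a partition into the accumulator
    new Function2<HashMap<String, Long>, String, HashMap<String, Long>>() {
        @Override
        public HashMap<String, Long> call(HashMap<String, Long> acc, String value) {
            acc.merge(value, 1L, Long::sum);
            return acc;
        }
    },
    // combOp: merge accumulators coming back from different partitions
    new Function2<HashMap<String, Long>, HashMap<String, Long>, HashMap<String, Long>>() {
        @Override
        public HashMap<String, Long> call(HashMap<String, Long> a, HashMap<String, Long> b) {
            b.forEach((k, v) -> a.merge(k, v, Long::sum));
            return a;
        }
    });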