To add, schema evolution is better supported in Parquet than in ORC (at the
cost of somewhat slower reads), as ORC is truly index based;
this is especially useful in case you would want to delete some column later.
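One related knob worth knowing (an illustrative config, not something from the original thread): Spark can merge differing Parquet file schemas at read time, which is what makes adding or dropping columns across part-files workable.

```
# spark-defaults.conf, or pass with --conf on spark-submit
# Merges the schemas of all Parquet part-files at read time;
# off by default because it makes the read more expensive.
spark.sql.parquet.mergeSchema  true
```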
Regards,
Sushrut Ikhar
about.me/sushrutikhar <https://about.me/sushrutikhar?promo=email_sig>
https://github.com/airbnb/airbnb-spark-thrift
Regards,
Sushrut Ikhar
On Thu, Mar 1, 2018 at 6:05 AM, Nikhil Goyal wrote:
> Hi guys,
>
> I have a RDD of thrift struct. I want to convert it i
Hi,
Is there any config to change the storage memory fraction for the driver? I'm
not caching anything in the driver, yet by default it picks up
spark.memory.fraction (0.9)
spark.memory.storageFraction (0.6),
whose values I've set as per my executor usage.
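For reference, these knobs are set like any other Spark conf; a sketch with illustrative values (as far as I know there is no driver-only storage fraction — the unified-memory settings apply to the driver JVM as well as the executors):

```
spark-submit \
  --conf spark.memory.fraction=0.6 \
  --conf spark.memory.storageFraction=0.5 \
  ...
```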
Regards,
Sushrut Ikhar
Can you add more details: are you using RDDs/Datasets/SQL? Are you doing
group-bys/joins? Is your input splittable?
BTW, you can pass the config the same way you are passing memoryOverhead,
e.g.
--conf spark.default.parallelism=1000, or through the SparkContext in code.
Regards,
Sushrut Ikhar
Well, the issue was that I was using some non-thread-safe functions for
generating the key.
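As an illustration only (a hypothetical key generator, not the actual code from this thread), one common fix for this class of bug is to give each worker thread its own instance of the unsafe object via thread-local storage; a minimal Python sketch:

```python
import threading

class KeyGen:
    """A key generator with mutable per-call state: unsafe if one
    instance is shared across threads, since the buffer interleaves."""
    def __init__(self):
        self.buf = []

    def key(self, record):
        self.buf.clear()               # mutable shared state lives here
        self.buf.append(str(record))
        return "-".join(self.buf)

_local = threading.local()

def thread_safe_key(record):
    # One KeyGen per thread, created lazily on first use.
    gen = getattr(_local, "gen", None)
    if gen is None:
        gen = _local.gen = KeyGen()
    return gen.key(record)

print(thread_safe_key(42))  # 42
```

The same idea applies on Spark executors: construct the non-thread-safe helper per task (or per partition) rather than capturing one shared instance in the closure.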
Regards,
Sushrut Ikhar
On Tue, Dec 15, 2015 at 2:27 PM, Paweł Szulc wrote:
> Hard to imagine. Can yo
-1.4.1.
Thanks in advance.
Regards,
Sushrut Ikhar
Hi,
I have myself used union in a similar case, and applied reduceByKey on it.
Union + reduceByKey can substitute for a join, but you will have to first use
map so that all values are of the same datatype.
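The pattern can be sketched with plain Python (no Spark needed; all names here are mine): map each side onto a common tagged value type first, then union and reduce by key, which yields the same grouping a join would.

```python
from collections import defaultdict

# Two pair collections with different value types, as in the question.
ages   = [("alice", 30), ("bob", 25)]
cities = [("alice", "NYC"), ("bob", "SF")]

# Step 1: map both sides onto one common value type -- a pair of lists,
# tagged by which side the value came from -- so the union is homogeneous.
left  = [(k, ([v], [])) for k, v in ages]
right = [(k, ([], [v])) for k, v in cities]

# Step 2: union the two collections, then do a reduceByKey-style merge.
def merge(a, b):
    return (a[0] + b[0], a[1] + b[1])

joined = defaultdict(lambda: ([], []))
for k, v in left + right:
    joined[k] = merge(joined[k], v)

print(dict(joined))
# {'alice': ([30], ['NYC']), 'bob': ([25], ['SF'])}
```

In Spark the same two steps would be a `map` on each RDD followed by `union` and `reduceByKey`, with `merge` as the reduce function.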
Regards,
Sushrut Ikhar
This presentation may clarify many of your doubts.
https://www.youtube.com/watch?v=7ooZ4S7Ay6Y
Regards,
Sushrut Ikhar
On Mon, Nov 2, 2015 at 7:15 PM, Denny Lee wrote:
> In addition, you may want
shows that no RDD partitions are actually being cached.
How do I split them without shuffling thrice?
Regards,
Sushrut Ikhar
Hey Jean,
Thanks for the quick response. I am using Spark 1.4.1 pre-built with Hadoop
2.6.
Yes, the YARN cluster has multiple running worker nodes.
It would be a great help if you could tell me how to find the executor logs.
Regards,
Sushrut Ikhar
is now gated
for [5000] ms. Reason is: [Disassociated].
I believe that executors are starting but are unable to connect back to the
driver.
How do I resolve this?
Also, I need help in locating the driver and executor node logs.
Thanks.
Regards,
Sushrut Ikhar