> ... see the Hive Language Manual DDL section on Partitioned Tables:
>
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PartitionedTables>
>
> DataFrameWriter also has a partitionBy method.
>
> On Thu, Aug 20, 2015 at 7:29 AM, VIJAYAKUMAR JAWAHARLAL <sparkh...@data2o.io> wrote:
Hi,

I have a question regarding DataFrame partitioning. I read a Hive table from
Spark, and the following Spark API call converts it to a DataFrame:

test_df = sqlContext.sql(“select * from hivetable1”)

How does Spark decide the partitioning of test_df? Is there a way to partition
test_df based on some column while reading it?
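For reference, a rough PySpark sketch of the options discussed above; the table,
column, and output path are placeholders, and the DataFrameWriter call assumes a
Spark 1.4+ style API:

    test_df = sqlContext.sql("select * from hivetable1")

    # The partition count of the result comes from the underlying input splits
    # of the Hive table scan, not from any column value.
    print(test_df.rdd.getNumPartitions())

    # To change the partitioning after the read, repartition explicitly
    # (this triggers a shuffle):
    test_df = test_df.repartition(200)

    # Column-based layout is applied on write, via DataFrameWriter.partitionBy:
    test_df.write.partitionBy("some_column").parquet("/tmp/hivetable1_by_column")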
> ... anything that collects an unbounded amount of items
> into memory could be causing it.
>
> Either way, the logs for the executors should be able to give you some
> insight; have you looked at those yet?
>
> On Tue, Aug 18, 2015 at 6:26 PM, VIJAYAKUMAR JAWAHARLAL <spar...> wrote:
Hi All,

Why am I getting ExecutorLostFailure, and why are the executors completely lost
for the rest of the processing? Eventually it makes the job fail. One thing is
for sure: a lot of shuffling happens across executors in my program.

Is there a way to understand and debug ExecutorLostFailure? Any pointers?
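When a job is shuffle-heavy, lost executors are often containers killed by the
cluster manager for exceeding their memory limits, which the executor and
NodeManager logs will show. A rough sketch of the knobs usually worth checking,
assuming a YARN deployment and Spark 1.x configuration names (all values below
are placeholders):

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("shuffle-heavy-job")
            # Heap per executor; raise it if executor logs show OutOfMemoryError.
            .set("spark.executor.memory", "6g")
            # Off-heap headroom on YARN; shuffle-heavy jobs often need more than
            # the default before YARN kills the container.
            .set("spark.yarn.executor.memoryOverhead", "1024")
            # More, smaller shuffle partitions reduce per-task memory pressure
            # for DataFrame joins and aggregations.
            .set("spark.sql.shuffle.partitions", "400"))

    sc = SparkContext(conf=conf)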
Hi,

I am trying to compute stats from Spark on a lookup table that resides in Hive.
I am invoking the Spark API as follows, and it gives me NoSuchTableException.
The table is double-verified, and the subsequent statement “sqlContext.sql(“select *
from cpatext.lkup”)” picks up the table correctly. I am wondering ...
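The statement that actually fails is not visible in the quoted text, but a
common culprit with HiveContext here is the table name being resolved against
the wrong database. A rough way to narrow it down (the database and table names
come from the thread; everything else is an assumption):

    # Assumes sqlContext is a HiveContext pointed at the same metastore as Hive.

    # Confirm the table is visible to Spark's catalog in that database.
    print(sqlContext.tableNames("cpatext"))      # should include 'lkup'

    # Fully qualified reads work per the thread, so the metastore wiring is fine.
    sqlContext.sql("select * from cpatext.lkup").limit(5).show()

    # If the failing call takes an unqualified table name, switching the
    # current database first may help.
    sqlContext.sql("use cpatext")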
>
> On 8/17/15, 12:39 PM, "VIJAYAKUMAR JAWAHARLAL" wrote:
>
>> Thanks for your help.
>>
>> I tried to cache the lookup tables and left outer join them with the big table
>> (DF). The join does not seem to be using a broadcast join; still it goes ...
> On ...0:27 AM, Silvio Fiorito wrote:
>
> You could cache the lookup DataFrames; it'll then do a broadcast join.
>
> On 8/14/15, 9:39 AM, "VIJAYAKUMAR JAWAHARLAL" wrote:
>
>> Hi
>>
>> I am facing a huge performance problem when I am trying ...
Hi,

I am facing a huge performance problem when I try to left outer join a very
big data set (~140GB) with a bunch of small lookups [star schema type]. I am
using DataFrames in Spark SQL. It looks like the data gets shuffled and skewed
when that join happens. Is there any way to improve performance?
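A broadcast join avoids shuffling the ~140GB side entirely whenever the lookup
tables fit in executor memory. A rough PySpark sketch of the two usual ways to
get one; the DataFrame names and join key are placeholders, and the explicit
broadcast() hint assumes a Spark release recent enough to expose it in
pyspark.sql.functions:

    from pyspark.sql.functions import broadcast

    big_df = sqlContext.sql("select * from fact_table")    # large fact table (placeholder name)
    lkup_df = sqlContext.sql("select * from dim_lookup")   # small lookup (placeholder name)

    # Option 1: raise the auto-broadcast threshold (bytes) so lookup tables
    # below it are broadcast automatically; this relies on size statistics
    # being available for the Hive tables.
    sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", str(100 * 1024 * 1024))

    # Option 2: hint the planner explicitly, regardless of size estimates.
    joined = big_df.join(broadcast(lkup_df), on="lookup_key", how="left_outer")

The automatic path is presumably why computing table statistics on the lookup
table (the earlier question in this thread) matters: without stats, the planner
cannot tell that the lookup side is small enough to broadcast.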