Re: Data frame created from hive table and its partition

2015-08-20 Thread VIJAYAKUMAR JAWAHARLAL
… <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PartitionedTables> > DataFrameWriter also has a partitionBy method. > On Thu, Aug 20, 2015 at 7:29 AM, VIJAYAKUMAR JAWAHARLAL wrote: …
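A minimal sketch of what the reply points at, assuming a Spark 1.4+ HiveContext bound to sqlContext; the partition column "dt" and the output path are hypothetical:

    df = sqlContext.sql("select * from hivetable1")
    # partitionBy writes one directory per distinct value of dt,
    # e.g. /tmp/hivetable1_by_dt/dt=2015-08-20/
    df.write.partitionBy("dt").parquet("/tmp/hivetable1_by_dt")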

Data frame created from hive table and its partition

2015-08-20 Thread VIJAYAKUMAR JAWAHARLAL
Hi, I have a question regarding DataFrame partitioning. I read a Hive table from Spark, and the following Spark API converts it to a DataFrame: test_df = sqlContext.sql("select * from hivetable1"). How does Spark decide the partitioning of test_df? Is there a way to partition test_df based on some column while reading …
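A hedged sketch of inspecting and changing the partitioning after the read. In the Spark 1.4/1.5 API current at the time, repartition takes only a partition count (repartitioning by a column expression arrived in later releases); the count 200 is a placeholder:

    test_df = sqlContext.sql("select * from hivetable1")
    # Partitioning is inherited from the underlying Hive/HDFS input splits.
    print(test_df.rdd.getNumPartitions())
    # Redistribute into a fixed number of partitions (triggers a shuffle).
    test_df = test_df.repartition(200)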

Re: What is the reason for ExecutorLostFailure?

2015-08-19 Thread VIJAYAKUMAR JAWAHARLAL
… collect an unbounded amount of items into memory could be causing it. > Either way, the logs for the executors should be able to give you some insight; have you looked at those yet? > On Tue, Aug 18, 2015 at 6:26 PM, VIJAYAKUMAR JAWAHARLAL wrote: …

What is the reason for ExecutorLostFailure?

2015-08-18 Thread VIJAYAKUMAR JAWAHARLAL
Hi All, why am I getting ExecutorLostFailure, and why are the executors completely lost for the rest of the processing? Eventually it makes the job fail. One thing is for sure: a lot of shuffling happens across executors in my program. Is there a way to understand and debug ExecutorLostFailure? Any pointers …
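Not from the thread itself, but ExecutorLostFailure in shuffle-heavy jobs often traces back to executors killed for exceeding memory limits. A hedged sketch of raising the relevant settings; the values are placeholders, and spark.yarn.executor.memoryOverhead applies only on YARN:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("debug-executor-loss")
            .set("spark.executor.memory", "8g")                  # heap per executor (placeholder)
            .set("spark.yarn.executor.memoryOverhead", "1024"))  # MB of off-heap headroom, YARN only
    sc = SparkContext(conf=conf)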

COMPUTE STATS on hive table - NoSuchTableException

2015-08-18 Thread VIJAYAKUMAR JAWAHARLAL
Hi, I am trying to compute stats from Spark on a lookup table which resides in Hive. I am invoking the Spark API as follows, and it gives me NoSuchTableException. The table is double-verified, and the subsequent statement sqlContext.sql("select * from cpatext.lkup") picks up the table correctly. I am wondering …
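For context, COMPUTE STATS is Impala syntax; through a HiveContext the HiveQL form is ANALYZE TABLE, and Spark 1.x documented the noscan variant for populating statistics. A hedged sketch, assuming a HiveContext; whether the database-qualified name is accepted here is uncertain (it may be the very source of the NoSuchTableException), so a USE fallback is shown too:

    from pyspark.sql import HiveContext
    sqlContext = HiveContext(sc)

    # Hive-style statistics statement.
    sqlContext.sql("ANALYZE TABLE cpatext.lkup COMPUTE STATISTICS noscan")

    # If the qualified name is rejected, setting the database first may help:
    sqlContext.sql("use cpatext")
    sqlContext.sql("ANALYZE TABLE lkup COMPUTE STATISTICS noscan")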

Re: Left outer joining big data set with small lookups

2015-08-18 Thread VIJAYAKUMAR JAWAHARLAL
… > On 8/17/15, 12:39 PM, "VIJAYAKUMAR JAWAHARLAL" wrote: >> Thanks for your help. >> I tried to cache the lookup tables and left outer join with the big table (DF). The join does not seem to be using a broadcast join; still it goes w…

Re: Left outer joining big data set with small lookups

2015-08-17 Thread VIJAYAKUMAR JAWAHARLAL
…0:27 AM, Silvio Fiorito wrote: > You could cache the lookup DataFrames; it'll then do a broadcast join. > On 8/14/15, 9:39 AM, "VIJAYAKUMAR JAWAHARLAL" wrote: >> Hi, I am facing a huge performance problem when I am trying …
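A hedged sketch of the suggestion, together with the Spark SQL setting that governs automatic broadcast joins; the table name "lookup1" and the 100 MB threshold are placeholders:

    lkup = sqlContext.table("lookup1")
    lkup.cache()
    lkup.count()  # materialize the cache so Spark can estimate its size

    # Tables whose estimated size falls below this threshold are broadcast.
    sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", str(100 * 1024 * 1024))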

Left outer joining big data set with small lookups

2015-08-14 Thread VIJAYAKUMAR JAWAHARLAL
Hi, I am facing a huge performance problem when I try to left outer join a very big data set (~140GB) with a bunch of small lookups [star schema type]. I am using DataFrames in Spark SQL. It looks like the data is shuffled and skewed when that join happens. Is there any way to improve the performance …
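For concreteness, a hedged sketch of the join being described; the table names and the join key "cust_id" are hypothetical:

    big = sqlContext.table("big_fact")    # ~140GB fact table (hypothetical name)
    lkup = sqlContext.table("lookup1")    # small dimension table (hypothetical name)

    joined = big.join(lkup, big["cust_id"] == lkup["cust_id"], "left_outer")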