Re: cache DataFrame

2016-02-11 Thread Gaurav Agarwal
Thanks for the info below. I have one more question. I have my own framework where the SQL query is already built, so instead of using DataFrame filter criteria I am thinking I could use DataFrame d = sqlContext.sql(" and append query here"); d.printSchema(); List row = d.collectAsList(); Here when I

Re: cache DataFrame

2016-02-11 Thread Rishabh Wadhawan
Hi Gaurav, Spark will not load the tables into memory at either point, because DataFrames are just abstractions of something that will happen in the future, when you actually trigger an action such as df.collectAsList or df.show. When you run DataFrame df = sContext.load("jdbc","(select * from employe
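The deferred behaviour described in this reply can be illustrated with a small analogy. The sketch below does not use Spark at all: it relies on the standard `java.util.stream` API, whose intermediate operations are also lazy, to show that defining a pipeline reads nothing and that only a terminal operation (the counterpart of a Spark action) triggers the work. The class name `LazyDemo`, the sample rows, and the log messages are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyDemo {
    // Returns a log of events in the order they happened, so laziness is observable.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        // Stand-in for a table; in Spark this would be an external source.
        List<String> table = List.of("alice", "bob", "carol");

        // "Transformation": only describes the work, no row is read yet.
        Stream<String> plan = table.stream()
                .peek(row -> log.add("read " + row))
                .filter(row -> row.startsWith("b"));
        log.add("plan defined");

        // "Action": the terminal operation triggers the actual reads.
        List<String> result = plan.collect(Collectors.toList());
        log.add("collected " + result);
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In the returned log, the "plan defined" entry appears before any "read" entry, which mirrors the point that building a DataFrame does not by itself pull data into memory.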

cache DataFrame

2016-02-11 Thread Gaurav Agarwal
Hi, when does a DataFrame load the table into memory when it reads from Hive/Phoenix or from any database? There are two points where I need this information: will the tables be loaded into memory or cached at point 1 or at point 2 below? 1. DataFrame df = sContext.load("jdbc","(select * from employ

Re: NullPointerException when cache DataFrame in Java (Spark1.5.1)

2015-10-29 Thread Romi Kuntsman
> > BUT, after changing limit(500) to limit(1000), the code reports a NullPointerException.
I had a similar situation, and the problem was with a certain record. Try to find which records are returned when you limit to 1000 but not returned when you limit to 500. Could it be an NPE thrown from Pixel

Re: NullPointerException when cache DataFrame in Java (Spark1.5.1)

2015-10-29 Thread Zhang, Jingyu
Thanks Romi, I resized the dataset to 7 MB; however, the code still throws a NullPointerException. Did you try to cache a DataFrame with just a single row? Yes, I tried, but got the same problem. Do your rows have any columns with null values? No, I had filtered out null values before caching the

Re: NullPointerException when cache DataFrame in Java (Spark1.5.1)

2015-10-28 Thread Romi Kuntsman
Did you try to cache a DataFrame with just a single row? Do your rows have any columns with null values? Can you post a code snippet here on how you load/generate the DataFrame? Does dataframe.rdd.cache work? *Romi Kuntsman*, *Big Data Engineer* http://www.totango.com On Thu, Oct 29, 2015 at 4:33

NullPointerException when cache DataFrame in Java (Spark1.5.1)

2015-10-28 Thread Zhang, Jingyu
It is not a problem to use JavaRDD.cache() for 200M of data (all objects read from JSON format). But when I try to use DataFrame.cache(), it throws the exception shown below. My machine can cache 1G of data in Avro format without any problem. 15/10/29 13:26:23 INFO GeneratePredicate: Code generated in 154.531