SparkSQL 'describe table' tries to look at all records

2015-07-12 Thread Jerrick Hoang
Hi all, I'm new to Spark and this question may be trivial or may already have been answered, but when I do a 'describe table' from the SparkSQL CLI it seems to look at all records in the table (which takes a really long time for a big table) instead of just giving me the metadata of the table. Would a

Re: SparkSQL 'describe table' tries to look at all records

2015-07-12 Thread Jerrick Hoang
What is the format of the table? Is the table > partitioned? > > Thanks, > > Yin > > On Sun, Jul 12, 2015 at 6:01 PM, ayan guha wrote: > >> Describe computes statistics, so it will try to query the table. The one >> you are looking for is df.printSchema()
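The distinction in the reply above can be sketched as follows. This is a sketch for a Spark 1.x spark-shell session; the table name `events` is hypothetical, and the assumption (per the thread) is that `DESCRIBE` may compute statistics while `printSchema()` only reads metadata:

```scala
// Sketch, assuming a Spark 1.x spark-shell with a SQLContext named `sqlContext`
// and an existing Hive table called `events` (hypothetical name).
val df = sqlContext.table("events")

// Metadata only: prints column names and types without scanning the data.
df.printSchema()

// By contrast, per the reply above, this may compute statistics
// and therefore query the table's records:
sqlContext.sql("DESCRIBE events").show()
```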

Re: Basic Spark SQL question

2015-07-13 Thread Jerrick Hoang
Well, for ad-hoc queries you can use the CLI. On Mon, Jul 13, 2015 at 5:34 PM, Ron Gonzalez wrote: > Hi, > I have a question for Spark SQL. Is there a way to be able to use Spark > SQL on YARN without having to submit a job? > Bottom line here is I want to be able to reduce the latency of runni

hive-site.xml spark1.3

2015-07-13 Thread Jerrick Hoang
Hi all, I have conf/hive-site.xml pointing to my Hive metastore, but the sparksql CLI doesn't pick it up. (Copying the same conf/ files to spark1.4 and 1.2 works fine.) Just wondering if someone has seen this before. Thanks
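For reference, a minimal conf/hive-site.xml pointing Spark SQL at an external Hive metastore typically looks like the fragment below; `hive.metastore.uris` is the standard Hive property, but the host and port here are placeholders:

```xml
<!-- Sketch of conf/hive-site.xml; the thrift URI below is a placeholder. -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>
```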

Getting not implemented by the TFS FileSystem implementation

2015-07-14 Thread Jerrick Hoang
Hi all, I'm upgrading from spark1.3 to spark1.4, and when trying to run the spark-sql CLI it gave a ```java.lang.UnsupportedOperationException: Not implemented by the TFS FileSystem implementation``` exception. I did not get this error with 1.3 and I don't use any TFS FileSystem. Full stack trace is

Re: Getting not implemented by the TFS FileSystem implementation

2015-07-16 Thread Jerrick Hoang
So, this has to do with the fact that 1.4 has a new way to interact with HiveMetastore, still investigating. Would really appreciate if anybody has any insights :) On Tue, Jul 14, 2015 at 4:28 PM, Jerrick Hoang wrote: > Hi all, > > I'm upgrading from spark1.3 to spark1.4 and when

Spark-hive parquet schema evolution

2015-07-18 Thread Jerrick Hoang
Hi all, I'm aware of the support for schema evolution via DataFrame API. Just wondering what would be the best way to go about dealing with schema evolution with Hive metastore tables. So, say I create a table via SparkSQL CLI, how would I deal with Parquet schema evolution? Thanks, J
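One way to see the DataFrame-side support mentioned above is Parquet schema merging. This is a sketch assuming Spark 1.4+ in spark-shell; the paths and column names are made up for illustration:

```scala
// Sketch: Parquet schema merging (Spark 1.4+ spark-shell; paths are hypothetical).
import sqlContext.implicits._

// Write two Parquet directories whose schemas differ by one column.
val dfA = sc.makeRDD(1 to 3).map(i => (i, i.toString)).toDF("id", "name")
dfA.write.parquet("hdfs:///tmp/evolve/part=1")

val dfB = sc.makeRDD(4 to 6).map(i => (i, i.toString, i * 2)).toDF("id", "name", "extra")
dfB.write.parquet("hdfs:///tmp/evolve/part=2")

// Reading with mergeSchema reconciles both schemas; rows written
// without the `extra` column come back with NULL for it.
val merged = sqlContext.read.option("mergeSchema", "true").parquet("hdfs:///tmp/evolve")
merged.printSchema()
```

How this interacts with a table registered in the Hive metastore (as opposed to path-based reads) is exactly the open question of this thread.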

Re: Spark-hive parquet schema evolution

2015-07-20 Thread Jerrick Hoang
I'm new to Spark, any ideas would be much appreciated! Thanks On Sat, Jul 18, 2015 at 11:11 AM, Jerrick Hoang wrote: > Hi all, > > I'm aware of the support for schema evolution via DataFrame API. Just > wondering what would be the best way to go about dealing with schema

Re: Spark-hive parquet schema evolution

2015-07-21 Thread Jerrick Hoang
with Hive metastore tables"? Hive > doesn't take schema evolution into account. Could you please give a > concrete use case? Are you trying to write Parquet data with extra columns > into an existing metastore Parquet table? > > Cheng > > > On 7/21/15 1:04 AM, Jerrick H

Re: Spark is much slower than direct access MySQL

2015-07-26 Thread Jerrick Hoang
How big is the dataset? How complicated is the query? On Sun, Jul 26, 2015 at 12:47 AM Louis Hust wrote: > Hi, all, > > I am using a spark DataFrame to fetch a small table from MySQL, > and I found it costs much more than directly accessing MySQL using JDBC. > > Time cost for Spark is about 2033ms, and di

Spark failed while trying to read parquet files

2015-08-07 Thread Jerrick Hoang
Hi all, I have a partitioned parquet table (very small table with only 2 partitions). The version of spark is 1.4.1, parquet version is 1.7.0. I applied this patch to spark [SPARK-7743] so I assume that spark can read parquet files normally, however, I'm getting this when trying to do a simple `se

Re: Spark failed while trying to read parquet files

2015-08-07 Thread Jerrick Hoang
g PARQUET-136, which has been fixed in (the real) Parquet > 1.7.0 https://issues.apache.org/jira/browse/PARQUET-136 > > Cheng > > > On 8/8/15 6:20 AM, Jerrick Hoang wrote: > > Hi all, > > I have a partitioned parquet table (very small table with only 2 > partitions

Refresh table

2015-08-10 Thread Jerrick Hoang
Hi all, I'm a little confused about how refresh table (SPARK-5833) should work. So I did the following: val df1 = sc.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double") df1.write.parquet("hdfs:///test_table/key=1") Then I created an external table by doing, CREATE EXTERNAL TABLE `tm
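Continuing the snippet above: the `df1` write and the path are from the message, but the second write and the table name passed to `refreshTable` below are hypothetical, since the CREATE EXTERNAL TABLE statement is truncated. A sketch of the usual SPARK-5833 flow, assuming a Spark 1.x spark-shell with a HiveContext:

```scala
// Sketch, continuing the message above (Spark 1.x spark-shell, HiveContext).
import sqlContext.implicits._

val df1 = sc.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double")
df1.write.parquet("hdfs:///test_table/key=1")

// Add a second partition directory behind the table's back...
val df2 = sc.makeRDD(6 to 10).map(i => (i, i * 2)).toDF("single", "double")
df2.write.parquet("hdfs:///test_table/key=2")

// ...then ask Spark SQL to invalidate its cached metadata for the table
// (the table name here is hypothetical):
sqlContext.refreshTable("test_table")
```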

Spark Sql behaves strangely with tables with a lot of partitions

2015-08-19 Thread Jerrick Hoang
Hi all, I did a simple experiment with Spark SQL. I created a partitioned parquet table with only one partition (date=20140701). A simple `select count(*) from table where date=20140701` would run very fast (0.1 seconds). However, as I added more partitions the query takes longer and longer. When

Re: Spark Sql behaves strangely with tables with a lot of partitions

2015-08-19 Thread Jerrick Hoang
On Wed, Aug 19, 2015 at 7:51 PM, Jerrick Hoang > wrote: > >> Hi all, >> >> I did a simple experiment with Spark SQL. I created a partitioned parquet >> table with only one partition (date=20140701). A simple `select count(*) >> from table where date=20140701` would run v

Re: Spark Sql behaves strangely with tables with a lot of partitions

2015-08-19 Thread Jerrick Hoang
You can try setting spark.sql.sources.partitionDiscovery.enabled to > false. > > > > BTW, which version are you using? > > > > Hao > > > > *From:* Jerrick Hoang [mailto:jerrickho...@gmail.com] > *Sent:* Thursday, August 20, 2015 12:16 PM > *To:* Philip Weaver
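The suggestion above can be tried from spark-shell or the spark-sql CLI. A sketch for the Spark 1.4/1.5 era; the config key is quoted from the message, while the table and query below are the hypothetical ones from this thread's experiment:

```scala
// Sketch: disabling automatic partition discovery (Spark 1.4/1.5 era).
// The config key is quoted from the reply above.
sqlContext.setConf("spark.sql.sources.partitionDiscovery.enabled", "false")

// Equivalently, from the spark-sql CLI:
//   SET spark.sql.sources.partitionDiscovery.enabled=false;

// Then re-run the slow query to compare (table name is hypothetical):
sqlContext.sql("SELECT count(*) FROM table WHERE date=20140701").show()
```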

Re: Spark Sql behaves strangely with tables with a lot of partitions

2015-08-21 Thread Jerrick Hoang
> On Wed, Aug 19, 2015 at 10:53 PM, Cheng, Hao wrote: > >> Can you do some more profiling? I am wondering if the driver is busy >> with scanning the HDFS / S3. >> >> Like jstack >> >> >> >> And also, it will be great if you can paste the ph

Re: Spark Sql behaves strangely with tables with a lot of partitions

2015-08-21 Thread Jerrick Hoang
try with hadoop version 2.7.1 .. It is known that s3a works really > well with parquet, which is available in 2.7. They fixed a lot of issues > related to metadata reading there... > On Aug 21, 2015 11:24 PM, "Jerrick Hoang" wrote: > >> @Cheng, Hao : Physical plans

Re: Spark Sql behaves strangely with tables with a lot of partitions

2015-08-23 Thread Jerrick Hoang
Does anybody have any suggestions? On Fri, Aug 21, 2015 at 3:14 PM, Jerrick Hoang wrote: > Is there a workaround without updating Hadoop? Would really appreciate if > someone can explain what spark is trying to do here and what is an easy way > to turn this off. Thanks all! > > On Fr

Re: Spark Sql behaves strangely with tables with a lot of partitions

2015-08-24 Thread Jerrick Hoang
Michael Armbrust [mailto:mich...@databricks.com] > *Sent:* Monday, August 24, 2015 2:13 PM > *To:* Philip Weaver > *Cc:* Jerrick Hoang ; Raghavendra Pandey < > raghavendra.pan...@gmail.com>; User ; Cheng, Hao < > hao.ch...@intel.com> > > *Subject:* Re: Spark Sql behaves strangely w

Re: Spark cluster multi tenancy

2015-08-26 Thread Jerrick Hoang
Would be interested to know the answer too. On Wed, Aug 26, 2015 at 11:45 AM, Sadhan Sood wrote: > Interestingly, if there is nothing running on dev spark-shell, it recovers > successfully and regains the lost executors. Attaching the log for that. > Notice, the "Registering block manager .." st