Hi all,
I'm new to Spark and this question may be trivial or may have already been
answered, but when I do a 'describe table' from the SparkSQL CLI it seems to
look at all records in the table (which takes a really long time for a
big table) instead of just giving me the table's metadata. Would
a
What is the format of the table? Is the table
> partitioned?
>
> Thanks,
>
> Yin
>
> On Sun, Jul 12, 2015 at 6:01 PM, ayan guha wrote:
>
>> Describe computes statistics, so it will try to query the table. The one
>> you are looking for is df.printSchema()
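As a rough sketch of the difference (assuming a spark-shell session where `sqlContext` is available and `my_table` is a placeholder name for a table registered in the metastore):
```
// Grab the table as a DataFrame; this does not read any data.
val df = sqlContext.table("my_table")

// printSchema() only looks at metadata, so it comes back immediately
// no matter how big the table is.
df.printSchema()

// DESCRIBE goes through the SQL path and, per the reply above, may end up
// querying the table:
// sqlContext.sql("DESCRIBE my_table").show()
```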
Well, for ad hoc queries you can use the CLI
On Mon, Jul 13, 2015 at 5:34 PM, Ron Gonzalez
wrote:
> Hi,
> I have a question about Spark SQL. Is there a way to use Spark
> SQL on YARN without having to submit a job?
> The bottom line is that I want to be able to reduce the latency of runni
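Besides the CLI mentioned above, one common way to avoid paying application-submission latency per query is to keep a long-running Spark SQL Thrift Server on the cluster and send queries to it over JDBC. A hedged sketch of a client, assuming the Thrift Server has already been started, the Hive JDBC driver is on the classpath, and the host, port, and table name are placeholders:
```
import java.sql.DriverManager

object ThriftServerClient {
  def main(args: Array[String]): Unit = {
    // The Hive JDBC driver is assumed to be on the classpath.
    Class.forName("org.apache.hive.jdbc.HiveDriver")

    // Placeholder host/port; the Thrift Server listens on 10000 by default.
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
    try {
      // Each query reuses the already-running server, so no new
      // application is submitted per query.
      val rs = conn.createStatement().executeQuery("SELECT count(*) FROM my_table")
      while (rs.next()) println(rs.getLong(1))
    } finally {
      conn.close()
    }
  }
}
```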
Hi all,
I have conf/hive-site.xml pointing to my Hive metastore, but the spark-sql
CLI doesn't pick it up (copying the same conf/ files to spark1.4 and 1.2
works fine). Just wondering if someone has seen this before.
Thanks
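A quick way to check whether a given Spark build is actually picking up conf/hive-site.xml is to create a HiveContext and list what it sees. A minimal sketch for spark-shell, where `sc` is the SparkContext; nothing here is specific to a particular metastore:
```
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// If hive-site.xml was picked up, this lists the tables from the external
// Hive metastore; if not, it falls back to an empty local Derby metastore.
hiveContext.sql("SHOW TABLES").show()
```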
Hi all,
I'm upgrading from Spark 1.3 to Spark 1.4, and when trying to run the spark-sql
CLI it gave a ```java.lang.UnsupportedOperationException: Not implemented
by the TFS FileSystem implementation``` exception. I did not get this error
with 1.3, and I don't use the TFS FileSystem. The full stack trace is
So, this has to do with the fact that 1.4 has a new way of interacting with
the Hive metastore; still investigating. Would really appreciate it if anybody
has any insights :)
On Tue, Jul 14, 2015 at 4:28 PM, Jerrick Hoang
wrote:
> Hi all,
>
> I'm upgrading from spark1.3 to spark1.4 and when
Hi all,
I'm aware of the support for schema evolution via the DataFrame API. Just
wondering what would be the best way to go about dealing with schema
evolution with Hive metastore tables. So, say I create a table via the SparkSQL
CLI, how would I deal with Parquet schema evolution?
Thanks,
J
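On the DataFrame side, the schema-merging behaviour looks roughly like the sketch below (adapted from the pattern in the Spark docs; the paths and column names are placeholders, and `sc`/`sqlContext` are assumed from spark-shell). How the merged schema gets reflected back into a metastore table created through the CLI is exactly the open question in this thread:
```
import sqlContext.implicits._

// Write two Parquet "partitions" with slightly different schemas.
val df1 = sc.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double")
df1.write.parquet("hdfs:///tmp/evolve_test/key=1")

val df2 = sc.makeRDD(6 to 10).map(i => (i, i * 3)).toDF("single", "triple")
df2.write.parquet("hdfs:///tmp/evolve_test/key=2")

// Ask the reader to merge the schemas of all the files it finds; the result
// contains the union of the columns (single, double, triple, key).
val merged = sqlContext.read.option("mergeSchema", "true").parquet("hdfs:///tmp/evolve_test")
merged.printSchema()
```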
I'm new to Spark; any ideas would be much appreciated! Thanks
On Sat, Jul 18, 2015 at 11:11 AM, Jerrick Hoang
wrote:
> Hi all,
>
> I'm aware of the support for schema evolution via DataFrame API. Just
> wondering what would be the best way to go about dealing with schema
with Hive metastore tables"? Hive
> doesn't take schema evolution into account. Could you please give a
> concrete use case? Are you trying to write Parquet data with extra columns
> into an existing metastore Parquet table?
>
> Cheng
>
>
> On 7/21/15 1:04 AM, Jerrick H
How big is the dataset? How complicated is the query?
On Sun, Jul 26, 2015 at 12:47 AM Louis Hust wrote:
> Hi, all,
>
> I am using a Spark DataFrame to fetch a small table from MySQL,
> and I found it costs much more than directly accessing MySQL using JDBC.
>
> The time cost for Spark is about 2033ms, and di
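For reference, a minimal sketch of the DataFrame-over-JDBC read being described (the URL, credentials, and table name are placeholders, and the MySQL JDBC driver is assumed to be on the classpath). A fixed per-query cost of planning and launching a Spark job is expected on top of the raw JDBC time, which is most visible on very small tables:
```
import java.util.Properties

// Placeholder credentials for the sketch.
val props = new Properties()
props.setProperty("user", "spark")
props.setProperty("password", "secret")

// Reads the table through Spark's JDBC data source (Spark 1.4 API).
val small = sqlContext.read.jdbc("jdbc:mysql://localhost:3306/testdb", "small_table", props)
small.show()
```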
Hi all,
I have a partitioned Parquet table (a very small table with only 2
partitions). The Spark version is 1.4.1 and the Parquet version is 1.7.0. I
applied this patch to Spark [SPARK-7743], so I assume that Spark can read
Parquet files normally; however, I'm getting this when trying to do a
simple `se
hitting PARQUET-136, which has been fixed in (the real) Parquet
> 1.7.0 https://issues.apache.org/jira/browse/PARQUET-136
>
> Cheng
>
>
> On 8/8/15 6:20 AM, Jerrick Hoang wrote:
>
> Hi all,
>
> I have a partitioned parquet table (very small table with only 2
> partitions
Hi all,
I'm a little confused about how refresh table (SPARK-5833) should work. So
I did the following:
val df1 = sc.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double")
df1.write.parquet("hdfs:///test_table/key=1")
Then I created an external table by doing:
CREATE EXTERNAL TABLE `tm
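A sketch of the flow being described, runnable in spark-shell where `sqlContext` is a HiveContext. The path follows the snippet above; the external table name is truncated there, so a placeholder name is used, and the second partition is hypothetical:
```
import sqlContext.implicits._

val df1 = sc.makeRDD(1 to 5).map(i => (i, i * 2)).toDF("single", "double")
df1.write.parquet("hdfs:///test_table/key=1")

// ... CREATE EXTERNAL TABLE pointing at hdfs:///test_table, as above ...

// Later, after more Parquet files have been written under the table's
// location, REFRESH TABLE (SPARK-5833) tells Spark SQL to drop its cached
// metadata / file listing for the table so the new files are picked up.
val df2 = sc.makeRDD(6 to 10).map(i => (i, i * 2)).toDF("single", "double")
df2.write.parquet("hdfs:///test_table/key=2")

sqlContext.sql("REFRESH TABLE my_external_table") // placeholder name; the real one is truncated above
// equivalently: sqlContext.refreshTable("my_external_table")
```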
Hi all,
I did a simple experiment with Spark SQL. I created a partitioned Parquet
table with only one partition (date=20140701). A simple `select count(*)
from table where date=20140701` would run very fast (0.1 seconds). However,
as I added more partitions, the query took longer and longer. When
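For what it's worth, a rough reconstruction of that experiment (the column names, path, and table name are placeholders; `sc`/`sqlContext` are assumed from spark-shell):
```
import sqlContext.implicits._

// Write a tiny partitioned Parquet table with a single date partition.
val day = sc.parallelize(1 to 10).map(i => (i, "20140701")).toDF("id", "date")
day.write.partitionBy("date").parquet("hdfs:///tmp/partition_test")

// Register it and run the filtered count; with one partition this is fast,
// and the report above is that it slows down as more date= directories are added.
sqlContext.read.parquet("hdfs:///tmp/partition_test").registerTempTable("partition_test")
sqlContext.sql("SELECT count(*) FROM partition_test WHERE date = 20140701").show()
```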
d, Aug 19, 2015 at 7:51 PM, Jerrick Hoang
> wrote:
>
>> Hi all,
>>
>> I did a simple experiment with Spark SQL. I created a partitioned parquet
>> table with only one partition (date=20140701). A simple `select count(*)
>> from table where date=20140701` would run v
You can try setting spark.sql.sources.partitionDiscovery.enabled to
> false.
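For reference, a quick sketch of how that setting can be applied (the property name comes straight from the message above and applies to the 1.4/1.5 line):
```
// Turn off partition discovery for file-based data sources (Spark 1.4/1.5 era).
sqlContext.setConf("spark.sql.sources.partitionDiscovery.enabled", "false")

// or pass it when starting the shell / CLI:
//   --conf spark.sql.sources.partitionDiscovery.enabled=false
```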
>
>
>
> BTW, which version are you using?
>
>
>
> Hao
>
>
>
> *From:* Jerrick Hoang [mailto:jerrickho...@gmail.com]
> *Sent:* Thursday, August 20, 2015 12:16 PM
> *To:* Philip Weaver
> On Wed, Aug 19, 2015 at 10:53 PM, Cheng, Hao wrote:
>
>> Can you do some more profiling? I am wondering if the driver is busy
>> scanning HDFS / S3.
>>
>> Like jstack
>>
>>
>>
>> Also, it will be great if you can paste the physical plan
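Getting the physical plan is just a matter of calling explain on the query DataFrame (a sketch; the table name is a placeholder standing in for the partitioned table from earlier in the thread):
```
// Print the logical and physical plans without executing the query; useful
// for checking whether the partition filter is pushed down or whether Spark
// is listing every partition.
val q = sqlContext.sql("SELECT count(*) FROM partition_test WHERE date = 20140701")
q.explain(true)
```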
Try with Hadoop version 2.7.1. It is known that s3a, which is available in
> 2.7, works really well with Parquet; they fixed a lot of issues
> related to metadata reading there...
> On Aug 21, 2015 11:24 PM, "Jerrick Hoang" wrote:
>
>> @Cheng, Hao : Physical plans
Does anybody have any suggestions?
On Fri, Aug 21, 2015 at 3:14 PM, Jerrick Hoang
wrote:
> Is there a workaround without updating Hadoop? I would really appreciate it if
> someone could explain what Spark is trying to do here and what an easy way
> to turn this off would be. Thanks, all!
>
> On Fr
*From:* Michael Armbrust [mailto:mich...@databricks.com]
> *Sent:* Monday, August 24, 2015 2:13 PM
> *To:* Philip Weaver
> *Cc:* Jerrick Hoang ; Raghavendra Pandey <
> raghavendra.pan...@gmail.com>; User ; Cheng, Hao <
> hao.ch...@intel.com>
>
> *Subject:* Re: Spark Sql behaves strangely w
Would be interested to know the answer too.
On Wed, Aug 26, 2015 at 11:45 AM, Sadhan Sood wrote:
> Interestingly, if there is nothing running on the dev spark-shell, it recovers
> successfully and regains the lost executors. Attaching the log for that.
> Notice the "Registering block manager .." st