Congratulations!! Jerry, you really deserve it.
Hao
-Original Message-
From: Mridul Muralidharan [mailto:mri...@gmail.com]
Sent: Tuesday, August 29, 2017 12:04 PM
To: Matei Zaharia
Cc: dev ; Saisai Shao
Subject: Re: Welcoming Saisai (Jerry) Shao as a committer
Congratulations Jerry, well deserved!
-1
Breaks existing applications that use Script Transformation in Spark SQL: the
default record/column delimiter class changed, since we no longer get the
default conf value from HiveConf; see SPARK-16515.
This is a regression.
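For context, a script transformation query of the affected kind looks roughly
like this; a minimal sketch, assuming an existing HiveContext named
hiveContext and a hypothetical table "src":

// With no explicit ROW FORMAT clause, the record/column delimiters
// fall back to the defaults that used to be read from HiveConf.
val transformed = hiveContext.sql(
  """SELECT TRANSFORM (key, value)
    |USING 'cat'
    |AS (k, v)
    |FROM src""".stripMargin)
transformed.show()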
From: Reynold Xin [mailto:r...@databricks.com]
I think you probably need to write some code, as you need to support ES; there
are two options, per my understanding:
Create a new Data Source from scratch, but you probably need to implement the
interface at:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/s
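For what it's worth, here is a bare-bones sketch of option 1 against the 1.x
external data source API; DefaultSource, EsRelation, and the "es.resource"
option are illustrative names, not the actual elasticsearch-hadoop code:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Entry point looked up by sqlContext.read.format("...").
class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    new EsRelation(parameters("es.resource"))(sqlContext)
}

// A full-scan relation; "resource" would name the ES index/type to read.
class EsRelation(resource: String)(@transient val sqlContext: SQLContext)
    extends BaseRelation with TableScan {

  override def schema: StructType =
    StructType(StructField("doc", StringType) :: Nil)

  override def buildScan(): RDD[Row] =
    // Placeholder: a real implementation would issue ES scan/scroll
    // requests here and map the hits into Rows.
    sqlContext.sparkContext.parallelize(Seq(Row("placeholder")))
}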
…introduced and fully addressed before RDDs.
They would be presented as the normal/default/standard way to do things in
Spark. RDDs, in contrast, would be presented later as a kind of lower-level,
closer-to-the-metal API that can be used in atypical, more specialized contexts
where DataFrames or Datasets don't fit.
I am not sure what the best practice is for this specific problem, but it's
really worth thinking about for 2.0, as it is a painful issue for lots of users.
By the way, is this also an opportunity to deprecate the RDD API (or make it
internal-only)? Lots of its functionality overlaps with the DataFrame API.
Yes, we definitely need to think about how to handle this case; it is probably
even more common than the sorted/partitioned tables case. Can you jump to the
JIRA and leave a comment there?
From: Alex Nastetsky [mailto:alex.nastet...@vervemobile.com]
Sent: Tuesday, November 10, 2015 3:03 AM
To: Cheng, Hao
-- Forwarded message --
From: gen tang [mailto:gen.tan...@gmail.com]
Date: Fri, Nov 6, 2015 at 12:14 AM
Subject: Re: dataframe slow down with tungsten turn on
To: "Cheng, Hao" mailto:hao.ch...@intel.com>>
Hi,
My application is as follows:
1. crea
problem as you described; probably we can add an additional checking /
reporting rule for the abuse.
From: Jeff Zhang [mailto:zjf...@gmail.com]
Sent: Thursday, November 5, 2015 1:55 PM
To: Cheng, Hao
Cc: dev@spark.apache.org
Subject: Re: Why LibSVMRelation and CsvRelation don't extends HadoopFsRelation
Probably two reasons:
1. HadoopFsRelation was introduced in 1.4, but it seems CsvRelation was
created based on 1.3.
2. HadoopFsRelation introduces the concept of Partition, which is probably
not necessary for LibSVMRelation.
But I think it will be easy to change by extending from HadoopFsRelation.
BTW, 1 minute vs. 2 hours seems quite weird; can you provide more information
on the ETL work?
From: Cheng, Hao [mailto:hao.ch...@intel.com]
Sent: Thursday, November 5, 2015 12:56 PM
To: gen tang; dev@spark.apache.org
Subject: RE: dataframe slow down with tungsten turn on
1.5 has critical performance / bug issues; you'd better try the 1.5.1 or
1.5.2 RC version.
From: gen tang [mailto:gen.tan...@gmail.com]
Sent: Thursday, November 5, 2015 12:43 PM
To: dev@spark.apache.org
Subject: Fwd: dataframe slow down with tungsten turn on
Hi,
In fact, I tested the same code with
Yes, we probably need more changes to the Data Source API if we want to
implement it in a generic way.
BTW, I created the JIRA by copying most of the words from Alex. ☺
https://issues.apache.org/jira/browse/SPARK-11512
From: Reynold Xin [mailto:r...@databricks.com]
Sent: Thursday, November 5, 2015 1:36
Hi Gsvic, can you please provide detailed code / steps to reproduce that?
Hao
-Original Message-
From: gsvic [mailto:victora...@gmail.com]
Sent: Monday, October 19, 2015 3:55 AM
To: dev@spark.apache.org
Subject: ShuffledHashJoin Possible Issue
I am doing some experiments with join algorit
We actually met a similar problem in a real case; see
https://issues.apache.org/jira/browse/SPARK-10474
After checking the source code, the external sort memory management strategy
seems to be the root cause of the issue.
Currently, we allocate a 4MB (page size) buffer as the initial one in the beginning
Not sure if it’s too late, but we found a critical bug at
https://issues.apache.org/jira/browse/SPARK-10466
UnsafeRow ser/de will cause an assert error, particularly for sort-based
shuffle with data spill; this is not acceptable, as it’s very common in large
table joins.
From: Reynold Xin [mailto
OK, thanks, probably just myself…
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Friday, August 14, 2015 11:04 AM
To: Cheng, Hao
Cc: Josh Rosen; dev
Subject: Re: Automatically deleting pull request comments left by AmplabJenkins
I tried accessing just now.
It took several seconds before the
I found that https://spark-prs.appspot.com/ has been super slow when opening
it in a new window recently; not sure if it's just me or everybody experiences
the same. Is there any way to speed it up?
From: Josh Rosen [mailto:rosenvi...@gmail.com]
Sent: Friday, August 14, 2015 10:21 AM
To: dev
Subject: Re: Automatically deleting pull request comments left by AmplabJenkins
Firstly, spark.sql.autoBroadcastJoinThreshold only works for the EQUAL JOIN.
Currently, for a non-equal join: if the join type is INNER, it will be done by
a CartesianProduct join, while BroadcastNestedLoopJoin handles the outer
joins.
In BroadcastNestedLoopJoin, the table with
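A rough illustration of the difference; the table names are hypothetical, and
small_table is assumed to fit under the threshold:

val small = sqlContext.table("small_table")
val large = sqlContext.table("large_table")

// Equi-join: eligible for a broadcast join when `small` is below
// spark.sql.autoBroadcastJoinThreshold.
large.join(small, large("id") === small("id")).explain()

// Non-equal INNER join: planned as a CartesianProduct plus a filter.
large.join(small, large("id") > small("id")).explain()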
Yes, it's a known issue; either set a bigger heap size for the driver, or you
can try setting `spark.sql.thriftServer.incrementalCollect=true`. It's a
workaround for queries that return a huge result set.
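Sketch of the workaround, set on the server-side conf (it can equally be
passed via --conf when starting the Thrift server):

import org.apache.spark.SparkConf

// Collect the result set incrementally instead of materializing
// all of it on the driver at once.
val conf = new SparkConf()
  .set("spark.sql.thriftServer.incrementalCollect", "true")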
From: Judy Nash [mailto:judyn...@exchange.microsoft.com]
Sent: Wednesday, July 8, 2015 11:53
It means data shuffling, and its arguments also show the partitioning
strategy.
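For example, assuming a DataFrame `df` with a `key` column, an aggregation
forces a shuffle and shows up as an Exchange:

// The printed physical plan contains something like
// `Exchange (HashPartitioning [key], 200)`: rows are shuffled so that
// equal keys land in the same partition before the final aggregation.
df.groupBy("key").count().explain()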
-Original Message-
From: invkrh [mailto:inv...@gmail.com]
Sent: Monday, June 8, 2015 9:34 PM
To: dev@spark.apache.org
Subject: [SparkSQL ] What is Exchange in physical plan for ?
Hi,
DataFrame.explain()
Adding another Blocker issue, just created! It seems to be a regression.
https://issues.apache.org/jira/browse/SPARK-7853
-Original Message-
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Monday, May 25, 2015 3:37 PM
To: Patrick Wendell
Cc: dev@spark.apache.org
Subject: Re: [VOTE] Release Apa
Thanks for reporting this.
We intend to support multiple metastore versions in a single build
(hive-0.13.1) by introducing the IsolatedClientLoader, but you're probably
hitting a bug; please file a JIRA issue for this.
I will keep investigating this as well.
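For reference, a sketch of the intended configuration, assuming the 1.4-era
settings (the version string and jar path are placeholders):

import org.apache.spark.SparkConf

// Point the IsolatedClientLoader at the actual metastore version
// instead of the built-in hive-0.13.1 client.
val conf = new SparkConf()
  .set("spark.sql.hive.metastore.version", "0.12.0")
  .set("spark.sql.hive.metastore.jars", "/path/to/hive-0.12/lib/*")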
Hao
From: Mark Hamstra [mail
Spark SQL just loads the query result as a new source (via JDBC), so do NOT
confuse it with Spark SQL tables; they are totally independent database
systems.
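As a sketch (the URL and credentials are placeholders), a nested select can be
pushed down to the remote database by passing a subquery alias as dbtable:

// The inner SELECT runs on the remote database; Spark SQL only sees
// the aliased result as a table.
val df = sqlContext.read
  .format("jdbc")
  .option("url", "jdbc:mysql://host:3306/db")
  .option("dbtable", "(SELECT id, name FROM people WHERE age > 21) AS t")
  .option("user", "user")
  .option("password", "secret")
  .load()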
From: Yi Zhang [mailto:zhangy...@yahoo.com.INVALID]
Sent: Friday, May 15, 2015 1:59 PM
To: Cheng, Hao; Dev
Subject: Re: Does Spark SQL (JDBC) support nest select with current version
You need to register the "dataFrame" as a table first and then run queries on
it. Do you mean that also failed?
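That is, something along these lines (1.x API; the table name is illustrative):

// Register the DataFrame under a temporary table name, then query it.
dataFrame.registerTempTable("people")
val adults = sqlContext.sql("SELECT name FROM people WHERE age > 21")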
From: Yi Zhang [mailto:zhangy...@yahoo.com.INVALID]
Sent: Friday, May 15, 2015 1:10 PM
To: Yi Zhang; Dev
Subject: Re: Does Spark SQL (JDBC) support nest select with current version
If I
Can you use Varchar or String instead? Currently, Spark SQL converts varchar
into the string type internally (without the max-length limitation). However,
the "char" type is not supported yet.
-Original Message-
From: A.M.Chan [mailto:kaka_1...@163.com]
Sent: Friday, March 20, 2015 9:56
I am not so sure whether Hive supports changing the metastore after it has
been initialized; I guess not. Spark SQL relies totally on the Hive Metastore
in HiveContext, and that's probably why it doesn't work as expected for Q1.
BTW, in most cases, people configure the metastore settings in hive-site.xml
and will not change them afterwards.
Not so sure about your question, but SparkStrategies.scala and
Optimizer.scala are a good start if you want the details of the join
implementation or optimization.
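Tangentially, if you want to experiment with the planner rather than just read
it, there is an extension point; a sketch assuming the experimental API
available from 1.3 on:

import org.apache.spark.sql.Strategy
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan

// A no-op strategy: returning Nil defers to the built-in strategies
// in SparkStrategies.scala.
object MyJoinStrategy extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = Nil
}

sqlContext.experimental.extraStrategies = Seq(MyJoinStrategy)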
-Original Message-
From: Andrew Ash [mailto:and...@andrewash.com]
Sent: Friday, January 16, 2015 4:52 AM
To: Reynold
I am wondering if we can provide a more friendly API, rather than a
configuration, for this purpose. What do you think, Patrick?
Cheng Hao
-Original Message-
From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Thursday, December 25, 2014 3:22 PM
To: Shao, Saisai
Cc: u...@spark.apache.org
Part of it can be found at:
https://github.com/apache/spark/pull/3429/files#diff-f88c3e731fcb17b1323b778807c35b38R34
Sorry, it's a to-be-reviewed PR, but it should still be informative.
Cheng Hao
-Original Message-
From: Alessandro Baretta [mailto:alexbare...@gmail.com]
Sent: F
I've created (reused) the PR https://github.com/apache/spark/pull/3336;
hopefully we can fix this regression.
Thanks for reporting this.
Cheng Hao
-Original Message-
From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Saturday, December 6, 2014 4:51 AM
To: kb
+1, that will definitely speed up PR reviewing / merging.
-Original Message-
From: Cheng Lian [mailto:lian.cs@gmail.com]
Sent: Thursday, November 6, 2014 12:46 PM
To: dev
Subject: Re: [VOTE] Designating maintainers for some Spark components
+1 since this is already the de facto
The hive-thriftserver module is not included when the profile hive-0.13.1 is
specified.
-Original Message-
From: Jianshi Huang [mailto:jianshi.hu...@gmail.com]
Sent: Monday, October 27, 2014 4:48 PM
To: dev@spark.apache.org
Subject: Build with Hive 0.13.1 doesn't have datanucleus and parquet
the HiveDriver will always get a null value when retrieving HiveConf.
Cheng Hao
From: Du Li [mailto:l...@yahoo-inc.com.INVALID]
Sent: Thursday, September 18, 2014 7:51 AM
To: u...@spark.apache.org; dev@spark.apache.org
Subject: problem with HiveContext inside Actor
Hi,
Wonder anybody had simila
Yes, the root cause is that the output ObjectInspector in the SerDe
implementation doesn't reflect the real TypeInfo.
Hive actually provides an API,
TypeInfoUtils.getStandardJavaObjectInspectorFromTypeInfo(TypeInfo), for this
mapping.
You probably need to update the code at
https://github.
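Roughly, the fix amounts to deriving the output inspector from the declared
TypeInfo; a sketch against Hive's serde2 API:

import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector
import org.apache.hadoop.hive.serde2.typeinfo.{TypeInfo, TypeInfoUtils}

// Build a standard ObjectInspector that matches the real TypeInfo,
// so the SerDe's output OI reflects the actual column types.
def standardOI(typeInfo: TypeInfo): ObjectInspector =
  TypeInfoUtils.getStandardJavaObjectInspectorFromTypeInfo(typeInfo)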
If so, we probably need to add SQL dialect switching support to SparkSQLCLI,
as Fei suggested. What do you think the priority of this should be?
-Original Message-
From: Cheng Lian [mailto:lian.cs@gmail.com]
Sent: Friday, August 15, 2014 1:57 PM
To: Cheng, Hao
Cc: scwf; dev
Actually, the SQL parser (the other SQL dialect in Spark SQL) is quite weak
and only supports some basic queries; I'm not sure what the plan for its
enhancement is.
-Original Message-
From: scwf [mailto:wangf...@huawei.com]
Sent: Friday, August 15, 2014 11:22 AM
To: dev@spark.apache.org
Subject: