Congratulations!! Jerry, you really deserve it.
Hao
-Original Message-
From: Mridul Muralidharan [mailto:mri...@gmail.com]
Sent: Tuesday, August 29, 2017 12:04 PM
To: Matei Zaharia
Cc: dev ; Saisai Shao
Subject: Re: Welcoming Saisai (Jerry) Shao as a committer
Congratulations Jerry, well deserved!
-1
Breaks existing applications that use Script Transformation in Spark SQL: the
default record/column delimiter class changed, since we no longer get the
default conf value from HiveConf; see SPARK-16515.
This is a regression.
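For context, a script transformation query of the affected kind looks roughly
like this; a minimal sketch, assuming an existing HiveContext named
hiveContext and a hypothetical table "src":

// With no explicit ROW FORMAT clause, the record/column delimiters
// fall back to the defaults that used to be read from HiveConf.
val transformed = hiveContext.sql(
  """SELECT TRANSFORM (key, value)
    |USING 'cat'
    |AS (k, v)
    |FROM src""".stripMargin)
transformed.show()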
From: Reynold Xin [mailto:r...@databricks.com]
I think you probably need to write some code, as you need to support ES; there
are two options, per my understanding:
Create a new Data Source from scratch, but you probably need to implement the
interface at:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/s
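For what it's worth, here is a bare-bones sketch of option 1 against the 1.x
external data source API; DefaultSource, EsRelation, and the "es.resource"
option are illustrative names, not the actual elasticsearch-hadoop code:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Entry point looked up by sqlContext.read.format("...").
class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    new EsRelation(parameters("es.resource"))(sqlContext)
}

// A full-scan relation; "resource" would name the ES index/type to read.
class EsRelation(resource: String)(@transient val sqlContext: SQLContext)
    extends BaseRelation with TableScan {

  override def schema: StructType =
    StructType(StructField("doc", StringType) :: Nil)

  override def buildScan(): RDD[Row] =
    // Placeholder: a real implementation would issue ES scan/scroll
    // requests here and map the hits into Rows.
    sqlContext.sparkContext.parallelize(Seq(Row("placeholder")))
}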
…introduced and fully addressed before RDDs.
They would be presented as the normal/default/standard way to do things in
Spark. RDDs, in contrast, would be presented later as a kind of lower-level,
closer-to-the-metal API that can be used in atypical, more specialized contexts
where DataFrames or Datasets don't fit.
I am not sure what the best practice is for this specific problem, but it's
really worth thinking about for 2.0, as it is a painful issue for lots of users.
By the way, is this also an opportunity to deprecate the RDD API (or make it
internal-only)? Lots of its functionality overlaps with the DataFrame API.
Yes, we definitely need to think about how to handle this case; it is probably
even more common than the sorted/partitioned tables case. Can you jump to the
JIRA and leave a comment there?
From: Alex Nastetsky [mailto:alex.nastet...@vervemobile.com]
Sent: Tuesday, November 10, 2015 3:03 AM
To: Cheng, Hao
-- Forwarded message --
From: gen tang [mailto:gen.tan...@gmail.com]
Date: Fri, Nov 6, 2015 at 12:14 AM
Subject: Re: dataframe slow down with tungsten turn on
To: "Cheng, Hao" mailto:hao.ch...@intel.com>>
Hi,
My application is as follows:
1. crea
problem as you described; probably we can add an additional checking /
reporting rule for the abuse.
From: Jeff Zhang [mailto:zjf...@gmail.com]
Sent: Thursday, November 5, 2015 1:55 PM
To: Cheng, Hao
Cc: dev@spark.apache.org
Subject: Re: Why LibSVMRelation and CsvRelation don't extends HadoopFsRelation
Probably two reasons:
1. HadoopFsRelation was introduced in 1.4, but it seems CsvRelation was
created based on 1.3.
2. HadoopFsRelation introduces the concept of Partition, which is probably
not necessary for LibSVMRelation.
But I think it will be easy to change by extending from HadoopFsRelation.
BTW, 1 minute vs. 2 hours seems quite weird; can you provide more information
on the ETL work?
From: Cheng, Hao [mailto:hao.ch...@intel.com]
Sent: Thursday, November 5, 2015 12:56 PM
To: gen tang; dev@spark.apache.org
Subject: RE: dataframe slow down with tungsten turn on
1.5 has critical performance / bug issues; you'd better try the 1.5.1 or
1.5.2 RC version.
From: gen tang [mailto:gen.tan...@gmail.com]
Sent: Thursday, November 5, 2015 12:43 PM
To: dev@spark.apache.org
Subject: Fwd: dataframe slow down with tungsten turn on
Hi,
In fact, I tested the same code with
Yes, we probably need more changes to the Data Source API if we want to
implement it in a generic way.
BTW, I created the JIRA by copying most of the words from Alex. ☺
https://issues.apache.org/jira/browse/SPARK-11512
From: Reynold Xin [mailto:r...@databricks.com]
Sent: Thursday, November 5, 2015 1:36
Hi Gsvic, can you please provide detailed code / steps to reproduce that?
Hao
-Original Message-
From: gsvic [mailto:victora...@gmail.com]
Sent: Monday, October 19, 2015 3:55 AM
To: dev@spark.apache.org
Subject: ShuffledHashJoin Possible Issue
I am doing some experiments with join algorit
We actually met a similar problem in a real case; see
https://issues.apache.org/jira/browse/SPARK-10474
After checking the source code, the external sort memory management strategy
seems to be the root cause of the issue.
Currently, we allocate a 4MB (page size) buffer as the initial one in the beginning
Not sure if it’s too late, but we found a critical bug at
https://issues.apache.org/jira/browse/SPARK-10466
UnsafeRow ser/de will cause an assert error, particularly for sort-based
shuffle with data spill; this is not acceptable, as it’s very common in large
table joins.
From: Reynold Xin [mailto
OK, thanks, probably just myself…
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Friday, August 14, 2015 11:04 AM
To: Cheng, Hao
Cc: Josh Rosen; dev
Subject: Re: Automatically deleting pull request comments left by AmplabJenkins
I tried accessing just now.
It took several seconds before the
I found that https://spark-prs.appspot.com/ has been super slow when opening
it in a new window recently; not sure if it's just me or everybody experiences
the same. Is there any way to speed it up?
From: Josh Rosen [mailto:rosenvi...@gmail.com]
Sent: Friday, August 14, 2015 10:21 AM
To: dev
Subject: Re: Automatically deleting pull request comments left by AmplabJenkins
Firstly, spark.sql.autoBroadcastJoinThreshold only works for the EQUAL JOIN.
Currently, for a non-equal join: if the join type is INNER, it will be done by
a CartesianProduct join, while BroadcastNestedLoopJoin handles the outer
joins.
In BroadcastNestedLoopJoin, the table with
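A rough illustration of the difference; the table names are hypothetical, and
small_table is assumed to fit under the threshold:

val small = sqlContext.table("small_table")
val large = sqlContext.table("large_table")

// Equi-join: eligible for a broadcast join when `small` is below
// spark.sql.autoBroadcastJoinThreshold.
large.join(small, large("id") === small("id")).explain()

// Non-equal INNER join: planned as a CartesianProduct plus a filter.
large.join(small, large("id") > small("id")).explain()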
Yes, it's a known issue; either set a bigger heap size for the driver, or you
can try setting `spark.sql.thriftServer.incrementalCollect=true`. It's a
workaround for queries that return a huge result set.
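Sketch of the workaround, set on the server-side conf (it can equally be
passed via --conf when starting the Thrift server):

import org.apache.spark.SparkConf

// Collect the result set incrementally instead of materializing
// all of it on the driver at once.
val conf = new SparkConf()
  .set("spark.sql.thriftServer.incrementalCollect", "true")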
From: Judy Nash [mailto:judyn...@exchange.microsoft.com]
Sent: Wednesday, July 8, 2015 11:53
It means data shuffling, and its arguments also show the partitioning
strategy.
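For example, assuming a DataFrame `df` with a `key` column, an aggregation
forces a shuffle and shows up as an Exchange:

// The printed physical plan contains something like
// `Exchange (HashPartitioning [key], 200)`: rows are shuffled so that
// equal keys land in the same partition before the final aggregation.
df.groupBy("key").count().explain()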
-Original Message-
From: invkrh [mailto:inv...@gmail.com]
Sent: Monday, June 8, 2015 9:34 PM
To: dev@spark.apache.org
Subject: [SparkSQL ] What is Exchange in physical plan for ?
Hi,
DataFrame.explain()
Adding another Blocker issue, just created! It seems to be a regression.
https://issues.apache.org/jira/browse/SPARK-7853
-Original Message-
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Monday, May 25, 2015 3:37 PM
To: Patrick Wendell
Cc: dev@spark.apache.org
Subject: Re: [VOTE] Release Apa
Thanks for reporting this.
We intend to support multiple metastore versions in a single build
(hive-0.13.1) by introducing the IsolatedClientLoader, but you're probably
hitting a bug; please file a JIRA issue for this.
I will keep investigating this as well.
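For reference, a sketch of the intended configuration, assuming the 1.4-era
settings (the version string and jar path are placeholders):

import org.apache.spark.SparkConf

// Point the IsolatedClientLoader at the actual metastore version
// instead of the built-in hive-0.13.1 client.
val conf = new SparkConf()
  .set("spark.sql.hive.metastore.version", "0.12.0")
  .set("spark.sql.hive.metastore.jars", "/path/to/hive-0.12/lib/*")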
Hao
From: Mark Hamstra [mail
Spark SQL just loads the query result as a new source (via JDBC), so do NOT
confuse it with Spark SQL tables; they are totally independent database
systems.
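As a sketch (the URL and credentials are placeholders), a nested select can be
pushed down to the remote database by passing a subquery alias as dbtable:

// The inner SELECT runs on the remote database; Spark SQL only sees
// the aliased result as a table.
val df = sqlContext.read
  .format("jdbc")
  .option("url", "jdbc:mysql://host:3306/db")
  .option("dbtable", "(SELECT id, name FROM people WHERE age > 21) AS t")
  .option("user", "user")
  .option("password", "secret")
  .load()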
From: Yi Zhang [mailto:zhangy...@yahoo.com.INVALID]
Sent: Friday, May 15, 2015 1:59 PM
To: Cheng, Hao; Dev
Subject: Re: Does Spark SQL (JDBC) support nest select with current version
You need to register the "dataFrame" as a table first and then run queries on
it. Do you mean that also failed?
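That is, something along these lines (1.x API; the table name is illustrative):

// Register the DataFrame under a temporary table name, then query it.
dataFrame.registerTempTable("people")
val adults = sqlContext.sql("SELECT name FROM people WHERE age > 21")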
From: Yi Zhang [mailto:zhangy...@yahoo.com.INVALID]
Sent: Friday, May 15, 2015 1:10 PM
To: Yi Zhang; Dev
Subject: Re: Does Spark SQL (JDBC) support nest select with current version
If I
Can you use Varchar or String instead? Currently, Spark SQL converts varchar
into the string type internally (without the max-length limitation). However,
the "char" type is not supported yet.
-Original Message-
From: A.M.Chan [mailto:kaka_1...@163.com]
Sent: Friday, March 20, 2015 9:56
I am not so sure whether Hive supports changing the metastore after it has
been initialized; I guess not. Spark SQL relies totally on the Hive Metastore
in HiveContext, and that's probably why it doesn't work as expected for Q1.
BTW, in most cases, people configure the metastore settings in hive-site.xml
and will not change them afterwards.
Not so sure about your question, but SparkStrategies.scala and
Optimizer.scala are a good start if you want the details of the join
implementation or optimization.
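Tangentially, if you want to experiment with the planner rather than just read
it, there is an extension point; a sketch assuming the experimental API
available from 1.3 on:

import org.apache.spark.sql.Strategy
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan

// A no-op strategy: returning Nil defers to the built-in strategies
// in SparkStrategies.scala.
object MyJoinStrategy extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = Nil
}

sqlContext.experimental.extraStrategies = Seq(MyJoinStrategy)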
-Original Message-
From: Andrew Ash [mailto:and...@andrewash.com]
Sent: Friday, January 16, 2015 4:52 AM
To: Reynold
I am wondering if we can provide a more friendly API, rather than a
configuration, for this purpose. What do you think, Patrick?
Cheng Hao
-Original Message-
From: Patrick Wendell [mailto:pwend...@gmail.com]
Sent: Thursday, December 25, 2014 3:22 PM
To: Shao, Saisai
Cc: u...@spark.apache.org
Part of it can be found at:
https://github.com/apache/spark/pull/3429/files#diff-f88c3e731fcb17b1323b778807c35b38R34
Sorry, it's a to-be-reviewed PR, but it should still be informative.
Cheng Hao
-Original Message-
From: Alessandro Baretta [mailto:alexbare...@gmail.com]
Sent: F
I've created (reused) the PR https://github.com/apache/spark/pull/3336;
hopefully we can fix this regression.
Thanks for reporting this.
Cheng Hao
-Original Message-
From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Saturday, December 6, 2014 4:51 AM
To: kb
+1, that will definitely speed up PR reviewing / merging.
-Original Message-
From: Cheng Lian [mailto:lian.cs@gmail.com]
Sent: Thursday, November 6, 2014 12:46 PM
To: dev
Subject: Re: [VOTE] Designating maintainers for some Spark components
+1 since this is already the de facto
The hive-thriftserver module is not included when the profile hive-0.13.1 is
specified.
-Original Message-
From: Jianshi Huang [mailto:jianshi.hu...@gmail.com]
Sent: Monday, October 27, 2014 4:48 PM
To: dev@spark.apache.org
Subject: Build with Hive 0.13.1 doesn't have datanucleus and parquet
the HiveDriver will always get a null value when retrieving HiveConf.
Cheng Hao
From: Du Li [mailto:l...@yahoo-inc.com.INVALID]
Sent: Thursday, September 18, 2014 7:51 AM
To: u...@spark.apache.org; dev@spark.apache.org
Subject: problem with HiveContext inside Actor
Hi,
Wonder anybody had simila
Yes, the root cause is that the output ObjectInspector in the SerDe
implementation doesn't reflect the real TypeInfo.
Hive actually provides an API,
TypeInfoUtils.getStandardJavaObjectInspectorFromTypeInfo(TypeInfo), for this
mapping.
You probably need to update the code at
https://github.
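Roughly, the fix amounts to deriving the output inspector from the declared
TypeInfo; a sketch against Hive's serde2 API:

import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector
import org.apache.hadoop.hive.serde2.typeinfo.{TypeInfo, TypeInfoUtils}

// Build a standard ObjectInspector that matches the real TypeInfo,
// so the SerDe's output OI reflects the actual column types.
def standardOI(typeInfo: TypeInfo): ObjectInspector =
  TypeInfoUtils.getStandardJavaObjectInspectorFromTypeInfo(typeInfo)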
If so, we probably need to add SQL dialect switching support to SparkSQLCLI,
as Fei suggested. What do you think the priority of this should be?
-Original Message-
From: Cheng Lian [mailto:lian.cs@gmail.com]
Sent: Friday, August 15, 2014 1:57 PM
To: Cheng, Hao
Cc: scwf; dev
Actually, the SQL parser (the other SQL dialect in Spark SQL) is quite weak
and only supports some basic queries; I'm not sure what the plan for its
enhancement is.
-Original Message-
From: scwf [mailto:wangf...@huawei.com]
Sent: Friday, August 15, 2014 11:22 AM
To: dev@spark.apache.org
Subject: