[DISCUSS] Support RDD using JDBC data source in PySpark

2022-09-19 Thread javaca...@163.com
…o me. I am a big data engineer and would like to contribute to open source. I have already submitted 2 PRs for Apache Flink (FLINK-26609, FLINK-26728), and they were merged/closed. So I think if I can get the JIRA ticket, I can implement it fairly well. Thanks. javaca...@163.com

How to persist a database/table created in SparkSession

2017-12-04 Thread 163
Hi, how can I persist a database/table created in a Spark application? object TestPersistentDB { def main(args:Array[String]): Unit = { val spark = SparkSession.builder() .appName("Create persistent table") .config("spark.…
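
A minimal sketch of the usual fix, assuming the goal is a table that outlives the application: enable Hive support so the catalog is backed by a persistent metastore, and write with saveAsTable rather than a temporary view (the database and table names below are hypothetical):

    import org.apache.spark.sql.SparkSession

    object TestPersistentDB {
      def main(args: Array[String]): Unit = {
        // enableHiveSupport() backs the catalog with a persistent Hive
        // metastore, so databases/tables outlive this SparkSession
        val spark = SparkSession.builder()
          .appName("Create persistent table")
          .enableHiveSupport()
          .getOrCreate()

        spark.sql("CREATE DATABASE IF NOT EXISTS testdb")
        // saveAsTable writes a managed, persistent table,
        // unlike createOrReplaceTempView
        spark.range(10).write.mode("overwrite").saveAsTable("testdb.numbers")
      }
    }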

SparkSQL does not support CharType

2017-11-22 Thread 163
Hi, when I use a DataFrame with this table schema, it goes wrong: val test_schema = StructType(Array( StructField("id", IntegerType, false), StructField("flag", CharType(1), false), StructField("time", DateType, false))); val df = spark.read.format("com.databricks.spark.csv") .schema(test_s…
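
For reference, a sketch of the common workaround on Spark 2.x, where CharType is not accepted in user-supplied schemas: declare the column as StringType instead. This assumes an existing SparkSession named spark; the CSV option and path are hypothetical:

    import org.apache.spark.sql.types._

    // CharType is internal to the SQL parser; user-supplied schemas
    // should use StringType for character columns
    val test_schema = StructType(Array(
      StructField("id", IntegerType, false),
      StructField("flag", StringType, false), // was CharType(1)
      StructField("time", DateType, false)))

    val df = spark.read
      .schema(test_schema)
      .option("header", "true")   // hypothetical option
      .csv("/path/to/data.csv")   // hypothetical path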

Re: How to tune the performance of Tpch query5 within Spark

2017-07-16 Thread 163
I changed the UDF but the performance still seems slow. What else can I do? > On 14 Jul 2017, at 8:34 PM, Wenchen Fan wrote: > > Try to replace your UDF with Spark built-in expressions, it should be as > simple as `$"x" * (lit(1) - $"y")`. > >> On 14 Jul 2017, at 5:46 PM, 163 > …
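
A minimal sketch of that suggestion, assuming an existing SparkSession `spark`, a DataFrame `df`, and hypothetical columns x and y; built-in Column expressions stay inside Catalyst and are codegen-friendly, while a Scala UDF is opaque to the optimizer and forces per-row (de)serialization:

    import org.apache.spark.sql.functions._
    import spark.implicits._

    // UDF version: opaque to the optimizer
    val myUdf = udf((x: Double, y: Double) => x * (1 - y))
    val slow = df.withColumn("v", myUdf($"x", $"y"))

    // Built-in expression version: optimizable and codegen-friendly
    val fast = df.withColumn("v", $"x" * (lit(1) - $"y"))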

How to tune the performance of Tpch query5 within Spark

2017-07-14 Thread 163
I rewrote TPC-H query 5 as a DataFrame program: val forders = spark.read.parquet("hdfs://dell127:20500/SparkParquetDoubleTimestamp100G/orders").filter("o_orderdate < 1995-01-01 and o_orderdate >= 1994-01-01").select("o_custkey", "o_orderkey") val flineitem = spark.read.parquet("hdfs://dell127:20500/Spa…
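
One thing worth double-checking in the snippet above: the date literals in the filter string are unquoted, so the SQL parser evaluates 1995-01-01 as integer arithmetic (1993) rather than as a date. A sketch of the quoted form, keeping the original path:

    val forders = spark.read
      .parquet("hdfs://dell127:20500/SparkParquetDoubleTimestamp100G/orders")
      // quote the literals so they parse as dates, not as 1995 - 1 - 1
      .filter("o_orderdate >= '1994-01-01' and o_orderdate < '1995-01-01'")
      .select("o_custkey", "o_orderkey")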

How to tune the performance of Tpch query5 within Spark

2017-07-14 Thread 163
> I rewrote TPC-H query 5 as a DataFrame program: > val forders = > spark.read.parquet("hdfs://dell127:20500/SparkParquetDoubleTimestamp100G/orders") > .filter("o_orderdate < 1995-01-01 and o_orderdate >= 1994-01-01").select("o_custkey", "o_orderkey") > val flineitem = > spark.read.parquet("…

Kafka: Support new topic subscriptions without requiring restart of the streaming context

2016-08-07 Thread r7raul1...@163.com
How can I add a new Kafka topic subscription without requiring a restart of the streaming context? r7raul1...@163.com
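
A sketch of one way to get this with the spark-streaming-kafka-0-10 integration, assuming the topics follow a naming pattern; SubscribePattern matches newly created topics without restarting the context (the broker address, group id, and pattern below are hypothetical):

    import java.util.regex.Pattern
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010._

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker:9092", // hypothetical
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "my-group")             // hypothetical

    // Topics created later that match "events-.*" are picked up
    // without restarting the existing StreamingContext `ssc`
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.SubscribePattern[String, String](
        Pattern.compile("events-.*"), kafkaParams))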

Re: Re: OLAP query using spark dataframe with cassandra

2015-11-09 Thread fightf...@163.com
Hi, have you ever considered Cassandra as a replacement? We now have almost the same usage as your engine, e.g. using MySQL to store the initial aggregated data. Can you share more about your kind of cube queries? We are very interested in that architecture too :) Best, Sun. fightf...@163.com
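
For context, a minimal sketch (Spark 2.x API, assuming an existing SparkSession `spark`) of the DataFrame-on-Cassandra side of such an architecture with the spark-cassandra-connector; the keyspace, table, and column names are hypothetical:

    // Read a Cassandra table as a DataFrame via the connector
    val sales = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "olap", "table" -> "sales"))
      .load()

    // A cube-style aggregation of the kind discussed in this thread
    val cubed = sales.cube("region", "product").sum("amount")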

Re: Re: OLAP query using spark dataframe with cassandra

2015-11-09 Thread fightf...@163.com
…prompt response. fightf...@163.com From: tsh Date: 2015-11-10 02:56 To: fightf...@163.com; user; dev Subject: Re: OLAP query using spark dataframe with cassandra Hi, I'm in the same position right now: we are going to implement something like OLAP BI + Machine Learning explorations on the…

Re: Re: OLAP query using spark dataframe with cassandra

2015-11-08 Thread fightf...@163.com
…of OLAP architecture. And we are happy to hear more use cases from this community. Best, Sun. fightf...@163.com From: Jörn Franke Date: 2015-11-09 14:40 To: fightf...@163.com CC: user; dev Subject: Re: OLAP query using spark dataframe with cassandra Is there any distributor supporting…

OLAP query using spark dataframe with cassandra

2015-11-08 Thread fightf...@163.com
…-apache-cassandra-and-spark fightf...@163.com

Re: Re: Sort Shuffle performance issues about using AppendOnlyMap for large data sets

2015-05-12 Thread fightf...@163.com
Hi there, which version are you using? Actually, the problem seems to be gone after we changed our Spark version from 1.2.0 to 1.3.0. Not sure what internal changes made the difference. Best, Sun. fightf...@163.com From: Night Wolf Date: 2015-05-12 22:05 To: fightf...@163.com CC: Patrick Wendell; user; dev…

How to create a Row from a List or Array in Spark using Scala

2015-02-28 Thread r7raul1...@163.com
import org.apache.spark.sql.catalyst.expressions._ val values: JavaArrayList[Any] = new JavaArrayList() computedValues = Row(values.get(0), values.get(1)) // It is not good to use get(index). How can I create a Row from a List or Array in Spark using Scala? r7raul1...@163.com
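
A sketch of the usual answer (Spark 1.3+ API, where Row lives in org.apache.spark.sql): Row.fromSeq builds a Row from a whole collection, and a java.util.ArrayList converts via asScala, so no per-index get calls are needed:

    import scala.collection.JavaConverters._
    import org.apache.spark.sql.Row

    val values = new java.util.ArrayList[Any]()
    values.add(1)
    values.add("foo")

    // Build the Row from the whole collection at once,
    // instead of Row(values.get(0), values.get(1), ...)
    val row: Row = Row.fromSeq(values.asScala.toSeq)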

Re: Re: Sort Shuffle performance issues about using AppendOnlyMap for large data sets

2015-02-12 Thread fightf...@163.com
…application? Does Spark provide such configs for achieving that goal? We know that it is tricky to get this working. We just want to know how this could be resolved, or hear of other possible channels we did not cover. Expecting your kind advice. Thanks, Sun. fightf...@163.com
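
For what it's worth, a sketch of the shuffle knobs that existed on the Spark 1.x line for this kind of spilling problem; the values here are illustrative starting points, not tuned recommendations:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.shuffle.manager", "sort")        // sort-based shuffle (default since 1.2)
      .set("spark.shuffle.spill", "true")          // spill AppendOnlyMap contents to disk
      .set("spark.shuffle.memoryFraction", "0.4")  // memory for shuffle before spilling
      .set("spark.shuffle.spill.compress", "true") // compress spilled shuffle data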

Re: Re: Sort Shuffle performance issues about using AppendOnlyMap for large data sets

2015-02-11 Thread fightf...@163.com
Hi, we really have no adequate solution for this issue yet. Expecting any available analysis rules or hints. Thanks, Sun. fightf...@163.com From: fightf...@163.com Date: 2015-02-09 11:56 To: user; dev Subject: Re: Sort Shuffle performance issues about using AppendOnlyMap for large data…