Re: Welcome Jose Torres as a Spark committer

2019-02-05 Thread Kazuaki Ishizaki
Congratulations, Jose! Kazuaki Ishizaki From: Gengliang Wang To: dev Date: 2019/01/31 18:32 Subject: Re: Welcome Jose Torres as a Spark committer Congrats Jose! On Jan 31, 2019, at 6:51 AM, Bryan Cutler wrote: Congrats Jose! On Tue, Jan 29, 2019, 10:48 AM Shixiong Zhu

[VOTE] Release Apache Spark 2.3.3 (RC2)

2019-02-05 Thread Takeshi Yamamuro
Please vote on releasing the following candidate as Apache Spark version 2.3.3. The vote is open until February 8 6:00PM (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 2.3.3 [ ] -1 Do not release this package because

Distributed tracing and Spark

2019-02-05 Thread loicg
Hi all, I am interested in instrumenting Spark with OpenTracing to get good user-level information about the tasks being executed. I started doing some work, mainly in TransportClient and TransportRequestHandler, to start OpenTracing spans when sending an RpcRequest and finish the spans in a modi
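A minimal, self-contained sketch of the pattern described above: open a span when an RpcRequest goes out and finish it when the matching response arrives. The `Span` class here is a hand-rolled stand-in for `io.opentracing.Span`, and all names are illustrative, not Spark's actual internals.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical stand-in for io.opentracing.Span; real instrumentation
// would call tracer.buildSpan(...).start() and span.finish() instead.
final case class Span(operation: String, startNanos: Long) {
  @volatile var finishNanos: Long = -1L
  def finish(): Unit = { finishNanos = System.nanoTime() }
}

// Spans keyed by RPC request id, mirroring how TransportClient
// tracks outstanding requests until TransportResponseHandler sees a reply.
object RpcTracing {
  private val inFlight = TrieMap.empty[Long, Span]

  def onRequestSent(requestId: Long): Unit =
    inFlight.put(requestId, Span("RpcRequest", System.nanoTime()))

  def onResponseReceived(requestId: Long): Unit =
    inFlight.remove(requestId).foreach(_.finish())

  def outstanding: Int = inFlight.size
}
```

With the real OpenTracing API the map value would be a `Span` obtained from the tracer, but the request-id bookkeeping is the same.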

Re: DataSourceV2 producing wrong date value in Custom Data Writer

2019-02-05 Thread Ryan Blue
Shubham, DataSourceV2 passes Spark's internal representation to your source and expects Spark's internal representation back from the source. That's why you consume and produce InternalRow: "internal" indicates that Spark doesn't need to convert the values. Spark's internal representation for a d
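Spark's internal encoding for a date is an `Int` counting days since the Unix epoch (1970-01-01), which is why a source producing or consuming `InternalRow` must perform this conversion itself rather than hand Spark a `java.sql.Date`. A stdlib-only sketch of the conversion:

```scala
import java.time.LocalDate

// Spark stores a DATE column in InternalRow as an Int: days since 1970-01-01.
val date = LocalDate.parse("2019-02-05")
val internalDays: Int = date.toEpochDay.toInt // the value Spark holds internally

// Converting back when reading rows out of a source:
val restored = LocalDate.ofEpochDay(internalDays.toLong)
```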

Re: Structured streaming from Kafka by timestamp

2019-02-05 Thread Cody Koeninger
To be more explicit, the easiest thing to do in the short term is use your own instance of KafkaConsumer to get the offsets for the timestamps you're interested in, using offsetsForTimes, and use those for the start / end offsets. See https://kafka.apache.org/10/javadoc/?org/apache/kafka/clients/c
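A sketch of the approach Cody describes. The topic name and partition are assumptions for illustration; the Kafka calls are shown in comments so the block stays self-contained, since `offsetsForTimes` needs a live broker and the kafka-clients dependency.

```scala
import java.time.Instant

// offsetsForTimes expects each partition mapped to a target time in
// milliseconds since the epoch; it returns, per partition, the earliest
// offset whose record timestamp is >= that time.
val startMs: java.lang.Long = Instant.parse("2019-02-05T00:00:00Z").toEpochMilli

// With kafka-clients on the classpath (illustrative, not runnable here):
//   val tp = new org.apache.kafka.common.TopicPartition("events", 0)
//   val query = new java.util.HashMap[org.apache.kafka.common.TopicPartition, java.lang.Long]()
//   query.put(tp, startMs)
//   val offsets = consumer.offsetsForTimes(query) // OffsetAndTimestamp, or null if none
//   val startingOffset = Option(offsets.get(tp)).map(_.offset())
```

The resolved offsets can then be passed as the `startingOffsets` / `endingOffsets` options of the Kafka source.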

Re: Array indexing functions

2019-02-05 Thread Sean Owen
Is it standard SQL or implemented in Hive? Because UDFs are so relatively easy in Spark we don't need tons of builtins like an RDBMS does. On Tue, Feb 5, 2019, 7:43 AM Petar Zečević > Hi everybody, > I finally created the JIRA ticket and the pull request for the two array > indexing functions: >
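To illustrate how cheap a user-defined function is compared to waiting on a builtin: the function below is a made-up example, not one of the functions proposed in SPARK-26826.

```scala
// Pure array-indexing logic: position of the first matching element, or -1.
def indexOf(arr: Seq[Int], target: Int): Int = arr.indexWhere(_ == target)

// Inside a Spark session this one-liner becomes a SQL function
// (illustrative name; requires a SparkSession, so shown as a comment):
//   spark.udf.register("index_of", (arr: Seq[Int], t: Int) => arr.indexWhere(_ == t))
//   spark.sql("SELECT index_of(array(5, 7, 9), 7)")
```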

Re: Array indexing functions

2019-02-05 Thread Petar Zečević
Hi everybody, I finally created the JIRA ticket and the pull request for the two array indexing functions: https://issues.apache.org/jira/browse/SPARK-26826 Can any of the committers please check it out? Thanks, Petar Petar Zečević writes: > Hi, > I implemented two array functions that are

DataSourceV2 producing wrong date value in Custom Data Writer

2019-02-05 Thread Shubham Chaurasia
Hi All, I am using custom DataSourceV2 implementation (*Spark version 2.3.2*) Here is how I am trying to pass in *date type *from spark shell. scala> val df = > sc.parallelize(Seq("2019-02-05")).toDF("datetype").withColumn("datetype", > col("datetype").cast("date")) > scala> df.write.format("com