SPARK-17647 seems still broken on 2.2.0

2017-12-18 Thread Dong Jiang
Hi, I commented on the following ticket: https://issues.apache.org/jira/browse/SPARK-17647 I believe backslash escaping is still broken on 2.2.0. Can someone take a look? Thanks, Dong

Corrupt parquet file

2018-02-05 Thread Dong Jiang
Hi, We are running Spark 2.2.1 and generating parquet files with code like the following pseudocode: df.write.parquet(...). We have recently noticed parquet file corruption when reading the files in Spark or Presto, with errors like the following: Caused by: org.apache.parquet.io.ParquetDecodingException: Can not r

Re: Corrupt parquet file

2018-02-05 Thread Dong Jiang
a recurrence? Can you share your experience? Thanks, Dong From: Ryan Blue Reply-To: "rb...@netflix.com" Date: Monday, February 5, 2018 at 12:38 PM To: Dong Jiang Cc: Spark Dev List Subject: Re: Corrupt parquet file Dong, We see this from time to time as well. In my experience, it

Re: Corrupt parquet file

2018-02-05 Thread Dong Jiang
Date: Monday, February 5, 2018 at 1:34 PM To: Dong Jiang Cc: Spark Dev List Subject: Re: Corrupt parquet file We ensure the bad node is removed from our cluster and reprocess to replace the data. We only see this once or twice a year, so it isn't a significant problem. We've d

Re: Corrupt parquet file

2018-02-05 Thread Dong Jiang
before, what do you do to prevent a recurrence? Thanks, Dong From: Ryan Blue Reply-To: "rb...@netflix.com" Date: Monday, February 5, 2018 at 12:46 PM To: Dong Jiang Cc: Spark Dev List Subject: Re: Corrupt parquet file If you can still access the logs, then you should be able to

Re: Corrupt parquet file

2018-02-12 Thread Dong Jiang
back the entire data set, and then copy from HDFS to S3. Any other thoughts? From: Steve Loughran Date: Monday, February 12, 2018 at 2:27 PM To: "rb...@netflix.com" Cc: Dong Jiang, Apache Spark Dev Subject: Re: Corrupt parquet file What failure mode is likely here? As the uploads

Spark SQL unexpected behavior when comparing timestamp to date

2018-03-02 Thread Dong Jiang
Hi, I opened a JIRA ticket, https://issues.apache.org/jira/browse/SPARK-23549. Could anyone take a look? Spark SQL unexpected behavior when comparing timestamp to date
scala> spark.version
res1: String = 2.2.1
scala> spark.sql("select cast('2017-03-01 00:00:00' as timestamp) betw