After seeing Hyukjin Kwon's comment in SPARK-17583, I think it's safe to say that what I am seeing with CSV is not a bug or a regression: it was unintended and/or unreliable behavior in Spark 2.0.x.
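For reference, a minimal sketch of the failing CSV round-trip. This is not our actual unit test; the data and output path are made up for illustration, and it assumes a spark-shell session (so `spark` is predefined):

    import spark.implicits._

    val df = Seq((1, "line1\nline2")).toDF("id", "text")

    // the value is still written out quoted, with the embedded newline,
    // just as on 2.0.2
    df.write.csv("/tmp/csv-newline-test")

    // on 2.0.2 this happened to round-trip; on 2.1.0-rc1 reading it
    // back breaks down
    spark.read.csv("/tmp/csv-newline-test").show()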
On Wed, Nov 30, 2016 at 5:56 PM, Koert Kuipers <ko...@tresata.com> wrote:

> Running our in-house unit tests (which pass against Spark 2.0.2) against Spark 2.1.0-rc1, I see the following issues.
>
> Any test that uses avro (spark-avro 3.1.0) fails with this error (a minimal sketch of a write that triggers it is at the very end of this message, below the quoted thread):
>
> java.lang.AbstractMethodError
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.<init>(FileFormatWriter.scala:232)
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:182)
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
>     at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>     at org.apache.spark.scheduler.Task.run(Task.scala:99)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
> So it looks like some API got changed or broken. I don't know if this is an issue or if it is OK.
>
> Also, a bunch of unit tests related to reading and writing CSV files fail. The issue seems to be newlines inside quoted values: this worked before, and now it doesn't work anymore. I don't know if this was an accidentally supported feature and it's OK for it to be broken; I am not even sure it is a good idea to support newlines inside quoted values. In any case, they still get written out the same way as before, but now when reading them back in, things break down.
>
> On Mon, Nov 28, 2016 at 8:25 PM, Reynold Xin <r...@databricks.com> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version 2.1.0. The vote is open until Thursday, December 1, 2016 at 18:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.1.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v2.1.0-rc1 (80aabc0bd33dc5661a90133156247e7a8c1bf7f5)
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc1-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1216/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc1-docs/
>>
>> =======================================
>> How can I help test this release?
>> =======================================
>> If you are a Spark user, you can help us test this release by taking an existing Spark workload, running it on this release candidate, and reporting any regressions.
>>
>> ===============================================================
>> What should happen to JIRA tickets still targeting 2.1.0?
>> ===============================================================
>> Committers should look at those and triage. Extremely important bug fixes, documentation, and API tweaks that impact compatibility should be worked on immediately. Everything else please retarget to 2.1.1 or 2.2.0.
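For reference, a minimal sketch of the kind of avro write that hits the AbstractMethodError quoted above. This is not our actual unit test; the data and output path are made up, and it assumes a spark-shell session with spark-avro 3.1.0 on the classpath (e.g. started with --packages com.databricks:spark-avro_2.11:3.1.0):

    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

    // the save goes through FileFormatWriter, which is where 2.1.0-rc1
    // throws the AbstractMethodError with spark-avro 3.1.0
    df.write.format("com.databricks.spark.avro").save("/tmp/avro-rc1-test")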