SparkSQL Added file get Exception: is a directory and recursive is not turned on

2016-07-06 Thread linxi zeng
Hi, all: As recorded in https://issues.apache.org/jira/browse/SPARK-16408, when using spark-sql to execute SQL like `add file hdfs://xxx/user/test;`, if the HDFS path (hdfs://xxx/user/test) is a directory, we get an exception like: org.apache.spark.SparkException: Added file hdfs:
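For reference, the programmatic API already accepts a directory: SparkContext.addFile takes a recursive flag, which is what the SQL `ADD FILE` path in SPARK-16408 lacks. A minimal sketch (the HDFS path is the placeholder from the report, not a real location, and this assumes a working Spark installation):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AddDirectoryExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("add-dir").setMaster("local[*]"))
    // addFile(path, recursive = true) accepts a directory, unlike the
    // SQL "ADD FILE" command discussed in SPARK-16408, which throws
    // "is a directory and recursive is not turned on".
    sc.addFile("hdfs://xxx/user/test", recursive = true)
    sc.stop()
  }
}
```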

Re: Latest spark release in the 1.4 branch

2016-07-06 Thread Niranda Perera
Thanks Reynold.

On Thu, Jul 7, 2016 at 11:40 AM, Reynold Xin wrote:
> Yes definitely.

Re: Latest spark release in the 1.4 branch

2016-07-06 Thread Reynold Xin
Yes definitely.

On Wed, Jul 6, 2016 at 11:08 PM, Niranda Perera wrote:
> Thanks Reynold for the prompt response. Do you think we could use a
> 1.4-branch latest build in a production environment?

Re: Latest spark release in the 1.4 branch

2016-07-06 Thread Niranda Perera
Thanks Reynold for the prompt response. Do you think we could use a 1.4-branch latest build in a production environment?

On Thu, Jul 7, 2016 at 11:33 AM, Reynold Xin wrote:
> I think last time I tried I had some trouble releasing it because the
> release scripts no longer work with branch-1.4.

Re: Latest spark release in the 1.4 branch

2016-07-06 Thread Reynold Xin
I think last time I tried I had some trouble releasing it because the release scripts no longer work with branch-1.4. You can build from the branch yourself, but it might be better to upgrade to a later version.

Latest spark release in the 1.4 branch

2016-07-06 Thread Niranda Perera
Hi guys,

May I know if you have halted development in the Spark 1.4 branch? I see that there is a release tag for 1.4.2, but it was never released. Can we expect a 1.4.x bug-fix release anytime soon?

Best
--
Niranda @n1r44
+94-71-554-8430
https://pythagoreanscript.

Re: Spark deletes all existing partitions in SaveMode.Overwrite - Expected behavior ?

2016-07-06 Thread nirandap
Hi Yash, Yes, AFAIK, that is the expected behavior of Overwrite mode. I think you can use the following approaches if you want to perform a job on each partition: [1] for each partition in DF: https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/DataFr

Spark deletes all existing partitions in SaveMode.Overwrite - Expected behavior ?

2016-07-06 Thread Yash Sharma
Hi All, While writing a partitioned data frame as partitioned text files, I see that Spark deletes all existing partitions while writing a few new partitions:

dataDF.write.partitionBy("year", "month", "date").mode(SaveMode.Overwrite).text("s3://data/test2/events/")

Is this expected behavior?
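In 1.6-era Spark, Overwrite at the table root replaces everything under it. A common workaround is to overwrite one partition directory at a time instead. This is only a sketch under the assumptions that the layout is year=/month=/date=, that the frame has a single string value column (as `.text` requires), and that the paths and column names match the example above:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Sketch: overwrite only one partition of the dataset, leaving all other
// partition directories under the table root untouched.
def overwriteSinglePartition(dataDF: DataFrame,
                             year: Int, month: Int, date: Int): Unit = {
  dataDF
    .where(s"year = $year AND month = $month AND date = $date")
    .drop("year").drop("month").drop("date") // path encodes these values
    .write
    .mode(SaveMode.Overwrite)
    .text(s"s3://data/test2/events/year=$year/month=$month/date=$date")
}
```

The trade-off is that the caller must enumerate the partitions present in the incoming data and call this once per partition.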

Stopping Spark executors

2016-07-06 Thread Mr rty ff
Hi, I'd like to recreate this bug: https://issues.apache.org/jira/browse/SPARK-13979. It talks about stopping Spark executors, but it's not clear exactly how I stop the executors. Thanks
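One way to trigger executor loss from the driver is the developer API SparkContext.killExecutors, which asks the cluster manager to terminate specific executors. A sketch, assuming a cluster manager that supports executor kill requests (e.g. YARN); the executor ID "1" is hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object KillExecutorExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("kill-executors"))
    // Developer API: request that the cluster manager terminate the given
    // executors, one way to simulate the executor loss in SPARK-13979.
    sc.killExecutors(Seq("1"))
    sc.stop()
  }
}
```

Alternatively, simply killing the executor JVM process on a worker node (e.g. with `kill -9`) produces an abrupt executor failure, which may be closer to the scenario in the JIRA.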

[PySPARK] - Py4J binary transfer survey

2016-07-06 Thread Holden Karau
Hi PySpark Devs, The Py4j developer has a survey up for Py4J users - https://github.com/bartdag/py4j/issues/237 it might be worth our time to provide some input on how we are using and would like to be using Py4J if binary transfer was improved. I'm happy to fill it out with my thoughts - but if o

Re: [VOTE] Release Apache Spark 2.0.0 (RC2)

2016-07-06 Thread Ted Yu
Running the following command:

build/mvn clean -Phive -Phive-thriftserver -Pyarn -Phadoop-2.6 -Psparkr -Dhadoop.version=2.7.0 package

The build stopped with this test failure:

- SPARK-9757 Persist Parquet relation with decimal column *** FAILED ***

On Wed, Jul 6, 2016 at 6:25 AM, Sea

Re: [VOTE] Release Apache Spark 2.0.0 (RC2)

2016-07-06 Thread Cody Koeninger
I know some usages of the 0.10 Kafka connector will be broken until https://github.com/apache/spark/pull/14026 is merged, but the 0.10 connector is a new feature, so not blocking. Sean, I'm assuming the DirectKafkaStreamSuite failure you saw was for 0.8? I'll take another look at it. On Wed, Jul

Re: [VOTE] Release Apache Spark 2.0.0 (RC2)

2016-07-06 Thread Sean Owen
Yeah, we still have some blockers; I agree SPARK-16379, which came up yesterday, is a blocker. We also have 5 existing blockers, all doc related:
SPARK-14808 Spark MLlib, GraphX, SparkR 2.0 QA umbrella
SPARK-14812 ML, Graph 2.0 QA: API: Experimental, DeveloperApi, final, sealed audit
SPARK-14816 Upda

Re: [VOTE] Release Apache Spark 2.0.0 (RC2)

2016-07-06 Thread Maciej Bryński
-1

https://issues.apache.org/jira/browse/SPARK-16379
https://issues.apache.org/jira/browse/SPARK-16371

2016-07-06 7:35 GMT+02:00 Reynold Xin:
> Please vote on releasing the following candidate as Apache Spark version
> 2.0.0. The vote is open until Friday, July 8, 2016 at 23:00 PDT and passes
> i

Re: Why's ds.foreachPartition(println) not possible?

2016-07-06 Thread Jacek Laskowski
Thanks Cody, Reynold, and Ryan! Learnt a lot and feel "corrected". Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Wed, Jul 6, 2016 at 2:46 AM, Shixiong(Ryan) Zhu

Re: Spark Task failure with File segment length as negative

2016-07-06 Thread Priya Ch
Has anyone resolved this? Thanks, Padma CH

On Wed, Jun 22, 2016 at 4:39 PM, Priya Ch wrote:
> Hi All,
> I am running a Spark application with 1.8 TB of data (which is stored in
> Hive table format). I am reading the data using HiveContext and processing
> it. The cluster has 5 nodes total, 25

Re: MinMaxScaler With features include category variables

2016-07-06 Thread Yuhao Yang
You may also find VectorSlicer and SQLTransformer useful in your case. Just out of curiosity, how would you typically handle categorical features, other than with OneHotEncoder? Regards, Yuhao 2016-07-01 4:00 GMT-07:00 Yanbo Liang: > You can combine the columns which need to be normalized into
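The approach being suggested, assembling only the continuous columns, scaling them, then combining the result with the already-encoded categorical columns, can be sketched as a Pipeline. All column names here are hypothetical:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{MinMaxScaler, VectorAssembler}

// Step 1: put only the continuous columns into one vector.
val numericAssembler = new VectorAssembler()
  .setInputCols(Array("age", "income"))       // hypothetical numeric columns
  .setOutputCol("numericFeatures")

// Step 2: min-max scale just that vector; categorical columns are untouched.
val scaler = new MinMaxScaler()
  .setInputCol("numericFeatures")
  .setOutputCol("scaledNumeric")

// Step 3: concatenate the scaled numerics with the one-hot encoded
// categorical vector (assumed to exist already as "categoryOneHot").
val finalAssembler = new VectorAssembler()
  .setInputCols(Array("scaledNumeric", "categoryOneHot"))
  .setOutputCol("features")

val pipeline = new Pipeline()
  .setStages(Array(numericAssembler, scaler, finalAssembler))
// pipeline.fit(df).transform(df) would then yield a "features" column.
```

VectorSlicer would play the inverse role: extracting a numeric sub-vector out of an already-assembled feature vector before scaling it.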