Re: Contribution to Apache Spark

2016-09-03 Thread Tomasz Gawęda
Hi, Contribution rules are described here: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark Best regards, Tomek Gawęda On 2016-09-03 at 21:58, aditya1702 wrote: Hello, I am Aditya Vyas and I am currently in my third year of college doing BTech in my engi

Contribution to Apache Spark

2016-09-03 Thread aditya1702
Hello, I am Aditya Vyas and I am currently in my third year of college doing BTech in my engineering. I know Python and a little bit of Java. I want to start contributing to Apache Spark. This is my first time in the field of Big Data. Can someone please help me get started? Which resourc

Re: critical bugs to be fixed in Spark 2.0.1?

2016-09-03 Thread Miao Wang
I just noticed JoshRosen sent a PR for this bug. From: Miao Wang/San Francisco/IBM@IBMUS To: tomerk11 Cc: dev@spark.apache.org Date: 09/02/2016 04:04 PM Subject: Re: critical bugs to be fixed in Spark 2.0.1? I am trying to reproduce it on my cluster based on your instructi

Re: Is Spark's KMeans unable to handle bigdata?

2016-09-03 Thread Georgios Samaras
Thank you very much Sean! If you would like, this could serve as an answer in StackOverflow's question: [Is Spark's kMeans unable to handle bigdata?]( http://stackoverflow.com/questions/39260820/is-sparks-kmeans-unable-to-handle-bigdata ). Enjoy your weekend, George On Sat, Sep 3, 2016 at 1:22 AM

Re: Catalog, SessionCatalog and ExternalCatalog in spark 2.0

2016-09-03 Thread Kapil Malik
Thanks Raghavendra :) Will look into Analyzer as well. Kapil Malik *Sr. Principal Engineer | Data Platform, Technology* M: +91 8800836581 | T: 0124-433 | EXT: 20910 ASF Centre A | 1st Floor | Udyog Vihar Phase IV | Gurgaon | Haryana | India *Disclaimer:* This communication is for the sole us

Re: Committing Kafka offsets when using DirectKafkaInputDStream

2016-09-03 Thread Cody Koeninger
The Kafka commit API isn't transactional; you aren't going to get exactly-once behavior out of it even if you were committing offsets on a per-partition basis. This doesn't really have anything to do with Spark; the old code you posted was already inherently broken. Make your outputs idempotent a
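A minimal sketch of what "idempotent outputs" can look like, not taken from the thread: each write is keyed by the record's (topic, partition, offset), so reprocessing a batch after a failure overwrites the same rows instead of duplicating them. `IdempotentSink` and `process_batch` are hypothetical names, and the in-memory dict is only a stand-in for a real store that supports upsert-by-key.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple

# Assumption for illustration: (topic, partition, offset) uniquely identifies a record.
RecordKey = Tuple[str, int, int]
Record = Tuple[str, int, int, str]  # topic, partition, offset, value


@dataclass
class IdempotentSink:
    """Hypothetical in-memory stand-in for a store with upsert-by-key semantics."""
    rows: Dict[RecordKey, str] = field(default_factory=dict)

    def write(self, topic: str, partition: int, offset: int, value: str) -> None:
        # Writing the same key twice leaves exactly one row.
        self.rows[(topic, partition, offset)] = value


def process_batch(sink: IdempotentSink, batch: Iterable[Record]) -> None:
    """Transform each record and upsert the result under its source coordinates."""
    for topic, partition, offset, value in batch:
        sink.write(topic, partition, offset, value.upper())


sink = IdempotentSink()
batch = [("events", 0, 41, "a"), ("events", 0, 42, "b")]

process_batch(sink, batch)
# Simulate a replay: the job failed before offsets were committed,
# so the same batch is processed again.
process_batch(sink, batch)
```

With a sink like this, when offsets are committed matters less for correctness, because a replayed batch cannot produce duplicate output.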

Subscription

2016-09-03 Thread Omkar Reddy
Subscribe me!

Catalog, SessionCatalog and ExternalCatalog in spark 2.0

2016-09-03 Thread Kapil Malik
Hi all, I have a Spark SQL 1.6 application in production which does the following on executing sqlContext.sql(...): 1. Identify the table name mentioned in the query 2. Use an external database to decide where the data is located and in which format (parquet, csv, jdbc, etc.) 3. Load the dataframe 4. Regi

Re: Support for Hive 2.x

2016-09-03 Thread Steve Loughran
On 2 Sep 2016, at 18:40, Dongjoon Hyun <dongj...@apache.org> wrote: Hi, Rostyslav, After your email, I also tried searching this morning, but I didn't find a proper one. The last related issue is SPARK-8064, `Upgrade Hive to 1.2` https://issues.apache.org/jira/browse/SPARK-8064 If

Re: Is Spark's KMeans unable to handle bigdata?

2016-09-03 Thread Sean Owen
I opened https://issues.apache.org/jira/browse/SPARK-17389 to track some improvements, but by far the big one is that the number of init steps defaults to 5, when the paper says that 2 is pretty much optimal here. It's much faster with that setting. On Fri, Sep 2, 2016 at 6:45 PM, Georgios Samaras wrote: >