Re: PySpark 2.1 Not instantiating properly

2017-10-20 Thread Marco Mistroni
Hello, I believe I followed the instructions here to get Spark to work on Windows. The article refers to Win7, but it will work for Win10 as well: http://nishutayaltech.blogspot.co.uk/2015/04/how-to-run-apache-spark-on-windows7-in.html Jagat posted a similar link on winutils... I believe it would

Re: PySpark 2.1 Not instantiating properly

2017-10-20 Thread Aakash Basu
Hey Marco/Jagat, As I informed you earlier, I've already done those basic checks and permission changes, e.g. D:\winutils\bin\winutils.exe chmod 777 D:\tmp\hive, but to no avail. It still throws the same error. In the first place, I do not understand, without any manual change, how did t
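
A minimal Scala sketch of where Hadoop looks for winutils, for anyone comparing setups (the D:\winutils path is the one quoted above; that it matches your layout is an assumption):

    import org.apache.spark.sql.SparkSession

    object WinutilsCheck {
      def main(args: Array[String]): Unit = {
        // Hadoop resolves winutils.exe through the hadoop.home.dir system
        // property (falling back to the HADOOP_HOME environment variable),
        // so it must be set before any Spark/Hadoop classes load.
        System.setProperty("hadoop.home.dir", "D:\\winutils")

        val spark = SparkSession.builder()
          .appName("winutils-check")
          .master("local[*]")
          .getOrCreate()

        // A trivial job to confirm the session comes up at all.
        println(spark.range(5).count())
        spark.stop()
      }
    }

Note this only tells Hadoop where winutils lives; the chmod on D:\tmp\hive still has to succeed separately.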

Re: Write to HDFS

2017-10-20 Thread Deepak Sharma
Better use coalesce instead of repartition On Fri, Oct 20, 2017 at 9:47 PM, Marco Mistroni wrote: > Use counts.repartition(1).save.. > Hth > > > On Oct 20, 2017 3:01 PM, "Uğur Sopaoğlu" wrote: > > Actually, when I run the following code, > > val textFile = sc.textFile("Sample.txt") > val co
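
A sketch of this suggestion applied to the word-count code quoted in the thread: coalesce(1) narrows to one partition without the full shuffle that repartition(1) triggers, so a single part file gets written.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("single-file-write").getOrCreate()
    val sc = spark.sparkContext

    val textFile = sc.textFile("Sample.txt")
    val counts = textFile
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // coalesce(1) merges existing partitions rather than reshuffling them,
    // which is cheaper when you only want fewer output files.
    counts.coalesce(1).saveAsTextFile("hdfs://master:8020/user/abc")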

Re: PySpark 2.1 Not instantiating properly

2017-10-20 Thread Jagat Singh
Do you have the winutils build relevant for your system? This SO post has related information: https://stackoverflow.com/questions/34196302/the-root-scratch-dir-tmp-hive-on-hdfs-should-be-writable-current-permissions On 21 October 2017 at 03:16, Marco Mistroni wrote: > Did you build Spark or

Re: Does Apache Spark take into account JDBC indexes / statistics when optimizing queries?

2017-10-20 Thread lucas.g...@gmail.com
Right, that makes sense and I understood that. The thing I'm wondering about (and I think the answer is 'no' at this stage): when the optimizer is running and pushing predicates down, does it take into account indexing and other storage-layer strategies in determining which predicates are process
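
One way to see what actually gets pushed down, as a sketch with made-up connection details (Spark lists the pushed predicates in the plan, but as of 2.x its choice is not informed by the database's indexes or statistics):

    import java.util.Properties
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("jdbc-pushdown").getOrCreate()

    // Connection details are placeholders, not from this thread.
    val props = new Properties()
    props.setProperty("user", "app_user")
    props.setProperty("password", "app_password")

    val audits = spark.read
      .jdbc("jdbc:postgresql://db-host:5432/appdb", "audits", props)

    // The physical plan's PushedFilters entry shows the predicates handed
    // to the database; supported ones go down regardless of indexing.
    audits.filter("type = 'type' AND action = 'action'").explain()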

Re: Write to HDFS

2017-10-20 Thread Marco Mistroni
Use counts.repartition(1).save.. Hth On Oct 20, 2017 3:01 PM, "Uğur Sopaoğlu" wrote: Actually, when I run the following code, val textFile = sc.textFile("Sample.txt") val counts = textFile.flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByK

Re: PySpark 2.1 Not instantiating properly

2017-10-20 Thread Marco Mistroni
Did you build Spark or download the zip? I remember having a similar issue... either you have to give write permissions to your /tmp directory, or there's a Spark config you need to override. This error is not 2.1-specific... let me get home and check my configs. I think I amended my /tmp permissions via xterm
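
Marco doesn't name the config, so as a labeled guess: spark.sql.warehouse.dir is one commonly overridden on Windows to point Spark at a directory the user can actually write to.

    import org.apache.spark.sql.SparkSession

    // The config choice and the path here are assumptions, not from the thread.
    val spark = SparkSession.builder()
      .appName("warehouse-override")
      .config("spark.sql.warehouse.dir", "file:///D:/tmp/spark-warehouse")
      .getOrCreate()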

Fwd: PySpark 2.1 Not instantiating properly

2017-10-20 Thread Aakash Basu
Hi, Any help please? What could the issue be? Thanks, Aakash. -- Forwarded message -- From: Aakash Basu Date: Fri, Oct 20, 2017 at 1:00 PM Subject: PySpark 2.1 Not instantiating properly To: user Hi all, I have Spark 2.1 installed on my laptop, where I used to run all my program

Re: Does Apache Spark take into account JDBC indexes / statistics when optimizing queries?

2017-10-20 Thread Mich Talebzadeh
Gary, in the code below, filtered_df = spark.hiveContext.sql(""" SELECT * FROM df WHERE type = 'type' AND action = 'action' AND audited_changes LIKE '---\ncompany_id:\n- %' """) filtered_audits.registerTempTable("filtered_df") you are using HQL to read
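
An alternative sketch: hand the whole WHERE clause to the database itself by giving the JDBC source a subquery as its dbtable, so the rows are filtered before they ever reach Spark. The predicates are the ones quoted above; the table name and connection details are placeholders.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("jdbc-subquery").getOrCreate()

    val filtered = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/appdb")
      .option("user", "app_user")
      .option("password", "app_password")
      // A parenthesized subquery is accepted wherever a table name is.
      .option("dbtable",
        """(SELECT * FROM audits
            WHERE type = 'type'
              AND action = 'action'
              AND audited_changes LIKE '---\ncompany_id:\n- %') AS filtered""")
      .load()

    filtered.show()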

Re: Write to HDFS

2017-10-20 Thread Uğur Sopaoğlu
Actually, when I run the following code, val textFile = sc.textFile("Sample.txt") val counts = textFile.flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByKey(_ + _) It saves the results into more than one partition, like part-0, part-1. I w

Re: Write to HDFS

2017-10-20 Thread Marco Mistroni
Hi, could you just create an RDD/DF out of what you want to save and store it in HDFS? Hth On Oct 20, 2017 9:44 AM, "Uğur Sopaoğlu" wrote: > Hi all, > > In the word count example, > > val textFile = sc.textFile("Sample.txt") > val counts = textFile.flatMap(line => line.split(" ")) >

Re: Is Spark suited for this use case?

2017-10-20 Thread JG Perrin
I have seen a similar scenario where we load data from an RDBMS into a NoSQL database… Spark made sense for velocity and parallel processing (and the cost of licenses :) ). > On Oct 15, 2017, at 21:29, Saravanan Thirumalai > wrote: > > We are an investment firm and have an MDM platform in Oracle a

Re: Java Rdd of String to dataframe

2017-10-20 Thread JG Perrin
SK, Have you considered: Dataset<Row> df = spark.read().json(dfWithStringRowsContainingJson); jg > On Oct 11, 2017, at 16:35, sk skk wrote: > > Can we create a dataframe from a Java pair RDD of String? I don't have a > schema as it will be dynamic JSON. I passed the Encoders.STRING class.
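
The same idea sketched in Scala with made-up records, since the JSON is dynamic and the schema gets inferred by sampling:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("json-strings").getOrCreate()
    import spark.implicits._

    // A stand-in for the RDD of raw JSON strings described above.
    val jsonRdd = spark.sparkContext.parallelize(Seq(
      """{"id": 1, "name": "a"}""",
      """{"id": 2, "name": "b", "nested": {"flag": true}}"""
    ))

    // Spark infers the schema from the records themselves; the
    // Dataset[String] overload is the current form in Spark 2.2+,
    // older releases take the RDD directly.
    val df = spark.read.json(jsonRdd.toDS())
    df.printSchema()
    df.show()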

Re: Prediction using Classification with text attributes in Apache Spark MLLib

2017-10-20 Thread lmk
Trying to improve on the old solution: do we have a better text classifier now in Spark MLlib? Regards, lmk -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
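
For reference, the stock spark.ml route for text classification, sketched with a made-up two-row training set:

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("text-clf").getOrCreate()
    import spark.implicits._

    // Tiny illustrative training set; real labels would come from your data.
    val training = Seq(
      ("spark is great for big data", 1.0),
      ("the weather is nice today", 0.0)
    ).toDF("text", "label")

    // Tokenize, hash terms into a feature vector, then fit a classifier.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10)

    val model = new Pipeline()
      .setStages(Array(tokenizer, hashingTF, lr))
      .fit(training)

    model.transform(training).select("text", "prediction").show(false)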

Write to HDFS

2017-10-20 Thread Uğur Sopaoğlu
Hi all, In the word count example, val textFile = sc.textFile("Sample.txt") val counts = textFile.flatMap(line => line.split(" ")) .map(word => (word, 1)) .reduceByKey(_ + _) counts.saveAsTextFile("hdfs://master:8020/user/abc") I want to write the collection of "*c

PySpark 2.1 Not instantiating properly

2017-10-20 Thread Aakash Basu
Hi all, I have Spark 2.1 installed on my laptop, where I used to run all my programs. PySpark hadn't been used for around a month, and after starting it now, I'm getting this exception (I've tried the solutions I could find on Google, but to no avail). Specs: Spark 2.1.1, Python 3.6, HADOOP 2.7, Window