Re: Error in show()

2018-09-06 Thread Apostolos N. Papadopoulos
Can you isolate the row that is causing the problem? I mean, start using show(31) up to show(60). Perhaps this will help you to understand the problem. Regards, Apostolos On 07/09/2018 01:11 AM, dimitris plakas wrote: Hello everyone, I am new in PySpark and I am facing an issue. Let me expl
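
A minimal sketch of that row-bisection idea (df stands in for the questioner's dataframe, which is an assumption here; the first n at which show(n) fails points at the offending row):

```
# Hypothetical sketch: increase the number of displayed rows until show()
# fails; show(n) forces evaluation of the first n rows, so the first
# failing n identifies the bad row.
for n in range(31, 61):
    try:
        df.show(n, True)
    except Exception as err:
        print("show({}) failed, so row {} is likely the culprit: {}".format(n, n, err))
        break
```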

Re: [External Sender] Re: How to make pyspark use custom python?

2018-09-06 Thread Femi Anthony
Are you sure that pyarrow is deployed on your slave hosts? If not, you will either have to get it installed, or ship it along when you call spark-submit by zipping it up and specifying the zip file to be shipped with the --py-files zipfile.zip option. A quick check would be to ssh to a slave host,
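
A sketch of that spark-submit invocation (deps.zip and my_app.py are hypothetical names; note that pyarrow ships native extensions, so installing it on each host via pip or conda tends to be more reliable than shipping a zip):

```
zip -r deps.zip pyarrow/
spark-submit --py-files deps.zip my_app.py
```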

Re: How to make pyspark use custom python?

2018-09-06 Thread mithril
The whole content in `spark-env.sh` is:

```
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=10.104.85.78:2181,10.104.114.131:2181,10.135.2.132:2181 -Dspark.deploy.zookeeper.dir=/spark"
PYSPARK_PYTHON="/usr/local/miniconda3/bin/python"
```

I ran `/usr/l
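
If the spark-env.sh setting is not being picked up, the interpreter can also be pinned per job through configuration (spark.pyspark.python and spark.pyspark.driver.python exist since Spark 2.1; my_app.py is a hypothetical script):

```
spark-submit \
  --conf spark.pyspark.python=/usr/local/miniconda3/bin/python \
  --conf spark.pyspark.driver.python=/usr/local/miniconda3/bin/python \
  my_app.py
```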

Error in show()

2018-09-06 Thread dimitris plakas
Hello everyone, I am new in PySpark and I am facing an issue. Let me explain what exactly the problem is. I have a dataframe and I apply a map() function to it (dataframe2 = dataframe1.rdd.map(custom_function)), then dataframe = sqlContext.createDataFrame(dataframe2). When I have dataframe.show(30, True
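
A hedged, self-contained sketch of that pipeline (custom_function here is a stand-in for the real one, and SparkSession replaces the older sqlContext). Note that map() takes the function itself rather than the result of calling it, and that show() is what finally triggers the lazy map(), so that is where a bad row will surface:

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def custom_function(row):
    # placeholder transformation; the original function is not shown
    return (row["id"], row["value"] * 2)

dataframe1 = spark.createDataFrame([(1, 1.0), (2, 2.0)], ["id", "value"])
dataframe2 = dataframe1.rdd.map(custom_function)  # pass the function, do not call it
dataframe = spark.createDataFrame(dataframe2, ["id", "value"])
dataframe.show(30, True)  # evaluation happens here, so errors surface here
```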

Re: getting error: value toDF is not a member of Seq[columns]

2018-09-06 Thread Mich Talebzadeh
Ok, somehow this worked! // Save prices to MongoDB collection val document = sparkContext.parallelize((1 to 1).map(i => Document.parse(s"{key:'$key',ticker:'$ticker',timeissued:'$timeissued',price:$price,CURRENCY:'$CURRENCY',op_type:$op_type,op

Re: How to make pyspark use custom python?

2018-09-06 Thread Patrick McCarthy
It looks like for whatever reason your cluster isn't using the python you distributed, or said distribution doesn't contain what you think. I've used the following with success to deploy a conda environment to my cluster at runtime: https://henning.kropponline.de/2016/09/24/running-pyspark-with-co
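
That approach generally boils down to something like the following on YARN (a sketch with hypothetical names): the zipped conda environment is shipped with --archives and the workers are pointed at the unpacked interpreter:

```
spark-submit \
  --master yarn \
  --archives my_conda_env.zip#ENV \
  --conf spark.pyspark.python=./ENV/bin/python \
  my_app.py
```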

Re: CBO not working for Parquet Files

2018-09-06 Thread emlyn
rajat mishra wrote: > When I try to compute the statistics for a query where the partition column > is in the where clause, the statistics returned contain only sizeInBytes > and not the row count. We are also having the same issue. We have our data in partitioned Parquet files and were hoping
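
For reference, a hedged sketch of computing fuller statistics from PySpark (the table and partition names are hypothetical; partition-level statistics require Spark 2.3+):

```
# Table-level stats give the CBO row counts, not just sizeInBytes.
spark.sql("ANALYZE TABLE my_db.my_table COMPUTE STATISTICS")
# Spark 2.3+ also supports partition-level statistics.
spark.sql("ANALYZE TABLE my_db.my_table PARTITION (dt='2018-09-06') COMPUTE STATISTICS")
```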

Re: getting error: value toDF is not a member of Seq[columns]

2018-09-06 Thread Mich Talebzadeh
Thanks. If you define the columns class as below:

scala> case class columns(KEY: String, TICKER: String, TIMEISSUED: String, PRICE: Double)
defined class columns

scala> var df = Seq(columns("key", "ticker", "timeissued", 1.23f)).toDF
df: org.apache.spark.sql.DataFrame = [KEY: string, TICKER: string .

Re: getting error: value toDF is not a member of Seq[columns]

2018-09-06 Thread Jungtaek Lim
This code works with Spark 2.3.0 via spark-shell.

scala> case class columns(KEY: String, TICKER: String, TIMEISSUED: String, PRICE: Float)
defined class columns

scala> import spark.implicits._
import spark.implicits._

scala> var df = Seq(columns("key", "ticker", "timeissued", 1.23f)).toDF
18/09/

Re: getting error: value toDF is not a member of Seq[columns]

2018-09-06 Thread Mich Talebzadeh
I am trying to understand why Spark cannot convert simple comma-separated columns to a DF. I did a test: I took one line of print output and stored it as a one-liner CSV file, as below.

var allInOne = key + "," + ticker + "," + timeissued + "," + price
println(allInOne)

cat crap.csv
6e84b11d-cb03-44c0-aab6-37e06e06c

Unsubscribe

2018-09-06 Thread Anu B Nair
Hi, I have tried every possible way to unsubscribe from this group. Can anyone help? -- Anu