Can you isolate the row that is causing the problem? I mean, start with
show(31) and go up to show(60).
Perhaps this will help you understand the problem.
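A rough sketch of that suggestion (my own illustration, assuming the DataFrame from the original question is named `dataframe`):
```
# Increase the number of rows shown one at a time until the error appears,
# so the first failing row can be identified.
for n in range(31, 61):
    try:
        dataframe.show(n, True)
        print("show({}) succeeded".format(n))
    except Exception as e:
        print("show({}) failed: {}".format(n, e))
        break
```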
regards,
Apostolos
On 07/09/2018 01:11 a.m., dimitris plakas wrote:
Hello everyone, I am new to Pyspark and I am facing an issue. Let me
explain what exactly the problem is.
Are you sure that pyarrow is deployed on your slave hosts? If not, you
will either have to get it installed, or ship it along when you call
spark-submit by zipping it up and specifying the zip file to be shipped
using the
--py-files zipfile.zip option.
A quick check would be to ssh to a slave host,
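Another quick check along those lines (a sketch of my own, assuming an active SparkContext named `sc`; the partition count is arbitrary) is to try importing pyarrow on the executors themselves:
```
def pyarrow_version(_):
    # Runs on the executors, so it reports what the worker Python can see.
    try:
        import pyarrow
        return pyarrow.__version__
    except ImportError as e:
        return "missing: {}".format(e)

print(sc.parallelize(range(4), 4).map(pyarrow_version).collect())
```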
The whole content in `spark-env.sh` is
```
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER
-Dspark.deploy.zookeeper.url=10.104.85.78:2181,10.104.114.131:2181,10.135.2.132:2181
-Dspark.deploy.zookeeper.dir=/spark"
PYSPARK_PYTHON="/usr/local/miniconda3/bin/python"
```
I ran `/usr/l
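For what it is worth, a quick way to confirm the executors actually picked up that PYSPARK_PYTHON (a sketch of my own, assuming an active SparkContext named `sc`):
```
import sys

# Compare the driver interpreter with whatever the executors report.
print("driver:   ", sys.executable)
print("executors:", set(sc.range(8, numSlices=4).map(lambda _: sys.executable).collect()))
```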
Hello everyone, I am new to Pyspark and I am facing an issue. Let me
explain what exactly the problem is.
I have a dataframe and I apply a map() function on it:
dataframe2 = dataframe1.rdd.map(custom_function)
dataframe = sqlContext.createDataFrame(dataframe2)
when I have
dataframe.show(30, True
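A self-contained version of that pattern (a sketch with made-up columns and a made-up custom_function, not the original poster's code):
```
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.appName("map-example").getOrCreate()

dataframe1 = spark.createDataFrame(
    [(i, float(i)) for i in range(100)], ["id", "value"])

def custom_function(row):
    # Whatever is returned here must be something Spark can turn back
    # into a DataFrame row, e.g. a Row or a tuple.
    return Row(id=row.id, value=row.value * 2.0)

dataframe2 = dataframe1.rdd.map(custom_function)
dataframe = spark.createDataFrame(dataframe2)
dataframe.show(30, True)
```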
Ok somehow this worked!
// Save prices to mongoDB collection
val document = sparkContext.parallelize((1 to 1).
map(i =>
Document.parse(s"{key:'$key',ticker:'$ticker',timeissued:'$timeissued',price:$price,CURRENCY:'$CURRENCY',op_type:$op_type,op
It looks like, for whatever reason, your cluster isn't using the Python you
distributed, or said distribution doesn't contain what you think.
I've used the following with success to deploy a conda environment to my
cluster at runtime:
https://henning.kropponline.de/2016/09/24/running-pyspark-with-co
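One common shape of that approach on YARN (a sketch of my own, not necessarily the linked recipe; the archive path, the `#PYENV` alias, and the interpreter path inside the archive are all assumptions, with the environment packed beforehand, e.g. with conda-pack):
```
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("conda-env-example")
    # Ship the packed environment to every executor; YARN unpacks it as ./PYENV
    .config("spark.yarn.dist.archives", "hdfs:///tmp/pyenv.tar.gz#PYENV")
    # Point the PySpark workers at the interpreter inside the shipped environment
    .config("spark.pyspark.python", "./PYENV/bin/python")
    .getOrCreate()
)
```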
rajat mishra wrote
> When I try to compute the statistics for a query where the partition column
> is in the where clause, the statistics returned contain only the sizeInBytes
> and not the row count.
We are also having the same issue. We have our data in partitioned Parquet
files and were hoping
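For reference, a sketch of how partition-level statistics can be computed and inspected (assuming Spark 2.3+, an active SparkSession `spark`, and a made-up partitioned table `prices` with partition column `dt`; not the original poster's code):
```
# Compute statistics for one partition of a partitioned table.
spark.sql("ANALYZE TABLE prices PARTITION (dt='2018-09-07') COMPUTE STATISTICS")

# When row counts were gathered, they appear in the partition's description.
spark.sql("DESCRIBE EXTENDED prices PARTITION (dt='2018-09-07')").show(100, False)
```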
Thanks. If you define the columns class as below:
scala> case class columns(KEY: String, TICKER: String, TIMEISSUED: String, PRICE: Double)
defined class columns
scala> var df = Seq(columns("key", "ticker", "timeissued", 1.23f)).toDF
df: org.apache.spark.sql.DataFrame = [KEY: string, TICKER: string .
This code works with Spark 2.3.0 via spark-shell.
scala> case class columns(KEY: String, TICKER: String, TIMEISSUED: String,
PRICE: Float)
defined class columns
scala> import spark.implicits._
import spark.implicits._
scala> var df = Seq(columns("key", "ticker", "timeissued", 1.23f)).toDF
18/09/
I am trying to understand why Spark cannot convert simple comma-separated
columns to a DF.
I did a test.
I took one line of the printed output and stored it as a one-line CSV file as below:
var allInOne = key+","+ticker+","+timeissued+","+price
println(allInOne)
cat crap.csv
6e84b11d-cb03-44c0-aab6-37e06e06c
Hi,
I have tried every possible way to unsubscribe from this group. Can anyone
help?
--
Anu