Re: Running the driver on a laptop but data is on the Spark server

2020-11-25 Thread Apostolos N. Papadopoulos
ectory on my laptop itself. Am I crazy? Perhaps this isn't a supported way to use Spark? Any help or insights are much appreciated! -Ryan Victory -- Apostolos N. Papadopoulos, Associate Professor Department of Informatics Aristotle University of Thessaloniki Thessaloniki, GREECE tel: +

Re: Complexity with the data

2022-05-25 Thread Apostolos N. Papadopoulos
re are 6 columns and 4 records in total. These are the sample records. Should I load it as an RDD and then perhaps use a regex to eliminate the newlines? Or how should it be done? with ". /n" ? Any suggestions? Thanks, Sid -- Apostolos N. Papadopoulos, Associate Professor Depar

Re: Complexity with the data

2022-05-26 Thread Apostolos N. Papadopoulos
"true").option("quote",     '"').option(     "delimiter", ",").csv("path") What else I can do? Thanks,

Re: Issues getting Apache Spark

2022-05-26 Thread Apostolos N. Papadopoulos
get Spark to work on my laptop. Michael Martin -- Apostolos N. Papadopoulos, Associate Professor Department of Informatics Aristotle University of Thessaloniki Thessaloniki, GREECE tel: ++0030312310991918 email:papad...@csd.auth.gr twitter: @papadopoulos_ap web:http://datalab.csd.auth.gr/~apostol

Re: Spark Doubts

2022-06-21 Thread Apostolos N. Papadopoulos
culate the exact partitions needed to load a specific file? Thanks, Sid -- Apostolos N. Papadopoulos, Associate Professor Department of Informatics Aristotle University of Thessaloniki Thessaloniki, GREECE tel: ++0030312310991918 email: papad...@csd.auth.gr twitter: @papadopoulos_ap web

Re: How is Spark a memory based solution if it writes data to disk before shuffles?

2022-07-05 Thread Apostolos N. Papadopoulos
/github.com/JerryLead/SparkInternals/blob/master/markdown/english/4-shuffleDetails.md>. How then is this an improvement on map-reduce? Image from https://youtu.be/7ooZ4S7Ay6Y thanks! -- Apostolos N. Papadopoulos, Associate Professor Depar

Re: How to improve efficiency of this piece of code (returning distinct column values)

2023-02-10 Thread Apostolos N. Papadopoulos
ctions.col(columnNames[i]).isNotNull()).select(columnNames[i]).distinct().collectAsList(); for (int j=0;j<columnValues.size();j++) columnList.add(columnValues.get(j).apply(0).toString()); finalList.add(columnList); How to improve this? Also, can I get the results in JSON format? -- Apostolos N. Papadopoulos, Associate

Re: Increase no of tasks

2018-06-22 Thread Apostolos N. Papadopoulos
-- Apostolos N. Papadopoulos, Associate Professor Department of Informatics Aristotle University of Thessaloniki Thessaloniki, GREECE tel: ++003031231099

Re: Create an Empty dataframe

2018-06-30 Thread Apostolos N. Papadopoulos
t() with two columns ("Column1", "Column2"), and I want to append rows dynamically in a for loop. Is there any way to achieve this? Thank you in advance. -- Apostolos N. Papadopoulos, Associate Professor Department of Informatics Aristotle University of Thessaloniki Thessaloni

Re: Parallelism: behavioural difference in version 1.2 and 2.1!?

2018-08-29 Thread Apostolos N. Papadopoulos
-- Apostolos N. Papadopoulos, Associate Professor Department of Informatics Aristotle University of Thessaloniki Thessaloniki, GREECE tel: ++0030312310991918 email: papad...@csd.

Re: Error in show()

2018-09-06 Thread Apostolos N. Papadopoulos
the error. The error is in the attachment Pyspark_Error.txt. Could you please explain what this error is and how to get past it? -- Apostolos N. Papadopoulos

Re: Spark job's driver programe consums too much memory

2018-09-07 Thread Apostolos N. Papadopoulos
Thanks -- Apostolos N. Papadopoulos, Associate Professor Department of Informatics Aristotle University of Thessaloniki Thessaloniki, GREECE tel: ++0030312310991918 email: papad...@csd.auth.gr twitter: @papadopoulos_ap web: http://dat

Re: Spark job's driver programe consums too much memory

2018-09-07 Thread Apostolos N. Papadopoulos
driver program. If I can write data to hdfs at executor, then the driver memory for my spark job can be reduced. Otherwise does Spark support streaming read from database (i.e. spark streaming + spark sql)? Thanks for your reply. ‐‐‐ Original Message ‐‐‐ On 7 September 2018 4:15 PM

Re: Local vs Cluster

2018-09-14 Thread Apostolos N. Papadopoulos
On 14/09/2018 11:21 AM, Aakash Basu wrote: Hi, What is the Spark cluster equivalent of standalone's local[N]? I mean, the value we set as a parameter of local as N, which parameter takes it in the cluster mode? Thanks, Aakash. -- Apostolos N. Papadopoulos, Associate Professor Departme

Re: Specifying different version of pyspark.zip and py4j files on worker nodes with Spark pre-installed

2018-10-04 Thread Apostolos N. Papadopoulos
') ,('spark.network.timeout', '800') ,('spark.scheduler.mode', 'FAIR') ,('spark.shuffle.service.enabled', 'true') ,('spark.dynamicAllocation.enabled', 'true') ]) py_files = ['hdfs://emr-header-1.

Re: Apply Kmeans in partitions

2019-01-30 Thread Apostolos N. Papadopoulos
or. How can I apply Kmeans to every partition? Thank you in advance, -- Apostolos N. Papadopoulos, Associate Professor Department of Informatics Aristotle University of Thessaloniki Thessaloniki, GREECE tel: ++0030312310991918 email: papad...@csd.auth.gr twitter: @papadopoulos_ap web: http:/

Re: Java Heap Space error - Spark ML

2019-03-22 Thread Apostolos N. Papadopoulos
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Thanks, Asmath -- Apostolos N. Papadopoulos, Associate Professor Dep

Re: writing a small csv to HDFS is super slow

2019-03-22 Thread Apostolos N. Papadopoulos
n. Any clue is highly appreciated! Thanks. -- Apostolos N. Papadopoulos, Associate Professor Department of Informatics Aristotle University of Thessaloniki Thessaloniki, GREECE tel: ++0030312310991918 email: papad...@csd.auth.gr twitter: @papadopoulos_ap web:

Re: Serialization error when using scala kernel with Jupyter

2020-02-21 Thread Apostolos N. Papadopoulos
nce of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD I was wondering if anyone has seen this before. Thanks Nikhil -- Apostolos N. Papadopoulos, Associate Professor Department of Informatics Aristotle Univers