Hi Users,
I am currently trying to use Apache Spark 2.2.0 from a Jupyter
notebook, but I have not been able to get it working.
I am using Ubuntu 17.10.
I can use pyspark from the command line, as well as spark-shell. Please
share some ideas.
Thanks.
Nandan Priyadarshi
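
One approach that is often suggested (a sketch, not an official recipe):
either launch Jupyter as the PySpark driver via environment variables, or
bootstrap a plain notebook with the findspark package. The install path
/opt/spark below is an assumption; adjust it to your setup.

    # Option 1: make `pyspark` start a notebook (e.g. add to ~/.bashrc)
    export SPARK_HOME=/opt/spark                 # assumed install location
    export PATH="$SPARK_HOME/bin:$PATH"
    export PYSPARK_DRIVER_PYTHON=jupyter
    export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
    # running `pyspark` now opens Jupyter with `sc` and `spark` predefined

    # Option 2: from inside a normal notebook (pip install findspark)
    import findspark
    findspark.init("/opt/spark")                 # assumed install location

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.master("local[*]").appName("smoke-test").getOrCreate()
    print(spark.range(100).count())              # quick smoke test; should print 100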
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
... 1 more
Please help me with this.
Thanks.
Nandan Priyadarshi
Hi Team,
Any good book recommendations for getting in-depth knowledge, from zero
to production?
Let me know.
Thanks.
Hi,
I am getting this error:
---------------------------------------------------------------------------
Py4JError                                 Traceback (most recent call last)
<ipython-input> in <module>()
      3 TOTAL = 100
      4 dots = sc.parallelize([2.0 * np.random.random(2) - 1.0 for i in range(TOTAL)]).cache()
----> 5 print("Number of random poin
Hello everyone,
Generally speaking, it is well known that DataFrames are much faster
than RDDs when it comes to performance.
My question is how you go about transforming a DataFrame using map. The
DataFrame gets converted into an RDD, so do you then convert it back to a
DataFrame?
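
One answer that comes up often (a sketch; the column names are made up for
illustration): stay inside the DataFrame API with withColumn and the
built-in functions where you can, use a UDF for arbitrary Python logic,
and only fall back to rdd.map plus toDF when neither fits.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.master("local[*]").appName("df-map").getOrCreate()
    df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])

    # Preferred: built-in column functions keep Catalyst's optimizations in play
    df2 = df.withColumn("name_upper", F.upper(F.col("name")))

    # Next best: a Python UDF for logic with no built-in equivalent
    shout = F.udf(lambda s: s + "!", StringType())
    df3 = df2.withColumn("name_shout", shout(F.col("name")))

    # Last resort: drop to the RDD, then convert back with toDF
    df4 = df.rdd.map(lambda row: (row.id * 10, row.name)).toDF(["id_x10", "name"])
    df4.show()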
Hello,
I am trying to combine several small text files (each file is
approximately hundreds of MBs to 2-3 GBs) into one big Parquet file.
I am loading each one of them and taking a union; however, this leads to
an enormous number of partitions, as each union keeps adding the
partitions of its inputs.
for nested structure???
Thanks and Regards,
Nandan
What is the workaround?
Thanks.
Nandan
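
A common workaround (a sketch; the paths and partition count are
illustrative): read all the inputs in one pass instead of unioning them
file by file, then coalesce before writing the Parquet output, since union
simply concatenates the partition lists of its inputs.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").appName("compact").getOrCreate()

    # One read over a glob avoids N separate unions (paths are assumptions)
    df = spark.read.text("/data/small_files/*.txt")

    # coalesce merges partitions without a full shuffle; pick a count that
    # gives reasonably sized output files
    df.coalesce(16).write.mode("overwrite").parquet("/data/combined.parquet")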