Hi,
I am using an EMR machine, and I can see that the Spark log directory has grown
to 4 GB.
File name: spark-history-server.out
I need advice on how to reduce the size of the above-mentioned file.
Is there a config property that can help me?
Thanks,
Divya
Assuming you don't have your environment variables set up in your
.bash_profile, you would do it like this:
import os
import sys

# Point Python at the local Spark installation so pyspark can be imported.
spark_home = '/usr/local/spark'
sys.path.insert(0, spark_home + "/python")
# Add the py4j bundle that ships with Spark (adjust the version to your install).
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.10.1-src.zip'))
#os.environ['P
Can you be more specific about what you would want to change at the DataFrame level?
---
Hi Team,
Can someone please share any examples of reading and writing files from
Google Cloud Storage with Spark in Java?
Thank you in advance.
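A minimal sketch of one way to do this (not an answer from the thread): it assumes the GCS Hadoop connector jar is on the classpath, the bucket names and keyfile path are placeholders, and it is written in Scala although the same calls exist on the Java Dataset API.

import org.apache.spark.sql.SparkSession

// Assumes the gcs-connector jar is available to the driver and executors.
val spark = SparkSession.builder().appName("gcs-read-write").getOrCreate()
val hadoopConf = spark.sparkContext.hadoopConfiguration

// Service-account authentication (property names as used by the GCS connector;
// verify against the connector version you deploy).
hadoopConf.set("google.cloud.auth.service.account.enable", "true")
hadoopConf.set("google.cloud.auth.service.account.json.keyfile", "/path/to/keyfile.json")

// Read from and write back to Google Cloud Storage through the gs:// scheme.
val df = spark.read.text("gs://some-input-bucket/input/")
df.write.parquet("gs://some-output-bucket/output/")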
From the stack it looks to be an error from the explicit call to
hadoop.fs.FileSystem.
Is the URL scheme for s3n registered?
Does it work when you try to read from S3 in Spark?
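One way to check both points (a sketch, not from the thread; sc is assumed to be the active SparkContext, the bucket name is a placeholder, and the fs.s3n.* property names are the standard Hadoop ones):

// Bind the s3n:// scheme to the native S3 filesystem explicitly and supply
// credentials, then try a small read to confirm the scheme resolves.
sc.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "<AWS_ACCESS_KEY>")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "<AWS_SECRET_KEY>")
val probe = sc.textFile("s3n://some-bucket/some-prefix/").take(1)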
From: Ankur Srivastava <ankur.srivast...@gmail.com>
Sent: Wednesday, January
Hi Marco and respected members,
I have done all the possible things suggested by the forum, but I'm still having the
same issue:
1. I will migrate my applications to a production environment where I will have
more resources.
Palash>> I migrated my application to production, where I have
more CPU cores and memory
Hi,
If it only happens when you run the two apps at the same time, could it be that
these two apps somehow run on the same host?
Kr
On 5 Jan 2017 9:00 am, "Palash Gupta" wrote:
> Hi Marco and respected member,
>
> I have done all the possible things suggested by Forum but still I'm
> having same issue:
>
> 1. I wil
Hi All,
Using Spark, is interoperability (communication) between two
clouds (Google, AWS) possible?
In my use case I need to take Google Cloud Storage as input to Spark, do some
processing, and finally store the result in S3; my Spark engine runs on an AWS
cluster.
Please let me know whether there is any way to do this.
Hi Marco,
Yes, it was on the same host when the problem was found.
Even when I tried to start on a different host, the problem was still there.
Any hints or suggestions will be appreciated.
Thanks & Best Regards,
Palash Gupta
From: Marco Mistroni
To: Palash Gupta
Cc: ayan guha ; User
Sent:
Hi all,
I am aware that collect() returns a list aggregated on the driver, and this will
cause an OOM when the list is too big.
Is toLocalIterator safe to use with a very big list? I want to access all values
one by one.
Basically, the goal is to compare two sorted RDDs (A and B) to find the top k
entries
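For what it's worth, toLocalIterator pulls one partition at a time to the driver, so the driver only has to hold a single partition in memory rather than the whole dataset, at the cost of running a separate job per partition. A minimal sketch, assuming sc is the active SparkContext:

// Iterate over a sorted RDD one element at a time on the driver;
// only one partition is resident in driver memory at any moment.
val sortedA = sc.parallelize(1 to 1000000).sortBy(identity)
val it = sortedA.toLocalIterator
it.take(5).foreach(println)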
Why not do that with Spark SQL to utilise the executors properly, rather than a
sequential filter on the driver?
SELECT * FROM A LEFT JOIN B ON A.fk = B.fk WHERE B.pk IS NULL LIMIT k
If you were sorting just so you could iterate in order, this might save you a
couple of sorts too.
https://rich
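The same idea in the DataFrame API, as a small self-contained sketch (the column name fk comes from the query above; the data and the value of k are made up):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("anti-join-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Stand-ins for the two datasets A and B.
val dfA = Seq((1, "a1"), (2, "a2"), (3, "a3")).toDF("fk", "valueA")
val dfB = Seq((1, "b1"), (3, "b3")).toDF("fk", "valueB")
val k = 10

// "left_anti" keeps only the rows of A with no matching fk in B, which is the
// same result as the LEFT JOIN ... WHERE B.pk IS NULL query, computed on the executors.
val topK = dfA.join(dfB, Seq("fk"), "left_anti").limit(k)
topK.show()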
If it is on the same host... it is expected. AFAIK you cannot create more than
one Spark context on the same host.
All I can suggest is to run your apps outside the cluster and on two different
hosts. If that fails, you will need to add logging to your failing app to
determine why it is failing.
If you can send me a short snippet for the two
Hi User Team,
I'm trying to schedule resources in Spark 2.1.0 using the code below, but all
the CPU cores are still captured by a single Spark application, and hence no
other application can start. Could you please help me out?
sqlContext =
SparkSession.builder.master("spark://172.26.7.192:7077").
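In standalone mode the first application grabs every available core by default, which blocks later applications from getting executors; capping spark.cores.max per application is the usual remedy. A sketch in Scala (the values are examples; the same properties can be passed from PySpark via SparkSession.builder.config or set in spark-defaults.conf):

import org.apache.spark.sql.SparkSession

// Limit the cores and executor memory this application may take from the
// standalone cluster, leaving room for other applications to be scheduled.
val spark = SparkSession.builder()
  .master("spark://172.26.7.192:7077")
  .config("spark.cores.max", "4")
  .config("spark.executor.memory", "4g")
  .getOrCreate()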
On 5 Jan 2017, at 09:58, Manohar753 <manohar.re...@happiestminds.com> wrote:
Hi All,
Using Spark, is interoperability (communication) between two
clouds (Google, AWS) possible?
In my use case I need to take Google Cloud Storage as input to Spark, do some
processing, and finally store the result in S
Yes, it works to read the vertices and edges data from the S3 location and is
also able to write the checkpoint files to S3. It only fails when deleting
the data, and that is because it tries to use the default file system. I
tried looking up how to update the default file system but could not find
anyth
This blog post (not mine) has some nice examples:
https://hadoopist.wordpress.com/2016/08/19/how-to-create-compressed-output-files-in-spark-2-0/
From the blog:
df.write.mode("overwrite").format("parquet").option("compression",
"none").mode("overwrite").save("/tmp/file_no_compression_parq")
There is a way: you can use
org.apache.spark.sql.functions.monotonicallyIncreasingId; it will give each row
of your DataFrame a unique id.
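A short sketch of that approach, using the snake_case name from Spark 2.x (the camelCase name above is the older alias); the column names and data are made up:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.monotonically_increasing_id

val spark = SparkSession.builder().appName("unique-id-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Attach a unique (monotonically increasing, but not consecutive) id to every row.
val df = Seq("a", "b", "c").toDF("value")
  .withColumn("uid", monotonically_increasing_id())
df.show()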
On Tue, Oct 18, 2016 10:36 AM, ayan guha <guha.a...@gmail.com> wrote:
Do you have any primary key or unique identifier in your data? Even if multiple
column
I don't use MapR but I use pyspark with jupyter, and this MapR blogpost
looks similar to what I do to setup:
https://community.mapr.com/docs/DOC-1874-how-to-use-jupyter-pyspark-on-mapr
On Thu, Jan 5, 2017 at 3:05 AM, neil90 wrote:
> Assuming you don't have your environment variables setup in y
Hi,
This might be off topic, but Databricks has a web application in which you
can use Spark with Jupyter. Have a look at
https://community.cloud.databricks.com
Kr
On Thu, Jan 5, 2017 at 7:53 PM, Jon G wrote:
> I don't use MapR but I use pyspark with jupyter, and this MapR blogpost
> looks similar
So, it seems the only way I have found for now is recursive handling of the Row
instances directly, but to do that I have to go back to RDDs. I've put together
a simple test case demonstrating the problem:
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.scalatest.{FlatSpec, Matchers}
Hi Steve,
Thanks for the reply; below is the follow-up help I need from you.
Do you mean we can set up two native file systems on a single SparkContext, so
that, based on the URL prefix (gs://bucket/path and dest s3a://bucket-on-s3/path2),
it will identify and read/write to the appropriate cloud?
Is that my u
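For what it's worth, Hadoop picks the FileSystem implementation from the URI scheme, so a single SparkContext can read gs:// paths and write s3a:// paths in the same job as long as both connectors and their credentials are configured. A sketch (the connector jars are assumed to be on the classpath; the property names come from the GCS connector and hadoop-aws, and the paths are the ones from your example):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("gcs-to-s3").getOrCreate()
val hadoopConf = spark.sparkContext.hadoopConfiguration

// Credentials for each scheme; the appropriate FileSystem is chosen per URI.
hadoopConf.set("google.cloud.auth.service.account.json.keyfile", "/path/to/gcp-key.json")
hadoopConf.set("fs.s3a.access.key", "<AWS_ACCESS_KEY>")
hadoopConf.set("fs.s3a.secret.key", "<AWS_SECRET_KEY>")

// Read from Google Cloud Storage, transform as needed, then write to S3.
val input = spark.read.parquet("gs://bucket/path")
input.write.parquet("s3a://bucket-on-s3/path2")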
Right, I'd agree, it seems to be only with delete.
Could you by chance run just the delete to see if it fails?
FileSystem.get(sc.hadoopConfiguration)
.delete(new Path(somepath), true)
From: Ankur Srivastava
Sent: Thursday, January 5, 2017 10:05:03 AM
To: Felix Che
Hello Experts,
I am trying to allow null values in numeric fields. Here are the details of
the issue I have:
http://stackoverflow.com/questions/41492344/spark-avro-to-parquet-writing-null-values-in-number-fields
I also tried making all columns nullable by using the below function (from
one of the
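For reference, one commonly seen shape for such a helper (purely illustrative, and not necessarily the function the message refers to) rebuilds the schema with every field marked nullable and re-creates the DataFrame over the same rows:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType

// Illustrative helper: copy each field with nullable = true and rebuild
// the DataFrame against the relaxed schema.
def setAllNullable(df: DataFrame): DataFrame = {
  val nullableSchema = StructType(df.schema.map(_.copy(nullable = true)))
  df.sparkSession.createDataFrame(df.rdd, nullableSchema)
}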
Yes, I did try it out, and it chooses the local file system even though my checkpoint
location starts with s3n://.
I am not sure how I can make it load the S3FileSystem.
On Thu, Jan 5, 2017 at 12:12 PM, Felix Cheung
wrote:
> Right, I'd agree, it seems to be only with delete.
>
> Could you by chance run just
Adding DEV mailing list to see if this is a defect with ConnectedComponent
or if they can recommend any solution.
Thanks
Ankur
On Thu, Jan 5, 2017 at 1:10 PM, Ankur Srivastava wrote:
> Yes I did try it out and it choses the local file system as my checkpoint
> location starts with s3n://
>
> I
This is likely a factor of your Hadoop config and Spark rather than anything
specific to GraphFrames.
You might have better luck getting assistance if you could isolate the code to
a simple case that manifests the problem (without GraphFrames), and repost.
Fr
Would it be more robust to use the Path when creating the FileSystem?
https://github.com/graphframes/graphframes/issues/160
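In code, that suggestion would look roughly like this (a sketch; the checkpoint path is a placeholder and sc is assumed to be the active SparkContext): the FileSystem is derived from the path's own scheme rather than from the default file system.

import org.apache.hadoop.fs.Path

// Resolve the FileSystem from the path's scheme (s3n:// here) instead of
// relying on the default file system, then delete the checkpoint data.
val checkpointPath = new Path("s3n://some-bucket/checkpoint-dir")
val fs = checkpointPath.getFileSystem(sc.hadoopConfiguration)
fs.delete(checkpointPath, true)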
On Thu, Jan 5, 2017 at 4:57 PM, Felix Cheung
wrote:
> This is likely a factor of your hadoop config and Spark rather then
> anything specific with GraphFrames.
>
> You migh
Hi,
I am using newAPIHadoopFile to process a large number of S3 files (around 20
thousand) by passing the URLs as a comma-separated String. It takes around *7
minutes* to start the job. I am running the job on EMR 5.2.0 with Spark
2.0.2.
Here is the code:
Configuration conf = new Configuration();