Hi, Community
I'm new to the Spark community and have noticed a gap between Spark and
DBR in the Spark UI. This is what DBR shows for the cost-based optimizer in the Spark UI:
https://docs.databricks.com/en/optimizations/cbo.html#spark-sql-ui. To
implement something similar in the open-source version, I've
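A hedged sketch of the open-source knobs such work would build on, assuming the documented CBO settings and a spark-shell session (spark is the SparkSession); the table and query are hypothetical, and the UI change itself is a separate question:

  // Enable the cost-based optimizer and give it statistics to work with
  spark.conf.set("spark.sql.cbo.enabled", "true")
  spark.sql("ANALYZE TABLE sales COMPUTE STATISTICS FOR ALL COLUMNS")  // 'sales' is a hypothetical table
  // Statistics then show up in the optimized logical plan, e.g. via the "cost" explain mode
  spark.sql("SELECT * FROM sales WHERE amount > 100").explain("cost")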
>
> I've posted a question on StackOverflow. The link is -
> https://stackoverflow.com/questions/78644118/understanding-exchange-in-spark-ui
>
> I haven't got any responses yet. If possible could you please look into
> it? If you need me to write the question in the mailing list, I can do that
> as well.
>
> Thanks & Regards
> Dhruv
>
Hey Team
I've posted a question on StackOverflow. The link is -
https://stackoverflow.com/questions/78644118/understanding-exchange-in-spark-ui
I haven't got any responses yet. If possible could you please look into it?
If you need me to write the question in the mailing list, I can
Hi Team,
We're encountering an issue with Spark UI.
I've documented the details here:
https://issues.apache.org/jira/browse/SPARK-47232
When reverse proxy is enabled in the master and worker configOptions, we're not
able to access the different tabs available in the Spark UI, e.g. (stages,
environment, storage etc.)
Hi Team,
We're encountering an issue with Spark UI.
When reverse proxy is enabled in the master and worker configOptions, we're not
able to access the different tabs available in the Spark UI, e.g. (stages,
environment, storage etc.)
We're deploying Spark through the Bitnami Helm chart:
https://git
Hello everyone,
I’m really sorry to use this mailing list, but it seems impossible to report a
strange behaviour that is happening with the Spark UI otherwise. I’m also sending
the link to the Stack Overflow question here:
https://stackoverflow.com/questions/76632692/spark-ui-executors-tab-its-empty
I’m
Severity: important
Affected versions:
- Apache Spark 3.1.1 before 3.2.2
Description:
** UNSUPPORTED WHEN ASSIGNED ** The Apache Spark UI offers the possibility to
enable ACLs via the configuration option spark.acls.enable. With an
authentication filter, this checks whether a user has access
Already a known minor issue
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-10141
On Wed, 7 Dec 2022, 15:09 K B M Kaala Subhikshan, <
kbmkaalasubhiks...@gmail.com> wrote:
> Could you explain why the RDD block has a negative value?
>
>
Hi All,
Is there a way to filter the jobs in the history server UI / in Spark's API
based on the Job Group to which each job belongs?
Ideally we would like to supply a particular job group, and only see the
jobs associated with that job group in the UI.
Thanks,
Yeachan
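A minimal sketch of how a job group can be attached from code and then matched through the monitoring REST API, assuming a spark-shell session (spark is the SparkSession); as far as I know the UI itself has no job-group filter box, so the filtering would have to be done client-side:

  // Tag everything submitted from this thread with a job group
  spark.sparkContext.setJobGroup("nightly-etl", "Nightly ETL jobs", interruptOnCancel = false)
  spark.range(0, 1000000).count()

  // Each job returned by the REST API then carries a "jobGroup" field, e.g.
  //   GET http://<driver>:4040/api/v1/applications/<app-id>/jobs
  // (same path on the history server), which can be filtered on the desired group.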
Severity: important
Description:
The Apache Spark UI offers the possibility to enable ACLs via the
configuration option spark.acls.enable. With an authentication filter, this
checks whether a user has access permissions to view or modify the
application. If ACLs are enabled, a code path in
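For context, a minimal sketch of the ACL settings this advisory refers to, assuming they are passed through a SparkConf (key names are from the Spark security documentation; the filter class is hypothetical and the vulnerable code path itself is not reproduced here):

  import org.apache.spark.SparkConf

  // Enable UI ACLs; an authentication filter must also be configured for them to take effect
  val conf = new SparkConf()
    .set("spark.acls.enable", "true")
    .set("spark.ui.view.acls", "alice,bob")               // users allowed to view the application UI
    .set("spark.modify.acls", "alice")                    // users allowed to modify (e.g. kill) the application
    .set("spark.ui.filters", "com.example.MyAuthFilter")  // hypothetical servlet filter doing authentication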
ng wrote:
>
>> It is like Web Application Proxy in YARN (
>> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html),
>> to provide easy access for Spark UI when the Spark application is running.
>>
>> When running Spark on Kuberne
for Spark UI when the Spark application is running.
>
> When running Spark on Kubernetes with S3, there is no YARN. The reverse
> proxy here is to behave like that Web Application Proxy. It will
> simplify settings to access Spark UI on Kubernetes.
>
>
> On Mon, May 16,
It is like Web Application Proxy in YARN (
https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html),
to provide easy access for Spark UI when the Spark application is running.
When running Spark on Kubernetes with S3, there is no YARN. The reverse
proxy here is
Thanks Holden :)
On Mon, May 16, 2022 at 11:12 PM Holden Karau wrote:
> Oh that’s rad 😊
>
> On Tue, May 17, 2022 at 7:47 AM bo yang wrote:
>
>> Hi Spark Folks,
>>
>> I built a web reverse proxy to access Spark UI on Kubernetes (working
>>
What's the advantage of using a reverse proxy for the Spark UI?
Thanks
On Tue, May 17, 2022 at 1:47 PM bo yang wrote:
> Hi Spark Folks,
>
> I built a web reverse proxy to access Spark UI on Kubernetes (working
> together with https://github.com/GoogleCloudPlatform/spark-on-k8s-ope
Oh that’s rad 😊
On Tue, May 17, 2022 at 7:47 AM bo yang wrote:
> Hi Spark Folks,
>
> I built a web reverse proxy to access Spark UI on Kubernetes (working
> together with https://github.com/GoogleCloudPlatform/spark-on-k8s-operator).
> Want to share here in case other people ha
Hi Spark Folks,
I built a web reverse proxy to access Spark UI on Kubernetes (working
together with https://github.com/GoogleCloudPlatform/spark-on-k8s-operator).
I want to share it here in case other people have a similar need.
The reverse proxy code is here:
https://github.com/datapunchorg/spark-ui
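As an aside, standalone Spark also has a built-in reverse-proxy option; a minimal sketch, assuming the documented spark.ui.reverseProxy settings (normally set on the master and workers, e.g. via spark-defaults.conf; shown here as SparkConf keys, with a hypothetical external URL):

  import org.apache.spark.SparkConf

  // Route worker and application UIs through the master UI at one public address
  val conf = new SparkConf()
    .set("spark.ui.reverseProxy", "true")
    .set("spark.ui.reverseProxyUrl", "https://spark.example.com/proxy")  // hypothetical public URL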
Hello,
Currently the Spark UI does not show HashAggregateExec modes; could we add the
aggregate modes to SparkPlan? I think it would be helpful when analyzing a very
complicated SparkPlan.
Am I right?
For example:
SELECT key2, sum(value2) as sum_value2
FROM ( SELECT id % 1 as key2, id as v
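For reference, a hedged sketch of how the aggregate nodes can already be inspected outside the UI, assuming Spark 3.x's Dataset.explain(mode) in a spark-shell session (the query above is only partially quoted, so a simpler aggregate is used):

  import org.apache.spark.sql.functions.sum

  val df = spark.range(0, 1000)
    .selectExpr("id % 10 as key2", "id as value2")
    .groupBy("key2")
    .agg(sum("value2").as("sum_value2"))

  // Prints the physical plan; HashAggregate typically appears twice (partial and final),
  // which is exactly the mode information the proposal above would like surfaced in the UI
  df.explain("formatted")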
Hi,
I am experiencing performance issues in one of my pyspark applications. When I
look at the spark UI, the file and line number of each entry is listed as
. I would like to use the information in the Spark UI for debugging,
but without knowing the correct file and line number for the
any suggestion please.
Thanks
Amit
On Fri, Dec 4, 2020 at 2:27 PM Amit Sharma wrote:
> Is there any memory leak in Spark 2.3.3, as mentioned in the Jira below?
> https://issues.apache.org/jira/browse/SPARK-29055.
>
> Please let me know how to solve it.
>
> Thanks
> Amit
>
> On Fri, Dec 4,
unsubscribe
Is there any memory leak in Spark 2.3.3, as mentioned in the Jira below?
https://issues.apache.org/jira/browse/SPARK-29055.
Please let me know how to solve it.
Thanks
Amit
On Fri, Dec 4, 2020 at 1:55 PM Amit Sharma wrote:
> Can someone help me on this please.
>
>
> Thanks
> Amit
>
> On Wed,
Can someone help me on this please.
Thanks
Amit
On Wed, Dec 2, 2020 at 11:52 AM Amit Sharma wrote:
> Hi, I have a Spark streaming job. When I am checking the Executors tab,
> there is a Storage Memory column. It displays used memory / total memory.
> What is used memory? Is it memory in use
Hi, I have a Spark streaming job. When I am checking the Executors tab,
there is a Storage Memory column. It displays used memory / total memory.
What is used memory? Is it memory in use, or memory used so far? How would
I know how much memory is unused at a given point in time?
Thanks
Amit
Hi, I have a few questions as below:
1. The Spark UI Storage tab displays 'storage level', 'size in
memory' and 'size on disk'. It displays RDD ID 16 with memory
usage 76 MB, and I am not sure why it is not going to 0 once a request for Spark
streaming is completed. I
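If the lingering entry is an RDD cached by the application itself, a hedged sketch of explicitly releasing it (the cached RDD below is just a stand-in for whatever produced RDD 16; whether this applies depends on what is actually caching it):

  // Stand-in for the RDD that shows up in the Storage tab
  val cachedRdd = spark.sparkContext.parallelize(1 to 1000).cache()
  cachedRdd.count()                      // materializes the cache (now visible in the Storage tab)
  // Drop it from the block manager so the Storage tab stops reporting its memory usage
  cachedRdd.unpersist(blocking = false)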
wrote:
https://spark.apache.org/docs/3.0.0/web-ui.html is the official doc
for Spark UI.
Xiao
On Sun, Jul 19, 2020 at 1:38 PM venkatadevarapu
wrote:
Hi,
I'm looking for a tutorial/video/material which explains the
content of
variou
https://www.youtube.com/watch?v=YgQgJceojJY (Xiao's video )
On Mon, Jul 20, 2020 at 8:03 AM Xiao Li wrote:
> https://spark.apache.org/docs/3.0.0/web-ui.html is the official doc
> for Spark UI.
>
> Xiao
>
> On Sun, Jul 19, 2020 at 1:38 PM venkatadevarapu
> wrote:
https://spark.apache.org/docs/3.0.0/web-ui.html is the official doc
for Spark UI.
Xiao
On Sun, Jul 19, 2020 at 1:38 PM venkatadevarapu
wrote:
> Hi,
>
> I'm looking for a tutorial/video/material which explains the content of
> various tabs in the Spark Web UI.
> Can some on
Hi,
I'm looking for a tutorial/video/material which explains the content of
various tabs in the Spark Web UI.
Can someone direct me to the relevant info?
Thanks
Is there any way to make the Spark process visible via the Spark UI when
running Spark 3.0 on a Hadoop YARN cluster? The Spark documentation
talks about replacing the Spark UI with the Spark history server, but
doesn't give much detail. Therefore I would assume it is still possible
to use Spa
I am using history server to see previous UIs.
>
> However, my question still remains on viewing old thread dumps, as I
> cannot see them on the old completed spark UIs, only when spark context is
> running.
>
> On Wed, Apr 8, 2020 at 4:01 PM Zahid Rahman wrote:
>
>> Sp
Thanks Zahid, Yes I am using history server to see previous UIs.
However, my question still remains on viewing old thread dumps, as I
cannot see them on the old completed spark UIs, only when spark context is
running.
On Wed, Apr 8, 2020 at 4:01 PM Zahid Rahman wrote:
> Spark UI is o
The Spark UI is only available while the SparkContext is running.
However, you can get to the Spark UI after your application completes or
crashes.
To do this, Spark includes a tool called the Spark History Server that
allows you to reconstruct the Spark UI.
You can find up to date information on how
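A hedged sketch of the usual setup for that, assuming the documented event-log and history-server settings (the log directory below is a placeholder):

  import org.apache.spark.SparkConf

  // Application side: write event logs that the history server can replay later
  val conf = new SparkConf()
    .set("spark.eventLog.enabled", "true")
    .set("spark.eventLog.dir", "hdfs:///spark-logs")   // placeholder log directory

  // History server side: point spark.history.fs.logDirectory at the same location
  // (e.g. in spark-defaults.conf) and run sbin/start-history-server.sh;
  // the reconstructed UIs are then served on port 18080 by default.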
Hi all,
As stated in the title, when I view the Spark UI of a completed Spark
job, I see there are thread dump links in the Executors tab, but clicking on
them does nothing. Is it possible to see the thread dumps somehow even after
the job finishes? This is on Spark 2.4.5.
Thanks.
--
Cheers,
Ruijing
> following properties while submitting the Spark job
>
> spark.eventLog.enabled true
>
> spark.eventLog.dir
>
>
>
> Thanks and Regards,
>
> Abhishek
>
>
>
> *From:* Battini Lakshman
> *Sent:* Wednesday, January 23, 2019 1:55 PM
> *To:* Rao, Abhishek (Nokia - IN/Bangalo
spark.eventLog.dir
Thanks and Regards,
Abhishek
From: Battini Lakshman
Sent: Wednesday, January 23, 2019 1:55 PM
To: Rao, Abhishek (Nokia - IN/Bangalore)
Subject: Re: Spark UI History server on Kubernetes
HI Abhishek,
Thank you for your response. Could you please let me know the properties you
configured
, 2019 6:02 PM
To: user@spark.apache.org
Subject: Spark UI History server on Kubernetes
Hello,
We are running Spark 2.4 on Kubernetes cluster, able to access the Spark UI
using "kubectl port-forward".
However, this spark UI contains currently running Spark application logs, we
wou
Hello,
We are running Spark 2.4 on a Kubernetes cluster and are able to access the Spark UI
using "kubectl port-forward".
However, this Spark UI contains the currently running Spark application logs;
we would like to maintain the 'completed' Spark application logs as well.
Could some
Done:
https://issues.apache.org/jira/browse/SPARK-25837
On Thu, Oct 25, 2018 at 10:21 AM Marcelo Vanzin wrote:
> Ah that makes more sense. Could you file a bug with that information
> so we don't lose track of this?
>
> Thanks
> On Wed, Oct 24, 2018 at 6:13 PM Patrick Brown
> wrote:
> >
> > On
Ah that makes more sense. Could you file a bug with that information
so we don't lose track of this?
Thanks
On Wed, Oct 24, 2018 at 6:13 PM Patrick Brown
wrote:
>
> On my production application I am running ~200 jobs at once, but continue to
> submit jobs in this manner for sometimes ~1 hour.
>
When you say many jobs at once, what ballpark are you talking about?
The code in 2.3+ does try to keep data about all running jobs and
stages regardless of the limit. If you're running into issues because
of that we may have to look again at whether that's the right thing to
do.
On Tue, Oct 23, 20
I believe I may be able to reproduce this now; it seems like it may have
something to do with many jobs at once:
Spark 2.3.1
> spark-shell --conf spark.ui.retainedJobs=1
scala> import scala.concurrent._
scala> import scala.concurrent.ExecutionContext.Implicits.global
scala> for (i <- 0 until 5
Just tried on 2.3.2 and worked fine for me. UI had a single job and a
single stage (+ the tasks related to that single stage), same thing in
memory (checked with jvisualvm).
On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin wrote:
>
> On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
> wrote:
> > I rec
On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
wrote:
> I recently upgraded to spark 2.3.1 I have had these same settings in my spark
> submit script, which worked on 2.0.2, and according to the documentation
> appear to not have changed:
>
> spark.ui.retainedTasks=1
> spark.ui.retainedStages=1
>
I have the same problem when I upgraded my application from Spark 2.2.1 to
Spark 2.3.2 and ran in YARN client mode.
Also I noticed that in my Spark driver, org.apache.spark.status.TaskDataWrapper
could take up more than 2G of memory.
Shing
On Tuesday, 16 October 2018, 17:34:02 GMT+1, Patr
I recently upgraded to Spark 2.3.1. I have had these same settings in my
spark-submit script, which worked on 2.0.2 and which, according to the
documentation, appear not to have changed:
spark.ui.retainedTasks=1
spark.ui.retainedStages=1
spark.ui.retainedJobs=1
However in 2.3.1 the UI doesn't seem to res
Hello,
I am having some trouble using the Spark Master UI to figure out some basic
information.
The process is too tedious.
I am using Spark 2.2.1 with Spark standalone.
- In cluster mode, how do I figure out which driver is related to which
application?
- In supervise mode, how do I track the restart
Hi,
I've a job running which shows the Event Timeline as follows. I am trying
to understand the gaps between these single lines; they seem to be parallel but
not immediately sequential with other stages.
Any other insight from this, and what is the cluster doing during these
gaps?
Thanks,
Aakash.
imestamp?
>>
>> Only if the REST API has that feature, don't remember off the top of my
>> head.
>>
>>
>> --
>> Marcelo
>>
>> -----
>> To unsubscribe e-mail: [hidden email]
>>
On Mon, May 7, 2018 at 1:44 AM, Anshi Shrivastava
wrote:
> I've found a KVStore wrapper which stores all the metrics in a LevelDb
> store. This KVStore wrapper is available as a spark-dependency but we cannot
> access the metrics directly from spark since they are all private.
I'm not sure what i
wrote:
> Hi All,
>
> I am using spark 2.3.0 and I wondering what do I need to set to see the
> number of records and processing time for each batch in SPARK UI? The
> default UI doesn't seem to show this.
>
> Thanks@
>
Hi All,
I am using Spark 2.3.0 and I am wondering what I need to set to see the
number of records and processing time for each batch in the Spark UI? The
default UI doesn't seem to show this.
Thanks@
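In case this is Structured Streaming (where, unlike the DStream Streaming tab, the 2.3 UI shows no per-batch table), a hedged sketch of getting the same numbers from a listener, assuming the public StreamingQueryListener API in a spark-shell session:

  import org.apache.spark.sql.streaming.StreamingQueryListener
  import org.apache.spark.sql.streaming.StreamingQueryListener._

  // Logs input rows and durations per micro-batch, similar to what the DStream tab displays
  spark.streams.addListener(new StreamingQueryListener {
    override def onQueryStarted(event: QueryStartedEvent): Unit = ()
    override def onQueryProgress(event: QueryProgressEvent): Unit = {
      val p = event.progress
      println(s"batch=${p.batchId} rows=${p.numInputRows} durations=${p.durationMs}")
    }
    override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
  })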
Hello,
I am running a streaming app on Spark 2.1.2. The batch interval is set to
5000ms, and when I go to the "Streaming" tab in the Spark UI, it correctly
reports a 5 second batch interval, but the list of batches below only shows one
batch every two minutes (i.e. the batch time for
*Environment*:
AWS EMR, yarn cluster.
*Description*:
I am trying to use a Java filter to protect access to the Spark UI by
using the property spark.ui.filters; the problem is that when Spark is
running in YARN mode, that property is always overridden by Hadoop
with the filter
*Environment:*
AWS EMR, yarn cluster.
*Description:*
In the Spark UI, in the Environment and Executors tabs, the stdout and
stderr links point to the internal addresses of the executors. This would require
exposing the executors so that the links can be accessed. Shouldn't those links
be point
Hello all,
I am running PySpark Job (v2.0.2) with checkpoint enabled in Mesos cluster
and am using Marathon for orchestration.
When the job is restarted using Marathon, the Spark UI does not start
on the port specified by Marathon. Instead, it picks up the port from the
checkpoint.
Is there a
The batches should all have the same application ID, so use that one. You can
also find the application in the YARN UI to terminate it from there.
Matei
> On Aug 27, 2017, at 10:27 AM, KhajaAsmath Mohammed
> wrote:
>
> Hi,
>
> I am new to spark streaming and not able to find an option to kil
Hi Riccardo,
Thanks for your suggestions.
The thing is that my Spark UI is the one thing that is crashing - and not
the app. In fact the app does end up completing successfully.
That's why I'm a bit confused by this issue?
I'll still try out some of your suggestions.
Thanks and R
tly
>>> peculiar thing I am doing in this app is using a custom PySpark ML
>>> Transformer(Modified from
>>> https://stackoverflow.com/questions/32331848/create-a-custom
>>> -transformer-in-pyspark-ml).
>>> Could this be the issue? How can I d
a custom PySpark ML
>> Transformer(Modified from
>> https://stackoverflow.com/questions/32331848/create-a-custom
>> -transformer-in-pyspark-ml).
>> Could this be the issue? How can I debug why this is happening?
>>
>>
>>
>> --
>> View this mes
gt; Transformer(Modified from
> https://stackoverflow.com/questions/32331848/create-a-
> custom-transformer-in-pyspark-ml).
> Could this be the issue? How can I debug why this is happening?
>
>
>
> --
> View this message in context: http://apache-spark-user-list.
> 1001560.n3.nabble.com/
:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-crashes-on-Large-Workloads-tp28873.html
Hi all,
I am using Hadoop 2.6.5 and Spark 2.1.0 and run a job using spark-submit
with master set to "yarn". When Spark starts, I can load the Spark UI page
on port 4040, but no job is shown on the page. After the following logs
(registering the application master on YARN), the Spark UI is not
of Active jobs and tasks according to
> the UI. And my output is correct.
>
>
>
> Is there a JIRA item tracking this?
>
>
>
> *From:* Kuchekar [mailto:kuchekar.nil...@gmail.com]
> *Sent:* Wednesday, November 16, 2016 10:00 AM
> *To:* spark users
> *Subject:* Spark U
Any clue on this?
Jobs are running fine, but we are not able to access the Spark UI in EMR on YARN.
Where can I see statistics like number of events per second and rows processed
for streaming in the log files (if the UI is not working)?
-Saurabh
From: Saurabh Malviya (samalviy)
Sent: Monday, January 09, 2017 10:59
The Spark web UI for detailed monitoring of streaming jobs stops rendering after 2
weeks. It keeps looping trying to fetch the page. Is there any way I can get that
page, or logs where I can see how many events are coming into Spark for each interval?
-Saurabh
I was trying something basic to understand tasks, stages and shuffles a bit
better in Spark. The dataset is 256 MB.
I tried this in Zeppelin:
val tmpDF = spark.read
.option("header", "true")
.option("delimiter", ",")
.option("inferSchema", "true")
.csv("s3://l4b-d4t4/wikipedia/pageviews-by-secon
, 2016 10:00 AM
To: spark users
Subject: Spark UI shows Jobs are processing, but the files are already written
to S3
Hi,
I am running a spark job, which saves the computed data (massive data) to
S3. On the Spark Ui I see the some jobs are active, but no activity in the
logs. Also on S3
Hi,
I am running a Spark job which saves the computed data (massive data)
to S3. On the Spark UI I see that some jobs are active, but there is no activity in
the logs. Also on S3 all the data has been written (verified each bucket -->
it has a _SUCCESS file).
Am I missing something?
Thanks.
Kuche
Hi,
I am using Spark 1.6.1, and I am looking at the Event Timeline on "Details
for Stage" Spark UI web page in detail.
I found that the "scheduler delay" on event timeline is somehow
misrepresented. I want to confirm if my understanding is correct.
Here is the detailed des
I was able to fix it by adding servlet 3.0 to the classpath.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-error-spark-2-0-1-hadoop-2-6-tp27970p27971.html
Sent from the Apache Spark User List mailing list archive at Nabbl
0}
Let me know if I'm missing something.
Thanks
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-error-spark-2-0-1-hadoop-2-6-tp27970.html
Sent from the Apache Spark User List mailing list
vide a ReST API for retrieving important
> information about running jobs, this would be far simpler.
>
> Regards,
>
> Sivakumaran S
>
>
>
> On 27-Aug-2016, at 3:59 PM, Mich Talebzadeh
> wrote:
>
> Thanks Nguyen for the link.
>
> I installed Super R
Thanks Nguyen for the link.
I installed Super Refresh as an add-on to Chrome. By default the refresh is
stopped until you set it to x seconds. However, the issue we have is that the
Spark UI comes with 6+ tabs and you have to repeat the process for each tab.
That messes things up. For example
i Mich,
>>
>> Unlikely that we can use Zeppelin for dynamic, real time update
>> visualisation. It makes nice, static visuals.
>>
>> I was thinking more on the lines of http://dashingdemo.herokuap
>> p.com/sample
>>
>> The library is http://dashing.i
> On 27 August 2016 at 09:42, Sivakumaran S wrote:
>
>> I would love to participate in developing a dashboard of some sort in
&
ieu
> (or at least complement it) of Spark UI .
>
> Regards,
>
> Sivakumaran S
>
> On 27 Aug 2016 9:34 a.m., Mich Talebzadeh
> wrote:
>
> Are we actually looking for a real-time dashboard of some sort for the Spark UI
> interface?
>
> After all one can think a real time
love to participate in developing a dashboard of some sort in lieu
> (or at least complement it) of Spark UI .
>
> Regards,
>
> Sivakumaran S
>
> On 27 Aug 2016 9:34 a.m., Mich Talebzadeh
> wrote:
>
> Are we actually looking for a real-time dashboard of some sort for Sp
Are we actually looking for a real-time dashboard of some sort for the Spark UI
interface?
After all, one can think a real-time dashboard could do this!
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.
o fill an JIRA issue?
>
> What about REST API and httpie updating regularly [3]? Perhaps Metrics
> with ConsoleSink [4]?
>
> [1] https://github.com/apache/spark/blob/master/streaming/
> src/main/scala/org/apache/spark/streaming/ui/StreamingPage.scala#L158
> [2] https://g
r/core/src/main/scala/org/apache/spark/ui/UIUtils.scala#L202
[3] http://spark.apache.org/docs/latest/monitoring.html#rest-api
[4] http://spark.apache.org/docs/latest/monitoring.html#metrics
Pozdrawiam,
Jacek Laskowski
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/m
Hi, you can take a look at:
https://github.com/hammerlab/spree
It's a bit outdated, but maybe it's still possible to use it with a more
recent Spark version.
M.
2016-08-25 11:55 GMT+02:00 Mich Talebzadeh :
> Hi,
>
> This may be already there.
>
> A spark job opens up a UI on port specified by --c
Hi,
This may already be there.
A Spark job opens up a UI on the port specified by --conf
"spark.ui.port=${SP}", which defaults to 4040.
However, in the UI one needs to refresh the page to see the progress.
Can this be polled so it is refreshed automatically?
Thanks
Dr Mich Talebzadeh
LinkedIn *
ht
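One hedged way to get automatic updates without touching the browser at all is to poll the monitoring REST API instead; a minimal sketch, assuming the documented /api/v1 endpoints and a driver reachable on localhost:4040 (host, port and interval are placeholders):

  import scala.io.Source

  // Poll the running application's REST API every 5 seconds and print the raw JSON;
  // follow up with /api/v1/applications/<app-id>/jobs or /stages for per-job progress
  while (true) {
    val apps = Source.fromURL("http://localhost:4040/api/v1/applications").mkString
    println(apps)
    Thread.sleep(5000)
  }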
>> your spark jobs.
>>
>> --
>> If you reply to this email, your message will be added to the discussion
>> below:
>>
>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-give-name-to-Spark-jobs-shown-in-Spark-UI-tp27400p27406.
ng tool like Ganglia to determine where it's slow in
> your spark jobs.
>
> --
> If you reply to this email, your message will be added to the discussion
> below:
>
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-give-name-to-Spark-jobs-sh
Hi All,
I have recently started using Spark 1.6.2 for running my Spark jobs. But
now my jobs are not shown in the Spark browser UI, even though the
jobs are running fine, which I can see in the shell output.
Any suggestions?
Thanks,
Prashant Verma
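In case it helps, a hedged sketch of how job names can be set from code so they are easier to spot in the UI, assuming the public SparkContext.setJobDescription API (app name and paths below are hypothetical; this does not by itself explain why the jobs are missing from the UI):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("named-jobs-demo").getOrCreate()

  // The description replaces the default call-site text shown in the UI's job list
  spark.sparkContext.setJobDescription("xyz_saveAsTextFile")
  spark.range(0, 1000).rdd.saveAsTextFile("/tmp/xyz-output")   // hypothetical output path

  spark.sparkContext.setJobDescription("abc_count")
  spark.range(0, 1000).count()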
es. For e. g.
> xyz_saveAsTextFile(), abc_saveAsTextFile() etc please guide. Thanks in
> advance.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-give-name-to-Spark-jobs-shown-in-Spark-UI-tp27400.html
> Sent from the Apache
.1001560.n3.nabble.com/How-to-give-name-to-Spark-jobs-shown-in-Spark-UI-tp27400.html
Ok, so those line numbers in our DAG don't refer to our code. Is there any
way to display (or calculate) line numbers that refer to code we actually
wrote, or is that only possible in Scala Spark?
On Thu, Jul 21, 2016 at 12:24 PM, Jacek Laskowski wrote:
> Hi,
>
> My little understanding of Pytho
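Regarding "only possible in Scala Spark": on the Scala side there is at least a public hook for labelling what the UI shows, sketched below assuming a spark-shell session; whether and how this is exposed to PySpark jobs I am not certain:

  // Overrides the short call-site string the UI displays for subsequent jobs/stages
  spark.sparkContext.setCallSite("ctr_parsing: parse and aggregate clicks")  // hypothetical label
  spark.range(0, 1000).count()        // this job now shows the label instead of the default call site
  spark.sparkContext.clearCallSite()  // restore the default behaviour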
That -1 is coming from here:
PythonRDD.writeIteratorToStream(inputIterator, dataOut)
  dataOut.writeInt(SpecialLengths.END_OF_DATA_SECTION)  // where END_OF_DATA_SECTION = -1
  dataOut.writeInt(SpecialLengths.END_OF_STREAM)
  dataOut.flush()
> On Jul 21, 2016, at 12:24 PM, Jacek Laskowski wrote:
>
>
Hi,
My limited understanding of the Python-Spark bridge is that at some point
the Python code communicates over the wire with Spark's backbone, which
includes PythonRDD [1].
When the CallSite can't be computed, it's null:-1 to denote "nothing
could be referred to".
[1]
https://github.com/apache/spark/
>
> It's called a CallSite that shows where the line comes from. You can see
> the code yourself given the python file and the line number.
>
But that's what I don't understand. Which python file? We spark submit one
file called ctr_parsing.py, but it only has 150 lines. So what is
MapPartitions a
On Thu, Jul 21, 2016 at 2:56 AM, C. Josephson wrote:
> I just started looking at the DAG for a Spark Streaming job, and had a
> couple of questions about it (image inline).
>
> 1.) What do the numbers in brackets mean, e.g. PythonRDD[805]?
>
Every RDD has its identifier (as id attribute) within
My stream app is running into problems; it seems to slow down over time. How
can I interpret the Storage Memory column? I wonder if I have a GC problem.
Any idea how I can get GC stats?
Thanks
Andy
Executors (3)
* Memory: 9.4 GB Used (1533.4 MB Total)
* Disk: 0.0 B Used
Executor ID | Address | RDD Bloc
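A hedged sketch of one way to surface GC stats, assuming the JVM GC-logging flags suggested in the Spark tuning guide are passed through the executor Java options (the Executors tab also reports a "GC Time" column alongside the memory figures above):

  import org.apache.spark.SparkConf

  // Print GC activity into each executor's stderr log (viewable from the Executors tab)
  val conf = new SparkConf()
    .set("spark.executor.extraJavaOptions",
         "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")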
,the UI never showed any failed stage or job. It appeared as if
> the job finished without error, which is not correct.
>
> We are trying to define our monitoring for our scheduled jobs, and we
> intended to use the Spark UI to catch issues. Can we explain why the UI
> would not repor
ot correct.
>
> We are trying to define our monitoring for our scheduled jobs, and we
> intended to use the Spark UI to catch issues. Can we explain why the UI
> would not report an exception like this? Is there a better approach we
> should use for tracking failures in a Spark
at was in Cassandra), which threw an exception that logged
> out to our spark-submit log.
> However ,the UI never showed any failed stage or job. It appeared as if
> the job finished without error, which is not correct.
>
> We are trying to define our monitoring for our scheduled jobs,
not correct.
We are trying to define our monitoring for our scheduled jobs, and we
intended to use the Spark UI to catch issues. Can we explain why the UI
would not report an exception like this? Is there a better approach we
should use for tracking failures in a Spark job?
We are currently on
ptimize it. That way we are focused on our application
logic rather than what the framework is doing underneath.
About the solution: doesn't the Spark driver (spark context + event listener) have
knowledge of every job, taskset, task and their current state? The Spark UI can
relate a job to a stage to a task, then w