But now I have another question: how can I determine which data node the Spark
task is writing to? It's really important for digging into the problem.
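A minimal sketch of one way to check, assuming the output lands on HDFS (the
namenode URI and output path below are placeholders, not from this thread):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
// connect to the (hypothetical) namenode
val fs = FileSystem.get(new java.net.URI("hdfs://namenode-host:8020"), conf)
val status = fs.getFileStatus(new Path("/user/spark/output/part-00000"))
// each HDFS block reports the data nodes that hold a replica of it
fs.getFileBlockLocations(status, 0, status.getLen)
  .foreach(block => println(block.getHosts.mkString(", ")))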
Regards,
Junfeng Chen
On Thu, Mar 14, 2019 at 2:26 PM Shyam P wrote:
> cool.
>
> On Tue, Mar 12, 2019 at 9:08 AM JF Chen wrote:
>
>> Hi
>> Finally I fo
Hello Paolo,
Generally speaking, query planning is mostly based on statistics and
distributions of data values for the involved columns, which might change
significantly over time in a streaming context. So for me it makes a lot of
sense that it is run at every schedule, even though I understand yo
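Not from the original reply, but a rough sketch of how one could watch that
per-trigger planning cost, using the rate source and console sink as stand-ins:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("planning-duration-check").getOrCreate()
// rate source as a stand-in streaming input; console sink just to keep the query running
val stream = spark.readStream.format("rate").option("rowsPerSecond", "10").load()
val query = stream.writeStream.format("console").start()
Thread.sleep(15000) // let a few micro-batches complete
// durationMs reports per-trigger timings in milliseconds, including the queryPlanning phase
println(s"queryPlanning: ${query.lastProgress.durationMs.get("queryPlanning")} ms")
query.stop()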
I am running a grep application on Spark 2.3.4 with Scala 2.11. I have an input
text file of 813 MB stored on a remote HDFS source (not part of the Spark
infrastructure). My application just reads the text file line by line from the
HDFS server and filters for a given keyword in each line and
o
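A minimal sketch of such a job, assuming a hypothetical namenode host, paths,
and keyword (none of these names come from the thread):

import org.apache.spark.sql.SparkSession

object GrepApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("grep").getOrCreate()
    // read the remote text file line by line (placeholder namenode host and path)
    val lines = spark.sparkContext.textFile("hdfs://remote-host:8020/data/input.txt")
    // keep only the lines containing the keyword (placeholder keyword)
    val matches = lines.filter(_.contains("ERROR"))
    matches.saveAsTextFile("hdfs://remote-host:8020/data/grep-output")
    spark.stop()
  }
}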
It is possible that the Application Master is not getting started. Try
increasing the memory limit of the application master in yarn-site.xml or in
capacity-scheduler if you have it configured.
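As a rough illustration of the kind of settings involved (the property names
are the standard YARN/CapacityScheduler ones; the values are placeholder
assumptions, not recommendations from this thread):

<!-- yarn-site.xml: raise the per-container ceiling so the AM request can be satisfied -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value>
</property>

<!-- capacity-scheduler.xml: fraction of cluster resources application masters may occupy -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>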
--
Hello everyone,
I have set up a 3-node Hadoop cluster according to this tutorial:
https://linode.com/docs/databases/hadoop/how-to-install-and-set-up-hadoop-cluster/#run-yarn
and I ran the YARN example (the one with the books) that is described in this
tutorial, in order to test whether everything w
It doesn't work (except if you're extremely lucky), it will eat your
lunch and will also kick your dog.
And it's not even going to be an option in the next version of Spark.
On Wed, Mar 13, 2019 at 11:38 PM Ido Friedman wrote:
>
> Hi,
>
> I am researching the use of multiple sparkcontext in one
Hi All,
I would like to understand why, in a streaming query (which should not be able
to change its behaviour across iterations), there is a queryPlanning-Duration
effort (in my case it is 33% of the trigger interval) at every schedule. I don’t
understand why this is needed and if it is possible to d
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.lag

// window over each (name, sit) group, ordered by date, so lag() can look one row back
val partitionBy = Window.partitionBy("name", "sit").orderBy("data_date")
val newDf = df.withColumn("PreviousDate", lag("uniq_im", 1).over(partitionBy))
Cheers...
On Thu, Mar 14, 2019 at 4:55 AM anbu wrote:
> Hi,
>
> To calculate LAG functions dif