Hi,
I have a question about passing a dictionary from the driver to the executors
in Spark on YARN. The dictionary is needed inside a UDF. I am using PySpark.
As I understand it, this can be done in two ways (a quick sketch of both follows below):
1. Broadcast the variable and then use it in the UDF
2. Pass the dictionary to the UDF itself
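A minimal sketch of the two options in PySpark (not from the thread; the data, column
and variable names are made up for illustration):

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

lookup = {1: "one", 2: "two", 3: "three"}          # dictionary built on the driver
df = spark.createDataFrame([(1,), (2,), (4,)], ["id"])

# 1. Broadcast the dictionary once and reference broadcast.value inside the UDF;
#    each executor fetches and caches the value a single time.
bc = spark.sparkContext.broadcast(lookup)
lookup_via_broadcast = udf(lambda k: bc.value.get(k, "unknown"), StringType())

# 2. Close over the plain dictionary; it is pickled into the UDF's closure and
#    shipped along with the serialized function to the executors.
lookup_via_closure = udf(lambda k: lookup.get(k, "unknown"), StringType())

df.withColumn("via_broadcast", lookup_via_broadcast("id")) \
  .withColumn("via_closure", lookup_via_closure("id")) \
  .show()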
Hello,
Why does Spark usually hit an off-heap OOM during shuffle read? I have read some
of the source code: when a ResultTask reads shuffle data from a non-local executor,
it has a buffer and can spill to disk, so why does it still run out of off-heap memory?
jib...@qq.com
Depending on the Alluxio version you are running, e.g. for 2.0, the
metrics for local short-circuit reads are not turned on by default.
So I would suggest you first turn on collection of the local
short-circuit read metrics by setting
alluxio.user.metrics.collection.enabled=true
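For reference, a minimal sketch of where that property could go on the client side
(assuming the standard conf/alluxio-site.properties file is used):

# conf/alluxio-site.properties (client side)
alluxio.user.metrics.collection.enabled=true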
Regarding the g
Hi Mark,
You can follow the instructions here:
https://docs.alluxio.io/os/user/stable/en/compute/Spark.html#customize-alluxio-user-properties-for-individual-spark-jobs
Something like this:
$ spark-submit \
    --conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH'
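Executors typically need the property as well; a sketch, assuming the same write type
is wanted on the executor side (via the standard spark.executor.extraJavaOptions setting):

$ spark-submit \
    --conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH' \
    --conf 'spark.executor.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH' \
    ...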
Hello,
I have 2 parquets (each containing 1 file):
- parquet-wide - schema has 25 top level cols + 1 array
- parquet-narrow - schema has 3 top level cols
Both files have the same data for the shared columns.
When I read from parquet-wide, Spark reports read 52.6 KB; from
parquet-narrow only 2.6 K
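Not from the thread, but a minimal PySpark sketch of the comparison being described
(the paths and column names are hypothetical); the bytes read per scan can then be
compared in the Spark UI input metrics:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical paths standing in for the two files described above
wide = spark.read.parquet("/data/parquet-wide")
narrow = spark.read.parquet("/data/parquet-narrow")

# Read the same three top-level columns from both and force a full scan
cols = ["a", "b", "c"]   # hypothetical column names
wide.select(*cols).count()
narrow.select(*cols).count()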
I am also interested. Many of the docs/books that I've seen are
practical, usage-focused examples rather than the deep internals of Spark.
On Wed, 18 Sep 2019 21:12:12 -1100 vipul.s.p...@gmail.com wrote
Yes,
I realize what you were looking for, I am also looking for the same docs.
Haven
Hi,
Consider the following statements:
1)
> scala> val df = spark.read.format("com.shubham.MyDataSource").load
> scala> df.show
> +---+---+
> | i| j|
> +---+---+
> | 0| 0|
> | 1| -1|
> | 2| -2|
> | 3| -3|
> | 4| -4|
> +---+---+
2)
> scala> val df1 = df.filter("i < 3")
> scala> df1.show
Hi,
How can I create an initial state by hand so that the Structured Streaming
file source only reads data which is semantically greater (i.e. whose file path
compares lexicographically greater) than the minimum committed initial state?
Details here:
https://stackoverflow.com/questions/58004832/spark-structured-s
Yes,
I realize what you were looking for, I am also looking for the same docs.
Haven't found them yet. Also, Jacek Laskowski's gitbooks are the next best
thing to follow, if you haven't read them yet.
Regards
On Thu, Sep 19, 2019 at 12:46 PM wrote:
> Thanks Vipul,
>
>
>
> I was looking specifically for do
Thanks Vipul,
I was looking specifically for the documents Spark committers use for reference.
Currently I've put custom logs into the spark-core sources, then I build and run
jobs on it. From the printed logs I try to understand the execution flows.
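Not from the thread, but as a lighter-weight complement to rebuilding spark-core,
the existing logging can be made much more verbose for the packages of interest.
A sketch, assuming the usual conf/log4j.properties and example package names:

# conf/log4j.properties (sketch)
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# follow the scheduling and shuffle code paths in detail
log4j.logger.org.apache.spark.scheduler=DEBUG
log4j.logger.org.apache.spark.shuffle=DEBUG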
From: Vipul Rajan
Sent: Thursday, September 19, 2019 12:23