Hi Kohki,
Serialization of tasks happens in local mode too and as far as I am
aware there is no way to disable this (although it would definitely be
useful in my opinion).
You can see local mode as a testing mode, in which you would want to
catch any serialization errors before they appear in production.
val schemas = createSchemas(config)
val arr = new Array[String](schemas.size())   // driver-side array captured by the closure below
lines.map(x => {
  val obj = JSON.parseObject(x)
  val vs = new Array[Any](schemas.size())
  for (i <- 0 until schemas.size()) {
    arr(i) = schemas.get(i).name
    vs(i) = obj.getString(schemas.get(i).name)   // read from the parsed JSON object, not the raw string
  }
  vs
})
I'm seeing many threads deserializing tasks. I understand that since a
lambda is involved we can't use Kryo for this, but I'm running in local
mode, so this serialization is not really necessary, is it?
Is there any trick I can apply to get rid of this thread contention? I'm
seei
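For reference, a minimal sketch of one way to shrink what the lambda captures
(assuming lines is an RDD[String] and the same fastjson-style JSON class as in
the snippet above; the names here are illustrative, not from the original code):

import com.alibaba.fastjson.JSON   // assumption: the JSON used above is fastjson

// Capture only a small Array[String] of field names instead of mutating a
// driver-side array inside the lambda; mapPartitions keeps per-record work
// on the executors without shipping anything heavy in the closure.
val schemas = createSchemas(config)
val names: Array[String] = Array.tabulate(schemas.size())(i => schemas.get(i).name)

val parsed = lines.mapPartitions { iter =>
  iter.map { line =>
    val obj = JSON.parseObject(line)
    names.map(n => obj.getString(n))   // one array of field values per record
  }
}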
Hi,
The file stream sink maintains metadata in the output directory. The
metadata retains the list of files written by the streaming query, and
Spark reads that metadata when listing the files to read.
This is to guarantee end-to-end exactly-once semantics when writing files
in a streaming query. There co
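For illustration, a minimal sketch of the behaviour described (paths, names
and the static append are assumptions, not taken from the thread):

// The file sink keeps its log under <output>/_spark_metadata. A later batch
// read of the same path lists files from that log, so files appended to the
// directory outside the streaming query are not picked up.
val query = parsedStream.writeStream          // parsedStream: an assumed streaming DataFrame
  .format("parquet")
  .option("checkpointLocation", "/tmp/chk")
  .start("/tmp/out")                          // creates /tmp/out/_spark_metadata

staticDf.write.mode("append").parquet("/tmp/out")   // written, but not recorded in the log

spark.read.parquet("/tmp/out").count()        // counts only files listed in the metadata log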
Unsubscribe
Well, I have tried almost everything over the last 2 days now.
There is no user "spark", and whatever I do with the executor image, it only
runs for 2 minutes in k8s and then restarts.
The problem seems to be the "nogroup" group that is writing files from the executors.
drwxr-xr-x 2 185 nogroup 4096 Sep
Hi Aurélien,
Spark has endpoints that expose Spark application metrics. These endpoints
can be used as a REST API. You can read more about them here:
https://spark.apache.org/docs/3.1.1/monitoring.html#rest-api
Additionally, if you want to build your own custom metrics, you can explore
spark
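As a rough illustration (host, port and endpoint are assumptions based on the
defaults in the linked docs), the REST API of a running application can be
polled like this:

import scala.io.Source

// For a live application the monitoring REST API is served by the driver UI
// (default port 4040); /stages returns per-stage metrics as JSON.
val appId = spark.sparkContext.applicationId
val stagesJson = Source.fromURL(s"http://localhost:4040/api/v1/applications/$appId/stages").mkString
println(stagesJson)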
All,
This is very surprising, and I am sure I must be doing something wrong. The
issue is that the following code is taking 8 hours. It reads a CSV file, takes
the phone-number column, extracts the first four digits, then partitions
based on those four digits (phoneseries) and writes to Parquet.
Any
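The code itself was cut off in this excerpt; as a point of reference, a
minimal sketch of the kind of job described (column and path names are
assumptions, not from the original message) looks like this:

import org.apache.spark.sql.functions.{col, substring}

// Read the CSV, derive the first four digits of the phone number, and write
// Parquet partitioned by that prefix. Names are illustrative only.
val df = spark.read
  .option("header", "true")
  .csv("/data/phones.csv")
  .withColumn("phoneseries", substring(col("phone_number"), 1, 4))

df.write
  .partitionBy("phoneseries")
  .parquet("/data/phones_parquet")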
Hi community,
I would like to collect information about the execution of a Spark job
while it is running. Could I define some kind of application metrics (such
as a counter that would be incremented in my code) that I could retrieve
regularly while the job is running?
Thank you for your help,
Aurelie
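One common way to get such a counter (a sketch with assumed names, not
necessarily what the thread went on to recommend) is a named LongAccumulator:
tasks increment it on the executors, the driver can read its value, and named
accumulators are also visible in the web UI and the REST API while the job runs.

// Sketch: a named accumulator used as a job-level counter.
val processed = spark.sparkContext.longAccumulator("records-processed")

someRdd.foreach { _ =>          // someRdd is an assumed RDD in this sketch
  processed.add(1)              // incremented inside tasks on the executors
}

println(processed.value)        // read back on the driver; also shown in the UI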
Hi all,
I recently stumbled upon a rather strange problem with streaming
sources in one of my tests. I am writing a Parquet file from a
streaming source and subsequently try to append the same data, but this
time from a static dataframe. Surprisingly, the number of rows in the
Parquet file remai
On 2021/09/02 06:00:26, Harsh Sharma wrote:
> Please find the replies below:
>
> > Do you know when in your application lifecycle it happens? Spark SQL or
> > Structured Streaming?
>
> ans: it's Spark SQL
>
> > Do you use broadcast variables?
>
> ans: yes, we are using broadcast variables
> or are the er