Hi Kohki,
Serialization of tasks happens in local mode too and as far as I am
aware there is no way to disable this (although it would definitely be
useful in my opinion).
You can see local mode as a testing mode, in which you would want to
catch any serialization errors before they appear in production.
val schemas = createSchemas(config)
val arr = new Array[String](schemas.size())   // driver-side array captured by the closure below
lines.map(x => {
  val obj = JSON.parseObject(x)
  val vs = new Array[Any](schemas.size())
  for (i <- 0 until schemas.size()) {
    arr(i) = schemas.get(i).name
    vs(i) = obj.getString(schemas.get(i).name)   // read from the parsed JSON object, not the raw string
  }
  vs
})
I'm seeing many threads deserializing tasks. I understand that since a
lambda is involved we can't use Kryo for this, but I'm running in local
mode, so this serialization is not really necessary, is it?
Is there any trick I can apply to get rid of this thread contention? I'm
seei
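For reference, a minimal sketch of one way to shrink what the lambda captures
(assuming lines is an RDD[String] and the same fastjson-style JSON class as in
the snippet above; the names here are illustrative, not from the original code):

import com.alibaba.fastjson.JSON   // assumption: the JSON used above is fastjson

// Capture only a small Array[String] of field names instead of mutating a
// driver-side array inside the lambda; mapPartitions keeps per-record work
// on the executors without shipping anything heavy in the closure.
val schemas = createSchemas(config)
val names: Array[String] = Array.tabulate(schemas.size())(i => schemas.get(i).name)

val parsed = lines.mapPartitions { iter =>
  iter.map { line =>
    val obj = JSON.parseObject(line)
    names.map(n => obj.getString(n))   // one array of field values per record
  }
}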
Hi,
The file stream sink maintains metadata in the output directory. The
metadata retains the list of files written by the streaming query, and
Spark reads that metadata when listing the files to read.
This is to guarantee end-to-end exactly-once semantics when writing files
in a streaming query. There co
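For illustration, a minimal sketch of the behaviour described (paths, names
and the static append are assumptions, not taken from the thread):

// The file sink keeps its log under <output>/_spark_metadata. A later batch
// read of the same path lists files from that log, so files appended to the
// directory outside the streaming query are not picked up.
val query = parsedStream.writeStream          // parsedStream: an assumed streaming DataFrame
  .format("parquet")
  .option("checkpointLocation", "/tmp/chk")
  .start("/tmp/out")                          // creates /tmp/out/_spark_metadata

staticDf.write.mode("append").parquet("/tmp/out")   // written, but not recorded in the log

spark.read.parquet("/tmp/out").count()        // counts only files listed in the metadata log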
Unsubscribe
Well, I have tried almost everything over the last 2 days now.
There is no user "spark", and whatever I do with the executor image, it only
runs for 2 minutes in k8s and then restarts.
The problem seems to be the "nogroup" group that is writing files from the executors.
drwxr-xr-x 2 185 nogroup 4096 Sep
Hi Aurélien,
Spark has endpoints that expose Spark application metrics. These endpoints
can be used as a REST API. You can read more about them here:
https://spark.apache.org/docs/3.1.1/monitoring.html#rest-api
Additionally, if you want to build your own custom metrics, you can explore
spark
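As a rough illustration (host, port and endpoint are assumptions based on the
defaults in the linked docs), the REST API of a running application can be
polled like this:

import scala.io.Source

// For a live application the monitoring REST API is served by the driver UI
// (default port 4040); /stages returns per-stage metrics as JSON.
val appId = spark.sparkContext.applicationId
val stagesJson = Source.fromURL(s"http://localhost:4040/api/v1/applications/$appId/stages").mkString
println(stagesJson)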
All,
This is very surprising, and I am sure I must be doing something wrong. The
issue is that the following code is taking 8 hours. It reads a CSV file, takes
the phone-number column, extracts the first four digits, then partitions
based on those four digits (phoneseries) and writes to Parquet.
Any
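The code itself was cut off in this excerpt; as a point of reference, a
minimal sketch of the kind of job described (column and path names are
assumptions, not from the original message) looks like this:

import org.apache.spark.sql.functions.{col, substring}

// Read the CSV, derive the first four digits of the phone number, and write
// Parquet partitioned by that prefix. Names are illustrative only.
val df = spark.read
  .option("header", "true")
  .csv("/data/phones.csv")
  .withColumn("phoneseries", substring(col("phone_number"), 1, 4))

df.write
  .partitionBy("phoneseries")
  .parquet("/data/phones_parquet")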
Hi community,
I would like to collect information about the execution of a Spark job
while it is running. Could I define some kind of application metrics (such
as a counter that would be incremented in my code) that I could retrieve
regularly while the job is running?
Thank you for your help,
Aurelie
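One common way to get such a counter (a sketch with assumed names, not
necessarily what the thread went on to recommend) is a named LongAccumulator:
tasks increment it on the executors, the driver can read its value, and named
accumulators are also visible in the web UI and the REST API while the job runs.

// Sketch: a named accumulator used as a job-level counter.
val processed = spark.sparkContext.longAccumulator("records-processed")

someRdd.foreach { _ =>          // someRdd is an assumed RDD in this sketch
  processed.add(1)              // incremented inside tasks on the executors
}

println(processed.value)        // read back on the driver; also shown in the UI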
Hi all,
I recently stumbled upon a rather strange problem with streaming
sources in one of my tests. I am writing a Parquet file from a
streaming source and subsequently try to append the same data, but this
time from a static dataframe. Surprisingly, the number of rows in the
Parquet file remai
On 2021/09/02 06:00:26, Harsh Sharma wrote:
> Please find the replies below:
>
> > Do you know when in your application lifecycle it happens? Spark SQL or
> > Structured Streaming?
>
> ans: it's Spark SQL
>
> > Do you use broadcast variables?
>
> ans: yes, we are using broadcast variables
> or are the er