How to use pattern matching in Spark

2022-07-12 Thread Sid
Hi Team, I have a dataset like the one below in a .dat file: 13/07/2022abc PWJ PWJABC 513213217ABC GM20 05. 6/20/39 #01000count I want to extract the header and tail records, which I was able to do. Now, from the header, I need to extract the date and match it with the current system date
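
A minimal sketch of one way to do the date check, assuming the header is the first line of the .dat file, the date sits at the start of the header in dd/MM/yyyy format, and the path and pattern names below are placeholders:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Read the raw file as text; the header is assumed to be the first record.
val lines  = spark.read.textFile("/path/to/file.dat")   // placeholder path
val header = lines.first()

// Assumed layout: the header starts with a dd/MM/yyyy date (e.g. 13/07/2022).
val datePattern = """^(\d{2}/\d{2}/\d{4})""".r
val headerDate = datePattern.findFirstMatchIn(header).map { m =>
  LocalDate.parse(m.group(1), DateTimeFormatter.ofPattern("dd/MM/yyyy"))
}

// Compare the extracted header date with the current system date.
val matchesToday = headerDate.contains(LocalDate.now())
```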

Re: How reading works?

2022-07-12 Thread Sid
Yeah, I understood that now. Thanks for the explanation, Bjørn. Sid On Wed, Jul 6, 2022 at 1:46 AM Bjørn Jørgensen wrote: > Ehh.. What is "*duplicate column*"? I don't think Spark supports that. > > duplicate column = duplicate rows > > > Tue, 5 Jul 2022 at 22:13, Bjørn Jørgensen wrote:

Re: reading each JSON file from dataframe...

2022-07-12 Thread Muthu Jayakumar
Hello Ayan, Thank you for the suggestion. But I would lose the correlation of the JSON file with the other identifier fields. Also, if there are too many files, will it be an issue? Plus, I may not have the same schema across all the files. Hello Enrico, >how does RDD's mapPartitions make a difference

Spark streaming pending microbatches queue max length

2022-07-12 Thread Anil Dasari
Hello, Spark adds an entry to the pending microbatches queue at each batch interval. Is there a config to set the max size of the pending microbatches queue? Thanks

Re: reading each JSON file from dataframe...

2022-07-12 Thread ayan guha
Another option is: 1. collect the dataframe with the file paths 2. create a list of paths 3. create a new dataframe with spark.read.json and pass the list of paths This will save you lots of headaches Ayan On Wed, Jul 13, 2022 at 7:35 AM Enrico Minack wrote: > Hi, > > how does RDD's mapPartitions m
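
A rough sketch of that suggestion, assuming the paths live in a String column named "path" of a dataframe `df` (both names are illustrative); spark.read.json accepts multiple paths:

```scala
import spark.implicits._

// 1. + 2. Collect the file paths from the dataframe into a local list
//    (fine as long as the number of paths is manageable on the driver).
val paths: Seq[String] = df.select("path").as[String].collect().toSeq

// 3. Read all the JSON files into a single dataframe in one pass.
val jsonDf = spark.read.json(paths: _*)
```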

Re: reading each JSON file from dataframe...

2022-07-12 Thread Enrico Minack
Hi, how does RDD's mapPartitions make a difference regarding 1. and 2. compared to Dataset's mapPartitions / map function? Enrico On 12.07.22 at 22:13, Muthu Jayakumar wrote: Hello Enrico, Thanks for the reply. I found that I would have to use the `mapPartitions` API of RDD to perform this safely

Re: reading each JSON file from dataframe...

2022-07-12 Thread Muthu Jayakumar
Hello Enrico, Thanks for the reply. I found that I would have to use the `mapPartitions` API of RDD to perform this safely, as I have to 1. Read each file from GCS using the HDFS FileSystem API. 2. Parse each JSON record in a safe manner. For (1) to work, I do have to broadcast the HadoopConfiguration from the SparkContext
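
Not knowing the exact setup, here is a minimal sketch of that shape. It assumes the file paths sit in a String column named "path", and it broadcasts the Hadoop settings as a plain Map (rebuilding the Configuration per partition) rather than broadcasting the Configuration object itself; all names are illustrative:

```scala
import scala.collection.JavaConverters._
import scala.io.Source
import scala.util.Try
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import spark.implicits._

// Broadcast the driver's Hadoop settings as serializable key/value pairs.
val confMap = spark.sparkContext.hadoopConfiguration.iterator().asScala
  .map(e => e.getKey -> e.getValue).toMap
val confBc = spark.sparkContext.broadcast(confMap)

val contents = df.select("path").as[String].rdd.mapPartitions { paths =>
  // 1. Rebuild the Hadoop Configuration once per partition.
  val conf = new Configuration(false)
  confBc.value.foreach { case (k, v) => conf.set(k, v) }

  paths.map { p =>
    val path = new Path(p)
    val fs   = path.getFileSystem(conf)
    // 2. Read each file defensively; failures become None instead of failing the task.
    val body: Option[String] = Try {
      val in = fs.open(path)
      try Source.fromInputStream(in, "UTF-8").mkString finally in.close()
    }.toOption
    (p, body)
  }
}
```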

[Spark][Core] Resource Allocation

2022-07-12 Thread Amin Borjian
I have a few questions where I am trying to find out whether there is no solution (due to the current implementation) or whether there is a way I was simply not aware of. 1) Currently, we can enable and configure dynamic resource allocation based on the documentation below. https://spark.apache.org/docs/late
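
For context, dynamic allocation is usually switched on with a handful of settings; a minimal sketch with arbitrary values (shuffle tracking assumes Spark 3.0+, otherwise an external shuffle service is needed):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative values only; tune the executor bounds for the actual cluster.
val spark = SparkSession.builder()
  .appName("dynamic-allocation-example")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "1")
  .config("spark.dynamicAllocation.maxExecutors", "20")
  .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
  // Spark 3.0+: allows dynamic allocation without an external shuffle service.
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  .getOrCreate()
```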