See this ticket https://issues.apache.org/jira/browse/HADOOP-17201. It may
help your team.
From: Johnny Burns
Sent: Tuesday, June 22, 2021 3:41 PM
To: user@spark.apache.org
Cc: data-orchestration-team
Subject: Performance Problems Migrating to S3A Committers
Hi,
Doesn't a persist break stages?
On Thu, Aug 5, 2021, 11:40 AM Tom Graves wrote:
As Sean mentioned, it's only available at the Stage level, but you said you
don't want to shuffle, so splitting into stages doesn't help you. Without more
details, it seems like you could "hack" this by just requesting an executor
with 1 GPU (allowing 2 tasks per GPU) and 2 CPUs, and the one task would
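That request would look roughly like this at session creation (an untested
sketch using the Spark 3.x resource scheduling configs; the app name and
discovery script path are placeholders, and on YARN/standalone these have to
be set at launch time):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SharedGpuExecutors")  // placeholder app name
  .config("spark.executor.cores", "2")                // 2 CPUs per executor
  .config("spark.executor.resource.gpu.amount", "1")  // 1 GPU per executor
  .config("spark.task.resource.gpu.amount", "0.5")    // 2 tasks share each GPU
  .config("spark.executor.resource.gpu.discoveryScript",
    "/path/to/getGpusResources.sh")                   // placeholder path
  .getOrCreate()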
I am not sure why you need to create an RDD first. You can create a
DataFrame directly from the CSV file, for instance:
spark.read.format("csv").option("header","true").schema(yourSchema).load(ftpUrl)
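If you don't have yourSchema defined yet, a minimal sketch (the field names
here are placeholders for your actual columns):

import org.apache.spark.sql.types._

// Placeholder schema: replace the fields with your real columns.
val yourSchema = StructType(Seq(
  StructField("id", StringType, nullable = true),
  StructField("name", StringType, nullable = true)
))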
-- ND
On 8/5/21 3:14 AM, igyu wrote:
val ftpUrl ="ftp://test:test@ip:21/upload/test/_temporary/
Hi,
we are trying to migrate some of the data lake pipelines to run on Spark
3.x, whereas the dependent pipelines using those tables will still be
running on Spark 2.4.x for some time to come.
Does anyone know of any issues that can happen:
1. when reading Parquet files written by Spark 3.1.x in Spark 2.4.x?
Maybe this link will help you.
https://stackoverflow.com/questions/41898144/convert-rddstring-to-rddrow-to-dataframe-spark-scala
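The pattern there boils down to something like this (a sketch reusing the
rdd and schemas from your snippet quoted below; it assumes each record
splits into exactly as many fields as the schema declares):

import org.apache.spark.sql.Row

// Turn each comma-separated record into a Row, then apply the schema.
val rowRdd = rdd.map(_._2).map(csv => Row.fromSeq(csv.split(",").toSeq))
val df = spark.createDataFrame(rowRdd, schemas)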
On Thu, Aug 5, 2021 at 12:46 PM igyu wrote:
> val ftpUrl =
> "ftp://test:test@ip:21/upload/test/_temporary/0/_temporary/task_2019124756_0002_m_00_0/*";
> val
import org.apache.spark.sql.types._

val ftpUrl =
  "ftp://test:test@ip:21/upload/test/_temporary/0/_temporary/task_2019124756_0002_m_00_0/*"
// Read each matched file as a (path, content) pair.
val rdd = spark.sparkContext.wholeTextFiles(ftpUrl)
// Keep the content and split it on commas.
val value = rdd.map(_._2).map(csv => csv.split(",").toSeq)
val schemas = StructType(List(
  StructField("id", DataTypes.StringType, true)
))