My problem is related to the need to have all records in a specific column
quoted when writing a CSV. I assumed that by setting the option escapeQuotes
to false, fields would not have any type of quoting applied, even when the
delimiter exists. Unless I am misunderstanding
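For reference, a minimal sketch of the CSV writer options in play here, assuming an existing DataFrame df; the output path is a placeholder, and the exact interplay of these options is what the question is about:

// quoteAll forces quotes around every field; escapeQuotes controls whether
// quote characters appearing inside values get escaped.
df.write
  .option("quoteAll", "true")
  .option("escapeQuotes", "false")
  .option("quote", "\"")
  .csv("/tmp/quoted_output")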
Possibly instead of doing the initial grouping, just do a full outer join on
zyzy. This is in Scala but should be easily convertible to Python.
val data = Array(("john", "red"), ("john", "blue"), ("john", "red"), ("bill", "blue"), ("bill", "red"), ("sam", "green"))
val distData: DataFrame
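A hedged sketch of how that full outer join could look; the column names ("name", "color"), the per-colour counts, and the join key standing in for zyzy are all assumptions, since the original question is not quoted here:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("full-outer-join-sketch").getOrCreate()
import spark.implicits._

val df = Seq(("john", "red"), ("john", "blue"), ("john", "red"),
             ("bill", "blue"), ("bill", "red"), ("sam", "green"))
  .toDF("name", "color")

// One aggregate per colour, then a full outer join on the key
// ("name" stands in for the zyzy column mentioned above).
val reds = df.filter($"color" === "red").groupBy("name").count().withColumnRenamed("count", "red_count")
val blues = df.filter($"color" === "blue").groupBy("name").count().withColumnRenamed("count", "blue_count")
val joined = reds.join(blues, Seq("name"), "full_outer").na.fill(0)
joined.show()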
Maybe something like
var finalDF = spark.sqlContext.emptyDataFrame
for (df <- dfs) {
  finalDF = finalDF.union(df)
}
Where dfs is a Seq of dataframes.
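As a side note, union requires both sides to have the same number of columns, so unioning onto an empty, zero-column DataFrame can fail; a hedged alternative sketch, assuming dfs is non-empty and all frames share one schema:

import org.apache.spark.sql.DataFrame

// Fold the frames together without starting from an empty DataFrame.
val finalDF: DataFrame = dfs.reduce(_ union _)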
From: Cesar
Date: Thursday, April 5, 2018 at 2:17 PM
To: user
Subject: Union of multiple data frames
The following code
Hi,
Can someone confirm whether ordering matters between the schema and underlying
JSON string?
Thanks,
Brandon
If you know your json schema you can create a struct and then apply that using
from_json:
val json_schema = StructType(Array(StructField("x", StringType, true),
  StructField("y", StringType, true), StructField("z", IntegerType, true)))
.withColumn("_c3", from_json(col("_c3_signals"), json_schema))
CSV as well. As per your solution, I am creating a StructType only for the
JSON field. So how am I going to mix and match here, i.e. do type inference
for all fields except the JSON field and use the custom json_schema for the
JSON field?
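For illustration, a hedged sketch of that mix and match: let the CSV reader infer types for the plain columns and apply the custom json_schema only to the JSON column (the file path and the _c3/_c3_signals column names are placeholders carried over from the snippet above):

import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

val json_schema = StructType(Array(
  StructField("x", StringType, true),
  StructField("y", StringType, true),
  StructField("z", IntegerType, true)))

// Infer every CSV column's type, then parse only the JSON column explicitly.
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/path/to/input.csv")
  .withColumn("_c3", from_json(col("_c3_signals"), json_schema))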
On Thu, Aug 30, 2018 at 5:29 PM Brandon Geise wrote:
If you
How about
select unix_timestamp(timestamp2) - unix_timestamp(timestamp1)?
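For reference, a minimal sketch of that suggestion; unix_timestamp yields epoch seconds, so the subtraction gives the difference in whole seconds (the table and column names are placeholders):

// Difference in seconds between two timestamp columns.
val diff = spark.sql(
  """SELECT unix_timestamp(timestamp2) - unix_timestamp(timestamp1) AS diff_seconds
    |FROM events""".stripMargin)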
From: Paras Agarwal
Date: Monday, October 15, 2018 at 2:41 AM
To: John Zhuge
Cc: user , dev
Subject: Re: Timestamp Difference/operations
Thanks John,
Actually need full date and time difference not just d
I recently came across this (haven’t tried it out yet) but maybe it can help
guide you to identify the root cause.
https://github.com/groupon/sparklint
From: Vitaliy Pisarev
Date: Thursday, November 15, 2018 at 10:08 AM
To: user
Cc: David Markovitz
Subject: How to address seemingly low
Use .limit on the dataframe followed by .write
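For instance, a minimal sketch of limiting before writing, assuming an existing DataFrame df; the format, mode, and path are placeholders:

// Keep only the first 100 rows, then write them out.
df.limit(100)
  .write
  .mode("overwrite")
  .csv("/tmp/sample_output")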
On Apr 14, 2019, at 5:10 AM, Chetan Khatri wrote:
>Nuthan,
>
>Thank you for the reply. The solution proposed will give everything; for me
>it is like one DataFrame show(100) in 3000 lines of Scala Spark code.
>However, yarn logs --applicationId
You could use the Hadoop API and check if the file exists.
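For example, a hedged sketch of such a check via the Hadoop FileSystem API; the path is a placeholder:

import org.apache.hadoop.fs.{FileSystem, Path}

// Check whether the file exists before attempting to read it.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val exists = fs.exists(new Path("/data/input.csv"))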
From: Mich Talebzadeh
Date: Tuesday, May 5, 2020 at 11:25 AM
To: "user @spark"
Subject: Exception handling in Spark
Hi,
As I understand it, exception handling in Spark only makes sense if one attempts an
action as opposed to lazy
Date: Tuesday, May 5, 2020 at 12:45 PM
To: Brandon Geise
Cc: "user @spark"
Subject: Re: Exception handling in Spark
Thanks Brandon!
I should have remembered that.
Basically the code exits with sys.exit(1) if it cannot find the file.
I guess there is no easy way
This is what I had in mind. Can you give this approach a try?
val df = Try(spark.read.csv("")) match {
  case Success(df) => df
  case Failure(e) => throw new Exception("foo")
}
From: Mich Talebzadeh
Date: Tuesday, May 5, 2020 at 5:17 PM
To: To
import scala.util.Try
import scala.util.Success
import scala.util.Failure
From: Mich Talebzadeh
Date: Tuesday, May 5, 2020 at 6:11 PM
To: Brandon Geise
Cc: Todd Nist , "user @spark"
Subject: Re: Exception handling in Spark
This is what I get
scala> val df = Try(spar
Match needs to be lower case "match"
From: Mich Talebzadeh
Date: Tuesday, May 5, 2020 at 6:13 PM
To: Brandon Geise
Cc: Todd Nist , "user @spark"
Subject: Re: Exception handling in Spark
scala> import scala.util.{Try, Success, Failure}
import scala.util.{Try, Success, Failure}
Sure, just do case Failure(e) => throw e
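Putting the pieces from this thread together, a hedged, self-contained sketch of the approach; the path is a placeholder:

import scala.util.{Try, Success, Failure}

// Wrap the read in Try and rethrow on failure (or substitute other handling).
val df = Try(spark.read.csv("/data/input.csv")) match {
  case Success(d) => d
  case Failure(e) => throw e
}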
From: Mich Talebzadeh
Date: Tuesday, May 5, 2020 at 6:36 PM
To: Brandon Geise
Cc: Todd Nist , "user @spark"
Subject: Re: Exception handling in Spark
Hi Brandon.
In dealing with
df case Failure(e) => throw new Exception