Heads up: tomorrow's Friday review is going to be at 8:30 am instead of 9:30
am because I had to move some flights around.
On Fri, Jul 13, 2018 at 12:03 PM, Holden Karau wrote:
> This afternoon @ 3pm pacific I'll be looking at review tooling for Spark &
> Beam https://www.youtube.com/watch?v=ff8_j
Hi Patrick,
It looks like it's failing in Scala before it even gets to Python to
execute your UDF, which is why it doesn't seem to matter what's in your
UDF. Since you are doing a grouped map UDF, maybe your group sizes are too
big or skewed? Could you try to reduce the size of your groups by adding…
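For illustration only (this is not from the thread; the DataFrame, column names, and UDF below are made up), one common way to shrink oversized or skewed groups before a grouped map pandas UDF is to add a random salt column and group on it as well:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.appName("salted-grouped-map").getOrCreate()

# Toy, skewed-ish data: "key" is a hypothetical grouping column.
df = spark.range(0, 1000000).withColumn("key", F.col("id") % 3)

NUM_BUCKETS = 16  # tune so each (key, bucket) group fits in executor memory

@pandas_udf("key long, bucket int, n long", PandasUDFType.GROUPED_MAP)
def summarize(pdf):
    # Placeholder per-group logic; the real UDF body would go here.
    return pdf.groupby(["key", "bucket"]).size().reset_index(name="n")

# Grouping on (key, bucket) instead of key alone caps how many rows each
# UDF invocation has to hold in memory at once.
salted = df.withColumn("bucket", (F.rand() * NUM_BUCKETS).cast("int"))
result = salted.groupBy("key", "bucket").apply(summarize)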
One workaround is to rename the fid column for each df before joining.
On Thu, Jul 19, 2018 at 9:50 PM wrote:
> Spark 2.3.0 has this problem; upgrade it to 2.3.1.
>
> Sent from my iPhone
>
> On Jul 19, 2018, at 2:13 PM, Nirav Patel wrote:
>
> corrected subject line. It's a missing attribute error…
Spark 2.3.0 has this problem; upgrade it to 2.3.1.
Sent from my iPhone
> On Jul 19, 2018, at 2:13 PM, Nirav Patel wrote:
>
> corrected subject line. It's a missing attribute error, not an ambiguous reference
> error.
>
>> On Thu, Jul 19, 2018 at 2:11 PM, Nirav Patel wrote:
>> I am getting attribute
We have two big tables, each with 5 billion rows, so my question here
is: should we partition/sort the data and convert it to Parquet before doing
any join?
Best Regards ... Amin
Mohebbi, PhD candidate in Software Engineering at unive…
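One hedged sketch of what that pre-processing could look like (the table names, paths, bucket count, and join key below are placeholders, not the poster's actual schema): bucket and sort both tables by the join key when writing them as Parquet-backed tables, so the later join can use a sort-merge join without re-shuffling either 5-billion-row side.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pre-bucketed-join").getOrCreate()

big_a = spark.read.parquet("/data/raw/table_a")   # placeholder input paths
big_b = spark.read.parquet("/data/raw/table_b")

NUM_BUCKETS = 2048  # sized so each bucket is a manageable slice of 5B rows

(big_a.write
      .bucketBy(NUM_BUCKETS, "join_key")   # "join_key" is a placeholder column
      .sortBy("join_key")
      .format("parquet")
      .saveAsTable("table_a_bucketed"))

(big_b.write
      .bucketBy(NUM_BUCKETS, "join_key")
      .sortBy("join_key")
      .format("parquet")
      .saveAsTable("table_b_bucketed"))

# Joining the two bucketed tables lets Spark pick a sort-merge join without
# shuffling either side again.
joined = (spark.table("table_a_bucketed")
               .join(spark.table("table_b_bucketed"), "join_key"))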
Corrected subject line. It's a missing attribute error, not an ambiguous
reference error.
On Thu, Jul 19, 2018 at 2:11 PM, Nirav Patel wrote:
> I am getting a missing attribute error after joining dataframe 'df2' twice.
>
> Exception in thread "main" org.apache.spark.sql.AnalysisException:
> resolved a
I am getting a missing attribute error after joining dataframe 'df2' twice.
Exception in thread "main" org.apache.spark.sql.AnalysisException: resolved
attribute(s) fid#49 missing from
value#14,value#126,mgrId#15,name#16,d31#109,df2Id#125,df2Id#47,d4#130,d3#129,df1Id#13,name#128,
fId#127 in ope…
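A hedged sketch of the rename workaround mentioned earlier in the thread (the schemas below are simplified stand-ins, not the poster's real df1/df2): give df2's fid column a distinct name for each of the two joins so the analyzer doesn't trip over the duplicated attribute.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rename-before-join").getOrCreate()

# Toy stand-ins for the real tables.
df1 = spark.createDataFrame([(1, 10, 20)], ["df1Id", "mgrId", "value"])
df2 = spark.createDataFrame([(10, 100), (20, 200)], ["fid", "d3"])

# Rename fid to a distinct name for each use of df2; if df2 has other
# columns that would collide (e.g. d3), rename those as well.
df2_mgr = df2.withColumnRenamed("fid", "mgr_fid")
df2_val = df2.withColumnRenamed("fid", "val_fid")

result = (df1
          .join(df2_mgr, df1["mgrId"] == df2_mgr["mgr_fid"], "left")
          .join(df2_val, df1["value"] == df2_val["val_fid"], "left"))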
Hi Gerard,
First, I'd like to thank you for your fast reply and for directing my question to
the proper mailing list!
I established the JDBC connection in the context of the state function
(flatMapGroupsWithState). The JDBC connection is made by using the read method on
the SparkSession, like below:
spar…
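The snippet above got cut off; purely for reference, a generic JDBC read through the SparkSession in PySpark looks roughly like this (the URL, table, and credentials are placeholders, and this is not Chris's actual code):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-static-source").getOrCreate()

# Placeholder connection details; a real job would pull these from config.
static_df = (spark.read
                  .format("jdbc")
                  .option("url", "jdbc:postgresql://db-host:5432/mydb")
                  .option("dbtable", "reference_table")
                  .option("user", "spark_user")
                  .option("password", "secret")
                  .load())

Note that, as the reply further down points out, the function passed to flatMapGroupsWithState runs on the executors and must be serializable, so a SparkSession-based read like this has to happen on the driver rather than inside the state function itself.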
Hi Team - Is there a good calculator/Excel sheet to estimate compute and storage
requirements for new Spark jobs to be developed?
Capacity planning based on:
job, data type, etc.
Thanks,
Deepu Raj
PySpark 2.3.1 on YARN, Python 3.6, PyArrow 0.8.
I'm trying to run a pandas UDF, but I seem to get nonsensical exceptions in
the last stage of the job regardless of my output type.
The problem I'm trying to solve:
I have a column of scalar values, and each value on the same row has a
sorted vecto
Hello All!
I am using Spark 2.3.1 on Kubernetes to run a structured streaming
job which reads a stream from Kafka, performs some window aggregation, and
sinks the output to Kafka.
After the job has been running for a few hours (5-6 hours), the executor pods are
crashing, which is caused by "Too many open files in…
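For context, a minimal sketch of that kind of pipeline in PySpark (the topic names, servers, window sizes, and checkpoint path below are placeholders, not the actual job):

# Requires the spark-sql-kafka-0-10 package on the classpath.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kafka-window-agg").getOrCreate()

events = (spark.readStream
               .format("kafka")
               .option("kafka.bootstrap.servers", "kafka:9092")
               .option("subscribe", "input-topic")
               .load()
               .selectExpr("CAST(value AS STRING) AS value", "timestamp"))

# Windowed aggregation with a watermark so old state can be dropped.
counts = (events
          .withWatermark("timestamp", "10 minutes")
          .groupBy(F.window("timestamp", "5 minutes"))
          .count())

query = (counts
         .selectExpr("to_json(struct(*)) AS value")
         .writeStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "kafka:9092")
         .option("topic", "output-topic")
         .option("checkpointLocation", "/tmp/checkpoints/kafka-window-agg")
         .outputMode("update")
         .start())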
Hi Chris,
Could you show the code you are using? When you mention "I like to use a
static datasource (JDBC) in the state function" do you refer to a DataFrame
from a JDBC source or an independent JDBC connection?
The key point to consider is that the flatMapGroupsWithState function must
be serial