Re: filter by dict() key in pySpark

2016-03-15 Thread Davies Liu
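Another solution could be using a left-semi join:

    # each key is wrapped in a tuple and the column is named "k" so that
    # Spark can infer a schema and keys.k can be referenced in the join
    keys = sqlContext.createDataFrame([(k,) for k in dict.keys()], ["k"])
    DF2 = DF1.join(keys, DF1.a == keys.k, "leftsemi")

On Wed, Feb 24, 2016 at 2:14 AM, Franc Carter wrote:
> A colleague found how to do this, the approach was to use a udf()
>
> cheers
>
> On 21 February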

Re: filter by dict() key in pySpark

2016-02-24 Thread Franc Carter
A colleague found how to do this; the approach was to use a udf()

cheers

On 21 February 2016 at 22:41, Franc Carter wrote:
>
> I have a DataFrame that has a Python dict() as one of the columns. I'd
> like to filter the DataFrame for those Rows where the dict() contains a
> specific value. e