Hi all,
I am trying to run pyspark with pypy. It works when using spark-1.3.1
but fails when using spark-1.4.1 and spark-1.5.1.
my pypy version:
$ /usr/bin/pypy --version
Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
[PyPy 2.2.1 with GCC 4.8.4]
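To run pyspark under PyPy I point PYSPARK_PYTHON at the pypy binary, roughly like this (the Spark path here is a placeholder, not my exact one):

$ PYSPARK_PYTHON=/usr/bin/pypy /path/to/spark-1.3.1/bin/pyspark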
It works with spark-1.3.1:
$ PYSP
> ; investigate so that we can update the documentation or produce a fix to
> restore compatibility with earlier PyPy builds?
>
> On Wed, Nov 4, 2015 at 11:56 PM, Chang Ya-Hsuan wrote:
>
>> Hi all,
>>
>> I am trying to run pyspark with pypy. It works whe
Any suggestion to run advanced tests?
On Thu, Nov 5, 2015 at 4:14 PM, Chang Ya-Hsuan wrote:
> Thanks for your quick reply.
>
> I will test several pypy versions and report the result later.
>
> On Thu, Nov 5, 2015 at 4:06 PM, Josh Rosen wrote:
>
>> I noticed that you're u
>> pypy-2.2.1
>> pypy-2.3
>> pypy-2.3.1
>> pypy-2.4.0
>> pypy-2.5.0
>> pypy-2.5.1
>> pypy-2.6.0
>> pypy-2.6.1
>>
>> I run
>>
>> $ PYSPARK_PYTHON=/path/to/pypy-xx.xx/bin/pypy
>> /path/to/spark-1.5.1/bin/p
spark version: spark-1.5.2-bin-hadoop2.6
python version: 2.7.9
os: ubuntu 14.04
code to reproduce the error:
# write.py
import pyspark
from pyspark.sql import SQLContext  # SQLContext lives in pyspark.sql, not the top-level pyspark package

sc = pyspark.SparkContext()
sqlc = SQLContext(sc)
df = sqlc.range(10)
df1 = df.withColumn('a', df['id'] * 2)
df1.write.partitionBy('id').parquet('./data')
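I submit it with the build listed above, basically like this (the exact path depends on where the Spark distribution is unpacked):

$ /path/to/spark-1.5.2-bin-hadoop2.6/bin/spark-submit write.py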
https://issues.apache.org/jira/browse/SPARK-12231
This is my first time creating a JIRA ticket.
Is this ticket proper?
Thanks
On Tue, Dec 8, 2015 at 9:59 PM, Reynold Xin wrote:
> Can you create a JIRA ticket for this? Thanks.
>
>
> On Tue, Dec 8, 2015 at 5:25 PM, Chang Ya-
Are you trying to build a DataFrame boolean expression?
Please use '&' for 'and', '|' for 'or', and '~' for 'not' when building
DataFrame boolean expressions.
example:
>>> df = sqlContext.range(10)
>>> df.where( (df.id==1) | ~(df.id==1))
DataFrame[id: bigint]
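For contrast, the plain Python keywords fail because a Column cannot be converted to a bool; roughly (the exact error text may vary by version):

>>> df.where((df.id == 1) and (df.id == 2))
Traceback (most recent call last):
  ...
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.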
On Wed, Dec 16, 2015 at 4:32 PM, Allen Zhang
python version: 2.7.9
os: ubuntu 14.04
spark: 1.5.2
I run a standalone Spark on localhost and use the following code to access
sc.defaultParallelism:
# a.py
import pyspark
sc = pyspark.SparkContext()
print(sc.defaultParallelism)
and use the following command to submit it:
$ spark-submit --master spa
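With the standalone master on its default port, that would be something like this (adjust the master URL for your setup):

$ spark-submit --master spark://localhost:7077 a.py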
python version: 2.7.9
os: ubuntu 14.04
spark: 1.5.2
```
import pyspark
from pyspark.sql import Row
from pyspark.sql.types import StructType, IntegerType
sc = pyspark.SparkContext()
sqlc = pyspark.SQLContext(sc)
schema1 = StructType() \
.add('a', IntegerType()) \
.add('b', IntegerType())
s
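A schema built this way would then typically be passed to createDataFrame; a minimal sketch, with made-up row values:

rows = [(1, 2), (3, 4)]
df = sqlc.createDataFrame(rows, schema1)
df.show()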