Re: Define size partitions

2015-01-30 Thread Davies Liu
I think the new API sc.binaryRecords [1] (added in 1.2) can help in this case.

[1] http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.SparkContext.binaryRecords

Davies

On Fri, Jan 30, 2015 at 6:50 AM, Guillermo Ortiz wrote:
> Hi,
>
> I want to process some files, there're a […]
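
A minimal PySpark sketch of the sc.binaryRecords suggestion, assuming the input is a flat binary file of fixed-width records; the path and the 512-byte record length are hypothetical:

    from pyspark import SparkContext

    sc = SparkContext(appName="binary-records-example")

    # binaryRecords splits a flat binary file into fixed-size byte-string
    # records, so partitions always contain whole records.
    # recordLength=512 is an assumed record size for illustration.
    records = sc.binaryRecords("hdfs:///data/fixed_width.bin", recordLength=512)
    print(records.take(1))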

Re: Define size partitions

2015-01-30 Thread Rishi Yadav
If you are only concerned about big partition sizes, you can specify the number of partitions as an additional parameter while loading files from HDFS.

On Fri, Jan 30, 2015 at 9:47 AM, Sven Krasser wrote:
> You can also use your InputFormat/RecordReader in Spark, e.g. using
> newAPIHadoopFile. See here: […]
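
A short sketch of that point, assuming the input is loaded as text; minPartitions is the optional second argument to sc.textFile, and the path and partition count here are hypothetical:

    # Asking for at least 200 partitions bounds the size of each partition,
    # since the same input is split across more of them.
    rdd = sc.textFile("hdfs:///data/big_input", minPartitions=200)
    print(rdd.getNumPartitions())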

Re: Define size partitions

2015-01-30 Thread Sven Krasser
You can also use your InputFormat/RecordReader in Spark, e.g. using newAPIHadoopFile. See here: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkContext

-Sven

On Fri, Jan 30, 2015 at 6:50 AM, Guillermo Ortiz wrote:
> Hi,
>
> I want to process some files, there're […]
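
A sketch of this approach from the Python side, using PySpark's equivalent newAPIHadoopFile method; the stock Hadoop TextInputFormat stands in for a custom InputFormat/RecordReader, and the path is hypothetical:

    # Any new-API Hadoop InputFormat (including a custom one that bounds
    # split or record sizes) can be plugged in by fully qualified class name.
    rdd = sc.newAPIHadoopFile(
        "hdfs:///data/input",
        inputFormatClass="org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
        keyClass="org.apache.hadoop.io.LongWritable",
        valueClass="org.apache.hadoop.io.Text",
    )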