Hi Deepak,

Thanks for replying. The way I write into Alluxio is:

df.write.mode(SaveMode.Append).partitionBy("network_id", "time").parquet("alluxio://master1:19999/FACT_ADMIN_HOURLY")
I partition by two columns and store. I just want the write to automatically produce part files sized to match the 512MB block size I have already set in Alluxio.

> On Jul 1, 2016, at 11:01 AM, Deepak Sharma <deepakmc...@gmail.com> wrote:
>
> Before writing, coalesce your RDD to 1.
> It will create only 1 output file.
> Multiple part files happen because all your executors write their
> partitions to separate part files.
>
> Thanks
> Deepak
>
> On 1 Jul 2016 8:01 am, "Chanh Le" <giaosu...@gmail.com> wrote:
> Hi everyone,
> I am using Alluxio for storage, but I am a little confused: I set the
> Alluxio block size to 512MB, yet my part files are only a few KB each
> and there are too many of them.
> Is that normal? I want reads to be fast; do many small part files affect
> the read operation?
> How do I set the size of the part files?
>
> Thanks,
> Chanh
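For reference, one way to get fewer, larger part files than coalescing everything to a single partition is to repartition by an estimated partition count before writing. The sketch below is an assumption-laden illustration, not anything from the thread: `num_partitions_for` is a hypothetical helper (not a Spark API), the 3GB input size is made up, and the PySpark calls are shown only in comments.

```python
import math

# Hypothetical helper (not part of Spark): pick a partition count so that
# each written part file lands near the target size (e.g. the 512MB
# Alluxio block size), instead of producing many tiny KB-sized files.
def num_partitions_for(total_bytes: int, target_bytes: int) -> int:
    """Ceiling division: partitions needed so each holds ~target_bytes."""
    if target_bytes <= 0:
        raise ValueError("target size must be positive")
    return max(1, math.ceil(total_bytes / target_bytes))

block_size = 512 * 1024 * 1024            # 512MB Alluxio block size
estimated_total = 3 * 1024 * 1024 * 1024  # assumed ~3GB of input data
n = num_partitions_for(estimated_total, block_size)
print(n)  # 6

# With PySpark (not runnable here), the write from the thread would become:
# df.repartition(n, "network_id", "time") \
#   .write.mode("append") \
#   .partitionBy("network_id", "time") \
#   .parquet("alluxio://master1:19999/FACT_ADMIN_HOURLY")
```

Compared with coalesce(1), this keeps some write parallelism across executors while still bounding the number of output files.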