If you want to see which partitions exist on disk (without checking
manually), you could write code against the Hadoop FileSystem API. Is that
what you are asking?

https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/fs/package-summary.html
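
For example, a minimal sketch along these lines (the table root path,
partition column, and value are placeholders, not from this thread):

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, Path}

  // Root directory of the partitioned table on HDFS (placeholder path).
  val tableRoot = new Path("hdfs:///data/mytable")
  val fs = FileSystem.get(new Configuration())

  // Partition directories follow the partCol=value naming convention
  // that dataframe.write.partitionBy produces on disk.
  val partitionExists = fs.exists(new Path(tableRoot, "partCol=someValue"))

  // Or list every partition directory under the table root.
  val partitions = fs.listStatus(tableRoot)
    .filter(_.isDirectory)
    .map(_.getPath.getName)

Inside a Spark job you would normally build the FileSystem from
sc.hadoopConfiguration instead of a fresh Configuration, so it picks up the
cluster settings.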


On Thu, Feb 25, 2016 at 10:54 AM, Deenar Toraskar <deenar.toras...@gmail.com
> wrote:

> Kevin
>
> I meant the partitions on disk/HDFS, not the in-memory RDD/DataFrame
> partitions. If I am right, mapPartitions or foreachPartition would
> identify and operate on the in-memory partitions.
>
> Deenar
>
> On 25 February 2016 at 15:28, Kevin Mellott <kevin.r.mell...@gmail.com>
> wrote:
>
>> Once you have loaded information into a DataFrame, you can use the
>> *mapPartitions or foreachPartition* operations to both identify the
>> partitions and operate against them.
>>
>>
>> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrame
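>>
>> For example, a rough sketch (df stands for any DataFrame):
>>
>>   // Identify the in-memory partitions by index and count their rows.
>>   val sizes = df.rdd
>>     .mapPartitionsWithIndex((idx, iter) => Iterator((idx, iter.size)))
>>     .collect()
>>
>>   // Or act on the rows of each partition with foreachPartition.
>>   df.foreachPartition(iter => iter.foreach(println))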
>>
>>
>> On Thu, Feb 25, 2016 at 9:24 AM, Deenar Toraskar <
>> deenar.toras...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> How does one check for the presence of a partition in a Spark SQL
>>> partitioned table (saved using dataframe.write.partitionBy("partCol"),
>>> not Hive-compatible tables), other than physically checking the
>>> directory on HDFS or doing a count(*) with the partition cols in the
>>> where clause?
>>>
>>>
>>> Regards
>>> Deenar
>>>
>>
>>
>
