If you want to see which partitions exist on disk (without checking
manually), you could write code against the Hadoop FileSystem API. Is that
what you are asking?

https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/fs/package-summary.html
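
For example, a minimal sketch along these lines (the table root path,
partition column, and value are placeholders, not from this thread):

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, Path}

  // Root directory of the partitioned table on HDFS (placeholder path).
  val tableRoot = new Path("hdfs:///data/mytable")
  val fs = FileSystem.get(new Configuration())

  // Partition directories follow the partCol=value naming convention
  // that dataframe.write.partitionBy produces on disk.
  val partitionExists = fs.exists(new Path(tableRoot, "partCol=someValue"))

  // Or list every partition directory under the table root.
  val partitions = fs.listStatus(tableRoot)
    .filter(_.isDirectory)
    .map(_.getPath.getName)

Inside a Spark job you would normally build the FileSystem from
sc.hadoopConfiguration instead of a fresh Configuration, so it picks up the
cluster settings.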


On Thu, Feb 25, 2016 at 10:54 AM, Deenar Toraskar <deenar.toras...@gmail.com
> wrote:

> Kevin
>
> I meant the partitions on disk/HDFS, not the in-memory RDD/DataFrame
> partitions. If I am right, mapPartitions or foreachPartition would
> identify and operate on the in-memory partitions.
>
> Deenar
>
> On 25 February 2016 at 15:28, Kevin Mellott <kevin.r.mell...@gmail.com>
> wrote:
>
>> Once you have loaded information into a DataFrame, you can use the
>> *mapPartitions or foreachPartition* operations to both identify the
>> partitions and operate against them.
>>
>>
>> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrame
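>>
>> For example, a rough sketch (df stands for any DataFrame):
>>
>>   // Identify the in-memory partitions by index and count their rows.
>>   val sizes = df.rdd
>>     .mapPartitionsWithIndex((idx, iter) => Iterator((idx, iter.size)))
>>     .collect()
>>
>>   // Or act on the rows of each partition with foreachPartition.
>>   df.foreachPartition(iter => iter.foreach(println))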
>>
>>
>> On Thu, Feb 25, 2016 at 9:24 AM, Deenar Toraskar <
>> deenar.toras...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> How does one check for the presence of a partition in a Spark SQL
>>> partitioned table (saved using dataframe.write.partitionBy("partCol"),
>>> not Hive-compatible tables), other than physically checking the
>>> directory on HDFS or doing a count(*) with the partition cols in the
>>> where clause?
>>>
>>>
>>> Regards
>>> Deenar
>>>
>>
>>
>
