Hi,
To do that, you can write code that lists the HDFS directory first, then
lists its sub-directories. Using custom logic, first identify the latest
year/month/version, then read the Avro files in that directory into a
DataFrame, and finally add year/month/version to that DataFrame using
withColumn.
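A minimal sketch of the "identify the latest year/month/version" step, in
Python. It assumes a hypothetical layout of purely numeric directories,
<root>/<year>/<month>/<version> (the thread does not show the real layout),
and the function name latest_version is illustrative. The listing is
hard-coded here instead of coming from an HDFS directory listing.

```python
import re
from typing import Iterable, Optional, Tuple

# Hypothetical layout (not confirmed by the thread): <root>/<year>/<month>/<version>,
# where each level is a purely numeric directory name, e.g. /data/events/2016/11/3
LEAF_RE = re.compile(r"/(\d{4})/(\d{1,2})/(\d+)/?$")


def latest_version(paths: Iterable[str]) -> Optional[Tuple[str, int, int, int]]:
    """Return (path, year, month, version) for the newest leaf directory, or None."""
    best: Optional[Tuple[str, int, int, int]] = None
    for path in paths:
        match = LEAF_RE.search(path)
        if match is None:
            continue  # skip entries that do not follow the assumed layout
        year, month, version = (int(g) for g in match.groups())
        # Compare as integers so that version 10 beats version 9
        if best is None or (year, month, version) > best[1:]:
            best = (path, year, month, version)
    return best


if __name__ == "__main__":
    # In practice this list would come from listing the HDFS directory.
    listing = [
        "/data/events/2016/10/7",
        "/data/events/2016/11/2",
        "/data/events/2016/11/10",
        "/data/events/_SUCCESS",
    ]
    print(latest_version(listing))  # ('/data/events/2016/11/10', 2016, 11, 10)
```

With PySpark, the returned path could then be loaded with the Avro data
source (the format name depends on the Spark version and Avro package in
use), and the three values attached with DataFrame.withColumn and
pyspark.sql.functions.lit.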
Regards
Thank you Daniel. Unfortunately, we don't use Hive but bare (Avro) files.
On 11/17/2016 08:47 PM, Daniel Haviv wrote:
Hi Samy,
If you're working with Hive, you could create a partitioned table and update
its partitions' locations to the latest version, so when you query it
using Spark, you'll always get the latest version.
Daniel
On Thu, Nov 17, 2016 at 9:05 PM, Samy Dindane wrote:
> Hi,
>
> I have some data p