Re: Timestamp Based Incremental Reading in Iceberg ...

2020-09-08 Thread Miao Wang
+1 to OpenInx’s comment on implementation. Unless we have an external time synchronization service and require all clients to use it, timestamps from different clients are not comparable. So, there are two asks: 1) Whether to have a timestamp-based API for delta reading; 2) How to enfor

Re: Timestamp Based Incremental Reading in Iceberg ...

2020-09-08 Thread Peter Vary
Newbie here, but if I understand correctly, the client knows the previous snapshot and the corresponding timestamp. It could be the responsibility of the client to generate a new timestamp that is greater than or equal to the previous one. There might be checks implemented on commit to prevent smaller
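
The idea in this message can be sketched in a few lines. This is only an illustration in plain Python, not Iceberg's commit path: `next_commit_timestamp` and `validate_commit` are hypothetical names, and timestamps are assumed to be milliseconds since the epoch.

```python
import time
from typing import Optional


def next_commit_timestamp(previous_ms: int, now_ms: Optional[int] = None) -> int:
    """Return a commit timestamp that is never lower than the previous snapshot's.

    Hypothetical helper: if the writer's clock is behind the previous
    snapshot's timestamp, reuse the previous timestamp instead.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return max(previous_ms, now_ms)


def validate_commit(previous_ms: int, new_ms: int) -> None:
    """Hypothetical commit-time check: reject a timestamp lower than the previous one."""
    if new_ms < previous_ms:
        raise ValueError(
            f"commit timestamp {new_ms} is lower than previous snapshot timestamp {previous_ms}"
        )
```

A writer would call `next_commit_timestamp` with the timestamp of the snapshot it is committing on top of, and the commit would run `validate_commit` as a guard. Equal timestamps are allowed here because the message asks for "greater than or equal to".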

Re: Timestamp Based Incremental Reading in Iceberg ...

2020-09-08 Thread OpenInx
I agree that it's helpful to allow users to read the incremental delta based on timestamp; as Jingsong said, timestamps are more user-friendly. My question is how to implement this. If we just attach the client's timestamp to the Iceberg table when committing, then different clients may have different tim

Re: [DISCUSS] September board report

2020-09-08 Thread Jingsong Li
+1 Thanks Ryan for reporting.

On Wed, Sep 9, 2020 at 3:59 AM Ryan Blue wrote:
> Hi everyone,
>
> It’s time for our board report, which I think is the last monthly report.
> Here’s what I have so far. Please comment and reply with anything that I’ve
> missed!
>
> rb
> Description:
>
> Apache Ice

Re: Timestamp Based Incremental Reading in Iceberg ...

2020-09-08 Thread Jingsong Li
+1 for timestamps being linear. In the implementation, maybe the writer only needs to look at the previous snapshot's timestamp. We're trying to think of Iceberg as a message queue. Taking the popular queue Kafka as an example: Iceberg has snapshotId and timestamp; correspondingly, Kafka has offset and

[DISCUSS] September board report

2020-09-08 Thread Ryan Blue
Hi everyone, It’s time for our board report, which I think is the last monthly report. Here’s what I have so far. Please comment and reply with anything that I’ve missed! rb Description: Apache Iceberg is a table format for huge analytic datasets that is designed for high performance and ease of

Iceberg's type system

2020-09-08 Thread Chen Song
Hi, I have a general question on Iceberg's data type system. Iceberg has a well-defined type spec which can be mapped to types in Avro, Parquet, and ORC. If users want to use Iceberg and extend the universe of data types (e.g., adding custom typ

Re: Timestamp Based Incremental Reading in Iceberg ...

2020-09-08 Thread Sud
We are using incremental reads for Iceberg tables that get quite a few appends (~500-1000 per hour), but instead of using timestamps we use snapshot IDs and track the ID of the last-read snapshot as state. We use the timestamp as a fallback when the state is incorrect, but as you mentioned if timestamps are
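
A rough sketch of the approach described in this message, assuming the snapshot history is available as an ordered list (oldest first). The `Snapshot` class and `snapshots_to_read` function are hypothetical and written in plain Python; they are not part of any Iceberg API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int


def snapshots_to_read(
    history: List[Snapshot],
    last_read_id: Optional[int],
    fallback_ts_ms: int,
) -> List[Snapshot]:
    """Pick the snapshots a reader still has to process.

    Primary path: resume right after the last-read snapshot ID.
    Fallback path: if that ID is missing or unknown, select by commit
    timestamp instead.
    """
    ids = [s.snapshot_id for s in history]
    if last_read_id in ids:
        # Everything committed after the last-read snapshot.
        return history[ids.index(last_read_id) + 1:]
    # State is missing or incorrect: fall back to commit timestamps.
    return [s for s in history if s.timestamp_ms > fallback_ts_ms]
```

The fallback path is only as reliable as the timestamps themselves: it assumes timestamps from different writers are comparable, which is the concern raised elsewhere in this thread.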

Timestamp Based Incremental Reading in Iceberg ...

2020-09-08 Thread Gautam
Hello Devs, We are looking into adding workflows that read data incrementally based on commit time. Specifically, we want the ability to read deltas between start and end commit timestamps on a table, and the ability to resume reading from the last-read end timestamp. In that regard, we need the timestamps to be