Notes:

Attendees/agenda building
Wes (TwoSigma):
 - REST API
 - Roadmap
 - communicate with community
Uwe (Blue Yonder):
 - git tag for versioning
Julien (Dremio):
 - Timestamp
 - REST API
 - Roadmap

Discussion:
 - git tag for versioning
   - development package version names are based on the latest tag in the history of master plus the commit count since that tag.
   - since the release tag is on a branch, the dev version is computed from an older tag and is misleading
   - options:
     - add a tag {release version}.post on the first commit after the release to get a better dev version string
     - rebase master on top of the last release (0.4)
   - we decided to rebase master (the only change is adding the commit that updates the version number in pom files)
 - Timestamp in Arrow and Parquet:
   - Both support "timezone naive" timestamps (aka "timestamp without timezone" in SQL)
     - in Arrow when the timezone field is missing in the Timestamp type: https://github.com/apache/arrow/blob/5899800f53f3c3fffc0db95294c4f0eb0e556228/format/Schema.fbs#L117
     - in Parquet (proposed PR) when isAdjustedToUTC is false: https://github.com/apache/parquet-format/pull/51/files#diff-0f9d1b5347959e15259da7ba8f4b6252R242
   - They also both support a "timezone aware" timestamp (aka "timestamp with timezone" in SQL)
     - in Arrow when the timezone field is present with the original timezone.
     - in Parquet when isAdjustedToUTC is true
   - So there is more information in Arrow, and Arrow requires this extra information since its absence means "timezone naive"
   - conclusion:
     - when writing to Parquet we should use isAdjustedToUTC = false only if there is no knowledge of the timezone
     - when reading from Parquet we will populate timezone with UTC when isAdjustedToUTC == true (and leave it missing otherwise)
 - REST API:
   - review doc here: https://docs.google.com/document/d/1N4TP6zARRs2c4_h-4WqCqIFVPQwmxOmXel1V3AxpGok/edit#
 - Roadmap:
   - todo: blog post to describe the direction of Arrow
   - among those directions:
     - REST API and generalizing messaging
     - C++ analytics library for interacting with Arrow memory.
       Tools for wrapping existing data structures (array of doubles)
     - Arrow for GPU
     - Arrow ODBC interface: turbodbc
     - Spark integration improvements: group UDFs etc.

On Wed, May 31, 2017 at 9:16 AM, Julien Le Dem <jul...@dremio.com> wrote:
> The arrow sync is at 9:30 am PT today on google hangout
> https://hangouts.google.com/hangouts/_/dremio.com/arrow
>
> --
> Julien

--
Julien
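
PS: the timestamp conclusion in the notes can be sketched roughly as follows. This is only an illustration of the agreed mapping, not the actual Arrow or Parquet API: the classes ArrowTimestamp and ParquetTimestamp and the two conversion functions are hypothetical stand-ins, and a timezone of None is assumed to model a missing timezone field in Arrow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArrowTimestamp:
    """Hypothetical stand-in for the metadata of Arrow's Timestamp type."""
    unit: str
    timezone: Optional[str] = None  # None models a missing timezone field


@dataclass(frozen=True)
class ParquetTimestamp:
    """Hypothetical stand-in for the proposed Parquet timestamp annotation."""
    unit: str
    is_adjusted_to_utc: bool


def arrow_to_parquet(ts: ArrowTimestamp) -> ParquetTimestamp:
    # Writing: isAdjustedToUTC = false only if the timezone is unknown.
    return ParquetTimestamp(ts.unit, ts.timezone is not None)


def parquet_to_arrow(ts: ParquetTimestamp) -> ArrowTimestamp:
    # Reading: populate UTC when isAdjustedToUTC is true, else leave it missing.
    return ArrowTimestamp(ts.unit, "UTC" if ts.is_adjusted_to_utc else None)


if __name__ == "__main__":
    aware = ArrowTimestamp("us", "America/Los_Angeles")
    naive = ArrowTimestamp("us")
    print(parquet_to_arrow(arrow_to_parquet(aware)))
    print(parquet_to_arrow(arrow_to_parquet(naive)))
```

Under these rules a timezone-naive type survives a round trip unchanged, while a timezone-aware type comes back with its timezone set to UTC. The original timezone name is not preserved, which matches the remark that Arrow carries more information than Parquet here.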