+1, this will be great. Long ago we had WRITE TO but it was very limited and troublesome. This is a great improvement over that older facility.
On Oct 24, 2023 at 09:53:40, Wail Alkowaileet <[email protected]> wrote: > Currently, AsterixDB does not have a clean way to extract query results or > dump a dataset to a storage device. The only channel provided currently is > the Query Service (i.e., running the query and writing it somehow at the > client side). We need to support a way to write query results (or dump a > dataset) in parallel to a storage device. > > To illustrate, say we want to do the following: > > USE CopyToDataverse; > > > COPY ColumnDataset > > TO localfs > > PATH("localhost:///media/backup/CopyToResult") > > WITH { > > "format" : "json" > > }; > > > In this example, the data in ColumnDataset will be written in each node at > the provided path localhost:///media/backup/CopyToResult. Simply, each node > will write its own partitions for the data stored in ColumnDataset locally. > The written files will be in raw JSON format. > > Another example: > > USE CopyToDataverse; > > COPY (SELECT cd.uid uid, > > cd.sensor_info.name name, > > to_bigint(cd.sensor_info.battery_status) > > battery_status > > FROM ColumnDataset cd > > ) toWrite > > TO s3 > > PATH("CopyToResult/" || to_string(b)) > > OVER ( > > PARTITION BY toWrite.battery_status b > > ORDER BY toWrite.name > > ) > > WITH { > > "format" : "json", > > "compression": "gzip", > > "max-objects-per-file": 100, > > "container": "myBucket", > > "accessKeyId": "<access-key>", > > "secretAccessKey": "<secret-key>", > > "region": "us-west-2" > > }; > > > The second example shows how to write the result of a query and also > partition the result so that each partition will be written to a certain > path. In this example, we partition by the battery_status (say an integer > value from 0 to 100). The final result will be written to myBucke in Amazon > S3. > > Each partition will have the path CopyToResult/<battery_status>. For > example CopyToResult/0, CopyToResult/1 ..., CopyToResult/99, > CopyToResult/100). This partitioning scheme can be useful if a user wants > to exploit dynamic prefixes (external filters) (see ASTERIXDB-3073 > < > https://urldefense.com/v3/__https://issues.apache.org/jira/browse/ASTERIXDB-3073__;!!CzAuKJ42GuquVTTmVmPViYEvSg!JX_fnvdEqKOBNcNltYJtuNHqGGHc4fyOkgw7JbMQu97xtsGlYfO-FZYaApn8kneml74lriyZ0Vdm4QA$ > >). > > Additionally, the records in each partition will be ordered by the > sensor_name (toWrite.name). Note that this ordering isn't global but per > partition. > > Also, the written files will be compressed using *gzip* and each file > should have at most 100 records max (*max-objects-per-file*). > > EPIC: ASTERIXDB-3286 < > https://urldefense.com/v3/__https://issues.apache.org/jira/browse/ASTERIXDB-3286__;!!CzAuKJ42GuquVTTmVmPViYEvSg!JX_fnvdEqKOBNcNltYJtuNHqGGHc4fyOkgw7JbMQu97xtsGlYfO-FZYaApn8kneml74lriyZQEc5iKM$ > > > -- > > *Regards,* > Wail Alkowaileet >
