Currently, AsterixDB does not have a clean way to extract query results or dump a dataset to a storage device. The only channel provided currently is the Query Service (i.e., running the query and writing it somehow at the client side). We need to support a way to write query results (or dump a dataset) in parallel to a storage device.
To illustrate, say we want to do the following: > USE CopyToDataverse; COPY ColumnDataset > TO localfs > PATH("localhost:///media/backup/CopyToResult") > WITH { > "format" : "json" > }; In this example, the data in ColumnDataset will be written in each node at the provided path localhost:///media/backup/CopyToResult. Simply, each node will write its own partitions for the data stored in ColumnDataset locally. The written files will be in raw JSON format. Another example: > USE CopyToDataverse; > COPY (SELECT cd.uid uid, > cd.sensor_info.name name, > to_bigint(cd.sensor_info.battery_status) > battery_status > FROM ColumnDataset cd > ) toWrite > TO s3 > PATH("CopyToResult/" || to_string(b)) > OVER ( > PARTITION BY toWrite.battery_status b > ORDER BY toWrite.name > ) > WITH { > "format" : "json", > "compression": "gzip", > "max-objects-per-file": 100, > "container": "myBucket", > "accessKeyId": "<access-key>", > "secretAccessKey": "<secret-key>", > "region": "us-west-2" > }; The second example shows how to write the result of a query and also partition the result so that each partition will be written to a certain path. In this example, we partition by the battery_status (say an integer value from 0 to 100). The final result will be written to myBucke in Amazon S3. Each partition will have the path CopyToResult/<battery_status>. For example CopyToResult/0, CopyToResult/1 ..., CopyToResult/99, CopyToResult/100). This partitioning scheme can be useful if a user wants to exploit dynamic prefixes (external filters) (see ASTERIXDB-3073 <https://issues.apache.org/jira/browse/ASTERIXDB-3073>). Additionally, the records in each partition will be ordered by the sensor_name (toWrite.name). Note that this ordering isn't global but per partition. Also, the written files will be compressed using *gzip* and each file should have at most 100 records max (*max-objects-per-file*). EPIC: ASTERIXDB-3286 <https://issues.apache.org/jira/browse/ASTERIXDB-3286> -- *Regards,* Wail Alkowaileet