Excellent! On Wed, Nov 1, 2023 at 2:27 PM Wail Alkowaileet <wael....@gmail.com> wrote:
> AsterixDB now supports COPY TO. > > On Thu, Oct 26, 2023 at 10:05 AM Till Westmann <ti...@apache.org> wrote: > > > Agreed (+1) - this functionality will significantly reduce the pain of > > moving data in and out of object storage. > > > > > On Oct 25, 2023, at 10:04 PM, Mike Carey <dtab...@gmail.com> wrote: > > > > > > +1 -- Now maybe users will stop trying to retrieve huge results and > > wondering why the UI is choking! :-) This capability is actually long > > overdue. > > > > > > On 10/24/23 9:53 AM, Wail Alkowaileet wrote: > > >> Currently, AsterixDB does not have a clean way to extract query > results > > or > > >> dump a dataset to a storage device. The only channel provided > currently > > is > > >> the Query Service (i.e., running the query and writing it somehow at > the > > >> client side). We need to support a way to write query results (or > dump a > > >> dataset) in parallel to a storage device. > > >> > > >> To illustrate, say we want to do the following: > > >> > > >>> USE CopyToDataverse; > > >> COPY ColumnDataset > > >>> TO localfs > > >>> PATH("localhost:///media/backup/CopyToResult") > > >>> WITH { > > >>> "format" : "json" > > >>> }; > > >> In this example, the data in ColumnDataset will be written in each > node > > at > > >> the provided path localhost:///media/backup/CopyToResult. Simply, each > > node > > >> will write its own partitions for the data stored in ColumnDataset > > locally. > > >> The written files will be in raw JSON format. > > >> > > >> Another example: > > >> > > >>> USE CopyToDataverse; > > >>> COPY (SELECT cd.uid uid, > > >>> cd.sensor_info.name name, > > >>> to_bigint(cd.sensor_info.battery_status) > > >>> battery_status > > >>> FROM ColumnDataset cd > > >>> ) toWrite > > >>> TO s3 > > >>> PATH("CopyToResult/" || to_string(b)) > > >>> OVER ( > > >>> PARTITION BY toWrite.battery_status b > > >>> ORDER BY toWrite.name > > >>> ) > > >>> WITH { > > >>> "format" : "json", > > >>> "compression": "gzip", > > >>> "max-objects-per-file": 100, > > >>> "container": "myBucket", > > >>> "accessKeyId": "<access-key>", > > >>> "secretAccessKey": "<secret-key>", > > >>> "region": "us-west-2" > > >>> }; > > >> The second example shows how to write the result of a query and also > > >> partition the result so that each partition will be written to a > certain > > >> path. In this example, we partition by the battery_status (say an > > integer > > >> value from 0 to 100). The final result will be written to myBucke in > > Amazon > > >> S3. > > >> > > >> Each partition will have the path CopyToResult/<battery_status>. For > > >> example CopyToResult/0, CopyToResult/1 ..., CopyToResult/99, > > >> CopyToResult/100). This partitioning scheme can be useful if a user > > wants > > >> to exploit dynamic prefixes (external filters) (see ASTERIXDB-3073 > > >> <https://issues.apache.org/jira/browse/ASTERIXDB-3073>). > > >> > > >> Additionally, the records in each partition will be ordered by the > > >> sensor_name (toWrite.name). Note that this ordering isn't global but > per > > >> partition. > > >> > > >> Also, the written files will be compressed using *gzip* and each file > > >> should have at most 100 records max (*max-objects-per-file*). > > >> > > >> EPIC: ASTERIXDB-3286< > > https://issues.apache.org/jira/browse/ASTERIXDB-3286> > > > > > > -- > > *Regards,* > Wail Alkowaileet >