Hi Eila,

Please find my comments inline.

On Tue, Mar 20, 2018 at 8:02 AM OrielResearch Eila Arich-Landkof <
[email protected]> wrote:

> Hello all,
>
> It was nice to meet you last week!!!
>
>
It was nice to meet you as well :)


> I am writing genomic pCollection that is created from bigQuery to a
> folder. Following is the code with output so you can run it with any small
> BQ table and let me know what your thoughts are:
>
> rows = [{u'index': u'GSM2313641', u'SNRPCP14': 0},{u'index':
> u'GSM2316666', u'SNRPCP14': 0},{u'index': u'GSM2312355', u'SNRPCP14':
> 0},{u'index': u'GSM2312372', u'SNRPCP14': 0}]
>
> rows[1].keys()
> # output:  [u'index', u'SNRPCP14']
>
> # you can change `archs4.results_20180308_ to any other table name with
> index column
> queries2 = rows | beam.Map(lambda x:
> (beam.io.Read(beam.io.BigQuerySource(project='orielresearch-188115',
> use_standard_sql=False, query=str('SELECT * FROM
> `archs4.results_20180308_*` where index=\'%s\'' % (x["index"])))),
>                                str('gs://archs4/output/'+x["index"]+'/')))
>

I don't think the above code will work (at least, it's not portable across runners). BigQuerySource (along with the Read transform) has to be applied to a Pipeline object. So probably change this to a for loop that creates a set of read transforms, and use Flatten to combine them into a single PCollection.


>
> queries2
> # output: a list of pCollection and the path to write the pCollection data
> to
>
> [(<Read(PTransform) label=[Read] at 0x7fa6990fb7d0>,
>   'gs://archs4/output/GSM2313641/'),
>  (<Read(PTransform) label=[Read] at 0x7fa6990fb950>,
>   'gs://archs4/output/GSM2316666/'),
>  (<Read(PTransform) label=[Read] at 0x7fa6990fb9d0>,
>   'gs://archs4/output/GSM2312355/'),
>  (<Read(PTransform) label=[Read] at 0x7fa6990fbb50>,
>   'gs://archs4/output/GSM2312372/')]
>
>
What you got here is a PCollection of PTransform objects, which is not useful.


>
> *# this is my challenge*
> queries2 | 'write to relevant path' >> beam.io.WriteToText("SECOND COLUMN")
>
>
Once you update the above code you will get a proper PCollection of elements read from BigQuery. You can transform and write this (to files, BQ, or any other sink) as needed.
Please see the programming guide on how to write to text files (section 5.3; click the Python tab): https://beam.apache.org/documentation/programming-guide/

Thanks,
Cham


> Do you have any idea how to sink the data to a text file? I have tried few
> other options and was stuck at the write transform
>
> Any advice is very appreciated.
>
> Thanks,
> Eila
>
>
>
> --
> Eila
> www.orielresearch.org
> https://www.meetup.com/Deep-Learning-In-Production/
>
