Hello all,
It was nice to meet you last week!!!
I am trying to write a genomic PCollection, created from BigQuery, to a folder.
Below is the code with its output, so you can run it against any small BQ table
and let me know what you think:
rows = [{u'index': u'GSM2313641', u'SNRPCP14': 0},
        {u'index': u'GSM2316666', u'SNRPCP14': 0},
        {u'index': u'GSM2312355', u'SNRPCP14': 0},
        {u'index': u'GSM2312372', u'SNRPCP14': 0}]
rows[1].keys()
# output: [u'index', u'SNRPCP14']
# you can change `archs4.results_20180308_*` to any other table name with
# an index column
queries2 = rows | beam.Map(lambda x: (
    beam.io.Read(beam.io.BigQuerySource(
        project='orielresearch-188115',
        use_standard_sql=False,
        query=str('SELECT * FROM `archs4.results_20180308_*` '
                  'where index=\'%s\'' % (x["index"])))),
    str('gs://archs4/output/' + x["index"] + '/')))
queries2
# output: a list of (Read transform, output path) pairs; each path is where
# the data from that read should be written
[(<Read(PTransform) label=[Read] at 0x7fa6990fb7d0>, 'gs://archs4/output/GSM2313641/'),
 (<Read(PTransform) label=[Read] at 0x7fa6990fb950>, 'gs://archs4/output/GSM2316666/'),
 (<Read(PTransform) label=[Read] at 0x7fa6990fb9d0>, 'gs://archs4/output/GSM2312355/'),
 (<Read(PTransform) label=[Read] at 0x7fa6990fbb50>, 'gs://archs4/output/GSM2312372/')]
# this is my challenge
queries2 | 'write to relevant path' >> beam.io.WriteToText("SECOND COLUMN")
Do you have any idea how to sink the data to a text file? I have tried a few
other options and got stuck at the write transform.
Any advice is much appreciated.
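For discussion, here is a minimal, untested sketch of one possible direction. It assumes the list of indexes is known when the pipeline is built, so the reads and writes can be wired up in a plain Python loop, one branch per index, instead of creating Read transforms inside beam.Map. The helper names build_jobs and add_branches, the "part" file prefix and the JSON formatting step are hypothetical choices of mine. The sketch also sets use_standard_sql=True, on the assumption that standard SQL is intended, since backtick-quoted table names are standard SQL syntax.

```python
import json


def build_jobs(rows, table="archs4.results_20180308_*",
               out_root="gs://archs4/output/"):
    """Return one (index, query, output_prefix) triple per input row."""
    jobs = []
    for row in rows:
        idx = row["index"]
        query = "SELECT * FROM `%s` WHERE index = '%s'" % (table, idx)
        # "part" is a hypothetical file prefix; WriteToText appends shard names.
        jobs.append((idx, query, out_root + idx + "/part"))
    return jobs


def add_branches(pipeline, rows):
    """Attach one read -> format -> write branch per row to the pipeline."""
    # Imported here so that build_jobs can be tried without Beam installed.
    import apache_beam as beam

    for idx, query, prefix in build_jobs(rows):
        # Each step gets a unique label because labels must not repeat
        # within a single pipeline.
        (pipeline
         | "read %s" % idx >> beam.io.Read(
             beam.io.BigQuerySource(query=query, use_standard_sql=True))
         # Assumption: every row value is JSON-serializable.
         | "to json %s" % idx >> beam.Map(json.dumps)
         | "write %s" % idx >> beam.io.WriteToText(prefix))
```

With p = beam.Pipeline(...), calling add_branches(p, rows) followed by p.run() would submit all branches as one job. This fits a handful of indexes; a very long list would produce a very large pipeline graph, so a different design may be needed in that case.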
Thanks,
Eila
--
Eila
www.orielresearch.org
https://www.meetup.com/Deep-Learning-In-Production/