Hello there,

I have a dataframe with the following...

+-----------------------------+-------------------------------------------------------------------+-------------------------------+
|entity_id                    |file_path
                       |other_useful_id                |
+-----------------------------+-------------------------------------------------------------------+-------------------------------+
|id-01f7pqqbxddb3b1an6ntyqx6mg|gs://bucket1/path/to/id-01g4he5cb4xqn6s1999k6y1vbd/file_result.json|id-2-01g4he5cb4xqn6s1999k6y1vbd|
|id-01f7pqgbwms4ajmdtdedtwa3mf|gs://bucket1/path/to/id-01g4he5cbh52che104rwy603sr/file_result.json|id-2-01g4he5cbh52che104rwy603sr|
|id-01f7pqqbxejt3ef4ap9qcs78m5|gs://bucket1/path/to/id-01g4he5cbqmdv7dnx46sebs0gt/file_result.json|id-2-01g4he5cbqmdv7dnx46sebs0gt|
|id-01f7pqqbynh895ptpjjfxvk6dc|gs://bucket1/path/to/id-01g4he5cbx1kwhgvdme1s560dw/file_result.json|id-2-01g4he5cbx1kwhgvdme1s560dw|
+-----------------------------+-------------------------------------------------------------------+-------------------------------+

I would like to read each row from `file_path` and write the result to
another dataframe containing `entity_id`, `other_useful_id`,
`json_content`, `file_path`.
Assume that I already have the required HDFS url libraries in my classpath.

Please advice,
Muthu

Reply via email to