[ 
https://issues.apache.org/jira/browse/ARROW-17894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647674#comment-17647674
 ] 

Carl Boettiger commented on ARROW-17894:
----------------------------------------

I no longer get the error when I use readr::read_file(). Unfortunately, I 
still don't seem to get access to the bucket:


{code:r}
bucket <- arrow::gs_bucket("neon-is-transition-output",
                 json_credentials = readr::read_file("service_account.json"))
bucket$ls(){code}
is just empty: no error, no files.  I'm not sure what is going wrong or how to 
debug it.

Note that I can use this same service_account.json to list files in this same 
bucket with rclone just fine, and I can use HMAC credentials with 
arrow::s3_bucket() to successfully list the bucket contents too, so I don't 
think the problem is with my bucket or my service_account.json.

Note that the arrow::s3_bucket() strategy with HMAC credentials is not a viable 
workaround for Google Cloud, since Google's implementation of the S3 API 
suffers from numerous documented deviations from the S3 standard that cause 
most operations to fail; see, e.g., 
[https://gist.github.com/harshavardhana/5d4d7410afb02ec9837e2b4d82a932e7].

So I'd really love native GCS support in arrow.  (The bucket indicated above 
contains data from the National Science Foundation's National Ecological 
Observatory Network; nearly half a billion taxpayer dollars went into creating 
this data, and they chose to host it on Google Cloud Storage, so we'd love to 
be able to access it with an amazing tool like arrow.)

> [R] Documentation for json_credentials is misleading
> ----------------------------------------------------
>
>                 Key: ARROW-17894
>                 URL: https://issues.apache.org/jira/browse/ARROW-17894
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 9.0.1
>            Reporter: Joran Elias
>            Priority: Major
>
> For authenticating with GCS via a JSON credentials file, the documentation 
> under ?FileSystem for GcsFileSystem$create() says:
>  
>  * {{json_credentials}}: optional string for authentication. Point to a 
> JSON credentials file downloaded from GCS.
>  
> Additionally, the GCS Authentication section of Working with Cloud Storage 
> (S3, GCS) in the file system vignette says:
> {quote}or {{json_credentials}}, to reference a downloaded credentials 
> file.
> {quote}
>  
> Both of these seem to imply that json_credentials expects a path to a JSON 
> credentials file downloaded from GCP. However, when a file path is provided 
> you get an invalid argument error:
>  
> {code:r}
> > bucket <- gs_bucket(bucket = 'pinned_data',json_credentials = json_path)
> > bucket$ls(recursive = TRUE)
> Error: Invalid: google::cloud::Status(INVALID_ARGUMENT: Permanent error in 
> ListObjects: Invalid ServiceAccountCredentials,parsing failed on data loaded 
> from memory). Detail: [errno 22] Invalid argument
> {code}
>  
> However, if you pass a string containing the raw JSON from the file itself, 
> the above code snippet works and returns the names of the objects in the 
> bucket.
> Both sections of the documentation should be clarified to explicitly say that 
> the argument expects the actual JSON rather than a file path to the JSON file.
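> A minimal sketch of the pattern the report describes as working: read the 
> file's contents first and pass the resulting JSON string (the path and 
> bucket name below are just the ones from the snippets above):
> {code:r}
> # json_credentials expects the JSON text itself, not a path to the file.
> json_string <- paste(readLines("service_account.json"), collapse = "\n")
> bucket <- arrow::gs_bucket(bucket = 'pinned_data',
>                            json_credentials = json_string)
> bucket$ls(recursive = TRUE)
> {code}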



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
