Hi,

I have a question regarding the initialization/finalization of the S3 
filesystem within the Arrow filesystem library.  Apologies if this has been 
raised before; I searched but did not turn up an answer.  I did read the thread 
that discussed init/finalize, but nothing I found made clear when the finalize 
method was added.  I thought I read that it appeared around version 12.0.0, but 
I am not certain.  That is a side note, really; I am curious because we had 
been using an old version of the libraries (8.0.0), which did not have it.  But 
I digress.

My issue, and my question, concerns timing.  The aforementioned thread made it 
clear that the init/finalize calls should take place at the beginning and the 
end of main():


// Snipped for brevity
#include <arrow/filesystem/s3fs.h>  // declares InitializeS3() / FinalizeS3()

int main()
{
    // More snipping (globalOptions is set up here)
    arrow::Status initializeStatus = arrow::fs::InitializeS3(globalOptions);
    ...
    arrow::Status finalizeStatus = arrow::fs::FinalizeS3();
} /* end of your main() entry point */


The thread also made it clear that this bookended init/finalize should not be 
placed inside a class, for example in its constructor and destructor 
respectively.

So OK.  I am not familiar with the reason this structure became "a thing" 
within the Arrow filesystem library, but that is how it works now.  Admittedly, 
I would like to know why it is done this way, but that is tangential to my 
issue.  Now for my question: this is all well and good when you are developing 
your own stand-alone program.  However, what happens when you live in an 
embedded world in which your code lies many layers below main() and you have no 
access to main(), even if you wanted to follow the prescribed pattern?  We are 
expected to wind up and then wind down on demand, allocating and then freeing 
all resources each time.  I pulled the init/finalize out to the outermost layer 
that I have any involvement with, yet I see the following error messages:


2024-11-26T04:55:10,917 DEBUG [00000007] () App.parquet - Could not create a 
AWS filesystem object
2024-11-26T04:55:10,917 DEBUG [00000007] () App.parquet - parquetFileReader): 
Exception exit, reason = Unable to create a file system object on AWS server: 
Invalid: S3 subsystem is finalized


This occurs because the first spool-up/spool-down works, but when we are called 
again sometime thereafter, the finalize method has already done its thing, so 
we cannot initialize again.  I know why this is occurring; that part is 
straightforward and I don't need an explanation for it.  The question is what I 
can do about it in my environment, where no access to main() is available and 
we must come into and go out of existence on demand.  Surely I am not the only 
one in this development scenario who has faced this issue.  So what is the 
solution here?  Has anyone else faced this?  Help?
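
To make the sequence concrete, here is a minimal sketch of the lifecycle as I 
understand it.  It does not call Arrow at all: FakeS3Subsystem and 
RunOnDemandCycle are hypothetical stand-ins I made up for illustration, and the 
only behaviour modelled is what the log above shows, namely that once the 
subsystem is finalized it stays finalized.  Where exactly the real library 
raises the error is an assumption on my part.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the S3 subsystem; this is NOT Arrow code.
// It models only the one-way lifecycle observed in the log:
// uninitialized -> initialized -> finalized (no way back).
enum class S3State { kUninitialized, kInitialized, kFinalized };

struct FakeS3Subsystem {
    S3State state = S3State::kUninitialized;

    // Each method returns an empty string on success, else an error message.
    std::string Initialize() {
        if (state == S3State::kFinalized) {
            return "Invalid: S3 subsystem is finalized";
        }
        state = S3State::kInitialized;
        return "";
    }

    std::string MakeFileSystem() const {
        if (state == S3State::kFinalized) {
            return "Invalid: S3 subsystem is finalized";
        }
        if (state != S3State::kInitialized) {
            return "not initialized";  // made-up message for the stand-in
        }
        return "";
    }

    void Finalize() { state = S3State::kFinalized; }
};

// One on-demand cycle as our layer performs it: init, use, finalize.
// Returns the first error encountered, or an empty string on success.
std::string RunOnDemandCycle(FakeS3Subsystem& s3) {
    std::string err = s3.Initialize();
    if (err.empty()) {
        err = s3.MakeFileSystem();
    }
    s3.Finalize();  // we always tear down on the way out
    return err;
}
```

Calling RunOnDemandCycle() twice on the same FakeS3Subsystem succeeds the first 
time and fails the second time with the "S3 subsystem is finalized" message, 
which is the pattern we hit.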

Thanks,
Jerry
