> Admittedly, I would like to know why this is being done in this fashion,
> but that is tangential to my issue.

IIRC, this is a limitation imposed on us by the AWS C++ SDK.  See [1].  The
AWS C++ SDK has static state that it does not manage with static local
variables.  As a result, the initialization and finalization order is
(IIRC) undefined (or at least not very well defined).
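
For readers unfamiliar with the distinction, here is a minimal sketch of the
general C++ idiom of managing shared state with a static local variable.  It
is not AWS SDK or Arrow code; Registry, Registrar and g_registrar are
hypothetical names made up for illustration.  A function-local static is
constructed the first time the function is called, so its initialization
point is well defined even when the caller is itself a global object.

```cpp
#include <string>
#include <vector>

// Function-local static: constructed on the first call to Registry(),
// so every caller sees a fully constructed object.
std::vector<std::string>& Registry() {
  static std::vector<std::string> registry;
  return registry;
}

// Hypothetical type whose constructor touches the shared state.
struct Registrar {
  explicit Registrar(const std::string& name) { Registry().push_back(name); }
};

// Namespace-scope object: its construction order relative to globals in
// other translation units is unspecified, but calling Registry() from its
// constructor is still safe because of the function-local static above.
static Registrar g_registrar("s3");
```

Function-local statics are also destroyed in the reverse order of their
construction, which is what makes teardown predictable with this idiom.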

> Now for my question: this is all fine and well in the context of
> developing your own stand-alone program and such.
> However, what happens when you live in an embedded world in which your
> code lies many layers below main() and
> you don't have access to main(), even if you wanted to follow the
> prescribed pattern?  I mean, we are expected to wind
> up and then down in an on-demand fashion, allocating and then freeing all
> resources respectively.  I pulled the init/finalize
> out to the outermost layer that I have any involvement with, yet I see
> the following error messages

I'm not familiar with embedded programming models.  Is there a main
somewhere?  If so, can you pass the responsibility on to your caller
(whoever has the main)?  Or does some kind of component-level
initialization exist?

If not, then you can try to play games with static variables, but I think
that would violate "freeing all resources respectively".  However,
Arrow-C++ itself has static state (e.g. CPU & I/O thread pools), so unless
you are unloading the library, it's not clear that you will be freeing all
resources anyway.
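
To make "play games with static variables" a bit more concrete, here is a
rough sketch of one way it could look.  It is only an illustration under
stated assumptions: the fake_s3 namespace is a stand-in I made up so the
snippet compiles on its own (real code would call arrow::fs::InitializeS3
there), and EnsureS3Ready and ParquetReaderComponent are hypothetical names.
The idea is that the component initializes S3 lazily, exactly once per
process, and deliberately does not finalize when it winds down, so a later
wind-up does not hit "S3 subsystem is finalized".

```cpp
// Hypothetical stand-ins for the library's init/finalize pair; real code
// would call arrow::fs::InitializeS3 / arrow::fs::FinalizeS3 instead.
namespace fake_s3 {
int init_calls = 0;
int finalize_calls = 0;
bool Initialize() { ++init_calls; return true; }
void Finalize() { ++finalize_calls; }
}  // namespace fake_s3

// Runs the initializer the first time any component asks for it.
// Initialization of a function-local static is thread-safe since C++11.
bool EnsureS3Ready() {
  static const bool ok = fake_s3::Initialize();
  return ok;
}

// Hypothetical component that is created and destroyed on demand.
class ParquetReaderComponent {
 public:
  ParquetReaderComponent() : ready_(EnsureS3Ready()) {}
  // Intentionally no finalize call here: finalizing on teardown is what
  // makes the next wind-up fail.
  ~ParquetReaderComponent() = default;
  bool ready() const { return ready_; }

 private:
  bool ready_;
};
```

The trade-off is the one mentioned above: the S3 state stays alive for the
rest of the process, so this does not free everything on wind-down, and
something still has to decide whether and when finalize is ever called.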

[1]
https://docs.aws.amazon.com/sdk-for-cpp/v1/developer-guide/basic-use.html

On Sun, Dec 1, 2024 at 1:02 PM Jerry Adair <jerry.ad...@sas.com.invalid>
wrote:

> Hi,
>
> I have a question regarding the initialization/finalization of the S3
> filesystem within the Arrow filesystem library.  Apologies if this question
> has been raised in the past; I did perform a search but that search didn't
> turn up anything.  I did read the thread that discussed the issue of
> init/finalize, though nothing I found made it clear when the addition of
> the finalize method surfaced.  I thought I read mention that it occurred
> around version 12.0.0, but not certain.  That's just a side note really, I
> am curious to know when it came about, because we had been using an old
> version of the libraries (8.0.0) and it didn't exist within that version.
> But I digress.
>
> So my issue and the question I have surrounds this notion of timing.  The
> aforementioned thread that I read made it clear that the init/finalize
> should take place at the beginning and the end of main():
>
>
> // Snipped for brevity reasons
> int main()
> {
>    // More snipping
>     arrow::Status   initializeStatus = arrow::fs::InitializeS3(
> globalOptions );
> ...
>    arrow::Status   finalizeStatus = arrow::fs::FinalizeS3();
> } /* end of your main() entry point*/
>
>
> The thread also made it clear that this bookended init/finalize should not
> occur within a class definition, most likely in the constructor/destructor
> respectively.
>
> So OK.  While I am not familiar with the reason that this structure became
> "a thing" within the Arrow filesystem library, it is indeed that way now.
> Admittedly, I would like to know why this is being done in this fashion,
> but that is tangential to my issue.  Now for my question: this is all fine
> and well in the context of developing your own stand-alone program and
> such.  However, what happens when you live in an embedded world in which
> your code lies many layers below main() and you don't have access to
> main(), even if you wanted to follow the prescribed pattern?  I mean, we
> are expected to wind up and then down in an on-demand fashion, allocating
> and then freeing all resources respectively.  I pulled the init/finalize
> out to the outermost layer that I have any involvement with, yet I see the
> following error messages:
>
>
> 2024-11-26T04:55:10,917 DEBUG [00000007] () App.parquet - Could not create
> a AWS filesystem object
> 2024-11-26T04:55:10,917 DEBUG [00000007] () App.parquet -
> parquetFileReader): Exception exit, reason = Unable to create a file system
> object on AWS server: Invalid: S3 subsystem is finalized
>
>
> This occurs because the first spool-up/spool-down worked successfully, but
> then when we are called sometime thereafter, the finalize method has
> already done its thing, thus we can't initialize again.  Obviously, I know
> why this is occurring, that is straightforward, I don't need an explanation
> for that.  The question is what can I do about this in my environment where
> no access to main() is available and we must exist/not-exist on-demand?
> Surely I am not the only one in this development scenario who has been
> faced with this issue.  So what is the solution here?  Anyone else faced
> this?  Help?
>
> Thanks,
> Jerry
>
