IIRC, strong consistency in S3 means that a read issued after a successful write sees the new data immediately, rather than eventually. Also, this applies only to Amazon S3, not necessarily to other products that present an S3-compatible API.

Performing a sync on the WAL means that all data written to the file so far has been flushed to disk. The newer Hadoop S3 file system implementation (S3A) emits a warning that sync is not supported.
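
To illustrate what "sync" buys you, here is a rough local-file analogy in Python. This is not Accumulo's WAL code, and it does not touch Hadoop or S3; the helper names (append_durably, read_records) and the file name wal.log are made up for the sketch. The point is only that a durable append is a write followed by an explicit flush to the storage device, and it is that second step an object store cannot offer on a partially written object.

```python
import os
import tempfile


def append_durably(path, record):
    """Append one record and return only after the OS reports it is on disk."""
    with open(path, "ab") as wal:
        wal.write(record)
        wal.flush()             # push Python's user-space buffer to the OS
        os.fsync(wal.fileno())  # ask the OS to flush its cache to the device


def read_records(path):
    """Read back every record, as a recovery process would after a crash."""
    with open(path, "rb") as wal:
        return wal.read().splitlines()


# Hypothetical log file in a scratch directory.
wal_path = os.path.join(tempfile.mkdtemp(), "wal.log")
append_durably(wal_path, b"mutation-1\n")
append_durably(wal_path, b"mutation-2\n")
recovered = read_records(wal_path)
```

Without the flush and fsync calls, a crash between the write and the eventual close could lose the record even though the writer believed it had been logged. That is the failure mode to worry about when the WAL lives on a file system that cannot sync.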

On Oct 12, 2022 8:25 PM, Christopher <ctubb...@apache.org> wrote:
Does S3 need sync if it's "strongly consistent"? Does "strongly" imply "immediate"? If so, you might just need to use a noop log closer. But I'm not sure.

On Wed, Oct 12, 2022, 20:05 <dlmarion@comcast.net> wrote:

I believe that S3 Guard is OBE, but you still need to put the WAL on HDFS, as S3 does not support sync. If you put your WAL in S3 and you have a tserver failure, it’s possible that you will lose data.

 

From: Christopher <ctubbsii@apache.org>
Sent: Wednesday, October 12, 2022 4:12 PM
To: user@accumulo.apache.org
Subject: Re: Accumulo On S3

 

Since S3 became strongly consistent, I think it would probably just work. But, obviously, we can't make any guarantees, especially about the behavior of software outside of our control. So, your experience may vary.

 

On Wed, Oct 12, 2022 at 12:28 PM Josh Clum <joshclum@gmail.com> wrote:

Hi,

 

Question on this post: https://accumulo.apache.org/blog/2019/09/10/accumulo-S3-notes.html 

 

It's been a long time since that post, and it looks like several of the merge requests have been merged into master.

 

The point about S3 Guard being needed seems OBE since S3 is strongly consistent and S3 Guard is deprecated at this point: https://issues.apache.org/jira/browse/HADOOP-17480

 

Should Accumulo on S3, with the metadata table and WALs in S3, work, or are there still commits somewhere that won't be available until 2.1.0?

 

Thanks,

Josh

