Hi Tommy,

Thanks for reporting this. We can definitely code more defensively here. I am just wondering whether there is a specific reason you call close() on the RocksDB store explicitly. As you noticed, SamzaContainer#shutdownStores already calls flush() and close() automatically. Does it work for you if you remove the explicit store close() calls from your CloseableTask implementation?
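
To illustrate the kind of guard being discussed, here is a minimal sketch of a wrapper that makes a second close() harmless and skips flush() once the store is closed. It is not Samza's actual code: SimpleStore, GuardedStore and CountingStore are hypothetical names made up for this example, and the flush-then-close order only mirrors the behavior described in the report below.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical minimal store interface; NOT Samza's KeyValueStore API.
interface SimpleStore {
    void flush();
    void close();
}

// Wrapper that makes close() idempotent and ignores flush() after close.
// Assumption: flush() and close() are called from one thread at shutdown;
// this does not protect against a flush racing with a concurrent close.
final class GuardedStore implements SimpleStore {
    private final SimpleStore delegate;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    GuardedStore(SimpleStore delegate) {
        this.delegate = delegate;
    }

    @Override
    public void flush() {
        if (closed.get()) {
            return; // never touch a store whose native handle may be released
        }
        delegate.flush();
    }

    @Override
    public void close() {
        // Only the first caller flips the flag and performs the real close.
        if (!closed.compareAndSet(false, true)) {
            return;
        }
        delegate.flush();
        delegate.close();
    }

    boolean isClosed() {
        return closed.get();
    }
}

// Stand-in for the native store; it only counts calls so the effect is visible.
final class CountingStore implements SimpleStore {
    int flushes = 0;
    int closes = 0;

    @Override
    public void flush() {
        flushes++;
    }

    @Override
    public void close() {
        closes++;
    }
}
```

With this wrapper, closing once from the task and once more from the container reaches the underlying store only the first time, so the later flush() never runs against a closed DB.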

Thanks!

-Yi

On Wed, Sep 14, 2016 at 7:56 AM, Tommy Becker <tobec...@tivo.com> wrote:

> While testing with Samza 0.10.1 I noticed the following crash whenever I
> would kill a job that uses a RocksDB store:
>
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x00007eff66b6c27e, pid=20315, tid=139636974364416
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_51-b16) (build 1.8.0_51-b16)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.51-b03 mixed mode linux-amd64 compressed oops)
> # Problematic frame:
> # C [librocksdbjni2253915919401340417..so+0x11427e] rocksdb_flush_helper(JNIEnv_*, rocksdb::DB*, rocksdb::FlushOptions const&, rocksdb::ColumnFamilyHandle*)+0x1e
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core
> # dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /home/tommy/projects/ffs/ffs-stream-jobs/target/ffs-stream-jobs-8.1.4.0-SNAPSHOT-dist/ffs-stream-jobs/hs_err_pid20315.log
> #
> # If you would like to submit a bug report, please visit:
> # http://bugreport.java.com/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
>
> I was able to tie this back to the RocksDB store being closed by both our
> StreamTask and the SamzaContainer. We always close stores via
> CloseableTask#close just for housekeeping purposes. Prior to this issue I
> was not aware that this also happens automatically in
> SamzaContainer#shutdownStores. When closed, KeyValueStorageEngine first
> calls flush() on the underlying store and there is no guard to ensure that
> close has not already been called. The flush() call on a closed DB is what
> seems to cause the crash. Obviously RocksDB should handle this more
> gracefully, but I wonder if a patch is warranted for Samza also. Thoughts?
>
> --
> Tommy Becker
> Senior Software Engineer
>
> Digitalsmiths
> A TiVo Company
>
> www.digitalsmiths.com
> tobec...@tivo.com