Flink’s version is hosted here: https://github.com/dataArtisans/frocksdb
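If in doubt about which build actually ends up on the classpath when running the test code from my earlier mail, one quick JDK-only check is to print where the RocksDB classes were loaded from. A small sketch (the class name is made up):

import org.rocksdb.RocksDB;

// Prints the jar that the RocksDB classes were loaded from, so you can tell
// the stock rocksdbjni apart from Flink's forked build.
public class WhichRocksDb {
    public static void main(String[] args) {
        System.out.println(RocksDB.class.getProtectionDomain()
                .getCodeSource().getLocation());
    }
}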
> On 26.05.2017, at 19:59, Jason Brelloch <jb.bc....@gmail.com> wrote:
>
> Thanks for looking into this, Stefan. We are moving forward with a different strategy for now. If I want to take a look at this, where do I go to get the Flink version of RocksDB?
>
> On Fri, May 26, 2017 at 1:06 PM, Stefan Richter <s.rich...@data-artisans.com> wrote:
> I forgot to mention that you need to run this with Flink’s version of RocksDB, as the stock version already cannot keep up with the inserts because its implementation of the merge operator has a performance problem.
>
> Furthermore, I think a higher multiplier than *2 is required on num (and/or a smaller modulo on the key bytes) to trigger the problem; I noticed that I had run it multiple times, so the state added up to bigger sizes over the runs.
>
>> On 26.05.2017, at 18:42, Stefan Richter <s.rich...@data-artisans.com> wrote:
>>
>> I played around a bit with your info, and this now looks like a general problem in RocksDB to me, or more specifically in the JNI bridge between RocksDB and Java. I could reproduce the issue with the following simple test code:
>>
>> import java.io.File;
>> import java.nio.charset.StandardCharsets;
>> import java.util.Arrays;
>>
>> import org.rocksdb.CompactionStyle;
>> import org.rocksdb.Options;
>> import org.rocksdb.RocksDB;
>> import org.rocksdb.RocksIterator;
>> import org.rocksdb.StringAppendOperator;
>> import org.rocksdb.WriteOptions;
>>
>> File rocksDir = new File("/tmp/rocks");
>> final Options options = new Options()
>>         .setCreateIfMissing(true)
>>         .setMergeOperator(new StringAppendOperator())
>>         .setCompactionStyle(CompactionStyle.LEVEL)
>>         .setLevelCompactionDynamicLevelBytes(true)
>>         .setIncreaseParallelism(4)
>>         .setUseFsync(false)
>>         .setMaxOpenFiles(-1)
>>         .setAllowOsBuffer(true)
>>         .setDisableDataSync(true);
>>
>> final WriteOptions write_options = new WriteOptions()
>>         .setSync(false)
>>         .setDisableWAL(true);
>>
>> try (final RocksDB rocksDB = RocksDB.open(options, rocksDir.getAbsolutePath())) {
>>     final String key = "key";
>>     final String value = "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ7890654321";
>>
>>     // reserve one extra byte so the key can be varied per merge
>>     byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
>>     keyBytes = Arrays.copyOf(keyBytes, keyBytes.length + 1);
>>     final byte[] valueBytes = value.getBytes(StandardCharsets.UTF_8);
>>     final int num = (Integer.MAX_VALUE / valueBytes.length) * 2;
>>
>>     System.out.println("begin insert");
>>
>>     final long beginInsert = System.nanoTime();
>>     for (int i = 0; i < num; i++) {
>>         keyBytes[keyBytes.length - 1] = (byte) (i % 9); // spread the merges over 9 keys
>>         rocksDB.merge(write_options, keyBytes, valueBytes);
>>     }
>>     final long endInsert = System.nanoTime();
>>     System.out.println("end insert - duration: " + ((endInsert - beginInsert) / 1_000_000) + " ms");
>>
>>     final long beginGet = System.nanoTime();
>>     try (RocksIterator iterator = rocksDB.newIterator()) {
>>         iterator.seekToFirst();
>>
>>         while (iterator.isValid()) {
>>             byte[] bytes = iterator.value(); // read the current entry before advancing
>>             System.out.println(bytes.length + " " + bytes[bytes.length - 1]);
>>             iterator.next();
>>         }
>>     }
>>     final long endGet = System.nanoTime();
>>
>>     System.out.println("end get - duration: " + ((endGet - beginGet) / 1_000_000) + " ms");
>> }
>>
>> Depending on how smoothly the 1.3 release goes, maybe I will find some time next week to take a closer look into this. If this is urgent, please also feel free to report the problem to the RocksDB issue tracker already.
>>
>> Best,
>> Stefan
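>> PS: For a rough sense of the sizes involved in the test above, here is a back-of-the-envelope sketch. It assumes that the StringAppendOperator joins the merged values with a one-byte ',' delimiter:
>>
>> final int valueLen = 72;                                      // length of the test value string
>> final long totalMerges = (Integer.MAX_VALUE / valueLen) * 2L; // num from the test code
>> final long mergesPerKey = totalMerges / 9;                    // merges are spread over i % 9 keys
>> final long bytesPerKey = mergesPerKey * (valueLen + 1);       // value bytes plus one delimiter byte per merge
>> System.out.println(bytesPerKey / (1024 * 1024) + " MiB per merged value");
>>
>> That works out to roughly 460 MiB per key with *2 and %9, still well below the ~2 GiB byte[] limit, which is why a larger multiplier and/or a smaller modulo is needed to actually hit it.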
>>
>>> On 26.05.2017, at 16:40, Jason Brelloch <jb.bc....@gmail.com> wrote:
>>>
>>> ~2 GB was the total state in the backend. The total number of keys in the test is 10, with an approximately even distribution of state across the keys, and a parallelism of 1, so all keys are on the same taskmanager. We are using ListState, and the number of elements per list would be about 500000.
>>>
>>> On Fri, May 26, 2017 at 10:20 AM, Stefan Richter <s.rich...@data-artisans.com> wrote:
>>> Hi,
>>>
>>> what does "our state" mean in this context? The total state in the backend, or the state under one key? If you use, e.g., list state, I could see the state for one key growing above 2 GB. However, we retrieve the state back from RocksDB as Java byte arrays (in your stacktrace, while making a checkpoint), and those are bounded to a maximum size of 2 GB (Integer.MAX_VALUE bytes); maybe that is what goes wrong in JNI when you try to go beyond that limit. Could that be the reason for your problem?
>>>
>>>> On 26.05.2017, at 15:50, Robert Metzger <rmetz...@apache.org> wrote:
>>>>
>>>> Hi Jason,
>>>>
>>>> This error is unexpected. I don't think it's caused by insufficient memory. I'm including Stefan in the conversation; he's the RocksDB expert :)
>>>>
>>>> On Thu, May 25, 2017 at 4:15 PM, Jason Brelloch <jb.bc....@gmail.com> wrote:
>>>> Hey guys,
>>>>
>>>> We are running into a JVM crash on checkpointing when our RocksDB state reaches a certain size on a taskmanager (about 2 GB). The issue happens both with a Hadoop backend and when just writing to a local file.
>>>>
>>>> We are running on Flink 1.2.1.
>>>>
>>>> #
>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>> #
>>>> # SIGSEGV (0xb) at pc=0x00007febf4261b42, pid=1, tid=0x00007fead135f700
>>>> #
>>>> # JRE version: Java(TM) SE Runtime Environment (8.0_131-b11) (build 1.8.0_131-b11)
>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops)
>>>> # Problematic frame:
>>>> # V  [libjvm.so+0x6d1b42]  jni_SetByteArrayRegion+0xc2
>>>> #
>>>> # Core dump written. Default location: //core or core.1
>>>> #
>>>> # An error report file with more information is saved as:
>>>> # /tmp/hs_err_pid1.log
>>>> #
>>>> # If you would like to submit a bug report, please visit:
>>>> # http://bugreport.java.com/bugreport/crash.jsp
>>>> #
>>>>
>>>> Is this an issue with not enough memory? Or maybe not enough memory allocated to RocksDB?
>>>>
>>>> I have attached the taskmanager logs and the core dump. The jobmanager logs just say taskmanager lost/killed.
>>>>
>>>> Thanks!
>>>>
>>>> --
>>>> Jason Brelloch | Product Developer
>>>> 3405 Piedmont Rd. NE, Suite 325, Atlanta, GA 30305
>>>
>>> --
>>> Jason Brelloch | Product Developer
>>> 3405 Piedmont Rd. NE, Suite 325, Atlanta, GA 30305
>>
>
> --
> Jason Brelloch | Product Developer
> 3405 Piedmont Rd. NE, Suite 325, Atlanta, GA 30305
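For completeness, the usage described in the quoted mails above (ListState, roughly 10 keys, about 500,000 elements per list, parallelism 1) boils down to something like the following minimal sketch. Class, field, and state names are made up and it uses Flink 1.2-style APIs; it only illustrates the access pattern, not the actual job:

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Collects every incoming element into per-key list state. With the RocksDB
// backend, each add() becomes a RocksDB merge on the key's single entry, and
// reading the list back materializes the whole merged value as one byte[] on
// the JNI side.
public class CollectPerKey extends RichFlatMapFunction<String, String> {

    private transient ListState<String> elements;

    @Override
    public void open(Configuration parameters) {
        elements = getRuntimeContext().getListState(
                new ListStateDescriptor<>("elements", String.class));
    }

    @Override
    public void flatMap(String value, Collector<String> out) throws Exception {
        elements.add(value); // ~500,000 adds per key in the reported setup
    }
}

Used as stream.keyBy(...).flatMap(new CollectPerKey()) with only a handful of distinct keys, each key's merged RocksDB value keeps growing, and reading or snapshotting the list has to pull the whole value through JNI as a single byte array, which is where the roughly 2 GB array limit discussed above comes into play.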