Hi Dominic,

Judging from this code, the following conditions return -1:
    path == null
    path == ""
    path does not exist
    totalSpace == 0

The original environment is gone, so my guess is that the cause is more likely one of the two in bold.

Our node had this problem earlier because a bug on the business side flooded the disk with a huge volume of messages. After we stopped the service and deleted the files, the node did not recover and kept reporting "full" (I did not note whether it was "so reclaim space" or "so mark disk full"). Deleting shm had no effect; a reboot fixed it afterwards. However, after deleting the directory we created the commitlog and consumequeue subdirectories ourselves, so it is also possible that some operation bypassed the "auto-create" trigger and led to the "not exist" branch. We will look into it more closely if it happens again.

best regards,

Jason

> On Sep 4, 2018, at 14:09, 亓杨 <[email protected]> wrote:
>
> `2018-09-03 17:53:28 INFO StoreScheduledThread1 - physic disk maybe full soon, so reclaim space, -1.0`
> The error log says "so reclaim space, -1.0".
> The -1.0 comes from the following code:
> public static double getDiskPartitionSpaceUsedPercent(final String path) {
>     if (null == path || path.isEmpty())
>         return -1;
>
>     try {
>         File file = new File(path);
>
>         if (!file.exists())
>             return -1;
>
>         long totalSpace = file.getTotalSpace();
>
>         if (totalSpace > 0) {
>             long freeSpace = file.getFreeSpace();
>             long usedSpace = totalSpace - freeSpace;
>
>             return usedSpace / (double) totalSpace;
>         }
>     } catch (Exception e) {
>         return -1;
>     }
>
>     return -1;
> }
> The configured storePathCommitLog is the argument passed to this method. My guess is that the user that starts RocketMQ may not have permission on the path you configured?
> With the changed configuration, storePathCommitLog falls back to the default:
> System.getProperty("user.home") + File.separator + "store"
>     + File.separator + "commitlog";
>
> [email protected] <[email protected]> wrote on Tue, Sep 4, 2018 at 1:52 PM:
> Hi, Jason
> The problem is solved; it was caused by the following configuration items.
> Configuration that had the problem:
> rocketmqHome = /opt/data/rocketmq/
> storePathRootDir = /opt/data/rocketmq/data/
> storePathCommitLog=/opt/data/rocketmq/data/commitlog/
> storePathConsumerQueue=/opt/data/rocketmq/data/consumequeue/
> Configuration that fixed it:
> rocketmqHome = /opt/data/rocketmq
> storePathRootDir = /opt/data/rocketmq/store
>
> We still do not know the root cause, though; it simply started working once the configuration was changed.
> Thanks again for your reply and suggestions.
> Best regards.
> Will.
>
> On 2018/09/04 02:02:18, Jason Joo <[email protected]> wrote:
> > hi, will,
> >
> > Two things to note:
> >
> > 1. Check that the configuration file and command arguments are correct.
> > 2. In this situation, delete the entire store data directory before starting, for a "refresh" boot.
> >
> > best regards,
> >
> > Jason
> >
> > > On Sep 4, 2018, at 09:59, [email protected] wrote:
> > >
> > > Hi, Jason
> > > Very sorry to bother you again. I rebooted all of the virtual machines we use, cleared the data, and restarted the cluster, but the problem is still there. We are using 2m-2s-async. Only one of the masters is affected: right after startup its broker.log is empty and only store.log has entries, all of them like this:
> > > 2018-09-04 09:47:29 WARN StoreScheduledThread1 - disk space will be full soon, but delete file failed.
> > > 2018-09-04 09:47:39 WARN StoreScheduledThread1 - disk space will be full soon, but delete file failed.
> > > 2018-09-04 09:47:49 WARN StoreScheduledThread1 - disk space will be full soon, but delete file failed.
> > > The other brokers are fine, which is very strange.
> > > Sorry again for the trouble.
> > >
> > > Best regards.
> > > Will.
> > >
> > > On 2018/09/03 11:58:48, Jason Joo <[email protected]> wrote:
> > >> Yes, a physical reboot.
> > >>
> > >> I ran into this once before after force-deleting data; deleting shm did not help either. I did not dig into the exact mechanism at the time and simply rebooted to bring the node back quickly.
> > >>
> > >> best regards,
> > >>
> > >> Jason
> > >>
> > >>> On Sep 3, 2018, at 18:39, [email protected] wrote:
> > >>>
> > >>> I just remembered that I did not restart namesrv, but this should be unrelated to namesrv, right? I only restarted the broker and cleared the data.
> > >>>
> > >>> On 2018/09/03 10:22:55, Jason Joo <[email protected]> wrote:
> > >>>> Some cache somewhere was probably not cleared. This usually happens when data files are force-deleted after a forced stop, and rebooting the node (the OS) usually fixes it.
> > >>>>
> > >>>> best regards,
> > >>>>
> > >>>> Jason
> > >>>>
> > >>>>> On Sep 3, 2018, at 18:04, [email protected] wrote:
> > >>>>>
> > >>>>> On 2018/09/03 10:01:33, [email protected] <[email protected]> wrote:
> > >>>>>> The detailed log is as follows:
> > >>>>>> 2018-09-03 17:53:18 WARN StoreScheduledThread1 - disk space will be full soon, but delete file failed.
> > >>>>>> 2018-09-03 17:53:28 INFO StoreScheduledThread1 - physic disk maybe full soon, so reclaim space, -1.0
> > >>>>>> 2018-09-03 17:53:28 INFO StoreScheduledThread1 - begin to delete before 24 hours file. timeup: false spacefull: true manualDeleteFileSeveralTimes: 0 cleanAtOnce: false
> > >>>>>> INFO AllocateMappedFileService - /opt/push/rocketmq/store/commitlog mkdir OK
> > >>>>>> 2018-07-31 16:55:51 INFO StoreScheduledThread1 - logics disk maybe full soon, so reclaim space, -1.0
> > >>>>>> 2018-07-31 16:55:51 INFO StoreScheduledThread1 - begin to delete before 120 hours file. timeup: false spacefull: true manualDeleteFileSeveralTimes: 0 cleanAtOnce: false
> > >>>>>> 2018-07-31 16:55:51 WARN StoreScheduledThread1 - disk space will be full soon, but delete file failed.
> > >>>>>>
> > >>>>>> After stopping the whole cluster and clearing all the data, starting the cluster again still reports this error. What could the cause be? Could the experts please take a look?
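
Since the quoted getDiskPartitionSpaceUsedPercent returns the same -1 for every failure, the log alone cannot say which of the four cases listed at the top of this thread was hit. A minimal sketch of a standalone check follows. It mirrors the branches of the quoted method but returns a label instead of -1. The class name DiskUsageProbe and the method diagnose are hypothetical and not part of RocketMQ; only standard java.io.File calls are used, and the default path is the one quoted in the thread.

```java
import java.io.File;

// Hypothetical helper, not part of RocketMQ. It follows the same branches as
// the quoted getDiskPartitionSpaceUsedPercent and names the one that is taken.
class DiskUsageProbe {

    static String diagnose(String path) {
        if (path == null) {
            return "path is null";
        }
        if (path.isEmpty()) {
            return "path is empty";
        }
        try {
            File file = new File(path);
            if (!file.exists()) {
                return "path does not exist";
            }
            long totalSpace = file.getTotalSpace();
            if (totalSpace <= 0) {
                // getTotalSpace() returns 0 when the size cannot be obtained
                return "totalSpace is 0";
            }
            long freeSpace = file.getFreeSpace();
            double used = (totalSpace - freeSpace) / (double) totalSpace;
            return "ok, used ratio = " + used;
        } catch (Exception e) {
            // e.g. a SecurityException thrown by exists()
            return "exception: " + e;
        }
    }

    public static void main(String[] args) {
        // Default taken from the thread: user.home/store/commitlog
        String path = args.length > 0
                ? args[0]
                : System.getProperty("user.home") + File.separator + "store"
                        + File.separator + "commitlog";
        System.out.println(path + " -> " + diagnose(path));
    }
}
```

Running it as the same OS user that starts the broker, with the configured storePathCommitLog as the argument, should show whether the path is visible to that user at all.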
