Re: disk will be full soon, but delete file failed

Jason Joo Mon, 03 Sep 2018 23:28:14 -0700

hi, dominic

Judging from this code, the following conditions return -1:


path == null
path == ""
path does not exist
totalSpace == 0

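Since the original state is no longer available to inspect, my guess is that the cause is more likely one of the two conditions marked in bold.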

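Our node ran into trouble earlier because a bug on the business side flooded the disk with a huge volume of messages. After we stopped the service and deleted the files, it did not recover and kept reporting "full" (I did not notice whether it was "so reclaim space" or "so mark disk full"). Deleting shm had no effect, and a reboot fixed it afterwards. However, after deleting the directory we created the commitlog and consumequeue subdirectories ourselves, so it is also possible that some operation bypassed the "auto-create" trigger and ended up in the "not exist" branch. If it happens again, we will look into it more closely.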
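
A minimal sketch for narrowing down which branch produced the -1 next time. The class name DiskUsageDiagnosis and the method explain are hypothetical and not part of RocketMQ; the checks only mirror, in the same order, the conditions of the quoted getDiskPartitionSpaceUsedPercent.

```java
import java.io.File;

public class DiskUsageDiagnosis {

    /**
     * Hypothetical helper (not a RocketMQ API): names the condition that
     * would make the quoted method return -1 for the given path.
     */
    public static String explain(final String path) {
        if (path == null) {
            return "path is null";
        }
        if (path.isEmpty()) {
            return "path is empty";
        }
        File file = new File(path);
        if (!file.exists()) {
            return "path does not exist";
        }
        // File.getTotalSpace() returns 0L when the path does not name a partition.
        if (file.getTotalSpace() <= 0) {
            return "totalSpace is 0";
        }
        return "ok";
    }

    public static void main(String[] args) {
        // Pass the configured storePathCommitLog as the first argument;
        // falls back to user.home only so the sketch runs without arguments.
        String path = args.length > 0 ? args[0] : System.getProperty("user.home");
        System.out.println(path + " -> " + explain(path));
    }
}
```

Running it as the same OS user that starts the broker, with the configured storePathCommitLog as the argument, should show whether the "not exist" branch is the one being hit.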


best regards,

Jason

> On Sep 4, 2018, at 14:09, 亓杨 <[email protected]> wrote:
> 
> `2018-09-03 17:53:28 INFO StoreScheduledThread1 - physic disk maybe full 
> soon, so reclaim space, -1.0`
> The error log says: so reclaim space, -1.0.
> The -1.0 comes from the following code.
> public static double getDiskPartitionSpaceUsedPercent(final String path) {
>     if (null == path || path.isEmpty())
>         return -1;
> 
>     try {
>         File file = new File(path);
> 
>         if (!file.exists())
>             return -1;
> 
>         long totalSpace = file.getTotalSpace();
> 
>         if (totalSpace > 0) {
>             long freeSpace = file.getFreeSpace();
>             long usedSpace = totalSpace - freeSpace;
> 
>             return usedSpace / (double) totalSpace;
>         }
>     } catch (Exception e) {
>         return -1;
>     }
> 
>     return -1;
> }
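> The configured storePathCommitLog value is the argument passed to this method. My guess is that the user who starts RocketMQ may not have permission on the path you configured?
> With the changed configuration, storePathCommitLog falls back to the default: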
> System.getProperty("user.home") + File.separator + "store"
>     + File.separator + "commitlog";
> 
> [email protected] <mailto:[email protected]> wrote on Tue, Sep 4, 2018 at 1:52 PM:
> Hi, Jason
>    The problem is solved. It was caused by the following configuration items.
> Configuration that caused the problem:
> rocketmqHome = /opt/data/rocketmq/
> storePathRootDir = /opt/data/rocketmq/data/
> storePathCommitLog=/opt/data/rocketmq/data/commitlog/
> storePathConsumerQueue=/opt/data/rocketmq/data/consumequeue/
> Configuration that fixed the problem:
> rocketmqHome = /opt/data/rocketmq
> storePathRootDir = /opt/data/rocketmq/store
> 
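> However, we still do not know the root cause; it simply started working once the configuration items were changed.
> Thanks again for your reply and suggestions.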
> Best regards.
> Will.
> 
> On 2018/09/04 02:02:18, Jason Joo <[email protected] <mailto:[email protected]>> 
> wrote: 
> > hi, will,
> > 
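> > Two things to watch out for:
> > 
> > 1. Check that the configuration file command-line parameters are correct.
> > 2. In this situation the entire store data directory must be deleted before startup, for a "refresh" boot.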
> > 
> > best regards,
> > 
> > Jason
> > 
> > > On Sep 4, 2018, at 09:59, [email protected] 
> > > <mailto:[email protected]> wrote:
> > > 
> > > Hi, Jason
> > >   Sorry to bother you again. I rebooted all of the virtual machines we use, 
> > > cleared the data and restarted the cluster, but the problem is still there. 
> > > We run the 2m-2s-async setup. Only one of the masters is affected: right 
> > > after startup its broker.log is empty and only store.log has entries, all 
> > > of them like this:
> > > 2018-09-04 09:47:29 WARN StoreScheduledThread1 - disk space will be full 
> > > soon, but delete file failed.
> > > 2018-09-04 09:47:39 WARN StoreScheduledThread1 - disk space will be full 
> > > soon, but delete file failed.
> > > 2018-09-04 09:47:49 WARN StoreScheduledThread1 - disk space will be full 
> > > soon, but delete file failed.
> > > The other brokers, however, are working normally, which is very strange.
> > > Sorry again for the trouble.
> > > 
> > > Best regards.
> > > Will.
> > > 
> > > On 2018/09/03 11:58:48, Jason Joo <[email protected] 
> > > <mailto:[email protected]>> wrote: 
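> > >> Yes, a physical reboot.
> > >> 
> > >> I ran into this once before when force-deleting data. Deleting shm did not help either. I did not dig into the exact mechanism at the time and simply rebooted to bring the service node back quickly.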
> > >> 
> > >> best regards,
> > >> 
> > >> Jason
> > >> 
> > >>> On Sep 3, 2018, at 18:39, [email protected] 
> > >>> <mailto:[email protected]> wrote:
> > >>> 
> > >>> I just remembered that I did not restart namesrv, but this should be unrelated to namesrv, right? I only restarted the broker and cleared the data.
> > >>> 
> > >>> On 2018/09/03 10:22:55, Jason Joo <[email protected] 
> > >>> <mailto:[email protected]>> wrote: 
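> > >>>> Some cache somewhere was probably not cleared. This usually happens after data files are force-deleted following a forced stop, and rebooting the node (the operating system) usually fixes it.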
> > >>>> 
> > >>>> best regards,
> > >>>> 
> > >>>> Jason
> > >>>> 
> > >>>>> On Sep 3, 2018, at 18:04, [email protected] 
> > >>>>> <mailto:[email protected]> wrote:
> > >>>>> 
> > >>>>> 
> > >>>>> 
> > >>>>> On 2018/09/03 10:01:33, [email protected] 
> > >>>>> <mailto:[email protected]> <[email protected] 
> > >>>>> <mailto:[email protected]>> wrote: 
> > >>>>>> The detailed log is as follows:
> > >>>>>> 2018-09-03 17:53:18 WARN StoreScheduledThread1 - disk space will be 
> > >>>>>> full soon, but delete file failed.
> > >>>>>> 2018-09-03 17:53:28 INFO StoreScheduledThread1 - physic disk maybe 
> > >>>>>> full soon, so reclaim space, -1.0
> > >>>>>> 2018-09-03 17:53:28 INFO StoreScheduledThread1 - begin to delete 
> > >>>>>> before 24 hours file. timeup: false spacefull: true 
> > >>>>>> manualDeleteFileSeveralTimes: 0 cleanAtOnce: false
> > >>>>>> INFO AllocateMappedFileService - /opt/push/rocketmq/store/commitlog 
> > >>>>>> mkdir OK
> > >>>>>> 2018-07-31 16:55:51 INFO StoreScheduledThread1 - logics disk maybe 
> > >>>>>> full soon, so reclaim space, -1.0
> > >>>>>> 2018-07-31 16:55:51 INFO StoreScheduledThread1 - begin to delete 
> > >>>>>> before 120 hours file. timeup: false spacefull: true 
> > >>>>>> manualDeleteFileSeveralTimes: 0 cleanAtOnce: false
> > >>>>>> 2018-07-31 16:55:51 WARN StoreScheduledThread1 - disk space will be 
> > >>>>>> full soon, but delete file failed.
> > >>>>>> 
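> > >>>>>> After stopping the whole cluster, clearing all the data and starting the cluster again, this error is still reported. What could the specific cause be? Could the experts please take a look?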
> > >>>>>> 
> > >>>>>> 
> > >>>>>> 
> > >>>>>> 
> > >>>> 
> > >>>> 
> > >> 
> > >> 
> > 
> >
