Hi Ciro,

I debugged my program and found that it gets stuck at 'for i, data in enumerate(dataloader)'. I did some research: https://discuss.pytorch.org/t/dataloader-iteration-hang-up/12886. It says that setting the DataLoader argument num_workers > 0 specifies how many Python subprocesses are used for data loading. I also checked the backtrace at the point where my program got stuck, and it showed Python's multiprocessing package. I have tried many ways to disable Python multiprocessing, including setting num_workers = 0, but none of them worked. By the way, my program works normally if I mount my disk image and chroot into it. I am really confused. Is this because gem5 lacks sufficient OS support? I also don't understand why I cannot disable Python multiprocessing. Any advice would be highly appreciated.
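For reference, here is a minimal sketch of how the stall point could be captured from inside the guest without attaching a debugger. It uses only Python's standard faulthandler module. The helper name iterate_with_watchdog and its parameters are made up for illustration and are not part of PyTorch or gem5. The timeout counts guest wall-clock time, so under gem5 it would probably need to be generous.

```python
import faulthandler
import sys


def iterate_with_watchdog(iterable, timeout_s=60.0, log=None):
    """Yield (index, item) pairs like enumerate().

    If fetching the next item takes longer than timeout_s seconds, the
    tracebacks of all Python threads are written to `log` (stderr by
    default). The process is not killed, so the loop keeps waiting.
    """
    stream = sys.stderr if log is None else log  # must provide fileno()
    iterator = iter(iterable)
    index = 0
    while True:
        # Arm the watchdog only around the fetch, where the hang was seen.
        faulthandler.dump_traceback_later(timeout_s, exit=False, file=stream)
        try:
            item = next(iterator)
        except StopIteration:
            return
        finally:
            # Disarm so that time spent in the loop body is not counted.
            faulthandler.cancel_dump_traceback_later()
        yield index, item
        index += 1
```

In the training script, 'for i, data in enumerate(dataloader)' would become 'for i, data in iterate_with_watchdog(dataloader, timeout_s=600)'. If the dump still shows frames from the multiprocessing package with num_workers = 0, that would suggest the subprocesses are started somewhere other than the DataLoader.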
Best regards,
Yifan Song

> -----Original Message-----
> From: "syf1997--- via gem5-users" <gem5-users@gem5.org>
> Sent: 2020-08-09 14:50:29 (Sunday)
> To: "gem5 users mailing list" <gem5-users@gem5.org>
> Cc: syf1...@mail.ustc.edu.cn
> Subject: [gem5-users] Re: gem5 aborted when increase mem-size in FS mode
>
> A few seconds after I log in using m5term. But when I switched to a new
> kernel, vmlinux-5.2.4, it worked, except that it got stuck in the training
> section. I checked it by mounting the disk image on my own machine, and I
> saw the warning: cannot find /proc/cpuinfo. I think it is because I used
> AtomicSimpleCPU. But when I used TimingSimpleCPU and /proc/cpuinfo did
> exist, my program still got stuck. Could you give some suggestions? Thank you.
>
> Best regards,
>
> Yifan Song
>
> > -----Original Message-----
> > From: "Ciro Santilli via gem5-users" <gem5-users@gem5.org>
> > Sent: 2020-08-06 15:11:57 (Thursday)
> > To: "gem5 users mailing list" <gem5-users@gem5.org>
> > Cc: syf1...@mail.ustc.edu.cn, "Ciro Santilli" <ciro.santi...@gmail.com>
> > Subject: [gem5-users] Re: gem5 aborted when increase mem-size in FS mode
> >
> > Does it crash immediately? If so, provide the backtrace to us and look at
> > it to try to determine which allocation fails. If that doesn't
> > help, you can also try the techniques mentioned at:
> > https://stackoverflow.com/questions/6261201/how-to-find-memory-leak-in-a-c-code-project/57877190#57877190
> >
> > On Thu, Aug 6, 2020 at 2:44 AM syf1997--- via gem5-users
> > <gem5-users@gem5.org> wrote:
> > >
> > > Hi Jason,
> > >
> > > Thank you for your reply. I am using the latest version 20.3. I have just
> > > double-checked the memory usage info of gem5. It showed that memory usage
> > > is about 10GB, while I run gem5 on a server with 128GB of memory. I don't
> > > think my system ran out of memory. I am confused about it.
> > >
> > > Yifan Song
> > >
> > > -----Original Message-----
> > > From: "Jason Lowe-Power via gem5-users" <gem5-users@gem5.org>
> > > Sent: 2020-08-06 01:13:59 (Thursday)
> > > To: "gem5 users mailing list" <gem5-users@gem5.org>
> > > Cc: syf1...@mail.ustc.edu.cn, "Jason Lowe-Power" <ja...@lowepower.com>
> > > Subject: [gem5-users] Re: gem5 aborted when increase mem-size in FS mode
> > >
> > > Hi Yifan,
> > >
> > > Is it possible that your system is running out of memory? It's possible
> > > this is a gem5 bug (what version are you using?), but I haven't heard of
> > > this issue before.
> > >
> > > Cheers,
> > > Jason
> > >
> > > On Wed, Aug 5, 2020 at 1:34 AM syf1997--- via gem5-users
> > > <gem5-users@gem5.org> wrote:
> > >>
> > >> Hi all,
> > >>
> > >> I am trying to run my own program (training a tiny vgg16 CNN model on
> > >> the CPU) in FS mode. I have created my own disk image and installed
> > >> python3 and pytorch on it. My command line is as below:
> > >>
> > >> build/X86/gem5.opt configs/example/fs.py --kernel=binary/vmlinux-5.2.4
> > >> --disk-image=disk/linux-x86.img -n 2 --caches --l2cache --mem-size=4GB
> > >>
> > >> If I set the mem-size small, like 512MB, 1GB, or 2GB, my program starts
> > >> normally but then fails because the memory is too small. On the other
> > >> hand, if I increase the mem-size, my program cannot run and gem5 aborts
> > >> (core dump). Could anyone give me some advice? Thanks.
> > >>
> > >> Best regards,
> > >>
> > >> Yifan
> > >>
> > >> _______________________________________________
> > >> gem5-users mailing list -- gem5-users@gem5.org
> > >> To unsubscribe send an email to gem5-users-le...@gem5.org