Hi Eugen.

Sorry for the delay in answering.

Just looked in the /var/log/ceph/ directory. It only contains the following
files (for example on node01):

#######
# ls -lart
total 3864
-rw------- 1 ceph ceph     904 ago 24 13:11 ceph.audit.log-20180829.xz
drwxr-xr-x 1 root root     898 ago 28 10:07 ..
-rw-r--r-- 1 ceph ceph  189464 ago 28 23:59 ceph-mon.node01.log-20180829.xz
-rw------- 1 ceph ceph   24360 ago 28 23:59 ceph.log-20180829.xz
-rw-r--r-- 1 ceph ceph   48584 ago 29 00:00 ceph-mgr.node01.log-20180829.xz
-rw------- 1 ceph ceph       0 ago 29 00:00 ceph.audit.log
drwxrws--T 1 ceph ceph     352 ago 29 00:00 .
-rw-r--r-- 1 ceph ceph 1908122 ago 29 12:46 ceph-mon.node01.log
-rw------- 1 ceph ceph  175229 ago 29 12:48 ceph.log
-rw-r--r-- 1 ceph ceph 1599920 ago 29 12:49 ceph-mgr.node01.log
#######

So it only contains logs concerning the node itself (is that correct? Since
node01 is also the master, I was expecting it to have logs from the other
nodes too) and, moreover, no ceph-osd* files. Also, I'm looking through the
logs I do have available, and nothing stands out as a possible error.
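
In case it helps, this is what I was planning to run next to gather the logs
from all storage nodes at once instead of logging into each one; just a
minimal sketch using salt's cmd.run, and the grep is only a rough filter:

#######
# list the ceph log directory on every storage node at once
salt -I 'roles:storage' cmd.run 'ls -l /var/log/ceph/'

# quick scan of the monitor logs for anything that looks like an error
salt -I 'roles:storage' cmd.run 'grep -ci error /var/log/ceph/ceph-mon.*.log'
#######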

Any suggestion on how to proceed?

Thanks a lot in advance,

Jones


On Mon, Aug 27, 2018 at 5:29 AM Eugen Block <ebl...@nde.ag> wrote:

> Hi Jones,
>
> all ceph logs are in the directory /var/log/ceph/, each daemon has its
> own log file, e.g. OSD logs are named ceph-osd.*.
>
> I haven't tried it but I don't think SUSE Enterprise Storage deploys
> OSDs on partitioned disks. Is there a way to attach a second disk to
> the OSD nodes, maybe via USB or something?
>
> Although this thread is ceph-related, it refers to a specific
> product, so I would recommend posting your question in the SUSE forum
> [1].
>
> Regards,
> Eugen
>
> [1] https://forums.suse.com/forumdisplay.php?99-SUSE-Enterprise-Storage
>
> Zitat von Jones de Andrade <johanne...@gmail.com>:
>
> > Hi Eugen.
> >
> > Thanks for the suggestion. I'll look for the logs (since it's our first
> > attempt with ceph, I'll have to discover where they are, but no problem).
> >
> > One thing in your response caught my attention, however:
> >
> > I may not have made myself clear, but one of the problems we encountered
> > was that the files, which now contain:
> >
> > node02:
> >    ----------
> >    storage:
> >        ----------
> >        osds:
> >            ----------
> >            /dev/sda4:
> >                ----------
> >                format:
> >                    bluestore
> >                standalone:
> >                    True
> >
> > were originally empty; we filled them by hand following a model found
> > elsewhere on the web. That was necessary so that we could continue, but
> > the model indicated that, for example, the path here should be /dev/sda,
> > not /dev/sda4. We chose to include the specific partition identification
> > because we won't have dedicated disks here, just the very same partition
> > on every node, as all the disks were partitioned exactly the same.
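> >
> > For reference, this is roughly how that hand-written file looks on disk;
> > the path is only my guess at where DeepSea keeps these minion files, so
> > please correct me if it actually lives elsewhere:
> >
> > #######
> > # assumed location:
> > # /srv/pillar/ceph/proposals/profile-default/stack/default/ceph/minions/node02.<our-domain>.yml
> > ceph:
> >   storage:
> >     osds:
> >       /dev/sda4:
> >         format: bluestore
> >         standalone: True
> > #######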
> >
> > While that was enough for the procedure to continue at that point, I now
> > wonder whether it was the right call and, if it was, whether it was done
> > properly. So, what do you mean by "wipe" the partition here? /dev/sda4 is
> > created, but it is both empty and unmounted: should a different operation
> > be performed on it, should I remove it first, or should I have written
> > the files above with only /dev/sda as the target?
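> >
> > If wiping is indeed what's needed, this is what I had in mind to clear
> > any leftover signatures from /dev/sda4 before the next attempt; just a
> > sketch, and obviously destructive, so I'd double-check the device first:
> >
> > #######
> > # remove any old filesystem / partition signatures from the partition
> > wipefs --all /dev/sda4
> > # zero the first few MB as well, in case anything else is left behind
> > dd if=/dev/zero of=/dev/sda4 bs=1M count=10
> > #######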
> >
> > I know I probably wouldn't run into these issues with dedicated disks,
> > but unfortunately that is absolutely not an option.
> >
> > Thanks a lot in advance for any comments and/or extra suggestions.
> >
> > Sincerely yours,
> >
> > Jones
> >
> > On Sat, Aug 25, 2018 at 5:46 PM Eugen Block <ebl...@nde.ag> wrote:
> >
> >> Hi,
> >>
> >> take a look at the logs; they should point you in the right direction.
> >> Since the deployment stage fails at the OSD level, start with the OSD
> >> logs. Something's not right with the disks/partitions. Did you wipe
> >> the partition from previous attempts?
> >>
> >> Regards,
> >> Eugen
> >>
> >> Zitat von Jones de Andrade <johanne...@gmail.com>:
> >>
> >>> (Please forgive my previous email: I was reusing another message and
> >>> completely forgot to update the subject.)
> >>>
> >>> Hi all.
> >>>
> >>> I'm new to ceph, and after having serious problems in ceph stages 0, 1
> >>> and 2 that I managed to solve myself, it now seems that I have hit a
> >>> wall harder than my head. :)
> >>>
> >>> When I run salt-run state.orch ceph.stage.deploy and monitor it, I see
> >>> it get up to here:
> >>>
> >>> #######
> >>> [14/71]   ceph.sysctl on
> >>>           node01....................................... ✓ (0.5s)
> >>>           node02........................................ ✓ (0.7s)
> >>>           node03....................................... ✓ (0.6s)
> >>>           node04......................................... ✓ (0.5s)
> >>>           node05....................................... ✓ (0.6s)
> >>>           node06.......................................... ✓ (0.5s)
> >>>
> >>> [15/71]   ceph.osd on
> >>>           node01...................................... ❌ (0.7s)
> >>>           node02........................................ ❌ (0.7s)
> >>>           node03....................................... ❌ (0.7s)
> >>>           node04......................................... ❌ (0.6s)
> >>>           node05....................................... ❌ (0.6s)
> >>>           node06.......................................... ❌ (0.7s)
> >>>
> >>> Ended stage: ceph.stage.deploy succeeded=14/71 failed=1/71 time=624.7s
> >>>
> >>> Failures summary:
> >>>
> >>> ceph.osd (/srv/salt/ceph/osd):
> >>>   node02:
> >>>     deploy OSDs: Module function osd.deploy threw an exception.
> >>>       Exception: Mine on node02 for cephdisks.list
> >>>   node03:
> >>>     deploy OSDs: Module function osd.deploy threw an exception.
> >>>       Exception: Mine on node03 for cephdisks.list
> >>>   node01:
> >>>     deploy OSDs: Module function osd.deploy threw an exception.
> >>>       Exception: Mine on node01 for cephdisks.list
> >>>   node04:
> >>>     deploy OSDs: Module function osd.deploy threw an exception.
> >>>       Exception: Mine on node04 for cephdisks.list
> >>>   node05:
> >>>     deploy OSDs: Module function osd.deploy threw an exception.
> >>>       Exception: Mine on node05 for cephdisks.list
> >>>   node06:
> >>>     deploy OSDs: Module function osd.deploy threw an exception.
> >>>       Exception: Mine on node06 for cephdisks.list
> >>> #######
> >>>
> >>> Since this is a first attempt on 6 simple test machines, we are going
> >>> to put the mon, osds, etc. on all nodes at first. Only the master role
> >>> is kept on a single machine (node01) for now.
> >>>
> >>> As they are simple machines, they have a single HDD, which is
> >>> partitioned as follows (the sda4 partition is unmounted and left for
> >>> the ceph system):
> >>>
> >>> ###########
> >>> # lsblk
> >>> NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
> >>> sda      8:0    0 465,8G  0 disk
> >>> ├─sda1   8:1    0   500M  0 part /boot/efi
> >>> ├─sda2   8:2    0    16G  0 part [SWAP]
> >>> ├─sda3   8:3    0  49,3G  0 part /
> >>> └─sda4   8:4    0   400G  0 part
> >>> sr0     11:0    1   3,7G  0 rom
> >>>
> >>> # salt -I 'roles:storage' cephdisks.list
> >>> node01:
> >>> node02:
> >>> node03:
> >>> node04:
> >>> node05:
> >>> node06:
> >>>
> >>> # salt -I 'roles:storage' pillar.get ceph
> >>> node02:
> >>>     ----------
> >>>     storage:
> >>>         ----------
> >>>         osds:
> >>>             ----------
> >>>             /dev/sda4:
> >>>                 ----------
> >>>                 format:
> >>>                     bluestore
> >>>                 standalone:
> >>>                     True
> >>> (and so on for all 6 machines)
> >>> ##########
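> >>>
> >>> I also wondered whether the empty cephdisks.list output could simply
> >>> mean the salt mine data isn't there yet; if so, would refreshing it and
> >>> re-running the stage make sense? Something along these lines (just a
> >>> guess on my part):
> >>>
> >>> #######
> >>> # refresh pillar data and the salt mine on the storage nodes
> >>> salt -I 'roles:storage' saltutil.refresh_pillar
> >>> salt -I 'roles:storage' mine.update
> >>> # then re-run the deploy stage
> >>> salt-run state.orch ceph.stage.deploy
> >>> #######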
> >>>
> >>> Finally and just in case, my policy.cfg file reads:
> >>>
> >>> #########
> >>> #cluster-unassigned/cluster/*.sls
> >>> cluster-ceph/cluster/*.sls
> >>> profile-default/cluster/*.sls
> >>> profile-default/stack/default/ceph/minions/*yml
> >>> config/stack/default/global.yml
> >>> config/stack/default/ceph/cluster.yml
> >>> role-master/cluster/node01.sls
> >>> role-admin/cluster/*.sls
> >>> role-mon/cluster/*.sls
> >>> role-mgr/cluster/*.sls
> >>> role-mds/cluster/*.sls
> >>> role-ganesha/cluster/*.sls
> >>> role-client-nfs/cluster/*.sls
> >>> role-client-cephfs/cluster/*.sls
> >>> ##########
> >>>
> >>> Please, could someone help me and shed some light on this issue?
> >>>
> >>> Thanks a lot in advance,
> >>>
> >>> Regards,
> >>>
> >>> Jones
> >>
> >>
> >>
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
