It doesn't matter anymore. The MDS has crashed and is stuck in the rejoin state. Now I am thinking of deleting the pool and starting again. Is it safe or advisable to use an erasure-coded pool for CephFS?
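In case it helps make the question concrete, this is roughly the kind of setup I have in mind (only a sketch; the pool names, PG counts and EC profile below are placeholders, and the metadata pool stays replicated in any case):

  ceph osd erasure-code-profile set ec42 k=4 m=2
  ceph osd pool create cephfs_data_ec 256 256 erasure ec42
  ceph osd pool create cephfs_cache 128 128 replicated
  ceph osd pool create cephfs_metadata 64 64 replicated
  # on Jewel/Kraken an EC data pool can only be used behind a replicated cache tier:
  ceph osd tier add cephfs_data_ec cephfs_cache
  ceph osd tier cache-mode cephfs_cache writeback
  ceph osd tier set-overlay cephfs_data_ec cephfs_cache
  ceph fs new cephfs cephfs_metadata cephfs_data_ec

As far as I understand, writing to an EC data pool directly (allow_ec_overwrites) only arrives in a later release, so on a current cluster the cache tier would be the price to pay for erasure coding.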
Thank you very much for your time. I like this software very much.
Cheers,
José

On 01/02/17 at 14:29, José M. Martín wrote:
> Hi Maxime
>
> I have 3 of the original disks, but I don't know which OSD corresponds
> to each one. Besides, I don't think I have enough technical skills to do
> that and I don't want to make things worse...
> I'm trying to write a script that copies files from the damaged CephFS to
> a new location.
> Any help will be very grateful.
>
> José
>
> On 01/02/17 at 07:56, Maxime Guyot wrote:
>> Hi José
>>
>> If you have some of the original OSDs (not zapped or erased) then you might
>> be able to just re-add them to your cluster and have a happy cluster.
>> If you attempt the ceph-objectstore-tool --op export & import, make sure to do
>> it on a temporary OSD of weight 0, as recommended in the link provided.
>>
>> Either way, and from what I can see in the pg dump you provided, if you
>> restore osd.0, osd.3, osd.20, osd.21 and osd.22, it should be enough to bring
>> back the PGs that are down.
>>
>> Cheers,
>>
>> On 31/01/17 11:48, "ceph-users on behalf of José M. Martín"
>> <ceph-users-boun...@lists.ceph.com on behalf of jmar...@onsager.ugr.es>
>> wrote:
>>
>> Any idea of how I could recover files from the filesystem mount?
>> Doing a cp, it hangs when it finds a damaged file/folder. I would be happy
>> just getting the undamaged files.
>>
>> Thanks
>>
>> On 31/01/17 at 11:19, José M. Martín wrote:
>> > Thanks.
>> > I just realized I still keep some of the original OSDs. If they contain some of
>> > the incomplete PGs, would it be possible to add them to the new disks?
>> > Maybe following these steps?
>> > http://ceph.com/community/incomplete-pgs-oh-my/
>> >
>> > On 31/01/17 at 10:44, Maxime Guyot wrote:
>> >> Hi José,
>> >>
>> >> Too late now, but you could have updated the CRUSH map *before* moving the
>> >> disks. Something like "ceph osd crush set osd.0 0.90329 root=default
>> >> rack=sala2.2 host=loki05" would move osd.0 to loki05 and would trigger
>> >> the appropriate PG movements before any physical move. Then the physical
>> >> move is done as usual: set noout, stop the osd, physically move it, start
>> >> the osd, unset noout.
>> >>
>> >> It's a way to trigger the data movement overnight (maybe with a cron job)
>> >> and do the physical move at your own convenience in the morning.
>> >>
>> >> Cheers,
>> >> Maxime
>> >>
>> >> On 31/01/17 10:35, "ceph-users on behalf of José M. Martín"
>> >> <ceph-users-boun...@lists.ceph.com on behalf of jmar...@onsager.ugr.es>
>> >> wrote:
>> >>
>> >> Already min_size = 1
>> >>
>> >> Thanks,
>> >> Jose M. Martín
>> >>
>> >> On 31/01/17 at 09:44, Henrik Korkuc wrote:
>> >> > I am not sure about the "incomplete" part off the top of my head, but you
>> >> > can try setting min_size to 1 for the pools to reactivate some PGs, if they
>> >> > are down/inactive due to missing replicas.
>> >> >
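For reference, a minimal sketch of that suggestion ("cephfs_data" below is only a placeholder pool name); note that lowering min_size can only reactivate PGs that are short on replicas, it does not help the incomplete ones:

  ceph osd pool get cephfs_data min_size
  ceph osd pool set cephfs_data min_size 1   # allow I/O with a single surviving replica
  # remember to set it back to its previous value once recovery has finished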
>> >> > On 17-01-31 10:24, José M. Martín wrote:
>> >> >> # ceph -s
>> >> >>     cluster 29a91870-2ed2-40dc-969e-07b22f37928b
>> >> >>      health HEALTH_ERR
>> >> >>             clock skew detected on mon.loki04
>> >> >>             155 pgs are stuck inactive for more than 300 seconds
>> >> >>             7 pgs backfill_toofull
>> >> >>             1028 pgs backfill_wait
>> >> >>             48 pgs backfilling
>> >> >>             892 pgs degraded
>> >> >>             20 pgs down
>> >> >>             153 pgs incomplete
>> >> >>             2 pgs peering
>> >> >>             155 pgs stuck inactive
>> >> >>             1077 pgs stuck unclean
>> >> >>             892 pgs undersized
>> >> >>             1471 requests are blocked > 32 sec
>> >> >>             recovery 3195781/36460868 objects degraded (8.765%)
>> >> >>             recovery 5079026/36460868 objects misplaced (13.930%)
>> >> >>             mds0: Behind on trimming (175/30)
>> >> >>             noscrub,nodeep-scrub flag(s) set
>> >> >>             Monitor clock skew detected
>> >> >>      monmap e5: 5 mons at
>> >> >> {loki01=192.168.3.151:6789/0,loki02=192.168.3.152:6789/0,loki03=192.168.3.153:6789/0,loki04=192.168.3.154:6789/0,loki05=192.168.3.155:6789/0}
>> >> >>             election epoch 4028, quorum 0,1,2,3,4
>> >> >> loki01,loki02,loki03,loki04,loki05
>> >> >>       fsmap e95494: 1/1/1 up {0=zeus2=up:active}, 1 up:standby
>> >> >>      osdmap e275373: 42 osds: 42 up, 42 in; 1077 remapped pgs
>> >> >>             flags noscrub,nodeep-scrub
>> >> >>       pgmap v36642778: 4872 pgs, 4 pools, 24801 GB data, 17087 kobjects
>> >> >>             45892 GB used, 34024 GB / 79916 GB avail
>> >> >>             3195781/36460868 objects degraded (8.765%)
>> >> >>             5079026/36460868 objects misplaced (13.930%)
>> >> >>                 3640 active+clean
>> >> >>                  838 active+undersized+degraded+remapped+wait_backfill
>> >> >>                  184 active+remapped+wait_backfill
>> >> >>                  134 incomplete
>> >> >>                   48 active+undersized+degraded+remapped+backfilling
>> >> >>                   19 down+incomplete
>> >> >>                    6 active+undersized+degraded+remapped+wait_backfill+backfill_toofull
>> >> >>                    1 active+remapped+backfill_toofull
>> >> >>                    1 peering
>> >> >>                    1 down+peering
>> >> >>   recovery io 93909 kB/s, 10 keys/s, 67 objects/s
>> >> >>
>> >> >>
>> >> >> # ceph osd tree
>> >> >> ID  WEIGHT   TYPE NAME           UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> >> >>  -1 77.22777 root default
>> >> >>  -9 27.14778     rack sala1
>> >> >>  -2  5.41974         host loki01
>> >> >>  14  0.90329             osd.14       up  1.00000          1.00000
>> >> >>  15  0.90329             osd.15       up  1.00000          1.00000
>> >> >>  16  0.90329             osd.16       up  1.00000          1.00000
>> >> >>  17  0.90329             osd.17       up  1.00000          1.00000
>> >> >>  18  0.90329             osd.18       up  1.00000          1.00000
>> >> >>  25  0.90329             osd.25       up  1.00000          1.00000
>> >> >>  -4  3.61316         host loki03
>> >> >>   0  0.90329             osd.0        up  1.00000          1.00000
>> >> >>   2  0.90329             osd.2        up  1.00000          1.00000
>> >> >>  20  0.90329             osd.20       up  1.00000          1.00000
>> >> >>  24  0.90329             osd.24       up  1.00000          1.00000
>> >> >>  -3  9.05714         host loki02
>> >> >>   1  0.90300             osd.1        up  0.90002          1.00000
>> >> >>  31  2.72198             osd.31       up  1.00000          1.00000
>> >> >>  29  0.90329             osd.29       up  1.00000          1.00000
>> >> >>  30  0.90329             osd.30       up  1.00000          1.00000
>> >> >>  33  0.90329             osd.33       up  1.00000          1.00000
>> >> >>  32  2.72229             osd.32       up  1.00000          1.00000
>> >> >>  -5  9.05774         host loki04
>> >> >>   3  0.90329             osd.3        up  1.00000          1.00000
>> >> >>  19  0.90329             osd.19       up  1.00000          1.00000
>> >> >>  21  0.90329             osd.21       up  1.00000          1.00000
>> >> >>  22  0.90329             osd.22       up  1.00000          1.00000
>> >> >>  23  2.72229             osd.23       up  1.00000          1.00000
>> >> >>  28  2.72229             osd.28       up  1.00000          1.00000
>> >> >> -10 24.61000     rack sala2.2
>> >> >>  -6 24.61000         host loki05
>> >> >>   5  2.73000             osd.5        up  1.00000          1.00000
>> >> >>   6  2.73000             osd.6        up  1.00000          1.00000
>> >> >>   9  2.73000             osd.9        up  1.00000          1.00000
>> >> >>  10  2.73000             osd.10       up  1.00000          1.00000
>> >> >>  11  2.73000             osd.11       up  1.00000          1.00000
>> >> >>  12  2.73000             osd.12       up  1.00000          1.00000
>> >> >>  13  2.73000             osd.13       up  1.00000          1.00000
>> >> >>   4  2.73000             osd.4        up  1.00000          1.00000
>> >> >>   8  2.73000             osd.8        up  1.00000          1.00000
>> >> >>   7  0.03999             osd.7        up  1.00000          1.00000
>> >> >> -12 25.46999     rack sala2.1
>> >> >> -11 25.46999         host loki06
>> >> >>  34  2.73000             osd.34       up  1.00000          1.00000
>> >> >>  35  2.73000             osd.35       up  1.00000          1.00000
>> >> >>  36  2.73000             osd.36       up  1.00000          1.00000
>> >> >>  37  2.73000             osd.37       up  1.00000          1.00000
>> >> >>  38  2.73000             osd.38       up  1.00000          1.00000
>> >> >>  39  2.73000             osd.39       up  1.00000          1.00000
>> >> >>  40  2.73000             osd.40       up  1.00000          1.00000
>> >> >>  43  2.73000             osd.43       up  1.00000          1.00000
>> >> >>  42  0.90999             osd.42       up  1.00000          1.00000
>> >> >>  41  2.71999             osd.41       up  1.00000          1.00000
>> >> >>
>> >> >>
>> >> >> # ceph pg dump
>> >> >> You can find it in this link:
>> >> >> http://ergodic.ugr.es/pgdumpoutput.txt
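For reference, a minimal sketch of how the down/incomplete PGs and the OSDs they are waiting for can be listed from that dump (the PG id below is only a placeholder):

  ceph health detail | grep -E 'incomplete|down'
  ceph pg dump_stuck inactive
  # query one of the reported PGs; its recovery_state section lists the
  # OSDs the PG would probe / is blocked on
  ceph pg 2.3f query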
>> >> >>
>> >> >> What I did:
>> >> >> My cluster is heterogeneous, having old OSD nodes with 1TB disks and
>> >> >> new ones with 3TB. I was having problems with balance: some 1TB OSDs got
>> >> >> nearly full while there was plenty of space in others. My plan was to
>> >> >> replace some disks with bigger ones. I started the process with no
>> >> >> problems, changing one disk: reweight to 0.0, wait for the rebalance,
>> >> >> and remove it.
>> >> >> After that, while researching my problem, I read about straw2. So I
>> >> >> changed the algorithm by editing the CRUSH map, and some data movement
>> >> >> happened.
>> >> >> My setup was not optimal (I had the journal on the XFS filesystem), so I
>> >> >> decided to change that as well. At first I did it slowly, disk by disk,
>> >> >> but as the rebalance takes a long time and my group was pushing me to
>> >> >> finish quickly, I did:
>> >> >> ceph osd out osd.id
>> >> >> ceph osd crush remove osd.id
>> >> >> ceph auth del osd.id
>> >> >> ceph osd rm id
>> >> >>
>> >> >> Then I unmounted the disks and used ceph-deploy to add them again:
>> >> >> ceph-deploy disk zap loki01:/dev/sda
>> >> >> ceph-deploy osd create loki01:/dev/sda
>> >> >>
>> >> >> I did this for every disk in rack "sala1". First I finished loki02. Then
>> >> >> I did these steps on loki04, loki01 and loki03 at the same time.
>> >> >>
>> >> >> Thanks,
>> >> >> --
>> >> >> José M. Martín
>> >> >>
>> >> >>
>> >> >> On 31/01/17 at 00:43, Shinobu Kinjo wrote:
>> >> >>> First off, the following, please:
>> >> >>>
>> >> >>> * ceph -s
>> >> >>> * ceph osd tree
>> >> >>> * ceph pg dump
>> >> >>>
>> >> >>> and
>> >> >>>
>> >> >>> * what you actually did, with the exact commands.
>> >> >>>
>> >> >>> Regards,
>> >> >>>
>> >> >>> On Tue, Jan 31, 2017 at 6:10 AM, José M. Martín
>> >> >>> <jmar...@onsager.ugr.es> wrote:
>> >> >>>> Dear list,
>> >> >>>>
>> >> >>>> I'm having some big problems with my setup.
>> >> >>>>
>> >> >>>> I was trying to increase the global capacity by replacing some OSDs
>> >> >>>> with bigger ones. I changed them without waiting for the rebalance
>> >> >>>> process to finish, thinking the replicas were stored in other buckets,
>> >> >>>> but I found a lot of incomplete PGs, so replicas of the same PG had
>> >> >>>> been placed in the same bucket. I have to assume I have lost data,
>> >> >>>> because I zapped the disks and used them for other tasks.
>> >> >>>>
>> >> >>>> My question is: what should I do to recover as much data as possible?
>> >> >>>> I'm using the filesystem and RBD.
>> >> >>>>
>> >> >>>> Thank you so much,
>> >> >>>>
>> >> >>>> --
>> >> >>>>
>> >> >>>> Jose M. Martín
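For reference, a minimal sketch of the export/import route Maxime suggested, following the procedure from the incomplete-pgs-oh-my post linked above (the PG id, OSD numbers and paths below are placeholders, and both OSDs must be stopped while ceph-objectstore-tool runs):

  # on the host that still has one of the original, unzapped disks:
  ceph-objectstore-tool --op export --pgid 2.3f \
      --data-path /var/lib/ceph/osd/ceph-0 \
      --journal-path /var/lib/ceph/osd/ceph-0/journal \
      --file /tmp/pg-2.3f.export

  # on a temporary OSD kept at CRUSH weight 0 so it never takes data on its own:
  ceph osd crush reweight osd.50 0
  ceph-objectstore-tool --op import \
      --data-path /var/lib/ceph/osd/ceph-50 \
      --journal-path /var/lib/ceph/osd/ceph-50/journal \
      --file /tmp/pg-2.3f.export

Once the temporary OSD is started and the cluster has backfilled the recovered PG to its proper OSDs, the temporary OSD can be removed again.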
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com