[ceph-users] Strange configuration with many SAN and few servers

2014-11-07 Thread Mario Giammarco
Hello, I need to build a Ceph test lab using existing hardware. I have several iSCSI and Fibre Channel SANs but few servers. Imagine I have:
- 4 SANs with 1 LUN on each SAN
- 2 diskless servers (apart from the boot disk)
I mount two LUNs on the first server and two LUNs on the second server. Then (I

Re: [ceph-users] Strange configuration with many SAN and few servers

2014-11-08 Thread Mario Giammarco
Gregory Farnum writes: > > > and then to "replace the server" you could just mount the LUNs somewhere else and turn on the OSDs. You would need to set a few config options (like the one that automatically updates crush location on boot), but it shouldn't be too difficult. Thank you for your r

[ceph-users] Armel debian repository

2013-12-19 Thread Mario Giammarco
Hello, I would like to install Ceph on a Netgear ReadyNAS 102. It is based on Debian wheezy. I have tried to add the Ceph repository, but the NAS is "armel" architecture and I see you provide a repo for "armhf" architecture. How can I solve this problem? Thanks, Mario

Re: [ceph-users] Armel debian repository

2013-12-21 Thread Mario Giammarco
Mario Giammarco writes: > > Hello, > I would like to install ceph on a Netgear ReadyNAS 102. > It is a debian wheezy based. > I have tried to add ceph repository but nas is "armel" architecture and I > see you provide a repo for "armhf" architect

Re: [ceph-users] Armel debian repository

2013-12-22 Thread Mario Giammarco
Wido den Hollander writes: > > What version of ARM CPU is in the Netgear NAS? > > Since the packages are built for ARMv7 and for example don't work on a > RaspberryPi which is ARMv6. > > Another solution would be to build the packages manually for the Netgear NAS. It is a Marvell Armada 370

[ceph-users] Another cluster completely hang

2016-06-28 Thread Mario Giammarco
Hello, this is the second time this has happened to me, and I hope that someone can explain what I can do. Proxmox ceph cluster with 8 servers, 11 hdd. Min_size=1, size=2. One hdd goes down due to bad sectors. Ceph recovers but it ends with: cluster f2a8dd7d-949a-4a29-acab-11d4900249f4 health HEALT

Re: [ceph-users] Another cluster completely hang

2016-06-28 Thread Mario Giammarco
bic > IP-Interactive > > mailto:i...@ip-interactive.de > > Address: > > IP Interactive UG ( haftungsbeschraenkt ) > Zum Sonnenberg 1-3 > 63571 Gelnhausen > > HRB 93402 at the Hanau District Court (Amtsgericht Hanau) > Managing director: Oliver Dzombic > > Tax no.: 35 236 3622 1 >

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Mario Giammarco
29 Jun 2016 at 08:02, Mario Giammarco <mgiamma...@gmail.com> wrote: > pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash > rjenkins pg_num 512 pgp_num 512 last_change 9313 flags hashpspool > stripe_width 0 >removed_snaps [1~3] >

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Mario Giammarco
> you import the PG to temporary OSD (between steps 12 and 13). > > On 29.06.2016 09:09, Mario Giammarco wrote: > > Now I have also discovered that, by mistake, someone has put production > > data on a virtual machine of the cluster. I need that ceph starts I/O so > > I ca

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Mario Giammarco
I have searched Google and I see that there is no official procedure. On Wed, 29 Jun 2016 at 09:43, Mario Giammarco <mgiamma...@gmail.com> wrote: > I have read the post "incomplete pgs, oh my" many times. > I think my case is different. > The broken

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Mario Giammarco
pgs more than 300 but it is due to the fact that I had 11 hdds and now only 10. I will add more hdds after I repair the pool 4) I have reduced the monitors to 3 On Wed, 29 Jun 2016 at 10:25, Christian Balzer wrote: > > Hello, > > On Wed, 29 Jun 2016 06:02:59 + Mar

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Mario Giammarco
oing - > did you delete the disk from the crush map as well? > > Ceph waits by default 300 secs AFAIK to mark an OSD out after it will > start to recover. > > > On 29 Jun 2016, at 10:42, Mario Giammarco wrote: > > I thank you for your reply so I can add my experience: >

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Mario Giammarco
> > Oliver Dzombic > IP-Interactive > > mailto:i...@ip-interactive.de > > Address: > > IP Interactive UG ( haftungsbeschraenkt ) > Zum Sonnenberg 1-3 > 63571 Gelnhausen > > HRB 93402 at the Hanau District Court (Amtsgericht Hanau) > Managing director: Oliver Dzombic > > St

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Mario Giammarco
11:16, Mario Giammarco <mgiamma...@gmail.com> wrote: > In fact I am worried because: > > 1) ceph is under proxmox, and proxmox may decide to reboot a server if it > is not responding > 2) probably a server was rebooted while ceph was reconstructing > 3) even using max=

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Mario Giammarco
u > Managing director: Oliver Dzombic > > Tax no.: 35 236 3622 1 > VAT ID: DE274086107 > > > On 29.06.2016 at 12:00, Mario Giammarco wrote: > > Now the problem is that ceph has put out two disks because scrub has > > failed (I think it is not a disk fault but d

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Mario Giammarco
u > Managing director: Oliver Dzombic > > Tax no.: 35 236 3622 1 > VAT ID: DE274086107 > > > On 29.06.2016 at 12:33, Mario Giammarco wrote: > > Thanks, > > I can put in osds but they do not stay in, and I am pretty sure that they are > > not broken. > > > >

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Mario Giammarco
dds in the cluster? Should I remove them from crush and start again? Can I tell ceph that they are not bad? Mario On Wed, 29 Jun 2016 at 15:34, Lionel Bouton <lionel+c...@bouton.name> wrote: > Hi, > > On 29/06/2016 12:00, Mario Giammarco wrote: > > Now t

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Mario Giammarco
can I find paid support? I mean someone who logs in to my cluster and tells ceph that all is active+clean Thanks, Mario On Wed, 29 Jun 2016 at 16:08, Mario Giammarco <mgiamma...@gmail.com> wrote: > This time at the end of recovery procedure you described it was like most

Re: [ceph-users] Questions about bluestore

2017-10-14 Thread Mario Giammarco
Can nobody help me? On Fri, 6 Oct 2017, 07:31 Mario Giammarco wrote: > Hello, > I am trying Ceph luminous with Bluestore. > > I create an osd: > > ceph-disk prepare --bluestore /dev/sdg --block.db /dev/sdf > > and I see that on ssd it creates a partition of onl

[ceph-users] PGs inconsistent, do I fear data loss?

2017-10-28 Thread Mario Giammarco
Hello, we recently upgraded two clusters to Ceph luminous with bluestore and we discovered that we have many more pgs in state active+clean+inconsistent. (Possible data damage, xx pgs inconsistent) This is probably due to checksums in bluestore that discover more errors. We have some pools with r

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-10-30 Thread Mario Giammarco
>In general you should find that clusters running bluestore are much more >effective about doing a repair automatically (because bluestore has >checksums on all data, it knows which object is correct!), but there are >still some situations where they won't. If that happens to you, I would not >f

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-10-30 Thread Mario Giammarco
>[Questions to the list] >How is it possible that the cluster cannot repair itself with ceph pg repair? >No good copies are remaining? >Cannot decide which copy is valid or up-to date? >If so, why not, when there is checksum, mtime for everything? >In this inconsistent state which object does th

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-01 Thread Mario Giammarco
"num_objects_degraded": 917, "num_objects_misplaced": 0, "num_objects_unfound": 0, "num_objects_dirty": 917, "num_whiteouts": 0, "num_read&quo

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-01 Thread Mario Giammarco
ing with size = 3, then you always have a majority of the OSDs > online receiving a write and they can both agree on the correct data to > give to the third when it comes back up. > > On Wed, Nov 1, 2017 at 3:31 AM Mario Giammarco > wrote: > >> Sure here it is ceph -s: >
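The majority argument in the quoted reply can be illustrated with a small sketch. This is not how Ceph's repair actually selects a copy; it only shows why three replicas can outvote one damaged copy while two replicas cannot. The function name and the digest strings are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_digest(digests: Sequence[str]) -> Optional[str]:
    """Return the digest held by a strict majority of replicas, or None."""
    if not digests:
        return None
    digest, count = Counter(digests).most_common(1)[0]
    # A strict majority needs more than half of the replicas to agree.
    return digest if count * 2 > len(digests) else None


# size = 3: two matching copies outvote the damaged one.
print(majority_digest(["aaa", "aaa", "bbb"]))  # aaa
# size = 2: one copy against one, so no majority exists.
print(majority_digest(["aaa", "bbb"]))  # None
```

With size = 2 a disagreement between the two copies cannot be settled by counting alone, which is the situation the thread is worried about.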

[ceph-users] Bluestore compression statistics

2017-11-01 Thread Mario Giammarco
Hello, I have enabled bluestore compression, how can I get some statistics just to see if compression is really working? Thanks, Mario ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Moving bluestore WAL and DB after bluestore creation

2017-11-15 Thread Mario Giammarco
It seems it is not possible. I recreated the OSD 2017-11-12 17:44 GMT+01:00 Shawn Edwards : > I've created some Bluestore OSD with all data (wal, db, and data) all on > the same rotating disk. I would like to now move the wal and db onto an > nvme disk. Is that possible without re-creating the

Re: [ceph-users] Cache tiering on Erasure coded pools

2018-01-03 Thread Mario Giammarco
Nobody explains why, so I will tell you from direct experience: the cache tier has a block size of several megabytes. So if you ask for one byte that is not in the cache, several megabytes are read from disk and, if the cache is full, several more megabytes are written from the cache to the EC pool. On Thu 28 d
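The cost described above can be put into rough numbers. The sketch below is only arithmetic under stated assumptions: a 4 MiB object size is assumed for "several megabytes", and the evicted object is assumed to be dirty and of the same size. The function names are hypothetical.

```python
MIB = 1024 * 1024


def bytes_moved_on_miss(object_bytes: int, cache_full: bool) -> int:
    """Bytes moved between tiers to serve one read that misses the cache."""
    promoted = object_bytes  # whole object read from the EC pool into the cache
    evicted = object_bytes if cache_full else 0  # another object written back
    return promoted + evicted


def amplification(request_bytes: int, object_bytes: int, cache_full: bool) -> float:
    """Ratio of bytes moved between tiers to bytes the client asked for."""
    return bytes_moved_on_miss(object_bytes, cache_full) / request_bytes


# A 1-byte read that misses a full cache, with an assumed 4 MiB object size.
print(bytes_moved_on_miss(4 * MIB, cache_full=True))  # 8388608
```

Under these assumptions a single one-byte miss moves 8 MiB, which is why small random reads on a cold cache tier can be so slow.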

[ceph-users] How to really change public network in ceph

2018-02-19 Thread Mario Giammarco
Hello, I have a test proxmox/ceph cluster with four servers. I need to change the ceph public subnet from 10.1.0.0/24 to 10.1.5.0/24. I have read documentation and tutorials. The most critical part seems to be monitor map editing. But it seems to me that osds need to bind to the new subnet too. I tried to pu
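One way to sanity-check such a migration is to test which daemon addresses still fall in the old subnet. The sketch below uses only Python's standard `ipaddress` module and the two subnets named in the message; the daemon addresses are invented for illustration and the helper name is hypothetical. It does not talk to Ceph at all.

```python
import ipaddress
from typing import Iterable, List

OLD_NET = ipaddress.ip_network("10.1.0.0/24")
NEW_NET = ipaddress.ip_network("10.1.5.0/24")


def still_on_old_network(addresses: Iterable[str]) -> List[str]:
    """Return the addresses that have not moved off the old public subnet."""
    return [a for a in addresses if ipaddress.ip_address(a) in OLD_NET]


# Hypothetical addresses, e.g. copied by hand from the monitor and OSD maps.
daemon_addrs = ["10.1.5.11", "10.1.0.12", "10.1.5.13"]
print(still_on_old_network(daemon_addrs))  # ['10.1.0.12']
```

Any address the check returns belongs to a daemon that is still bound to 10.1.0.0/24 and would need attention before the old subnet is retired.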

Re: [ceph-users] How to really change public network in ceph

2018-02-21 Thread Mario Giammarco
Let me try asking a simpler question: when I change the monitors' network and the network of the osds, how can the monitors know the new addresses of the osds? Thanks, Mario 2018-02-19 10:22 GMT+01:00 Mario Giammarco : > Hello, > I have a test proxmox/ceph cluster with four servers. > I need to change the ce

[ceph-users] Help: pool not responding

2016-02-14 Thread Mario Giammarco
Hello, I am using ceph hammer under proxmox. I have a working cluster that I have been using for several months. For reasons I have yet to discover, I am now in this situation: HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean; 7 requests are blocked > 32 sec; 1 osds have slow requests pg 0.

Re: [ceph-users] Help: pool not responding

2016-02-15 Thread Mario Giammarco
koukou73gr writes: > > Have you tried restarting osd.0 ? > Yes I have restarted all osds many times. Also launched repair and scrub.

Re: [ceph-users] Help: pool not responding

2016-02-15 Thread Mario Giammarco
Karan Singh writes: > Agreed to Ferhat. > > Recheck your network ( bonds , interfaces , network switches , even cables ) I use gigabit ethernet, I am checking the network. But I am using another pool on the same cluster and it works perfectly: why? Thanks again, Mario

Re: [ceph-users] Help: pool not responding

2016-02-16 Thread Mario Giammarco
Mark Nelson writes: > PGs are pool specific, so the other pool may be totally healthy while > the first is not. If it turns out it's a hardware problem, it's also > possible that the 2nd pool may not hit all of the same OSDs as the first > pool, especially if it has a low PG count. > Just

Re: [ceph-users] Help: pool not responding

2016-02-29 Thread Mario Giammarco
Ferhat Ozkasgarli writes: > 1-) One of the OSD nodes has network problem. > 2-) Disk failure > 3-) Not enough resource for OSD nodes > 4-) Slow OSD Disks I have replaced cables and switches. I am sure that there are no network problems. Disks are SSHD and so they are fast. Nodes memory is empty

Re: [ceph-users] Help: pool not responding

2016-02-29 Thread Mario Giammarco
Thank you for your time. Dimitar Boichev writes: > > I am sure that I speak for the majority of people reading this, when I say that I didn't get anything from your emails. > Could you provide more debug information ? > Like (but not limited to): > ceph -s > ceph health details > ceph osd tree

Re: [ceph-users] Help: pool not responding

2016-02-29 Thread Mario Giammarco
Mario Giammarco writes: Sorry ceph health detail is: HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean pg 0.0 is stuck inactive for 4836623.776873, current state incomplete, last acting [0,1,3] pg 0.40 is stuck inactive for 2773379.028048, current state incomplete, last

Re: [ceph-users] Help: pool not responding

2016-02-29 Thread Mario Giammarco
Oliver Dzombic writes: > > Hi, > > i dont know, but as it seems to me: > > incomplete = not enough data > > the only solution would be to drop it ( delete ) > > so the cluster get in active healthy state. > > How many copies do you do from each data ? > Do you mean dropping the pg not wo

[ceph-users] Fwd: Help: pool not responding

2016-03-02 Thread Mario Giammarco
Tried to set min_size=1 but unfortunately nothing has changed. Thanks for the idea. 2016-02-29 22:56 GMT+01:00 Lionel Bouton : > On 29/02/2016 22:50, Shinobu Kinjo wrote: > > the fact that they are optimized for benchmarks and certainly not > Ceph OSD usage patterns (with or without internal j

[ceph-users] Fwd: Help: pool not responding

2016-03-02 Thread Mario Giammarco
288 pgs, 4 pools, 391 GB data, 100 kobjects > > 1090 GB used, 4481 GB / 5571 GB avail > > 284 active+clean > > 4 incomplete > > Cheers, > S > > - Original Message - > From: "Mario Giammarco" >

Re: [ceph-users] Fwd: Help: pool not responding

2016-03-02 Thread Mario Giammarco
führung: Oliver Dzombic > > Tax no.: 35 236 3622 1 > VAT ID: DE274086107 > > > On 02.03.2016 at 17:45, Mario Giammarco wrote: > > > > > > Here it is: > > > > cluster ac7bc476-3a02-453d-8e5c-606ab6f022ca > > health HEALTH_WARN > >

Re: [ceph-users] Fwd: Help: pool not responding

2016-03-03 Thread Mario Giammarco
rung: Oliver Dzombic > > Tax no.: 35 236 3622 1 > VAT ID: DE274086107 > > > On 02.03.2016 at 18:28, Mario Giammarco wrote: > > Thanks for the info, even if it is bad news. > > Anyway I am reading docs again and I do not see a way to delete PGs. > >

[ceph-users] R: Help: pool not responding

2016-03-03 Thread Mario Giammarco
“creating” state when you force create them. How did you restart ceph ? Mine were created fine after I restarted the monitor nodes after a minor version upgrade. Did you do it monitors first, osds second, etc etc ….. Regards. On Mar 3, 2016, at 13:13, Mario Giammarco <mgia

Re: [ceph-users] Help: pool not responding

2016-03-04 Thread Mario Giammarco
.boichev.axsmarine > E-mail: dimitar.boic...@axsmarine.com > > On Mar 3, 2016, at 22:47, Mario Giammarco wrote: > > Uses init script to restart > > *From: *Dimitar Boichev > *Sent: *Thursday, 3 March 2016 21:44 > *To: *Mario Giammarco > *Cc: *Oliver Dzombic; ceph-users@li

Re: [ceph-users] Help: pool not responding

2016-03-05 Thread Mario Giammarco
21:51 GMT+01:00 Dimitar Boichev : > But the whole cluster or what ? > > Regards. > > *Dimitar Boichev* > SysAdmin Team Lead > AXSMarine Sofia > Phone: +359 889 22 55 42 > Skype: dimitar.boichev.axsmarine > E-mail: dimitar.boic...@axsmarine.com > > On Mar 3, 20

Re: [ceph-users] [Help: pool not responding] Now osd crash

2016-03-08 Thread Mario Giammarco
[0x6a9170] 12: (OSD::init()+0xc84) [0x6ac204] 13: (main()+0x2839) [0x632459] 14: (__libc_start_main()+0xf5) [0x7f7fd08b3b45] 15: /usr/bin/ceph-osd() [0x64c087] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 2016-03-02 9:38 GMT+01:00 Mario Giammarco : > Her

[ceph-users] How to recover from corrupted RocksDb

2018-11-29 Thread Mario Giammarco
Hello, I have a ceph installation in a proxmox cluster. Due to a temporary hardware glitch now I get this error on osd startup -6> 2018-11-26 18:02:33.179327 7fa1d784be00 0 osd.0 1033 crush map has > features 1009089991638532096, adjusting msgr requires for osds >-5> 2018-11-26 18:02:34.14308

Re: [ceph-users] How to recover from corrupted RocksDb

2018-11-29 Thread Mario Giammarco
I have only that copy, it is a showroom system but someone put a production vm on it. On Thu, 29 Nov 2018 at 10:43, Wido den Hollander wrote: > > > On 11/29/18 10:28 AM, Mario Giammarco wrote: > > Hello, > > I have a ceph installation in a proxmox cluster. >

Re: [ceph-users] How to recover from corrupted RocksDb

2018-11-29 Thread Mario Giammarco
The only strange thing is that ceph-bluestore-tool says that repair was done, no errors are found and all is ok. I wonder what that tool really does. Mario On Thu, 29 Nov 2018 at 11:03, Wido den Hollander wrote: > > > On 11/29/18 10:45 AM, Mario Giammarco wrote: >

[ceph-users] Use telegraf/influx to detect problems is very difficult

2019-12-10 Thread Mario Giammarco
Hi, I enabled telegraf and influx plugins for my ceph cluster. I would like to use influx/chronograf to detect anomalies: - osd down - monitor down - osd near full But it is very difficult/complicated to make simple queries because, for example I have osd up and osd total but not osd down metric.
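The missing "osd down" value can be derived from the two counters that are exported. The sketch below does the subtraction on the client side, after both values have been fetched; the parameter names are hypothetical and are not the plugin's actual field names. Expressing the same thing inside the query layer may be less straightforward.

```python
def osds_down(osd_total: int, osd_up: int) -> int:
    """Derive the number of down OSDs from the total and up counters."""
    if osd_total < 0 or osd_up < 0 or osd_up > osd_total:
        raise ValueError("counters are inconsistent")
    return osd_total - osd_up


def should_alert(osd_total: int, osd_up: int) -> bool:
    """Raise an alarm as soon as at least one OSD is down."""
    return osds_down(osd_total, osd_up) > 0


print(osds_down(11, 10))      # 1
print(should_alert(11, 11))   # False
```

The consistency check matters in practice because the two counters may be sampled at slightly different moments.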

Re: [ceph-users] Use telegraf/influx to detect problems is very difficult

2019-12-11 Thread Mario Giammarco
Miroslav explained, better than I could, why it "is not so simple" to use math. And osd down was the easiest case. How can I calculate: - monitor down - osd near full ? I do not understand why the ceph plugin cannot send to influx all the metrics it has, especially the ones most useful for creating alarms. On Wed 1