Re: [ceph-users] v10.1.2 Jewel release candidate release

2016-04-14 Thread Vincenzo Pii

> On 14 Apr 2016, at 00:09, Gregory Farnum  wrote:
> 
> On Wed, Apr 13, 2016 at 3:02 PM, Sage Weil  wrote:
>> Hi everyone,
>> 
>> The third (and likely final) Jewel release candidate is out.  We have a
>> very small number of remaining blocker issues and a bit of final polish
>> before we publish Jewel 10.2.0, probably next week.
>> 
>> There are no known issues with this release that are serious enough to
>> warn about here.  Greg is adding some CephFS checks so that admins don't
>> accidentally start using less-stable features,
> 
> s/is adding/has added/
> 
>>http://docs.ceph.com/docs/master/release-notes/
> 
> As noted in another thread, there's still a big CephFS warning in the
> online docs. We'll be cleaning those up, since we now have the
> recovery tools we desire! Some things are known to still be slow or
> sub-optimal, but we consider CephFS stable and safe at this time when
> run in the default single-MDS configuration. (It won't let you do
> anything bad without very explicitly setting flags and acknowledging
> they're dangerous.)
> :)
> -Greg
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Hi Greg,

A clarification:

When you say that things will be safe in a “single-MDS” configuration, do you 
also exclude the HA setup with one active MDS and some passive (standby) ones? 
Or would this be safe as well?

Vincenzo Pii | TERALYTICS
DevOps Engineer
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Antw: Re: Deprecating ext4 support

2016-04-14 Thread Steffen Weißgerber


>>> Christian Balzer  wrote on Tuesday, 12 April 2016 at 01:39:

> Hello,
> 

Hi,

> I'm officially only allowed to do (preventative) maintenance during weekend
> nights on our main production cluster. 
> That would mean 13 ruined weekends at the realistic rate of 1 OSD per
> night, so you can see where my lack of enthusiasm for OSD recreation comes
> from.
> 

I'm wondering a great deal about that. We introduced Ceph for VMs on RBD
precisely so that we would not have to move maintenance time to the night shift.

My understanding of Ceph is that it was also designed to be reliable storage
in the face of hardware failure.

So, from the end user's point of view, what's the difference between maintaining
an OSD and having it fail? In both cases the effect should be none.

Maintaining OSDs should be routine, so that you're confident your application
stays safe while hardware fails within the unused reserve you configured.

In the end what happens to your cluster, when a complete node fails?

Regards

Steffen

> 
> Christian
> -- 
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com Global OnLine Japan/Rakuten Communications
> http://www.gol.com/ 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
Klinik-Service Neubrandenburg GmbH
Allendestr. 30, 17036 Neubrandenburg
Amtsgericht Neubrandenburg, HRB 2457
Geschaeftsfuehrerin: Gudrun Kappich
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Official website of the developer mailing list address is wrong

2016-04-14 Thread m13913886148
The developer mailing list (ceph-devel) address on the official website is wrong. 
Can anyone give me the correct address to subscribe? Thanks!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Auth capability required to run ceph daemon commands

2016-04-14 Thread Sergio A. de Carvalho Jr.
Hi,

Does anybody know what auth capabilities are required to run commands such
as:

ceph daemon osd.0 perf dump

Even with the client.admin user, I can't get it to work:

$ ceph daemon osd.0 perf dump --name client.admin
--keyring=/etc/ceph/ceph.client.admin.keyring
{}

$ ceph auth get client.admin
exported keyring for client.admin
[client.admin]
key = **
caps mds = "allow"
caps mon = "allow *"
caps osd = "allow *"

The only way I can run that command is with sudo:

$ sudo ceph daemon osd.0 perf dump
{
"WBThrottle": {
...
}

I'm using Ceph 0.9.4 on CentOS 6.5.

Thanks,

Sergio
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v10.1.2 Jewel release candidate release

2016-04-14 Thread John Spray
On Thu, Apr 14, 2016 at 8:31 AM, Vincenzo Pii
 wrote:
>
> On 14 Apr 2016, at 00:09, Gregory Farnum  wrote:
>
> On Wed, Apr 13, 2016 at 3:02 PM, Sage Weil  wrote:
>
> Hi everyone,
>
> The third (and likely final) Jewel release candidate is out.  We have a
> very small number of remaining blocker issues and a bit of final polish
> before we publish Jewel 10.2.0, probably next week.
>
> There are no known issues with this release that are serious enough to
> warn about here.  Greg is adding some CephFS checks so that admins don't
> accidentally start using less-stable features,
>
>
> s/is adding/has added/
>
>http://docs.ceph.com/docs/master/release-notes/
>
>
> As noted in another thread, there's still a big CephFS warning in the
> online docs. We'll be cleaning those up, since we now have the
> recovery tools we desire! Some things are known to still be slow or
> sub-optimal, but we consider CephFS stable and safe at this time when
> run in the default single-MDS configuration. (It won't let you do
> anything bad without very explicitly setting flags and acknowledging
> they're dangerous.)
> :)
> -Greg
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> Hi Greg,
>
> A clarification:
>
> When you say that things will be safe in “single-MDS” configuration, do you
> also exclude the HA setup with one active MDS and some passive (standby)
> ones? Or would this be safe as well?

Yes, having standbys is fine (including "standby replay" daemons).  We
should really say "single active MDS configuration", but it's a bit of
a mouthful!

John

>
> Vincenzo Pii | TERALYTICS
> DevOps Engineer
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v10.1.2 Jewel release candidate release

2016-04-14 Thread Shinobu Kinjo
On Thu, Apr 14, 2016 at 7:32 PM, John Spray  wrote:
> On Thu, Apr 14, 2016 at 8:31 AM, Vincenzo Pii
>  wrote:
>>
>> On 14 Apr 2016, at 00:09, Gregory Farnum  wrote:
>>
>> On Wed, Apr 13, 2016 at 3:02 PM, Sage Weil  wrote:
>>
>> Hi everyone,
>>
>> The third (and likely final) Jewel release candidate is out.  We have a
>> very small number of remaining blocker issues and a bit of final polish
>> before we publish Jewel 10.2.0, probably next week.
>>
>> There are no known issues with this release that are serious enough to
>> warn about here.  Greg is adding some CephFS checks so that admins don't
>> accidentally start using less-stable features,
>>
>>
>> s/is adding/has added/
>>
>>http://docs.ceph.com/docs/master/release-notes/
>>
>>
>> As noted in another thread, there's still a big CephFS warning in the
>> online docs. We'll be cleaning those up, since we now have the
>> recovery tools we desire! Some things are known to still be slow or
>> sub-optimal, but we consider CephFS stable and safe at this time when
>> run in the default single-MDS configuration. (It won't let you do
>> anything bad without very explicitly setting flags and acknowledging
>> they're dangerous.)
>> :)
>> -Greg
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>> Hi Greg,
>>
>> A clarification:
>>
>> When you say that things will be safe in “single-MDS” configuration, do you
>> also exclude the HA setup with one active MDS and some passive (standby)
>> ones? Or would this be safe as well?
>
> Yes, having standbys is fine (including "standby replay" daemons).  We

> should really say "single active MDS configuration", but it's a bit of

Pretty good description.

> a mouthful!
>
> John
>
>>
>> Vincenzo Pii | TERALYTICS
>> DevOps Engineer
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Email:
shin...@linux.com
GitHub:
shinobu-x
Blog:
Life with Distributed Computational System based on OpenSource
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Auth capability required to run ceph daemon commands

2016-04-14 Thread John Spray
On Thu, Apr 14, 2016 at 11:17 AM, Sergio A. de Carvalho Jr.
 wrote:
> Hi,
>
> Does anybody know what auth capabilities are required to run commands such
> as:

When you're doing "ceph daemon", no ceph authentication is happening:
this is a local connection to a UNIX socket in /var/run/ceph.  So this
just depends on the user you're running as having the right file
permissions on the socket file.
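
For example (a sketch - the socket path follows the default
$cluster-$name.asok pattern and the output is illustrative):

$ ls -l /var/run/ceph/ceph-osd.0.asok
srwxr-xr-x 1 root root 0 Apr 14 10:00 /var/run/ceph/ceph-osd.0.asok
$ sudo ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump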

Cheers,
John

> ceph daemon osd.0 perf dump
>
> Even with the client.admin user, I can't get it to work:
>
> $ ceph daemon osd.0 perf dump --name client.admin
> --keyring=/etc/ceph/ceph.client.admin.keyring
> {}
>
> $ ceph auth get client.admin
> exported keyring for client.admin
> [client.admin]
> key = **
> caps mds = "allow"
> caps mon = "allow *"
> caps osd = "allow *"
>
> The only way I can run that command is with sudo:
>
> $ sudo ceph daemon osd.0 perf dump
> {
> "WBThrottle": {
> ...
> }
>
> I'm using Ceph 0.9.4 on CentOS 6.5.
>
> Thanks,
>
> Sergio
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] remote logging

2016-04-14 Thread Steffen Weißgerber
Hello,

I tried to configure ceph logging to a remote syslog host based on
Sebastian Han's Blog 
(http://www.sebastien-han.fr/blog/2013/01/07/logging-in-ceph/):

ceph.conf

[global]
...
log_file = none
log_to_syslog = true
err_to_syslog = true

[mon]
mon_cluster_log_to_syslog = true
mon_cluster_log_file = none

The remote logging works fine, but nevertheless I still find local logging in 
the file /none.

That means:

monitor and osd logging on mon hosts

2016-04-14 14:40:46.379376 mon.0 2.1.1.92:6789/0 2643 : cluster [INF] pgmap 
v39837493: 4624 pgs: 4624 active+clean; 22634 GB data, 68107 GB used, 122 TB 
/ 189 TB avail; 36972 kB/s rd, 6561 kB/s wr, 1733 op/s
2016-04-14 14:40:48.824882 7f21de1e9700  0 -- 2.1.1.138:6812/8489 >> 
2.1.106.116:0/1407816754 pipe(0x1b69e000 sd=182 :6812 s=0 pgs=0 cs=0 l=0 
c=0x1bb5a2c
0).accept peer addr is really 2.1.106.116:0/1407816754 (socket is 
2.1.106.116:60963/0)
2016-04-14 14:40:47.460665 mon.0 2.1.1.92:6789/0 2644 : cluster [INF] pgmap 
v39837494: 4624 pgs: 4624 active+clean; 22634 GB data, 68107 GB used, 122 TB 
/ 189 TB avail; 34412 kB/s rd, 8085 kB/s wr, 1762 op/s

and osd logging on non mon hosts.

I configured this on Giant and now migrated to Hammer (based on Ubuntu 14.04.4 
LTS) without change.

What am I doing wrong?

Thanks in advance.


Regards

Steffen



-- 
Klinik-Service Neubrandenburg GmbH
Allendestr. 30, 17036 Neubrandenburg
Amtsgericht Neubrandenburg, HRB 2457
Geschaeftsfuehrerin: Gudrun Kappich
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Advice on OSD upgrades

2016-04-14 Thread Stephen Mercier
Good morning,

We've been running a medium-sized (88 OSDs - all SSD) ceph cluster for the past 
20 months. We're very happy with our experience with the platform so far.

Shortly, we will be embarking on an initiative to replace all 88 OSDs with new 
drives (Planned maintenance and lifecycle replacement). Before we do so, 
however, I wanted to confirm with the community as to the proper order of 
operation to perform such a task.

The OSDs are divided evenly across an even number of hosts which are then 
divided evenly between 2 cabinets in 2 physically separate locations. The plan 
is to replace the OSDs, one host at a time, cycling back and forth between 
cabinets, replacing one host per week, or every 2 weeks (Depending on the 
amount of time the crush rebalancing takes).

For each host, the plan was to mark the OSDs as out, one at a time, closely 
monitoring each of them, moving to the next OSD once the current one is balanced 
out. Once all OSDs are successfully marked as out, we will then delete those 
OSDs from the cluster, shutdown the server, replace the physical drives, and 
once rebooted, add the new drives to the cluster as new OSDs using the same 
method we've used previously, doing so one at a time to allow for rebalancing 
as they rejoin the cluster.

My questions are…Does this process sound correct? Should I also mark the OSDs 
as down when I mark them as out? Are there any steps I'm overlooking in this 
process?

Any advice is greatly appreciated.

Cheers,
-
Stephen Mercier | Sr. Systems Architect
Attainia Capital Planning Solutions (ACPS)
O: (650)241-0567, 727 | TF: (866)288-2464, 727
stephen.merc...@attainia.com | www.attainia.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Advice on OSD upgrades

2016-04-14 Thread koukou73gr
If you have empty drive slots in your OSD hosts, I'd be tempted to
insert the new drive in a slot, set noout, shut down one OSD, unmount the
OSD directory, dd the old drive to the new one, remove the old drive, and
restart the OSD.

No rebalancing and minimal data movement when the OSD rejoins.
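
Roughly (a sketch only; osd.12 and the device names are placeholders, and
stop/start assume upstart - use the systemd equivalents if applicable):

ceph osd set noout
stop ceph-osd id=12
umount /var/lib/ceph/osd/ceph-12
dd if=/dev/sdX of=/dev/sdY bs=4M conv=noerror
# physically swap the drives, remount, then:
start ceph-osd id=12
ceph osd unset noout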

-K.

On 04/14/2016 04:29 PM, Stephen Mercier wrote:
> Good morning,
> 
> We've been running a medium-sized (88 OSDs - all SSD) ceph cluster for
> the past 20 months. We're very happy with our experience with the
> platform so far.
> 
> Shortly, we will be embarking on an initiative to replace all 88 OSDs
> with new drives (Planned maintenance and lifecycle replacement). Before
> we do so, however, I wanted to confirm with the community as to the
> proper order of operation to perform such a task.
> 
> The OSDs are divided evenly across an even number of hosts which are
> then divided evenly between 2 cabinets in 2 physically separate
> locations. The plan is to replace the OSDs, one host at a time, cycling
> back and forth between cabinets, replacing one host per week, or every 2
> weeks (Depending on the amount of time the crush rebalancing takes).
> 
> For each host, the plan was to mark the OSDs as out, one at a time,
> closely monitoring each of them, moving to the next OSD once the current
> one is balanced out. Once all OSDs are successfully marked as out, we
> will then delete those OSDs from the cluster, shutdown the server,
> replace the physical drives, and once rebooted, add the new drives to
> the cluster as new OSDs using the same method we've used previously,
> doing so one at a time to allow for rebalancing as they rejoin the cluster.
> 
> My questions are…Does this process sound correct? Should I also mark the
> OSDs as down when I mark them as out? Are there any steps I'm
> overlooking in this process?
> 
> Any advice is greatly appreciated.
> 
> Cheers,
> -
> Stephen Mercier | Sr. Systems Architect
> Attainia Capital Planning Solutions (ACPS)
> O: (650)241-0567, 727 | TF: (866)288-2464, 727
> stephen.merc...@attainia.com  |
> www.attainia.com 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Official website of the developer mailing list address is wrong

2016-04-14 Thread Gregory Farnum
ceph-devel is hosted at vger.kernel.org rather than ceph.com. This is
unlike the other mailing lists, but all the addresses related to it on
the site look correct. eg
http://vger.kernel.org/vger-lists.html#ceph-devel
-Greg

On Thu, Apr 14, 2016 at 2:56 AM,   wrote:
> The developer mailing list (ceph-devel) address on the official website is
> wrong. Can anyone give me the correct address to subscribe? Thanks!
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Antw: Advice on OSD upgrades

2016-04-14 Thread Steffen Weißgerber
Hi,

that's how I did it for my OSDs 25 to 30 (you can include as many OSD
numbers as you like, as long as you have free space).

First you can reweight the OSDs to 0 to move their copies to other OSDs:

for i in {25..30};
do
  ceph osd crush reweight osd.$i 0
done

and then wait until it's done (i.e. until the cluster health is OK again).
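
Something like this can be used to wait for that (a sketch):

while ! ceph health | grep -q HEALTH_OK; do sleep 60; done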

Then you can remove the OSDs from the cluster:

for i in {25..30};
do
  ceph osd out osd.$i && stop ceph-osd id=$i && ceph osd crush remove osd.$i && ceph auth del osd.$i && ceph osd rm osd.$i;
done

Then you can remove the disks from the system:

echo 1 > /sys/block/sdX/device/delete

where sdX is the SCSI device name of the OSD's disk (you can find it in
/proc/partitions).

Then you can remove the disk physically (if hotplug is available).

After inserting the new disks, create the new OSDs with ceph-deploy.

Regards

Steffen


>>> Stephen Mercier  wrote on Thursday, 14 April 2016 at 15:29:
> Good morning,
> 
> We've been running a medium-sized (88 OSDs - all SSD) ceph cluster for the
> past 20 months. We're very happy with our experience with the platform so
> far.
> 
> Shortly, we will be embarking on an initiative to replace all 88 OSDs with
> new drives (Planned maintenance and lifecycle replacement). Before we do so,
> however, I wanted to confirm with the community as to the proper order of
> operation to perform such a task.
> 
> The OSDs are divided evenly across an even number of hosts which are then
> divided evenly between 2 cabinets in 2 physically separate locations. The
> plan is to replace the OSDs, one host at a time, cycling back and forth
> between cabinets, replacing one host per week, or every 2 weeks (Depending on
> the amount of time the crush rebalancing takes).
> 
> For each host, the plan was to mark the OSDs as out, one at a time, closely
> monitoring each of them, moving to the next OSD once the current one is
> balanced out. Once all OSDs are successfully marked as out, we will then
> delete those OSDs from the cluster, shutdown the server, replace the physical
> drives, and once rebooted, add the new drives to the cluster as new OSDs
> using the same method we've used previously, doing so one at a time to allow
> for rebalancing as they rejoin the cluster.
> 
> My questions are…Does this process sound correct? Should I also mark the
> OSDs as down when I mark them as out? Are there any steps I'm overlooking in
> this process?
> 
> Any advice is greatly appreciated.
> 
> Cheers,
> -
> Stephen Mercier | Sr. Systems Architect
> Attainia Capital Planning Solutions (ACPS)
> O: (650)241-0567, 727 | TF: (866)288-2464, 727
> stephen.merc...@attainia.com | www.attainia.com

-- 
Klinik-Service Neubrandenburg GmbH
Allendestr. 30, 17036 Neubrandenburg
Amtsgericht Neubrandenburg, HRB 2457
Geschaeftsfuehrerin: Gudrun Kappich
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Advice on OSD upgrades

2016-04-14 Thread Wido den Hollander

> On 14 April 2016 at 15:29, Stephen Mercier wrote:
> 
> 
> Good morning,
> 
> We've been running a medium-sized (88 OSDs - all SSD) ceph cluster for the
> past 20 months. We're very happy with our experience with the platform so far.
> 
> Shortly, we will be embarking on an initiative to replace all 88 OSDs with new
> drives (Planned maintenance and lifecycle replacement). Before we do so,
> however, I wanted to confirm with the community as to the proper order of
> operation to perform such a task.
> 
> The OSDs are divided evenly across an even number of hosts which are then
> divided evenly between 2 cabinets in 2 physically separate locations. The plan
> is to replace the OSDs, one host at a time, cycling back and forth between
> cabinets, replacing one host per week, or every 2 weeks (Depending on the
> amount of time the crush rebalancing takes).
> 

I assume that your replication is set to "2" and that you replicate over the two
locations?

In that case, only work on HDDs in the first location and start on the second
one after you replaced them all.
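
You can verify the pool size with something like this ("rbd" here is just a
placeholder pool name):

$ ceph osd pool get rbd size
size: 2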

> For each host, the plan was to mark the OSDs as out, one at a time, closely
> monitoring each of them, moving to the next OSD once the current one is
> balanced out. Once all OSDs are successfully marked as out, we will then
> delete those OSDs from the cluster, shutdown the server, replace the physical
> drives, and once rebooted, add the new drives to the cluster as new OSDs using
> the same method we've used previously, doing so one at a time to allow for
> rebalancing as they rejoin the cluster.
> 
> My questions are…Does this process sound correct? Should I also mark the OSDs
> as down when I mark them as out? Are there any steps I'm overlooking in this
> process?
> 

No, marking out is just fine. That tells CRUSH the OSD is no longer
participating in the data placement. Its effective weight will be 0 and that's
it.

Like others mention, reweight the OSD to 0 at the same time you mark it as out.
That way you prevent a double rebalance.

Keep it marked as UP so that it can help in migrating the PGs to other nodes.
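
In other words, something along these lines (a sketch; osd.12 is a
placeholder ID):

ceph osd crush reweight osd.12 0 && ceph osd out 12
# leave the daemon running (UP) until the rebalance finishes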

> Any advice is greatly appreciated.
> 
> Cheers,
> -
> Stephen Mercier | Sr. Systems Architect
> Attainia Capital Planning Solutions (ACPS)
> O: (650)241-0567, 727 | TF: (866)288-2464, 727
> stephen.merc...@attainia.com | www.attainia.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Advice on OSD upgrades

2016-04-14 Thread Stephen Mercier
Sadly, this is not an option. Not only are there no free slots on the hosts, 
but we're downgrading in size for each OSD because we decided to sacrifice 
space to make a significant jump in drive quality. 

We're not really too concerned about the rebalancing, as we monitor the cluster 
closely and have the available breathing-room to withstand the impact as long 
as we're methodical and measured about it.

Cheers,
-
Stephen Mercier | Sr. Systems Architect
Attainia Capital Planning Solutions (ACPS)
O: (650)241-0567, 727 | TF: (866)288-2464, 727
stephen.merc...@attainia.com | www.attainia.com

On Apr 14, 2016, at 6:45 AM, koukou73gr wrote:

> If you have empty drive slots in your OSD hosts, I'd be tempted to
> insert new drive in slot, set noout, shutdown one OSD, unmount OSD
> directory, dd the old drive to the new one, remove old drive, restart OSD.
> 
> No rebalancing and minimal data movment when the OSD rejoins.
> 
> -K.
> 
> On 04/14/2016 04:29 PM, Stephen Mercier wrote:
>> Good morning,
>> 
>> We've been running a medium-sized (88 OSDs - all SSD) ceph cluster for
>> the past 20 months. We're very happy with our experience with the
>> platform so far.
>> 
>> Shortly, we will be embarking on an initiative to replace all 88 OSDs
>> with new drives (Planned maintenance and lifecycle replacement). Before
>> we do so, however, I wanted to confirm with the community as to the
>> proper order of operation to perform such a task.
>> 
>> The OSDs are divided evenly across an even number of hosts which are
>> then divided evenly between 2 cabinets in 2 physically separate
>> locations. The plan is to replace the OSDs, one host at a time, cycling
>> back and forth between cabinets, replacing one host per week, or every 2
>> weeks (Depending on the amount of time the crush rebalancing takes).
>> 
>> For each host, the plan was to mark the OSDs as out, one at a time,
>> closely monitoring each of them, moving to the next OSD once the current
>> one is balanced out. Once all OSDs are successfully marked as out, we
>> will then delete those OSDs from the cluster, shutdown the server,
>> replace the physical drives, and once rebooted, add the new drives to
>> the cluster as new OSDs using the same method we've used previously,
>> doing so one at a time to allow for rebalancing as they rejoin the cluster.
>> 
>> My questions are…Does this process sound correct? Should I also mark the
>> OSDs as down when I mark them as out? Are there any steps I'm
>> overlooking in this process?
>> 
>> Any advice is greatly appreciated.
>> 
>> Cheers,
>> -
>> Stephen Mercier | Sr. Systems Architect
>> Attainia Capital Planning Solutions (ACPS)
>> O: (650)241-0567, 727 | TF: (866)288-2464, 727
>> stephen.merc...@attainia.com  |
>> www.attainia.com 
>> 
>> 
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] remote logging

2016-04-14 Thread Wido den Hollander

> On 14 April 2016 at 14:46, Steffen Weißgerber wrote:
> 
> 
> Hello,
> 
> I tried to configure ceph logging to a remote syslog host based on
> Sebastian Han's Blog
> (http://www.sebastien-han.fr/blog/2013/01/07/logging-in-ceph/):
> 
> ceph.conf
> 
> [global]
> ...
> log_file = none
> log_to_syslog = true
> err_to_syslog = true
> 
> [mon]
> mon_cluster_log_to_syslog = true
> mon_cluster_log_file = none
> 
> The remote logging works fine but nevertheless i find local logging in the
> file /none.
> 

It has been a long time since I tried syslog. Could you try:

log_file = ""

See how that works out.
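
That is, the [global] and [mon] sections would then look like this (untested,
from memory):

[global]
log_file = ""
log_to_syslog = true
err_to_syslog = true

[mon]
mon_cluster_log_to_syslog = true
mon_cluster_log_file = ""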

Wido

> That means:
> 
> monitor and osd logging on mon hosts
> 
> 2016-04-14 14:40:46.379376 mon.0 2.1.1.92:6789/0 2643 : cluster [INF] pgmap
> v39837493: 4624 pgs: 4624 active+clean; 22634 GB data, 68107 GB used, 122 TB 
> / 189 TB avail; 36972 kB/s rd, 6561 kB/s wr, 1733 op/s
> 2016-04-14 14:40:48.824882 7f21de1e9700  0 -- 2.1.1.138:6812/8489 >>
> 2.1.106.116:0/1407816754 pipe(0x1b69e000 sd=182 :6812 s=0 pgs=0 cs=0 l=0
> c=0x1bb5a2c
> 0).accept peer addr is really 2.1.106.116:0/1407816754 (socket is
> 2.1.106.116:60963/0)
> 2016-04-14 14:40:47.460665 mon.0 2.1.1.92:6789/0 2644 : cluster [INF] pgmap
> v39837494: 4624 pgs: 4624 active+clean; 22634 GB data, 68107 GB used, 122 TB 
> / 189 TB avail; 34412 kB/s rd, 8085 kB/s wr, 1762 op/s
> 
> and osd logging on non mon hosts.
> 
> I configured this on Giant and now migrated to Hammer (based on Ubuntu 14.04.4
> LTS) without change.
> 
> What I'm doing wrong?
> 
> Thanks in advance.
> 
> 
> Regards
> 
> Steffen
> 
> 
> 
> -- 
> Klinik-Service Neubrandenburg GmbH
> Allendestr. 30, 17036 Neubrandenburg
> Amtsgericht Neubrandenburg, HRB 2457
> Geschaeftsfuehrerin: Gudrun Kappich
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Advice on OSD upgrades

2016-04-14 Thread Stephen Mercier
Thank you for the advice.

Our crush map is actually set up with replication set to 3, and at least one 
copy in each cabinet, ensuring no one host is a single point of failure. We 
fully intended to perform this maintenance over the course of many weeks, one 
host at a time. We felt that the staggered deploy times for the SSDs, based on 
their unique failure nature, was a benefit anyway. (i.e. When one goes, all of 
its friends are usually close behind)

Cheers,
-
Stephen Mercier | Sr. Systems Architect
Attainia Capital Planning Solutions (ACPS)
O: (650)241-0567, 727 | TF: (866)288-2464, 727
stephen.merc...@attainia.com | www.attainia.com

On Apr 14, 2016, at 7:00 AM, Wido den Hollander wrote:

> 
>> On 14 April 2016 at 15:29, Stephen Mercier wrote:
>> 
>> 
>> Good morning,
>> 
>> We've been running a medium-sized (88 OSDs - all SSD) ceph cluster for the
>> past 20 months. We're very happy with our experience with the platform so 
>> far.
>> 
>> Shortly, we will be embarking on an initiative to replace all 88 OSDs with 
>> new
>> drives (Planned maintenance and lifecycle replacement). Before we do so,
>> however, I wanted to confirm with the community as to the proper order of
>> operation to perform such a task.
>> 
>> The OSDs are divided evenly across an even number of hosts which are then
>> divided evenly between 2 cabinets in 2 physically separate locations. The 
>> plan
>> is to replace the OSDs, one host at a time, cycling back and forth between
>> cabinets, replacing one host per week, or every 2 weeks (Depending on the
>> amount of time the crush rebalancing takes).
>> 
> 
> I assume that your replication is set to "2" and that you replicate over the 
> two
> locations?
> 
> In that case, only work on HDDs in the first location and start on the second
> one after you replaced them all.
> 
>> For each host, the plan was to mark the OSDs as out, one at a time, closely
>> monitoring each of them, moving to the next OSD once the current one is
>> balanced out. Once all OSDs are successfully marked as out, we will then
>> delete those OSDs from the cluster, shutdown the server, replace the physical
>> drives, and once rebooted, add the new drives to the cluster as new OSDs 
>> using
>> the same method we've used previously, doing so one at a time to allow for
>> rebalancing as they rejoin the cluster.
>> 
>> My questions are…Does this process sound correct? Should I also mark the OSDs
>> as down when I mark them as out? Are there any steps I'm overlooking in this
>> process?
>> 
> 
> No, marking out is just fine. That tells CRUSH the OSD is no longer
> participating in the data placement. It's effective weight will be 0 and 
> that's
> it.
> 
> Like others mention, reweight the OSD to 0 at the same time you mark it as 
> out.
> That way you prevent a double rebalance.
> 
> Keep it marked as UP so that it can help in migrating the PGs to other nodes.
> 
>> Any advice is greatly appreciated.
>> 
>> Cheers,
>> -
>> Stephen Mercier | Sr. Systems Architect
>> Attainia Capital Planning Solutions (ACPS)
>> O: (650)241-0567, 727 | TF: (866)288-2464, 727
>> stephen.merc...@attainia.com | www.attainia.com
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] my cluster is down after upgrade to 10.1.2

2016-04-14 Thread Lomayani S. Laizer
Hello,
I upgraded from 10.1.0 to 10.1.2 with ceph-deploy and my cluster is down
now. I'm getting the errors below:

ceph -s

2016-04-14 17:04:58.909894 7f14686e4700  0 -- :/2590574876 >>
10.10.200.4:6789/0 pipe(0x7f146405adf0 sd=3 :0 s=1 pgs=0 cs=0 l=1
c=0x7f146405c0b0).fault
2016-04-14 17:05:01.909949 7f14685e3700  0 -- :/2590574876 >>
10.10.200.3:6789/0 pipe(0x7f1458000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
c=0x7f1458001f90).fault
2016-04-14 17:05:04.910416 7f14686e4700  0 -- :/2590574876 >>
10.10.200.4:6789/0 pipe(0x7f1458005120 sd=4 :0 s=1 pgs=0 cs=0 l=1
c=0x7f14580063e0).fault
2016-04-14 17:05:07.910697 7f14685e3700  0 -- :/2590574876 >>
10.10.200.2:6789/0 pipe(0x7f1458000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
c=0x7f1458002410).fault

--
Lomayani
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] my cluster is down after upgrade to 10.1.2

2016-04-14 Thread c

Am 2016-04-14 16:05, schrieb Lomayani S. Laizer:

Hello,

I upgraded from 10.1.0 to 10.1.2 with ceph-deploy and my cluster is
down now. getting below errors

ceph -s

2016-04-14 17:04:58.909894 7f14686e4700  0 -- :/2590574876 >>
10.10.200.4:6789/0 pipe(0x7f146405adf0 sd=3 :0 s=1 pgs=0 cs=0 l=1
c=0x7f146405c0b0).fault
2016-04-14 17:05:01.909949 7f14685e3700  0 -- :/2590574876 >>
10.10.200.3:6789/0 pipe(0x7f1458000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
c=0x7f1458001f90).fault
2016-04-14 17:05:04.910416 7f14686e4700  0 -- :/2590574876 >>
10.10.200.4:6789/0 pipe(0x7f1458005120 sd=4 :0 s=1 pgs=0 cs=0 l=1
c=0x7f14580063e0).fault
2016-04-14 17:05:07.910697 7f14685e3700  0 -- :/2590574876 >>
10.10.200.2:6789/0 pipe(0x7f1458000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
c=0x7f1458002410).fault

--

Lomayani




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Hi Lomayani and other cephers,

I have the same issue - thankfully I am playing around with our test
cluster.


this is what we get:

terminate called after throwing an instance of 
'ceph::buffer::end_of_buffer'

  what():  buffer::end_of_buffer
*** Caught signal (Aborted) **
 in thread 7fe2370a24c0 thread_name:ceph-mon
 ceph version 10.1.2 (4a2a6f72640d6b74a3bbd92798bb913ed380dcd4)
 1: (()+0x4f3712) [0x55b2ed4b7712]
 2: (()+0x10340) [0x7fe2363b1340]
 3: (gsignal()+0x39) [0x7fe234639cc9]
 4: (abort()+0x148) [0x7fe23463d0d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fe234f44535]
 6: (()+0x5e6d6) [0x7fe234f426d6]
 7: (()+0x5e703) [0x7fe234f42703]
 8: (()+0x5e922) [0x7fe234f42922]
 9: (()+0x618f15) [0x55b2ed5dcf15]
 10: (FSMap::decode(ceph::buffer::list::iterator&)+0x101f) 
[0x55b2ed4faebf]

 11: (MDSMonitor::update_from_paxos(bool*)+0x178) [0x55b2ed321738]
 12: (PaxosService::refresh(bool*)+0x19a) [0x55b2ed2958da]
 13: (Monitor::refresh_from_paxos(bool*)+0x143) [0x55b2ed232643]
 14: (Monitor::init_paxos()+0x85) [0x55b2ed232a55]
 15: (Monitor::preinit()+0x925) [0x55b2ed242505]
 16: (main()+0x236d) [0x55b2ed1d10ed]
 17: (__libc_start_main()+0xf5) [0x7fe234624ec5]
 18: (()+0x25f28a) [0x55b2ed22328a]
2016-04-14 16:19:30.301995 7fe2370a24c0 -1 *** Caught signal (Aborted) 
**

 in thread 7fe2370a24c0 thread_name:ceph-mon

 ceph version 10.1.2 (4a2a6f72640d6b74a3bbd92798bb913ed380dcd4)
 1: (()+0x4f3712) [0x55b2ed4b7712]
 2: (()+0x10340) [0x7fe2363b1340]
 3: (gsignal()+0x39) [0x7fe234639cc9]
 4: (abort()+0x148) [0x7fe23463d0d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fe234f44535]
 6: (()+0x5e6d6) [0x7fe234f426d6]
 7: (()+0x5e703) [0x7fe234f42703]
 8: (()+0x5e922) [0x7fe234f42922]
 9: (()+0x618f15) [0x55b2ed5dcf15]
 10: (FSMap::decode(ceph::buffer::list::iterator&)+0x101f) 
[0x55b2ed4faebf]

 11: (MDSMonitor::update_from_paxos(bool*)+0x178) [0x55b2ed321738]
 12: (PaxosService::refresh(bool*)+0x19a) [0x55b2ed2958da]
 13: (Monitor::refresh_from_paxos(bool*)+0x143) [0x55b2ed232643]
 14: (Monitor::init_paxos()+0x85) [0x55b2ed232a55]
 15: (Monitor::preinit()+0x925) [0x55b2ed242505]
 16: (main()+0x236d) [0x55b2ed1d10ed]
 17: (__libc_start_main()+0xf5) [0x7fe234624ec5]
 18: (()+0x25f28a) [0x55b2ed22328a]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.


 0> 2016-04-14 16:19:30.301995 7fe2370a24c0 -1 *** Caught signal 
(Aborted) **

 in thread 7fe2370a24c0 thread_name:ceph-mon

 ceph version 10.1.2 (4a2a6f72640d6b74a3bbd92798bb913ed380dcd4)
 1: (()+0x4f3712) [0x55b2ed4b7712]
 2: (()+0x10340) [0x7fe2363b1340]
 3: (gsignal()+0x39) [0x7fe234639cc9]
 4: (abort()+0x148) [0x7fe23463d0d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fe234f44535]
 6: (()+0x5e6d6) [0x7fe234f426d6]
 7: (()+0x5e703) [0x7fe234f42703]
 8: (()+0x5e922) [0x7fe234f42922]
 9: (()+0x618f15) [0x55b2ed5dcf15]
 10: (FSMap::decode(ceph::buffer::list::iterator&)+0x101f) 
[0x55b2ed4faebf]

 11: (MDSMonitor::update_from_paxos(bool*)+0x178) [0x55b2ed321738]
 12: (PaxosService::refresh(bool*)+0x19a) [0x55b2ed2958da]
 13: (Monitor::refresh_from_paxos(bool*)+0x143) [0x55b2ed232643]
 14: (Monitor::init_paxos()+0x85) [0x55b2ed232a55]
 15: (Monitor::preinit()+0x925) [0x55b2ed242505]
 16: (main()+0x236d) [0x55b2ed1d10ed]
 17: (__libc_start_main()+0xf5) [0x7fe234624ec5]
 18: (()+0x25f28a) [0x55b2ed22328a]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.



*ceph.conf*
[global]
#enable experimental unrecoverable data corrupting features = *
fsid = xx-xx-xx--xx
public_network = 172.xxx.xx.x/xx
cluster_network = 10.xxx.xx.x/xx
#mon_initial_members = srv1, srv2, srv3
#mon_host = 172.xxx.xx.1,172.xxx.xx.2,172.xxx.xx.3
mon_initial_members = mon1,mon2,mon3
mon_host = 172.xxx.xx.118,172.xxx.xx.119,172.xxx.xx.120
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

Re: [ceph-users] v10.1.2 Jewel release candidate release

2016-04-14 Thread Milosz Tanski
On Thu, Apr 14, 2016 at 6:32 AM, John Spray  wrote:
> On Thu, Apr 14, 2016 at 8:31 AM, Vincenzo Pii
>  wrote:
>>
>> On 14 Apr 2016, at 00:09, Gregory Farnum  wrote:
>>
>> On Wed, Apr 13, 2016 at 3:02 PM, Sage Weil  wrote:
>>
>> Hi everyone,
>>
>> The third (and likely final) Jewel release candidate is out.  We have a
>> very small number of remaining blocker issues and a bit of final polish
>> before we publish Jewel 10.2.0, probably next week.
>>
>> There are no known issues with this release that are serious enough to
>> warn about here.  Greg is adding some CephFS checks so that admins don't
>> accidentally start using less-stable features,
>>
>>
>> s/is adding/has added/
>>
>>http://docs.ceph.com/docs/master/release-notes/
>>
>>
>> As noted in another thread, there's still a big CephFS warning in the
>> online docs. We'll be cleaning those up, since we now have the
>> recovery tools we desire! Some things are known to still be slow or
>> sub-optimal, but we consider CephFS stable and safe at this time when
>> run in the default single-MDS configuration. (It won't let you do
>> anything bad without very explicitly setting flags and acknowledging
>> they're dangerous.)
>> :)
>> -Greg
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>> Hi Greg,
>>
>> A clarification:
>>
>> When you say that things will be safe in “single-MDS” configuration, do you
>> also exclude the HA setup with one active MDS and some passive (standby)
>> ones? Or would this be safe as well?
>
> Yes, having standbys is fine (including "standby replay" daemons).  We
> should really say "single active MDS configuration", but it's a bit of
> a mouthful!

master - hot standby(s) is okay
multi-master is not supported

I feel like this is terminology that's more familiar (to me) from
other systems (e.g. databases).

>
> John
>
>>
>> Vincenzo Pii | TERALYTICS
>> DevOps Engineer
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: mil...@adfin.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deprecating ext4 support

2016-04-14 Thread Christian Balzer

Hello,

[reduced to ceph-users]

On Thu, 14 Apr 2016 11:43:07 +0200 Steffen Weißgerber wrote:

> 
> 
> >>> Christian Balzer  wrote on Tuesday, 12 April 2016 at 01:39:
> 
> > Hello,
> > 
> 
> Hi,
> 
> > I'm officially only allowed to do (preventative) maintenance during
> > weekend nights on our main production cluster. 
> > That would mean 13 ruined weekends at the realistic rate of 1 OSD per
> > night, so you can see where my lack of enthusiasm for OSD recreation
> > comes from.
> > 
> 
> I'm wondering a great deal about that. We introduced Ceph for VMs on RBD
> precisely so that we would not have to move maintenance time to the night
> shift.
> 
This is Japan. 
It makes the most anal retentive people/rules in "der alten Heimat" look
like a bunch of hippies on drugs.

Note the preventative and I should have put "officially" in quotes, like
that.

I can do whatever I feel comfortable with on our other production cluster,
since there aren't hundreds of customers with very, VERY tight SLAs on it.

So if I were to tell my boss that I want to renew all OSDs he'd say "Sure,
but at time that if anything goes wrong it will not impact any customer
unexpectedly" meaning the official maintenance windows...

> My understanding of Ceph is that it was also designed to be reliable storage
> in the face of hardware failure.
>
Reliable, yes. With certain limitations, see below.
 
> So, from the end user's point of view, what's the difference between
> maintaining an OSD and having it fail? In both cases the effect should be none.
> 
Ideally, yes.
Note that an OSD failure can result in slow I/O (to the point of what
would be considered service interruption) depending on the failure mode
and the various timeout settings.

So planned and properly executed maintenance has less impact.
None (or at least not noticeable) IF your cluster has enough resources
and/or all the tuning has been done correctly.

> Maintaining OSDs should be routine, so that you're confident your
> application stays safe while hardware fails within the unused reserve you
> configured.
> 
I/O is a very fickle beast; it may perform splendidly at 2000 ops/s just to
totally go down the drain at 2100. 
Knowing your capacity and reserve isn't straightforward, especially not in
a live environment as compared to synthetic tests. 

In short, could that cluster (now, after upgrades and adding a cache tier)
handle OSD renewals at any given time?
Absolutely.
Will I get an official blessing to do so?
No effing way.

> In the end what happens to your cluster, when a complete node fails?
> 
Nothing much, in fact LESS than when an OSD should fail since it won't
trigger re-balancing (mon_osd_down_out_subtree_limit = host).

Regards,

Christian
-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] my cluster is down after upgrade to 10.1.2

2016-04-14 Thread Gregory Farnum
On Thu, Apr 14, 2016 at 7:05 AM, Lomayani S. Laizer  wrote:
> Hello,
> I upgraded from 10.1.0 to 10.1.2 with ceph-deploy and my cluster is down
> now. getting below errors
>
> ceph -s
>
> 2016-04-14 17:04:58.909894 7f14686e4700  0 -- :/2590574876 >>
> 10.10.200.4:6789/0 pipe(0x7f146405adf0 sd=3 :0 s=1 pgs=0 cs=0 l=1
> c=0x7f146405c0b0).fault
> 2016-04-14 17:05:01.909949 7f14685e3700  0 -- :/2590574876 >>
> 10.10.200.3:6789/0 pipe(0x7f1458000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
> c=0x7f1458001f90).fault
> 2016-04-14 17:05:04.910416 7f14686e4700  0 -- :/2590574876 >>
> 10.10.200.4:6789/0 pipe(0x7f1458005120 sd=4 :0 s=1 pgs=0 cs=0 l=1
> c=0x7f14580063e0).fault
> 2016-04-14 17:05:07.910697 7f14685e3700  0 -- :/2590574876 >>
> 10.10.200.2:6789/0 pipe(0x7f1458000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
> c=0x7f1458002410).fault

Assuming your monitors aren't running and have crashed, can you get
the backtrace out of their log files?

We just discovered an issue with the new FSMap encoding in 10.1.2, if
you had already run an rc and had a filesystem. Patch is building and
being tested now.
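
Something like this should pull a backtrace out of a mon log, if there is
one (a sketch; adjust the path for your mon id):

grep -B 2 -A 25 'Caught signal' /var/log/ceph/ceph-mon.*.log
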
-Greg

>
> --
> Lomayani
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osd prepare 10.1.2

2016-04-14 Thread Michael Hanscho
Hi!

A fresh install of 10.1.2 on CentOS 7.2.1511 fails when adding OSDs:

[ceph_deploy.osd][ERROR ] Failed to execute command: ceph-disk -v
prepare --cluster ceph --fs-type xfs -- /dev/sdm /dev/sdi
[ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs

The reason seems to be a failing partprobe command:
[cestor1][WARNIN] update_partition: Calling partprobe on created device
/dev/sdi
[cestor1][WARNIN] command_check_call: Running command: /usr/bin/udevadm
settle --timeout=600
[cestor1][WARNIN] command: Running command: /sbin/partprobe /dev/sdi
[cestor1][WARNIN] update_partition: partprobe /dev/sdi failed : Error:
Error informing the kernel about modifications to partition /dev/sdi1 --
Device or resource busy.  This means Linux won't know about any changes
you made to /dev/sdi1 until you reboot -- so you shouldn't mount it or
use it in any way before rebooting.
[cestor1][WARNIN] Error: Failed to add partition 1 (Device or resource busy)
[cestor1][WARNIN]  (ignored, waiting 60s)

Attached ceph-deploy-osd-prepare-error.log with the details.

Modifying ceph-disk to ignore the partprobe failure allows the preparation
to proceed. Any hints?

Gruesse
Michael
ceph-deploy osd prepare cestor1:/dev/sdm:/dev/sdi
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/ink/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.31): /bin/ceph-deploy osd prepare cestor1:/dev/sdm:/dev/sdi
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username  : None
[ceph_deploy.cli][INFO  ]  disk  : [('cestor1', '/dev/sdm', '/dev/sdi')]
[ceph_deploy.cli][INFO  ]  dmcrypt   : False
[ceph_deploy.cli][INFO  ]  verbose   : False
[ceph_deploy.cli][INFO  ]  overwrite_conf: False
[ceph_deploy.cli][INFO  ]  subcommand: prepare
[ceph_deploy.cli][INFO  ]  dmcrypt_key_dir   : /etc/ceph/dmcrypt-keys
[ceph_deploy.cli][INFO  ]  quiet : False
[ceph_deploy.cli][INFO  ]  cd_conf   : 
[ceph_deploy.cli][INFO  ]  cluster   : ceph
[ceph_deploy.cli][INFO  ]  fs_type   : xfs
[ceph_deploy.cli][INFO  ]  func  : 
[ceph_deploy.cli][INFO  ]  ceph_conf : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.cli][INFO  ]  zap_disk  : False
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks cestor1:/dev/sdm:/dev/sdi
[cestor1][DEBUG ] connection detected need for sudo
[cestor1][DEBUG ] connected to host: cestor1 
[cestor1][DEBUG ] detect platform information from remote host
[cestor1][DEBUG ] detect machine type
[cestor1][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.2.1511 Core
[ceph_deploy.osd][DEBUG ] Deploying osd to cestor1
[cestor1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.osd][DEBUG ] Preparing host cestor1 disk /dev/sdm journal /dev/sdi activate False
[cestor1][INFO  ] Running command: sudo ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sdm /dev/sdi
[cestor1][WARNIN] command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[cestor1][WARNIN] command: Running command: /usr/bin/ceph-osd --check-allows-journal -i 0 --cluster ceph
[cestor1][WARNIN] command: Running command: /usr/bin/ceph-osd --check-wants-journal -i 0 --cluster ceph
[cestor1][WARNIN] command: Running command: /usr/bin/ceph-osd --check-needs-journal -i 0 --cluster ceph
[cestor1][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdm uuid path is /sys/dev/block/8:192/dm/uuid
[cestor1][WARNIN] command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size
[cestor1][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdm uuid path is /sys/dev/block/8:192/dm/uuid
[cestor1][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdm uuid path is /sys/dev/block/8:192/dm/uuid
[cestor1][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdm uuid path is /sys/dev/block/8:192/dm/uuid
[cestor1][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
[cestor1][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
[cestor1][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[cestor1][WARNIN] command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
[cestor1][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdi uuid path is /sys/dev/block/8:128/dm/uuid
[cestor1][WARNIN] prepare_device: OSD will not be hot-swappable if journal is not the same device as the osd data
[cestor1][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdi uuid path is /sys/dev/block/8:128/dm/uuid
[cestor1][WARNIN] ptype_tobe_for_name: name = journal
[cestor1][WARNIN] get_dm_uuid: get_dm_uuid /dev/sdi uuid path is /sys/dev/bl

Re: [ceph-users] Deprecating ext4 support

2016-04-14 Thread Michael Metz-Martini | SpeedPartner GmbH
Hi,

Am 14.04.2016 um 03:32 schrieb Christian Balzer:
> On Wed, 13 Apr 2016 14:51:58 +0200 Michael Metz-Martini | SpeedPartner GmbH 
> wrote:
>> Am 13.04.2016 um 04:29 schrieb Christian Balzer:
>>> On Tue, 12 Apr 2016 09:00:19 +0200 Michael Metz-Martini | SpeedPartner GmbH 
>>> wrote:
 Am 11.04.2016 um 23:39 schrieb Sage Weil:
> ext4 has never been recommended, but we did test it.  After Jewel is
> out, we would like explicitly recommend *against* ext4 and stop
> testing it.
 Hmmm. We're currently migrating away from xfs as we had some strange
 performance-issues which were resolved / got better by switching to
 ext4. We think this is related to our high number of objects (4358
 Mobjects according to ceph -s).
>>> It would be interesting to see on how this maps out to the OSDs/PGs.
>>> I'd guess loads and loads of subdirectories per PG, which is probably
>>> where Ext4 performs better than XFS.
>> A simple ls -l takes "ages" on XFS while ext4 lists a directory
>> immediately. According to our findings regarding XFS this seems to be
>> "normal" behavior.
> Just for the record, this is also influenced (for Ext4 at least) on how
> much memory you have and the "vm/vfs_cache_pressure" settings. 
> Once Ext4 runs out of space in SLAB for dentry and ext4_inode_cache
> (amongst others), it will become slower as well, since it has to go to the
> disk.
> Another thing to remember is that "ls" by itself is also a LOT faster than
> "ls -l" since it accesses less data.
128 GB RAM for 21 OSDs (each 4 TB in size). The kernel is so far "untuned"
regarding cache-pressure / inode-cache.
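
For reference, the relevant knobs/state can be inspected like this (a
sketch):

sysctl vm.vfs_cache_pressure
sudo slabtop -o | egrep 'dentry|ext4_inode_cache|xfs_inode'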



>> pool name   category KB  objects
>> data-   3240   2265521646
>> document_root   - 57736410150
>> images  -96197462245   2256616709
>> metadata-1150105 35903724
>> queue   -  542967346   173865
>> raw -36875247450 13095410
>>
>> total of 4736 pgs, 6 pools, 124 TB data, 4359 Mobjects
>>
>> What would you like to see?
>> tree? du per Directory?
> Just an example tree and typical size of the first "data layer".
> [...]

First levels seem to be empty, so:
./DIR_3
./DIR_3/DIR_9
./DIR_3/DIR_9/DIR_0
./DIR_3/DIR_9/DIR_0/DIR_0
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_0
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_D
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_E
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_A
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_C
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_1
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_4
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_2
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_B
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_5
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_3
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_9
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_6
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_F
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_7
./DIR_3/DIR_9/DIR_0/DIR_0/DIR_8
./DIR_3/DIR_9/DIR_0/DIR_D
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_0
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_D
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_E
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_A
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_C
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_1
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_4
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_2
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_B
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_5
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_3
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_9
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_6
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_F
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_7
./DIR_3/DIR_9/DIR_0/DIR_D/DIR_8
...

/var/lib/ceph/osd/ceph-58/current/6.93_head/DIR_3/DIR_9/DIR_C/DIR_0$ du
-ms *
99  DIR_0
102 DIR_1
105 DIR_2
102 DIR_3
101 DIR_4
105 DIR_5
106 DIR_6
102 DIR_7
105 DIR_8
98  DIR_9
99  DIR_A
105 DIR_B
103 DIR_C
100 DIR_D
103 DIR_E
104 DIR_F



>> As you can see we have one data-object in pool "data" per file saved
>> somewhere else. I'm not sure what's this related to, but maybe this is a
>> must by cephfs.
> That's rather confusing (even more so since I don't use CephFS), but it
> feels wrong.
> From what little I know about CephFS is that you can have only one FS per
> cluster and the pools can be arbitrarily named (default data and metadata).
[...]
> My guess is that you somehow managed to create things in a way that
> puts references (not the actual data) of everything in "images" to
> "data".
You can tune the pool by e.g.
cephfs /mnt/storage/docroot set_layout -p 4

We thought this was a good idea so that we could change the replication
size differently for doc_root and the raw data if we liked. It seems this
was a bad idea for all objects.
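
For what it's worth, file layouts can also be manipulated via the virtual
xattrs (a sketch; "docroot" is a placeholder pool name):

setfattr -n ceph.dir.layout.pool -v docroot /mnt/storage/docroot
getfattr -n ceph.dir.layout /mnt/storage/docroot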

-- 
Kind regards
 Michael Metz-Martini
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] my cluster is down after upgrade to 10.1.2

2016-04-14 Thread Lomayani S. Laizer
Hello Gregory,
Thanks for your reply. I think am hitting the same bug. Below is the link
for log just after an upgrade

https://justpaste.it/ta16

--
Lomayani

On Thu, Apr 14, 2016 at 6:24 PM, Gregory Farnum  wrote:

> On Thu, Apr 14, 2016 at 7:05 AM, Lomayani S. Laizer 
> wrote:
> > Hello,
> > I upgraded from 10.1.0 to 10.1.2 with ceph-deploy and my cluster is down
> > now. getting below errors
> >
> > ceph -s
> >
> > 2016-04-14 17:04:58.909894 7f14686e4700  0 -- :/2590574876 >>
> > 10.10.200.4:6789/0 pipe(0x7f146405adf0 sd=3 :0 s=1 pgs=0 cs=0 l=1
> > c=0x7f146405c0b0).fault
> > 2016-04-14 17:05:01.909949 7f14685e3700  0 -- :/2590574876 >>
> > 10.10.200.3:6789/0 pipe(0x7f1458000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
> > c=0x7f1458001f90).fault
> > 2016-04-14 17:05:04.910416 7f14686e4700  0 -- :/2590574876 >>
> > 10.10.200.4:6789/0 pipe(0x7f1458005120 sd=4 :0 s=1 pgs=0 cs=0 l=1
> > c=0x7f14580063e0).fault
> > 2016-04-14 17:05:07.910697 7f14685e3700  0 -- :/2590574876 >>
> > 10.10.200.2:6789/0 pipe(0x7f1458000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
> > c=0x7f1458002410).fault
>
> Assuming your monitors aren't running and have crashed, can you get
> the backtrace out of their log files?
>
> We just discovered an issue with the new FSMap encoding in 10.1.2, if
> you had already run an rc and had a filesystem. Patch is building and
> being tested now.
> -Greg
>
> >
> > --
> > Lomayani
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] my cluster is down after upgrade to 10.1.2

2016-04-14 Thread Gregory Farnum
Yep! This is fixed in the jewel and master branches now, but we're
going to wait until the next rc (or final release!) to push official
packages for it.

In the meantime, you can install those from our gitbuilders following
the instructions at
http://docs.ceph.com/docs/master/install/get-packages/#add-ceph-development
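
For Debian/Ubuntu that boils down to something along these lines (quoted
from memory of that page - double-check the exact paths there):

wget -q -O- 'https://download.ceph.com/keys/autobuild.asc' | sudo apt-key add -
echo deb http://gitbuilder.ceph.com/ceph-deb-$(lsb_release -sc)-x86_64-basic/ref/jewel $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph-dev.list
sudo apt-get update && sudo apt-get install ceph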

Other CephFS users may want to hold off until the next release happens
— although I hope anybody using RCs is comfortable with dev packages
when needed. :)
-Greg

On Thu, Apr 14, 2016 at 10:41 AM, Lomayani S. Laizer
 wrote:
> Hello Gregory,
> Thanks for your reply. I think am hitting the same bug. Below is the link
> for log just after an upgrade
>
> https://justpaste.it/ta16
>
> --
> Lomayani
>
> On Thu, Apr 14, 2016 at 6:24 PM, Gregory Farnum  wrote:
>>
>> On Thu, Apr 14, 2016 at 7:05 AM, Lomayani S. Laizer 
>> wrote:
>> > Hello,
>> > I upgraded from 10.1.0 to 10.1.2 with ceph-deploy and my cluster is down
>> > now. getting below errors
>> >
>> > ceph -s
>> >
>> > 2016-04-14 17:04:58.909894 7f14686e4700  0 -- :/2590574876 >>
>> > 10.10.200.4:6789/0 pipe(0x7f146405adf0 sd=3 :0 s=1 pgs=0 cs=0 l=1
>> > c=0x7f146405c0b0).fault
>> > 2016-04-14 17:05:01.909949 7f14685e3700  0 -- :/2590574876 >>
>> > 10.10.200.3:6789/0 pipe(0x7f1458000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
>> > c=0x7f1458001f90).fault
>> > 2016-04-14 17:05:04.910416 7f14686e4700  0 -- :/2590574876 >>
>> > 10.10.200.4:6789/0 pipe(0x7f1458005120 sd=4 :0 s=1 pgs=0 cs=0 l=1
>> > c=0x7f14580063e0).fault
>> > 2016-04-14 17:05:07.910697 7f14685e3700  0 -- :/2590574876 >>
>> > 10.10.200.2:6789/0 pipe(0x7f1458000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
>> > c=0x7f1458002410).fault
>>
>> Assuming your monitors aren't running and have crashed, can you get
>> the backtrace out of their log files?
>>
>> We just discovered an issue with the new FSMap encoding in 10.1.2, if
>> you had already run an rc and had a filesystem. Patch is building and
>> being tested now.
>> -Greg
>>
>> >
>> > --
>> > Lomayani
>> >
>> >


Re: [ceph-users] Deprecating ext4 support

2016-04-14 Thread Samuel Just
It doesn't seem like it would be wise to run such systems on top of rbd.
-Sam

On Thu, Apr 14, 2016 at 11:05 AM, Jianjian Huo  wrote:
> On Wed, Apr 13, 2016 at 6:06 AM, Sage Weil  wrote:
>> On Tue, 12 Apr 2016, Jan Schermer wrote:
>>> Who needs to have exactly the same data in two separate objects
>>> (replicas)? Ceph needs it because "consistency"?, but the app (VM
>>> filesystem) is fine with whatever version because the flush didn't
>>> happen (if it did the contents would be the same).
>>
>> While we're talking/thinking about this, here's a simple example of why
>> the simple solution (let the replicas be out of sync), which seems
>> reasonable at first, can blow up in your face.
>>
>> If a disk block contains A and you write B over the top of it and then
>> there is a failure (e.g. power loss before you issue a flush), it's okay
>> for the disk to contain either A or B.  In a replicated system, let's say
>> 2x mirroring (call them R1 and R2), you might end up with B on R1 and A
>> on R2.  If you don't immediately clean it up, then at some point down the
>> line you might switch from reading R1 to reading R2 and the disk block
>> will go "back in time" (previously you read B, now you read A).  A
>> single disk/replica will never do that, and applications can break.
>>
>> For example, if the block in question is a journal block, we might see B
>> the first time (valid journal!), then do a bunch of work and
>> journal/write new stuff to the blocks that follow.  Then we lose
>> power again, lose R1, replay the journal, read A from R2, and stop journal
>> replay early... missing out on all the new stuff.  This can easily corrupt
>> a file system or database or whatever else.
>
> If data is critical, applications use their own replicas (MySQL,
> Cassandra, MongoDB...). If the above scenario happens and one replica is
> out of sync, they use a quorum-like protocol to guarantee reading the
> latest data and repair those out-of-sync replicas. So is eventual
> consistency in storage acceptable for them?
>
> Jianjian
>>
>> It might sound unlikely, but keep in mind that writes to these
>> all-important metadata and commit blocks are extremely frequent.  It's the
>> kind of thing you can usually get away with, until you don't, and then you
>> have a very bad day...
>>
>> sage


Re: [ceph-users] osd prepare 10.1.2

2016-04-14 Thread Benjeman Meekhof
Hi Michael,

The partprobe issue was resolved for me by updating parted to the
package from Fedora 22:  parted-3.2-16.fc22.x86_64.  It shouldn't
require any other updated dependencies to install on EL7 varieties.

http://tracker.ceph.com/issues/15176
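In case it saves someone a search, a sketch of that update on an EL7 box
(download the exact rpm Ben names from a Fedora 22 mirror first; the
filename below assumes it sits in the current directory):

sudo rpm -Uvh parted-3.2-16.fc22.x86_64.rpm   # replaces the stock EL7 parted
rpm -q parted                                 # should now report 3.2-16.fc22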

regards,
Ben

On Thu, Apr 14, 2016 at 12:35 PM, Michael Hanscho  wrote:
> Hi!
>
> A fresh install of 10.1.2 on CentOS 7.2.1511 fails adding osds:
>
> [ceph_deploy.osd][ERROR ] Failed to execute command: ceph-disk -v
> prepare --cluster ceph --fs-type xfs -- /dev/sdm /dev/sdi
> [ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs
>
> The reason seems to be a failing partprobe command:
> [cestor1][WARNIN] update_partition: Calling partprobe on created device
> /dev/sdi
> [cestor1][WARNIN] command_check_call: Running command: /usr/bin/udevadm
> settle --timeout=600
> [cestor1][WARNIN] command: Running command: /sbin/partprobe /dev/sdi
> [cestor1][WARNIN] update_partition: partprobe /dev/sdi failed : Error:
> Error informing the kernel about modifications to partition /dev/sdi1 --
> Device or resource busy.  This means Linux won't know about any changes
> you made to /dev/sdi1 until you reboot -- so you shouldn't mount it or
> use it in any way before rebooting.
> [cestor1][WARNIN] Error: Failed to add partition 1 (Device or resource busy)
> [cestor1][WARNIN]  (ignored, waiting 60s)
>
> Attached ceph-deploy-osd-prepare-error.log with the details.
>
> Modifying ceph-disk to ignore the partprobe failure allows it to proceed.
> Any hints?
>
> Regards
> Michael
>


Re: [ceph-users] my cluster is down after upgrade to 10.1.2

2016-04-14 Thread Lomayani S. Laizer
Hello,
Upgraded the cluster but still seeing the same issue. Is the cluster not
recoverable?

ceph --version
ceph version 10.1.2-64-ge657ecf (e657ecf8e437047b827aa89fb9c10be82643300c)

root@mon-b:~# ceph -w
2016-04-14 22:17:56.766169 7f5da3fff700  0 -- 10.10.200.3:0/1828342317 >>
10.10.200.3:6789/0 pipe(0x7f5da8000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
c=0x7f5da8001f90).fault
2016-04-14 22:18:02.766859 7f5db8215700  0 -- 10.10.200.3:0/1828342317 >>
10.10.200.2:6789/0 pipe(0x7f5da8007790 sd=3 :0 s=1 pgs=0 cs=0 l=1
c=0x7f5da8002410).fault
2016-04-14 22:18:05.767017 7f5da3fff700  0 -- 10.10.200.3:0/1828342317 >>
10.10.200.3:6789/0 pipe(0x7f5da80051a0 sd=4 :0 s=1 pgs=0 cs=0 l=1
c=0x7f5da8002bc0).fault

--
Lomayani


On Thu, Apr 14, 2016 at 8:46 PM, Gregory Farnum  wrote:

> Yep! This is fixed in the jewel and master branches now, but we're
> going to wait until the next rc (or final release!) to push official
> packages for it.
>
> In the meantime, you can install those from our gitbuilders following
> the instructions at
> http://docs.ceph.com/docs/master/install/get-packages/#add-ceph-development
>
> Other CephFS users may want to hold off until the next release happens
> — although I hope anybody using RCs is comfortable with dev packages
> when needed. :)
> -Greg
>
> On Thu, Apr 14, 2016 at 10:41 AM, Lomayani S. Laizer
>  wrote:
> > Hello Gregory,
> > Thanks for your reply. I think I am hitting the same bug. Below is the link
> > for log just after an upgrade
> >
> > https://justpaste.it/ta16
> >
> > --
> > Lomayani
> >
> > On Thu, Apr 14, 2016 at 6:24 PM, Gregory Farnum  wrote:
> >>
> >> On Thu, Apr 14, 2016 at 7:05 AM, Lomayani S. Laizer <lomlai...@gmail.com>
> >> wrote:
> >> > Hello,
> >> > I upgraded from 10.1.0 to 10.1.2 with ceph-deploy and my cluster is
> >> > down now. I am getting the errors below:
> >> >
> >> > ceph -s
> >> >
> >> > 2016-04-14 17:04:58.909894 7f14686e4700  0 -- :/2590574876 >>
> >> > 10.10.200.4:6789/0 pipe(0x7f146405adf0 sd=3 :0 s=1 pgs=0 cs=0 l=1
> >> > c=0x7f146405c0b0).fault
> >> > 2016-04-14 17:05:01.909949 7f14685e3700  0 -- :/2590574876 >>
> >> > 10.10.200.3:6789/0 pipe(0x7f1458000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
> >> > c=0x7f1458001f90).fault
> >> > 2016-04-14 17:05:04.910416 7f14686e4700  0 -- :/2590574876 >>
> >> > 10.10.200.4:6789/0 pipe(0x7f1458005120 sd=4 :0 s=1 pgs=0 cs=0 l=1
> >> > c=0x7f14580063e0).fault
> >> > 2016-04-14 17:05:07.910697 7f14685e3700  0 -- :/2590574876 >>
> >> > 10.10.200.2:6789/0 pipe(0x7f1458000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
> >> > c=0x7f1458002410).fault
> >>
> >> Assuming your monitors aren't running and have crashed, can you get
> >> the backtrace out of their log files?
> >>
> >> We just discovered an issue with the new FSMap encoding in 10.1.2, if
> >> you had already run an rc and had a filesystem. Patch is building and
> >> being tested now.
> >> -Greg
> >>
> >> >
> >> > --
> >> > Lomayani
> >> >
> >> >


Re: [ceph-users] osd prepare 10.1.2

2016-04-14 Thread Michael Hanscho
Hi Ben!

Thanks for the information - I will try that (although I am not happy to
leave the CentOS / Red Hat path)...

Regards
Michael

On 2016-04-14 20:44, Benjeman Meekhof wrote:
> Hi Michael,
> 
> The partprobe issue was resolved for me by updating parted to the
> package from Fedora 22:  parted-3.2-16.fc22.x86_64.  It shouldn't
> require any other updated dependencies to install on EL7 varieties.
> 
> http://tracker.ceph.com/issues/15176
> 
> regards,
> Ben
> 
> On Thu, Apr 14, 2016 at 12:35 PM, Michael Hanscho  wrote:
>> Hi!
>>
>> A fresh install of 10.1.2 on CentOS 7.2.1511 fails adding osds:
>>
>> [ceph_deploy.osd][ERROR ] Failed to execute command: ceph-disk -v
>> prepare --cluster ceph --fs-type xfs -- /dev/sdm /dev/sdi
>> [ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs
>>
>> The reason seems to be a failing partprobe command:
>> [cestor1][WARNIN] update_partition: Calling partprobe on created device
>> /dev/sdi
>> [cestor1][WARNIN] command_check_call: Running command: /usr/bin/udevadm
>> settle --timeout=600
>> [cestor1][WARNIN] command: Running command: /sbin/partprobe /dev/sdi
>> [cestor1][WARNIN] update_partition: partprobe /dev/sdi failed : Error:
>> Error informing the kernel about modifications to partition /dev/sdi1 --
>> Device or resource busy.  This means Linux won't know about any changes
>> you made to /dev/sdi1 until you reboot -- so you shouldn't mount it or
>> use it in any way before rebooting.
>> [cestor1][WARNIN] Error: Failed to add partition 1 (Device or resource busy)
>> [cestor1][WARNIN]  (ignored, waiting 60s)
>>
>> Attached ceph-deploy-osd-prepare-error.log with the details.
>>
>> Modifying ceph-disk to ignore the partprobe failure allows it to proceed.
>> Any hints?
>>
>> Regards
>> Michael
>>



Re: [ceph-users] my cluster is down after upgrade to 10.1.2

2016-04-14 Thread Gregory Farnum
On Thu, Apr 14, 2016 at 12:19 PM, Lomayani S. Laizer
 wrote:
> Hello,
> Upgraded the cluster but still seeing the same issue. Is the cluster not
> recoverable?
>
> ceph --version
> ceph version 10.1.2-64-ge657ecf (e657ecf8e437047b827aa89fb9c10be82643300c)
>
> root@mon-b:~# ceph -w
> 2016-04-14 22:17:56.766169 7f5da3fff700  0 -- 10.10.200.3:0/1828342317 >>
> 10.10.200.3:6789/0 pipe(0x7f5da8000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
> c=0x7f5da8001f90).fault
> 2016-04-14 22:18:02.766859 7f5db8215700  0 -- 10.10.200.3:0/1828342317 >>
> 10.10.200.2:6789/0 pipe(0x7f5da8007790 sd=3 :0 s=1 pgs=0 cs=0 l=1
> c=0x7f5da8002410).fault
> 2016-04-14 22:18:05.767017 7f5da3fff700  0 -- 10.10.200.3:0/1828342317 >>
> 10.10.200.3:6789/0 pipe(0x7f5da80051a0 sd=4 :0 s=1 pgs=0 cs=0 l=1
> c=0x7f5da8002bc0).fault

Please check the state of the actual daemon — this just means that the
ceph cli client couldn't set up a session with a monitor, which can
happen for an infinite number of reasons.
If the monitor has actually crashed again, please install the debug
packages and start up the monitor with "debug mon = 20" and "debug mds
= 20" in its config file, then post the log.

If it hasn't crashed, you probably don't have a quorum running. You'll
need to upgrade each of them to that gitbuilder version of the code
for them to be happy.
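If it's unclear whether a quorum exists, the admin socket can be queried
even without one (a sketch; substitute your own mon id):

ceph daemon mon.mon-b mon_status   # "state" shows probing/electing/leader/peon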
-Greg

>
> --
> Lomayani
>
>
> On Thu, Apr 14, 2016 at 8:46 PM, Gregory Farnum  wrote:
>>
>> Yep! This is fixed in the jewel and master branches now, but we're
>> going to wait until the next rc (or final release!) to push official
>> packages for it.
>>
>> In the meantime, you can install those from our gitbuilders following
>> the instructions at
>>
>> http://docs.ceph.com/docs/master/install/get-packages/#add-ceph-development
>>
>> Other CephFS users may want to hold off until the next release happens
>> — although I hope anybody using RCs is comfortable with dev packages
>> when needed. :)
>> -Greg
>>
>> On Thu, Apr 14, 2016 at 10:41 AM, Lomayani S. Laizer
>>  wrote:
>> > Hello Gregory,
>> > Thanks for your reply. I think I am hitting the same bug. Below is the
>> > link
>> > for log just after an upgrade
>> >
>> > https://justpaste.it/ta16
>> >
>> > --
>> > Lomayani
>> >
>> > On Thu, Apr 14, 2016 at 6:24 PM, Gregory Farnum 
>> > wrote:
>> >>
>> >> On Thu, Apr 14, 2016 at 7:05 AM, Lomayani S. Laizer
>> >> 
>> >> wrote:
>> >> > Hello,
>> >> > I upgraded from 10.1.0 to 10.1.2 with ceph-deploy and my cluster is
>> >> > down now. I am getting the errors below:
>> >> >
>> >> > ceph -s
>> >> >
>> >> > 2016-04-14 17:04:58.909894 7f14686e4700  0 -- :/2590574876 >>
>> >> > 10.10.200.4:6789/0 pipe(0x7f146405adf0 sd=3 :0 s=1 pgs=0 cs=0 l=1
>> >> > c=0x7f146405c0b0).fault
>> >> > 2016-04-14 17:05:01.909949 7f14685e3700  0 -- :/2590574876 >>
>> >> > 10.10.200.3:6789/0 pipe(0x7f1458000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
>> >> > c=0x7f1458001f90).fault
>> >> > 2016-04-14 17:05:04.910416 7f14686e4700  0 -- :/2590574876 >>
>> >> > 10.10.200.4:6789/0 pipe(0x7f1458005120 sd=4 :0 s=1 pgs=0 cs=0 l=1
>> >> > c=0x7f14580063e0).fault
>> >> > 2016-04-14 17:05:07.910697 7f14685e3700  0 -- :/2590574876 >>
>> >> > 10.10.200.2:6789/0 pipe(0x7f1458000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
>> >> > c=0x7f1458002410).fault
>> >>
>> >> Assuming your monitors aren't running and have crashed, can you get
>> >> the backtrace out of their log files?
>> >>
>> >> We just discovered an issue with the new FSMap encoding in 10.1.2, if
>> >> you had already run an rc and had a filesystem. Patch is building and
>> >> being tested now.
>> >> -Greg
>> >>
>> >> >
>> >> > --
>> >> > Lomayani
>> >> >
>> >> >


Re: [ceph-users] Deprecating ext4 support

2016-04-14 Thread Christian Balzer
On Thu, 14 Apr 2016 19:39:01 +0200 Michael Metz-Martini | SpeedPartner
GmbH wrote:

> Hi,
> 
> On 14.04.2016 at 03:32, Christian Balzer wrote:
[massive snip]

Thanks for that tree/du output, it matches what I expected.
You'd think XFS wouldn't be that intimidated by directories of that size.

> 
> 
> >> As you can see we have one data-object in pool "data" per file saved
> >> somewhere else. I'm not sure what this is related to, but maybe this
> >> is required by cephfs.
> > That's rather confusing (even more so since I don't use CephFS), but it
> > feels wrong.
> > From what little I know about CephFS, you can have only one FS
> > per cluster and the pools can be arbitrarily named (default data and
> > metadata).
> [...]
> > My guess is that you somehow managed to create things in a way that
> > puts references (not the actual data) of everything in "images" into
> > "data".
> You can tune the pool by e.g.
> cephfs /mnt/storage/docroot set_layout -p 4
> 
Yesterday morning I wouldn't have known what that meant, but since then I
did a lot of reading and created a CephFS on the test cluster as well,
including a second data pool and layouts.
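Roughly, the steps for such a test look like this (a sketch using
hammer-era commands; the pool name, PG count and mount point are example
values):

ceph osd pool create filegoats 64
ceph mds add_data_pool filegoats
setfattr -n ceph.dir.layout.pool -v filegoats /mnt/cephfs/subdir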

> We thought this was a good idea so that we can set the replication
> size differently for doc_root and raw-data if we like. Seems this was a
> bad idea for all objects.
> 
I'm not sure how you managed to get into that state or if it's a bug after
all, but I can't replicate it on the latest hammer.

Firstly I created a "default" FS, with the classic metadata and data
pools, mounted it and put some files into the root.
Then I added a second pool (filegoats) and set the layout for a
subdirectory to use it. After re-mounting the FS and copying data to that
subdir I get this, exactly what one would expect:
---

NAME  ID USED   %USED MAX AVAIL OBJECTS 
data  0  82043k 0 1181G 334 
metadata  1   2845k 0 1181G  20 
rbd   2161G  2.84  787G   41914 
filegoats 10 89034k 0 1181G 336 
---
So no duplicate objects (or at least their headers) for me.

If nobody else has anything to say about this, I'd consider filing a bug
report.

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/


Re: [ceph-users] Deprecating ext4 support

2016-04-14 Thread Michael Metz-Martini | SpeedPartner GmbH
Hi,

On 15.04.2016 at 03:07, Christian Balzer wrote:
>> We thought this was a good idea so that we can set the replication
>> size differently for doc_root and raw-data if we like. Seems this was a
>> bad idea for all objects.
> I'm not sure how you managed to get into that state or if it's a bug after
> all, but I can't replicate it on the latest hammer.
> Firstly I created a "default" FS, with the classic metadata and data
> pools, mounted it and put some files into the root.
> Then I added a second pool (filegoats) and set the layout for a
> subdirectory to use it. After re-mounting the FS and copying data to that
> subdir I get this, exactly what one would expect:
> ---
> 
> NAME  ID USED   %USED MAX AVAIL OBJECTS 
> data  0  82043k 0 1181G 334 
> metadata  1   2845k 0 1181G  20 
> rbd   2161G  2.84  787G   41914 
> filegoats 10 89034k 0 1181G 336 
> ---
> So no duplicate objects (or at least their headers) for me.
> 
> If nobody else has anything to say about this, I'd consider filing a bug
> report.
I must admit that we're currently using 0.87 (Giant) and haven't
upgraded so far. It would be nice to know if an upgrade would "clean" this
state or whether we should start with a new cluster ... :(

-- 
Kind regards
 Michael Metz-Martini



Re: [ceph-users] Deprecating ext4 support

2016-04-14 Thread Christian Balzer

Hello,

On Fri, 15 Apr 2016 07:02:13 +0200 Michael Metz-Martini | SpeedPartner
GmbH wrote:

> Hi,
> 
> On 15.04.2016 at 03:07, Christian Balzer wrote:
> >> We thought this was a good idea so that we can set the replication
> >> size differently for doc_root and raw-data if we like. Seems this was a
> >> bad idea for all objects.
> > I'm not sure how you managed to get into that state or if it's a bug
> > after all, but I can't replicate it on the latest hammer.
> > Firstly I created a "default" FS, with the classic metadata and data
> > pools, mounted it and put some files into the root.
> > Then I added a second pool (filegoats) and set the layout for a
> > subdirectory to use it. After re-mounting the FS and copying data to
> > that subdir I get this, exactly what one would expect:
> > ---
> > 
> > NAME  ID USED   %USED MAX AVAIL OBJECTS 
> > data  0  82043k 0 1181G 334 
> > metadata  1   2845k 0 1181G  20 
> > rbd   2161G  2.84  787G   41914 
> > filegoats 10 89034k 0 1181G 336 
> > ---
> > So no duplicate objects (or at least their headers) for me.
> > 
> > If nobody else has anything to say about this, I'd consider filing a
> > bug report.
> I must admit that we're currently using 0.87 (Giant) and haven't
> upgraded so far. It would be nice to know if an upgrade would "clean" this
> state or whether we should start with a new cluster ... :(
> 
I can't really comment on that, but you will probably want to wait for
Jewel, being an LTS release and having plenty of CephFS enhancements
including a fsck.

Have you verified what those objects in your data pool are?
And that they are actually there on disk?
If so, I'd expect them all to be zero length. 
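One quick way to check that with plain rados commands (a sketch; the pool
name is the one from this thread, the object name is a placeholder):

rados -p data ls | head            # list a few object names
rados -p data stat <object-name>   # size should come back as 0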

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/


Re: [ceph-users] Deprecating ext4 support

2016-04-14 Thread Michael Metz-Martini | SpeedPartner GmbH
Hi,

On 15.04.2016 at 07:43, Christian Balzer wrote:
> On Fri, 15 Apr 2016 07:02:13 +0200 Michael Metz-Martini | SpeedPartner
> GmbH wrote:
>> On 15.04.2016 at 03:07, Christian Balzer wrote:
>>>> We thought this was a good idea so that we can set the replication
>>>> size differently for doc_root and raw-data if we like. Seems this was a
>>>> bad idea for all objects.
[...]
>>> If nobody else has anything to say about this, I'd consider filing a
>>> bug report.
>> I must admit that we're currently using 0.87 (Giant) and haven't
>> upgraded so far. It would be nice to know if an upgrade would "clean" this
>> state or whether we should start with a new cluster ... :(
> I can't really comment on that, but you will probably want to wait for
> Jewel, being an LTS release and having plenty of CephFS enhancements
> including a fsck.
> Have you verified what those objects in your data pool are?
> And that they are actually there on disk?
> If so, I'd expect them all to be zero length. 
They exist and are all of size 0 - right.

/var/lib/ceph/osd/ceph-21/current/0.179_head/DIR_9/DIR_7/DIR_1/DIR_0/DIR_0/DIR_0$
ls -l
total 492
-rw-r--r--. 1 root root 0 Oct  6  2015
10003aed5cb.__head_AF000179__0
-rw-r--r--. 1 root root 0 Oct  6  2015
10003d09223.__head_6D000179__0
[..]

$ getfattr -d 10003aed5cb.__head_AF000179__0
# file: 10003aed5cb.__head_AF000179__0
user.ceph._=0sDQjpBAM1ABQxMDAwM2FlZDVjYi4wMDAwMDAwMP7/eQEArwAGAxwAAP8AAP//AHTfAwAA2hoAAAICFQIAAGScLgEADQAAY4zeU3D2EwgCAhUAAAB03wMAAAQ=
user.ceph._parent=0sBQTvy9WuAwABAAAGAgIbldSuAwABAAAHOF81LmpwZ0gCAgIW1NGuAwABAAACMTKhAwICNHwIgwMAAQAAIDBlZjY3MTk5OGMzNGE5MjViYzdjZjQxZGYyOTM5NmFlWgACAhYAAADce3oDAAEAAAIAAABmNscPAgIWJvV3AwABAAACMGWGeA0AAAICGgEABgAAAGltYWdlc28yNQAABgABAAA=
user.cephos.spill_out=0sMQA=

-- 
Kind regards
 Michael Metz-Martini