[ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-17 Thread Oliver Daudey
Hey all,

This is a copy of Bug #6040 (http://tracker.ceph.com/issues/6040) I
created in the tracker.  Thought I would pass it through the list as
well, to get an idea if anyone else is running into it.  It may only
show under higher loads.  More info about my setup is in the bug-report
above.  Here goes:


I'm running a Ceph-cluster with 3 nodes, each of which runs a mon, osd
and mds. I'm using RBD on this cluster as storage for KVM, CephFS is
unused at this time. While still on v0.61.7 Cuttlefish, I got 70-100+
MB/sec on simple linear writes to a file with `dd' inside a VM on this
cluster under regular load, and the osds usually averaged 20-100%
CPU-utilisation in `top'. After the upgrade to Dumpling, CPU-usage for
the osds shot up to 100% to 400% in `top' (multi-core system) and the
speed for my writes with `dd' inside a VM dropped to 20-40 MB/sec. Users
complained that disk-access inside the VMs was significantly slower, and
the backups of the RBD-store I was running also fell behind quickly.

After downgrading only the osds to v0.61.7 Cuttlefish and leaving the
rest at 0.67 Dumpling, speed and load returned to normal. I have
repeated this performance-hit upon upgrade on a similar test-cluster
under no additional load at all. Although CPU-usage for the osds wasn't
as dramatic during these tests because there was no base-load from other
VMs, I/O-performance dropped significantly after upgrading during these
tests as well, and returned to normal after downgrading the osds.

I'm not sure what to make of it. There are no visible errors in the logs,
and everything runs and reports good health; it's just a lot slower,
with a lot more CPU-usage.
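
A minimal example of the kind of linear-write test described above, for
anyone wanting to reproduce (block size and flags here are only an
illustration, not the exact command used):

  # inside the VM: write 1 GB sequentially, bypassing the guest page cache
  dd if=/dev/zero of=/root/ddtest bs=1M count=1024 oflag=direct
  rm -f /root/ddtest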



   Regards,

  Oliver



Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-17 Thread Mark Nelson

On 08/17/2013 06:13 AM, Oliver Daudey wrote:

Hey all,

This is a copy of Bug #6040 (http://tracker.ceph.com/issues/6040) I
created in the tracker.  Thought I would pass it through the list as
well, to get an idea if anyone else is running into it.  It may only
show under higher loads.  More info about my setup is in the bug-report
above.  Here goes:


I'm running a Ceph-cluster with 3 nodes, each of which runs a mon, osd
and mds. I'm using RBD on this cluster as storage for KVM, CephFS is
unused at this time. While still on v0.61.7 Cuttlefish, I got 70-100
+MB/sec on simple linear writes to a file with `dd' inside a VM on this
cluster under regular load and the osds usually averaged 20-100%
CPU-utilisation in `top'. After the upgrade to Dumpling, CPU-usage for
the osds shot up to 100% to 400% in `top' (multi-core system) and the
speed for my writes with `dd' inside a VM dropped to 20-40MB/sec. Users
complained that disk-access inside the VMs was significantly slower and
the backups of the RBD-store I was running, also got behind quickly.

After downgrading only the osds to v0.61.7 Cuttlefish and leaving the
rest at 0.67 Dumpling, speed and load returned to normal. I have
repeated this performance-hit upon upgrade on a similar test-cluster
under no additional load at all. Although CPU-usage for the osds wasn't
as dramatic during these tests because there was no base-load from other
VMs, I/O-performance dropped significantly after upgrading during these
tests as well, and returned to normal after downgrading the osds.

I'm not sure what to make of it. There are no visible errors in the logs
and everything runs and reports good health, it's just a lot slower,
with a lot more CPU-usage.


Hi Oliver,

If you have access to the perf command on this system, could you try 
running:


"sudo perf top"

And if that doesn't give you much,

"sudo perf record -g"

then:

"sudo perf report | less"

during the period of high CPU usage?  This will give you a call graph. 
There may be symbols missing, but it might help track down what the OSDs 
are doing.
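
For example, to capture a profile of a single busy OSD (a sketch; assumes
perf is installed and shows just one way of picking a ceph-osd PID):

  # sample one ceph-osd process for 30 seconds with call graphs
  sudo perf record -g -p $(pidof -s ceph-osd) -- sleep 30
  sudo perf report | less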


Mark





Regards,

   Oliver



[ceph-users] radosgw creating pool with empty name after upgrade from 0.61.7 cuttlefish to 0.67 dumpling

2013-08-17 Thread Øystein Lønning Nerhus
Hi,

This seems like a bug.

# ceph df
NAME   ID USED  %USED OBJECTS
.rgw.root  26 778   0 3
.rgw   27 1118  0 8
.rgw.gc28 0 0 32
   30 0 0 8
.users.uid 31 369   0 2
.users 32 200 2
.rgw.buckets.index 33 0 0 4
.rgw.buckets   34 3519  0 1

Notice the empty pool name.

# rados -p "" ls
notify.0
notify.1
notify.2
notify.3
notify.4
notify.5
notify.6
notify.7

Based on https://github.com/ceph/ceph/blob/dumpling/src/rgw/rgw_rados.cc it 
seems this pool is really supposed to be named ".rgw.control".

I am running Ubuntu 12.10 Quantal with ceph packages from sources.list:
"deb http://eu.ceph.com/debian-dumpling/ quantal main"

Note: I deleted all rgw pools before I upgraded to 0.67, so I'm not bringing rgw
data from 0.61.7.

Other than the empty pool name, radosgw seems to be working just fine.  I have 
uploaded and retrieved objects without any problems.
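
For anyone trying to reproduce, the unnamed pool is easy to spot by listing
pools directly (a minimal sketch, assuming a default dumpling radosgw setup):

  rados lspools
  ceph osd dump | grep '^pool'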

Regards,

Øystein


Re: [ceph-users] large memory leak on scrubbing

2013-08-17 Thread Sage Weil
Hi Dominic,

There is a bug, fixed a couple of months back, that caused excessive memory
consumption during scrub.  You can upgrade to the latest 'bobtail' branch.
See

 http://ceph.com/docs/master/install/debian/#development-testing-packages

Installing that package should clear this up.
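
Roughly, on Debian/Ubuntu that boils down to something like the following
(treat the exact gitbuilder path as an approximation and follow the page
above for the authoritative form):

  echo deb http://gitbuilder.ceph.com/ceph-deb-$(lsb_release -sc)-x86_64-basic/ref/bobtail $(lsb_release -sc) main \
    | sudo tee /etc/apt/sources.list.d/ceph-bobtail.list
  sudo apt-get update && sudo apt-get install ceph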

sage


On Fri, 16 Aug 2013, Mostowiec Dominik wrote:

> Hi,
> We noticed some issues on our CEPH/S3 cluster that I think are related to
> scrubbing: large memory leaks.
> 
> Logs 09.xx: 
> https://www.dropbox.com/s/4z1fzg239j43igs/ceph-osd.4.log_09xx.tar.gz
> From 09.30 to 09.44 (14 minutes) the osd.4 process grows to 28G.
> 
> I think this is something curious:
> 2013-08-16 09:43:48.801331 7f6570d2e700  0 log [WRN] : slow request 32.794125 
> seconds old, received at 2013-08-16 09:43:16.007104: osd_sub_op(unknown.0.0:0 
> 16.113d 0//0//-1 [scrub-reserve] v 0'0 snapset=0=[]:[] snapc=0=[]) v7 
> currently no flag points reached
> 
> We have a large rgw index and a lot of large files on this cluster.
> ceph version 0.56.6 (95a0bda7f007a33b0dc7adf4b330778fa1e5d70c)
> Setup: 
> - 12 servers x 12 OSD 
> - 3 mons
> Default scrubbing settings.
> Journal and filestore settings:
> journal aio = true
> filestore flush min = 0
> filestore flusher = false
> filestore fiemap = false
> filestore op threads = 4
> filestore queue max ops = 4096
> filestore queue max bytes = 10485760
> filestore queue committing max bytes = 10485760
> journal max write bytes = 10485760
> journal queue max bytes = 10485760
> ms dispatch throttle bytes = 10485760
> objecter infilght op bytes = 10485760
> 
> Is this a known bug in this version?
> (Do you know some workaround to fix this?)
> 
> ---
> Regards
> Dominik
> 


[ceph-users] v0.67.1 Dumpling released

2013-08-17 Thread Sage Weil
This is a bug fix release for Dumpling that resolves a problem with the 
librbd python bindings (triggered by OpenStack) and a hang in librbd when 
caching is disabled.  OpenStack users are encouraged to upgrade.  No other 
serious bugs have come up since v0.67 came out earlier this week.

Notable changes:

 * librados, librbd: fix constructor for python bindings with certain 
   usages (in particular, that used by OpenStack)
 * librados, librbd: fix aio_flush wakeup when cache is disabled
 * librados: fix locking for aio completion refcounting
 * fixes 'ceph --admin-daemon ...' command error code on error
 * fixes 'ceph daemon ... config set ...' command for boolean config 
   options.

For more detailed information, see the complete release notes and 
changelog:

 * http://ceph.com/docs/master/release-notes/#v0-67-1-dumpling
 * http://ceph.com/docs/master/_downloads/v0.67.1.txt

You can download v0.67.1 Dumpling from the usual locations:

 * Git at git://github.com/ceph/ceph.git
 * Tarball at http://ceph.com/download/ceph-0.67.1.tar.gz
 * For Debian/Ubuntu packages, see http://ceph.com/docs/master/install/debian
 * For RPMs, see http://ceph.com/docs/master/install/rpm
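
A minimal upgrade-and-check sketch for a Debian/Ubuntu node that already has
the ceph.com dumpling repository configured (package names and the per-daemon
check are assumptions; see the install links above):

  sudo apt-get update && sudo apt-get install ceph ceph-common
  ceph --version               # should report 0.67.1 after the upgrade
  ceph tell osd.0 version      # confirm a restarted daemon is on the new version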


 



 


Re: [ceph-users] Mds lock

2013-08-17 Thread Sage Weil
[moving to ceph-devel]

On Fri, 16 Aug 2013, Jun Jun8 Liu wrote:
> 
> Hi all,  
> 
>  I am doing some research about mds.
> 
> 
> 
>  There are so many types of locks and states, but I couldn't find any
> documentation describing them.
> 
>  
> 
>  Could anybody tell me what "loner" and "lock_mix" are?

'loner' tracks when a single client has some exclusive capabilities on 
the file.  For example, when it is the only client with the file open, it 
can buffer reads and perform setattr operations locally.

lock_mix (lock->mix) is a transition state from lock (primary mds copy
exclusive lock, sort of) to mix (shared write lock between multiple
mds's).  The MDS locking is quite complex, but unfortunately there is not
much in the way of documentation for the code. :(

sage


[ceph-users] performance questions

2013-08-17 Thread Jeff Moskow
Hi,

When we rebuilt our ceph cluster, we opted to make our rbd storage 
replication level 3 rather than the previously
configured replication level 2.

Things are MUCH slower (5 nodes, 13 osd's) than before even though most 
of our I/O is read.   Is this to be expected?
What are the recommended ways of seeing who/what is consuming the largest 
amount of disk/network bandwidth?

Thanks!
Jeff



[ceph-users] ceph 0.67, 0.67.1: ceph_init bug

2013-08-17 Thread Mikaël Cluseau

Hi,

troubles with ceph_init (after a test reboot)

# ceph_init restart osd
# ceph_init restart osd.0
/usr/lib/ceph/ceph_init.sh: osd.0 not found (/etc/ceph/ceph.conf defines 
mon.xxx , /var/lib/ceph defines mon.xxx)

1 # ceph-disk list
[...]
/dev/sdc :
 /dev/sdc1 ceph data, prepared, cluster ceph, osd.0
/dev/sdd :
 /dev/sdd1 ceph data, prepared, cluster ceph, osd.1
/dev/sde :
 /dev/sde1 ceph data, prepared, cluster ceph, osd.2
/dev/sdf :
 /dev/sdf1 ceph data, prepared, cluster ceph, osd.3
/dev/sdg :
 /dev/sdg1 ceph data, prepared, cluster ceph, osd.4
/dev/sdh :
 /dev/sdh1 ceph data, prepared, cluster ceph, osd.5

I see at the end of ceph_init that there's a ceph-disk activate-all, but
it does nothing that I can see:


# ceph_init start osd.0
/usr/lib/ceph/ceph_init.sh: osd.0 not found (/etc/ceph/ceph.conf defines 
mon.xxx , /var/lib/ceph defines mon.xxx)

1 # ceph-disk activate-all
# mount |grep ceph
/dev/mapper/ssd1-ceph--mon on /var/lib/ceph/mon/ceph-xxx type ext4 
(rw,noatime,nodiratime)


Anything I missed ?


Re: [ceph-users] performance questions

2013-08-17 Thread Sage Weil
On Sat, 17 Aug 2013, Jeff Moskow wrote:
> Hi,
> 
>   When we rebuilt our ceph cluster, we opted to make our rbd storage 
> replication level 3 rather than the previously configured replication 
> level 2.
> 
>   Things are MUCH slower (5 nodes, 13 osd's) than before even though 
> most of our I/O is read.  Is this to be expected? What are the 
> recommended ways of seeing who/what is consuming the largest amount of 
> disk/network bandwidth?

It really doesn't sound like the replica count is the source of the 
performance difference.  What else has changed?
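
As a generic starting point for the "who is consuming bandwidth" question
(a sketch only; interface and pool names are examples):

  iostat -x 2                    # per-disk utilisation/await on each OSD node
  iftop -i eth0                  # per-connection network bandwidth
  rados -p rbd bench 30 write    # rough baseline of cluster write throughput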

sage


Re: [ceph-users] ceph 0.67, 0.67.1: ceph_init bug

2013-08-17 Thread Sage Weil
On Sun, 18 Aug 2013, Mikaël Cluseau wrote:
> Hi,
> 
> troubles with ceph_init (after a test reboot)
> 
> # ceph_init restart osd
> # ceph_init restart osd.0
> /usr/lib/ceph/ceph_init.sh: osd.0 not found (/etc/ceph/ceph.conf defines
> mon.xxx , /var/lib/ceph defines mon.xxx)
> 1 # ceph-disk list
> [...]
> /dev/sdc :
>  /dev/sdc1 ceph data, prepared, cluster ceph, osd.0
> /dev/sdd :
>  /dev/sdd1 ceph data, prepared, cluster ceph, osd.1
> /dev/sde :
>  /dev/sde1 ceph data, prepared, cluster ceph, osd.2
> /dev/sdf :
>  /dev/sdf1 ceph data, prepared, cluster ceph, osd.3
> /dev/sdg :
>  /dev/sdg1 ceph data, prepared, cluster ceph, osd.4
> /dev/sdh :
>  /dev/sdh1 ceph data, prepared, cluster ceph, osd.5
> 
> I see at the end of ceph_init taht there's a ceph-disk activate-all, but it
> does nothing I can see:
> 
> # ceph_init start osd.0
> /usr/lib/ceph/ceph_init.sh: osd.0 not found (/etc/ceph/ceph.conf defines
> mon.xxx , /var/lib/ceph defines mon.xxx)
> 1 # ceph-disk activate-all
> # mount |grep ceph
> /dev/mapper/ssd1-ceph--mon on /var/lib/ceph/mon/ceph-xxx type ext4
> (rw,noatime,nodiratime)

The ceph-disk activate-all command is looking for partitions that are 
marked with the ceph type uuid.  Maybe the journals are missing?  What
does 

 ceph-disk -v activate /dev/sdc1

say?  Or

 ceph-disk -v activate-all

Where does the 'journal' symlink in the ceph data partitions point to?
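
One quick way to check that marking on a given partition (a sketch; the GUID
shown here is the ceph OSD-data partition type uuid as I recall it, so
double-check it against your ceph-disk version):

  # print the partition type GUID for partition 1 of /dev/sdc
  sgdisk --info=1 /dev/sdc | grep 'Partition GUID code'
  # ceph data partitions prepared by ceph-disk use type
  # 4fbd7e29-9d25-41b8-afd0-062c0ceff05d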

sage



[ceph-users] ceph-deploy mon create / gatherkeys problems

2013-08-17 Thread Sage Weil
Hi everyone,

We're trying to get to the bottom of the problems people have been having 
with ceph-deploy mon create .. and ceph-deploy gatherkeys.  There seem to 
be a series of common pitfalls that are causing these problems.  So far 
we've been chasing them in emails on this list and in various bugs in the 
tracker, but it is hard to keep track!

 * If you are still seeing any problems here, please reply to this thread!

 * If you previously had a problem here and then realized you were doing 
   something not quite right and got it working, please reply and share 
   what it was so we can make sure others avoid the problem!

We have a couple of feature tickets open to streamline this process to be 
simpler (1 or 2 steps instead of 4 or 5), but before we rush off and 
implement it I want to make sure we fully understand where all of the 
problems lie.
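
For context, the multi-step flow in question looks roughly like this
(hostnames are placeholders and the exact sequence varies a bit by setup):

  ceph-deploy new mon1 mon2 mon3
  ceph-deploy install mon1 mon2 mon3
  ceph-deploy mon create mon1 mon2 mon3
  ceph-deploy gatherkeys mon1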

Thanks!
sage



Re: [ceph-users] ceph 0.67, 0.67.1: ceph_init bug

2013-08-17 Thread Mikaël Cluseau

On 08/18/2013 08:35 AM, Sage Weil wrote:

The ceph-disk activate-all command is looking for partitions that are
marked with the ceph type uuid.  Maybe the jouranls are missing?  What
does

  ceph-disk -v activate /dev/sdc1

say?  Or

  ceph-disk -v activate-all

Where does the 'journal' symlink in the ceph data partitions point to?


ummm ceph-disk activate seems to work:

# ceph-disk -v activate-all
DEBUG:ceph-disk-python2.7:Scanning /dev/disk/by-parttypeuuid
# ceph-disk -v activate /dev/sdc1
DEBUG:ceph-disk-python2.7:Mounting /dev/sdc1 on 
/var/lib/ceph/tmp/mnt.zGfWOj with options noatime
DEBUG:ceph-disk-python2.7:Cluster uuid is 
d2f0857a-8bea-4b7e-af0c-ee164bc7ecf7

DEBUG:ceph-disk-python2.7:Cluster name is ceph
DEBUG:ceph-disk-python2.7:OSD uuid is ee7dcd25-d65c-47ba-85cb-3c64566ba980
DEBUG:ceph-disk-python2.7:OSD id is 0
DEBUG:ceph-disk-python2.7:Marking with init system sysvinit
DEBUG:ceph-disk-python2.7:ceph osd.0 data dir is ready at 
/var/lib/ceph/tmp/mnt.zGfWOj

DEBUG:ceph-disk-python2.7:Moving mount to final location...
DEBUG:ceph-disk-python2.7:Starting ceph osd.0...

The journal symlink points to journal partitions on my SSDs :

# for d in d e f g h ; do ceph-disk activate /dev/sd${d}1; done
# ls -l /var/lib/ceph/osd/ceph-?/journal
lrwxrwxrwx 1 root root 9 Aug 17 09:05 /var/lib/ceph/osd/ceph-0/journal 
-> /dev/sda5
lrwxrwxrwx 1 root root 9 Aug 17 09:05 /var/lib/ceph/osd/ceph-1/journal 
-> /dev/sda6
lrwxrwxrwx 1 root root 9 Aug 17 09:06 /var/lib/ceph/osd/ceph-2/journal 
-> /dev/sda7
lrwxrwxrwx 1 root root 9 Aug 17 09:06 /var/lib/ceph/osd/ceph-3/journal 
-> /dev/sdb5
lrwxrwxrwx 1 root root 9 Aug 17 09:06 /var/lib/ceph/osd/ceph-4/journal 
-> /dev/sdb6
lrwxrwxrwx 1 root root 9 Aug 17 09:06 /var/lib/ceph/osd/ceph-5/journal 
-> /dev/sdb7




Re: [ceph-users] ceph 0.67, 0.67.1: ceph_init bug

2013-08-17 Thread Mikaël Cluseau

On 08/18/2013 08:39 AM, Mikaël Cluseau wrote:


# ceph-disk -v activate-all
DEBUG:ceph-disk-python2.7:Scanning /dev/disk/by-parttypeuuid 


Maybe /dev/disk/by-parttypeuuid is specific?

# ls -l /dev/disk
total 0
drwxr-xr-x 2 root root 1220 Aug 18 07:01 by-id
drwxr-xr-x 2 root root   60 Aug 18 07:01 by-partlabel
drwxr-xr-x 2 root root  160 Aug 18 07:01 by-partuuid
drwxr-xr-x 2 root root  600 Aug 18 07:01 by-path
drwxr-xr-x 2 root root  200 Aug 18 07:01 by-uuid



Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-17 Thread Oliver Daudey
Hey Mark,

On Sat, 2013-08-17 at 08:16 -0500, Mark Nelson wrote:
> On 08/17/2013 06:13 AM, Oliver Daudey wrote:
> > Hey all,
> >
> > This is a copy of Bug #6040 (http://tracker.ceph.com/issues/6040) I
> > created in the tracker.  Thought I would pass it through the list as
> > well, to get an idea if anyone else is running into it.  It may only
> > show under higher loads.  More info about my setup is in the bug-report
> > above.  Here goes:
> >
> >
> > I'm running a Ceph-cluster with 3 nodes, each of which runs a mon, osd
> > and mds. I'm using RBD on this cluster as storage for KVM, CephFS is
> > unused at this time. While still on v0.61.7 Cuttlefish, I got 70-100
> > +MB/sec on simple linear writes to a file with `dd' inside a VM on this
> > cluster under regular load and the osds usually averaged 20-100%
> > CPU-utilisation in `top'. After the upgrade to Dumpling, CPU-usage for
> > the osds shot up to 100% to 400% in `top' (multi-core system) and the
> > speed for my writes with `dd' inside a VM dropped to 20-40MB/sec. Users
> > complained that disk-access inside the VMs was significantly slower and
> > the backups of the RBD-store I was running, also got behind quickly.
> >
> > After downgrading only the osds to v0.61.7 Cuttlefish and leaving the
> > rest at 0.67 Dumpling, speed and load returned to normal. I have
> > repeated this performance-hit upon upgrade on a similar test-cluster
> > under no additional load at all. Although CPU-usage for the osds wasn't
> > as dramatic during these tests because there was no base-load from other
> > VMs, I/O-performance dropped significantly after upgrading during these
> > tests as well, and returned to normal after downgrading the osds.
> >
> > I'm not sure what to make of it. There are no visible errors in the logs
> > and everything runs and reports good health, it's just a lot slower,
> > with a lot more CPU-usage.
> 
> Hi Oliver,
> 
> If you have access to the perf command on this system, could you try 
> running:
> 
> "sudo perf top"
> 
> And if that doesn't give you much,
> 
> "sudo perf record -g"
> 
> then:
> 
> "sudo perf report | less"
> 
> during the period of high CPU usage?  This will give you a call graph. 
> There may be symbols missing, but it might help track down what the OSDs 
> are doing.

Thanks for your help!  I did a couple of runs on my test-cluster,
loading it with writes from 3 VMs concurrently and measuring the results
at the first node with all 0.67 Dumpling-components and with the osds
replaced by 0.61.7 Cuttlefish.  I let `perf top' run and settle for a
while, then copied anything that showed in red and green into this post.
Here are the results (sorry for the word-wraps):

First, with 0.61.7 osds:

 19.91%  [kernel]            [k] intel_idle
 10.18%  [kernel]            [k] _raw_spin_lock_irqsave
  6.79%  ceph-osd            [.] ceph_crc32c_le
  4.93%  [kernel]            [k] default_send_IPI_mask_sequence_phys
  2.71%  [kernel]            [k] copy_user_generic_string
  1.42%  libc-2.11.3.so      [.] memcpy
  1.23%  [kernel]            [k] find_busiest_group
  1.13%  librados.so.2.0.0   [.] ceph_crc32c_le_intel
  1.11%  [kernel]            [k] _raw_spin_lock
  0.99%  kvm                 [.] 0x1931f8
  0.92%  [igb]               [k] igb_poll
  0.87%  [kernel]            [k] native_write_cr0
  0.80%  [kernel]            [k] csum_partial
  0.78%  [kernel]            [k] __do_softirq
  0.63%  [kernel]            [k] hpet_legacy_next_event
  0.53%  [ip_tables]         [k] ipt_do_table
  0.50%  libc-2.11.3.so      [.] 0x74433

Second test, with 0.67 osds:

 18.32%  [kernel]            [k] intel_idle
  7.58%  [kernel]            [k] _raw_spin_lock_irqsave
  7.04%  ceph-osd            [.] PGLog::undirty()
  4.39%  ceph-osd            [.] ceph_crc32c_le_intel
  3.92%  [kernel]            [k] default_send_IPI_mask_sequence_phys
  2.25%  [kernel]            [k] copy_user_generic_string
  1.76%  libc-2.11.3.so      [.] memcpy
  1.56%  librados.so.2.0.0   [.] ceph_crc32c_le_intel
  1.40%  libc-2.11.3.so      [.] vfprintf
  1.12%  libc-2.11.3.so      [.] 0x7217b
  1.05%  [kernel]            [k] _raw_spin_lock
  1.01%  [kernel]            [k] find_busiest_group
  0.83%  kvm                 [.] 0x193ab8
  0.80%  [kernel]            [k] native_write_cr0
  0.76%  [kernel]            [k] __do_softirq
  0.73%  libc-2.11.3.so      [.] _IO_default_xsputn
  0.70%  [kernel]            [k] csum_partial
  0.68%  [igb]               [k] igb_poll
  0.58%  [kernel]            [k] hpet_legacy_next_event
  0.54%  [kernel]            [k] __schedule


What jumps right out is "PGLog::undirty()", which doesn't show up
with 0.61.7.

Re: [ceph-users] ceph 0.67, 0.67.1: ceph_init bug

2013-08-17 Thread Mikaël Cluseau

On 08/18/2013 08:44 AM, Mikaël Cluseau wrote:

On 08/18/2013 08:39 AM, Mikaël Cluseau wrote:


# ceph-disk -v activate-all
DEBUG:ceph-disk-python2.7:Scanning /dev/disk/by-parttypeuuid 


Maybe /dev/disk/by-parttypeuuid is specific?

# ls -l /dev/disk
total 0
drwxr-xr-x 2 root root 1220 Aug 18 07:01 by-id
drwxr-xr-x 2 root root   60 Aug 18 07:01 by-partlabel
drwxr-xr-x 2 root root  160 Aug 18 07:01 by-partuuid
drwxr-xr-x 2 root root  600 Aug 18 07:01 by-path
drwxr-xr-x 2 root root  200 Aug 18 07:01 by-uuid


ok, it seems the udev rules are missing from my packaging, I'll have to
take a better look at your debian packages ;)
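
For reference, the /dev/disk/by-parttypeuuid symlinks that ceph-disk scans
are created by a udev rule shipped with ceph; it looks roughly like this
(an approximation of the packaged rule; the file name and match keys may differ):

  # e.g. /lib/udev/rules.d/60-ceph-partuuid-workaround.rules
  ENV{ID_PART_ENTRY_SCHEME}=="gpt", ENV{ID_PART_ENTRY_TYPE}=="?*", ENV{ID_PART_ENTRY_UUID}=="?*", SYMLINK+="disk/by-parttypeuuid/$env{ID_PART_ENTRY_TYPE}.$env{ID_PART_ENTRY_UUID}"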



Re: [ceph-users] ceph 0.67, 0.67.1: ceph_init bug

2013-08-17 Thread Sage Weil
On Sun, 18 Aug 2013, Mikaël Cluseau wrote:
> On 08/18/2013 08:44 AM, Mikaël Cluseau wrote:
> > On 08/18/2013 08:39 AM, Mikaël Cluseau wrote:
> > > 
> > > # ceph-disk -v activate-all
> > > DEBUG:ceph-disk-python2.7:Scanning /dev/disk/by-parttypeuuid 
> > 
> > Maybe /dev/disk/by-parttypeuuid is specific?
> > 
> > # ls -l /dev/disk
> > total 0
> > drwxr-xr-x 2 root root 1220 Aug 18 07:01 by-id
> > drwxr-xr-x 2 root root   60 Aug 18 07:01 by-partlabel
> > drwxr-xr-x 2 root root  160 Aug 18 07:01 by-partuuid
> > drwxr-xr-x 2 root root  600 Aug 18 07:01 by-path
> > drwxr-xr-x 2 root root  200 Aug 18 07:01 by-uuid
> 
> ok, it seems the udev rules are missing from my packaging, I'll have take a
> better look at your debian packages ;)

Yep!  What distro is this?

sage


Re: [ceph-users] ceph 0.67, 0.67.1: ceph_init bug

2013-08-17 Thread Sage Weil
On Sun, 18 Aug 2013, Mikaël Cluseau wrote:
> On 08/18/2013 08:35 AM, Sage Weil wrote:
> > The ceph-disk activate-all command is looking for partitions that are
> > marked with the ceph type uuid.  Maybe the jouranls are missing?  What
> > does
> > 
> >   ceph-disk -v activate /dev/sdc1
> > 
> > say?  Or
> > 
> >   ceph-disk -v activate-all
> > 
> > Where does the 'journal' symlink in the ceph data partitions point to?
> 
> ummm ceph-disk activate seems to work:
> 
> # ceph-disk -v activate-all
> DEBUG:ceph-disk-python2.7:Scanning /dev/disk/by-parttypeuuid

find /dev/disk/by-parttypeuuid -ls

?

> # ceph-disk -v activate /dev/sdc1
> DEBUG:ceph-disk-python2.7:Mounting /dev/sdc1 on /var/lib/ceph/tmp/mnt.zGfWOj
> with options noatime
> DEBUG:ceph-disk-python2.7:Cluster uuid is d2f0857a-8bea-4b7e-af0c-ee164bc7ecf7
> DEBUG:ceph-disk-python2.7:Cluster name is ceph
> DEBUG:ceph-disk-python2.7:OSD uuid is ee7dcd25-d65c-47ba-85cb-3c64566ba980
> DEBUG:ceph-disk-python2.7:OSD id is 0
> DEBUG:ceph-disk-python2.7:Marking with init system sysvinit
> DEBUG:ceph-disk-python2.7:ceph osd.0 data dir is ready at
> /var/lib/ceph/tmp/mnt.zGfWOj
> DEBUG:ceph-disk-python2.7:Moving mount to final location...
> DEBUG:ceph-disk-python2.7:Starting ceph osd.0...
> 
> The journal symlink points to journal partitions on my SSDs :
> 
> # for d in d e f g h ; do ceph-disk activate /dev/sd${d}1; done
> # ls -l /var/lib/ceph/osd/ceph-?/journal
> lrwxrwxrwx 1 root root 9 Aug 17 09:05 /var/lib/ceph/osd/ceph-0/journal ->
> /dev/sda5
> lrwxrwxrwx 1 root root 9 Aug 17 09:05 /var/lib/ceph/osd/ceph-1/journal ->
> /dev/sda6
> lrwxrwxrwx 1 root root 9 Aug 17 09:06 /var/lib/ceph/osd/ceph-2/journal ->
> /dev/sda7
> lrwxrwxrwx 1 root root 9 Aug 17 09:06 /var/lib/ceph/osd/ceph-3/journal ->
> /dev/sdb5
> lrwxrwxrwx 1 root root 9 Aug 17 09:06 /var/lib/ceph/osd/ceph-4/journal ->
> /dev/sdb6
> lrwxrwxrwx 1 root root 9 Aug 17 09:06 /var/lib/ceph/osd/ceph-5/journal ->
> /dev/sdb7
> 
> 


Re: [ceph-users] ceph 0.67, 0.67.1: ceph_init bug

2013-08-17 Thread Mikaël Cluseau

On 08/18/2013 08:53 AM, Sage Weil wrote:

Yep!  What distro is this?


I'm working on Gentoo packaging to get a full stack of ceph and openstack.

Overlay here:
git clone https://git.isi.nc/cloud/cloud-overlay.git

And a small fork of ceph-deploy to add gentoo support:
git clone https://git.isi.nc/cloud/ceph-deploy.git

I'll put these on github eventually, but I'd like to get at least my own
cluster working first :)



Re: [ceph-users] ceph 0.67, 0.67.1: ceph_init bug

2013-08-17 Thread Mikaël Cluseau

On 08/18/2013 08:53 AM, Sage Weil wrote:
Yep! 


It's working without any change in the udev rules files ;)


Re: [ceph-users] radosgw creating pool with empty name after upgrade from 0.61.7 cuttlefish to 0.67 dumpling

2013-08-17 Thread Yehuda Sadeh
On Sat, Aug 17, 2013 at 6:25 AM, Øystein Lønning Nerhus  wrote:
> Hi,
>
> This seems like a bug.
>
> # ceph df
> NAME   ID USED  %USED OBJECTS
> .rgw.root  26 778   0 3
> .rgw   27 1118  0 8
> .rgw.gc28 0 0 32
>30 0 0 8
> .users.uid 31 369   0 2
> .users 32 200 2
> .rgw.buckets.index 33 0 0 4
> .rgw.buckets   34 3519  0 1
>
> Notice the empty pool name.
>
> # rados -p "" ls
> notify.0
> notify.1
> notify.2
> notify.3
> notify.4
> notify.5
> notify.6
> notify.7
>
> Based on https://github.com/ceph/ceph/blob/dumpling/src/rgw/rgw_rados.cc it 
> seems this pool is really supposed to be named ".rgw.control".
>
> I am running Ubuntu 12.10 Quantal with ceph packages from sources.list:
> "deb http://eu.ceph.com/debian-dumpling/ quantal main"
>
> Note: i deleted all rgw pools before i upgraded to 0.67 so im not bringing 
> rgw data from 0.61.7.
>
> Other than the empty pool name, radosgw seems to be working just fine.  I 
> have uploaded and retrieved objects without any problems.

Sounds like a bug, I opened issue #6046.

Thanks,
Yehuda