Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

2014-09-17 Thread Alexandre DERUMIER
>>The results are with journal and data configured in the same SSD ?
yes

>>Also, how are you configuring your journal device, is it a block device ?
yes.

~ceph-deploy osd create node:sdb

# parted /dev/sdb
GNU Parted 2.3
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p
Model: ATA Crucial_CT1024M5 (scsi)
Disk /dev/sdb: 1024GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt

Number  Start   End SizeFile system  Name  Flags
 2  1049kB  5369MB  5368MB   ceph journal
 1  5370MB  1024GB  1019GB  xfs  ceph data

>>If journal and data are not in the same device result may change.
yes, sure of course
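
For reference, separating the journal onto another device with ceph-deploy 
would look something like this (just a sketch, assuming a spare /dev/sdc for 
the journal):

ceph-deploy osd create node:sdb:/dev/sdc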

>>BTW, there are SSDs like the SanDisk Optimus drives that use capacitor-backed 
>>DRAM and thus always ignore the CMD_FLUSH command, since the drive guarantees 
>>that once data reaches it, it is power-fail safe. So you don't need the 
>>kernel patch. 

Oh, good to know! Note that the kernel patch is really useful for these cheap 
consumer Crucial M550 drives, but I don't see much difference with the Intel S3500.
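
To compare drives for journal-style writes (small synchronous writes at queue 
depth 1), a quick fio run along these lines gives a rough idea (a sketch; point 
it at a scratch file or spare partition, not a live journal):

fio --name=journal-test --filename=/mnt/test/fio.tmp --size=1G --direct=1 \
    --sync=1 --bs=4k --iodepth=1 --numjobs=1 --rw=write --runtime=30 --time_based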


>>Optimus random write performance is ~15K IOPS (4K io_size). Presently I don't 
>>have any write performance data (on Ceph) with that; I will run some tests 
>>soon and share. 

Impressive results! I haven't chosen the SSD model for my production 
cluster yet (target 2015); I'll take a look at these Optimus drives.


- Original Message - 

From: "Somnath Roy"  
To: "Mark Kirkwood" , "Alexandre DERUMIER" 
, "Sebastien Han"  
Cc: ceph-users@lists.ceph.com 
Sent: Wednesday, 17 September 2014 03:22:05 
Subject: RE: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K 
IOPS 

Hi Mark/Alexandre, 
The results are with journal and data configured in the same SSD ? 
Also, how are you configuring your journal device, is it a block device ? 
If journal and data are not in the same device result may change. 

BTW, there are SSDs like the SanDisk Optimus drives that use capacitor-backed 
DRAM and thus always ignore the CMD_FLUSH command, since the drive guarantees 
that once data reaches it, it is power-fail safe. So you don't need the kernel 
patch. Optimus random write performance is ~15K IOPS (4K io_size). Presently I 
don't have any write performance data (on Ceph) with that; I will run some 
tests soon and share. 

Thanks & Regards 
Somnath 

-Original Message- 
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mark 
Kirkwood 
Sent: Tuesday, September 16, 2014 3:36 PM 
To: Alexandre DERUMIER; Sebastien Han 
Cc: ceph-users@lists.ceph.com 
Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K 
IOPS 

On 17/09/14 08:39, Alexandre DERUMIER wrote: 
> Hi, 
> 
>>> I’m just surprised that you’re only getting 5299 with 0.85 since 
>>> I’ve been able to get 6,4K, well I was using the 200GB model 
> 
> Your model is 
> DC S3700 
> 
> mine is DC s3500 
> 
> with lower writes, so that could explain the difference. 
> 

Interesting - I was getting 8K IOPS with 0.85 on a 128G M550 - this suggests 
that the bottleneck is not only sync write performance (as your 
S3500 do much better there), but write performance generally (where the 
M550 is faster). 

Cheers 

Mark 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Multiple cephfs filesystems per cluster

2014-09-17 Thread David Barker
Hi Cephalopods,

Browsing the list archives, I know this has come up before, but I thought
I'd check in for an update.

I'm in an environment where it would be useful to run a file system per
department in a single cluster (or at a pinch enforcing some client / fs
tree security). Has there been much progress recently?

Many thanks,

Dave
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple cephfs filesystems per cluster

2014-09-17 Thread Wido den Hollander
On 09/17/2014 12:11 PM, David Barker wrote:
> Hi Cephalopods,
> 
> Browsing the list archives, I know this has come up before, but I thought
> I'd check in for an update.
> 
> I'm in an environment where it would be useful to run a file system per
> department in a single cluster (or at a pinch enforcing some client / fs
> tree security). Has there been much progress recently?
> 

No, that's not possible. It's a single hierarchy. However, you can
create subdirectories per department and do a subtree mount.

ACLs and tree security aren't implemented yet, however.
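
As an illustration, a per-department subtree mount would look something like 
this (just a sketch, with a made-up monitor address, path and credentials):

mount -t ceph 10.0.0.1:6789:/finance /mnt/finance -o name=admin,secretfile=/etc/ceph/admin.secret

or, with the FUSE client:

ceph-fuse -r /finance /mnt/finance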

> Many thanks,
> 
> Dave
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] TypeError: unhashable type: 'list'

2014-09-17 Thread Santhosh Fernandes
Hi all,

Has anyone been successful in replicating data across two zones of a federated
gateway configuration? I am getting a "TypeError: unhashable type: 'list'"
error, and I am not seeing the data part getting replicated.

Verbose log:

application/json; charset=UTF-8
Wed, 17 Sep 2014 09:59:22 GMT
/admin/log
2014-09-17T15:29:22.219 15995:DEBUG:boto:Signature:
AWS V280N25RDUA6EQ55T28V:woT+s+oqufKWoHyMIxdK/++Hz7U=
2014-09-17T15:29:22.220 15995:DEBUG:boto:url = '
http://cephog1.santhosh.com:81/admin/log?lock'
params={'locker-id': 'cephOG1:15984', 'length': 60, 'zone-id': u'in-west',
'type': 'metadata', 'id': 52}
headers={'Date': 'Wed, 17 Sep 2014 09:59:22 GMT', 'Content-Length': '0',
'Content-Type': 'application/json; charset=UTF-8', 'Authorization': 'AWS
V280N25RDUA6EQ55T28V:woT+s+oqufKWoHyMIxdK/++Hz7U=', 'User-Agent':
'Boto/2.20.1 Python/2.7.6 Linux/3.13.0-24-generic'}
data=None
2014-09-17T15:29:22.222 15995:INFO:urllib3.connectionpool:Starting new HTTP
connection (1): cephog1.santhosh.com
2014-09-17T15:29:22.223 15995:DEBUG:urllib3.connectionpool:Setting read
timeout to None
2014-09-17T15:29:22.257 15995:DEBUG:urllib3.connectionpool:"POST
/admin/log?lock&locker-id=cephOG1%3A15984&length=60&zone-id=in-west&type=metadata&id=52
HTTP/1.1" 200 None
2014-09-17T15:29:22.258 15995:ERROR:radosgw_agent.worker:syncing entries
for shard 52 failed
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/radosgw_agent/worker.py", line
151, in run
new_retries = self.sync_entries(log_entries, retries)
  File "/usr/lib/python2.7/dist-packages/radosgw_agent/worker.py", line
437, in sync_entries
for section, name in mentioned.union(split_retries):
TypeError: unhashable type: 'list'
2014-09-17T15:29:22.259 15995:DEBUG:radosgw_agent.lock:release and clear
lock
2014-09-17T15:29:22.259 15995:DEBUG:boto:path=/admin/log?unlock
2014-09-17T15:29:22.260 15995:DEBUG:boto:auth_path=/admin/log?unlock
2014-09-17T15:29:22.260 15995:DEBUG:boto:StringToSign:
POST


Any ideas or help on resolving this error?

Regards,
Santhosh
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple cephfs filesystems per cluster

2014-09-17 Thread John Spray
Hi David,

We haven't written any code for the multiple filesystems feature so
far, but the new "fs new"/"fs rm"/"fs ls" management commands were
designed with this in mind -- currently only supporting one
filesystem, but to allow slotting in the multiple filesystems feature
without too much disruption.  There is some design work to be done as
well, such as how the system should handle standby MDSs (assigning to
a particular filesystem, floating between filesystems, etc).
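
For reference, the current single-filesystem workflow with those commands looks 
roughly like this (a sketch; pool names and PG counts are just examples):

ceph osd pool create cephfs_metadata 128
ceph osd pool create cephfs_data 128
ceph fs new cephfs cephfs_metadata cephfs_data
ceph fs ls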

Cheers,
John

On Wed, Sep 17, 2014 at 11:11 AM, David Barker  wrote:
> Hi Cephalopods,
>
> Browsing the list archives, I know this has come up before, but I thought
> I'd check in for an update.
>
> I'm in an environment where it would be useful to run a file system per
> department in a single cluster (or at a pinch enforcing some client / fs
> tree security). Has there been much progress recently?
>
> Many thanks,
>
> Dave
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Dumpling cluster can't resolve peering failures, ceph pg query blocks, auth failures in logs

2014-09-17 Thread Florian Haas
Thanks, I did check on that too as I'd seen this before and this was
"the usual drill", but alas, no, that wasn't the problem. This cluster
is having other issues too, though, so I probably need to look into
those first.

Cheers,
Florian

On Mon, Sep 15, 2014 at 7:29 PM, Gregory Farnum  wrote:
> Not sure, but have you checked the clocks on their nodes? Extreme
> clock drift often results in strange cephx errors.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Sun, Sep 14, 2014 at 11:03 PM, Florian Haas  wrote:
>> Hi everyone,
>>
>> [Keeping this on the -users list for now. Let me know if I should
>> cross-post to -devel.]
>>
>> I've been asked to help out on a Dumpling cluster (a system
>> "bequeathed" by one admin to the next, currently on 0.67.10, was
>> originally installed with 0.67.5 and subsequently updated a few
>> times), and I'm seeing a rather odd issue there. The cluster is
>> relatively small, 3 MONs, 4 OSD nodes; each OSD node hosts a rather
>> non-ideal 12 OSDs but its performance issues aren't really the point
>> here.
>>
>> "ceph health detail" shows a bunch of PGs peering, but the usual
>> troubleshooting steps don't really seem to work.
>>
>> For some PGs, "ceph pg  query" just blocks, doesn't return
>> anything. Adding --debug_ms=10 shows that it's simply not getting a
>> response back from one of the OSDs it's trying to talk to, as if
>> packets dropped on the floor or were filtered out. However, opening a
>> simple TCP connection to the OSD's IP and port works perfectly fine
>> (netcat returns a Ceph signature).
>>
>> (Note, though, that because of a daemon flapping issue they at some
>> point set both "noout" and "nodown", so the cluster may not be
>> behaving as normally expected when OSDs fail to respond in time.)
>>
>> Then there are some PGs where "ceph pg  query" is a little more
>> verbose, though not exactly more successful:
>>
>> From ceph health detail:
>>
>> pg 6.c10 is stuck inactive for 1477.781394, current state peering,
>> last acting [85,16]
>>
>> ceph pg 6.b1 query:
>>
>> 2014-09-15 01:06:48.200418 7f29a6efc700  0 cephx: verify_reply
>> couldn't decrypt with error: error decoding block for decryption
>> 2014-09-15 01:06:48.200428 7f29a6efc700  0 -- 10.47.17.1:0/1020420 >>
>> 10.47.16.33:6818/15630 pipe(0x2c00b00 sd=4 :43263 s=1 pgs=0 cs=0 l=1
>> c=0x2c00d90).failed verifying authorize reply
>> 2014-09-15 01:06:48.200465 7f29a6efc700  0 -- 10.47.17.1:0/1020420 >>
>> 10.47.16.33:6818/15630 pipe(0x2c00b00 sd=4 :43263 s=1 pgs=0 cs=0 l=1
>> c=0x2c00d90).fault
>> 2014-09-15 01:06:48.201000 7f29a6efc700  0 cephx: verify_reply
>> couldn't decrypt with error: error decoding block for decryption
>> 2014-09-15 01:06:48.201008 7f29a6efc700  0 -- 10.47.17.1:0/1020420 >>
>> 10.47.16.33:6818/15630 pipe(0x2c00b00 sd=4 :43264 s=1 pgs=0 cs=0 l=1
>> c=0x2c00d90).failed verifying authorize reply
>>
>> Oops. Now the admins swear they didn't touch the keys, but they are
>> also (understandably) reluctant to just kill and redeploy all those
>> OSDs, as these issues are basically scattered over a bunch of PGs
>> touching many OSDs. How would they pinpoint this to be sure that
>> they're not being bitten by a bug or misconfiguration?
>>
>> Not sure if people have seen this before — if so, I'd be grateful for
>> some input. Loïc, Sébastien perhaps? Or João, Greg, Sage?
>>
>> Thanks in advance for any insight people might be able to share. :)
>>
>> Cheers,
>> Florian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple cephfs filesystems per cluster

2014-09-17 Thread David Barker
Thanks John - It did look like it was heading in that direction!

I did wonder if 'fs map' and 'fs unmap' commands would be useful too; filesystem
backups, migrations between clusters, and async DR could be facilitated by
moving the underlying pool objects around between clusters.

Dave

On Wed, Sep 17, 2014 at 11:22 AM, John Spray  wrote:

> Hi David,
>
> We haven't written any code for the multiple filesystems feature so
> far, but the new "fs new"/"fs rm"/"fs ls" management commands were
> designed with this in mind -- currently only supporting one
> filesystem, but to allow slotting in the multiple filesystems feature
> without too much disruption.  There is some design work to be done as
> well, such as how the system should handle standby MDSs (assigning to
> a particular filesystem, floating between filesystems, etc).
>
> Cheers,
> John
>
> On Wed, Sep 17, 2014 at 11:11 AM, David Barker 
> wrote:
> > Hi Cephalopods,
> >
> > Browsing the list archives, I know this has come up before, but I thought
> > I'd check in for an update.
> >
> > I'm in an environment where it would be useful to run a file system per
> > department in a single cluster (or at a pinch enforcing some client / fs
> > tree security). Has there been much progress recently?
> >
> > Many thanks,
> >
> > Dave
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] monitor quorum

2014-09-17 Thread James Eckersall
Hi,

I have a ceph cluster running 0.80.1 on Ubuntu 14.04.  I have 3 monitors
and 4 OSD nodes currently.

Everything has been running great up until today where I've got an issue
with the monitors.
I moved mon03 to a different switchport so it would have temporarily lost
connectivity.
Since then, the cluster is reporting that that mon is down, although it's
definitely up.
I've tried restarting the mon services on all three mons, but that hasn't
made a difference.
I definitely, 100% do not have any clock skew on any of the mons.  This has
been triple-checked as the ceph docs seem to suggest that might be the
cause of this issue.

Here is what ceph -s and ceph health detail are reporting as well as the
mon_status for each monitor:


# ceph -s ; ceph health detail
cluster XXX
 health HEALTH_WARN 1 mons down, quorum 0,1 ceph-mon-01,ceph-mon-02
 monmap e2: 3 mons at {ceph-mon-01=
10.1.1.64:6789/0,ceph-mon-02=10.1.1.65:6789/0,ceph-mon-03=10.1.1.66:6789/0},
election epoch 932, quorum 0,1 ceph-mon-01,ceph-mon-02
 osdmap e49213: 80 osds: 80 up, 80 in
  pgmap v18242952: 4864 pgs, 5 pools, 69910 GB data, 17638 kobjects
197 TB used, 95904 GB / 290 TB avail
   8 active+clean+scrubbing+deep
4856 active+clean
  client io 6893 kB/s rd, 5657 kB/s wr, 2090 op/s
HEALTH_WARN 1 mons down, quorum 0,1 ceph-mon-01,ceph-mon-02
mon.ceph-mon-03 (rank 2) addr 10.1.1.66:6789/0 is down (out of quorum)


{ "name": "ceph-mon-01",
  "rank": 0,
  "state": "leader",
  "election_epoch": 932,
  "quorum": [
0,
1],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 2,
  "fsid": "XXX",
  "modified": "0.00",
  "created": "0.00",
  "mons": [
{ "rank": 0,
  "name": "ceph-mon-01",
  "addr": "10.1.1.64:6789\/0"},
{ "rank": 1,
  "name": "ceph-mon-02",
  "addr": "10.1.1.65:6789\/0"},
{ "rank": 2,
  "name": "ceph-mon-03",
  "addr": "10.1.1.66:6789\/0"}]}}


{ "name": "ceph-mon-02",
  "rank": 1,
  "state": "peon",
  "election_epoch": 932,
  "quorum": [
0,
1],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 2,
  "fsid": "XXX",
  "modified": "0.00",
  "created": "0.00",
  "mons": [
{ "rank": 0,
  "name": "ceph-mon-01",
  "addr": "10.1.1.64:6789\/0"},
{ "rank": 1,
  "name": "ceph-mon-02",
  "addr": "10.1.1.65:6789\/0"},
{ "rank": 2,
  "name": "ceph-mon-03",
  "addr": "10.1.1.66:6789\/0"}]}}


{ "name": "ceph-mon-03",
  "rank": 2,
  "state": "electing",
  "election_epoch": 931,
  "quorum": [],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 2,
  "fsid": "XXX",
  "modified": "0.00",
  "created": "0.00",
  "mons": [
{ "rank": 0,
  "name": "ceph-mon-01",
  "addr": "10.1.1.64:6789\/0"},
{ "rank": 1,
  "name": "ceph-mon-02",
  "addr": "10.1.1.65:6789\/0"},
{ "rank": 2,
  "name": "ceph-mon-03",
  "addr": "10.1.1.66:6789\/0"}]}}


Any help or advice is appreciated.

Regards

James
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] monitor quorum

2014-09-17 Thread Florian Haas
On Wed, Sep 17, 2014 at 1:58 PM, James Eckersall
 wrote:
> Hi,
>
> I have a ceph cluster running 0.80.1 on Ubuntu 14.04.  I have 3 monitors and
> 4 OSD nodes currently.
>
> Everything has been running great up until today where I've got an issue
> with the monitors.
> I moved mon03 to a different switchport so it would have temporarily lost
> connectivity.
> Since then, the cluster is reporting that that mon is down, although it's
> definitely up.
> I've tried restarting the mon services on all three mons, but that hasn't
> made a difference.
> I definitely, 100% do not have any clock skew on any of the mons.  This has
> been triple-checked as the ceph docs seem to suggest that might be the cause
> of this issue.
>
> Here is what ceph -s and ceph health detail are reporting as well as the
> mon_status for each monitor:
>
>
> # ceph -s ; ceph health detail
> cluster XXX
>  health HEALTH_WARN 1 mons down, quorum 0,1 ceph-mon-01,ceph-mon-02
>  monmap e2: 3 mons at
> {ceph-mon-01=10.1.1.64:6789/0,ceph-mon-02=10.1.1.65:6789/0,ceph-mon-03=10.1.1.66:6789/0},
> election epoch 932, quorum 0,1 ceph-mon-01,ceph-mon-02
>  osdmap e49213: 80 osds: 80 up, 80 in
>   pgmap v18242952: 4864 pgs, 5 pools, 69910 GB data, 17638 kobjects
> 197 TB used, 95904 GB / 290 TB avail
>8 active+clean+scrubbing+deep
> 4856 active+clean
>   client io 6893 kB/s rd, 5657 kB/s wr, 2090 op/s
> HEALTH_WARN 1 mons down, quorum 0,1 ceph-mon-01,ceph-mon-02
> mon.ceph-mon-03 (rank 2) addr 10.1.1.66:6789/0 is down (out of quorum)
>
>
> { "name": "ceph-mon-01",
>   "rank": 0,
>   "state": "leader",
>   "election_epoch": 932,
>   "quorum": [
> 0,
> 1],
>   "outside_quorum": [],
>   "extra_probe_peers": [],
>   "sync_provider": [],
>   "monmap": { "epoch": 2,
>   "fsid": "XXX",
>   "modified": "0.00",
>   "created": "0.00",
>   "mons": [
> { "rank": 0,
>   "name": "ceph-mon-01",
>   "addr": "10.1.1.64:6789\/0"},
> { "rank": 1,
>   "name": "ceph-mon-02",
>   "addr": "10.1.1.65:6789\/0"},
> { "rank": 2,
>   "name": "ceph-mon-03",
>   "addr": "10.1.1.66:6789\/0"}]}}
>
>
> { "name": "ceph-mon-02",
>   "rank": 1,
>   "state": "peon",
>   "election_epoch": 932,
>   "quorum": [
> 0,
> 1],
>   "outside_quorum": [],
>   "extra_probe_peers": [],
>   "sync_provider": [],
>   "monmap": { "epoch": 2,
>   "fsid": "XXX",
>   "modified": "0.00",
>   "created": "0.00",
>   "mons": [
> { "rank": 0,
>   "name": "ceph-mon-01",
>   "addr": "10.1.1.64:6789\/0"},
> { "rank": 1,
>   "name": "ceph-mon-02",
>   "addr": "10.1.1.65:6789\/0"},
> { "rank": 2,
>   "name": "ceph-mon-03",
>   "addr": "10.1.1.66:6789\/0"}]}}
>
>
> { "name": "ceph-mon-03",
>   "rank": 2,
>   "state": "electing",
>   "election_epoch": 931,
>   "quorum": [],
>   "outside_quorum": [],
>   "extra_probe_peers": [],
>   "sync_provider": [],
>   "monmap": { "epoch": 2,
>   "fsid": "XXX",
>   "modified": "0.00",
>   "created": "0.00",
>   "mons": [
> { "rank": 0,
>   "name": "ceph-mon-01",
>   "addr": "10.1.1.64:6789\/0"},
> { "rank": 1,
>   "name": "ceph-mon-02",
>   "addr": "10.1.1.65:6789\/0"},
> { "rank": 2,
>   "name": "ceph-mon-03",
>   "addr": "10.1.1.66:6789\/0"}]}}
>
>
> Any help or advice is appreciated.

It looks like your mon has been unable to communicate with the other
hosts, presumably since the time you un-/replugged it. Check your
switch port configuration. Also, make sure that from 10.1.1.66, you
can not only ping 10.1.1.64 and 10.1.1.65, but make a TCP connection
on port 6789. With that out of the way, check your mon log on
ceph-mon-03 (in /var/log/ceph/mon); it should provide some additional
insight into the problem.
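
For example, something like this run from ceph-mon-03 would confirm basic 
reachability (a sketch using netcat):

ping -c 3 10.1.1.64
ping -c 3 10.1.1.65
nc -vz 10.1.1.64 6789
nc -vz 10.1.1.65 6789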

Cheers,
Florian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW hung, 2 OSDs using 100% CPU

2014-09-17 Thread Florian Haas
Hi Craig,

just dug this up in the list archives.

On Fri, Mar 28, 2014 at 2:04 AM, Craig Lewis  wrote:
> In the interest of removing variables, I removed all snapshots on all pools,
> then restarted all ceph daemons at the same time.  This brought up osd.8 as
> well.

So just to summarize this: your 100% CPU problem at the time went away
after you removed all snapshots, and the actual cause of the issue was
never found?

I am seeing a similar issue now, and have filed
http://tracker.ceph.com/issues/9503 to make sure it doesn't get lost
again. Can you take a look at that issue and let me know if anything
in the description sounds familiar?

You mentioned in a later message in the same thread that you would
keep your snapshot script running and "repeat the experiment". Did the
situation change in any way after that? Did the issue come back? Or
did you just stop using snapshots altogether?

Cheers,
Florian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] monitor quorum

2014-09-17 Thread James Eckersall
Hi,

Thanks for the advice.

I feel pretty dumb as it does indeed look like a simple networking issue.
 You know how you check things 5 times and miss the most obvious one...

J

On 17 September 2014 16:04, Florian Haas  wrote:

> On Wed, Sep 17, 2014 at 1:58 PM, James Eckersall
>  wrote:
> > Hi,
> >
> > I have a ceph cluster running 0.80.1 on Ubuntu 14.04.  I have 3 monitors
> and
> > 4 OSD nodes currently.
> >
> > Everything has been running great up until today where I've got an issue
> > with the monitors.
> > I moved mon03 to a different switchport so it would have temporarily lost
> > connectivity.
> > Since then, the cluster is reporting that that mon is down, although it's
> > definitely up.
> > I've tried restarting the mon services on all three mons, but that hasn't
> > made a difference.
> > I definitely, 100% do not have any clock skew on any of the mons.  This
> has
> > been triple-checked as the ceph docs seem to suggest that might be the
> cause
> > of this issue.
> >
> > Here is what ceph -s and ceph health detail are reporting as well as the
> > mon_status for each monitor:
> >
> >
> > # ceph -s ; ceph health detail
> > cluster XXX
> >  health HEALTH_WARN 1 mons down, quorum 0,1 ceph-mon-01,ceph-mon-02
> >  monmap e2: 3 mons at
> > {ceph-mon-01=
> 10.1.1.64:6789/0,ceph-mon-02=10.1.1.65:6789/0,ceph-mon-03=10.1.1.66:6789/0
> },
> > election epoch 932, quorum 0,1 ceph-mon-01,ceph-mon-02
> >  osdmap e49213: 80 osds: 80 up, 80 in
> >   pgmap v18242952: 4864 pgs, 5 pools, 69910 GB data, 17638 kobjects
> > 197 TB used, 95904 GB / 290 TB avail
> >8 active+clean+scrubbing+deep
> > 4856 active+clean
> >   client io 6893 kB/s rd, 5657 kB/s wr, 2090 op/s
> > HEALTH_WARN 1 mons down, quorum 0,1 ceph-mon-01,ceph-mon-02
> > mon.ceph-mon-03 (rank 2) addr 10.1.1.66:6789/0 is down (out of quorum)
> >
> >
> > { "name": "ceph-mon-01",
> >   "rank": 0,
> >   "state": "leader",
> >   "election_epoch": 932,
> >   "quorum": [
> > 0,
> > 1],
> >   "outside_quorum": [],
> >   "extra_probe_peers": [],
> >   "sync_provider": [],
> >   "monmap": { "epoch": 2,
> >   "fsid": "XXX",
> >   "modified": "0.00",
> >   "created": "0.00",
> >   "mons": [
> > { "rank": 0,
> >   "name": "ceph-mon-01",
> >   "addr": "10.1.1.64:6789\/0"},
> > { "rank": 1,
> >   "name": "ceph-mon-02",
> >   "addr": "10.1.1.65:6789\/0"},
> > { "rank": 2,
> >   "name": "ceph-mon-03",
> >   "addr": "10.1.1.66:6789\/0"}]}}
> >
> >
> > { "name": "ceph-mon-02",
> >   "rank": 1,
> >   "state": "peon",
> >   "election_epoch": 932,
> >   "quorum": [
> > 0,
> > 1],
> >   "outside_quorum": [],
> >   "extra_probe_peers": [],
> >   "sync_provider": [],
> >   "monmap": { "epoch": 2,
> >   "fsid": "XXX",
> >   "modified": "0.00",
> >   "created": "0.00",
> >   "mons": [
> > { "rank": 0,
> >   "name": "ceph-mon-01",
> >   "addr": "10.1.1.64:6789\/0"},
> > { "rank": 1,
> >   "name": "ceph-mon-02",
> >   "addr": "10.1.1.65:6789\/0"},
> > { "rank": 2,
> >   "name": "ceph-mon-03",
> >   "addr": "10.1.1.66:6789\/0"}]}}
> >
> >
> > { "name": "ceph-mon-03",
> >   "rank": 2,
> >   "state": "electing",
> >   "election_epoch": 931,
> >   "quorum": [],
> >   "outside_quorum": [],
> >   "extra_probe_peers": [],
> >   "sync_provider": [],
> >   "monmap": { "epoch": 2,
> >   "fsid": "XXX",
> >   "modified": "0.00",
> >   "created": "0.00",
> >   "mons": [
> > { "rank": 0,
> >   "name": "ceph-mon-01",
> >   "addr": "10.1.1.64:6789\/0"},
> > { "rank": 1,
> >   "name": "ceph-mon-02",
> >   "addr": "10.1.1.65:6789\/0"},
> > { "rank": 2,
> >   "name": "ceph-mon-03",
> >   "addr": "10.1.1.66:6789\/0"}]}}
> >
> >
> > Any help or advice is appreciated.
>
> It looks like your mon has been unable to communicate with the other
> hosts, presumably since the time you un-/replugged it. Check your
> switch port configuration. Also, make sure that from 10.1.1.66, you
> can not only ping 10.1.1.64 and 10.1.1.65, but make a TCP connection
> on port 6789. With that out of the way, check your mon log on
> ceph-mon-03 (in /var/log/ceph/mon); it should provide some additional
> insight into the problem.
>
> Cheers,
> Florian
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW hung, 2 OSDs using 100% CPU

2014-09-17 Thread Dan Van Der Ster
Hi Florian,

> On 17 Sep 2014, at 17:09, Florian Haas  wrote:
> 
> Hi Craig,
> 
> just dug this up in the list archives.
> 
> On Fri, Mar 28, 2014 at 2:04 AM, Craig Lewis  
> wrote:
>> In the interest of removing variables, I removed all snapshots on all pools,
>> then restarted all ceph daemons at the same time.  This brought up osd.8 as
>> well.
> 
> So just to summarize this: your 100% CPU problem at the time went away
> after you removed all snapshots, and the actual cause of the issue was
> never found?
> 
> I am seeing a similar issue now, and have filed
> http://tracker.ceph.com/issues/9503 to make sure it doesn't get lost
> again. Can you take a look at that issue and let me know if anything
> in the description sounds familiar?


Could your ticket be related to the snap trimming issue I’ve finally narrowed 
down in the past couple days?

  http://tracker.ceph.com/issues/9487

Bump up debug_osd to 20 then check the log during one of your incidents. If it 
is busy logging the snap_trimmer messages, then it’s the same issue. (The issue 
is that rbd pools have many purged_snaps, but sometimes after backfilling a PG 
the purged_snaps list is lost and thus the snap trimmer becomes very busy 
whilst re-trimming thousands of snaps. During that time (a few minutes on my 
cluster) the OSD is blocked.)
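
To raise the log level on a suspect OSD without restarting it, something along 
these lines should work (a sketch; osd.8 is just an example):

ceph tell osd.8 injectargs '--debug-osd 20'

and afterwards set it back down with '--debug-osd 0/5'.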

Cheers, Dan



> 
> You mentioned in a later message in the same thread that you would
> keep your snapshot script running and "repeat the experiment". Did the
> situation change in any way after that? Did the issue come back? Or
> did you just stop using snapshots altogether?
> 
> Cheers,
> Florian
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW hung, 2 OSDs using 100% CPU

2014-09-17 Thread Florian Haas
On Wed, Sep 17, 2014 at 5:24 PM, Dan Van Der Ster
 wrote:
> Hi Florian,
>
>> On 17 Sep 2014, at 17:09, Florian Haas  wrote:
>>
>> Hi Craig,
>>
>> just dug this up in the list archives.
>>
>> On Fri, Mar 28, 2014 at 2:04 AM, Craig Lewis  
>> wrote:
>>> In the interest of removing variables, I removed all snapshots on all pools,
>>> then restarted all ceph daemons at the same time.  This brought up osd.8 as
>>> well.
>>
>> So just to summarize this: your 100% CPU problem at the time went away
>> after you removed all snapshots, and the actual cause of the issue was
>> never found?
>>
>> I am seeing a similar issue now, and have filed
>> http://tracker.ceph.com/issues/9503 to make sure it doesn't get lost
>> again. Can you take a look at that issue and let me know if anything
>> in the description sounds familiar?
>
>
> Could your ticket be related to the snap trimming issue I’ve finally narrowed 
> down in the past couple days?
>
>   http://tracker.ceph.com/issues/9487
>
> Bump up debug_osd to 20 then check the log during one of your incidents. If 
> it is busy logging the snap_trimmer messages, then it’s the same issue. (The 
> issue is that rbd pools have many purged_snaps, but sometimes after 
> backfilling a PG the purged_snaps list is lost and thus the snap trimmer 
> becomes very busy whilst re-trimming thousands of snaps. During that time (a 
> few minutes on my cluster) the OSD is blocked.)

That sounds promising, thank you! debug_osd=10 should actually be
sufficient as those snap_trim messages get logged at that level. :)

Do I understand your issue report correctly in that you have found
setting osd_snap_trim_sleep to be ineffective, because it's being
applied when iterating from PG to PG, rather than from snap to snap?
If so, then I'm guessing that that can hardly be intentional...
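
For reference, the setting under discussion is normally applied either in 
ceph.conf or at runtime, roughly like this (the value here is arbitrary):

[osd]
osd snap trim sleep = 0.05

ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.05'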

Cheers,
Florian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW hung, 2 OSDs using 100% CPU

2014-09-17 Thread Dan Van Der Ster
Hi,
(Sorry for top posting, mobile now).

That's exactly what I observe -- one sleep per PG. The problem is that the 
sleep can't simply be moved since AFAICT the whole PG is locked for the 
duration of the trimmer. So the options I proposed are to limit the number of 
snaps trimmed per call to e.g 16, or to fix the loss of purged_snaps after 
backfilling. Actually, probably both of those are needed. But a real dev would 
know better.

Cheers, Dan


From: Florian Haas 
Sent: Sep 17, 2014 5:33 PM
To: Dan Van Der Ster
Cc: Craig Lewis ;ceph-users@lists.ceph.com
Subject: Re: [ceph-users] RGW hung, 2 OSDs using 100% CPU

On Wed, Sep 17, 2014 at 5:24 PM, Dan Van Der Ster
 wrote:
> Hi Florian,
>
>> On 17 Sep 2014, at 17:09, Florian Haas  wrote:
>>
>> Hi Craig,
>>
>> just dug this up in the list archives.
>>
>> On Fri, Mar 28, 2014 at 2:04 AM, Craig Lewis  
>> wrote:
>>> In the interest of removing variables, I removed all snapshots on all pools,
>>> then restarted all ceph daemons at the same time.  This brought up osd.8 as
>>> well.
>>
>> So just to summarize this: your 100% CPU problem at the time went away
>> after you removed all snapshots, and the actual cause of the issue was
>> never found?
>>
>> I am seeing a similar issue now, and have filed
>> http://tracker.ceph.com/issues/9503 to make sure it doesn't get lost
>> again. Can you take a look at that issue and let me know if anything
>> in the description sounds familiar?
>
>
> Could your ticket be related to the snap trimming issue I’ve finally narrowed 
> down in the past couple days?
>
>   http://tracker.ceph.com/issues/9487
>
> Bump up debug_osd to 20 then check the log during one of your incidents. If 
> it is busy logging the snap_trimmer messages, then it’s the same issue. (The 
> issue is that rbd pools have many purged_snaps, but sometimes after 
> backfilling a PG the purged_snaps list is lost and thus the snap trimmer 
> becomes very busy whilst re-trimming thousands of snaps. During that time (a 
> few minutes on my cluster) the OSD is blocked.)

That sounds promising, thank you! debug_osd=10 should actually be
sufficient as those snap_trim messages get logged at that level. :)

Do I understand your issue report correctly in that you have found
setting osd_snap_trim_sleep to be ineffective, because it's being
applied when iterating from PG to PG, rather than from snap to snap?
If so, then I'm guessing that that can hardly be intentional...

Cheers,
Florian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW hung, 2 OSDs using 100% CPU

2014-09-17 Thread Florian Haas
On Wed, Sep 17, 2014 at 5:42 PM, Dan Van Der Ster
 wrote:
> From: Florian Haas 
> Sent: Sep 17, 2014 5:33 PM
> To: Dan Van Der Ster
> Cc: Craig Lewis ;ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] RGW hung, 2 OSDs using 100% CPU
>
> On Wed, Sep 17, 2014 at 5:24 PM, Dan Van Der Ster
>  wrote:
>> Hi Florian,
>>
>>> On 17 Sep 2014, at 17:09, Florian Haas  wrote:
>>>
>>> Hi Craig,
>>>
>>> just dug this up in the list archives.
>>>
>>> On Fri, Mar 28, 2014 at 2:04 AM, Craig Lewis 
>>> wrote:
 In the interest of removing variables, I removed all snapshots on all
 pools,
 then restarted all ceph daemons at the same time.  This brought up osd.8
 as
 well.
>>>
>>> So just to summarize this: your 100% CPU problem at the time went away
>>> after you removed all snapshots, and the actual cause of the issue was
>>> never found?
>>>
>>> I am seeing a similar issue now, and have filed
>>> http://tracker.ceph.com/issues/9503 to make sure it doesn't get lost
>>> again. Can you take a look at that issue and let me know if anything
>>> in the description sounds familiar?
>>
>>
>> Could your ticket be related to the snap trimming issue I’ve finally
>> narrowed down in the past couple days?
>>
>>   http://tracker.ceph.com/issues/9487
>>
>> Bump up debug_osd to 20 then check the log during one of your incidents.
>> If it is busy logging the snap_trimmer messages, then it’s the same issue.
>> (The issue is that rbd pools have many purged_snaps, but sometimes after
>> backfilling a PG the purged_snaps list is lost and thus the snap trimmer
>> becomes very busy whilst re-trimming thousands of snaps. During that time (a
>> few minutes on my cluster) the OSD is blocked.)
>
> That sounds promising, thank you! debug_osd=10 should actually be
> sufficient as those snap_trim messages get logged at that level. :)
>
> Do I understand your issue report correctly in that you have found
> setting osd_snap_trim_sleep to be ineffective, because it's being
> applied when iterating from PG to PG, rather than from snap to snap?
> If so, then I'm guessing that that can hardly be intentional...
>
> Cheers,
> Florian
>
> Hi,
> (Sorry for top posting, mobile now).

I've taken the liberty to reformat. :)

> That's exactly what I observe -- one sleep per PG. The problem is that the
> sleep can't simply be moved since AFAICT the whole PG is locked for the
> duration of the trimmer. So the options I proposed are to limit the number
> of snaps trimmed per call to e.g 16, or to fix the loss of purged_snaps
> after backfilling. Actually, probably both of those are needed. But a real
> dev would know better.

Okay. Certainly worth a try. Thanks again! I'll let you know when I know more.

Cheers,
Florian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] monitor quorum

2014-09-17 Thread Florian Haas
On Wed, Sep 17, 2014 at 5:21 PM, James Eckersall
 wrote:
> Hi,
>
> Thanks for the advice.
>
> I feel pretty dumb as it does indeed look like a simple networking issue.
> You know how you check things 5 times and miss the most obvious one...
>
> J

No worries at all .:)

Cheers,
Florian
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] monitor quorum

2014-09-17 Thread James Eckersall
Hi,

Now I feel dumb for jumping to the conclusion that it was a simple
networking issue - it isn't.
I've just checked connectivity properly and I can ping and telnet 6789 from
all mon servers to all other mon servers.

I've just restarted the mon03 service and the log is showing the following:

2014-09-17 16:49:02.355148 7f7ef9f8c800  0 starting mon.ceph-mon-03 rank 2
at 10.1.1.66:6789/0 mon_data /var/lib/ceph/mon/ceph-ceph-mon-03 fsid
74069c87-b361-4bb8-8ce8-6ae9deb8a9bd
2014-09-17 16:49:02.355375 7f7ef9f8c800  1 mon.ceph-mon-03@-1(probing) e2
preinit fsid 74069c87-b361-4bb8-8ce8-6ae9deb8a9bd
2014-09-17 16:49:02.356347 7f7ef9f8c800  1
mon.ceph-mon-03@-1(probing).paxosservice(pgmap
18241250..18241952) refresh upgraded, format 0 -> 1
2014-09-17 16:49:02.356360 7f7ef9f8c800  1 mon.ceph-mon-03@-1(probing).pg
v0 on_upgrade discarding in-core PGMap
2014-09-17 16:49:02.400316 7f7ef9f8c800  0 mon.ceph-mon-03@-1(probing).mds
e1 print_map
epoch 1
flags 0
created 2013-12-09 10:19:58.534310
modified 2013-12-09 10:19:58.534332
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
last_failure 0
last_failure_osd_epoch 0
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
uses versioned encoding}
max_mds 1
in
up {}
failed
stopped
data_pools 0
metadata_pool 1
inline_data disabled

2014-09-17 16:49:02.402373 7f7ef9f8c800  0 mon.ceph-mon-03@-1(probing).osd
e49212 crush map has features 1107558400, adjusting msgr requires
2014-09-17 16:49:02.402384 7f7ef9f8c800  0 mon.ceph-mon-03@-1(probing).osd
e49212 crush map has features 1107558400, adjusting msgr requires
2014-09-17 16:49:02.402386 7f7ef9f8c800  0 mon.ceph-mon-03@-1(probing).osd
e49212 crush map has features 1107558400, adjusting msgr requires
2014-09-17 16:49:02.402388 7f7ef9f8c800  0 mon.ceph-mon-03@-1(probing).osd
e49212 crush map has features 1107558400, adjusting msgr requires
2014-09-17 16:49:02.403725 7f7ef9f8c800  1
mon.ceph-mon-03@-1(probing).paxosservice(auth
26001..26154) refresh upgraded, format 0 -> 1
2014-09-17 16:49:02.404834 7f7ef9f8c800  0 mon.ceph-mon-03@-1(probing) e2
 my rank is now 2 (was -1)
2014-09-17 16:49:02.407439 7f7ef331b700  1 mon.ceph-mon-03@2(synchronizing)
e2 sync_obtain_latest_monmap
2014-09-17 16:49:02.407588 7f7ef331b700  1 mon.ceph-mon-03@2(synchronizing)
e2 sync_obtain_latest_monmap obtained monmap e2
2014-09-17 16:49:09.514365 7f7ef331b700  0 log [INF] : mon.ceph-mon-03
calling new monitor election
2014-09-17 16:49:09.514523 7f7ef331b700  1
mon.ceph-mon-03@2(electing).elector(931)
init, last seen epoch 931
2014-09-17 16:49:09.514658 7f7ef331b700  1
mon.ceph-mon-03@2(electing).paxos(paxos
recovering c 31223899..31224482) is_readable now=2014-09-17 16:49:09.514659
lease_expire=0.00 has v0 lc 31224482
2014-09-17 16:49:09.514665 7f7ef331b700  1
mon.ceph-mon-03@2(electing).paxos(paxos
recovering c 31223899..31224482) is_readable now=2014-09-17 16:49:09.514666
lease_expire=0.00 has v0 lc 31224482
2014-09-17 16:49:15.533876 7f7ef3b1c700  1
mon.ceph-mon-03@2(electing).elector(933)
init, last seen epoch 933
2014-09-17 16:49:21.578269 7f7ef3b1c700  1
mon.ceph-mon-03@2(electing).elector(935)
init, last seen epoch 935
2014-09-17 16:49:26.578526 7f7ef3b1c700  1
mon.ceph-mon-03@2(electing).elector(935)
init, last seen epoch 935
2014-09-17 16:49:31.578790 7f7ef3b1c700  1
mon.ceph-mon-03@2(electing).elector(935)
init, last seen epoch 935
2014-09-17 16:49:36.579044 7f7ef3b1c700  1
mon.ceph-mon-03@2(electing).elector(935)
init, last seen epoch 935


The last lines about "electing" repeat forever.  The other mons are logging
far more entries than I have seen them log before.  They look like the
following (note the timestamps - all of these log lines are from just a 2
second period):

2014-09-17 16:55:10.019407 7fd5a479a700  1 mon.ceph-mon-02@1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.019408
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.019418 7fd5a479a700  1 mon.ceph-mon-02@1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.019418
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.180220 7fd5a479a700  1 mon.ceph-mon-02@1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.180222
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.180233 7fd5a479a700  1 mon.ceph-mon-02@1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.180234
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.192668 7fd5a479a700  1 mon.ceph-mon-02@1(peon).paxos(paxos
active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.192670
lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038
2014-09-17 16:55:10.192691 7fd5a479a700  1 mon.ceph-mon-02@1(peon).paxos(paxos
active c 31224401..3122

Re: [ceph-users] [Ceph-community] Can't Start-up MDS

2014-09-17 Thread Gregory Farnum
That looks like the beginning of an mds creation to me. What's your
problem in more detail, and what's the output of "ceph -s"?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Mon, Sep 15, 2014 at 5:34 PM, Shun-Fa Yang  wrote:
> Hi all,
>
> I'm installed ceph v 0.80.5 on Ubuntu 14.04 server version by using
> apt-get...
>
> The log of mds shows as following:
>
> 2014-09-15 17:24:58.291305 7fd6f6d47800  0 ceph version 0.80.5
> (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process ceph-mds, pid 10487
>
> 2014-09-15 17:24:58.302164 7fd6f6d47800 -1 mds.-1.0 *** no OSDs are up as of
> epoch 8, waiting
>
> 2014-09-15 17:25:08.302930 7fd6f6d47800 -1 mds.-1.-1 *** no OSDs are up as
> of epoch 8, waiting
>
> 2014-09-15 17:25:19.322092 7fd6f1938700  1 mds.-1.0 handle_mds_map standby
>
> 2014-09-15 17:25:19.325024 7fd6f1938700  1 mds.0.3 handle_mds_map i am now
> mds.0.3
>
> 2014-09-15 17:25:19.325026 7fd6f1938700  1 mds.0.3 handle_mds_map state
> change up:standby --> up:creating
>
> 2014-09-15 17:25:19.325196 7fd6f1938700  0 mds.0.cache creating system inode
> with ino:1
>
> 2014-09-15 17:25:19.325377 7fd6f1938700  0 mds.0.cache creating system inode
> with ino:100
>
> 2014-09-15 17:25:19.325381 7fd6f1938700  0 mds.0.cache creating system inode
> with ino:600
>
> 2014-09-15 17:25:19.325449 7fd6f1938700  0 mds.0.cache creating system inode
> with ino:601
>
> 2014-09-15 17:25:19.325489 7fd6f1938700  0 mds.0.cache creating system inode
> with ino:602
>
> 2014-09-15 17:25:19.325538 7fd6f1938700  0 mds.0.cache creating system inode
> with ino:603
>
> 2014-09-15 17:25:19.325564 7fd6f1938700  0 mds.0.cache creating system inode
> with ino:604
>
> 2014-09-15 17:25:19.325603 7fd6f1938700  0 mds.0.cache creating system inode
> with ino:605
>
> 2014-09-15 17:25:19.325627 7fd6f1938700  0 mds.0.cache creating system inode
> with ino:606
>
> 2014-09-15 17:25:19.325655 7fd6f1938700  0 mds.0.cache creating system inode
> with ino:607
>
> 2014-09-15 17:25:19.325682 7fd6f1938700  0 mds.0.cache creating system inode
> with ino:608
>
> 2014-09-15 17:25:19.325714 7fd6f1938700  0 mds.0.cache creating system inode
> with ino:609
>
> 2014-09-15 17:25:19.325738 7fd6f1938700  0 mds.0.cache creating system inode
> with ino:200
>
> Could someone tell me how to solve it?
>
> Thanks.
>
> --
> 楊順發(yang shun-fa)
>
> ___
> Ceph-community mailing list
> ceph-commun...@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-community-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Next Week: Ceph Day San Jose

2014-09-17 Thread Ross Turk

Hey everyone!  We just posted the agenda for next week’s Ceph Day in San Jose:

http://ceph.com/cephdays/san-jose/

This Ceph Day will be held in a beautiful facility provided by our friends at 
Brocade.  We have a lot of great speakers from Brocade, Red Hat, Dell, Fujitsu, 
HGST, and Supermicro, so if you’re in the area we welcome you to join us.

To register with a 25% discount, use this link:

https://cephdaysanjose.eventbrite.com/?discount=Community

We hope to see you there!

Cheers,
Ross


--  
Ross Turk  
Director, Ceph Marketing & Community
@rossturk @ceph  

"Sufficiently advanced technology is indistinguishable from magic."  
-- Arthur C. Clarke


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] getting ulimit set error while installing ceph in admin node

2014-09-17 Thread Subhadip Bagui
Hi,

any suggestions ?

Regards,
Subhadip

---

On Wed, Sep 17, 2014 at 9:05 AM, Subhadip Bagui  wrote:

> Hi
>
> I'm getting the below error while installing ceph in admin node. Please
> let me know how to resolve the same.
>
>
> [ceph@ceph-admin ceph-cluster]$* ceph-deploy mon create-initial
> ceph-admin*
>
>
> [ceph_deploy.conf][DEBUG ] found configuration file at:
> /home/ceph/.cephdeploy.conf
>
> [ceph_deploy.cli][INFO  ] Invoked (1.5.14): /usr/bin/ceph-deploy mon
> create-initial ceph-admin
>
> [ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts ceph-admin
>
> [ceph_deploy.mon][DEBUG ] detecting platform for host ceph-admin ...
>
> [ceph-admin][DEBUG ] connected to host: ceph-admin
>
> [ceph-admin][DEBUG ] detect platform information from remote host
>
> [ceph-admin][DEBUG ] detect machine type
>
> [ceph_deploy.mon][INFO  ] distro info: CentOS 6.5 Final
>
> [ceph-admin][DEBUG ] determining if provided host has same hostname in
> remote
>
> [ceph-admin][DEBUG ] get remote short hostname
>
> [ceph-admin][DEBUG ] deploying mon to ceph-admin
>
> [ceph-admin][DEBUG ] get remote short hostname
>
> [ceph-admin][DEBUG ] remote hostname: ceph-admin
>
> [ceph-admin][DEBUG ] write cluster configuration to
> /etc/ceph/{cluster}.conf
>
> [ceph-admin][DEBUG ] create the mon path if it does not exist
>
> [ceph-admin][DEBUG ] checking for done path:
> /var/lib/ceph/mon/ceph-ceph-admin/done
>
> [ceph-admin][DEBUG ] done path does not exist:
> /var/lib/ceph/mon/ceph-ceph-admin/done
>
> [ceph-admin][INFO  ] creating keyring file:
> /var/lib/ceph/tmp/ceph-ceph-admin.mon.keyring
>
> [ceph-admin][DEBUG ] create the monitor keyring file
>
> [ceph-admin][INFO  ] Running command: sudo ceph-mon --cluster ceph --mkfs
> -i ceph-admin --keyring /var/lib/ceph/tmp/ceph-ceph-admin.mon.keyring
>
> [ceph-admin][DEBUG ] ceph-mon: set fsid to
> a36227e3-a39f-41cb-bba1-fea098a4fc65
>
> [ceph-admin][DEBUG ] ceph-mon: created monfs at
> /var/lib/ceph/mon/ceph-ceph-admin for mon.ceph-admin
>
> [ceph-admin][INFO  ] unlinking keyring file
> /var/lib/ceph/tmp/ceph-ceph-admin.mon.keyring
>
> [ceph-admin][DEBUG ] create a done file to avoid re-doing the mon
> deployment
>
> [ceph-admin][DEBUG ] create the init path if it does not exist
>
> [ceph-admin][DEBUG ] locating the `service` executable...
>
> [ceph-admin][INFO  ] Running command: sudo /sbin/service ceph -c
> /etc/ceph/ceph.conf start mon.ceph-admin
>
> [ceph-admin][DEBUG ] === mon.ceph-admin ===
>
> [ceph-admin][DEBUG ] Starting Ceph mon.ceph-admin on ceph-admin...
>
> [ceph-admin][DEBUG ] *failed: 'ulimit -n 32768;  /usr/bin/ceph-mon -i
> ceph-admin --pid-file /var/run/ceph/mon.ceph-admin.pid -c
> /etc/ceph/ceph.conf --cluster ceph '*
>
> [ceph-admin][DEBUG ] Starting ceph-create-keys on ceph-admin...
>
> [ceph-admin][WARNIN] No data was received after 7 seconds, disconnecting...
>
> [ceph-admin][INFO  ] Running command: sudo ceph --cluster=ceph
> --admin-daemon /var/run/ceph/ceph-mon.ceph-admin.asok mon_status
>
> [ceph-admin][ERROR ] admin_socket: exception getting command descriptions:
> [Errno 2] No such file or directory
>
> [ceph-admin][WARNIN] monitor: mon.ceph-admin, might not be running yet
>
> [ceph-admin][INFO  ] Running command: sudo ceph --cluster=ceph
> --admin-daemon /var/run/ceph/ceph-mon.ceph-admin.asok mon_status
>
> [ceph-admin][ERROR ] admin_socket: exception getting command descriptions:
> [Errno 2] No such file or directory
>
> [ceph-admin][WARNIN] ceph-admin is not defined in `mon initial members`
>
> [ceph-admin][WARNIN] monitor ceph-admin does not exist in monmap
>
> [ceph-admin][WARNIN] neither `public_addr` nor `public_network` keys are
> defined for monitors
>
> [ceph-admin][WARNIN] monitors may not be able to form quorum
> [ceph_deploy.mon][INFO  ] processing monitor mon.ceph-monitor
>
>
>
> Regards,
> Subhadip
>
> ---
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph issue: rbd vs. qemu-kvm

2014-09-17 Thread Steven Timm


I am trying to use Ceph as a data store with OpenNebula 4.6 and
have followed the instructions in OpenNebula's documentation
at
http://docs.opennebula.org/4.8/administration/storage/ceph_ds.html

and compared them against the "using libvirt with ceph"

http://ceph.com/docs/master/rbd/libvirt/

We are using the ceph-recompiled qemu-kvm and qemu-img as found at

http://ceph.com/packages/qemu-kvm/

under Scientific Linux 6.5 which is a Redhat clone.  Also a kernel-lt-3.10
kernel.

[root@fgtest15 qemu]# kvm -version
QEMU PC emulator version 0.12.1 (qemu-kvm-0.12.1.2), Copyright (c) 
2003-2008 Fabrice Bellard




From qemu-img


Supported formats: raw cow qcow vdi vmdk cloop dmg bochs vpc vvfat qcow2
qed parallels nbd blkdebug host_cdrom host_floppy host_device file rbd


--
Libvirt is trying to execute the following KVM command:

2014-09-17 19:50:12.774+: starting up
LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none 
/usr/libexec/qemu-kvm -name one-60 -S -M rhel6.3.0 -enable-kvm -m 4096 
-smp 2,sockets=2,cores=1,threads=1 -uuid 
572499bf-07f3-3014-8d6a-dfa1ebb99aa4 -nodefconfig -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/one-60.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc 
-no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
file=rbd:one/one-19-60-0:id=libvirt2:key=AQAV5BlU2OV7NBAApurqxG0K8UkZlQVy6hKmkA==:auth_supported=cephx\;none:mon_host=stkendca01a\:6789\;stkendca04a\:6789\;stkendca02a\:6789,if=none,id=drive-virtio-disk0,format=qcow2,cache=none 
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 
-drive 
file=/var/lib/one//datastores/102/60/disk.1,if=none,id=drive-virtio-disk1,format=raw,cache=none 
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 
-drive 
file=/var/lib/one//datastores/102/60/disk.2,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw 
-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 
-netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=23 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=54:52:00:02:0b:04,bus=pci.0,addr=0x3 
-chardev pty,id=charserial0 -device 
isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:60 -k en-us -vga 
cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6

char device redirected to /dev/pts/3
qemu-kvm: -drive 
file=rbd:one/one-19-60-0:id=libvirt2:key=AQAV5BlU2OV7NBAApurqxG0K8UkZlQVy6hKmkA==:auth_supported=cephx\;none:mon_host=stkendca01a\:6789\;stkendca04a\:6789\;stkendca02a\:6789,if=none,id=drive-virtio-disk0,format=qcow2,cache=none: 
could not open disk image 
rbd:one/one-19-60-0:id=libvirt2:key=AQAV5BlU2OV7NBAApurqxG0K8UkZlQVy6hKmkA==:auth_supported=cephx\;none:mon_host=stkendca01a\:6789\;stkendca04a\:6789\;stkendca02a\:6789: 
Invalid argument

2014-09-17 19:50:12.980+: shutting down

---

just to show that from the command line I can see the rbd pool fine

[root@fgtest15 qemu]# rbd list one
foo
one-19
one-19-58-0
one-19-60-0
[root@fgtest15 qemu]# rbd info one/one-19-60-0
rbd image 'one-19-60-0':
size 40960 MB in 10240 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.3c39.238e1f29
format: 1


and even mount stuff with rbd map, etc.

It's only inside libvirt that we had the problem.

At first we were getting "permission denied", but then I increased the
permissions allowed to the libvirt user (client.libvirt2), and now we just
get "invalid argument".


client.libvirt2
key: AQAV5BlU2OV7NBAApurqxG0K8UkZlQVy6hKmkA==
caps: [mon] allow r
caps: [osd] allow *, allow rwx pool=one

--

Any idea why kvm doesn't like the argument I am delivering in the file= 
argument? Better yet, does anyone have a working kvm command from either 
OpenNebula or OpenStack against which I can compare?

Thanks

Steve Timm





--
Steven C. Timm, Ph.D  (630) 840-8525
t...@fnal.gov  http://home.fnal.gov/~timm/
Fermilab Scientific Computing Division, Scientific Computing Services Quad.
Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] getting ulimit set error while installing ceph in admin node

2014-09-17 Thread John Wilkins
Subhadip,

I updated the master branch of the preflight docs here:
http://ceph.com/docs/master/start/  We did encounter some issues that
were resolved with those preflight steps.

I think it might be either requiretty or SELinux. I will keep you
posted. Let me know if it helps.
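
If it turns out to be one of those, the usual quick checks look something like 
this on the monitor host (a sketch):

sudo grep requiretty /etc/sudoers   # if present, disable it via visudo
getenforce                          # if "Enforcing", try: sudo setenforce 0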

On Wed, Sep 17, 2014 at 12:13 PM, Subhadip Bagui  wrote:
> Hi,
>
> any suggestions ?
>
> Regards,
> Subhadip
>
> ---
>
> On Wed, Sep 17, 2014 at 9:05 AM, Subhadip Bagui  wrote:
>>
>> Hi
>>
>> I'm getting the below error while installing ceph in admin node. Please
>> let me know how to resolve the same.
>>
>>
>> [ceph@ceph-admin ceph-cluster]$ ceph-deploy mon create-initial ceph-admin
>>
>>
>> [ceph_deploy.conf][DEBUG ] found configuration file at:
>> /home/ceph/.cephdeploy.conf
>>
>> [ceph_deploy.cli][INFO  ] Invoked (1.5.14): /usr/bin/ceph-deploy mon
>> create-initial ceph-admin
>>
>> [ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts ceph-admin
>>
>> [ceph_deploy.mon][DEBUG ] detecting platform for host ceph-admin ...
>>
>> [ceph-admin][DEBUG ] connected to host: ceph-admin
>>
>> [ceph-admin][DEBUG ] detect platform information from remote host
>>
>> [ceph-admin][DEBUG ] detect machine type
>>
>> [ceph_deploy.mon][INFO  ] distro info: CentOS 6.5 Final
>>
>> [ceph-admin][DEBUG ] determining if provided host has same hostname in
>> remote
>>
>> [ceph-admin][DEBUG ] get remote short hostname
>>
>> [ceph-admin][DEBUG ] deploying mon to ceph-admin
>>
>> [ceph-admin][DEBUG ] get remote short hostname
>>
>> [ceph-admin][DEBUG ] remote hostname: ceph-admin
>>
>> [ceph-admin][DEBUG ] write cluster configuration to
>> /etc/ceph/{cluster}.conf
>>
>> [ceph-admin][DEBUG ] create the mon path if it does not exist
>>
>> [ceph-admin][DEBUG ] checking for done path:
>> /var/lib/ceph/mon/ceph-ceph-admin/done
>>
>> [ceph-admin][DEBUG ] done path does not exist:
>> /var/lib/ceph/mon/ceph-ceph-admin/done
>>
>> [ceph-admin][INFO  ] creating keyring file:
>> /var/lib/ceph/tmp/ceph-ceph-admin.mon.keyring
>>
>> [ceph-admin][DEBUG ] create the monitor keyring file
>>
>> [ceph-admin][INFO  ] Running command: sudo ceph-mon --cluster ceph --mkfs
>> -i ceph-admin --keyring /var/lib/ceph/tmp/ceph-ceph-admin.mon.keyring
>>
>> [ceph-admin][DEBUG ] ceph-mon: set fsid to
>> a36227e3-a39f-41cb-bba1-fea098a4fc65
>>
>> [ceph-admin][DEBUG ] ceph-mon: created monfs at
>> /var/lib/ceph/mon/ceph-ceph-admin for mon.ceph-admin
>>
>> [ceph-admin][INFO  ] unlinking keyring file
>> /var/lib/ceph/tmp/ceph-ceph-admin.mon.keyring
>>
>> [ceph-admin][DEBUG ] create a done file to avoid re-doing the mon
>> deployment
>>
>> [ceph-admin][DEBUG ] create the init path if it does not exist
>>
>> [ceph-admin][DEBUG ] locating the `service` executable...
>>
>> [ceph-admin][INFO  ] Running command: sudo /sbin/service ceph -c
>> /etc/ceph/ceph.conf start mon.ceph-admin
>>
>> [ceph-admin][DEBUG ] === mon.ceph-admin ===
>>
>> [ceph-admin][DEBUG ] Starting Ceph mon.ceph-admin on ceph-admin...
>>
>> [ceph-admin][DEBUG ] failed: 'ulimit -n 32768;  /usr/bin/ceph-mon -i
>> ceph-admin --pid-file /var/run/ceph/mon.ceph-admin.pid -c
>> /etc/ceph/ceph.conf --cluster ceph '
>>
>> [ceph-admin][DEBUG ] Starting ceph-create-keys on ceph-admin...
>>
>> [ceph-admin][WARNIN] No data was received after 7 seconds,
>> disconnecting...
>>
>> [ceph-admin][INFO  ] Running command: sudo ceph --cluster=ceph
>> --admin-daemon /var/run/ceph/ceph-mon.ceph-admin.asok mon_status
>>
>> [ceph-admin][ERROR ] admin_socket: exception getting command descriptions:
>> [Errno 2] No such file or directory
>>
>> [ceph-admin][WARNIN] monitor: mon.ceph-admin, might not be running yet
>>
>> [ceph-admin][INFO  ] Running command: sudo ceph --cluster=ceph
>> --admin-daemon /var/run/ceph/ceph-mon.ceph-admin.asok mon_status
>>
>> [ceph-admin][ERROR ] admin_socket: exception getting command descriptions:
>> [Errno 2] No such file or directory
>>
>> [ceph-admin][WARNIN] ceph-admin is not defined in `mon initial members`
>>
>> [ceph-admin][WARNIN] monitor ceph-admin does not exist in monmap
>>
>> [ceph-admin][WARNIN] neither `public_addr` nor `public_network` keys are
>> defined for monitors
>>
>> [ceph-admin][WARNIN] monitors may not be able to form quorum
>>
>> [ceph_deploy.mon][INFO  ] processing monitor mon.ceph-monitor
>>
>>
>>
>> Regards,
>> Subhadip
>>
>> ---
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw-admin pools list error

2014-09-17 Thread John Wilkins
Does radosgw-admin have authentication keys available and with
appropriate permissions?

http://ceph.com/docs/master/radosgw/config/#create-a-user-and-keyring
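
In case it helps, a minimal sketch of that keyring setup (client.radosgw.gateway and the keyring path are just the examples from the docs; substitute whatever user your gateway is actually configured with):

sudo ceph-authtool --create-keyring /etc/ceph/ceph.client.radosgw.keyring
sudo ceph-authtool /etc/ceph/ceph.client.radosgw.keyring -n client.radosgw.gateway --gen-key
sudo ceph-authtool -n client.radosgw.gateway --cap osd 'allow rwx' --cap mon 'allow rwx' /etc/ceph/ceph.client.radosgw.keyring
sudo ceph auth add client.radosgw.gateway -i /etc/ceph/ceph.client.radosgw.keyring

# then point radosgw-admin at that user and keyring explicitly
radosgw-admin pools list -n client.radosgw.gateway -k /etc/ceph/ceph.client.radosgw.keyring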

On Fri, Sep 12, 2014 at 3:13 AM, Santhosh Fernandes
 wrote:
> Hi,
>
> Can anyone help me figure out why my radosgw-admin pools list gives me this error?
>
> #radosgw-admin pools list
> couldn't init storage provider
>
> But the rados lspools list all the pools,
>
> Regards,
> Santhosh
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph issue: rbd vs. qemu-kvm

2014-09-17 Thread Luke Jing Yuan
Hi,

From the ones we managed to configure in our lab here, I noticed that using image format "raw" instead of "qcow2" worked for us.
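
As a rough sketch of what that looks like on our side (the pool/image and the libvirt2 user below are simply taken from your command line, and the secret UUID is a placeholder), the libvirt disk definition ends up along these lines, so that qemu-kvm is started with format=raw rather than format=qcow2 on the rbd drive:

<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <auth username='libvirt2'>
    <secret type='ceph' uuid='...'/>
  </auth>
  <source protocol='rbd' name='one/one-19-60-0'>
    <host name='stkendca01a' port='6789'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>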

Regards,
Luke

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Steven 
Timm
Sent: Thursday, 18 September, 2014 5:01 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] ceph issue: rbd vs. qemu-kvm


I am trying to use Ceph as a data store with OpenNebula 4.6 and have followed 
the instructions in OpenNebula's documentation at 
http://docs.opennebula.org/4.8/administration/storage/ceph_ds.html

and compared them against the "using libvirt with ceph"

http://ceph.com/docs/master/rbd/libvirt/

We are using the ceph-recompiled qemu-kvm and qemu-img as found at

http://ceph.com/packages/qemu-kvm/

under Scientific Linux 6.5 which is a Redhat clone.  Also a kernel-lt-3.10 
kernel.

[root@fgtest15 qemu]# kvm -version
QEMU PC emulator version 0.12.1 (qemu-kvm-0.12.1.2), Copyright (c)
2003-2008 Fabrice Bellard


From qemu-img

Supported formats: raw cow qcow vdi vmdk cloop dmg bochs vpc vvfat qcow2
qed parallels nbd blkdebug host_cdrom host_floppy host_device file rbd


--
Libvirt is trying to execute the following KVM command:

2014-09-17 19:50:12.774+: starting up
LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none
/usr/libexec/qemu-kvm -name one-60 -S -M rhel6.3.0 -enable-kvm -m 4096
-smp 2,sockets=2,cores=1,threads=1 -uuid
572499bf-07f3-3014-8d6a-dfa1ebb99aa4 -nodefconfig -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/one-60.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
-no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
file=rbd:one/one-19-60-0:id=libvirt2:key=AQAV5BlU2OV7NBAApurqxG0K8UkZlQVy6hKmkA==:auth_supported=cephx\;none:mon_host=stkendca01a\:6789\;stkendca04a\:6789\;stkendca02a\:6789,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-drive
file=/var/lib/one//datastores/102/60/disk.1,if=none,id=drive-virtio-disk1,format=raw,cache=none
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1
-drive
file=/var/lib/one//datastores/102/60/disk.2,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw
-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0
-netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=23 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=54:52:00:02:0b:04,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:60 -k en-us -vga
cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
char device redirected to /dev/pts/3
qemu-kvm: -drive
file=rbd:one/one-19-60-0:id=libvirt2:key=AQAV5BlU2OV7NBAApurqxG0K8UkZlQVy6hKmkA==:auth_supported=cephx\;none:mon_host=stkendca01a\:6789\;stkendca04a\:6789\;stkendca02a\:6789,if=none,id=drive-virtio-disk0,format=qcow2,cache=none:
could not open disk image
rbd:one/one-19-60-0:id=libvirt2:key=AQAV5BlU2OV7NBAApurqxG0K8UkZlQVy6hKmkA==:auth_supported=cephx\;none:mon_host=stkendca01a\:6789\;stkendca04a\:6789\;stkendca02a\:6789:
Invalid argument
2014-09-17 19:50:12.980+: shutting down

---

just to show that from the command line I can see the rbd pool fine

[root@fgtest15 qemu]# rbd list one
foo
one-19
one-19-58-0
one-19-60-0
[root@fgtest15 qemu]# rbd info one/one-19-60-0
rbd image 'one-19-60-0':
size 40960 MB in 10240 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.3c39.238e1f29
format: 1


and even mount stuff with rbd map, etc.

It's only inside libvirt that we had the problem.

At first we were getting "permission denied" but then I upped the
permissions allowed to the libvirt user (client.libvirt2) and then
we are just getting  "invalid argument"


client.libvirt2
key: AQAV5BlU2OV7NBAApurqxG0K8UkZlQVy6hKmkA==
caps: [mon] allow r
caps: [osd] allow *, allow rwx pool=one

--

Any idea why kvm doesn't like the argument I am delivering in the file=
argument?  Better--does anyone have a working kvm command out
of either opennebula or openstack against which I can compare?

Thanks

Steve Timm





--
Steven C. Timm, Ph.D  (630) 840-8525
t...@fnal.gov  http://home.fnal.gov/~timm/
Fermilab Scientific Computing Division, Scientific Computing Services Quad.
Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] radosgw-admin pools list error

2014-09-17 Thread Santhosh Fernandes
Hi John,

When I specify the name, I get this error.

#radosgw-admin pools list -n client.radosgw.in-west-1
could not list placement set: (2) No such file or directory

Regards,
Santhosh

On Thu, Sep 18, 2014 at 3:44 AM, John Wilkins 
wrote:

> Does radosgw-admin have authentication keys available and with
> appropriate permissions?
>
> http://ceph.com/docs/master/radosgw/config/#create-a-user-and-keyring
>
> On Fri, Sep 12, 2014 at 3:13 AM, Santhosh Fernandes
>  wrote:
> > Hi,
> >
> > Can anyone help me figure out why my radosgw-admin pools list gives me this error?
> >
> > #radosgw-admin pools list
> > couldn't init storage provider
> >
> > But the rados lspools list all the pools,
> >
> > Regards,
> > Santhosh
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
>
> --
> John Wilkins
> Senior Technical Writer
> Inktank
> john.wilk...@inktank.com
> (415) 425-9599
> http://inktank.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph mds unable to start with 0.85

2014-09-17 Thread 廖建锋
Dear all,
My ceph cluster worked for about two weeks, but the MDS crashed every 2-3 days. Now it is stuck in replay; it looks like replay crashes and then the MDS process restarts again. What can I do about this?

 1015 => # ceph -s
cluster 07df7765-c2e7-44de-9bb3-0b13f6517b18
health HEALTH_ERR 56 pgs inconsistent; 56 scrub errors; mds cluster is 
degraded; noscrub,nodeep-scrub flag(s) set
monmap e1: 2 mons at 
{storage-1-213=10.1.0.213:6789/0,storage-1-214=10.1.0.214:6789/0}, election 
epoch 26, quorum 0,1 storage-1-213,storage-1-214
mdsmap e624: 1/1/1 up {0=storage-1-214=up:replay}, 1 up:standby
osdmap e1932: 18 osds: 18 up, 18 in
flags noscrub,nodeep-scrub
pgmap v732381: 500 pgs, 3 pools, 2155 GB data, 39187 kobjects
4479 GB used, 32292 GB / 36772 GB avail
444 active+clean
56 active+clean+inconsistent
client io 125 MB/s rd, 31 op/s
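
(For reference only, not something from this thread: the usual way to see which placement groups are behind those 56 inconsistent entries, and to ask the OSDs to repair one, is roughly

ceph health detail | grep inconsistent
ceph pg repair <pg.id>

but the MDS replay problem below is a separate question.)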

MDS log here:

2014-09-18 12:36:10.684841 7f8240512700 5 mds.-1.-1 handle_mds_map epoch 620 
from mon.0
2014-09-18 12:36:10.684888 7f8240512700 1 mds.-1.0 handle_mds_map standby
2014-09-18 12:38:55.584370 7f8240512700 5 mds.-1.0 handle_mds_map epoch 621 
from mon.0
2014-09-18 12:38:55.584432 7f8240512700 1 mds.0.272 handle_mds_map i am now 
mds.0.272
2014-09-18 12:38:55.584436 7f8240512700 1 mds.0.272 handle_mds_map state change 
up:standby --> up:replay
2014-09-18 12:38:55.584440 7f8240512700 1 mds.0.272 replay_start
2014-09-18 12:38:55.584456 7f8240512700 7 mds.0.cache set_recovery_set
2014-09-18 12:38:55.584460 7f8240512700 1 mds.0.272 recovery set is
2014-09-18 12:38:55.584464 7f8240512700 1 mds.0.272 need osdmap epoch 1929, 
have 1927
2014-09-18 12:38:55.584467 7f8240512700 1 mds.0.272 waiting for osdmap 1929 
(which blacklists prior instance)
2014-09-18 12:38:55.584523 7f8240512700 5 mds.0.272 handle_mds_failure for 
myself; not doing anything
2014-09-18 12:38:55.585662 7f8240512700 2 mds.0.272 boot_start 0: opening 
inotable
2014-09-18 12:38:55.585864 7f8240512700 2 mds.0.272 boot_start 0: opening 
sessionmap
2014-09-18 12:38:55.586003 7f8240512700 2 mds.0.272 boot_start 0: opening mds 
log
2014-09-18 12:38:55.586049 7f8240512700 5 mds.0.log open discovering log bounds
2014-09-18 12:38:55.586136 7f8240512700 2 mds.0.272 boot_start 0: opening snap 
table
2014-09-18 12:38:55.586984 7f8240512700 5 mds.0.272 ms_handle_connect on 
10.1.0.213:6806/6114
2014-09-18 12:38:55.587037 7f8240512700 5 mds.0.272 ms_handle_connect on 
10.1.0.213:6811/6385
2014-09-18 12:38:55.587285 7f8240512700 5 mds.0.272 ms_handle_connect on 
10.1.0.213:6801/6110
2014-09-18 12:38:55.591700 7f823ca08700 4 mds.0.log Waiting for journal 200 to 
recover...
2014-09-18 12:38:55.593297 7f8240512700 5 mds.0.272 ms_handle_connect on 
10.1.0.214:6806/6238
2014-09-18 12:38:55.600952 7f823ca08700 4 mds.0.log Journal 200 recovered.
2014-09-18 12:38:55.600967 7f823ca08700 4 mds.0.log Recovered journal 200 in 
format 1
2014-09-18 12:38:55.600973 7f823ca08700 2 mds.0.272 boot_start 1: 
loading/discovering base inodes
2014-09-18 12:38:55.600979 7f823ca08700 0 mds.0.cache creating system inode 
with ino:100
2014-09-18 12:38:55.601279 7f823ca08700 0 mds.0.cache creating system inode 
with ino:1
2014-09-18 12:38:55.602557 7f8240512700 5 mds.0.272 ms_handle_connect on 
10.1.0.214:6811/6276
2014-09-18 12:38:55.607234 7f8240512700 2 mds.0.272 boot_start 2: replaying mds 
log
2014-09-18 12:38:55.675025 7f823ca08700 7 mds.0.cache adjust_subtree_auth -1,-2 
-> -2,-2 on [dir 1 / [2,head] auth v=0 cv=0/0 state=1073741824 f() n() 
hs=0+0,ss=0+0 0x5da]
2014-09-18 12:38:55.675055 7f823ca08700 7 mds.0.cache current root is [dir 1 / 
[2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 | subtree=1 
0x5da]
2014-09-18 12:38:55.675065 7f823ca08700 7 mds.0.cache adjust_subtree_auth -1,-2 
-> -2,-2 on [dir 100 ~mds0/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n() 
hs=0+0,ss=0+0 0x5da03b8]
2014-09-18 12:38:55.675076 7f823ca08700 7 mds.0.cache current root is [dir 100 
~mds0/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 | 
subtree=1 0x5da03b8]
2014-09-18 12:38:55.675087 7f823ca08700 7 mds.0.cache 
adjust_bounded_subtree_auth -2,-2 -> 0,-2 on [dir 1 / [2,head] auth v=1076158 
cv=0/0 dir_auth=-2 state=1073741824 f(v0 m2014-09-09 17:49:20.00 1=0+1) 
n(v87567 rc2014-09-16 12:44:41.750069 b1824476527135 
31747410=31708953+38457)/n(v87567 rc2014-09-16 12:44:38.450226 b1824464654503 
31746894=31708437+38457) hs=0+0,ss=0+0 | subtree=1 0x5da] bound_dfs []
2014-09-18 12:38:55.675116 7f823ca08700 7 mds.0.cache 
adjust_bounded_subtree_auth -2,-2 -> 0,-2 on [dir 1 / [2,head] auth v=1076158 
cv=0/0 dir_auth=-2 state=1073741824 f(v0 m2014-09-09 17:49:20.00 1=0+1) 
n(v87567 rc2014-09-16 12:44:41.750069 b1824476527135 
31747410=31708953+38457)/n(v87567 rc2014-09-16 12:44:38.450226 b1824464654503 
31746894=31708437+38457) hs=0+0,ss=0+0 | subtree=1 0x5da] bounds
2014-09-18 12:38:55.675129 7f823ca08700 7 mds.0.cache current root is [dir 1 / 
[2,head] auth v=1076158 cv=0/0 dir_auth=-2 state=1073741824 f(v0 m2014-09-09 
17:49:20

Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K IOPS

2014-09-17 Thread Zhang, Jian
Has anyone ever tested multi-volume performance on a *FULL* SSD setup?
We are able to get ~18K IOPS for 4K random reads on a single volume with fio 
(with the rbd engine) on a 12x DC S3700 setup, but only ~23K (peak) IOPS 
even with multiple volumes. 
It seems the maximum random write performance we can get on the entire cluster is 
quite close to the single-volume performance. 
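
For context, the single-volume test uses an fio job along these lines (a minimal sketch only; the client name, pool and image below are placeholders rather than our exact job file):

[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=testvol
bs=4k
iodepth=32
direct=1
runtime=60
rw=randread

[rbd-volume-1]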

Thanks
Jian


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Sebastien Han
Sent: Tuesday, September 16, 2014 9:33 PM
To: Alexandre DERUMIER
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 3, 2K 
IOPS

Hi,

Thanks for keeping us updated on this subject.
dsync is definitely killing the ssd.

I don't have much to add; I'm just surprised that you're only getting 5299 with 
0.85, since I've been able to get 6,4K. Well, I was using the 200GB model, that 
might explain this.


On 12 Sep 2014, at 16:32, Alexandre DERUMIER  wrote:

> here the results for the intel s3500
> 
> max performance is with ceph 0.85 + optracker disabled.
> intel s3500 don't have d_sync problem like crucial
> 
> %util show almost 100% for read and write, so maybe the ssd disk performance 
> is the limit.
> 
> I have some stec zeusram 8GB in stock (I used them for zfs zil), I'll try to 
> bench them next week.
> 
> 
> 
> 
> 
> 
> INTEL s3500
> ---
> raw disk
> 
> 
> randread: fio --filename=/dev/sdb --direct=1 --rw=randread --bs=4k 
> --iodepth=32 --group_reporting --invalidate=0 --name=abc 
> --ioengine=aio bw=288207KB/s, iops=72051
> 
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
> avgqu-sz   await r_await w_await  svctm  %util
> sdb   0,00 0,00 73454,000,00 293816,00 0,00 8,00  
>   30,960,420,420,00   0,01  99,90
> 
> randwrite: fio --filename=/dev/sdb --direct=1 --rw=randwrite --bs=4k 
> --iodepth=32 --group_reporting --invalidate=0 --name=abc --ioengine=aio 
> --sync=1 bw=48131KB/s, iops=12032
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
> avgqu-sz   await r_await w_await  svctm  %util
> sdb   0,00 0,000,00 24120,00 0,00 48240,00 4,00   
>   2,080,090,000,09   0,04 100,00
> 
> 
> ceph 0.80
> -
> randread: no tuning:  bw=24578KB/s, iops=6144
> 
> 
> randwrite: bw=10358KB/s, iops=2589
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
> avgqu-sz   await r_await w_await  svctm  %util
> sdb   0,00   373,000,00 8878,00 0,00 34012,50 7,66
>  1,630,180,000,18   0,06  50,90
> 
> 
> ceph 0.85 :
> -
> 
> randread :  bw=41406KB/s, iops=10351
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
> avgqu-sz   await r_await w_await  svctm  %util
> sdb   2,00 0,00 10425,000,00 41816,00 0,00 8,02   
>   1,360,130,130,00   0,07  75,90
> 
> randwrite : bw=17204KB/s, iops=4301
> 
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
> avgqu-sz   await r_await w_await  svctm  %util
> sdb   0,00   333,000,00 9788,00 0,00 57909,0011,83
>  1,460,150,000,15   0,07  67,80
> 
> 
> ceph 0.85 tuning op_tracker=false
> 
> 
> randread :  bw=86537KB/s, iops=21634
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
> avgqu-sz   await r_await w_await  svctm  %util
> sdb  25,00 0,00 21428,000,00 86444,00 0,00 8,07   
>   3,130,150,150,00   0,05  98,00
> 
> randwrite:  bw=21199KB/s, iops=5299
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
> avgqu-sz   await r_await w_await  svctm  %util
> sdb   0,00  1563,000,00 9880,00 0,00 75223,5015,23
>  2,090,210,000,21   0,07  80,00
> 
> 
> - Mail original -
> 
> De: "Alexandre DERUMIER" 
> À: "Cedric Lemarchand" 
> Cc: ceph-users@lists.ceph.com
> Envoyé: Vendredi 12 Septembre 2014 08:15:08
> Objet: Re: [ceph-users] [Single OSD performance on SSD] Can't go over 
> 3, 2K IOPS
> 
> results of fio on rbd with kernel patch
> 
> 
> 
> fio rbd crucial m550 1 osd 0.85 (osd_enable_op_tracker true or false, same 
> result):
> ---
> bw=12327KB/s, iops=3081
> 
> So no much better than before, but this time, iostat show only 15% 
> utils, and latencies are lower
> 
> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await 
> r_await w_await svctm %util sdb 0,00 29,00 0,00 3075,00 0,00 36748,50 
> 23,90 0,29 0,10 0,00 0,10 0,05 15,20
> 
> 
> So, the write bottleneck seem to be in ceph.
> 
> 
> 
> I will send s3500 result today
> 
> - Mail original -
> 
> De: "Alexandre DERUMIER" 
> À: "Cedric Lemarchand" 
> Cc: ceph-users@lists.ceph.com
> Envoyé: Vendredi 12 Septem

Re: [ceph-users] ceph issue: rbd vs. qemu-kvm

2014-09-17 Thread Stijn De Weirdt

hi steven,

we ran into issues when trying to use a non-default ceph user in 
opennebula (I don't remember what the default was, but it's probably not 
libvirt2). Patches are in https://github.com/OpenNebula/one/pull/33, and the 
devs sort of confirmed they will be in 4.8.1. This way you can set 
CEPH_USER in the datastore template. (But if this is the case, I think 
that onedatastore list fails to show the size of the datastore.)
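
as a rough sketch (the values below are illustrative only, and CEPH_USER is only honoured with the patched drivers mentioned above), the datastore template then looks something like:

NAME = ceph_ds
DS_MAD = ceph
TM_MAD = ceph
DISK_TYPE = RBD
POOL_NAME = one
CEPH_HOST = "mon1:6789 mon2:6789"
CEPH_USER = libvirt2
CEPH_SECRET = "uuid-of-the-libvirt-secret"
BRIDGE_LIST = "kvm-host1 kvm-host2"

onedatastore create ceph_ds.tmpl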


stijn


On 09/18/2014 04:38 AM, Luke Jing Yuan wrote:

Hi,


From the ones we managed to configure in our lab here, I noticed that using image format "raw" instead of "qcow2" worked for us.


Regards,
Luke

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Steven 
Timm
Sent: Thursday, 18 September, 2014 5:01 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] ceph issue: rbd vs. qemu-kvm


I am trying to use Ceph as a data store with OpenNebula 4.6 and have followed 
the instructions in OpenNebula's documentation at 
http://docs.opennebula.org/4.8/administration/storage/ceph_ds.html

and compared them against the "using libvirt with ceph"

http://ceph.com/docs/master/rbd/libvirt/

We are using the ceph-recompiled qemu-kvm and qemu-img as found at

http://ceph.com/packages/qemu-kvm/

under Scientific Linux 6.5 which is a Redhat clone.  Also a kernel-lt-3.10 
kernel.

[root@fgtest15 qemu]# kvm -version
QEMU PC emulator version 0.12.1 (qemu-kvm-0.12.1.2), Copyright (c)
2003-2008 Fabrice Bellard



From qemu-img


Supported formats: raw cow qcow vdi vmdk cloop dmg bochs vpc vvfat qcow2
qed parallels nbd blkdebug host_cdrom host_floppy host_device file rbd


--
Libvirt is trying to execute the following KVM command:

2014-09-17 19:50:12.774+: starting up
LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none
/usr/libexec/qemu-kvm -name one-60 -S -M rhel6.3.0 -enable-kvm -m 4096
-smp 2,sockets=2,cores=1,threads=1 -uuid
572499bf-07f3-3014-8d6a-dfa1ebb99aa4 -nodefconfig -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/one-60.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
-no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
file=rbd:one/one-19-60-0:id=libvirt2:key=AQAV5BlU2OV7NBAApurqxG0K8UkZlQVy6hKmkA==:auth_supported=cephx\;none:mon_host=stkendca01a\:6789\;stkendca04a\:6789\;stkendca02a\:6789,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-drive
file=/var/lib/one//datastores/102/60/disk.1,if=none,id=drive-virtio-disk1,format=raw,cache=none
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1
-drive
file=/var/lib/one//datastores/102/60/disk.2,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw
-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0
-netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=23 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=54:52:00:02:0b:04,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:60 -k en-us -vga
cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
char device redirected to /dev/pts/3
qemu-kvm: -drive
file=rbd:one/one-19-60-0:id=libvirt2:key=AQAV5BlU2OV7NBAApurqxG0K8UkZlQVy6hKmkA==:auth_supported=cephx\;none:mon_host=stkendca01a\:6789\;stkendca04a\:6789\;stkendca02a\:6789,if=none,id=drive-virtio-disk0,format=qcow2,cache=none:
could not open disk image
rbd:one/one-19-60-0:id=libvirt2:key=AQAV5BlU2OV7NBAApurqxG0K8UkZlQVy6hKmkA==:auth_supported=cephx\;none:mon_host=stkendca01a\:6789\;stkendca04a\:6789\;stkendca02a\:6789:
Invalid argument
2014-09-17 19:50:12.980+: shutting down

---

just to show that from the command line I can see the rbd pool fine

[root@fgtest15 qemu]# rbd list one
foo
one-19
one-19-58-0
one-19-60-0
[root@fgtest15 qemu]# rbd info one/one-19-60-0
rbd image 'one-19-60-0':
 size 40960 MB in 10240 objects
 order 22 (4096 kB objects)
 block_name_prefix: rb.0.3c39.238e1f29
 format: 1


and even mount stuff with rbd map, etc.

It's only inside libvirt that we had the problem.

At first we were getting "permission denied" but then I upped the
permissions allowed to the libvirt user (client.libvirt2) and then
we are just getting  "invalid argument"


client.libvirt2
 key: AQAV5BlU2OV7NBAApurqxG0K8UkZlQVy6hKmkA==
 caps: [mon] allow r
 caps: [osd] allow *, allow rwx pool=one

--

Any idea why kvm doesn't like the argument I am delivering in the file=
argument?  Better--does anyone have a working kvm command out
of either opennebula or openstack against which I can compare?

Thanks

Steve Timm





--
Steven C. Timm, Ph.D  (630) 840-8525
t...@fnal.gov  http://home.fnal.gov/~timm/
Fermilab Scientific Computing Division, Sci

Re: [ceph-users] ceph issue: rbd vs. qemu-kvm

2014-09-17 Thread Osier Yang


On 2014-09-18 10:38, Luke Jing Yuan wrote:

Hi,

From the ones we managed to configure in our lab here, I noticed that using image format "raw" instead of "qcow2" worked for us.

Regards,
Luke

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Steven 
Timm
Sent: Thursday, 18 September, 2014 5:01 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] ceph issue: rbd vs. qemu-kvm


I am trying to use Ceph as a data store with OpenNebula 4.6 and have followed 
the instructions in OpenNebula's documentation at 
http://docs.opennebula.org/4.8/administration/storage/ceph_ds.html

and compared them against the "using libvirt with ceph"

http://ceph.com/docs/master/rbd/libvirt/

We are using the ceph-recompiled qemu-kvm and qemu-img as found at

http://ceph.com/packages/qemu-kvm/

under Scientific Linux 6.5 which is a Redhat clone.  Also a kernel-lt-3.10 
kernel.

[root@fgtest15 qemu]# kvm -version
QEMU PC emulator version 0.12.1 (qemu-kvm-0.12.1.2), Copyright (c)
2003-2008 Fabrice Bellard


 From qemu-img

Supported formats: raw cow qcow vdi vmdk cloop dmg bochs vpc vvfat qcow2
qed parallels nbd blkdebug host_cdrom host_floppy host_device file rbd


--
Libvirt is trying to execute the following KVM command:

2014-09-17 19:50:12.774+: starting up
LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none
/usr/libexec/qemu-kvm -name one-60 -S -M rhel6.3.0 -enable-kvm -m 4096
-smp 2,sockets=2,cores=1,threads=1 -uuid
572499bf-07f3-3014-8d6a-dfa1ebb99aa4 -nodefconfig -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/one-60.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
-no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
file=rbd:one/one-19-60-0:id=libvirt2:key=AQAV5BlU2OV7NBAApurqxG0K8UkZlQVy6hKmkA==:auth_supported=cephx\;none:mon_host=stkendca01a\:6789\;stkendca04a\:6789\;stkendca02a\:6789,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-drive
file=/var/lib/one//datastores/102/60/disk.1,if=none,id=drive-virtio-disk1,format=raw,cache=none
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1
-drive
file=/var/lib/one//datastores/102/60/disk.2,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw
-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0
-netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=23 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=54:52:00:02:0b:04,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:60 -k en-us -vga
cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
char device redirected to /dev/pts/3
qemu-kvm: -drive
file=rbd:one/one-19-60-0:id=libvirt2:key=AQAV5BlU2OV7NBAApurqxG0K8UkZlQVy6hKmkA==:auth_supported=cephx\;none:mon_host=stkendca01a\:6789\;stkendca04a\:6789\;stkendca02a\:6789,if=none,id=drive-virtio-disk0,format=qcow2,cache=none:
could not open disk image
rbd:one/one-19-60-0:id=libvirt2:key=AQAV5BlU2OV7NBAApurqxG0K8UkZlQVy6hKmkA==:auth_supported=cephx\;none:mon_host=stkendca01a\:6789\;stkendca04a\:6789\;stkendca02a\:6789:
Invalid argument



The error is from qemu-kvm.

You need to check whether your qemu-kvm supports all of the arguments listed 
above for the "-drive" option. As you mentioned, the qemu-kvm was built by 
yourself; it's likely that you missed something, or that your qemu-kvm version 
is old and doesn't support some of the arguments.
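
For example, just as a quick check outside of libvirt (re-using the image and user from your command line; depending on the build, the exact rbd option string may also need the mon_host/key parts appended):

qemu-img info "rbd:one/one-19-60-0:id=libvirt2"

If that works but the same image with format=qcow2 on the -drive line still fails, then passing format=qcow2 for an rbd image that does not actually contain a qcow2 container is the likely culprit.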

Regards,
Osier
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com