[ceph-users] about using SSD in cephfs, attached with some quantified benchmarks

2016-11-25 Thread JiaJia Zhong
Some confusing questions (Ceph 0.94):


1. Is there any way to cache all of the metadata in the MDS's memory?
(metadata OSD data --async--> MDS memory)


I don't know if I misunderstand the role of the MDS :( -- so many threads
advise using SSD OSDs for metadata.
The metadata stores inode information for files. Yes, that makes stat, ls and
readdir fast for CephFS,
but if the metadata could be cached in memory (metadata OSD data
--async--> MDS memory), I guess this might be even better?
We can use SSD journals, so write speed would not be the bottleneck. The cached
metadata is not large even if there is a huge number of files.  (I gather that
MooseFS stores all metadata in memory?)


2.
Are there any descriptions of how the journal works under the hood? It seems a bit
like a swap partition for Linux ~


Using an Intel PCIe SSD as the journal device for HDD OSDs,
I ran the command below as a rough benchmark on all of the OSDs simultaneously:


# for i in $(ps aux | grep osd | awk '{print $14}' | grep -v "^$" | sort); do 
ceph tell osd.$i bench & done
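
(For reference, "ceph tell osd.N bench" also accepts explicit sizes if you want
comparable runs; the values below are only an example, not what I used:)

# write 1 GiB in 4 MiB blocks on a single OSD
ceph tell osd.0 bench 1073741824 4194304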


Compared to another host without SSD journals, these OSDs got better bytes_per_sec,
an increase of more than 100%:
HDD-journal OSDs ~30MB/s --> HDD OSDs with SSD journal more than 60MB/s.
(12 OSDs/host, hosts are almost identical)


MB/s (HDD journal + HDD)
39
35
35
35
33
31
29
26
26
26
26
25

The top result (39MB/s) is a SATA SSD OSD with a SATA SSD journal, but it does not
seem to be any faster than the others with HDD journal + HDD data.


MB/s (PCIE SSD Journal + HDD)
195
129
92
88
71
71
65
61
57
54
52
50

The 195MB/s result is PCIe SSD journal + SSD data, which seems very fast.
The others are PCIe SSD journal + HDD data.




"bytes_per_sec": 166451390.00 for single bench on (PCIE Journal + HDD)
158.74MB/s

"bytes_per_sec": 78472933.00 for single bench on   (HDD Journal + HDD)  
 74.83MB/s


It seems that "data ---> HDD journal" is probably the main bottleneck? How can I
track this?
data --> SSD journal --> OSD data partition
data --> HDD journal --> OSD data partition
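
(One way I can imagine checking this -- illustrative commands only; the dd target
is a scratch file on the HDD's filesystem, the path is just a placeholder:)

iostat -xm 5    # watch %util and await on the journal and data devices during the bench
dd if=/dev/zero of=/path/on/the/HDD/testfile bs=4k count=10000 oflag=direct,dsync   # rough sync-write speed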


3.
Any cache or memory suggestions for better CephFS performance?


Key ceph.conf settings are below:
[global]
osd pool default size = 2
osd pool default min size = 1
osd pool default pg num = 512
osd pool default pgp num = 512
osd journal size = 1


[mds]
mds cache size = 11474836


[osd]
osd op threads = 4
filestore op threads = 4
osd crush update on start = false
#256M
osd max write size = 256
#256M
journal max write bytes = 268435456
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Assertion "needs_recovery" fails when balance_read reaches a replica OSD where the target object is not recovered yet.

2016-11-25 Thread xxhdx1985126
Hi, everyone.

In our online system, some OSDs always fail due to the following error:

2016-10-25 19:00:00.626567 7f9a63bff700 -1 error_msg osd/ReplicatedPG.cc: In 
function 'void ReplicatedPG::wait_for_unreadable_object(const hobject_t&, 
OpRequestRef)' thread 7f9a63bff700 time 2016-10-25 19:00:00.624499
osd/ReplicatedPG.cc: 387: FAILED assert(needs_recovery)

ceph version 0.94.5-12-g83f56a1 (83f56a1c84e3dbd95a4c394335a7b1dc926dd1c4)
 1: (ReplicatedPG::wait_for_unreadable_object(hobject_t const&, 
std::tr1::shared_ptr)+0x3f5) [0x8b5a65]
 2: (ReplicatedPG::do_op(std::tr1::shared_ptr&)+0x5e9) 
[0x8f0c79]
 3: (ReplicatedPG::do_request(std::tr1::shared_ptr&, 
ThreadPool::TPHandle&)+0x4e3) [0x87fdc3]
 4: (OSD::dequeue_op(boost::intrusive_ptr, 
std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x178) [0x66b3f8]
 5: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x59e) 
[0x66f8ee]
 6: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x795) [0xa76d85]
 7: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa7a610]
 8: /lib64/libpthread.so.0() [0x3471407a51]
 9: (clone()+0x6d) [0x34710e893d]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed 
to interpret this.

Our version of Ceph is 0.94.5.
After reading the source code and analyzing our online scenarios, we came up with
a conjecture:
   When encountering a large number of "balance_reads", the OSDs can be so
busy that they can't send heartbeats in time, which could lead the monitors to
wrongly mark them down and trigger a peering+recovery process on other OSDs,
during which, on the replica OSDs, the assertion "needs_recovery" at
ReplicatedPG.cc:387 has a high probability of failing.

To confirm this guess, we ran some targeted tests. If I add extra code to make
the recovery of an object wait for the ops of type "CEPH_MSG_OSD_OP" targeting
that object to finish, the assertion "needs_recovery" at ReplicatedPG.cc:387
always fails. On the other hand, if I make the ops of type "CEPH_MSG_OSD_OP"
targeting an object wait for the corresponding recovery to finish, the assertion
is not triggered.

Can we conclude that the cause of the assertion failure is as we thought? It
seems that the purpose of the failed assertion is to make sure that
"missing_loc.needs_recovery_map" does contain the unreadable object. However,
"missing_loc.needs_recovery_map" seems to always be empty on replica OSDs. Can
we fix this problem simply by bypassing the assertion on replicas, something
like:
  if (is_primary()) {
    bool needs_recovery = missing_loc.needs_recovery(soid, &v);
    assert(needs_recovery);
  }

I've also submitted a new issue: BUG #18021. Please help me. Thank you :-)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Q on radosGW

2016-11-25 Thread Andrey Shevel
Hi everybody,

I am trying to start the Swift gateway.

Unfortunately I get this message in /var/log/messages:
unable to stat setuser_match_path /var/lib/ceph/$type/$cluster-$id:
(2) No such file or directory

[ceph@ceph-swift-gateway ~]$ sudo systemctl status
ceph-rado...@rgw.ceph-swift-gateway
● ceph-rado...@rgw.ceph-swift-gateway.service - Ceph rados gateway
   Loaded: loaded (/usr/lib/systemd/system/ceph-radosgw@.service;
enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Fri 2016-11-25 12:54:11
MSK; 1s ago
  Process: 21031 ExecStart=/usr/bin/radosgw -f --cluster ${CLUSTER}
--name client.%i --setuser ceph --setgroup ceph (code=exited,
status=1/FAILURE)
 Main PID: 21031 (code=exited, status=1/FAILURE)

Nov 25 12:54:10 ceph-swift-gateway.pnpi.spb.ru systemd[1]: Unit
ceph-rado...@rgw.ceph-swift-gateway.service entered failed state.
Nov 25 12:54:10 ceph-swift-gateway.pnpi.spb.ru systemd[1]:
ceph-rado...@rgw.ceph-swift-gateway.service failed.
Nov 25 12:54:11 ceph-swift-gateway.pnpi.spb.ru systemd[1]:
ceph-rado...@rgw.ceph-swift-gateway.service holdoff time over,
scheduling restart.
Nov 25 12:54:11 ceph-swift-gateway.pnpi.spb.ru systemd[1]: start
request repeated too quickly for
ceph-rado...@rgw.ceph-swift-gateway.service
Nov 25 12:54:11 ceph-swift-gateway.pnpi.spb.ru systemd[1]: Failed to
start Ceph rados gateway.
Nov 25 12:54:11 ceph-swift-gateway.pnpi.spb.ru systemd[1]: Unit
ceph-rado...@rgw.ceph-swift-gateway.service entered failed state.
Nov 25 12:54:11 ceph-swift-gateway.pnpi.spb.ru systemd[1]:
ceph-rado...@rgw.ceph-swift-gateway.service failed.

Actually I am not sure that I understand what the $id in that message line means.

In the directory /var/lib/ceph/  I have
[ceph@ceph-swift-gateway ~]$ ls -l /var/lib/ceph/
total 0
drwxr-x--- 2 ceph ceph   6 Sep 21 14:35 bootstrap-mds
drwxr-x--- 2 ceph ceph   6 Sep 21 14:35 bootstrap-osd
drwxr-x--- 2 ceph ceph  25 Nov 21 19:06 bootstrap-rgw
drwxr-x--- 2 ceph ceph   6 Sep 21 14:35 mds
drwxr-x--- 2 ceph ceph   6 Sep 21 14:35 mon
drwxr-x--- 2 ceph ceph   6 Sep 21 14:35 osd
drwxr-xr-x 8 ceph ceph 135 Nov 23 21:36 radosgw
drwxr-x--- 2 ceph ceph   6 Sep 21 14:35 tmp


and in radosgw I have

[ceph@ceph-swift-gateway ~]$ ls -l /var/lib/ceph/radosgw/
total 0
drwxr-xr-x 2 ceph ceph  6 Nov 23 21:30 ceph-0
drwxrwxr-x 2 ceph ceph  6 Nov 23 21:36 ceph-gw0
drwxr-xr-x 2 ceph ceph  6 Nov 23 21:20 ceph-radosgw
drwxrwxr-x 2 ceph ceph  6 Nov 23 21:33 ceph-radosgw0
drwxr-xr-x 2 ceph ceph  6 Nov 23 21:20 ceph-rgw.ceph-radosgw
drwxr-xr-x 2 ceph ceph 45 Nov 21 19:17 ceph-rgw.ceph-swift-gateway


[ceph@ceph-swift-gateway ~]$ ceph -v
ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
[ceph@ceph-swift-gateway ~]$ hostname -s
ceph-swift-gateway
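
(As a guess -- assuming $id here is the systemd instance name rgw.ceph-swift-gateway
shown in the status output above, so the path would expand to
/var/lib/ceph/radosgw/ceph-rgw.ceph-swift-gateway -- a quick check of whether the
ceph user can actually read it:)

sudo -u ceph ls -ld /var/lib/ceph/radosgw/ceph-rgw.ceph-swift-gateway
sudo -u ceph ls -l /var/lib/ceph/radosgw/ceph-rgw.ceph-swift-gateway/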



Thanks in advance to anybody who can give an idea






-- 
Andrey Y Shevel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Introducing DeepSea: A tool for deploying Ceph using Salt

2016-11-25 Thread M Ranga Swami Reddy
Hello Tim,
Can you please confirm whether DeepSea works on Ubuntu as well?

Thanks
Swami

On Thu, Nov 3, 2016 at 11:22 AM, Tim Serong  wrote:

> Hi All,
>
> I thought I should make a little noise about a project some of us at
> SUSE have been working on, called DeepSea.  It's a collection of Salt
> states, runners and modules for orchestrating deployment of Ceph
> clusters.  To help everyone get a feel for it, I've written a blog post
> which walks through using DeepSea to set up a small test cluster:
>
>   http://ourobengr.com/2016/11/hello-salty-goodness/
>
> If you'd like to try it out yourself, the code is on GitHub:
>
>   https://github.com/SUSE/DeepSea
>
> More detailed documentation can be found at:
>
>   https://github.com/SUSE/DeepSea/wiki/intro
>   https://github.com/SUSE/DeepSea/wiki/management
>   https://github.com/SUSE/DeepSea/wiki/policy
>
> Usual story: feedback, issues, pull requests are all welcome ;)
>
> Enjoy,
>
> Tim
> --
> Tim Serong
> Senior Clustering Engineer
> SUSE
> tser...@suse.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph strange issue after adding a cache OSD.

2016-11-25 Thread Nick Fisk
Possibly -- do you know the exact steps to reproduce? I'm guessing the PG
splitting was the cause, but whether this on its own would cause the problem, or
whether it also needs the introduction of new OSDs at the same time, might make
tracing the cause hard.

> -Original Message-
> From: Daznis [mailto:daz...@gmail.com]
> Sent: 24 November 2016 19:44
> To: Nick Fisk 
> Cc: ceph-users 
> Subject: Re: [ceph-users] Ceph strange issue after adding a cache OSD.
> 
> I will try it, but I wanna see if it stays stable for a few days. Not sure if 
> I should report this bug or not.
> 
> On Thu, Nov 24, 2016 at 6:05 PM, Nick Fisk  wrote:
> > Can you add them with different ID's, it won't look pretty but might get 
> > you out of this situation?
> >
> >> -Original Message-
> >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> >> Of Daznis
> >> Sent: 24 November 2016 15:43
> >> To: Nick Fisk 
> >> Cc: ceph-users 
> >> Subject: Re: [ceph-users] Ceph strange issue after adding a cache OSD.
> >>
> >> Yes, unfortunately, it is. And the story still continues. I have
> >> noticed that only 4 OSD are doing this and zapping and readding it
> >> does not solve the issue. Removing them completely from the cluster solve 
> >> that issue, but I can't reuse their ID's. If I add another
> one with the same ID it starts doing the same "funky" crashes. For now the 
> cluster remains "stable" without the OSD's.
> >>
> >>
> >>
> >>
> >> On Wed, Nov 23, 2016 at 4:00 PM, Nick Fisk  wrote:
> >> > I take it you have size =2 or min_size=1 or something like that for the 
> >> > cache pool? 1 OSD shouldn’t prevent PG's from recovering.
> >> >
> >> > Your best bet would be to see if the PG that is causing the assert
> >> > can be removed and let the OSD start up. If you are lucky, the PG
> >> causing the problems might not be one which also has unfound objects,
> >> otherwise you are likely have to get heavily involved in recovering 
> >> objects with the object store tool.
> >> >
> >> >> -Original Message-
> >> >> From: Daznis [mailto:daz...@gmail.com]
> >> >> Sent: 23 November 2016 13:56
> >> >> To: Nick Fisk 
> >> >> Cc: ceph-users 
> >> >> Subject: Re: [ceph-users] Ceph strange issue after adding a cache OSD.
> >> >>
> >> >> No, it's still missing some PGs and objects and can't recover as
> >> >> it's blocked by that OSD. I can boot the OSD up by removing all
> >> >> the PG related files from current directory, but that doesn't
> >> >> solve the missing objects problem. Not really sure if I can move
> >> >> the object
> >> back to their place manually, but I will try it.
> >> >>
> >> >> On Wed, Nov 23, 2016 at 3:08 PM, Nick Fisk  wrote:
> >> >> > Sorry, I'm afraid I'm out of ideas about that one, that error
> >> >> > doesn't mean very much to me. The code suggests the OSD is
> >> >> > trying to
> >> >> get an attr from the disk/filesystem, but for some reason it
> >> >> doesn't like that. You could maybe whack the debug logging for OSD
> >> >> and filestore up to max and try and see what PG/file is accessed
> >> >> just before the crash, but I'm not sure what the fix would be,
> >> >> even if
> >> you manage to locate the dodgy PG.
> >> >> >
> >> >> > Does the cluster have all PG's recovered now? Unless anyone else
> >> >> > can comment, you might be best removing/wiping and then re-
> >> >> adding the OSD.
> >> >> >
> >> >> >> -Original Message-
> >> >> >> From: Daznis [mailto:daz...@gmail.com]
> >> >> >> Sent: 23 November 2016 12:55
> >> >> >> To: Nick Fisk 
> >> >> >> Cc: ceph-users 
> >> >> >> Subject: Re: [ceph-users] Ceph strange issue after adding a cache 
> >> >> >> OSD.
> >> >> >>
> >> >> >> Thank you. That helped quite a lot. Now I'm just stuck with one OSD 
> >> >> >> crashing with:
> >> >> >>
> >> >> >> osd/PG.cc: In function 'static int
> >> >> >> PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*,
> >> >> >> ceph::bufferlist*)' thread 7f36bbdd6880 time
> >> >> >> 2016-11-23 13:42:43.27
> >> >> >> 8539
> >> >> >> osd/PG.cc: 2911: FAILED assert(r > 0)
> >> >> >>
> >> >> >>  ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)
> >> >> >>  1: (ceph::__ceph_assert_fail(char const*, char const*, int,
> >> >> >> char
> >> >> >> const*)+0x85) [0xbde2c5]
> >> >> >>  2: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*,
> >> >> >> ceph::buffer::list*)+0x8ba) [0x7cf4da]
> >> >> >>  3: (OSD::load_pgs()+0x9ef) [0x6bd31f]
> >> >> >>  4: (OSD::init()+0x181a) [0x6c0e8a]
> >> >> >>  5: (main()+0x29dd) [0x6484bd]
> >> >> >>  6: (__libc_start_main()+0xf5) [0x7f36b916bb15]
> >> >> >>  7: /usr/bin/ceph-osd() [0x661ea9]
> >> >> >>
> >> >> >> On Wed, Nov 23, 2016 at 12:31 PM, Nick Fisk  wrote:
> >> >> >> >> -Original Message-
> >> >> >> >> From: Daznis [mailto:daz...@gmail.com]
> >> >> >> >> Sent: 23 November 2016 10:17
> >> >> >> >> To: n...@fisk.me.uk
> >> >> >> >> Cc: ceph-users 
> >> >> >> >> Subject: Re: [ceph-users] Ceph strange issue after adding a cache 
> >> >> >> >> OSD.
> 

Re: [ceph-users] Ceph OSDs cause kernel unresponsive

2016-11-25 Thread Nick Fisk
Hi,

 

I didn't do the maths, so maybe 7GB isn't worth tuning for, although every
little helps ;-)

 

I don't believe peering or recovery should affect this value, but other things
will consume memory during recovery, and I'm not aware whether that can be
limited or tuned.

 

Yes, the write and read caches will consume memory and may limit Linux's
ability to react quickly enough in tight memory conditions. I believe you can
be in a state where it looks like you have more memory potentially available
than is actually usable at that point in time. The vm.min_free_kbytes setting can
help here.

 

From: Craig Chi [mailto:craig...@synology.com] 
Sent: 25 November 2016 01:46
To: Brad Hubbard 
Cc: Nick Fisk ; Ceph Users 
Subject: Re: [ceph-users] Ceph OSDs cause kernel unresponsive

 

Hi Nick,

 

I have seen the report before. If I understand correctly, the
osd_map_cache_size generally introduces a fixed amount of memory usage. We are
using the default value of 200, and a single osdmap I got from our cluster is
404KB.

 

That is a total of 404KB * 200 * 90 (OSDs) = about 7GB on each node.
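
(A sketch of how these numbers can be reproduced on a node; the /tmp path is just
an example:)

ceph osd getmap -o /tmp/osdmap           # dump the current osdmap
ls -l /tmp/osdmap                        # ~404KB in our case
echo $(( 404 * 200 * 90 / 1024 )) MB     # map size (KB) * cached maps per OSD * OSDs per node ~= 7101 MB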

 

Will the memory consumption from this factor become larger during unstable
peering or recovery? If not, we still need to find the root cause of why free
memory drops uncontrollably.

 

Does anyone know what the relation is between the filestore or journal
configuration and an OSD's memory consumption? Is it possible that the
filestore queue or journal queue occupies a large amount of memory and makes the
filesystem cache hard to release (resulting in OOM)?

 

Lastly, about nobarrier: I fully understand the consequences and am testing this
option carefully. I sincerely appreciate your kindness and useful suggestions.

 

Sincerely,
Craig Chi

On 2016-11-25 07:23, Brad Hubbard  wrote:

Two of these appear to be hung task timeouts and the other is an invalid opcode.

There is no evidence here of memory exhaustion (although it remains to be seen 
whether this is a factor but I'd expect to see evidence of shrinker activity in 
the stacks) and I would speculate the increased memory utilisation is due to 
the issues with the OSD tasks.

I would suggest that the next step here is to work out specifically why the 
invalid opcode happened and/or why kernel tasks are hanging for > 120 seconds.

To do that you may need to capture a vmcore and analyse it and/or engage your 
kernel support team to investigate further.
 

 

On Fri, Nov 25, 2016 at 8:26 AM, Nick Fisk  wrote:

There are a couple of things you can do to reduce memory usage by limiting the
number of OSD maps each OSD stores, but you will still be pushing up against
the limits of the RAM you have available. There is a CERN 30PB test (should be
findable on Google) which gives some details on some of the settings, but quite a
few are no longer relevant in Jewel.
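
(For illustration, the kind of osdmap-related settings that test refers to; the
values here are placeholders and would need checking against the Jewel defaults:)

[osd]
osd map cache size = 50
osd map max advance = 25
osd map share max epochs = 25
osd pg epoch persisted max stale = 25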

 

One other thing: I saw you have nobarrier set in the mount options. Please,
please, please understand the consequences of this option.
 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Craig Chi
Sent: 24 November 2016 10:37
To: Nick Fisk 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph OSDs cause kernel unresponsive

 

Hi Nick,

 

Thank you for your helpful information.

 

I know that Ceph recommends 1GB of RAM per 1TB, but we are not going to change the
hardware architecture now.

Are there any methods to limit the resources a single OSD can consume?

 

And for your question, we currently set system configuration as:

 

vm.swappiness=10
kernel.pid_max=4194303
fs.file-max=26234859
vm.zone_reclaim_mode=0
vm.vfs_cache_pressure=50
vm.min_free_kbytes=4194303

 

I would try to configure vm.min_free_kbytes larger and test.
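
(Something along these lines -- the value is only an example for testing, not a
recommendation:)

sysctl -w vm.min_free_kbytes=8388608                      # ~8GB reserve, takes effect immediately
echo 'vm.min_free_kbytes = 8388608' >> /etc/sysctl.conf   # persist across reboots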

I will be grateful if anyone has the experience of how to tune these values for 
Ceph.

 

Sincerely,
Craig Chi

 

On 2016-11-24 17:48, Nick Fisk  wrote:

Hi Craig,

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Craig Chi
Sent: 24 November 2016 08:34
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Ceph OSDs cause kernel unresponsive

 

Hi Cephers,

We have encountered a kernel hanging issue on our Ceph cluster. Just like
http://imgur.com/a/U2Flz , http://imgur.com/a/lyEko


[ceph-users] Ceph performance laggy (requests blocked > 32) on OpenStack

2016-11-25 Thread Kevin Olbrich
Hi,

we are running 80 VMs using KVM in OpenStack via RBD in Ceph Jewel on a
total of 53 disks (RAID parity already excluded).
Our nodes are using Intel P3700 DC-SSDs for journaling.

Most VMs are linux based and load is low to medium. There are also about 10
VMs running Windows 2012R2, two of them run remote services (terminal).

My question is: Are 80 VMs hosted on 53 disks (mostly 7.2k SATA) too much?
We sometimes experience lags where nearly all servers suffer from "blocked
IO > 32 seconds".

What are your experiences?

Mit freundlichen Grüßen / best regards,
Kevin Olbrich.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph performance laggy (requests blocked > 32) on OpenStack

2016-11-25 Thread RDS
If I use slow HDDs, I can get the same outcome. Placing journals on fast SAS or
NVMe SSDs will make a difference. If you are using SATA SSDs, those SSDs are much
slower. Instead of guessing why Ceph is lagging, have you looked at ceph -w and
iostat and vmstat reports during your tests? iostat will tell you HDD and SSD
stats (I use the command "iostat -tzxm 5" to show only active disks). If they
are dedicated LUNs, then look at %utilization and service times. Looking at
vmstat, check the 'b' column, which shows processes blocked waiting on IO.
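
For example (typical invocations, nothing Ceph-specific):

iostat -tzxm 5     # per-device %util, await/svctm and MB/s, active disks only
vmstat 5           # 'b' column = processes blocked waiting on IO
ceph -w            # watch for slow request warnings as they appear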

> On Nov 25, 2016, at 8:48 AM, Kevin Olbrich  wrote:
> 
> Hi,
> 
> we are running 80 VMs using KVM in OpenStack via RBD in Ceph Jewel on a total 
> of 53 disks (RAID parity already excluded).
> Our nodes are using Intel P3700 DC-SSDs for journaling.
> 
> Most VMs are linux based and load is low to medium. There are also about 10 
> VMs running Windows 2012R2, two of them run remote services (terminal).
> 
> My question is: Are 80 VMs hosted on 53 disks (mostly 7.2k SATA) too much? We
> sometimes experience lags where nearly all servers suffer from "blocked IO >
> 32 seconds".
> 
> What are your experiences?
> 
> Mit freundlichen Grüßen / best regards,
> Kevin Olbrich.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Rick Stehno


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph strange issue after adding a cache OSD.

2016-11-25 Thread Daznis
I think it's because of these errors:

2016-11-25 14:51:25.644495 7fb73eef8700 -1 log_channel(cluster) log
[ERR] : 14.28 deep-scrub stat mismatch, got 145/144 objects, 0/0
clones, 57/57 dirty, 0/0 omap, 54/53 hit_set_archive, 0/0 whiteouts,
365399477/365399252 bytes,51328/51103 hit_set_archive bytes.

2016-11-25 14:55:56.529405 7f89bae5a700 -1 log_channel(cluster) log
[ERR] : 13.dd deep-scrub stat mismatch, got 149/148 objects, 0/0
clones, 55/55 dirty, 0/0 omap, 63/61 hit_set_archive, 0/0 whiteouts,
360765725/360765503 bytes,55581/54097 hit_set_archive bytes.

I have no clue why they appeared. The cluster was running fine for
months, so I have no logs of how it happened. I only enabled logging after
"shit hit the fan".


On Fri, Nov 25, 2016 at 12:26 PM, Nick Fisk  wrote:
> Possibly, do you know the exact steps to reproduce? I'm guessing the PG 
> splitting was the cause, but whether this on its own would cause the problem 
> or also needs the introduction of new OSD's at the same time, might make 
> tracing the cause hard.
>
>> -Original Message-
>> From: Daznis [mailto:daz...@gmail.com]
>> Sent: 24 November 2016 19:44
>> To: Nick Fisk 
>> Cc: ceph-users 
>> Subject: Re: [ceph-users] Ceph strange issue after adding a cache OSD.
>>
>> I will try it, but I wanna see if it stays stable for a few days. Not sure 
>> if I should report this bug or not.
>>
>> On Thu, Nov 24, 2016 at 6:05 PM, Nick Fisk  wrote:
>> > Can you add them with different ID's, it won't look pretty but might get 
>> > you out of this situation?
>> >
>> >> -Original Message-
>> >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
>> >> Of Daznis
>> >> Sent: 24 November 2016 15:43
>> >> To: Nick Fisk 
>> >> Cc: ceph-users 
>> >> Subject: Re: [ceph-users] Ceph strange issue after adding a cache OSD.
>> >>
>> >> Yes, unfortunately, it is. And the story still continues. I have
>> >> noticed that only 4 OSD are doing this and zapping and readding it
>> >> does not solve the issue. Removing them completely from the cluster solve 
>> >> that issue, but I can't reuse their ID's. If I add another
>> one with the same ID it starts doing the same "funky" crashes. For now the 
>> cluster remains "stable" without the OSD's.
>> >>
>> >>
>> >>
>> >>
>> >> On Wed, Nov 23, 2016 at 4:00 PM, Nick Fisk  wrote:
>> >> > I take it you have size =2 or min_size=1 or something like that for the 
>> >> > cache pool? 1 OSD shouldn’t prevent PG's from recovering.
>> >> >
>> >> > Your best bet would be to see if the PG that is causing the assert
>> >> > can be removed and let the OSD start up. If you are lucky, the PG
>> >> causing the problems might not be one which also has unfound objects,
>> >> otherwise you are likely have to get heavily involved in recovering 
>> >> objects with the object store tool.
>> >> >
>> >> >> -Original Message-
>> >> >> From: Daznis [mailto:daz...@gmail.com]
>> >> >> Sent: 23 November 2016 13:56
>> >> >> To: Nick Fisk 
>> >> >> Cc: ceph-users 
>> >> >> Subject: Re: [ceph-users] Ceph strange issue after adding a cache OSD.
>> >> >>
>> >> >> No, it's still missing some PGs and objects and can't recover as
>> >> >> it's blocked by that OSD. I can boot the OSD up by removing all
>> >> >> the PG related files from current directory, but that doesn't
>> >> >> solve the missing objects problem. Not really sure if I can move
>> >> >> the object
>> >> back to their place manually, but I will try it.
>> >> >>
>> >> >> On Wed, Nov 23, 2016 at 3:08 PM, Nick Fisk  wrote:
>> >> >> > Sorry, I'm afraid I'm out of ideas about that one, that error
>> >> >> > doesn't mean very much to me. The code suggests the OSD is
>> >> >> > trying to
>> >> >> get an attr from the disk/filesystem, but for some reason it
>> >> >> doesn't like that. You could maybe whack the debug logging for OSD
>> >> >> and filestore up to max and try and see what PG/file is accessed
>> >> >> just before the crash, but I'm not sure what the fix would be,
>> >> >> even if
>> >> you manage to locate the dodgy PG.
>> >> >> >
>> >> >> > Does the cluster have all PG's recovered now? Unless anyone else
>> >> >> > can comment, you might be best removing/wiping and then re-
>> >> >> adding the OSD.
>> >> >> >
>> >> >> >> -Original Message-
>> >> >> >> From: Daznis [mailto:daz...@gmail.com]
>> >> >> >> Sent: 23 November 2016 12:55
>> >> >> >> To: Nick Fisk 
>> >> >> >> Cc: ceph-users 
>> >> >> >> Subject: Re: [ceph-users] Ceph strange issue after adding a cache 
>> >> >> >> OSD.
>> >> >> >>
>> >> >> >> Thank you. That helped quite a lot. Now I'm just stuck with one OSD 
>> >> >> >> crashing with:
>> >> >> >>
>> >> >> >> osd/PG.cc: In function 'static int
>> >> >> >> PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*,
>> >> >> >> ceph::bufferlist*)' thread 7f36bbdd6880 time
>> >> >> >> 2016-11-23 13:42:43.27
>> >> >> >> 8539
>> >> >> >> osd/PG.cc: 2911: FAILED assert(r > 0)
>> >> >> >>
>> >> >> >>  ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2

Re: [ceph-users] about using SSD in cephfs, attached with some quantified benchmarks

2016-11-25 Thread John Spray
On Fri, Nov 25, 2016 at 8:16 AM, JiaJia Zhong 
wrote:

> Some confusing questions (Ceph 0.94):
>
> 1. Is there any way to cache all of the metadata in the MDS's memory?
>   (metadata OSD data --async--> MDS memory)
>
> I don't know if I misunderstand the role of the MDS :( -- so many threads
> advise using SSD OSDs for metadata.
> The metadata stores inode information for files. Yes, that makes stat,
> ls and readdir fast for CephFS,
> but if the metadata could be cached in memory (metadata OSD data
> --async--> MDS memory), I guess this might be even better?
> We can use SSD journals, so write speed would not be the bottleneck.
> The cached metadata is not large even if there is a huge number of files.  (
> I gather that MooseFS stores all metadata in memory?)
>

The MDS does cache your metadata in memory, but it also needs to quickly
write it safely to disk to fully commit a metadata operation and allow
clients to proceed.  Even if your metadata fits entirely in memory (i.e.
you have fewer than mds_cache_size files) you will still want a fast
metadata pool.
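
(A couple of commands for poking at the MDS cache on a running daemon, assuming the
admin socket is available and "mds.a" stands in for your MDS name:)

ceph daemon mds.a config get mds_cache_size   # configured cache limit, in inodes
ceph daemon mds.a perf dump                   # compare the mds "inodes" counters against that limit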

John


> 2.
> Are there any descriptions of how the journal works under the hood? It seems
> a bit like a swap partition for Linux ~
>
> Using an Intel PCIe SSD as the journal device for HDD OSDs,
> I ran the command below as a rough benchmark on all of the OSDs simultaneously:
>
> # for i in $(ps aux | grep osd | awk '{print $14}' | grep -v "^$" | sort);
> do ceph tell osd.$i bench & done
>
> Compared to another host without SSD journals, these OSDs got better
> bytes_per_sec, an increase of more than 100%:
> HDD-journal OSDs ~30MB/s --> HDD OSDs with SSD journal more than
> 60MB/s.   (12 OSDs/host, hosts are almost identical)
>
> MB/s (HDD journal + HDD)
> 39
> 35
> 35
> 35
> 33
> 31
> 29
> 26
> 26
> 26
> 26
> 25
> The top result (39MB/s) is a SATA SSD OSD with a SATA SSD journal, but it
> does not seem to be any faster than the others with HDD journal + HDD data.
>
> MB/s (PCIE SSD Journal + HDD)
> 195
> 129
> 92
> 88
> 71
> 71
> 65
> 61
> 57
> 54
> 52
> 50
> The 195MB/s result is PCIe SSD journal + SSD data, which seems very
> fast. The others are PCIe SSD journal + HDD data.
>
>
> "bytes_per_sec": 166451390.00 for single bench on (PCIE Journal + HDD)
>158.74MB/s
> "bytes_per_sec": 78472933.00 for single bench on   (HDD Journal + HDD)
>   74.83MB/s
>
> It seems that "data ---> HDD journal" is probably the main bottleneck?
> How can I track this?
> data --> SSD journal --> OSD data partition
> data --> HDD journal --> OSD data partition
>
> 3.
> Any cache or memory suggestions for better CephFS performance?
>
> Key ceph.conf settings are below:
> [global]
> osd pool default size = 2
> osd pool default min size = 1
> osd pool default pg num = 512
> osd pool default pgp num = 512
> osd journal size = 1
>
> [mds]
> mds cache size = 11474836
>
> [osd]
> osd op threads = 4
> filestore op threads = 4
> osd crush update on start = false
> #256M
> osd max write size = 256
> #256M
> journal max write bytes = 268435456
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph strange issue after adding a cache OSD.

2016-11-25 Thread Nick Fisk
It might be worth trying to raise a ticket with those errors and say that you 
believe they occurred after PG splitting on the cache tier and also include the 
asserts you originally posted.

> -Original Message-
> From: Daznis [mailto:daz...@gmail.com]
> Sent: 25 November 2016 13:59
> To: Nick Fisk 
> Cc: ceph-users 
> Subject: Re: [ceph-users] Ceph strange issue after adding a cache OSD.
> 
> I think it's because of these errors:
> 
> 2016-11-25 14:51:25.644495 7fb73eef8700 -1 log_channel(cluster) log [ERR] : 
> 14.28 deep-scrub stat mismatch, got 145/144 objects, 0/0
> clones, 57/57 dirty, 0/0 omap, 54/53 hit_set_archive, 0/0 whiteouts,
> 365399477/365399252 bytes,51328/51103 hit_set_archive bytes.
> 
> 2016-11-25 14:55:56.529405 7f89bae5a700 -1 log_channel(cluster) log [ERR] : 
> 13.dd deep-scrub stat mismatch, got 149/148 objects, 0/0
> clones, 55/55 dirty, 0/0 omap, 63/61 hit_set_archive, 0/0 whiteouts,
> 360765725/360765503 bytes,55581/54097 hit_set_archive bytes.
> 
> I have no clue why they appeared. The cluster was running fine for months so 
> I have no logs on how it happened. I just enabled them
> after "shit hit the fan".
> 
> 
> On Fri, Nov 25, 2016 at 12:26 PM, Nick Fisk  wrote:
> > Possibly, do you know the exact steps to reproduce? I'm guessing the PG 
> > splitting was the cause, but whether this on its own would
> cause the problem or also needs the introduction of new OSD's at the same 
> time, might make tracing the cause hard.
> >
> >> -Original Message-
> >> From: Daznis [mailto:daz...@gmail.com]
> >> Sent: 24 November 2016 19:44
> >> To: Nick Fisk 
> >> Cc: ceph-users 
> >> Subject: Re: [ceph-users] Ceph strange issue after adding a cache OSD.
> >>
> >> I will try it, but I wanna see if it stays stable for a few days. Not sure 
> >> if I should report this bug or not.
> >>
> >> On Thu, Nov 24, 2016 at 6:05 PM, Nick Fisk  wrote:
> >> > Can you add them with different ID's, it won't look pretty but might get 
> >> > you out of this situation?
> >> >
> >> >> -Original Message-
> >> >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
> >> >> Behalf Of Daznis
> >> >> Sent: 24 November 2016 15:43
> >> >> To: Nick Fisk 
> >> >> Cc: ceph-users 
> >> >> Subject: Re: [ceph-users] Ceph strange issue after adding a cache OSD.
> >> >>
> >> >> Yes, unfortunately, it is. And the story still continues. I have
> >> >> noticed that only 4 OSD are doing this and zapping and readding it
> >> >> does not solve the issue. Removing them completely from the
> >> >> cluster solve that issue, but I can't reuse their ID's. If I add
> >> >> another
> >> one with the same ID it starts doing the same "funky" crashes. For now the 
> >> cluster remains "stable" without the OSD's.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On Wed, Nov 23, 2016 at 4:00 PM, Nick Fisk  wrote:
> >> >> > I take it you have size =2 or min_size=1 or something like that for 
> >> >> > the cache pool? 1 OSD shouldn’t prevent PG's from
> recovering.
> >> >> >
> >> >> > Your best bet would be to see if the PG that is causing the
> >> >> > assert can be removed and let the OSD start up. If you are
> >> >> > lucky, the PG
> >> >> causing the problems might not be one which also has unfound
> >> >> objects, otherwise you are likely have to get heavily involved in 
> >> >> recovering objects with the object store tool.
> >> >> >
> >> >> >> -Original Message-
> >> >> >> From: Daznis [mailto:daz...@gmail.com]
> >> >> >> Sent: 23 November 2016 13:56
> >> >> >> To: Nick Fisk 
> >> >> >> Cc: ceph-users 
> >> >> >> Subject: Re: [ceph-users] Ceph strange issue after adding a cache 
> >> >> >> OSD.
> >> >> >>
> >> >> >> No, it's still missing some PGs and objects and can't recover
> >> >> >> as it's blocked by that OSD. I can boot the OSD up by removing
> >> >> >> all the PG related files from current directory, but that
> >> >> >> doesn't solve the missing objects problem. Not really sure if I
> >> >> >> can move the object
> >> >> back to their place manually, but I will try it.
> >> >> >>
> >> >> >> On Wed, Nov 23, 2016 at 3:08 PM, Nick Fisk  wrote:
> >> >> >> > Sorry, I'm afraid I'm out of ideas about that one, that error
> >> >> >> > doesn't mean very much to me. The code suggests the OSD is
> >> >> >> > trying to
> >> >> >> get an attr from the disk/filesystem, but for some reason it
> >> >> >> doesn't like that. You could maybe whack the debug logging for
> >> >> >> OSD and filestore up to max and try and see what PG/file is
> >> >> >> accessed just before the crash, but I'm not sure what the fix
> >> >> >> would be, even if
> >> >> you manage to locate the dodgy PG.
> >> >> >> >
> >> >> >> > Does the cluster have all PG's recovered now? Unless anyone
> >> >> >> > else can comment, you might be best removing/wiping and then
> >> >> >> > re-
> >> >> >> adding the OSD.
> >> >> >> >
> >> >> >> >> -Original Message-
> >> >> >> >> From: Daznis [mailto:daz...@gmail.com]
> >> >> >> >> Sent: 23 November 

[ceph-users] CoW clone performance

2016-11-25 Thread Kees Meijs
Hi list,

We're using CoW clones (using OpenStack via Glance and Cinder) to store
virtual machine images.

For example:

> # rbd info cinder-volumes/volume-a09bd74b-f100-4043-a422-5e6be20d26b2
> rbd image 'volume-a09bd74b-f100-4043-a422-5e6be20d26b2':
> size 25600 MB in 3200 objects
> order 23 (8192 kB objects)
> block_name_prefix: rbd_data.c569832b851bc
> format: 2
> features: layering, striping
> flags:
> parent: glance-images/37a54104-fe3c-4e2a-a94b-da0f3776e1ac@snap
> overlap: 4096 MB
> stripe unit: 8192 kB
> stripe count: 1

It seems our storage cluster writes a lot, even when the virtualization
cluster isn't loaded at all, and there seem to be more writes than reads
in general, which is quite odd and unexpected.
In addition, performance is not as good as we would like.

Can someone please share their thoughts on this matter, for example on
flattening (or perhaps not flattening) the volumes?
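
(For reference, flattening the example clone above would look like this; afterwards
the "parent" line should disappear from rbd info:)

rbd flatten cinder-volumes/volume-a09bd74b-f100-4043-a422-5e6be20d26b2
rbd info cinder-volumes/volume-a09bd74b-f100-4043-a422-5e6be20d26b2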

Thanks in advance!

Cheers,
Kees

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] general ceph cluster design

2016-11-25 Thread nick
Hi,
we are currently planning a new ceph cluster which will be used for 
virtualization (providing RBD storage for KVM machines) and we have some 
general questions.

* Is it advisable to have one ceph cluster spread over multiple datacenters 
(latency is low, as they are not so far from each other)? Is anybody doing 
this in a production setup? We know that any network issue would affect virtual 
machines in all locations instead of just one, but we can see a lot of advantages 
as well.

* We are planning to combine the hosts for Ceph and KVM (so far we are using 
separate hosts for virtual machines and Ceph storage). We see the big 
advantage (next to the price drop) of an automatic Ceph expansion when adding 
more compute nodes, as we got into situations in the past where we had too many 
compute nodes and the Ceph cluster was not expanded properly (performance 
dropped over time). On the other hand there would be changes to the crush map 
every time we add a compute node, and that might result in a lot of data movement 
in Ceph. Is anybody using combined servers for compute and Ceph storage and 
has some experience?

* Is there a maximum number of OSDs in a Ceph cluster? We are planning to use 
a minimum of 8 OSDs per server and are going to have a cluster with about 100 
servers, which would end up at about 800 OSDs.

Thanks for any help...

Cheers
Nick

signature.asc
Description: This is a digitally signed message part.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph performance laggy (requests blocked > 32) on OpenStack

2016-11-25 Thread Thomas Danan
Hi Kévin, I am currently having a similar issue. In my environment I have around 16
Linux VMs (VMware), more or less equally loaded, accessing a 1PB Ceph Hammer
cluster (40 data nodes, 800 OSDs) through RBD.

Very often we have IO freezes on the VMs' XFS filesystems, and we also continuously
have slow requests on OSDs (up to 10/20 minutes sometimes).
In my case the slow requests / blocked ops occur because the primary OSD is waiting
for subops, i.e. waiting for replication to happen on the secondary OSD. In my case
not all the VMs are blocked at the same time...

I still do not have an explanation, root cause, nor workaround.

Will keep you informed if I find something ...



Sent from my Samsung device


 Original message 
From: Kevin Olbrich 
Date: 11/25/16 19:19 (GMT+05:30)
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Ceph performance laggy (requests blocked > 32) on 
OpenStack

Hi,

we are running 80 VMs using KVM in OpenStack via RBD in Ceph Jewel on a total 
of 53 disks (RAID parity already excluded).
Our nodes are using Intel P3700 DC-SSDs for journaling.

Most VMs are linux based and load is low to medium. There are also about 10 VMs 
running Windows 2012R2, two of them run remote services (terminal).

My question is: Are 80 VMs hosted on 53 disks (mostly 7.2k SATA) too much? We 
sometimes experience lags where nearly all servers suffer from "blocked IO > 32 
seconds".

What are your experiences?

Mit freundlichen Grüßen / best regards,
Kevin Olbrich.



This electronic message contains information from Mycom which may be privileged 
or confidential. The information is intended to be for the use of the 
individual(s) or entity named above. If you are not the intended recipient, be 
aware that any disclosure, copying, distribution or any other use of the 
contents of this information is prohibited. If you have received this 
electronic message in error, please notify us by post or telephone (to the 
numbers or correspondence address above) or by email (at the email address 
above) immediately.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Introducing DeepSea: A tool for deploying Ceph using Salt

2016-11-25 Thread Lenz Grimmer
Hi Swami,

On 11/25/2016 11:04 AM, M Ranga Swami Reddy wrote:

> Can you please confirm, if the DeepSea works on Ubuntu also?

Not yet, as far as I can tell, but testing/feedback/patches are very
welcome ;)

One of the benefits of using Salt is that it supports multiple
distributions. However, currently the Salt scripts in DeepSea contain a
few SUSE-specific functions, e.g. using "zypper" to install additional
packages at some places. That needs to be replaced with the generic
functions provided by Salt, for example.
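
(For example, the distribution-agnostic way to install a package through Salt's pkg
module -- the target and package name here are hypothetical:)

salt 'node*' pkg.install ceph-common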

BTW, we're in the process of setting up a dedicated mailing list for
DeepSea - we'll let you know when it's set up.

Lenz



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] general ceph cluster design

2016-11-25 Thread Maxime Guyot
Hi Nick,

See inline comments.

Cheers,
Maxime 

On 25/11/16 16:01, "ceph-users on behalf of nick" 
 wrote:

>Hi,
>we are currently planning a new ceph cluster which will be used for 
>virtualization (providing RBD storage for KVM machines) and we have some 
>general questions.
>
>* Is it advisable to have one ceph cluster spread over multiple 
> datacenters 
>(latency is low, as they are not so far from each other)? Is anybody doing 
>this in a production setup? We know that any network issue would affect 
> virtual 
>machines in all locations instead just one, but we can see a lot of 
> advantages 
>as well.

I think the general consensus is to limit the size of the failure domain. That 
said, it depends on the use case and what you mean by "multiple datacenters" and 
"latency is low": writes will have to be journal-ACKed by the OSDs in the 
other datacenter. If there is 10ms latency between Location1 and Location2, 
then it would add 10ms to each write operation if the crushmap requires replicas in 
each location. Speaking of which, a 3rd location would help with sorting out 
quorum (1 mon at each location) in a "triangle" configuration.
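
(A rough sketch of such a crushmap rule -- one replica per datacenter, assuming
"datacenter" buckets exist in the CRUSH hierarchy; the names and ruleset number are
placeholders:)

rule replicated_across_dcs {
        ruleset 1
        type replicated
        min_size 2
        max_size 3
        step take default
        step choose firstn 0 type datacenter
        step chooseleaf firstn 1 type host
        step emit
}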

If this is for DR: RBD mirroring is supposed to address that; you might not 
want to have 1 big cluster (= failure domain).
If this is for VM live migration: this usually requires stretched L2 adjacency (failure 
domain) or overlays (VXLAN and the like), and the "network trombone" effect can be a 
problem depending on the setup.

I know of Nantes University who used/is using a 3 datacenter Ceph cluster:  
http://dachary.org/?p=2087 

>
>* We are planning to combine the hosts for ceph and KVM (so far we are 
> using 
>seperate hosts for virtual machines and ceph storage). We see the big 
>advantage (next to the price drop) of an automatic ceph expansion when 
> adding 
>more compute nodes as we got into situations in the past where we had too 
> many 
>compute nodes and the ceph cluster was not expanded properly (performance 
>dropped over time). On the other side there would be changes to the crush 
> map 
>every time we add a compute node and that might end in a lot of data 
> movement 
>in ceph. Is anybody using combined servers for compute and ceph storage 
> and 
>has some experience?

The challenge is to avoid ceph-osd becoming a noisy neighbor for the VMs 
hosted on the hypervisor, especially under recovery. I've heard of people using 
CPU pinning, containers, and QoS to keep it under control.
Sebastien has an article on his blog about this topic: 
https://www.sebastien-han.fr/blog/2016/07/11/Quick-dive-into-hyperconverged-architecture-with-OpenStack-and-Ceph/
 

Regarding the performance dropping over time, you can look at improving your 
capacity:performance ratio.

>* is there a maximum amount of OSDs in a ceph cluster? We are planning to 
> use 
>a minimum of 8 OSDs per server and going to have a cluster with about 100 
>servers which would end in about 800 OSDs.

There are a couple of thread from the ML about this: 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-April/028371.html  and 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-November/014246.html 

>
>Thanks for any help...
>
>Cheers
>Nick

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] docker storage driver

2016-11-25 Thread Pedro Benites

Hi,

I want to configure a registry with the "Ceph rados storage driver", but 
after starting the registry with "docker run -d -p 5000:5000 
--restart=always --name registry -v 
`pwd`/config.yml:/etc/docker/registry/config.yml registry:2"


I got this error in the docker logs:

"panic: StorageDriver not registered: rados"

Does someone know why I get that error and what it means?

This is my configuration file for the registry:

version: 0.1
log:
  level: debug
  formatter: text
  fields:
service: registry
storage:
  cache:
blobdescriptor: inmemory
  rados:
poolname: docker_rep
username: admin
http:
  addr: :5000
  headers:
X-Content-Type-Options: [nosniff]
health:
  storagedriver:
enabled: true
interval: 10s
threshold: 3


Regards.
Pedro.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CEPH mirror down again

2016-11-25 Thread Vy Nguyen Tan
Hello,

I want to install CEPH on new nodes but I can't reach the CEPH repo; it seems
the repo is broken. I am using CentOS 7.2 and ceph-deploy 1.5.36.

[root@cp ~]# ping -c 3 download.ceph.com

PING download.ceph.com (173.236.253.173) 56(84) bytes of data.

--- download.ceph.com ping statistics ---

3 packets transmitted, 0 received, 100% packet loss, time 11999ms

[root@cp ~]# curl https://download.ceph.com/debian-jewel/

curl: (7) Failed to connect to 2607:f298:6050:51f3:f816:3eff:fe71:9135:
Network is unreachable
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH mirror down again

2016-11-25 Thread Matt Taylor

Hey,

There are many alternate mirrors available:

http://docs.ceph.com/docs/jewel/install/mirrors/

Pick the closest one to you. :)

Cheers,
Matt.

On 26/11/16 14:05, Vy Nguyen Tan wrote:

Hello,

I want to install CEPH on new nodes but I can't reach CEPH repo, It
seems the repo are broken. I am using CentOS 7.2 and ceph-deploy 1.5.36.

[root@cp ~]# ping -c 3 download.ceph.com

PING download.ceph.com (173.236.253.173) 56(84) bytes of data.

--- download.ceph.com ping statistics ---

3 packets transmitted, 0 received, 100% packet loss, time 11999ms

[root@cp ~]# curl https://download.ceph.com/debian-jewel/

curl: (7) Failed to connect to 2607:f298:6050:51f3:f816:3eff:fe71:9135:
Network is unreachable




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH mirror down again

2016-11-25 Thread Joao Eduardo Luis

On 11/26/2016 03:05 AM, Vy Nguyen Tan wrote:

Hello,

I want to install CEPH on new nodes but I can't reach CEPH repo, It
seems the repo are broken. I am using CentOS 7.2 and ceph-deploy 1.5.36.


Patrick sent an email to the list informing this would happen back on 
Nov 18th; quote:



Due to Dreamhost shutting down the old DreamCompute cluster in their
US-East 1 region, we are in the process of beginning the migration of
Ceph infrastructure.  We will need to move download.ceph.com,
tracker.ceph.com, and docs.ceph.com to their US-East 2 region.

The current plan is to move the VMs on 25 NOV 2016 throughout the day.
Expect them to be down intermittently.


  -Joao

P.S.: also, it's Ceph; not CEPH.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH mirror down again

2016-11-25 Thread Vy Nguyen Tan
Hi Matt and Joao,

Thank you for your information. I am installing Ceph with alternative
mirror (ceph-deploy install --repo-url http://hk.ceph.com/rpm-jewel/el7/
--gpg-url http://hk.ceph.com/keys/release.asc {host}) and everything work
again.

On Sat, Nov 26, 2016 at 10:12 AM, Joao Eduardo Luis  wrote:

> On 11/26/2016 03:05 AM, Vy Nguyen Tan wrote:
>
>> Hello,
>>
>> I want to install CEPH on new nodes but I can't reach CEPH repo, It
>> seems the repo are broken. I am using CentOS 7.2 and ceph-deploy 1.5.36.
>>
>
> Patrick sent an email to the list informing this would happen back on Nov
> 18th; quote:
>
> Due to Dreamhost shutting down the old DreamCompute cluster in their
>> US-East 1 region, we are in the process of beginning the migration of
>> Ceph infrastructure.  We will need to move download.ceph.com,
>> tracker.ceph.com, and docs.ceph.com to their US-East 2 region.
>>
>> The current plan is to move the VMs on 25 NOV 2016 throughout the day.
>> Expect them to be down intermittently.
>>
>
>   -Joao
>
> P.S.: also, it's Ceph; not CEPH.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH mirror down again

2016-11-25 Thread Andrus, Brian Contractor
Hmm. Apparently download.ceph.com = us-west.ceph.com
And there is no repomd.xml on us-east.ceph.com

This seems to happen a little too often for something that is stable and 
released. Makes it seem like the old BBS days of “I want to play DOOM, so I’m 
shutting the services down”


Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238




From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Vy 
Nguyen Tan
Sent: Friday, November 25, 2016 7:28 PM
To: Joao Eduardo Luis 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] CEPH mirror down again

Hi Matt and Joao,

Thank you for your information. I am installing Ceph with alternative mirror 
(ceph-deploy install --repo-url http://hk.ceph.com/rpm-jewel/el7/ --gpg-url 
http://hk.ceph.com/keys/release.asc {host}) and everything work again.

On Sat, Nov 26, 2016 at 10:12 AM, Joao Eduardo Luis  wrote:
On 11/26/2016 03:05 AM, Vy Nguyen Tan wrote:
Hello,

I want to install CEPH on new nodes but I can't reach CEPH repo, It
seems the repo are broken. I am using CentOS 7.2 and ceph-deploy 1.5.36.

Patrick sent an email to the list informing this would happen back on Nov 18th; 
quote:
Due to Dreamhost shutting down the old DreamCompute cluster in their
US-East 1 region, we are in the process of beginning the migration of
Ceph infrastructure.  We will need to move download.ceph.com,
tracker.ceph.com, and docs.ceph.com to their US-East 2 region.

The current plan is to move the VMs on 25 NOV 2016 throughout the day.
Expect them to be down intermittently.

  -Joao

P.S.: also, it's Ceph; not CEPH.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH mirror down again

2016-11-25 Thread Wido den Hollander

> Op 26 november 2016 om 5:13 schreef "Andrus, Brian Contractor" 
> :
> 
> 
> Hmm. Apparently download.ceph.com = us-west.ceph.com
> And there is no repomd.xml on us-east.ceph.com
> 

You could check http://us-east.ceph.com/timestamp to see how far behind it is 
on download.ceph.com

For what repo are you missing the repomd.xml?

Wido

> This seems to happen a little too often for something that is stable and 
> released. Makes it seem like the old BBS days of “I want to play DOOM, so I’m 
> shutting the services down”
> 
> 
> Brian Andrus
> ITACS/Research Computing
> Naval Postgraduate School
> Monterey, California
> voice: 831-656-6238
> 
> 
> 
> 
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Vy 
> Nguyen Tan
> Sent: Friday, November 25, 2016 7:28 PM
> To: Joao Eduardo Luis 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] CEPH mirror down again
> 
> Hi Matt and Joao,
> 
> Thank you for your information. I am installing Ceph with alternative mirror 
> (ceph-deploy install --repo-url http://hk.ceph.com/rpm-jewel/el7/ --gpg-url 
> http://hk.ceph.com/keys/release.asc {host}) and everything work again.
> 
> On Sat, Nov 26, 2016 at 10:12 AM, Joao Eduardo Luis  wrote:
> On 11/26/2016 03:05 AM, Vy Nguyen Tan wrote:
> Hello,
> 
> I want to install CEPH on new nodes but I can't reach CEPH repo, It
> seems the repo are broken. I am using CentOS 7.2 and ceph-deploy 1.5.36.
> 
> Patrick sent an email to the list informing this would happen back on Nov 
> 18th; quote:
> Due to Dreamhost shutting down the old DreamCompute cluster in their
> US-East 1 region, we are in the process of beginning the migration of
> Ceph infrastructure.  We will need to move download.ceph.com,
> tracker.ceph.com, and docs.ceph.com to their US-East 2 region.
> 
> The current plan is to move the VMs on 25 NOV 2016 throughout the day.
> Expect them to be down intermittently.
> 
>   -Joao
> 
> P.S.: also, it's Ceph; not CEPH.
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Developers Required - Bangalore

2016-11-25 Thread Thangaraj Vinayagamoorthy
Hi,

We are looking for a strong Ceph developer in our organization.

Kindly share your contact number if you are interested in solving good data
problems.

Regards,
Thangaraj V
7899706889

This email and any files transmitted with it are confidential and intended 
solely for the individual or entity to whom they are addressed. If you have 
received this email in error destroy it immediately. *** Walmart Confidential 
***
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com