[ceph-users] The woes of sequential reads

2014-04-07 Thread Christian Balzer

Hello, 

Nothing new, I know. But some numbers to mull and ultimately weep over.

Ceph cluster based on Debian Jessie (thus ceph 0.72.x), 2 nodes, 2 OSDs
each. 
Infiniband 4xQDR, IPoIB interconnects, 1 GByte/s bandwidth end to end. 
There was nothing going on aside from the tests.

Just going to use the bonnie++ values for throughput to keep it simple and
short.

On the OSD itself:
---
Version  1.97   --Sequential Output-- --Sequential Input- --Random-
Concurrency   1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ceph-01         64G           1309731  96 467763  51           1703299  79  784.0  32


On a compute node, host side:
---
Version  1.97   --Sequential Output-- --Sequential Input- --Random-
Concurrency   1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
comp-02        256G            296928  60  64216  16            145015  17  291.6  12
---
Ouch. Well, the write speed is probably the OSD journal SSDs being hobbled
by sitting on SATA-2 links of the onboard AMD chipset. I had planned for
that shortcoming, alas the cheap and cheerful Marvell 88SE9230 based
PCIe x4 controller can't get a stable link under any Linux kernel I tried.
OTOH, I don't expect more than 30MB/s average writes for all the VMs
combined.
Despite having been aware of the sequential read speed issues, I really
was disappointed here. 10% of a single OSD. The OSD processes and actual
disks were bored stiff during the read portion of that bonnie run.

OK, let's increase read-ahead (FYI: doing this on the OSDs had no or even
negative effects, since I've seen that suggested a few times as well).
So after an "echo 4096 > /sys/block/vda/queue/read_ahead_kb" we get:
---
Version  1.97   --Sequential Output-- --Sequential Input- --Random-
Concurrency   1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
comp-02        256G            280277  44 158633  30            655827  46  577.9  17
---
Better, not great, but certainly around what I expected.
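
For the archives, a minimal sketch of how one could make that read-ahead stick
across reboots via udev (the device name patterns for virtio and kernel-mapped
RBD devices are assumptions, adjust as needed):

cat > /etc/udev/rules.d/80-readahead.rules <<'EOF'
ACTION=="add|change", KERNEL=="vd[a-z]", ATTR{queue/read_ahead_kb}="4096"
ACTION=="add|change", KERNEL=="rbd[0-9]*", ATTR{queue/read_ahead_kb}="4096"
EOF
udevadm control --reload-rules
udevadm trigger --subsystem-match=block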

So let's see how this looks inside a VM (Wheezy). This is ganeti on jessie,
thus no qemu caching and kernelspace RBD (no qemu with userspace RBD support
outside sid/experimental yet):
---
Version  1.96   --Sequential Output-- --Sequential Input- --Random-
Concurrency   1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
fp-001   8G   170374  29 27599   7   34059   5 328.0  12
---
Le mega ouch. So writes are down to about 10% of the OSD and the reads
are... deplorable.
Setting the read-ahead inside the VM to 4MB gives us about 380MB/s reads,
in line with the writes, i.e. about half of the host speed.
I will test this with userspace qemu when available. 

However, setting the read-ahead inside the VM may not be a feasible option,
be it for lack of access to the VM, the setting getting lost on upgrades, etc.
Something more transparent that can be controlled by the people running
the host or the ceph cluster is definitely needed:
https://wiki.ceph.com/Planning/Blueprints/Emperor/Kernel_client_read_ahead_optimization

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] fuse or kernel to mount rbd?

2014-04-07 Thread Chad Seys
Hi Sage et al,
  Thanks for the info!  How stable are the cutting-edge kernels like 3.13?
Is 3.8 (e.g. from Ubuntu Raring) a better choice?

Thanks again!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] fuse or kernel to mount rbd?

2014-04-07 Thread Sage Weil
On Mon, 7 Apr 2014, Chad Seys wrote:
> Hi Sage et al,
>   Thanks for the info!  How stable are the cutting edge kernels like 3.13 ?  
> Is 3.8 (e.g. from Ubuntu Raring) a better choice?

3.8 will not support layering.

From RBD's perspective, the newest kernels are the most stable.  If it 
were me I would go for whatever 14.04 is using.

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recover cluster health after losing 100% of OSDs.

2014-04-07 Thread Gregory Farnum
On Sat, Apr 5, 2014 at 10:00 AM, Max Kutsevol  wrote:
> Hello!
>
> I am new to ceph, please take that into account.
>
> I'm experimenting with a 3 mons + 2 osds setup and got into a situation when I
> recreated both of the osds.
>
> My pools:
> ceph> osd lspools
>  0 data,1 metadata,
>
> These are just the defaults. I deleted the rbd pool; the other two I can't
> delete, it says that they are used by CephFS (no mds is running - why
> is it used?)
>
> Cluster status
>
> ceph> status
> cluster 8c3d2e5d-fce9-425b-8028-d2105a9cac3f
> health HEALTH_WARN 128 pgs degraded; 128 pgs stale; 128 pgs stuck stale;
> 128 pgs stuck unclean; 2/2 in osds are down
> monmap e2: 3 mons at
> {mon0=10.1.0.7:6789/0,mon1=10.1.0.8:6789/0,mon2=10.1.0.11:6789/0},
> election epoch 52, quorum 0,1,2 mon0,mon1,mon2
>   osdmap e70: 2 osds: 0 up, 2 in
>pgmap v129: 128 pgs, 3 pools, 0 bytes data, 0 objects 2784 kB used,
> 36804 MB / 40956 MB avail 128 stale+active+degraded
>
>
> Effectively there is no data for those PGs. I formatted them myself. How
> can I tell ceph that there is no way to get that data back and it should
> forget about those PGs and go on?

Look in the docs (ceph.com/docs) for the "lost" commands. However,
once you've killed all the OSDs in a cluster there's basically no
point to keeping the "cluster" around; you should just wipe it and
start over again.
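
For the record, the "lost" workflow looks roughly like this (OSD ids from your
setup; a sketch, only for when you are sure the data is gone for good):

ceph osd lost 0 --yes-i-really-mean-it
ceph osd lost 1 --yes-i-really-mean-it
# if PGs stay stale because every copy is gone, recreate them empty
# (the pgid here is just an example):
ceph pg force_create_pg 0.1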

> Also, how can I delete the 'data' and 'metadata' pools, or are they needed
> for some internal stuff (I won't use mds)?

Hmm, I think we inadvertently made this impossible. I've made a bug:
http://tracker.ceph.com/issues/8010
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy fails to generate keys

2014-04-07 Thread Diedrich Ehlerding
[monitors do not start properly with ceph-deploy]
Brian Chandler:

> > thank you for your response, however:
> >> Including iptables? CentOS/RedHat default to iptables enabled and
> >> closed.
> >>
> >> "iptables -Lvn" to be 100% sure.
> > hvrrzceph1:~ # iptables -Lvn
> > iptables: No chain/target/match by that name.
> > hvrrzceph1:~ #
> >
> Ergh, my mistake: iptables -L -v -n


 
hvrrzceph1:~ # iptables -L -v -n
Chain INPUT (policy ACCEPT 8739 packets, 476K bytes)
 pkts bytes target prot opt in out source   
destination 

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source   
destination 

Chain OUTPUT (policy ACCEPT 6270 packets, 505K bytes)
 pkts bytes target prot opt in out source   
destination 
hvrrzceph1:~ #

The servers do not run any firewall, and they are connected to the 
same switch. ssh login works over three networks (one to be used as 
admin network, one as public network, and another one as cluster 
network). 

Any hint is appreciated ...

Diedrich
-- 
Diedrich Ehlerding, Fujitsu Technology Solutions GmbH,
FTS CE SC PS&IS W, Hildesheimer Str 25, D-30880 Laatzen
Fon +49 511 8489-1806, Fax -251806, Mobil +49 173 2464758
Firmenangaben: http://de.ts.fujitsu.com/imprint.html


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd map changed when upgrading from emperor to v0.78

2014-04-07 Thread Thanh Tran
Hi,

I replaced the code in the file /etc/init.d/ceph of v0.78 with the code from
emperor and everything is OK for now. The replaced code:

#Code of v0.78
if [ "$type" = "osd" ]; then
    get_conf update_crush "" "osd crush update on start"
    if [ "${update_crush:-1}" = "1" -o "${update_crush:-1}" = "true" ]; then
        # update location in crush
        get_conf osd_location_hook "$BINDIR/ceph-crush-location" "osd crush location hook"
        osd_location=`$osd_location_hook --cluster ceph --id $id --type osd`
        get_conf osd_weight "" "osd crush initial weight"
        defaultweight="$(df -P -k $osd_data/. | tail -1 | awk '{ d=$2/1073741824 ; r = sprintf("%.2f", d); print r }')"
        get_conf osd_keyring "$osd_data/keyring" "keyring"
        do_cmd "timeout 10 $BINDIR/ceph -c $conf --name=osd.$id --keyring=$osd_keyring osd crush create-or-move -- $id ${osd_weight:-${defaultweight:-1}} $osd_location"
    fi
fi

#Code of emperor
if [ "$type" = "osd" ]; then
    get_conf update_crush "" "osd crush update on start"
    if [ "${update_crush:-1}" = "1" -o "{$update_crush:-1}" = "true" ]; then
        # update location in crush; put in some suitable defaults on the
        # command line, ceph.conf can override what it wants
        get_conf osd_location "" "osd crush location"
        get_conf osd_weight "" "osd crush initial weight"
        defaultweight="$(do_cmd "df $osd_data/. | tail -1 | awk '{ d= \$2/1073741824 ; r = sprintf(\"%.2f\", d); print r }'")"
        get_conf osd_keyring "$osd_data/keyring" "keyring"
        do_cmd "timeout 10 $BINDIR/ceph \
            --name=osd.$id \
            --keyring=$osd_keyring \
            osd crush create-or-move \
            -- \
            $id \
            ${osd_weight:-${defaultweight:-1}} \
            root=default \
            host=$host \
            $osd_location"
    fi
fi
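
As a note, instead of patching the init script it should also be possible to
control this from ceph.conf; an untested sketch:

cat >> /etc/ceph/ceph.conf <<'EOF'
[osd]
# keep the init script from moving OSDs around in the CRUSH map on start
osd crush update on start = false
EOF
# restart the OSDs (or each osd.N individually) afterwards
service ceph restart osd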

Best regards,
Thanh Tran


On Mon, Apr 7, 2014 at 3:49 PM, Thanh Tran  wrote:

> Hi,
>
> First, I installed ceph version emperor with mkcephfs, and everything was ok.
> My cluster has 3 servers. Please see http://pastebin.com/avTRfi5F for
> config information and additional information.
>
> Then, I upgraded to v0.78 and restarted ceph, and the osd map changed (see
> "ceph osd tree" at http://pastebin.com/avTRfi5F).
> osd.1 and osd.4 should belong to host cephtest19, osd.2 and osd.5 should
> belong to host cephtest20.
>
> Processes osd.1, osd.4 and osd.2, osd.5 are still running on cephtest19
> and cephtest20.
>
> Please help me to investigate this issue.
>
> Best regards,
> Thanh Tran
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph yarn setup

2014-04-07 Thread Gurvinder Singh
Hi,

As mentioned earlier, here is the link to the how-to guide on making YARN work
with Ceph emperor.

http://blog.uninettlabs.no/?p=54

Feel free to ask any questions.

Gurvinder Singh
Uninett AS
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Error when building ceph. fatal error: civetweb/civetweb.h: No such file or directory

2014-04-07 Thread Thanh Tran
Hi,

When I build ceph from the source code that I downloaded from
https://github.com/ceph/ceph/tree/v0.78, it fails with the following error:

rgw/rgw_civetweb.cc:4:31: fatal error: civetweb/civetweb.h: No such file or
directory
compilation terminated.
make[3]: *** [rgw/rgw_civetweb.o] Error 1
make[3]: Leaving directory `/home/thanhtv3/ceph/source/ceph-0.78/src'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/thanhtv3/ceph/source/ceph-0.78/src'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/home/thanhtv3/ceph/source/ceph-0.78/src'
make: *** [all-recursive] Error 1

What did I miss?

Best regards,
Thanh Tran
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cannot remove rbd image, snapshot busy

2014-04-07 Thread Jonathan Gowar
Thanks.  I managed to remove the images by unprotecting them first.
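
For anyone finding this thread later, the sequence was roughly as below (pool
and image names as in the quoted example; this assumes no clones still depend
on the snapshot):

rbd --pool mypool snap unprotect 5216ba99-1d8e-4155-9877-7d77d7b6caa0@snap
rbd --pool mypool snap purge 5216ba99-1d8e-4155-9877-7d77d7b6caa0
rbd --pool mypool rm 5216ba99-1d8e-4155-9877-7d77d7b6caa0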

On Fri, 2014-04-04 at 10:15 +0800, YIP Wai Peng wrote:
> Yes. You can see whether the snapshots are protected by using snap rm
> instead of snap purge.
> 
> # rbd --pool mypool snap rm 5216ba99-1d8e-4155-9877-7d77d7b6caa0@snap
> # rbd --pool mypool snap unprotect 5216ba99-1d8e-4155-9877-7d77d7b6caa0@snap

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph hbase issue

2014-04-07 Thread Gurvinder Singh
Hi,

I am trying to make HBase 0.96 work on top of Ceph 0.72.2. When I start
the Hbase-master I am getting this error.

2014-04-05 23:39:39,475 DEBUG [master:pltrd023:6] wal.FSHLog: Moved
1 WAL file(s) to /hbase/data/hbase/meta/1588230740/oldWALs
2014-04-05 23:39:39,538 FATAL [master:host:6] master.HMaster:
Unhandled exception. Starting shutdown.
java.io.IOException: Error accessing
ceph://mon-host:6789/hbase/data/hbase/meta/.tabledesc
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1486)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1524)
at
org.apache.hadoop.hbase.util.FSUtils.listStatus(FSUtils.java:1582)
at
org.apache.hadoop.hbase.util.FSTableDescriptors.getCurrentTableInfoStatus(FSTableDescriptors.java:348)
at
org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoPath(FSTableDescriptors.java:329)
at
org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoPath(FSTableDescriptors.java:310)
at
org.apache.hadoop.hbase.util.FSTableDescriptors.createTableDescriptorForTableDirectory(FSTableDescriptors.java:709)
at
org.apache.hadoop.hbase.util.FSTableDescriptors.createTableDescriptor(FSTableDescriptors.java:690)
at
org.apache.hadoop.hbase.util.FSTableDescriptors.createTableDescriptor(FSTableDescriptors.java:677)
at
org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:486)
at
org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:146)
at
org.apache.hadoop.hbase.master.MasterFileSystem.(MasterFileSystem.java:127)
at
org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:789)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:606)
at java.lang.Thread.run(Thread.java:724)


The only odd thing/warning I see in the log file is this:

 wal.FSHLog: FileSystem's output stream doesn't support
getNumCurrentReplicas; --HDFS-826 not available;
fsOut=org.apache.hadoop.fs.ceph.CephOutputStream

It has been able to create the hbase root and other directories such as data,
meta, etc. So it seems HBase is able to communicate with Ceph, but somehow it
is not able to create all the necessary files. Any suggestions there?

I have added these config options in the hbase-site.xml file

 
  <property>
    <name>fs.defaultFS</name>
    <value>ceph://mon-host:6789/</value>
  </property>

  <property>
    <name>ceph.conf.options</name>
    <value>client_readahead_min=4193404</value>
  </property>

  <property>
    <name>ceph.conf.file</name>
    <value>/etc/ceph/ceph.conf</value>
  </property>

  <property>
    <name>ceph.auth.id</name>
    <value>admin</value>
  </property>

  <property>
    <name>ceph.auth.keyfile</name>
    <value>/etc/hbase/conf/admin.secret</value>
  </property>

  <property>
    <name>fs.ceph.impl</name>
    <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
  </property>

  <property>
    <name>fs.AbstractFileSystem.ceph.impl</name>
    <value>org.apache.hadoop.fs.ceph.CephHadoop2FileSystem</value>
  </property>

  <property>
    <name>hbase.rootdir</name>
    <value>ceph://mon-host:6789/hbase</value>
    <description>The directory shared by RegionServers.</description>
  </property>
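
A basic sanity check of the Hadoop/CephFS binding against the same fs.defaultFS
would look something like this (just probe paths, nothing HBase-specific):

hadoop fs -ls ceph://mon-host:6789/hbase/data/hbase/meta/
hadoop fs -touchz ceph://mon-host:6789/hbase/probe-file
hadoop fs -rm ceph://mon-host:6789/hbase/probe-file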

- Gurvinder
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy fails to generate keys

2014-04-07 Thread Alfredo Deza
On Mon, Apr 7, 2014 at 3:50 AM, Diedrich Ehlerding
 wrote:
> [monitors do not start properly with ceph-deploy]
> Brian Chandler:
>
>> > thank you for your response, however:
>> >> Including iptables? CentOS/RedHat default to iptables enabled and
>> >> closed.
>> >>
>> >> "iptables -Lvn" to be 100% sure.
>> > hvrrzceph1:~ # iptables -Lvn
>> > iptables: No chain/target/match by that name.
>> > hvrrzceph1:~ #
>> >
>> Ergh, my mistake: iptables -L -v -n
>
>
>
> hvrrzceph1:~ # iptables -L -v -n
> Chain INPUT (policy ACCEPT 8739 packets, 476K bytes)
>  pkts bytes target prot opt in out source
> destination
>
> Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
>  pkts bytes target prot opt in out source
> destination
>
> Chain OUTPUT (policy ACCEPT 6270 packets, 505K bytes)
>  pkts bytes target prot opt in out source
> destination
> hvrrzceph1:~ #
>
> The servers do not run any firewall, and they are connected to the
> same switch. ssh login works over three networks (one to be used as
> admin network, one as public network, and another one as cluster
> network).
>
> Any hint is appreciated ...

Have you increased the verbosity for the monitors, restarted them, and
looked at the log output?
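
Something along these lines, for example (a sketch; adjust paths and the way
you restart the mon to your install):

cat >> /etc/ceph/ceph.conf <<'EOF'
[mon]
debug mon = 20
debug ms = 1
EOF
/etc/init.d/ceph restart mon
tail -f /var/log/ceph/ceph-mon.*.log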
>
> Diedrich
> --
> Diedrich Ehlerding, Fujitsu Technology Solutions GmbH,
> FTS CE SC PS&IS W, Hildesheimer Str 25, D-30880 Laatzen
> Fon +49 511 8489-1806, Fax -251806, Mobil +49 173 2464758
> Firmenangaben: http://de.ts.fujitsu.com/imprint.html
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osd map changed when upgrading from emperor to v0.78

2014-04-07 Thread Thanh Tran
Hi,

First, I installed ceph version emperor with mkcephfs, and everything was ok.
My cluster has 3 servers. Please see http://pastebin.com/avTRfi5F for config
information and additional information.

Then, I upgraded to v0.78 and restarted ceph, and the osd map changed (see
"ceph osd tree" at http://pastebin.com/avTRfi5F).
osd.1 and osd.4 should belong to host cephtest19, osd.2 and osd.5 should
belong to host cephtest20.

Processes osd.1, osd.4 and osd.2, osd.5 are still running on cephtest19 and
cephtest20.

Please help me to investigate this issue.

Best regards,
Thanh Tran
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Error when building ceph. fatal error: civetweb/civetweb.h: No such file or directory

2014-04-07 Thread Kai Zhang
Hi Thanh,

I think you are missing the "$ git submodule update --init" step, which clones
all the submodules required for compilation.
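
The tarballs GitHub generates for a tag don't include submodules, so building
from a clone would look roughly like this (a sketch):

git clone --recursive https://github.com/ceph/ceph.git
cd ceph
git checkout v0.78
git submodule update --init    # pulls in src/civetweb among others
./autogen.sh && ./configure && make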

Cheers,
Kai

At 2014-04-07 09:35:32,"Thanh Tran"  wrote:

Hi,


When I build ceph from the source code that I downloaded from
https://github.com/ceph/ceph/tree/v0.78, it fails with the following error:


rgw/rgw_civetweb.cc:4:31: fatal error: civetweb/civetweb.h: No such file or 
directory
compilation terminated.
make[3]: *** [rgw/rgw_civetweb.o] Error 1
make[3]: Leaving directory `/home/thanhtv3/ceph/source/ceph-0.78/src'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/thanhtv3/ceph/source/ceph-0.78/src'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/home/thanhtv3/ceph/source/ceph-0.78/src'
make: *** [all-recursive] Error 1


What did I miss?


Best regards,
Thanh Tran___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RadosGW: bad request

2014-04-07 Thread Gandalf Corvotempesta
I'm getting these errors when trying to upload any file:

2014-04-07 14:33:27.084369 7f5268f86700  5 Getting permissions
id=testuser owner=testuser perm=2
2014-04-07 14:33:27.084372 7f5268f86700 10  uid=testuser requested
perm (type)=2, policy perm=2, user_perm_mask=2, acl perm=2
2014-04-07 14:33:27.084377 7f5268f86700  2 req 1:0.020670:s3:PUT
/testbucket/613322b6f4b9ce218633fb8f25ade7b7_16.jpg:put_obj:verifying
op params
2014-04-07 14:33:27.084385 7f5268f86700  2 req 1:0.020677:s3:PUT
/testbucket/613322b6f4b9ce218633fb8f25ade7b7_16.jpg:put_obj:executing
2014-04-07 14:33:41.035313 7f5276bac700  2
RGWDataChangesLog::ChangesRenewThread: start
2014-04-07 14:33:57.094495 7f5268f86700  2 req 1:30.030785:s3:PUT
/testbucket/613322b6f4b9ce218633fb8f25ade7b7_16.jpg:put_obj:http
status=400
2014-04-07 14:33:57.094680 7f5268f86700  1 == req done
req=0xe05530 http_status=400 ==

Any advice? I'm using apache and fastcgi coming from the ceph repos.
The bucket was created successfully with s3cmd.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Active-Active iSCSI cluster

2014-04-07 Thread Walter Huf
There's this article that says that you shouldn't ever do iSCSI multipath to
multiple iSCSI targets on top of the same DRBD volume. Can someone tell me
whether that is a limitation of DRBD, and whether Ceph would perform correctly
in such a situation?
Another obstacle in such an active-active setup would be communicating
state information between the copies of iSCSI target software, such as SCSI
Persistent Reservations.

Can anyone think of any other obstacles to running an active-active iSCSI
cluster on top of Ceph?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW: bad request

2014-04-07 Thread Yehuda Sadeh
On Mon, Apr 7, 2014 at 5:38 AM, Gandalf Corvotempesta
 wrote:
> I'm getting these trying to upload any file:
>
> 2014-04-07 14:33:27.084369 7f5268f86700  5 Getting permissions
> id=testuser owner=testuser perm=2
> 2014-04-07 14:33:27.084372 7f5268f86700 10  uid=testuser requested
> perm (type)=2, policy perm=2, user_perm_mask=2, acl perm=2
> 2014-04-07 14:33:27.084377 7f5268f86700  2 req 1:0.020670:s3:PUT
> /testbucket/613322b6f4b9ce218633fb8f25ade7b7_16.jpg:put_obj:verifying
> op params
> 2014-04-07 14:33:27.084385 7f5268f86700  2 req 1:0.020677:s3:PUT
> /testbucket/613322b6f4b9ce218633fb8f25ade7b7_16.jpg:put_obj:executing
> 2014-04-07 14:33:41.035313 7f5276bac700  2
> RGWDataChangesLog::ChangesRenewThread: start
> 2014-04-07 14:33:57.094495 7f5268f86700  2 req 1:30.030785:s3:PUT
> /testbucket/613322b6f4b9ce218633fb8f25ade7b7_16.jpg:put_obj:http
> status=400
> 2014-04-07 14:33:57.094680 7f5268f86700  1 == req done
> req=0xe05530 http_status=400 ==
>
> Any advice? I'm using apache and fastcgi coming from ceph repos.
> Bucket was created successfully with s3cmd

Try bumping up the logs (debug rgw = 20, debug ms = 1). There's not enough info
here to say much; note that it takes exactly 30 seconds for the
gateway to send the error response, so it may be some timeout. I'd verify
that the correct fastcgi module is running.
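
Roughly (use whatever section your gateway actually runs as, usually something
like [client.radosgw.gateway]):

# in ceph.conf, in the gateway's client section:
#     debug rgw = 20
#     debug ms = 1
# then restart the gateway and watch its log:
/etc/init.d/radosgw restart           # init script name may differ per distro
tail -f /var/log/ceph/radosgw.log     # or wherever your rgw "log file" points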

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy fails to generate keys

2014-04-07 Thread Neil Levine
Is SELinux enabled?

On Mon, Apr 7, 2014 at 12:50 AM, Diedrich Ehlerding
 wrote:
> [monitors do not start properly with ceph-deploy]
> Brian Chandler:
>
>> > thank you for your response, however:
>> >> Including iptables? CentOS/RedHat default to iptables enabled and
>> >> closed.
>> >>
>> >> "iptables -Lvn" to be 100% sure.
>> > hvrrzceph1:~ # iptables -Lvn
>> > iptables: No chain/target/match by that name.
>> > hvrrzceph1:~ #
>> >
>> Ergh, my mistake: iptables -L -v -n
>
>
>
> hvrrzceph1:~ # iptables -L -v -n
> Chain INPUT (policy ACCEPT 8739 packets, 476K bytes)
>  pkts bytes target prot opt in out source
> destination
>
> Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
>  pkts bytes target prot opt in out source
> destination
>
> Chain OUTPUT (policy ACCEPT 6270 packets, 505K bytes)
>  pkts bytes target prot opt in out source
> destination
> hvrrzceph1:~ #
>
> The servers do not run any firewall, and they are connected to the
> same switch. ssh login works over three networks (one to be used as
> admin network, one as public network, and another one as cluster
> network).
>
> Any hint is appreciated ...
>
> Diedrich
> --
> Diedrich Ehlerding, Fujitsu Technology Solutions GmbH,
> FTS CE SC PS&IS W, Hildesheimer Str 25, D-30880 Laatzen
> Fon +49 511 8489-1806, Fax -251806, Mobil +49 173 2464758
> Firmenangaben: http://de.ts.fujitsu.com/imprint.html
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Question about mark_unfound_lost on RGW metadata.

2014-04-07 Thread Craig Lewis

Ceph is telling me that it can't find some data:
2014-04-07 11:15:09.901992 mon.0 [INF] pgmap v5436846: 2592 pgs: 2164 
active+clean, 142 active+remapped+wait_backfill, 150 
active+degraded+wait_backfill, 1 active+recovering+degraded, 2 
active+degraded+backfilling, 133 active+degraded+remapped+wait_backfill; 
15094 GB data, 28749 GB used, 30839 GB / 59588 GB avail; 
3496837/37879443 objects degraded (9.231%); *1/18361235 unfound 
(0.000%)*; 25900 kB/s, 26 objects/s recovering


Querying all the PGs tells me that 11.483 has 1 missing object, named
.dir.us-west-1.51941060.1.


pg query says the recovery state is:
  "might_have_unfound": [
{ "osd": 11,
  "status": "querying"},
{ "osd": 13,
  "status": "already probed"}],

Active OSDs for this PG are [3,13], so osd.13 is the secondary for this PG.
osd.11 does not have the data.  I recently replaced osd.11, and this 
data was unfound before the drive swap.  So it looks like I have no 
choice but to use mark_unfound_lost.



I have some concerns though.  Pool 11 is .rgw.buckets.  I assume from 
the object's name that .dir.us-west-1 is related to replication. us-west-1 
is the master zone, and these errors are occurring in the slave zone 
(us-central-1).


What are the risks of using ceph pg {pgid} mark_unfound_lost revert on 
that particular object?  I'm comfortable losing objects in the slave, I 
can re-upload them to the master zone.  I just want to make sure I'm not 
going to render the slave zone unusable.
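
For completeness, the commands involved are roughly (pgid from above):

ceph pg 11.483 list_missing           # shows the unfound object(s) and versions
ceph pg 11.483 query                  # "might_have_unfound", as quoted above
# and only once nothing can still hold a copy:
ceph pg 11.483 mark_unfound_lost revert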




Thanks for the help.





--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*
Connect with us: Website | Twitter | Facebook | LinkedIn | Blog



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] v0.79 released (firefly release candidate)

2014-04-07 Thread Sage Weil
This release is intended to serve as a release candidate for firefly, 
which will hopefully be v0.80.  No changes are being made to the code base 
at this point except those that fix bugs.  Please test this release if you 
intend to make use of the new erasure-coded pools or cache tiers in 
firefly.

This release fixes a range of bugs found in v0.78 and streamlines the
user experience when creating erasure-coded pools.  There is also a
raft of fixes for the MDS (multi-mds, directory fragmentation, and
large directories).  The main notable new piece of functionality is a
small change to allow radosgw to use an erasure-coded pool for object
data.


Upgrading
---------

* Erasure pools created with v0.78 will no longer function with v0.79.  You
  will need to delete the old pool and create a new one.

* A bug was fixed in the authentication handshake with big-endian
  architectures that prevent authentication between big- and
  little-endian machines in the same cluster.  If you have a cluster
  that consists entirely of big-endian machines, you will need to
  upgrade all daemons and clients and restart.

* The 'ceph.file.layout' and 'ceph.dir.layout' extended attributes are
  no longer included in the listxattr(2) results to prevent problems with
  'cp -a' and similar tools.

* Monitor 'auth' read-only commands now expect the user to have 'rx' caps.
  This is the same behavior that was present in dumpling, but in emperor
  and more recent development releases the 'r' cap was sufficient.  The
  affected commands are::

ceph auth export
ceph auth get
ceph auth get-key
ceph auth print-key
ceph auth list


Notable Changes
---------------

* ceph-conf: stop creating bogus log files (Josh Durgin, Sage Weil)
* common: fix authentication on big-endian architectures (Dan Mick)
* debian: change directory ownership between ceph and ceph-common (Sage Weil)
* init: fix startup ordering/timeout problem with OSDs (Dmitry Smirnov)
* librbd: skip zeroes/holes when copying sparse images (Josh Durgin)
* mds: cope with MDS failure during creation (John Spray)
* mds: fix crash from client sleep/resume (Zheng Yan)
* mds: misc fixes for directory fragments (Zheng Yan)
* mds: misc fixes for larger directories (Zheng Yan)
* mds: misc fixes for multiple MDSs (Zheng Yan)
* mds: remove .ceph directory (John Spray)
* misc coverity fixes, cleanups (Danny Al-Gaaf)
* mon: add erasure profiles and improve erasure pool creation (Loic Dachary)
* mon: 'ceph osd pg-temp ...' and primary-temp commands (Ilya Dryomov)
* mon: fix pool count in 'ceph -s' output (Sage Weil)
* msgr: improve connection error detection between clients and monitors 
  (Greg Farnum, Sage Weil)
* osd: add/fix CPU feature detection for jerasure (Loic Dachary)
* osd: improved scrub checks on clones (Sage Weil, Sam Just)
* osd: many erasure fixes (Sam Just)
* osd: move to jerasure2 library (Loic Dachary)
* osd: new tests for erasure pools (David Zafman)
* osd: reduce scrub lock contention (Guang Yang)
* rgw: allow use of an erasure data pool (Yehuda Sadeh)


Downloading
-----------

* Git at git://github.com/ceph/ceph.git
* Tarball at http://ceph.com/download/ceph-0.79.tar.gz
* For packages, see http://ceph.com/docs/master/install/get-packages
* For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] atomic + asynchr

2014-04-07 Thread Steven Paster
I am using the Ceph C api to create and update Ceph objects and their 
respective omaps and xattributes.  I have two requirements:

  1.  Write operations must be written atomically, such that either all updates
to an object complete or all updates to the object fail. If even one update
fails then the object remains unchanged.
  2.  Simultaneously, I need to be assured that all writes have been persisted 
to disk on at least two of three replicas.

I see that there are methods to read/write atomically, and there are methods 
for asynchronous writes with completion callbacks, but is there a mechanism for 
doing both at the same time?  Am I overlooking anything? Will any of the 
flush_cache calls provide the guarantee I'm looking for?

Steven Paster
Platform Architect
Actiance Corp.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RadosGW Quotas

2014-04-07 Thread Derek Yarnell
Hi,

Most likely Yehuda can speak to this.  There is some documentation on
the way to set quotas for a user in master (src/rgw/rgw_rest_user.cc
line 712), but it seems these are not in the docs yet. I have started to
incorporate them, but I don't see an example of how to
document the JSON body.  Is there a convention that I should try to follow?

The second question relates to the bucket quotas.  In the example above
these are set via the uid=$uid and quota-type parameters.  I would expect
this means it applies to all buckets the user owns (since I
am not required to give a bucket name).  Is it the current design that a
bucket can't have an independent quota?

Thanks,
derek

-- 
Derek T. Yarnell
University of Maryland
Institute for Advanced Computer Studies
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW Quotas

2014-04-07 Thread Yehuda Sadeh
On Mon, Apr 7, 2014 at 4:34 PM, Derek Yarnell  wrote:
> Hi,
>
> Most likely Yehuda can speak to this.  So there is some documentation on
> the way to set quotas for a user in master (src/rgw/rgw_rest_user.cc
> line 712) but these are not in the docs it seems yet. I have started to
> incorporate them but I don't see anything as an example for how to
> document the JSON body.  Is there a convention that I should try to follow?

The geo-replication related api has some requests with JSON body:

http://wiki.ceph.com/Development/RESTful_API_for_DR_%2F%2F_Geo-Replication
>
> Second question relates to the bucket quotas.  In the example above
> these are set via the uid=$uid and quota-type parameters.  This means
> that this applies to all buckets the user owns I would expect (since I
> am not required to give a bucket name).  Is the current design that a
> bucket can't have a independent quota?
>

Buckets can have independent quota, it just cannot be set using this
specific api (which is used to control user info hence the api entry
point there is /admin/user). The bucket specific quota can be set
either through radosgw-admin, or by using the metadata api.
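
With radosgw-admin that looks roughly like this (the uid and limits are just
examples; exact flags may vary a bit by version):

radosgw-admin quota set --uid=johndoe --quota-scope=bucket \
    --max-objects=10000 --max-size=1073741824
radosgw-admin quota enable --uid=johndoe --quota-scope=bucket
radosgw-admin user info --uid=johndoe     # quota settings show up here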

Yehuda
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW Quotas

2014-04-07 Thread John Wilkins
Derek,

I have some. I'll get them by the end of the week at the latest.


On Mon, Apr 7, 2014 at 4:52 PM, Yehuda Sadeh  wrote:

> On Mon, Apr 7, 2014 at 4:34 PM, Derek Yarnell 
> wrote:
> > Hi,
> >
> > Most likely Yehuda can speak to this.  So there is some documentation on
> > the way to set quotas for a user in master (src/rgw/rgw_rest_user.cc
> > line 712) but these are not in the docs it seems yet. I have started to
> > incorporate them but I don't see anything as an example for how to
> > document the JSON body.  Is there a convention that I should try to
> follow?
>
> The geo-replication related api has some requests with JSON body:
>
> http://wiki.ceph.com/Development/RESTful_API_for_DR_%2F%2F_Geo-Replication
> >
> > Second question relates to the bucket quotas.  In the example above
> > these are set via the uid=$uid and quota-type parameters.  This means
> > that this applies to all buckets the user owns I would expect (since I
> > am not required to give a bucket name).  Is the current design that a
> > bucket can't have a independent quota?
> >
>
> Buckets can have independent quota, it just cannot be set using this
> specific api (which is used to control user info hence the api entry
> point there is /admin/user). The bucket specific quota can be set
> either through radosgw-admin, or by using the metadata api.
>
> Yehuda
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OpenStack Survey

2014-04-07 Thread Patrick McGarry
Greetings Cephers!

If you haven't been following news in the OpenStack world it might be
easy to miss that another user survey is being collected to assess
what the OpenStack landscape looks like.  We'd really love it if you
could go tell your OpenStack story, especially since we love to make
sure that people who are using both OpenStack && Ceph together are
accurately represented.

https://www.openstack.org/user-survey/

Keep in mind that submissions close this Friday at 23:00 UTC, so there
isn't tons of time left.  I'll send out another reminder or two, but
don't wait...tell the world how awesome you think OpenStack and Ceph
are together!

In addition to a collection of random strangers on the internet, we'd
love to get a chance to meet you in person.  So, if you are swinging
through the OpenStack Developer Summit in Atlanta this May, please stop
by the Inktank booth and say hi!  Thanks.

https://www.openstack.org/summit/openstack-summit-atlanta-2014/


Best Regards,

Patrick McGarry
Director, Community || Inktank
http://ceph.com  ||  http://inktank.com
@scuttlemonkey || @ceph || @inktank
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW and Object Lifecycle Managment

2014-04-07 Thread Craig Lewis
Does RGW support the S3 Object Lifecycle Management? 
http://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html


I'm not finding much using Google, or the Ceph wiki search.  The best I 
found was https://ceph.com/docs/master/radosgw/swift/tutorial/, which 
explains how to manually create and destroy objects.


Looking at the code, I see that 'lifecycle' is a reserved keyword for 
RGW, but I don't see it doing anything with that reserved keyword.




--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*
Connect with us: Website | Twitter | Facebook | LinkedIn | Blog



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question about mark_unfound_lost on RGW metadata.

2014-04-07 Thread Craig Lewis
The PG with the unfound object has been in the active+recovering+degraded 
state for much longer than usual. Most PGs spend about 20 minutes in 
that state, then complete. This one has been in 
active+recovering+degraded for about 4 hours now.
11.483  8851188521797425590630823082  active+recovering+degraded  2014-04-07 10:31:53.146930  13421'1242575  13855:1647415  [3,13]  [3,13]  7936'1019031  2014-03-24 00:53:42.265828  7936'1019031  2014-03-24 00:53:42.265828


Is this because it can't find the unfound object?  Or is this because I 
set the osd flags noout and nodown?


So far it's not a big deal.  There's plenty of other backfilling and 
recovery that needs to happen.  It just seems strange to me.



*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*
Connect with us: Website | Twitter | Facebook | LinkedIn | Blog



On 4/7/14 14:38, Craig Lewis wrote:

Ceph is telling me that it can't find some data:
2014-04-07 11:15:09.901992 mon.0 [INF] pgmap v5436846: 2592 pgs: 2164 
active+clean, 142 active+remapped+wait_backfill, 150 
active+degraded+wait_backfill, 1 active+recovering+degraded, 2 
active+degraded+backfilling, 133 
active+degraded+remapped+wait_backfill; 15094 GB data, 28749 GB used, 
30839 GB / 59588 GB avail; 3496837/37879443 objects degraded (9.231%); 
*1/18361235 unfound (0.000%)*; 25900 kB/s, 26 objects/s recovering


querying all the PGs tells me that 11.483 has 1 missing object, named 
.dir.us-west-1.51941060.1.


pg query says the recovery state is:
  "might_have_unfound": [
{ "osd": 11,
  "status": "querying"},
{ "osd": 13,
  "status": "already probed"}],

Active OSDs for this PG are [3,13], so osd.13 is the 2ndry for this 
PG.  osd.11 does not have the data.  I recently replaced osd.11, and 
this data was unfound before the drive swap.  So it looks like I have 
no choice but to use mark_unfound_lost.



I have some concerns though.  Pool 11 is .rgw.buckets.  I assume from 
the object's name, .dir.us-west-1 is related to replication. us-west-1 
is the master zone, and these errors are occuring in the slave zone 
(us-central-1).


What are the risks of using ceph pg {pgid} mark_unfound_lost revert on 
that particular object?  I'm comfortable losing objects in the slave, 
I can re-upload them to the master zone.  I just want to make sure I'm 
not going to render the slave zone unusable.




Thanks for the help.





--

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*
Connect with us: Website | Twitter | Facebook | LinkedIn | Blog





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question about mark_unfound_lost on RGW metadata.

2014-04-07 Thread Craig Lewis


Is this because it can't find the unfound object?  Or is this because 
I set osd flag noout and nodown?


Sorry, I meant to say, is it because I set:
root@ceph0c:~# ceph osd dump | grep 'flags'
flags nodown,noout,noscrub,nodeep-scrub
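
If it helps to rule the flags out, clearing them one at a time would just be:

ceph osd unset nodown
ceph osd unset noout
ceph osd dump | grep flags     # confirm what is still set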


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW and Object Lifecycle Managment

2014-04-07 Thread Gregory Farnum
Nope, that's not supported. See
http://ceph.com/docs/master/radosgw/s3/#features-support
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Mon, Apr 7, 2014 at 6:41 PM, Craig Lewis  wrote:
> Does RGW support the S3 Object Lifecycle Management?
> http://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html
>
> I'm not finding much using Google, or the Ceph wiki search.  The best I
> found was https://ceph.com/docs/master/radosgw/swift/tutorial/, which
> explains how manually create and destroy objects.
>
> Looking at the code, I see that 'lifecycle' is a reserved keyword for RGW,
> but I don't see it doing anything with that reserved keyword.
>
>
>
> --
>
> Craig Lewis
> Senior Systems Engineer
> Office +1.714.602.1309
> Email cle...@centraldesktop.com
>
> Central Desktop. Work together in ways you never thought possible.
> Connect with us   Website  |  Twitter  |  Facebook  |  LinkedIn  |  Blog
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com