Hi community, 10 months ago we discovered an issue after removing the cache tier
from a healthy cluster, and started an email thread; as a result, a new
bug was created on the tracker by Samuel Just:
http://tracker.ceph.com/issues/12738
Since then, I've been looking for a good moment to upgrade (after the fix was
Wido den Hollander :
>
>
> On 03-11-15 10:04, Voloshanenko Igor wrote:
> > Wido, also a minor issue with 0.2.0 java-rados
> >
>
> Did you also re-compile CloudStack against the new rados-java? I still
> think it's related to when the Agent starts cleaning up and the
Voloshanenko Igor :
> Wido, it's the main issue. No records at all...
>
>
> So, from last time:
>
>
> 2015-11-02 11:40:33,204 DEBUG [kvm.resource.LibvirtComputingResource]
> (agentRequest-Handler-2:null) Executing: /bin/bash -c free|grep Mem:|awk
> '{print
dteDiskInfo method... but I can't find any
bad code there (((
2015-11-03 10:40 GMT+02:00 Wido den Hollander :
>
>
> On 03-11-15 01:54, Voloshanenko Igor wrote:
> > Thank you, Jason!
> >
> > Any advice for troubleshooting?
> >
> > I'm looking in co
problem is that the RADOS IO
> context is being closed prior to closing the RBD image.
>
> --
>
> Jason Dillaman
>
>
> ----- Original Message -
>
> > From: "Voloshanenko Igor"
> > To: "Ceph Users"
> > Sent: Thursday, October 29
Dear all, can anybody help?
2015-10-30 10:37 GMT+02:00 Voloshanenko Igor :
> It's pain, but not... :(
> We already used your updated lib in dev env... :(
>
> 2015-10-30 10:06 GMT+02:00 Wido den Hollander :
>
>>
>>
>> On 29-10-15 16:38, Voloshanenko Igor
It's pain, but not... :(
We already used your updated lib in dev env... :(
2015-10-30 10:06 GMT+02:00 Wido den Hollander :
>
>
> On 29-10-15 16:38, Voloshanenko Igor wrote:
> > Hi Wido and all community.
> >
> > We caught a very idiotic issue on our CloudStack in
From all we analyzed, it looks like it's this issue:
http://tracker.ceph.com/issues/13045
PR: https://github.com/ceph/ceph/pull/6097
Can anyone help us to confirm this? :)
2015-10-29 23:13 GMT+02:00 Voloshanenko Igor :
> Additional trace:
>
> #0 0x7f30f9891cc9 in __GI_rai
2
#12 0x7f30f995547d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
2015-10-29 17:38 GMT+02:00 Voloshanenko Igor :
> Hi Wido and all community.
>
> We caught a very idiotic issue on our CloudStack installation, which is
> related to Ceph and possibly to the java-rados lib.
Hi Wido and all community.
We caught a very idiotic issue on our CloudStack installation, which is related
to Ceph and possibly to the java-rados lib.
So, we constantly have the agent crashing (which causes a very big problem for
us...).
When the agent crashes, it crashes the JVM. And there are no events in the logs at all.
We enab
Great!
Yes, the behaviour is exactly as I described. So it looks like that's the root cause :)
Thank you, Sam, Ilya!
2015-08-21 21:08 GMT+03:00 Samuel Just :
> I think I found the bug -- need to whiteout the snapset (or decache
> it) upon evict.
>
> http://tracker.ceph.com/issues/12748
> -Sam
>
> On Fri, Aug 21, 2
To be honest, the Samsung 850 PRO is not a 24/7 series drive... it's more of a
desktop+ series, but anyway, the results from these drives are very, very bad in
any scenario acceptable in real life...
Possibly the 845 PRO is better, but we don't want to experiment anymore... So
we chose the S3500 240G. Yes, it's cheape
Exactly as in our case.
Ilya, same for the images on our side. Headers are opened from the hot tier.
On Friday, 21 August 2015, Ilya Dryomov wrote:
> On Fri, Aug 21, 2015 at 2:02 AM, Samuel Just > wrote:
> > What's supposed to happen is that the client transparently directs all
> > requests
${HOST}-${TYPE} weight
1.000/" cm
echo "Compile new CRUSHMAP"
crushtool -c cm -o cm.new
echo "Inject new CRUSHMAP"
ceph osd setcrushmap -i cm.new
#echo "Clean..."
#rm -rf cm cm.new
echo "Unset noout option for CEPH cluster"
ceph osd unset noout
ech
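For context, the script fragment above follows the usual freeze / edit / re-inject cycle for the CRUSH map. A minimal sketch of that whole cycle, under that assumption (file names are the ones used above; the actual weight edit is left as a comment, since the sed line is cut off in the preview):
echo "Set noout option for CEPH cluster"
ceph osd set noout
echo "Dump and decompile current CRUSHMAP"
ceph osd getcrushmap -o cm.bin
crushtool -d cm.bin -o cm
# edit 'cm' here to set the desired weight for the rebuilt OSD/host
# (the original script used sed against the "${HOST}-${TYPE} weight ..." line)
echo "Compile new CRUSHMAP"
crushtool -c cm -o cm.new
echo "Inject new CRUSHMAP"
ceph osd setcrushmap -i cm.new
echo "Unset noout option for CEPH cluster"
ceph osd unset noout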
Will do, Sam!
Thanks in advance for your help!
2015-08-21 2:28 GMT+03:00 Samuel Just :
> Ok, create a ticket with a timeline and all of this information, I'll
> try to look into it more tomorrow.
> -Sam
>
> On Thu, Aug 20, 2015 at 4:25 PM, Voloshanenko Igor
> wrote:
>
As we use journal collocation now (because we want to utilize the
cache layer ((( ), I use ceph-disk to create the new OSDs (with the journal size
changed in ceph.conf). I don't prefer manual work ))
So I created a very simple script to update the journal size.
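The per-disk cycle itself is not spelled out in the thread; a rough sketch of what recreating one OSD with a bigger journal via ceph-disk typically looks like (the OSD id, device and journal size are placeholders, and the 'stop' call assumes an Ubuntu/upstart node):
OSD=12                      # placeholder OSD id
DEV=/dev/sdd                # placeholder data device
ceph osd out $OSD
stop ceph-osd id=$OSD       # Ubuntu upstart; adjust for your init system
ceph osd crush remove osd.$OSD
ceph auth del osd.$OSD
ceph osd rm $OSD
# ceph.conf already carries the new size, e.g. "osd journal size = 10240"
ceph-disk zap $DEV
ceph-disk prepare $DEV      # collocated journal, as described above
# ceph-disk activate (via udev) brings the OSD back up with the new journal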
2015-08-21 2:25 GMT+03:00 Voloshanenko Igor
Exactly
On Friday, 21 August 2015, Samuel Just wrote:
> And you adjusted the journals by removing the osd, recreating it with
> a larger journal, and reinserting it?
> -Sam
>
> On Thu, Aug 20, 2015 at 4:24 PM, Voloshanenko Igor
> > wrote:
> > Right
PM, Samuel Just wrote:
> > Yeah, I'm trying to confirm that the issues did happen in writeback mode.
> > -Sam
> >
> > On Thu, Aug 20, 2015 at 4:21 PM, Voloshanenko Igor
> > wrote:
> >> Right. But issues started...
> >>
> >> 2015-08-21
values). For any new images - no
2015-08-21 2:21 GMT+03:00 Voloshanenko Igor :
> Right. But issues started...
>
> 2015-08-21 2:20 GMT+03:00 Samuel Just :
>
>> But that was still in writeback mode, right?
>> -Sam
>>
>> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko
Right. But issues started...
2015-08-21 2:20 GMT+03:00 Samuel Just :
> But that was still in writeback mode, right?
> -Sam
>
> On Thu, Aug 20, 2015 at 4:18 PM, Voloshanenko Igor
> wrote:
> > We haven't set values for max_bytes / max_objects... and all data
> initi
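For reference, those limits are per-pool settings on the cache tier, and without them the tiering agent has nothing to size its flush/evict decisions against. A hedged example, assuming a cache pool simply named "cache" and example sizes:
# pool name "cache" and the sizes below are assumptions, not from the thread
ceph osd pool set cache target_max_bytes 500000000000
ceph osd pool set cache target_max_objects 1000000
ceph osd pool set cache cache_target_dirty_ratio 0.4
ceph osd pool set cache cache_target_full_ratio 0.8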
arted...
> > -Sam
> >
> > On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
> > wrote:
> >> No, when we started draining the cache, the bad pgs were already in place...
> >> We had a big rebalance (disk by disk, to change the journal size on both
> >> hot/cold layer
On Thu, Aug 20, 2015 at 4:11 PM, Voloshanenko Igor
> wrote:
> > No, when we started draining the cache, the bad pgs were already in place...
> > We had a big rebalance (disk by disk, to change the journal size on both
> > hot/cold layers)... All was OK, but after 2 days scrub errors arrived,
> and 2
king correctly?)
> -Sam
>
> On Thu, Aug 20, 2015 at 4:07 PM, Voloshanenko Igor
> wrote:
> > Good joke )
> >
> > 2015-08-21 2:06 GMT+03:00 Samuel Just :
> >>
> >> Certainly, don't reproduce this with a cluster you care about :).
>
ient or on the osd. Why did
> > you have it in that mode?
> > -Sam
> >
> > On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
> > wrote:
> >> We used the 4.x branch, as we have "very good" Samsung 850 Pro in
> production,
> >> and they don't suppor
pool,
> that's probably where the bug is. Odd. It could also be a bug
> specific to 'forward' mode, either in the client or on the osd. Why did
> you have it in that mode?
> -Sam
>
> On Thu, Aug 20, 2015 at 3:58 PM, Voloshanenko Igor
> wrote:
> > We use
2015 at 3:56 PM, Voloshanenko Igor
> wrote:
> > root@test:~# uname -a
> > Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22
> UTC
> > 2015 x86_64 x86_64 x86_64 GNU/Linux
> >
> > 2015-08-21 1:54 GMT+03:00 Samuel Just :
> >>
> >> Also
ist.
2015-08-21 1:56 GMT+03:00 Voloshanenko Igor :
> root@test:~# uname -a
> Linux ix-s5 4.0.4-040004-generic #201505171336 SMP Sun May 17 17:37:22 UTC
> 2015 x86_64 x86_64 x86_64 GNU/Linux
>
> 2015-08-21 1:54 GMT+03:00 Samuel Just :
>
>> Also, can you include the kernel ver
.for sake of closing the thread.
> >>
> >> On 17 August 2015 at 21:15, Voloshanenko Igor <
> igor.voloshane...@gmail.com>
> >> wrote:
> >>>
> >>> Hi all, can you please help me with unexplained situation...
> >>>
> >>
This was related to the caching layer, which doesn't support snapshotting
> per
> > the docs... for the sake of closing the thread.
> >
> > On 17 August 2015 at 21:15, Voloshanenko Igor <
> igor.voloshane...@gmail.com>
> > wrote:
> >>
> >> Hi all, can you p
two images.
> -Sam
>
> On Thu, Aug 20, 2015 at 3:42 PM, Voloshanenko Igor
> wrote:
> > Sam, I tried to understand which rbd contains these chunks... but no luck. No
> rbd
> > image block names start with this...
> >
> >> Actually, now that I think about it, yo
ke that easy) on
> an object to stdout so you can confirm what's actually there. oftc
> #ceph-devel or the ceph-devel mailing list would be the right place to
> ask questions.
>
> Otherwise, it'll probably get done in the next few weeks.
> -Sam
>
> On Thu
emove the spurious clone from the
> head/snapdir metadata.
>
> Am I right that you haven't actually seen any osd crashes or user
> visible corruption (except possibly on snapshots of those two images)?
> -Sam
>
> On Thu, Aug 20, 2015 at 10:07 AM, Voloshanenko Igor
his in the tracker?
> >> -Sam
> >>
> >> On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor
> >> wrote:
> >> > The issue is that in forward mode fstrim doesn't work properly, and when we
> >> > take a
> >> > snapshot, the data is not properly updated
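For context, 'forward' here is the cache-tier mode; switching it is a one-liner (the pool name below is an assumption):
# "cache" is a hypothetical cache pool name
ceph osd tier cache-mode cache forward
# and back to normal caching:
ceph osd tier cache-mode cache writeback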
Not yet. I will create one.
But according to the mailing lists and Inktank docs, it's expected behaviour when
the cache is enabled
2015-08-20 19:56 GMT+03:00 Samuel Just :
> Is there a bug for this in the tracker?
> -Sam
>
> On Thu, Aug 20, 2015 at 9:54 AM, Voloshanenko Igor
> wrote:
> &
Thu, Aug 20, 2015 at 9:41 AM, Voloshanenko Igor
> wrote:
> > Samuel, we turned off the cache layer a few hours ago...
> > I will post ceph.log in a few minutes.
> >
> > For the snap - we found the issue, it was connected with the cache tier...
> >
> > 2015-08-20 19:23 GMT+03:00 Samuel
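Turning the cache layer off, as mentioned above, is normally done by flushing and detaching the tier; a rough sketch, assuming the base pool is "cold-storage" and the cache pool is called "cache":
# cache pool name "cache" is an assumption; "cold-storage" is the base pool
ceph osd tier cache-mode cache forward
rados -p cache cache-flush-evict-all
ceph osd tier remove-overlay cold-storage
ceph osd tier remove cold-storage cache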
scrub both inconsistent pgs and post the
> ceph.log from before when you started the scrub until after. Also,
> what command are you using to take snapshots?
> -Sam
>
> On Thu, Aug 20, 2015 at 3:59 AM, Voloshanenko Igor
> wrote:
> > Hi Samuel, we tried to fix it in a tricky way.
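The scrub Sam asks for is issued per placement group; a minimal sketch (the pg id is one of the two reported further down in the thread, and the snapshot command is only how RBD snapshots are usually taken, with a placeholder image name):
# scrub one of the inconsistent pgs and watch ceph.log / ceph -w meanwhile
ceph pg deep-scrub 2.490
# RBD snapshots are typically taken like this (image name is a placeholder):
rbd snap create cold-storage/some-image@snap1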
whole ceph.log from the 6 hours before and after the snippet you
> linked above? Are you using cache/tiering? Can you attach the osdmap
> (ceph osd getmap -o )?
> -Sam
>
> On Tue, Aug 18, 2015 at 4:15 AM, Voloshanenko Igor
> wrote:
> > Ceph - 0.94.2
> > It happened duri
'm not sure how that could happen and I'd expect the pg
> repair to handle that but if it's not there's probably something
> wrong; what version of Ceph are you running? Sam, is this something
> you've seen, a new bug, or some kind of config issue?
> -Greg
>
No. This will not help (((
I tried to find the data, but it looks like it either exists with the same
timestamp on all osds or is missing on all osds...
So, I need advice on what to do...
On Tuesday, 18 August 2015, Abhishek L wrote:
>
> Voloshanenko Igor writes:
>
> > Hi Irek,
-- Forwarded message -
From: *Voloshanenko Igor*
Date: Tuesday, 18 August 2015
Subject: Repair inconsistent pgs..
To: Irek Fasikhov
Some additional information (Thanks, Irek, for the questions!)
Pool values:
root@test:~# ceph osd pool get cold-storage size
size: 3
root@test
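The remaining pool values can be read the same way; for example (the cache pool name below is an assumption):
ceph osd pool get cold-storage min_size
ceph osd pool get cold-storage pg_num
ceph osd pool get cache hit_set_type    # "cache" is a hypothetical cache pool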
repair'
> | awk {'print$1'}`;do ceph pg repair $i;done
>
> Best regards, Irek Fasikhov
> Mob.: +79229045757
>
> 2015-08-18 8:27 GMT+03:00 Voloshanenko Igor :
>
>> Hi all, at our production cluster, due to high rebalancing ((( we have 2 pgs
>>
Hi all, at our production cluster, due to high rebalancing ((( we have 2 pgs
in an inconsistent state...
root@temp:~# ceph health detail | grep inc
HEALTH_ERR 2 pgs inconsistent; 18 scrub errors
pg 2.490 is active+clean+inconsistent, acting [56,15,29]
pg 2.c4 is active+clean+inconsistent, acting [56,10,
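Repair is then issued per pg for the two listed above:
# repair the two inconsistent pgs reported by 'ceph health detail'
ceph pg repair 2.490
ceph pg repair 2.c4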
Hi all, can you please help me with an unexplained situation...
All snapshots inside Ceph are broken...
So, as an example, we have a VM template as an rbd inside Ceph.
We can map it and mount it to check that all is OK with it:
root@test:~# rbd map cold-storage/0e23c701-401d-4465-b9b4-c02939d57bb5
/dev/rbd0
root@test
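The check described above would continue roughly like this (the mount point is an assumption, and the image may expose partitions as /dev/rbd0p1 instead):
mount /dev/rbd0 /mnt        # or /dev/rbd0p1 if the template is partitioned
ls /mnt                     # verify the filesystem looks sane
umount /mnt
rbd unmap /dev/rbd0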
ts, and also have a
> > higher
> > than recommended OSD/SSD ratio: 1 SSD per 10-12 OSDs, while the recommended is
> > 1/3 to 1/6.
> >
> > So, as a conclusion, I'd recommend you get a bigger budget and buy
> > durable
> > and fast SSDs for Ceph.
>
).
> Those were very cheap but are out of stock at the moment (here).
> Faster than Intels, cheaper, and slightly different technology (3D V-NAND)
> which IMO makes them superior without needing many tricks to do its job.
>
> Jan
>
> On 13 Aug 2015, at 14:40, Voloshanenko Igor
> use already recommended SSD
>
> Best regards, Irek Fasikhov
> Mob.: +79229045757
>
> 2015-08-13 11:56 GMT+03:00 Voloshanenko Igor
> :
>
>> So, after testing the SSD (I wiped 1 SSD and used it for tests):
>>
>> root@ix-s2:~# sudo fio --filename=/dev/sd
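The fio command line is cut off above; a journal-style sync-write test of this kind usually looks roughly like the following (device, runtime and job name are placeholders, and the run destroys data on the target device):
sudo fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting --name=journal-test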
ormance went from 11MB/s (with Samsung
> SSD) to 30MB/s (without any SSD) on write performance. This is a very small
> cluster.
>
> Pieter
>
> On Aug 12, 2015, at 04:33 PM, Voloshanenko Igor <
> igor.voloshane...@gmail.com> wrote:
>
> Hi all, we have setup CEPH clust
Hi all, we have set up a Ceph cluster with 60 OSDs (2 different types) (5 nodes, 12
disks on each: 10 HDD, 2 SSD).
We also cover this with a custom crushmap with 2 root leaves:
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-100 5.0 root ssd
-102 1.0 host ix-s2-ssd
2 1.
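With two roots like this, each pool is then pointed at its own root through a CRUSH rule; a hedged sketch (rule name, pool name and ruleset id are assumptions):
# create a replicated rule that selects hosts under the 'ssd' root
ceph osd crush rule create-simple ssd-rule ssd host
# point the cache pool at it (pool name and ruleset id are assumptions)
ceph osd pool set cache crush_ruleset 1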