Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch

2013-10-02 Thread Rick Jones

On 10/02/2013 02:14 AM, James Page wrote:



I tcpdump'ed the traffic and I see a lot of duplicate acks, which makes
me suspect some sort of packet fragmentation, but it's got me puzzled.

Anyone have any ideas about how to debug this further?  Or has anyone
seen anything like this before?


Duplicate ACKs can be triggered by missing or out-of-order TCP segments. 
 Presumably that would show-up in the tcpdump trace, though it might be 
easier to see if you run the .pcap file through tcptrace -G.
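
For example - a sketch, assuming iperf on its default port and an
interface name of eth0:

tcpdump -i eth0 -s 0 -w trace.pcap port 5001
tcptrace -G trace.pcap

The -G option emits the full set of graph files (time-sequence and so
on) for viewing with xplot, which makes missing or out-of-order
segments rather easier to spot than raw tcpdump output.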


Iperf may have a similar option, but if there are actual TCP 
retransmissions during the run, netperf can be told to tell you about 
them (when running under Linux):


netperf -H <remote> -t TCP_STREAM -- -o 
throughput,local_transport_retrans,remote_transport_retrans


will give throughput to <remote>

and

netperf -H <remote> -t TCP_MAERTS -- -o 
throughput,local_transport_retrans,remote_transport_retrans


will give throughput from <remote>.  Or you can take snapshots of netstat -s 
output from before and after your iperf run(s) and do the math by hand.
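
A minimal sketch of the netstat -s approach (file names purely
illustrative):

netstat -s > before.txt
iperf -c <remote>
netstat -s > after.txt
grep -i retrans before.txt after.txt

and subtract the before retransmission counters from the after ones.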


rick jones
if the netperf in multiverse isn't new enough to grok the -o option, you 
can grab the top-of-trunk from http://www.netperf.org/svn/netperf2/trunk 
via svn.




Re: [Openstack] Network speed issue

2014-12-16 Thread Rick Jones

On 12/16/2014 09:38 AM, Adrián Norte Fernández wrote:

Disable offloading on the nodes with: ethtool -K interfaceName gro off
gso off tso off

And then try it again


That should only make things better if there was some sort of actual 
functional problem, no?


When diagnosing "network" performance issues I like to get other things 
out of the way - so get rid of filesystems, encryption, etc.  To that 
end I would suggest running a basic network benchmark.  I have a natural 
bias towards netperf of course :)  http://www.netperf.org/ but even 
iperf would suffice.


I've also found (based on some ancient work back in the whatever-the-"D"-
release-was-called time frame) that emulated NICs (in my case the 
realtek) are much slower than the virtio "NIC."  In the case of the 
realtek emulation at least, my recollection was the maximum I got out 
of an instance was on the order of about 250ish Mbit/s as it happens...


happy benchmarking,

rick jones



On 16/12/2014 18:36, "Georgios Dimitrakakis" <gior...@acmac.uoc.gr> wrote:


Hi all!

In my OpenStack installation (Icehouse with nova legacy
networking) the VMs are talking to each other over a 1Gbps network link.

My issue is that although file transfers between physical
(hypervisor) nodes can saturate that link, transfers between VMs
reach much lower speeds, e.g. 30MB/s (approx. 240Mbps).

My tests are performed by scp'ing a large image file (approx. 4GB)
between the nodes and between the VMs.

I have updated my images to use e1000 nic driver but the results
remain the same.

What other limiting factors might there be?

Does it have to do with the disk driver I am using? Does the
filesystem of the hypervisor node play a significant role?

Any ideas on how to approach the saturation of the 1Gbps link?


Best regards,


George

_
Mailing list:
http://lists.openstack.org/__cgi-bin/mailman/listinfo/__openstack
<http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack>
Post to : openstack@lists.openstack.org
<mailto:openstack@lists.openstack.org>
Unsubscribe :
http://lists.openstack.org/__cgi-bin/mailman/listinfo/__openstack
<http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack>



___
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack




___
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack


Re: [Openstack] Network speed issue

2014-12-16 Thread Rick Jones

On 12/16/2014 11:09 AM, Georgios Dimitrakakis wrote:

Changing

gso on
tso on
gro off


got me back to the initial status.


Now it starts at approximately 65-70MB/s for a few seconds,
but then it drops down to 30MB/s


What do you see if you use a "pure" networking benchmark such as netperf 
or iperf?


rick jones



Re: [Openstack] Network speed issue

2014-12-16 Thread Rick Jones

On 12/16/2014 11:37 AM, Adrián Norte Fernández wrote:

Read the names carefully again :)
I was suggesting what I used to do in the past when on a new OpenStack
install I had this problem.


Indeed, I got the names crossed.  Anyway, running netperf is worthwhile 
even in a Neutron environment.  Run it early.  Run it often :)


rick




Re: [Openstack] Network speed issue

2014-12-16 Thread Rick Jones

On 12/16/2014 11:33 AM, Georgios Dimitrakakis wrote:

Rick,

I haven't tried that yet

I'll do it asap and post the results.


Can you recommend any specific tests that I should run with netperf?


I would start with this on the VM from which you are executing the scp:

netperf -t TCP_STREAM -H <IP of the other VM>

If you happen to install from source (preferably top of trunk) I would 
suggest:


./configure --enable-demo

before building the netperf binary and then you can do something like:

netperf -t TCP_STREAM -H <IP of the other VM> -D 1.0 -l <time>

where I would make <time> 10 or 20 seconds longer than it takes before 
the scp starts slowing down.
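
If going the source route, the whole sequence would be something like
this (assuming svn and a toolchain are present):

svn checkout http://www.netperf.org/svn/netperf2/trunk netperf2
cd netperf2
./configure --enable-demo
make && sudo make install

With --enable-demo, the -D 1.0 option emits interim throughput results
roughly once a second, so you can see whether netperf sags at the same
point in time the scp does.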


rick




Regards,


George


On 12/16/2014 11:09 AM, Georgios Dimitrakakis wrote:

Changing

gso on
tso on
gro off


got me back to the initial status.


Now it starts at approximately 65-70MB/s for a few seconds,
but then it drops down to 30MB/s


What do you see if you use a "pure" networking benchmark such as
netperf or iperf?

rick jones



Re: [Openstack] Network speed issue

2014-12-16 Thread Rick Jones

On 12/16/2014 11:28 AM, Adrián Norte Fernández wrote:

I use Neutron so..


I thought you said you were using Nova legacy networking:


In my OpenStack installation (Icehouse with nova legacy networking)


In any event what I am suggesting is you install either netperf or iperf 
into both of the VMs and run a "pure" networking benchmark between them. 
 That way you do not have to worry about anything like the 4GB file 
transfer filling the destination VM's file cache and then getting stuck 
behind slow flushing out to storage, or any concerns with the encryption 
overhead in scp etc. etc.


If you install netperf into the two VMs, I can talk you through the 
process of how to run some of the more "interesting" tests.
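
As a starting point, the basic sequence would be something like this
(assuming an Ubuntu-ish guest; netperf is in multiverse):

# on the receiving VM
sudo apt-get install netperf
netserver                # listens on port 12865 by default

# on the sending VM
netperf -H <IP of the receiving VM> -t TCP_STREAM -l 30

Mind the security groups - both the control connection (port 12865)
and the data connection need to be allowed through.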


rick



On 16/12/2014 20:27, "Rick Jones" <rick.jon...@hp.com> wrote:

On 12/16/2014 11:09 AM, Georgios Dimitrakakis wrote:

Changing

gso on
tso on
gro off


got me back to the initial status.


Now it starts at approximately 65-70MB/s for a few
seconds,
but then it drops down to 30MB/s


What do you see if you use a "pure" networking benchmark such as
netperf or iperf?

rick jones

_
Mailing list:
http://lists.openstack.org/__cgi-bin/mailman/listinfo/__openstack
<http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack>
Post to : openstack@lists.openstack.org
<mailto:openstack@lists.openstack.org>
Unsubscribe :
http://lists.openstack.org/__cgi-bin/mailman/listinfo/__openstack
<http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack>






Re: [Openstack] Network speed issue

2014-12-16 Thread Rick Jones

Both results show a really high bandwidth but I can't get that speed for
file transfers.


Well, at least we can more or less exonerate networking :)  Though 
frankly I've always felt that virtio was better than an emulated NIC.


How much memory do you have in the VM receiving the 4 GB file being 
transferred?  One hypothesis, and it is only a hypothesis, is that the 
reason the file transfer goes quicker at the beginning (if I recall 
correctly) is the receiving VM is still just writing to filecache within 
the VM itself.  Once the number of dirty pages in the filecache gets to 
some limit (which I can never remember) a flushing daemon will kick-in 
and start pushing pages out to "disc" (well, what the VM thinks of as 
its disc, whatever you've setup there).  At that point there may still 
be something of a "race" between the arriving bytes of the file and the 
flushing, and if there ends-up being enough memory consumed by the 
filecache, writes to the file will end-up getting blocked, waiting for 
space to free-up.  And that won't happen any faster than the flusher can 
get them out.
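
(The limits in question are, I believe, the vm.dirty_* knobs -
something like:

sysctl vm.dirty_background_ratio vm.dirty_ratio

will show the percentages at which background flushing kicks-in and at
which writers start getting blocked, respectively.)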


And then there can be the similar thing happening between the VM and the 
host, where in this part we can think of the VM as being akin to the scp 
receiving process.


happy benchmarking,

rick jones



Re: [Openstack] Image upload times out ...

2015-01-27 Thread Rick Jones

On 01/27/2015 11:58 AM, Azher Mughal wrote:

Hi,

I just installed OpenStack Juno. Under Images, trying to create a new
Image by uploading from the local system a CentOS 7 generic cloud image
(about 943MB). However it takes hours and nothing happens.

Is there a way to upload the image from the shell ?


I would think the glance CLI - glance image-create - for that?
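
Something along these lines perhaps - names and formats below are just
illustrative, match them to your image:

glance image-create --name "centos-7-cloud" --disk-format qcow2 \
 --container-format bare --progress --file CentOS-7-GenericCloud.qcow2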

rick jones




Re: [Openstack] ARP Caching

2015-03-13 Thread Rick Jones

On 03/13/2015 04:55 PM, Georgios Dimitrakakis wrote:

The value is 10

Do you believe that it should be bigger?


These accesses of instances which are being delayed by 20 minutes - are 
they delayed for other instances in the subnet, or just for accesses 
from outside the subnet (ie through a real router)?


There may be some rather "conservative" routers out there which might 
not accept gratuitous ARPs, considering it more secure to ARP for those 
IPs explicitly itself.


rick jones




Re: [Openstack] ARP Caching

2015-03-14 Thread Rick Jones

On 03/13/2015 06:25 PM, Georgios Dimitrakakis wrote:

If I do an :

arping -U -I eth0 x.x.x.x

where x.x.x.x is the IP address.

I can almost immediately access them outside of the subnet!


I had forgotten that.  Still, is that the very same set of options the 
OpenStack code uses?



Do you mean that OpenStack is sending a gratuitous ARP for all of them
and the router is ignoring them unless it is for a specific IP address?

If this is the case is there anything I can do?


I will second the suggestion of getting a packet trace to see just what 
sort of ARP traffic the compute node(s) send and then compare that with 
the documentation for your router.


rick



Regards,

George



On 03/13/2015 04:55 PM, Georgios Dimitrakakis wrote:

The value is 10

Do you believe that it should be bigger?


These accesses of instances which are being delayed by 20 minutes -
are they delayed for other instances in the subnet, or just for
accesses from outside the subnet (ie through a real router)?

There may be some rather "conservative" routers out there which might
not accept gratuitous ARPs, considering it more secure to ARP for
those IPs explicitly itself.

rick jones




Re: [Openstack] ARP Caching

2015-03-14 Thread Rick Jones

On 03/14/2015 03:32 PM, Georgios Dimitrakakis wrote:

Hello again Rick!



On 03/13/2015 06:25 PM, Georgios Dimitrakakis wrote:

If I do an :

arping -U -I eth0 x.x.x.x

where x.x.x.x is the IP address.

I can almost immediately access them outside of the subnet!


I had forgotten that.  Still, is that the very same set of options the
OpenStack code uses?



I am not aware of that and I would like to know!
What does OpenStack do and how often does it send the gratuitous ARP
request?


I don't recall off the top of my head (perhaps someone else does) - and 
my corner of the world is Neutron rather than Nova networking.  If there 
is much logging enabled I suspect you could see the commands in the 
logs.  Certainly Neutron is very "chatty" when it comes to logging things.









Do you mean that OpenStack is sending a gratuitous ARP for all
and the router is ignoring them unless it is for a specific IP address?

If this is the case is there anything I can do?


I will second the suggestion of getting a packet trace to see just
what sort of ARP traffic the compute node(s) send and then compare
that with the documentation for your router.

rick




I will try to see if I can get anything, but the problem is that the
datacenter hosting the facility is in Japan and there is a huge gap
communicating with them (actually there is no communication).

If I could only find out what the router accepts, that would be very nice :-)


Nothing beats being able to see the blinking lights :)

rick




Re: [Openstack] ARP Caching

2015-03-16 Thread Rick Jones

On 03/16/2015 01:13 PM, Georgios Dimitrakakis wrote:


Hi again! I've used tcpdump to capture the ARP requests.

This is what I get for ARP requests coming from "send_arp_for_ha" by
OpenStack:

04:56:25.995860 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
15.12.11.34 tell 15.12.11.226, length 46
04:56:25.995894 ARP, Ethernet (len 6), IPv4 (len 4), Reply 15.12.11.34
is-at 70:e2:84:0b:59:a0 (oui Unknown), length 28
04:56:31.889363 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
15.12.11.228 tell 15.12.11.227, length 46
04:56:32.581746 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
15.12.11.228 tell 15.12.11.227, length 46
04:56:33.482519 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
15.12.11.228 tell 15.12.11.227, length 46
04:56:34.382775 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
15.12.11.228 tell 15.12.11.227, length 46
04:56:35.082594 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
15.12.11.228 tell 15.12.11.227, length 46
04:56:47.121268 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
15.12.11.240 tell 15.12.11.226, length 46
04:56:47.699135 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
15.12.11.34 tell 15.12.11.227, length 46
04:56:47.699162 ARP, Ethernet (len 6), IPv4 (len 4), Reply 15.12.11.34
is-at 70:e2:84:0b:59:a0 (oui Unknown), length 28


I'm guessing none of that was broadcast.  Which could be a large 
difference between that and what you did manually.


You didn't by any chance happen to capture the packets to a binary file 
so you can re-read the .pcap with the -e option (show ethernet info) and 
perhaps -v ?
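
Something like:

tcpdump -i eth0 -w arp.pcap arp

while the addresses are being assigned, and then:

tcpdump -e -v -r arp.pcap

to see whether the destination MAC is the broadcast address or a
unicast one.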



rick jones


If I manually try with the


"arping -c 10 -U -I eth0 15.12.11.34" command, tcpdump logs the following:


04:57:15.907743 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
15.12.11.34 (Broadcast) tell 15.12.11.34, length 28
04:57:16.907843 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
15.12.11.34 (Broadcast) tell 15.12.11.34, length 28
04:57:17.907943 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
15.12.11.34 (Broadcast) tell 15.12.11.34, length 28
04:57:18.908068 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
15.12.11.34 (Broadcast) tell 15.12.11.34, length 28
04:57:19.908165 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
15.12.11.34 (Broadcast) tell 15.12.11.34, length 28
04:57:20.908288 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
15.12.11.34 (Broadcast) tell 15.12.11.34, length 28
04:57:21.908375 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
15.12.11.34 (Broadcast) tell 15.12.11.34, length 28
04:57:22.908551 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
15.12.11.34 (Broadcast) tell 15.12.11.34, length 28
04:57:23.908652 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
15.12.11.34 (Broadcast) tell 15.12.11.34, length 28
04:57:24.908846 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
15.12.11.34 (Broadcast) tell 15.12.11.34, length 28




The IP addresses ending in *.226 and *.227 are, according to the
datacenter, spare gateway addresses.


Is it possible that the OpenStack ARP requests are just there to verify
whether the IP address is known and by which machine,
rather than to send update requests???



Kind regards,


George





I will try to see if I can get anything from the logs.
If someone else can come to a better suggestion I am all ears.

Not having full access to the DataCenter and the underlying equipment
has caused me enough headaches so far :-(


Best,


George



On 03/14/2015 03:32 PM, Georgios Dimitrakakis wrote:

Hello again Rick!



On 03/13/2015 06:25 PM, Georgios Dimitrakakis wrote:

If I do an :

arping -U -I eth0 x.x.x.x

where x.x.x.x is the IP address.

I can almost immediately access them outside of the subnet!


I had forgotten that.  Still, is that the very same set of options the
OpenStack code uses?



I am not aware of that and I would like to know!
What does OpenStack do and how often does it send the gratuitous ARP
request?


I don't recall off the top of my head (perhaps someone else does) -
and my corner of the world is Neutron rather than Nova networking. If
there is much logging enabled I suspect you could see the commands in
the logs.  Certainly Neutron is very "chatty" when it comes to logging
things.








Do you mean that OpenStack is sending a gratuitous ARP for all
and the router is ignoring them unless it is for a specific IP
address?

If this is the case is there anything I can do?


I will second the suggestion of getting a packet trace to see just
what sort of ARP traffic the compute node(s) send and then compare
that with the documentation for your router.

rick




I will try to see if I can get anything, but the problem is that the
datacenter hosting the facility is in Japan and there is a huge gap
communicating with them (actually there is no communication).

If I could only find out what the router accepts, that would be very
nice :-)


Nothing beats being able to see the blinking lights :)

Re: [Openstack] Disk performances much slower in Centos 6.7 guests than on ubuntu 14.04 guests

2015-08-20 Thread Rick Jones

On 08/20/2015 10:39 AM, J-P Methot wrote:

Hi,

I've made a custom centos 6.7 image on a proxmox with kvm hypervisor
with the goal of using it with openstack. I setup all the necessary
cloud-init packages that are needed by openstack. Everything seems to
work fine on openstack, except that the drive speed in fio is about 50%
of what I can achieve on other images.

Basically, on a ubuntu 14.04 with the following FIO command, I can reach
about 1 GB/sec :

fio --name=testfio --filename=testfio --bs=4m --rw=write --size=5g

However, on my centos 6.7 image, I will only reach about 400 MB/sec. Why
is there such a huge discrepancy between both OS? I've always used
virtio, both on the proxmox hypervisor and on openstack.



I cannot speak to disc performance, but I can say there is a large 
difference in network performance between an instance running CentOS7 
with its 3.10 kernel, and an instance running either 14.04 (3.13) or 
15.04 (3.19).  There were virtio_net driver changes around buffer 
handling which appear to be the cause.  Perhaps something similar 
happened for the disc I/O path and what is likely the even older kernel 
in CentOS6?


happy benchmarking,

rick jones



Re: [Openstack] Connection between VMS

2013-10-09 Thread Rick Jones

On 10/09/2013 05:32 PM, Guilherme Russi wrote:

Hello guys,

  I have some VMs and I'd like to connect them through their name, for
example, my VMs are named cloud-vm01 to cloud-vmn but I can't ssh from
cloud-vm01 in cloud-vm02 doing "ssh user@cloud-vm01".
  How can I workaround it?


When you say "can't ssh" can you be a bit more explicit?  What sort of 
error message do you get when you try to ssh?  The answer to that will 
probably guide responses.


rick jones



Re: [Openstack] Connection between VMS

2013-10-10 Thread Rick Jones

On 10/09/2013 06:55 PM, Guilherme Russi wrote:

Hello Rick,

Here is the command:

ubuntu@small-vm02:~$ ssh ubuntu@small-vm03
ssh: Could not resolve hostname small-vm03: Name or service not known

My point is, I have my cloud-vm01 with IP 10.5.5.3 and I want ssh to my
cloud-vm02 with IP 10.5.5.4, but I can't simply do "ssh ubuntu@10.5.5.4"
because the IP 10.5.5.4 can be attached to my
cloud-vm03, for example, so, I want to know if there's a way to ssh
using "ssh ubuntu@cloud-vm02"


Someone else will have to speak to the status of any DNSaaS available in 
OpenStack or your cloud provider.


In the realm of stone knives and bear skins, you could create your 
instances with explicitly assigned private IPs (I'm ass-u-me-ing those 
are private IPs).  If you were consistent about creating your instances 
with explicitly selected private IPs, you could create a static 
/etc/hosts file to place on all your instances to map from hostname to 
IP.  Then you would be able to use names to access them.
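
A sketch, with hypothetical names and the addresses from your example:

# /etc/hosts pushed to every instance
10.5.5.3   cloud-vm01
10.5.5.4   cloud-vm02
10.5.5.5   cloud-vm03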


rick jones




Regards.


2013/10/9 Rick Jones <rick.jon...@hp.com>

On 10/09/2013 05:32 PM, Guilherme Russi wrote:

Hello guys,

   I have some VMs and I'd like to connect them through their
name, for
example, my VMs are named cloud-vm01 to cloud-vmn but I can't
ssh from
cloud-vm01 in cloud-vm02 doing "ssh user@cloud-vm01".
   How can I workaround it?


When you say "can't ssh" can you be a bit more explicit?  What sort
of error message do you get when you try to ssh?  The answer to that
will probably guide responses.

rick jones







Re: [Openstack] Very slow connectivity from within tenant network - GRE

2013-10-22 Thread Rick Jones

On 10/22/2013 01:32 AM, Martinx - ジェームズ wrote:

Stackers,

I'm trying to put my Havana into production and I'm facing a very
strange problem.

The Internet connectivity from tenant's subnet is very, very slow. It is
useless in fact... I can not even use "apt-get update" from a Instance.

The following command works (apt update from the tenant namespace):

---
root@net-node-1:~# ip netns exec qrouter-X aptitude update
---

But not from the tenant subnet...

I'm following this topology:

http://docs.openstack.org/trunk/install-guide/install/apt/content/section_use-cases-tenant-router.html

Already tried to change MTUs (via DHCP agent)... Nothing had fixed this
weird issue.

Any thoughts?!

Right now, my "aptitude safe-upgrade" will take 2 days to download
60MB... During this network outages, even the SSH session stops
responding for a few seconds...

Everything else seems to be working as expected, as for example, DHCP,
Floating IPs, Security Groups...

Sometimes, even the first ssh connection to the Instance Floating IP,
have a lag.


It is but a guess, but I wonder if, even with changing MTUs (to what 
values?) you may still be experiencing a PathMTU+ICMP blackhole problem 
accessing nodes on the Internet.  Can you access something that is a bit 
"closer" but still outside your stack so you have a shot at looking at 
netstat statistics on the sender and/or get packet traces on the sender?


You could still try taking packet traces at the instance or perhaps the 
namespace and try to discern packet losses at the receiving side, though 
it can be a bit more difficult.
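
One quick probe for a PMTU/ICMP blackhole from an instance (assuming a
1500 byte MTU) would be something like:

ping -M do -s 1472 <destination>

-M do sets the DF bit, and 1472 bytes of payload plus 28 bytes of
ICMP and IP headers makes a full 1500 byte datagram.  If that
blackholes while smaller sizes get through, there is your answer.
tracepath <destination> can also report the effective path MTU.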


rick jones




Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch

2013-10-23 Thread Rick Jones

On 10/23/2013 05:40 PM, Aaron Rosen wrote:

I'm curious if you can do the following tests to help pinpoint the
bottleneck:

Run iperf or netperf between:
two instances on the same hypervisor - this will determine if it's a
virtualization driver issue if the performance is bad.
two instances on different hypervisors.
one instance to the namespace of the l3 agent.


If you happen to run netperf, I would suggest something like:

netperf -H <remote instance IP> -t TCP_STREAM -l 30 -- -m 64K -o 
throughput,local_transport_retrans


If you need data flowing the other direction, then I would suggest:

netperf -H <remote instance IP> -t TCP_MAERTS -l 30 -- -m ,64K -o 
throughput,remote_transport_retrans



You could add ",transport_mss" to those lists after the -o option if you 
want.


What you will get is throughput (in 10^6 bits/s) and the number of TCP 
retransmissions for the data connection (assuming the OS running in the 
instances is Linux).  Netperf will present 64KB of data to the transport 
in each send call, and will run for 30 seconds.  The socket buffer sizes 
will be at their defaults - which under linux means they will autotune.


happy benchmarking,

rick jones

For extra credit :) you can run:

netperf -t TCP_RR -H <remote instance IP> -l 30

if you are curious about latency.



Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch

2013-10-25 Thread Rick Jones

On 10/25/2013 08:19 AM, Martinx - ジェームズ wrote:

I think I can say... "YAY!!" :-D

With "LibvirtOpenVswitchDriver" my internal communication is the double
now! It goes from ~200 (with LibvirtHybridOVSBridgeDriver) to
*_400Mbit/s_* (with LibvirtOpenVswitchDriver)! Still far from 1Gbit/s
(my physical path limit) but, more acceptable now.

The command "ethtool -K eth1 gro off" still makes no difference.


Does GRO happen if there isn't RX CKO on the NIC?  Can your NIC 
peer-into a GRE tunnel (?) to do CKO on the encapsulated traffic?



So, there is only 1 remain problem, when traffic pass trough L3 /
Namespace, it is still useless. Even the SSH connection into my
Instances, via its Floating IPs, is slow as hell, sometimes it just
stops responding for a few seconds, and becomes online again
"out-of-nothing"...

I just detect a weird "behavior", when I run "apt-get update" from
instance-1, it is slow as I said plus, its ssh connection (where I'm
running apt-get update), stops responding right after I run "apt-get
update" AND, _all my others ssh connections also stops working too!_ For
a few seconds... This means that when I run "apt-get update" from within
instance-1, the SSH session of instance-2 is affected too!! There is
something pretty bad going on at L3 / Namespace.

BTW, do you think that a ~400MBit/sec intra-vm-communication (GRE
tunnel) on top of a 1Gbit ethernet is acceptable?! It is still less than
a half...


I would suggest checking for individual CPUs maxing-out during the 400 
Mbit/s transfers.
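
Something like:

mpstat -P ALL 1

(from the sysstat package) on each host during the transfer would show
whether a single CPU is pegged - perhaps in si (softirq) time - while
the overall average still looks comfortable.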


rick jones



Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch

2013-10-25 Thread Rick Jones
...the encapsulation protocol.  Unless the NIC knows about the encapsulation
protocol, all the NIC knows it has is some slightly alien packet.  It
will probably know it is IP, but it won't know more than that.

It could, perhaps, simply compute an Internet Checksum across the entire
IP datagram and leave it to the driver to fix-up.  It could simply punt
and not perform any CKO at all.  But CKO is the foundation of the
stateless offloads.  So, certainly no LRO and (I think but could be
wrong) no GRO.  (At least not until the Linux stack learns how to look
beyond the encapsulation headers.)

Similarly, consider the outbound path.  We could change the constants we
tell the NIC for doing CKO perhaps, but unless it knows about the
encapsulation protocol, we cannot ask it to do the TCP segmentation of
TSO - it would have to start replicating not only the TCP and IP
headers, but also the headers of the encapsulation protocol.  So, there
goes TSO.

In essence, using an encapsulation protocol takes us all the way back to
the days of 100BT in so far as stateless offloads are concerned.
Perhaps to the early days of 1000BT.

We do have a bit more CPU grunt these days,  but for the last several
years that has come primarily in the form of more cores per processor,
not in the form of processors with higher and higher frequencies.  In
broad handwaving terms, single-threaded performance is not growing all
that much.  If at all.

That is why we have things like multiple queues per NIC port now and
Receive Side Scaling (RSS) or Receive Packet Scaling/Receive Flow
Scaling in Linux (or Inbound Packet Scheduling/Thread Optimized Packet
Scheduling in HP-UX etc etc).  RSS works by having the NIC compute a
hash over selected headers of the arriving packet - perhaps the source
and destination MAC addresses, perhaps the source and destination IP
addresses, and perhaps the source and destination TCP ports.  But now
the arrving traffic is all wrapped up in this encapsulation protocol
that the NIC might not know about.  Over what should the NIC compute the
hash with which to pick the queue that then picks the CPU to interrupt?
 It may just punt and send all the traffic up one queue.
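
(On Linux you can get a feel for that with ethtool - assuming a
multiple-queue NIC:

ethtool -l eth0     # how many queues the NIC has
ethtool -x eth0     # the RSS indirection table

and watching /proc/interrupts during a transfer will show whether
everything is landing in one queue.)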

There are similar sorts of hashes being computed at either end of a
bond/aggregate/trunk.  And the switches or bonding drivers making those
calculations may not know about the encapsulation protocol, so they may
not be able to spread traffic across multiple links.   The information
they used to use is now hidden from them by the encapsulation protocol.

That then is what I was getting at when talking about NICs peering into GRE.

rick jones
All I want for Christmas is a 32 bit VLAN ID and NICs and switches which
understand it... :)



Re: [Openstack] Directional network performance issues with Neutron + OpenvSwitch

2013-10-25 Thread Rick Jones

On 10/25/2013 02:37 PM, Martinx - ジェームズ wrote:

WOW!! Thank you for your time Rick! Awesome answer!!=D

I'll do this tests (with ethtool GRO / CKO) tonight but, do you think
that this is the main root of the problem?!


I mean, I'm seeing two distinct problems here:

1- Slow connectivity to the External network plus SSH lags all over the
cloud (everything that pass trough L3 / Namespace is problematic), and;

2- Communication between two Instances on different hypervisors (i.e.
maybe it is related to this GRO / CKO thing).


So, two different problems, right?!


One or two problems I cannot say.  Certainly if one got the benefit of 
stateless offloads in one direction and not the other, one could see 
different performance limits in each direction.


All I can really say is I liked it better when we were called Quantum, 
because then I could refer to it as "Spooky networking at a distance." 
 Sadly, describing Neutron as "Networking with no inherent charge" 
doesn't work as well :)


rick jones




Re: [Openstack] MTU 1400? No thanks! - VXLAN users not affected!

2013-10-28 Thread Rick Jones

On 10/27/2013 11:59 AM, Martinx - ジェームズ wrote:

Guys,

I am not detecting any problems related to "MTU = 1500" when using VXLAN!

It is easy to reproduce the "GRE MTU problem", when using GRE tunnels
with MTU = 1500, from a Instance, it is impossible to use RubyGems
(Ubuntu 12.04 Instance), for example, "gem install bundler" (neither
"bundle install" your RoR App) doesn't work with MTU 1500 on top of a
GRE tunnel, with MTU 1400 is fine.

But, now with VXLAN, I'm using MTU 1500 again without any problems!

Honestly, I don't know for sure if RubyGems changed too or, if it is
related to VXLAN being a bit better than GRE, anyway, I'll keep my
Havana cloud running with MTU 1500 from now on.


I'm not certain, but if you dig a little - either with a tcpdump trace in 
one of your instances, or with netperf -t TCP_STREAM -H <remote> -- -o 
transport_mss - you may find that in your VXLAN case you are automagically 
getting a TCP MSS that does not result in fragmentation/etc.


rick




Re: [Openstack] Auto assign Floating IP

2013-12-10 Thread Rick Jones

On 12/10/2013 12:29 AM, Jitendra Kumar Bhaskar wrote:

I am doing d same but I am looking to attach floating IP at boot time.
And not getting where I need to change ?


It would not be "automagic" but rather than have nova allocate the port 
out of neutron, you can instead allocate the port yourself explicitly 
via neutron, then pass that port id through the floatingip-create and 
then boot the instance with that port id rather than with the network id.
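
With Havana-era CLIs, the sequence would look something like this
(UUIDs and names hypothetical):

neutron port-create <tenant-net>
neutron floatingip-create --port-id <port-uuid> <external-net>
nova boot --flavor <flavor> --image <image> \
 --nic port-id=<port-uuid> <instance-name>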


Then I suspect the chances are reasonably good that as the guest OS 
boots in the instance, the floater will be associated.


rick jones
agrees though that "full automagic" would be nice.



Regards,
Jitendra Bhaskar






On Tue, Dec 10, 2013 at 12:45 PM, 郭龙仓 <guolongcang.w...@gmail.com> wrote:

floating ip is implemented through NAT, so once your instance was
assigned an internal ip successfully, then you can assign a floating
ip to it.  But, in order to assign a floating IP to an instance at boot
time, maybe you need to modify some code.


2013/12/9 Jitendra Kumar Bhaskar <jitendr...@pramati.com>

Hi Stackers

Is it possible to assign a floating IP to an instance at boot time in
Havana?

Regards,
Jitendra Bhaskar







Re: [Openstack] [Swift] Swift on RHEL

2013-12-10 Thread Rick Jones

On 12/10/2013 03:23 PM, Kotwani, Mukul wrote:

A new piece of data..

We used the disable_fallocate configurable on Ubuntu, and the numbers do 
reduce, but they are nowhere near the numbers for Redhat (5.8).

As an example, for a specific test:
PUT for Ubuntu, default parameters: ~140 ops/sec
PUT for Ubuntu, disable_fallocate=true: ~100 ops/sec
PUT for Redhat 5.8, default parameters: ~15 ops/sec

Has anyone seen this? We are not able to find anything that can
explain this kind of discrepancy. FYI, this is the same hardware (we
are switching between Ubuntu and RH, to keep things the same) and
software (Swift and xfs, same versions for both), same ring
configuration, same test scripts. It seems like Ubuntu is far ahead,
and we don't have any root cause ATM.  PUTs are written through to the
disk and not cached, so I am not sure what to attribute this to.


Are RH 5.8 (2.6.32 with assorted backports) and what you later say is
Ubuntu 13.04 (3.8 kernel?) using the same I/O scheduler in their drivers?
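
(Something like:

cat /sys/block/sda/queue/scheduler

on each - substituting the actual device for sda - will show the
active scheduler in brackets.)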

rick jones




Has anyone run Swift on RH and done comparisons with Ubuntu?

Any help/pointers would be great!

Thanks,
Mukul


-Original Message-
From: Kotwani, Mukul
Sent: Thursday, December 05, 2013 6:44 PM
To: Pete Zaitcev
Cc: OpenStack Mailing List
Subject: Re: [Openstack] [Swift] Swift on RHEL

Thanks Pete!

We were using Grizzly, which is over a release old, with Folsom Keystone for 
RHEL. So not recent at all. Is there something that would be missing in RHEL5.x 
which would cause performance issues? I saw some references to fallocate, but 
not much beyond that. Is this something you would expect to see?

Which was the first RHEL release Swift was supported on officially?  I tried to 
find a support matrix or supportable platforms but could not find anything.

Mukul

-Original Message-
From: Pete Zaitcev [mailto:zait...@redhat.com]
Sent: Thursday, December 05, 2013 4:36 PM
To: Kotwani, Mukul
Cc: OpenStack Mailing List
Subject: Re: [Openstack] [Swift] Swift on RHEL

On Thu, 5 Dec 2013 21:19:49 +
"Kotwani, Mukul"  wrote:


Has anyone used and/or deployed RHEL (5.8) with Swift?
I was also looking for a "Supported platforms" for Swift, and I could
not find it.


I don't think an RDO for RHEL 5.x ever existed. First packages were built a 
year after RHEL 6 GA shipped. The oldest build in Koji is 
openstack-swift-1.0.2-5.fc15 (a community build by Silas), and the oldest RHOS 
build is openstack-swift-1.4.8-2.el6 from 2012.

Frankly I'm surprised you managed to get it running at all. (Which Swift 
release is that, BTW? We even require PBR nowadays.)

I dimly remember bothering with XFS for RHEL 5, but it was so long ago that I 
cannot even remember if I got that cluster to do anything useful.

-- Pete



Re: [Openstack] [Swift] Swift on RHEL

2013-12-10 Thread Rick Jones

On 12/10/2013 04:12 PM, Rick Jones wrote:

On 12/10/2013 03:23 PM, Kotwani, Mukul wrote:

A new piece of data..

We used the disable_fallocate configurable on Ubuntu, and the numbers
do reduce, but they are nowhere near the numbers for Redhat (5.8).

As an example, for a specific test:
PUT for Ubuntu, default parameters: ~140 ops/sec
PUT for Ubuntu, disable_fallocate=true: ~100 ops/sec
PUT for Redhat 5.8, default parameters: ~15 ops/sec

Has anyone seen this? We are not able to find anything that can
explain this kind of discrepancy. FYI, this is the same hardware (we
are switching between Ubuntu and RH, to keep things the same) and
software (Swift and xfs, same versions for both), same ring
configuration, same test scripts. It seems like Ubuntu is far ahead,
and we don't have any root cause ATM.  PUTs are written through to the
disk and not cached, so I am not sure what to attribute this to.


Are RH 5.8 (2.6.32 with assorted backports)


Or is that 2.6.18 and backports?  I may have been thinking of RHEL6.

rick jones


and what you later say is
Ubuntu 13.04 (3.8 kernel?) using the same I/O scheduler in their drivers?

rick jones




Has anyone run Swift on RH and done comparisons with Ubuntu?

Any help/pointers would be great!

Thanks,
Mukul


-Original Message-
From: Kotwani, Mukul
Sent: Thursday, December 05, 2013 6:44 PM
To: Pete Zaitcev
Cc: OpenStack Mailing List
Subject: Re: [Openstack] [Swift] Swift on RHEL

Thanks Pete!

We were using Grizzly, which is over a release old, with Folsom
Keystone for RHEL. So not recent at all. Is there something that would
be missing in RHEL5.x which would cause performance issues? I saw some
references to fallocate, but not much beyond that. Is this something
you would expect to see?

Which was the first RHEL release Swift was supported on officially?  I
tried to find a support matrix or supportable platforms but could not
find anything.

Mukul

-Original Message-
From: Pete Zaitcev [mailto:zait...@redhat.com]
Sent: Thursday, December 05, 2013 4:36 PM
To: Kotwani, Mukul
Cc: OpenStack Mailing List
Subject: Re: [Openstack] [Swift] Swift on RHEL

On Thu, 5 Dec 2013 21:19:49 +
"Kotwani, Mukul"  wrote:


Has anyone used and/or deployed RHEL (5.8) with Swift?
I was also looking for a "Supported platforms" for Swift, and I could
not find it.


I don't think an RDO for RHEL 5.x ever existed. First packages were
built a year after RHEL 6 GA shipped. The oldest build in Koji is
openstack-swift-1.0.2-5.fc15 (a community build by Silas), and the
oldest RHOS build is openstack-swift-1.4.8-2.el6 from 2012.

Frankly I'm surprised you managed to get it running at all. (Which
Swift release is that, BTW? We even require PBR nowadays.)

I dimly remember bothering with XFS for RHEL 5, but it was so long ago
that I cannot even remember if I got that cluster to do anything useful.

-- Pete



Re: [Openstack] disassociation of floating IPs from instance remove them from tenant

2013-12-16 Thread Rick Jones

On 12/16/2013 06:35 AM, George Shuklin wrote:

Good day.

Havanna, nova(kvm) + neutron(OVS).

When a user disassociates a floating IP from an instance, it (the floating IP)
disappears from the tenant (and is no longer available for allocation without
administrator intervention). Is that normal behaviour or is this a bug?


If all you have done is disassociate the floating IP from the instance 
(port), and it no longer appears in the output of neutron 
floatingip-list for that tenant, that is IMO a bug.


rick jones




Re: [Openstack] [neutron] More than one floating ip

2013-12-27 Thread Rick Jones

On 12/21/2013 10:55 PM, Cristian Falcas wrote:

No, it's not possible.


Strictly speaking, a floating IP address is associated with a port, 
right?  And instances can have multiple ports/vNICs.  So, while one 
might not be able to associate more than one floating IP with a port, it 
does seem possible to have more than one floating IP for an instance.


rick jones



On Fri, Dec 20, 2013 at 11:23 PM, Guillermo Alvarado
 wrote:

Hi everybody,

Is it possible to assign two floating IPs to a single instance in
Havana? How can I perform that?

Any help would be appreciated.

Thanks in advance,
~GA



Re: [Openstack] VM forwarding packets

2014-01-17 Thread Rick Jones

On 01/17/2014 05:23 AM, Heinonen, Johanna (NSN - FI/Espoo) wrote:

Hi,
I have a need to forward packets through a VM. This means that some
packets have a source IP address that is not configured for the VM (IP
spoofing).
I tried to modify firewall.py and disable the filters in nova-base; I
also followed instructions from
https://lists.launchpad.net/openstack/msg20669.html but with no effect.
Finally I fixed the problem by manually deleting the rule from iptables.
This is not so elegant a solution and therefore I'd like to ask what is
the recommended way to do this.
I am running Havana with ML2 plugin and ovs.
Best regards,
Johanna


--port_security_enabled False  perhaps?  At least when using Neutron...
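
A sketch, assuming the port security extension is present in your
deployment (and that the port has no security groups still attached):

neutron port-update <port-uuid> --port_security_enabled=False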

rick jones



Re: [Openstack] Tenant List

2014-06-20 Thread Rick Jones
Might this be an example of different people seeing different things 
because they are looking at different versions of the nova CLI?


rick jones
(In the version of nova I happen to use - 2.17.0.65 - I see a 
--tenant_id option rather than a --tenant option in the output of nova 
help ...)




Re: [Openstack] Why doesn't suspend release vCPUs/memory?

2014-06-23 Thread Rick Jones


Maybe separate quotas for active vs suspended?


That sounds overly complicated.  At the risk of demonstrating some 
ignorance on my part, we have to keep in mind it isn't just quota for the 
tenant, but actual available resource on the compute node, yes?  If 
suspended instances are going to be dropped from the tenant's quota, one 
would, presumably, reasonably argue that the compute node on which that 
instance was scheduled should then be marked as having more room (or 
there is the question of whether the tenant would be able to launch 
anything more in the first place), which then means having to check not 
only the tenant's quota but also the actual available capacity on the 
compute node.


For instances which do not need to be up all that often, but seem to 
need a persistent "state," which seems to be what is driving this 
desire, I would think that volumes would be the better/less complicated 
way to go.  Boot from and store in Cinder volumes and then they can come 
and go as you will.


rick jones



Re: [Openstack] periodic packet loss in openvswitch

2014-10-15 Thread Rick Jones

On 10/14/2014 09:10 PM, Michael Gale wrote:


I have seen something similar in the past under two conditions:
1. When a switch's buffers have been overloaded due to excessive UDP
traffic, the switch ended up sending out the data on all ports.


Do you mean when a switch's forwarding table fills because it has 
started seeing more MAC addresses than it is designed to handle?


A switch which started flooding traffic because its packet buffers were 
full would be very, well, interesting :)


rick jones



Re: [Openstack] Poor instance network performance

2015-12-17 Thread Rick Jones

On 12/17/2015 08:11 AM, Satish Patel wrote:
> Following is TOP command on guest machine, at this point i am getting
> ping breaks

Just to clarify, when you say "getting ping breaks" you mean that is 
when you start seeing pings not getting responses yes?


> top - 16:10:30 up 20:46,  1 user,  load average: 4.63, 4.41, 3.60
> Tasks: 165 total,   6 running, 159 sleeping,   0 stopped,   0 zombie
> %Cpu0  : 15.1 us, 12.3 sy,  0.0 ni, 60.9 id,  0.0 wa,  0.0 hi,  0.3 si, 11.4 st
> %Cpu1  : 22.9 us, 17.2 sy,  0.0 ni, 51.4 id,  0.0 wa,  0.0 hi,  0.3 si,  8.2 st
> %Cpu2  : 28.8 us, 22.4 sy,  0.0 ni, 47.5 id,  0.0 wa,  0.0 hi,  1.0 si,  0.3 st
> %Cpu3  : 16.6 us, 15.0 sy,  0.0 ni, 66.4 id,  0.0 wa,  0.0 hi,  0.3 si,  1.7 st
> %Cpu4  :  9.8 us, 11.8 sy,  0.0 ni,  0.0 id, 75.4 wa,  0.0 hi,  0.3 si,  2.6 st
> %Cpu5  :  7.6 us,  6.1 sy,  0.0 ni, 81.4 id,  0.0 wa,  0.0 hi,  4.2 si,  0.8 st
> %Cpu6  :  8.1 us,  7.4 sy,  0.0 ni, 83.0 id,  0.0 wa,  0.0 hi,  1.4 si,  0.0 st
> %Cpu7  : 17.8 us, 17.8 sy,  0.0 ni, 64.1 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
> KiB Mem :  8175332 total,  4630124 free,   653284 used,  2891924 buff/cache
> KiB Swap:        0 total,        0 free,        0 used.  7131540 avail Mem


75% wait time on a vCPU in the guest suggests the application(s) on that 
guest are trying to do a lot of I/O and bottlenecking on it.  I am not 
all that well versed on libvirt/KVM, but if there is just the one I/O 
thread for the VM, and it is saturated (perhaps waiting on disc) doing 
I/O for the VM/instance, that could cause other I/O processing like 
network I/O to be held-off, and it could be that either the transmit 
queue of the interface in the guest is filling as it goes to send the 
ICMP Echo Replies (ping replies), and/or the queue for the instance's 
tap device (the inbound traffic) is filling as the ICMP Echo Requests 
are arriving.


I would suggest looking further into the apparent I/O bottleneck.

Drifting a bit, perhaps...

I'm not sure if it would happen automagically, but if the "vhost_net" 
module isn't loaded into the compute node's kernel you might consider 
loading that.  From that point on, newly launched instances/VM on that 
node will start using it for networking and should get a boost.  I 
cannot say though whether that would bypass the VMs I/O thread. Existing 
instances should pick it up if you "nova reboot" them. (I don't think a 
reboot initiated from within the instance/VM would do it).
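
(A sketch of checking/loading it on the compute node:

lsmod | grep vhost_net
sudo modprobe vhost_net

and for a newly-launched instance you should then see a vhost-<pid>
kernel thread alongside its qemu process.)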


Whether there is something similar for disc I/O I don't know - I've not 
had to go looking for that yet.


happy benchmarking,

rick jones
http://www.netperf.org/





Re: [Openstack] Poor instance network performance

2015-12-18 Thread Rick Jones

On 12/17/2015 11:47 AM, Satish Patel wrote:

Thanks for response,

Yes if i ping from remote machine to that VM i am getting timeout at
some interval.

My application is CPU intensive, it is not doing any disk operation,
actually this application is base on VoIP. so pure RTP traffic flowing
on network. small UDP packets.


Well, something is causing top to report a vCPU at 75% wait time.  If it 
isn't storage I/O, then perhaps something else.


You might look at the top output for the compute node hosting that VM 
and see what that reports while all this is going-on.


happy benchmarking,

rick jones



Re: [Openstack] Cells: *how* experimental?

2016-01-29 Thread Rick Jones

On 01/29/2016 01:35 AM, Hinds, Luke (Nokia - GB/Bristol) wrote:

I am also genuinely intrigued about this. Are your test results publicly 
available, Clint?


And does this simulation happen to include the effect of WAN latency? ;-)

That the control plane (aggregate?) bandwidth for 1000 simulated nodes 
is "just" 100 Mbit/s is good, but I suspect it is rather "chatty" and as 
Clint somewhat warned, trying to run that across a WAN with non-trivial 
latency may be "interesting."


rick jones



From: EXT Tomas Vondra [von...@czech-itc.cz]
Sent: Friday, January 29, 2016 9:04 AM
To: openstack@lists.openstack.org
Subject: Re: [Openstack] Cells: *how* experimental?

Clint Byrum  writes:


However, if you have some requirement to have everything under that
one region, I can say that even in a 1000 hypervisor simulation I don't
see more than 100Mbit of traffic to the control plane that all of the
nodes share. I'd expect 30 nodes to be quite a bit less traffic.


Hmm, simulation you say? What do you use to simulate an OpenStack?
Tomas






Re: [Openstack] Cells: *how* experimental?

2016-01-29 Thread Rick Jones

On 01/29/2016 10:32 AM, Clint Byrum wrote:

Excerpts from Rick Jones's message of 2016-01-29 09:41:05 -0800:

That the control plane (aggregate?) bandwidth for 1000 simulated nodes
is "just" 100 Mbit/s is good, but I suspect it is rather "chatty" and as
Clint somewhat warned, trying to run that across a WAN with non-trivial
latency may be "interesting."



It's not something I'd try lightly. However, we do want to try it over
city-wide WAN links (so, 20 miles or so), which shouldn't add too much
latency, but certainly isn't _free_.


I would think that netem could be your inexpensive friend here.  Either 
in the control nodes themselves, or in a linux box configured to 
route/bridge between them.
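
A sketch - adding, say, 5 ms each way on the interface a control node
uses (tc comes with iproute2):

sudo tc qdisc add dev eth0 root netem delay 5ms
# and to remove it again:
sudo tc qdisc del dev eth0 root netem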


rick jones




Re: [Openstack] floating ip

2016-02-23 Thread Rick Jones

On 02/23/2016 06:22 AM, Yngvi Páll Þorfinnsson wrote:

Hi

How can I choose a particular floating IP for an instance?

I just released a floating IP from a tenant, but would like to
associate this particular IP with another instance.

It's not in use; when I list all floating IPs, it does not appear in
the list:

neutron floatingip-list | grep IP


I'm not familiar with a command-line way to request a specific floating IP 
(e.g. 1.2.3.4) out of the pool(s).  Once you delete a given floating IP, 
it goes back into the free pool, at least after a fashion - as part of 
what I recall is an effort to minimize overheads, it won't actually 
come-up for allocation again until the allocations have cycled through 
all the addresses ahead/behind it.  So, if the pool was IPs:


1 2 3 4

And you happened to get "2" but then deleted that floater, "2" will not 
be allocated again until 3, 4 and 1 have been allocated (and perhaps 
freed).  Allocation goes in a big circle.


If you want to move a floating IP from one instance to another, you 
should just disassociate the floating IP, not also delete it.  Then the 
floating IP remains allocated to your project (IIRC we are all supposed 
to start calling them projects not tenants...) and you can then 
associate the IP with a port belonging to another instance.
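
If memory serves, moving the floater is then just something along 
these lines (the IDs here are placeholders):

neutron floatingip-disassociate <floatingip-id>
neutron floatingip-associate <floatingip-id> <new-port-id>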


rick jones



I've been trying some nova and neutron commands but without success.

Here is an example of error:

root@opst-ctrl1-dev:/# nova floating-ip-associate
c8443770-276e-44ce-b79d-32965db9465b IP

ERROR (NotFound): floating ip not found (HTTP 404) (Request-ID:
req-1ed31433-2cc3-4e36-ad4c-fc992518baae)

nova floating-ip-pool-list

works fine, and I can see 6 different floating IP pools.

It would be nice if someone could post a command procedure for this.

Best regards

Yngvi





Re: [Openstack] floating ip

2016-02-23 Thread Rick Jones

On 02/23/2016 10:59 AM, Yngvi Páll Þorfinnsson wrote:

Hi
I tried it again with another instance, and another fl.ip
And this time, the fl.ip became available again... so I could reuse it.
I'm not sure why the first one did not become available.


How do you mean became available again?  Are you doing 
floatingip-deletes but still using the UUID of the floater?


rick




Re: [Openstack] Compute downloading corrupted image from Glance

2016-03-29 Thread Rick Jones

On 03/29/2016 10:17 AM, Kaustubh Kelkar wrote:

Every time I tried to download the image on the compute, I got a new 
hash value (albeit a wrong one).


On the compute node, what is the type of NIC and its driver and such?

lspci -v | grep -A 1 Ethernet

ethtool -i <interface>

And are any of the stateless offloads enabled?

ethtool -k <interface>

Those would include checksum offload, and things built on top of it like 
TSO, GSO, LRO and/or GRO.


If you find that checksum offload is enabled, and you disable it, does 
the corrupt image download problem go away?  If so, you have a problem 
with your NIC and/or its driver getting the offloads wrong and/or 
corrupting the traffic in a place outside the protection of the 
offloaded checksumming.  One of the central assumptions with the likes of 
checksum offload in a NIC is that anything "above" the checksum offload 
in the NIC has some sort of data protection - at least parity, if not 
ECC.  This includes components in the NIC itself, the I/O bus etc etc.


If disabling checksum offload on the compute node doesn't resolve the 
matter, you might consider the same on the controller.
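
As a sketch - the interface name is yours to fill in - disabling 
checksum offload along with the offloads stacked on top of it would 
look something like:

ethtool -K <interface> rx off tx off tso off gso off gro off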


rick jones

(disabling checksum offload will likely also disable the offloads which 
depend upon it.)




Re: [Openstack] Compute downloading corrupted image from Glance

2016-03-29 Thread Rick Jones

On 03/29/2016 01:01 PM, Kaustubh Kelkar wrote:

-----Original Message-----
From: Rick Jones [mailto:rick.jon...@hpe.com]
Sent: Tuesday, March 29, 2016 1:43 PM
To: openstack@lists.openstack.org
Subject: Re: [Openstack] Compute downloading corrupted image from Glance

On 03/29/2016 10:17 AM, Kaustubh Kelkar wrote:

Every time I tried to download the image on the compute, I got a new
hash value (albeit a wrong one).


On the compute node, what is the type of NIC and its driver and such?
[Kaustubh] It is an Intel X710 NIC with i40e driver. The NIC is part of the 
integrated card on a Dell R730.

lspci -v | grep -A 1 Ethernet
[Kaustubh] (Output redacted to show only the relevant interface)
01:00.1 Ethernet controller: Intel Corporation Ethernet 10G 2P X710 Adapter 
(rev 01)
 Subsystem: Dell Device 


It wasn't assigned a sub-device ID? (Device ).  I'm not all that 
familiar with Dell kit but that seems a trifle odd.



ethtool -i <interface>
[Kaustubh] root@dchi:/home/kkelkar# ethtool -i em2
driver: i40e
version: 1.4.25
firmware-version: 4.41 0x80001863 16.5.20
bus-info: :01:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
And are any of the stateless offloads enabled?

ethtool -k <interface>
[Kaustubh] root@dchi:/home/kkelkar# ethtool -k em2
Features for em2:


trimmed...



Those would include checksum offload, and things built on top of it
like TSO, GSO, LRO and/or GRO.

If you find that checksum offload is enabled, and you disable it,
does the corrupt image download problem go away?  If so, you have a
problem with your NIC and/or its driver getting the offloads wrong
and/or corrupting the traffic in a place outside the protection of
the offloaded checksuming.  One of the central assumptions with the
likes of checksum offload in a NIC is that anything "above" the
checksum offload in the NIC has some sort of data protection - at
least parity, if not ECC.  This includes components in the NIC
itself, the I/O bus etc etc.

If disabling checksum offload on the compute node doesn't resolve the
matter, you might consider the same on the controller.

[Kaustubh] I ended up disabling checksumming, TSO, GSO and GRO on
both controller and the compute so the ethtool output looks as above.
Now, the problem can only be reproduced intermittently. At times,
compute node still gets a corrupted image.


Ah, that ethtool -i output was after not before - I was initially 
confused because I'd not expected the offloads to be disabled by default.


If the issue is still intermittent I'd *guess* it was timing-related. 
You might see if there are any increases in the bad checksum stats in 
netstat.
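
A simple before/after comparison might look like this - the file names 
are arbitrary, and the grep pattern is a guess at catching both the 
"checksum" and "InCsumErrors" style counters:

netstat -s | egrep -i 'checksum|csum' > /tmp/netstat.before
# ... reproduce the image download ...
netstat -s | egrep -i 'checksum|csum' > /tmp/netstat.after
diff /tmp/netstat.before /tmp/netstat.after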


Other bits of straw-grasping would include, but not be limited to:

*) Transfer the image via scp and see if that always works OK
*) Run something like netperf TCP_STREAM or iperf and see if you see 
checksum errors accumulating.
*) Perhaps create a fake image of the same size with a fixed pattern and 
transfer that via glance and see if it ever complains.  If it does, you 
can look to see where the pattern breaks in terms of offset into the 
file and how it breaks.  If it is then reproducible you can then 
consider getting packet traces at either end and looking through those 
to see if it was indeed good or bad at the sender and such.


rick jones



Re: [Openstack] Openstack powered Public cloud

2016-04-26 Thread Rick Jones

On 04/25/2016 11:33 PM, Jaison Peter wrote:


I have many concerns about scaling and making the right choices, since
OpenStack offers a lot of choices and flexibility, especially on the
networking side. Our major challenge was choosing between the simplicity
and performance offered by Linux bridge and the features and DVR offered
by OVS.  We decided to go with OVS, though some suggested that OVS is
slow in large deployments. The distributed L3 agents and the bandwidth
offered by DVR inclined us towards OVS. Was that the right decision?

But one of the major drawbacks we are seeing with DVR is public IP
consumption. If we have 100 clients and 1 VM per client, eventually
there will be 100 tenants and 100 routers. Since it is a public cloud,
we have to offer a public IP for each VM. In DVR mode, the fip namespace
on a compute node consumes one public IP, and if 100 VMs are spread
among 20 computes, then 20 public IPs in total will be used among the
computes. A router SNAT namespace will be created for each tenant router
(100 in total) and each of them will consume 1 public IP, so 100 public
IPs in total will be consumed by the central SNAT namespaces. So 100 +
20 = 120 public IPs will be used by OpenStack components and 100 will
be used as floating IPs (1:1 NAT) by VMs. So we need 220 public IPs to
provide dedicated public IPs for 100 VMs!! Anything wrong with our
calculation?


Have you also looked at the namespaces created on the compute nodes and 
the IP addresses they get?



 From our point of view, the 120 IPs used by OpenStack components in our
case (providing 1:1 NAT for every VM) are a waste of IPs and play no
role in network traffic. Centralized SNAT is useful if the client is
opting for something VPC-like, as in AWS, and is not attaching floating
IPs to all instances in his VPC.

So is there any option while creating DVR router to avoid creating
central SNAT name space in controller node ? So that we can save 100
public IPs in the above scenario.


That certainly would be nice to be able to do.

DVR certainly does help with instance to instance communication scaling 
- not having to go through the CVR network node(s) is a huge win for 
scaling as far as aggregate network performance is concerned.  But if 
instances are not likely to speak much with one another via floating IPs 
- say because instances on different private networks aren't 
communicating with one another and instances on the same private 
networks can speak with one another via their private IPs then that 
scaling doesn't really matter.  Also if those instances will not be 
communicating (much) with other services in your cloud - Swift comes to 
mind.  So too if all the instances will spend most of their network time 
speaking with the Big Bad Internet (tm), in which case your cloud's 
connection to the Big Bad Internet will be what gates the aggregate.  In 
all those scenarios you may be just as well-off with CVR.


In terms of other things relating to scale, and drifting a bit from DVR, 
if enable_isolated_metadata is set to true (not sure if that is default 
or not) then there will be two metadata proxies launched for each 
network, in addition to the two metadata proxies launched per router, 
and the metadata proxy launched on each compute node on which an 
instance behind a given DVR router resides.  The latter isn't that big a 
deal for scaling, but the former can be.  Each metadata proxy will want 
~40 MB of RSS, so that is 160 MB of RSS spread across your neutron 
network nodes.  Added to that will be another 10 MB of RSS for a pair of 
dnsmasq processes.


Getting back to DVR and bringing OVS along for the ride, one other 
"cost" is that one still must have Linux bridge in the path to implement 
the security group rules.  Depending on the sort of netperf benchmark 
one runs, that "costs" between 5% and 45% in performance.  That was 
measured by setting up the instances, taking the baseline, and then 
manually "rewiring" to bypass the Linux bridge.


happy benchmarking,

rick jones



Re: [Openstack] Openstack powered Public cloud

2016-04-26 Thread Rick Jones

On 04/26/2016 10:16 AM, Rick Jones wrote:

On 04/25/2016 11:33 PM, Jaison Peter wrote:

But one of the major drawbacks we are seeing with DVR is public IP
consumption. If we have 100 clients and 1 VM per client, eventually
there will be 100 tenants and 100 routers. Since it is a public cloud,
we have to offer a public IP for each VM. In DVR mode, the fip namespace
on a compute node consumes one public IP, and if 100 VMs are spread
among 20 computes, then 20 public IPs in total will be used among the
computes. A router SNAT namespace will be created for each tenant router
(100 in total) and each of them will consume 1 public IP, so 100 public
IPs in total will be consumed by the central SNAT namespaces. So 100 +
20 = 120 public IPs will be used by OpenStack components and 100 will
be used as floating IPs (1:1 NAT) by VMs. So we need 220 public IPs to
provide dedicated public IPs for 100 VMs!! Anything wrong with our
calculation?


Have you also looked at the namespaces created on the compute nodes and
the IP addresses they get?


Reading comprehension.  I have heard of it...  Re-reading, I see that you 
did.  Clearly still too early in the day for me to avoid an Emily Litella 
moment.


rick jones



Re: [Openstack] Does compute node require provider network?

2016-05-19 Thread Rick Jones

On 05/18/2016 09:51 PM, Rui Mao wrote:

http://docs.openstack.org/mitaka/install-guide-ubuntu/environment-networking.html#environment-networking

In the guide, the compute node requires a provider network connection,
and the neutron run in controller node.

But per my understanding, all VMs access the internet via NAT, and the
nova node has no internet access requirement in a production environment.

Anything I missed or misunderstood?


I took a quick look at that diagram.  It may be assuming DVR 
(Distributed Virtual Router) is enabled.


"Before" there would be a neutron private (aka Guest) network running 
between all the computes and the Neutron network nodes.  An instance 
(VM) would access the outside world (Internet, whatnot) by having its 
traffic go across the Guest VLAN to a controller, the virtual router on 
the controller and such would do the NAT, and off the traffic goes on 
the external VLAN.


Today that is called "Central(ized?) Virtual Router" or CVR.

Since Liberty (or Kilo if the OpenStack provider backported?) there has 
also been support for Distributed Virtual Router (DVR).  In this mode, 
when a floating IP is associated with a port of the instance, the NAT is 
handled on the compute node.  This allows traffic levels to scale much, 
Much, MUCH better by not having to go through the central Neutron 
network node(s).  (SNAT for ports/instances without floating IPs still 
happens in the Neutron network node).
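
For what it is worth, assuming DVR is enabled in your neutron and 
policy allows it, a router can be made distributed explicitly at 
creation time:

neutron router-create --distributed True router1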


But it does mean the compute node(s) must also have a connection to the 
external VLAN just like a controller node.


I assume that if you do not enable DVR, you also do not need the 
external provider network to be populated to the compute nodes.


happy benchmarking,

rick jones



Re: [Openstack] [Openstack-operators] Reaching VXLAN tenant networks from outside (without floating IPs)

2016-06-30 Thread Rick Jones

On 06/30/2016 08:24 AM, Gustavo Randich wrote:

Mike, as far as I know those routers allow only outgoing traffic, i.e.
a VM can see external networks, but those external networks cannot
connect to the VM if it doesn't have a FIP. Am I right?


That is correct. As Turbo mentioned before, that is kind of the point 
behind the isolation.


It may be more effort than you wish to undertake, but for your "other?" 
question, finding some way to make floating IPs less precious would seem 
to be in order.  IPv6 comes to mind but I cannot speak to how ready 
OpenStack/Neutron is for that.


I suppose, if you were to create an instance with port security disabled 
and one of the precious floating IPs, which sat on all the private 
networks you wanted to not actually be private, and was configured as 
the default router for all the instances on those networks (or at least 
the router for the external subnet(s) you wanted to reach them from), 
and was configured in the external network infrastructure as the router 
for all the private network ranges, you might establish connectivity 
that way.


You would, of course, have to have not-really-private network IP address 
ranges which were compatible (didn't overlap) with the external address 
ranges in the rest of your infrastructure.


rick jones



Thanks!
Gustavo

On Wed, Jun 29, 2016 at 7:24 PM, Mike Spreitzer <mspre...@us.ibm.com> wrote:

Gustavo Randich <gustavo.rand...@gmail.com> wrote on 06/29/2016 03:17:54 PM:

> Hi operators...
>
> Transitioning from nova-network to Neutron (Mitaka), one of the key
> issues we are facing is how to reach VMs in VXLAN tenant networks
> without using precious floating IPs.
>
> Things that are outside Neutron in our case are:
>
> - in-house made application orchestrator: needs SSH access to
> instances to perform various tasks (start / shutdown apps, configure
> filesystems, etc.)
>
> - various centralized and external monitoring/metrics pollers: need
> SNMP / SSH access to gather status and trends
>
> - internal customers: need SSH access to instance from non-openstack
> VPN service
>
> - ideally, non-VXLAN aware traffic balancer appliances
>
> We have considered these approaches:
>
> - putting some of the external components inside a Network Node:
> inviable because components need access to multiple Neutron deployments
>
> - Neutron's VPNaaS: cannot figure how to configure a client-to-site
> VPN topology
>
> - integrate hardware switches capable of VXLAN VTEP: for us in this
> stage, it is complex and expensive
>
> - other?

You know Neutron includes routers that can route between tenant
networks and external networks, right?  You could use those, if your
tenant networks use disjoint IP subnets.

Regards,
Mike







Re: [Openstack] [Openstack-operators] Reaching VXLAN tenant networks from outside (without floating IPs)

2016-06-30 Thread Rick Jones

On 06/30/2016 10:32 AM, Mike Spreitzer wrote:

No, those routers are routers.  If one of them gets a packet, the router
will forward the packet as usual for a router.

You might think they don't handle connections into tenant networks, but
that might be because nothing is trying to use them as routers for the
tenant networks.  That's a question about the routing tables in the rest
of your environment.

If the client has a route to a Neutron tenant network that goes through
a Neutron router, the client is able to connect to a server on the
Neutron tenant network.

The normal configuration for routers on the internet is to not forward
traffic to the RFC 1918 addresses.  I do not recall how the Neutron
routers handle packets addressed to those addresses from sources on the
"outside".


For what it is worth, a quick test with some Mitaka-based bits, using 
192.168.123.0/24 as the private network and ping suggests the neutron 
routers will be willing to forward the traffic just fine.


That would be better than trying to do the same thing with instances as 
I proposed before.


happy benchmarking,

rick jones







Re: [Openstack] [Openstack-operators] Reaching VXLAN tenant networks from outside (without floating IPs)

2016-06-30 Thread Rick Jones

On 06/30/2016 01:05 PM, Turbo Fredriksson wrote:

On Jun 30, 2016, at 7:04 PM, Rick Jones wrote:

For what it is worth, a quick test with some Mitaka-based bits,
using 192.168.123.0/24 as the private network and ping suggests the
neutron routers will be willing to forward the traffic just fine.


Is there anything specific you did to allow this? Because I
accidentally "tested" this myself yesterday.


I created a network/subnet/router tuple in a DVR setup (slight chance I 
added --distributed false to the router-create - I've reinstalled the 
setup at this point so cannot check), noted the public IP of the router, 
and the private IP of the instance, then on one of my controllers which 
was connected to the external VM VLAN on which the router is I added a 
host route for the instance's private IP, pointing at the public IP of 
the router and started pinging.  The neutron private network was VxLAN, 
and in my case carried on a separate VLAN from the External VM VLAN.  In 
my case, the instance's private IP was a 192.168.123.X, and the router's 
public IP was a 10.249.mutter.
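
In rough terms, the test on the external node was just the following - 
the addresses here are purely illustrative stand-ins for the real ones:

ip route add 192.168.123.10/32 via 10.249.0.1
ping 192.168.123.10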


I didn't try anything from farther afield because I don't have control 
of those bits in my particular test environment.


happy benchmarking,

rick jones




I have my external/physical network (192.168.69.0/24) with the
GW/FW/NAT (192.168.69.1), which also does DHCP for that network.

In my tenant network (10.0.0.0/16), when I created a VM, I
chose this network as the primary/first network and then
the tenant network as second.

So when the VM booted, it got a 192.168.69.0/24 address!

However, I could not reach it. And it could not reach anything
else either.





Re: [Openstack] UDP issues?

2016-07-18 Thread Rick Jones

On 07/18/2016 02:43 PM, Ken D'Ambrosio wrote:

Hey, all.  We're trying to track down some UDP fragmentation issues, and
I'm trying to fully grasp exactly what goes on.  The tool I'm using is
"iperf."  My first confusion is that when I point iperf (client) to a
host behind a floating IP, that simply doesn't work.  Any ideas what the
issue is, and how to get around it?


What exactly do you mean by "simply doesn't work?"

Have you opened-up the security group rules to allow the port(s) that 
iperf will want to use through to the instance from the outside world?




Next up: when I have two VMs talk to each other -- on the
same subnet, using "iperf -c 172.23.244.169 -u -b 100m" -- I wind up
with this:
[  3] Sent 85471 datagrams
[  3] WARNING: did not receive ack of last datagram after 10 tries.

When I go from physical machine to physical machine, it works great,
even though a few datagrams are received out-of-order.  But a flat-out
missing packet does sound a bit like the issue I'm having.

---

Additionally, I'd really like a tool that would allow me to set packet
size for UDP tests; I've poked around, but haven't really found
anything.  If anyone has a suggestion, I'm all ears.


Much as I like to promote netperf :)  I believe there must be a way to 
set the UDP datagram size in iperf.  Perhaps it's the "-l" option.
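
If -l is indeed it, then something like the following should, I 
believe, send 1200-byte UDP payloads (address reused from your example):

iperf -c 172.23.244.169 -u -b 100m -l 1200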


When I get ready to run netperf tests, I tend to operate in environments 
where I can enjoy the luxury of:


neutron security-group-rule-create --protocol icmp default
neutron security-group-rule-create --protocol tcp --port-range-min 1 
--port-range-max 65535 default
neutron security-group-rule-create --protocol udp --port-range-min 1 
--port-range-max 65535 default


And open all the ports from anywhere.   Netperf doesn't have an explicit 
equivalent to iperf's bandwidth setting.  Ignoring that for the moment, 
the not-equivalent netperf command, with message size setting would be:


netperf -H 172.23.244.169 -t UDP_STREAM -- -m 1234

where the "test-specific" (after the "--" on the command line) -m option 
sets the size of the buffer passed in on the "send" calls (bytes).   In a 
UDP_STREAM test that will control the number of payload bytes in the UDP 
datagrams being sent.


Netperf does have ways to limit bandwidth, but they are based on specifying 
a burst size (number of sends of whatever size) and an inter-burst 
interval.
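
Assuming a netperf built with burst/interval support (--enable-burst / 
--enable-intervals), a paced test would look roughly like this - ten 
sends per burst, a burst every 10 milliseconds:

netperf -H 172.23.244.169 -t UDP_STREAM -b 10 -w 10 -- -m 1234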


At the very least, netperf will need two ports open in the security 
group rules.  Port 12865 for the control connection, and then a port for 
the data connection.  Normally that is left to the stack to decide, but 
you can specify it explicitly with another test-specific option:


netperf -H 172.23.244.169 -t UDP_STREAM -- -m 1234 -P ,12867

will cause the remote netserver to bind its data socket to port 12867. 
Omit the comma and both netperf and netserver will bind their data 
sockets to that value.


happy benchmarking,

rick jones



Thanks!

-Ken



Re: [Openstack] NTP client in guest not reaching NTP server in host (Neutron/DVR)

2016-07-21 Thread Rick Jones

On 07/21/2016 09:45 AM, Gustavo Randich wrote:

Hi,

With nova-network, we were running NTP in guests against an NTP server
in the host (following this advice:
<http://www.linux-kvm.org/page/FAQ#I.27m_experiencing_timer_drift_issues_in_my_VM_guests.2C_what_to_do.3F>).

Now with Neutron/DVR, we are not reaching NTP in the host because it is
listening on interfaces of the root namespace.

Any insight on this?


I rather doubt that Neutron/DVR has things setup to enable accessing the 
host directly from the VMs.  That isn't often considered a good thing to 
do, per my understanding.  You could try packet tracing the external VM 
network to see that the NTP requests from the guests are going out that way.


At that point, presumably it would be a matter of having the 
infrastructure router be willing to route back to the management network.


Otherwise, having the guests get time from some different "external" 
sources would seem to be in order.


rick jones



Thanks!
Gustavo





Re: [Openstack] why baremetal did not send dhcp request in tenant network

2016-10-18 Thread Rick Jones

On 10/18/2016 11:54 AM, Travis Vu wrote:

I used OpenStack Ironic to provision baremetal.  The baremetal node
booted up in the provisioning network and loaded the user OS.  When the
baremetal node rebooted into the tenant network, it did not send a DHCP
request.  I cannot ping the tenant IP address Ironic assigned to it.
Why did baremetal not send a DHCP request when booting up in the tenant network?
How do I get baremetal to DHCP for an IP address in the tenant network?

Baremetal booted up without an IP address, and the keyboard does not work.


Perhaps a silly question, but the user OS booted on the baremetal server 
was set up to do DHCP, yes?


rick jones




Re: [Openstack] Urgent! MTU size issue

2016-11-09 Thread Rick Jones

On 11/09/2016 12:56 PM, Satish Patel wrote:

If I enable jumbo frames on OpenStack, do I need to change
or enable anything on my external/physical network infrastructure?


Probably.

In the most succinct terms, all nodes in the same "broadcast domain" 
(anywhere reachable via layer 2 - aka ethernet - and switching rather 
than layer 3 and routing) must have the same physical MTU.


If you have end-nodes with JumboFrames enabled, the physical 
infrastructure joining them at layer 2 must also have JumboFrames enabled.
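
On a Linux end-node that is, as a sketch (interface name and MTU are 
examples):

ip link show dev eth0 | grep mtu    # check the current MTU
ip link set dev eth0 mtu 9000       # enable JumboFrames

The equivalent setting on your switches will be vendor-specific.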


rick jones



On Wed, Nov 9, 2016 at 3:13 PM, Mohammed Naser  wrote:

Are you using any tunnelling or are you directly attaching to your physical 
network?

If you're using tunnelling, I would recommend running jumbo frames on your 
compute and network nodes so they can transport packets with a size of 1500. 
There are plenty of docs about this online that you can check.

Sent from my iPhone


On Nov 9, 2016, at 11:06 AM, Satish Patel  wrote:

We have a 3-node cluster on Mitaka OpenStack and we are using DVR
networking. Recently we built two puppetmaster servers on OpenStack and
put them behind an LBaaSv2 load-balancer to share load, but I found
that none of my clients were able to talk correctly with the
puppetmaster servers on OpenStack. After lots of research I found that
OpenStack VMs use MTU 1400 and the rest of my puppet agent servers,
which are not on OpenStack, use MTU 1500.

As soon as I changed the puppet agent MTU size to 1400, everything
started working. But I am just surprised that OpenStack uses 1400 for
VMs; there must be a reason, like VXLAN or GRE.

So as an experiment I changed the MTU to 1500 on the puppetmaster
server on OpenStack, but it didn't help. How do I fix this issue?



Re: [Openstack] Urgent! MTU size issue

2016-11-09 Thread Rick Jones

On 11/09/2016 08:06 AM, Satish Patel wrote:

We have a 3-node cluster on Mitaka OpenStack and we are using DVR
networking. Recently we built two puppetmaster servers on OpenStack and
put them behind an LBaaSv2 load-balancer to share load, but I found
that none of my clients were able to talk correctly with the
puppetmaster servers on OpenStack. After lots of research I found that
OpenStack VMs use MTU 1400 and the rest of my puppet agent servers,
which are not on OpenStack, use MTU 1500.

As soon as I changed the puppet agent MTU size to 1400, everything
started working. But I am just surprised that OpenStack uses 1400 for
VMs; there must be a reason, like VXLAN or GRE.

So as an experiment I changed the MTU to 1500 on the puppetmaster
server on OpenStack, but it didn't help. How do I fix this issue?


What happens if LBaaS isn't between the client and server?

Does Puppet do anything with UDP or is it all TCP?

rick jones




Re: [Openstack] Urgent! MTU size issue

2016-11-09 Thread Rick Jones

On 11/09/2016 02:25 PM, Satish Patel wrote:

If I send requests directly to server1 and server2, everything works!
But if I point my client at the LBaaS VIP, it doesn't work.

LBaaS is running on the network node, and server1 and server2 are running on compute nodes.

So LBaaS traffic comes from the network node to the compute node over
VXLAN (overlay), which has a 1450 MTU, and that is causing the issue.
If I change that MTU to 1500, it works.


So, the question is what is wrong with LBaaS.  I'm not familiar with the 
workings of Octavia but in general terms, I would expect the following 
to happen at the TCP level when there is a load balancer in the middle 
between two end nodes:


1) Client sends TCP connection request (aka TCP SYNchronize segment) to 
the Load Balancer.  That TCP SYN contains a Maximum Segment Size (MSS) 
option based on the client's MTU.
2) The LB sends a TCP connection response - a TCP SYN|ACK to the client. 
 It will have a TCP MSS option based on the LBaaS's egress interface.
3) Between the LBaaS and the client, the smaller of the two MSS options 
will be used.
Meanwhile, I expect the same thing to happen between the LB and the 
back-end server.


In theory then, there could be two MSSes at work here - one on the TCP 
connection between the LBaaS and the client, and one between the LBaaS 
and the server.  And unless there is something amiss with the LBaaS, it 
should be happiness and joy.  Certainly that should be the case if the 
LBaaS was a process with two sockets, moving data between the sockets.


I am speculating wildly, but if the "external" connection had a larger 
MSS than the internal connection, and the LBaaS code somehow tried to 
move a packet "directly" from one connection to another, then that could 
cause problems.


If you can, you might try following the packets.
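
For instance, capturing just the SYN segments on each leg would show 
the MSS options being exchanged - the namespace and interface names 
here are placeholders you would have to discover on your network node:

ip netns exec <lbaas-namespace> tcpdump -n -i <interface> 'tcp[tcpflags] & tcp-syn != 0'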

rick jones



On Wed, Nov 9, 2016 at 4:58 PM, Rick Jones  wrote:

On 11/09/2016 08:06 AM, Satish Patel wrote:


We have a 3-node cluster on Mitaka OpenStack and we are using DVR
networking. Recently we built two puppetmaster servers on OpenStack and
put them behind an LBaaSv2 load-balancer to share load, but I found
that none of my clients were able to talk correctly with the
puppetmaster servers on OpenStack. After lots of research I found that
OpenStack VMs use MTU 1400 and the rest of my puppet agent servers,
which are not on OpenStack, use MTU 1500.

As soon as I changed the puppet agent MTU size to 1400, everything
started working. But I am just surprised that OpenStack uses 1400 for
VMs; there must be a reason, like VXLAN or GRE.

So as an experiment I changed the MTU to 1500 on the puppetmaster
server on OpenStack, but it didn't help. How do I fix this issue?



What happens if LBaaS isn't between the client and server?

Does Puppet do anything with UDP or is it all TCP?

rick jones





Re: [Openstack] changing MTU setting not working

2016-11-10 Thread Rick Jones

On 11/10/2016 09:23 AM, Satish Patel wrote:

Figured it out... you just need to restart neutron-server, and you have
to delete the existing network and re-create it for the change to take
effect.

How do I change the MTU on an existing network/subnet?


You would likely have to find all the virtual interfaces for it and 
alter them manually on the compute/network nodes.
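
Something along these lines on each node, repeated for every virtual 
interface belonging to the network - the device name is illustrative, 
as the tap/veth naming is deployment-specific:

ip link set dev <tap-device> mtu 1450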


The HPE Helion docs on changing MTU, while describing some things 
specific to HPE Helion, do include the caveat about existing networks 
not being altered:


https://docs.hpcloud.com/hos-3.x/#helion/networking/configure_mtu.html

rick jones




Re: [Openstack] vxlan issue on compute node

2016-11-28 Thread Rick Jones
In addition to the suggestions from others, you might verify that this 
problematic node is correctly connected to the guest VLAN - including 
the configuration of the switch port(s) to which the node is connected.


rick jones



Re: [Openstack] Download file from swift extremely slow

2017-01-04 Thread Rick Jones

On 01/04/2017 12:59 AM, don...@ahope.com.cn wrote:

Hi experts

I finished the Swift installation following the install
guide (http://docs.openstack.org/project-install-guide/object-storage/draft/get_started.html).
File upload is very fast, but file download is extremely slow. Why?

[root@controller admin]# time openstack object create container1 cirros-0.3.4-x86_64-disk.img
+------------------------------+------------+----------------------------------+
| object                       | container  | etag                             |
+------------------------------+------------+----------------------------------+
| cirros-0.3.4-x86_64-disk.img | container1 | ee1eca47dc88f4879d8a229cc70a07c6 |
+------------------------------+------------+----------------------------------+

real    0m3.807s
user    0m2.127s
sys     0m0.161s


[root@controller /]# time openstack object save container1 cirros-0.3.4-x86_64-disk.img

real    5m51.489s
user    5m48.172s
sys     0m2.094s


Are you able to run something like netperf or iperf between your client 
and the swift proxy?  For example:


netperf -H <proxy-ip>                 # get a feel for "to swift" basic network perf
netperf -H <proxy-ip> -t TCP_MAERTS   # get a feel for "from swift"

The idea there is to measure the network separate from the storage and 
swift processing, and go from there.


If there isn't much else happening on your setup at the time, you could 
also look at some snapshots of netstat -s on the proxy when you are 
downloading the object - look to see if there are many TCP 
retransmissions.  You can get something similar "directly" for the 
netperf tests with:


netperf -H <proxy-ip> -- -o throughput,local_transport_retrans,remote_transport_retrans

netperf -H <proxy-ip> -t TCP_MAERTS -- -o throughput,local_transport_retrans,remote_transport_retrans


rick jones



Re: [Openstack] Download file from swift extremely slow

2017-01-06 Thread Rick Jones

On 01/05/2017 11:52 PM, don...@ahope.com.cn wrote:

Also, I can put/get files via the dashboard/Swift CLI very quickly.
So it is strange why 'openstack object save' is so slow...


Well, two additional ways to compare might be to run both the openstack 
and swift CLI versions under an strace and compare the system calls they 
are making:


strace -v -f -ttt -o <tracefile> cli command ...

where -v tells strace to be verbose, the -ttt tells it to timestamp each 
system call with seconds.microseconds since the epoch, the -f tells it 
to follow forks and the -o option gives a filename into which the trace 
should go.


The second comparison would be to take a packet trace for each using 
tcpdump.  So, on the client something like:


tcpdump -s 96 -w <file>.pcap -i <interface> "port <swift-port> and 
host <proxy-ip>"


The -s says to capture no more than 96 bytes per packet, the -w puts the 
capture to the named file, the -i selects the network interface on the 
client, and then the last bit is a filter expression to select only 
those packets which are swift and to/from the proxy.


That file can be post-processed in a number of ways, one of which is:

tcpdump -r <file>.pcap -n -ttt > <file>.cooked

where -r selects the file from which to read captured packets, the -n 
says to disable looking up hostnames for IP addresses, and the -ttt says 
to print the time delta for each packet compared to the one before.  I 
happen to follow a convention of calling the resulting output a 
".cooked" file - as in it is a cooked version of a raw (binary) capture.


In both cases, you would be looking for large gaps in time - 
particularly in the openstack CLI traces.


happy benchmarking,

rick jones

