Hello!
On Wed, Jan 23, 2013 at 07:01:52PM +0400, Andrey Vagin wrote:
> -#define tcp_time_stamp ((__u32)(jiffies))
> +#define tcp_time_stamp(tp) ((__u32)(jiffies) + tp->tsoffset)
This implies that you always have some tp at hand. AFAIK this is not true,
so I am puzzled how you we
Hello!
> If you overflow the socket's memory bound, it ends up calling
> tcp_clamp_window(). (I'm not sure this is really the right thing to do
> here before trying to collapse the queue.)
Collapsing is too expensive a procedure; it is rather an emergency measure.
So, TCP collapses the queue when i
Hello!
> I experienced the very same problem but with window size going all the
> way down to just a few bytes (14 bytes). dump files available upon
> requests :)
I do request.
TCP is not allowed to reduce the window to a value less than 2*MSS, no matter
how hard the network device or the peer tries to confu
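The following is a minimal C sketch of that floor. It assumes a hypothetical helper clamp_rcv_window(), which is not a kernel function. However small a value a device or peer pushes it toward, the advertised window stays at 2*MSS or above.

```c
#include <stdint.h>

/* Hypothetical helper, not kernel code: compute the receive window to
 * advertise, refusing to shrink below a floor of 2*MSS no matter how
 * small the requested value is. */
static uint32_t clamp_rcv_window(uint32_t requested, uint32_t mss)
{
    uint32_t min_win = 2 * mss;   /* smallest window TCP may offer */

    return requested < min_win ? min_win : requested;
}
```

With an MSS of 1460, a request for a 14-byte window is raised to 2920, while 15340 passes through unchanged.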
Hello!
> I wonder if clamping the window though is too harsh. Maybe just
> setting the rcv_ssthresh down is better?
It is too harsh. This was invented before we learned how to collapse
received data; at that time tiny segments were fatal and clamping was
the last weapon against misbehaving connec
Hello!
> This is where things start going bad. The window starts shrinking from
> 15340 all the way down to 2355 over the course of 0.3 seconds. Notice the
> many duplicate acks that serve no purpose
These are not duplicates: the TCP_NODELAY sender just starts flooding
tiny segments, and those are n
Hello!
> Well, take a look at the double acks for 84439343, 84440447 and 84441059,
> they seem pretty much identical to me.
It is just a little tcpdump glitch.
19:34:54.532271 < 10.2.20.246.33060 > 65.171.224.182.8700: . 44:44(0) ack 84439343 win 24544 (DF) (ttl 64, id 60946)
19:34:54.532432
Hello!
> If this packet came in from an 802.1Q VLAN device, the VLAN code already
> has the logic necessary to map the .1q priority to an arbitrary
> skb->priority.
Actually, the patch makes sense for a straight Ethernet bridge that does
not fully parse VLAN headers. I guess the case when the f
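A bridge that does not fully parse VLANs can still map the 802.1Q priority, because it only needs the 16-bit TCI. The top 3 bits of the TCI are the priority code point (PCP), and the bottom 12 bits are the VLAN ID. The sketch below assumes the TCI has already been read in host byte order. The mapping table and function names are hypothetical, and this is not the code of the patch under discussion.

```c
#include <stdint.h>

/* Hypothetical admin-configured table: index = 802.1Q PCP (0..7),
 * value = priority a bridge would store in skb->priority. This example
 * collapses the eight PCP values onto four priority bands. */
static const uint32_t pcp_to_priority[8] = { 0, 0, 1, 1, 2, 2, 3, 3 };

/* The PCP occupies the top 3 bits of the TCI. */
static unsigned int vlan_tci_pcp(uint16_t tci)
{
    return (tci >> 13) & 0x7;
}

/* The VLAN ID occupies the low 12 bits of the TCI. */
static unsigned int vlan_tci_vid(uint16_t tci)
{
    return tci & 0x0fff;
}

/* Priority for a tagged frame, without any further VLAN processing. */
static uint32_t bridge_priority_from_tci(uint16_t tci)
{
    return pcp_to_priority[vlan_tci_pcp(tci)];
}
```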
Hello!
> Send a correct arp reply instead of one with sender ip and sender
> hardware adress in target fields.
I do not see anything more legal in setting the target address to 0.
Actually, the semantics of the target address in an ARP reply are ambiguous.
If it is a reply to some real request, it is set to ad
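When a reply answers a real request, RFC 826 has the responder swap the fields. The requester's sender addresses become the target addresses of the reply, and the responder's own addresses go into the sender fields. The sketch below is simplified: it omits the hardware/protocol type and length fields and ignores byte order. The struct, constants, and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define ARP_OP_REQUEST 1   /* opcode values from RFC 826 */
#define ARP_OP_REPLY   2

/* Simplified Ethernet/IPv4 ARP body (type/length fields omitted). */
struct arp_eth_ipv4 {
    uint16_t op;
    uint8_t  sha[6];   /* sender hardware address */
    uint8_t  spa[4];   /* sender protocol address */
    uint8_t  tha[6];   /* target hardware address */
    uint8_t  tpa[4];   /* target protocol address */
};

/* Build a reply to a real request: the requester's sender fields become
 * our target fields, and our own addresses go into the sender fields. */
static void build_arp_reply(const struct arp_eth_ipv4 *req,
                            const uint8_t my_mac[6],
                            struct arp_eth_ipv4 *rep)
{
    rep->op = ARP_OP_REPLY;
    memcpy(rep->tha, req->sha, 6);
    memcpy(rep->tpa, req->spa, 4);
    memcpy(rep->sha, my_mac, 6);
    /* The address that was asked about (our own, in the non-proxy case). */
    memcpy(rep->spa, req->tpa, 4);
}
```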
Hello!
> Is there a reason that the target hardware address isn't the target
> hardware address?
It is bound only to the fact that Linux uses the protocol address
of the machine which responds. It would be highly confusing
(more than confusing :-)) if we used our protocol address and hardware
addre
Hello!
> This file is so outdated that I can't see any value in keeping it.
Absolutely agree.
Alexey
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please r
Hello!
> Getting an error there is all the more reason to proceed
> with the swapoff, not to give up and break out of it.
Yes, from this viewpoint a more reasonable approach would be to untie
the corresponding ptes from the swap entry and mark them invalid, to trigger
a fault on access.
Not even tried sim
Hello!
> Hmm, what means not expected ? -ESRCH is returned, when the owner task
> is not found.
This is not supposed to happen with robust futexes.
glibc aborts (which is correct) or, in builds with debugging disabled,
enters a simulated deadlock (which is confusing).
> lock. Also using uval is
Hello!
> We actually need to do something about this, as we might loop for ever
> there. The robust cleanup code can fail (e.g. due to list corruption)
> and we would see exit_state != 0 and the OWNER_DIED bit would never be
> set, so we are stuck in a busy loop.
Yes...
It is possible to take re
Hello!
> the context-switch argument i'll believe if i see numbers. You'll
> probably need in excess of tens of thousands of irqs/sec to even be able
> to measure its overhead. (workqueues are driven by nice kernel threads
> so there's no TLB overhead, etc.)
It was the authors of the patch who wer
Hello!
> I find the 4usecs cost on a P4 interesting and a bit too high - how did
> you measure it?
Simple and stupid:
int flag;

static void do_test(unsigned long dummy)
{
	flag = 1;
}

static void do_test_wq(void *dummy)
{
	flag = 1;
}

static void measure_tasklet0(void)
{
Hello!
> I felt that three calls to tasklet_disable were better than a gazillion calls
> to
> spin_(un)lock.
It is not better.
Actually, it also has something equivalent to a spinlock inside:
it raises a flag and waits for already running tasklets to complete
(cf. spin_lock_bh). And if taskl
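The claim that tasklet_disable() hides a spinlock-like wait can be modeled with C11 atomics. Disabling raises a counter and then busy-waits while a handler is running, which is the same cost a lock would have. This is a simplified model, not the kernel's tasklet implementation. struct sketch_tasklet and all helpers are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified tasklet; not struct tasklet_struct. */
struct sketch_tasklet {
    atomic_int  disable_count;   /* > 0 means "do not start running" */
    atomic_bool running;         /* true while the handler executes */
    void (*func)(unsigned long);
    unsigned long data;
};

/* Raise the flag, then spin until a handler already running on another
 * CPU finishes: a busy wait, just like taking a spinlock. */
static void sketch_tasklet_disable(struct sketch_tasklet *t)
{
    atomic_fetch_add(&t->disable_count, 1);
    while (atomic_load(&t->running))
        ;   /* wait for completion of the running instance */
}

static void sketch_tasklet_enable(struct sketch_tasklet *t)
{
    atomic_fetch_sub(&t->disable_count, 1);
}

/* Run the handler unless disabled or already running; true if it ran. */
static bool sketch_tasklet_try_run(struct sketch_tasklet *t)
{
    bool expected = false;

    if (atomic_load(&t->disable_count) > 0)
        return false;
    if (!atomic_compare_exchange_strong(&t->running, &expected, true))
        return false;            /* already running elsewhere */
    /* Re-check after claiming "running" to close the race with disable. */
    if (atomic_load(&t->disable_count) > 0) {
        atomic_store(&t->running, false);
        return false;
    }
    t->func(t->data);
    atomic_store(&t->running, false);
    return true;
}

/* Example handler used to observe whether the tasklet ran. */
static int sketch_hits;
static void sketch_count(unsigned long inc)
{
    sketch_hits += (int)inc;
}
```

The re-check after claiming `running` matters. With sequentially consistent atomics, either the runner sees the raised counter and backs off, or the disabler sees `running` set and waits.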
Hello!
> > The difference between softirqs and hardirqs lays not in their
> > "heavyness". It is in reentrancy protection, which has to be done with
> > local_irq_disable(), unless networking is not isolated from hardirqs.
>
> i know that pretty well ;)
You forgot about this again in the next
Hello!
> Not a very accurate measurement (jiffies that is).
Believe it or not, the measurement has nanosecond precision.
> Since the work queue *is* a thread, you are running a busy loop here. Even
> though you call schedule, this thread still may have quota available, and
> will not yeild
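A jiffies-based measurement can still report nanoseconds if the timed loop runs for many ticks. The elapsed tick count is off by less than one tick, and that error is spread over all iterations. The sketch below shows only the arithmetic, using hypothetical function names. It illustrates the principle and is not a reconstruction of the benchmark quoted above.

```c
#include <stdint.h>

/* Average cost per iteration in ns, from a coarse tick counter.
 * ticks:   elapsed ticks (e.g. jiffies) over the whole run
 * tick_ns: length of one tick in ns (1000000 for HZ=1000)
 * iters:   iterations completed in that time */
static double ns_per_iter(uint64_t ticks, uint64_t tick_ns, uint64_t iters)
{
    return (double)ticks * (double)tick_ns / (double)iters;
}

/* Bound on the error of that average: the tick difference misstates the
 * true elapsed time by less than one tick, divided across all iterations. */
static double ns_resolution(uint64_t tick_ns, uint64_t iters)
{
    return (double)tick_ns / (double)iters;
}
```

Take some hypothetical numbers: a 10 s run at HZ=1000 lasts 10000 ticks. If it completes 2,500,000 iterations, ns_per_iter(10000, 1000000, 2500000) gives 4000 ns per iteration, with a resolution of 0.4 ns.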
Hello!
> again, there is no reason why this couldnt be done in a hardirq context.
> If a hardirq preempts another hardirq and the first hardirq already
> processes the 'softnet work', you dont do it from the second one but
> queue it with the first one. (into the already existing
> sd->complet
Hello!
> If I understand correctly, this is because tasklet_head.list is protected
> by local_irq_save(), and t could be scheduled on another CPU, so we just
> can't steal it, yes?
Yes. All that code is written to avoid synchronization as much as possible.
> If we use worqueues, we can change t
Hello!
> What changed?
The softirq remains raised for such a tasklet. In the old days the softirq was
processed once per invocation, in schedule and on syscall exit, and this was
relatively harmless. Now, since softirqs are very weakly moderated, it results
in heavy CPU hogging.
> And can it be fixed?
With curre
Hello!
> Also, create_workqueue() is very costly. The last 2 lines should be
> reverted.
Indeed.
The result improves from 3988 nanoseconds to 3975. :-)
Actually, the difference is within statistical variance,
which is about 20 ns.
Alexey
Hello!
> #2 crash be explained via any of the bugs you fixed? (i.e. memory
> corruption?)
Yes, I found the reason; it is really fixed by taking tasklist_lock.
This happens after a task struct with a non-cleared pi_state_list is freed
and the list of futex_pi_state's gets corrupted.
Meanwhile... two m
Hello!
1. New entries can be added to tsk->pi_state_list after the task has completed
exit_pi_state_list(). The result is memory leaks and deadlocks.
2. handle_mm_fault() is called under a spinlock. The result is obvious.
3. The state machine is broken. The kernel thinks it owns the futex after
it released al
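Item 1 is a classic late-registration race. The general cure for this kind of race is to test for "already cleaned up" under the same lock that the cleanup takes. The userspace sketch below assumes a pthread mutex in place of a kernel lock. All names are hypothetical, and this is not the futex code.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins: "owner" plays the task, the list plays its
 * pi_state_list. */
struct pi_node {
    struct pi_node *next;
};

struct owner {
    pthread_mutex_t lock;    /* serializes attach against exit cleanup */
    bool exited;             /* set once exit cleanup has run */
    struct pi_node *head;
};

/* Attach a node unless exit cleanup already ran. Checking 'exited'
 * under the cleanup's lock means no node can be added after the final
 * drain and then leaked. */
static bool owner_attach(struct owner *o, struct pi_node *n)
{
    bool ok;

    pthread_mutex_lock(&o->lock);
    ok = !o->exited;
    if (ok) {
        n->next = o->head;
        o->head = n;
    }
    pthread_mutex_unlock(&o->lock);
    return ok;
}

/* Exit cleanup: mark the owner dead, detach the list under the lock,
 * then free the nodes outside it. Returns the number freed. */
static int owner_exit_cleanup(struct owner *o)
{
    int freed = 0;
    struct pi_node *n;

    pthread_mutex_lock(&o->lock);
    o->exited = true;
    n = o->head;
    o->head = NULL;
    pthread_mutex_unlock(&o->lock);

    while (n) {
        struct pi_node *next = n->next;
        free(n);
        n = next;
        freed++;
    }
    return freed;
}
```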
Hello!
> Hmm. Something I don't understand: does the code
> in question not run on *each* device unregister?
It does.
> Why do I only see this under stress?
You need some referenced destination entries to trigger the bad path.
This should happen not only under stress.
E.g., just try to ssh
Hello!
> This is not new code, and should have triggered long time ago,
> so I am not sure how come we are triggering this only now,
> but somehow this did not lead to crashes in 2.6.20
I see. I guess this was plain luck.
> Why is neighbour->dev changed here?
It holds a reference to the device and p
Hello!
> > It should be cleared and we should be sure it will not be destroyed
> > before quiescent state.
>
> I'm confused. didn't you say dst_ifdown is called after quiescent state?
The quiescent state should happen after dst->neighbour is invalidated,
and this implies that all the users of dst->n
Hello!
> Well I don't think the loopback device is currently but as soon
> as we get network namespace support we will have multiple loopback
> devices and they will get unregistered when we remove the network
> namespace.
There is no logical difference. At the moment when the namespace is gone
there
Hello!
> Does this look sane (untested)?
It does not, unfortunately.
Instead of a regular crash in infiniband you will get numerous
random NULL pointer dereferences, due to both dst->neighbour
and dst->dev.
Alexey
Hello!
> I think the thing to do is to just leave the loopback references
> in place, try to unregister the per-namespace loopback device,
> and that will safely wait for all the references to go away.
Yes, this is exactly how it works in OpenVZ. All the sockets are killed,
queues are cleared, nobo
Hello!
> If a device driver sets neigh_destructor in neigh_params, this could
> get called after the device has been unregistered and the driver module
> removed.
It is the same problem: if dst->neighbour holds the neighbour, it should
not hold the device. parms->dev is not supposed to be used after
neig
Hello!
> infiniband sets parm->neigh_destructor, and I search for a way to prevent
> this destructor from being called after the module has been unloaded.
> Ideas?
It must be called in any case to update/release internal ipoib structures.
The idea is to move the call of parm->neigh_destructor from n
Hello!
> This might work. Could you post a patch to better show what you mean to do?
Here it is.
->neigh_destructor() is removed (it is not used) and replaced with ->neigh_cleanup(),
which is called when a neighbor entry goes to the dead state. At this point
everything is still valid: neigh->dev, neigh->parms e
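The shape of that change is sketched below. The driver hook runs once, at the transition to the dead state, while the parms are still valid. The final release only frees memory, so it never calls into a driver module that may already be unloaded. The sketch is simplified and single-threaded. struct sketch_neigh, struct sketch_parms, and the helpers are hypothetical, not the kernel's neighbour API.

```c
#include <stdbool.h>
#include <stdlib.h>

struct sketch_neigh;

struct sketch_parms {
    /* Driver hook: called exactly once when an entry dies, while the
     * entry's parms (and, in the kernel, its device) are still valid. */
    void (*neigh_cleanup)(struct sketch_neigh *n);
};

struct sketch_neigh {
    int refcnt;
    bool dead;
    struct sketch_parms *parms;
};

/* Transition to the dead state: the driver releases its state here. */
static void sketch_neigh_mark_dead(struct sketch_neigh *n)
{
    if (n->dead)
        return;
    n->dead = true;
    if (n->parms->neigh_cleanup)
        n->parms->neigh_cleanup(n);
}

/* Final release: only frees memory and never calls back into the
 * driver, so it is safe after the driver module has gone away. */
static void sketch_neigh_release(struct sketch_neigh *n)
{
    if (--n->refcnt == 0)
        free(n);
}

/* Example hook used to count cleanup calls. */
static int sketch_cleanups;
static void count_cleanup(struct sketch_neigh *n)
{
    (void)n;
    sketch_cleanups++;
}
```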