Re: Reserving a (large) memory block

2000-08-31 Thread coder

On Thu, 31 Aug 2000 10:54:56 +0100 (BST) 
Alan Cox <[EMAIL PROTECTED]> wrote:

>> I'm working on a device driver for a device that sits on the PC
>> memory bus.  I need to reserve/protect the memory range that the
>> device occupies from the rest of the kernel/system.  How do I do
>> that?  I think I see how I can mark blocks that are never to be
>> touched, but in this case the driver (obviously) needs to be able
>> to touch them, but the rest of the kernel must be hands-off.

> If it's in the ISA or PCI space then we won't touch it. If it's
> actually mapped as if it was part of RAM then the BIOS is
> responsible for reporting top of memory below it in the memory
> sizing calls, and optionally reporting it reserved as a hole in the
> newer E820 call.

> We then just follow the bios. You can also reserve blocks of
> memory by hacking arch/i386/mm/init.c and marking them reserved

Ahh, that there is the problem.  To explain I'd better start with
basically what the device is:

  The device is a 168 pin SDRAM card which has a bunch of RAM on it
and also has a fat FPGA in the middle.  The FPGA can be programmed
to do various interesting things like the mod exponentiation for
SSL, image processing (think photoshop plugins), etc.  You can have
multiple firmware cores on the card(s) that are context shifted
among at runtime by the driver etc etc.  API semantics for the
driver are the predictable open, read, write, close, trigger (tell
card to start processing written data), and wait (wait for the driver
to say it's done).

  Now the device behaves just like memory to the BIOS during POST
etc, and is in fact, exactly memory if no device drivers are loaded.
If a device driver is loaded and it detects one or more of these
devices then they and their memory ranges become obviously special.
Now, we can detect the devices and where their address ranges are
via the SMBUS and some careful probing so we know what we are trying
to grab.  The problem is just telling the rest of the kernel that in
a clean VM&heap-happy manner.

-- 
J C Lawrence Home: [EMAIL PROTECTED]
-(*)   Other: [EMAIL PROTECTED]
http://www.kanga.nu/~claw/Keys etc: finger [EMAIL PROTECTED]
--=| A man is as sane as he is dangerous to his environment |=--
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Reserving a (large) memory block

2000-08-31 Thread coder


On Thu, 31 Aug 2000 11:32:21 -0500 
Timur Tabi <[EMAIL PROTECTED]> wrote:

> ** Reply to message from [EMAIL PROTECTED] on Thu, 31 Aug 2000
> 08:57:20 -0700
>> Now the device behaves just like memory to the BIOS during POST
>> etc, and is in fact, exactly memory if no device drivers are
>> loaded.  If a device driver is loaded and it detects one or more
>> of these devices then they and their memory ranges become
>> obviously special.  Now, we can detect the devices and where
>> their address ranges are via the SMBUS and some careful probing
>> so we know what we are trying to grab.  The problem is just
>> telling the rest of the kernel that in a clean VM&heap-happy
>> manner.

> How do you know what SMBUS address to use?  Probing the SMBUS
> looking for devices has a tendency to lock the SMBUS and require a
> complete power down to restore.

I didn't write the SMBUS code (the author is not on this list),
however, the basic SMBUS work was based off here:

  http://www.netroedge.com/%7elm78/

At this point we've only tried/tested/looked_at GX chipset based SMP
MBs.  Loosely we hit the chipset and get the specs for every
individual memory card, and look for something that looks like one
of our devices as per the SPDs (we have a mini in-driver
eeprom-style client setup to get the SPDs).  The cards are then
identified as per a signature in the SPD.

-- 
J C Lawrence Home: [EMAIL PROTECTED]
-(*)Work: [EMAIL PROTECTED]
http://www.kanga.nu/~claw/Keys etc: finger [EMAIL PROTECTED]
--=| A man is as sane as he is dangerous to his environment |=--



Re: Reserving a (large) memory block

2000-09-01 Thread coder

On Thu, 31 Aug 2000 17:12:03 +0100 (BST) 
Alan Cox <[EMAIL PROTECTED]> wrote:

>> Now the device behaves just like memory to the BIOS during POST
>> etc, and is in fact, exactly memory if no device drivers are
>> loaded.  If a device driver is loaded and it detects one or more
>> of these devices then they and their memory ranges become
>> obviously special.  Now, we can detect the devices and where
>> their address ranges are via the SMBUS and some careful probing
>> so we know what we are trying to grab.  The problem is just
>> telling the rest of the kernel that in a clean VM&heap-happy
>> manner.

> Basically your base driver will have to do it at boot up
> time. Once that memory is allocated to someone you may not be able
> to move the memory and borrow the pages.

Aye, that's what we're doing ATM.  We find the board and its size,
and then go and edit mem_map and mark it reserved and uncacheable.
It doesn't seem the most graceful approach and I was hoping for
something cleaner (tho that looks like 2.4 per Ingo's comments which
I still have to look into).

> You don't necessarily need the whole driver in the main kernel
> but you will need to grab the devices, reserve the memory pages in
> question and mark them as reserved before Linux gets going
> properly. Your actual users of these pages can then be dynamically
> loaded.

Yeah, that's what I'm looking at right now: how early I have to get
in to be safe.

-- 
J C Lawrence Home: [EMAIL PROTECTED]
-(*)Work: [EMAIL PROTECTED]
http://www.kanga.nu/~claw/Keys etc: finger [EMAIL PROTECTED]
--=| A man is as sane as he is dangerous to his environment |=--



Re: Reserving a (large) memory block

2000-09-05 Thread coder

On Thu, 31 Aug 2000 14:09:48 +0200 (CEST) 
Ingo Molnar <[EMAIL PROTECTED]> wrote:

> On Thu, 31 Aug 2000, Alan Cox wrote:

>> We then just follow the bios. You can also reserve blocks of
>> memory by hacking arch/i386/mm/init.c and marking them reserved

> in 2.4 there is an explicit interface for this that also
> guarantees that the allocation consists of fully valid RAM (no
> matter how complex the RAM map): alloc_bootmem(). We allocate
> 300MB+ worth of mem_map[] with this on multi-gigabyte boxes.

I don't see that alloc_bootmem() and friends do what I want under
2.4 in that they don't allow me to require that the
allocation/reservation occur from an explicit physical address
(there's no promise in the "goal" handling of
__alloc_bootmem_core()).  It seems that the *bootmem() calls are
intended to provide a lightweight kernel heap with alloc/free
semantics rather than an interface to explicit physical memory
reservation -- no? (am I misunderstanding how bdata is handled by
__alloc_bootmem_core()?)
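
[Editorial sketch, not kernel code.]  The distinction being drawn
above, between "allocate near a goal" and "reserve exactly this
range", can be illustrated with a toy page map.  Everything here
(toy_bootmem, toy_alloc, toy_reserve, TOY_PAGES) is a hypothetical
user-space model written for this note; it only mimics the idea that
a goal is a hint which may be silently ignored, and makes no claim
about how mm/bootmem.c is actually implemented.

```c
#include <stddef.h>

#define TOY_PAGES 64            /* hypothetical size of the toy page map */

/* One byte per page frame: 0 = free, 1 = reserved. */
struct toy_bootmem {
    unsigned char map[TOY_PAGES];
};

/* Reserve exactly [start, start + count).  Returns 0 on success, -1 if
 * the range is out of bounds or any page in it is already taken. */
static int toy_reserve(struct toy_bootmem *b, size_t start, size_t count)
{
    size_t i;

    if (count == 0 || start >= TOY_PAGES || count > TOY_PAGES - start)
        return -1;
    for (i = start; i < start + count; i++)
        if (b->map[i])
            return -1;
    for (i = start; i < start + count; i++)
        b->map[i] = 1;
    return 0;
}

/* Allocate count contiguous pages, preferring goal but falling back to a
 * scan from page 0.  Returns the first page index, or -1 if nothing fits.
 * The caller gets no guarantee that the result equals goal. */
static long toy_alloc(struct toy_bootmem *b, size_t goal, size_t count)
{
    size_t start;

    if (toy_reserve(b, goal, count) == 0)
        return (long)goal;
    for (start = 0; count <= TOY_PAGES && start + count <= TOY_PAGES; start++)
        if (toy_reserve(b, start, count) == 0)
            return (long)start;
    return -1;
}
```

In this model a driver that must own one specific physical range
needs the toy_reserve() behaviour (fail loudly if the range is
taken), whereas toy_alloc() will happily hand back some other range.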

  Am I correct that at init time I don't have access to the (root)
filesystem?  This makes total sense to me, but I'm not familiar
enough with the boot path to know for sure.

The problem I'm trying to solve:

  I have a device that sits on the memory bus.  It looks like RAM
until a (module) device driver gets at it.  At that point I want it
to be reserved memory (private to driver).  Now I can do this in
init if I know the location of the device in memory and its size.
The problem is that to detect the device(s) and their size I use the
i2c and smbus modules.  Ergo, to reserve the physical memory I need
a kernel which is pretty well fully booted (ie the heap etc is
already built) so I can load those modules and find the devices,
which means that grabbing and reserving bits of physical memory is
unsafe (because the heap etc is already built).  However, if I had
access to the filesystem at init time, I could go read a file that
told me where the device(s) are and how big they are, do the
reservations, and then have the module double check the reservations
against the reality of what's installed.  

  Problem is: I don't (think I) have filesystem access at init time,
and can't safely reserve specific physical memory after init which
seems to leave my only option being to pass in the reservation specs
from the bootloader, which is rather what I'm trying to avoid.

  Or am I missing something?

-- 
J C Lawrence Home: [EMAIL PROTECTED]
-(*)Work: [EMAIL PROTECTED]
http://www.kanga.nu/~claw/Keys etc: finger [EMAIL PROTECTED]
--=| A man is as sane as he is dangerous to his environment |=--



Re: Reserving a (large) memory block

2000-09-06 Thread coder

On 06 Sep 2000 13:54:49 +0800 
Ryan Cumming <[EMAIL PROTECTED]> wrote:

>> Problem is: I don't (think I) have filesystem access at init
>> time, and can't safely reserve specific physical memory after
>> init which seems to leave my only option being to pass in the
>> reservation specs from the bootloader, which is rather what I'm
>> trying to avoid.

> Possibly... would it be practical to have part of the module kernel
> resident, and then pull its memory location/size off the kernel
> command line?

Is that possible?  How?  Modules seem to come a long ways after
init (certainly after the memory stuff is built).
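
[Editorial sketch.]  The command-line half of the suggestion above
amounts to parsing a location and size out of a boot option string.
A minimal user-space sketch follows, assuming a made-up option value
of the form "<size>@<base>" (for example "64M@0x10000000"); the
format, struct mem_region, parse_size() and parse_region() are all
hypothetical names chosen for this note, and the small resident part
of the driver would still have to act on the result before the
memory is handed out.

```c
#include <stdlib.h>

struct mem_region {
    unsigned long long base;    /* physical start address */
    unsigned long long size;    /* length in bytes */
};

/* Parse a number with an optional K/M/G suffix.  *end is left pointing
 * just past the text that was consumed. */
static unsigned long long parse_size(const char *s, char **end)
{
    unsigned long long v = strtoull(s, end, 0);

    switch (**end) {
    case 'G': case 'g': v <<= 10;   /* fall through */
    case 'M': case 'm': v <<= 10;   /* fall through */
    case 'K': case 'k': v <<= 10; (*end)++; break;
    default: break;
    }
    return v;
}

/* Parse "<size>@<base>".  Returns 0 on success, -1 on malformed input
 * or a zero size. */
static int parse_region(const char *arg, struct mem_region *out)
{
    char *end;
    const char *base_str;
    unsigned long long size, base;

    if (arg == NULL || out == NULL)
        return -1;
    size = parse_size(arg, &end);
    if (end == arg || *end != '@')
        return -1;
    base_str = end + 1;
    base = strtoull(base_str, &end, 0);
    if (end == base_str || *end != '\0' || size == 0)
        return -1;
    out->base = base;
    out->size = size;
    return 0;
}
```

The loadable module can later compare the parsed region against what
it finds via the SPDs and refuse to touch the card if they disagree.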

-- 
J C Lawrence Home: [EMAIL PROTECTED]
-(*)Work: [EMAIL PROTECTED]
http://www.kanga.nu/~claw/Keys etc: finger [EMAIL PROTECTED]
--=| A man is as sane as he is dangerous to his environment |=--



Re: Question: Using floating point in the kernel

2000-09-19 Thread Lyle Coder

Hello,
You cannot use MMX registers in the kernel either, since the kernel doesn't
save and restore FX state (fxsave, fxrstor), just as it doesn't for
fsave/frstor.

Best Wishes,
Lyle

** Reply to message from "Richard B. Johnson" <[EMAIL PROTECTED]> on
Tue, 19 Sep 2000 11:58:34 -0400 (EDT)


>Tell the driver maintainer that you found a BUG. There is no floating-
>point allowed in the kernel because the state of the FP Unit is
>undefined in the kernel. If you 'define' it, i.e., `finit` then you
>will mess up somebody who was using the FP Unit in user-mode.
>
>Also, the '386 FP emulation, which is still supported, can produce a
>double-fault if you try to use it (at some places) in kernel-mode
>code.
>
>Basically, there is nothing in the kernel that will ever require
>floating point. Use fixed point if you need 'decimals' and stuff for
>printing.

What about MMX?  It uses floating point registers, but it's not technically
floating point.



--
Timur Tabi - [EMAIL PROTECTED]
Interactive Silicon - http://www.interactivesi.com

When replying to a mailing-list message, please don't cc: me, because then 
I'll just get two copies of the same message.



Re: Question: Using floating point in the kernel

2000-09-20 Thread Lyle Coder

Hi
The real issue is that if you use MMX or FP state, the kernel _must_ save
and restore the original state, otherwise user programs will see corruption.
We all know this too well, since Red Hat's 6.1 (I think) kernel had
optimized MMX functions that _screwed_ up user programs.  The fact is... it
is tricky to save and restore state (device not available and all).
The basic kernel itself does not provide support for kernel code to use
these registers.  If some device drivers or some modifications to the kernel
are using them... then I hope they have the save/restore path right

- Lyle

- Original Message -
From: "Ricky Beam" <[EMAIL PROTECTED]>
To: "Lyle Coder" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Wednesday, September 20, 2000 9:13 PM
Subject: Re: Question: Using floating point in the kernel


> On Wed, 20 Sep 2000, Lyle Coder wrote:
> >You cannot use MMX registers in the kernel either, since the kernel
> >doesn't save and restore FX state (fxsave, fxrstor), just as it doesn't
> >for fsave/frstor.
>
> You might want to tell the software RAID maintainers that... RAID5 CRC
> calculations can be done with MMX. (I'm sure they save and restore the
> FPU state, however.  Yes, the save/restore cycle is _damned_ expensive.)
>
> --Ricky



Re: Clear interrupts on a SMP machine?

2000-10-18 Thread Lyle Coder

Hello,
I am still not sure why you cannot use an IPI for this... on the CPU that 
you want to access this resource, send an IPI to all other CPUs, and add 
code in handling that IPI that they should spin and wait till you are done 
with accessing the chip... then let the other CPUs continue.

Best Wishes,
Lyle


On Wed, 18 Oct 2000, Alan Cox wrote:

> > spin_lock_irqsave(&local_lock, flags);
> > Muck_With_The_RTC_Chip();
> > spin_unlock_irqrestore(&local_lock, flags);
> >
> > This protects only the local procedure. In the meantime, somebody
> > else, using another CPU is mucking with the same RTC Chip. The
>
>You need to put the spinlock in for every other use of the chip
>

I can't control somebody else's use of `hwclock` or even some future
kernel module.

> > the data register. The "somebody else" is a realtime-clock ISR.
>
>That's fine. You can compute worst case accesses for your Muck_with_..
>
>Alan
>

Hmmm. Care to explain? What I am expected to do is to make a log
entry showing the time at which a system crashed (if it crashed).
The log entry must be time-stamped within one second of the crash
time. This is so the customer can correlate this event with some
other events such as a power failure, etc.

This seems 'impossible' but it's not. Even though the machine is
dead (power is lost), it can retain information that will allow
the previously described specification to be met.

What I do is have a daemon which wakes up at 1 second intervals
and writes 'time_t' to a spare group of CMOS registers and then
re-checksums the CMOS. If the normal shutdown occurs, this value
is set to 0L and the CMOS is re-checksummed.
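
[Editorial sketch.]  The store-and-re-checksum step described above
can be modelled on a plain byte array.  This assumes the conventional
PC/AT layout in which a 16-bit sum of CMOS bytes 0x10..0x2D is kept
at 0x2E (high) and 0x2F (low); the position of the "spare" bytes
(STAMP_BASE) and all function names are hypothetical, and a real
implementation would go through the RTC index/data ports under the
locking discussed in this thread rather than touch an array.

```c
#include <stdint.h>

#define CMOS_SIZE   128
#define CSUM_FIRST  0x10    /* assumed: first checksummed byte */
#define CSUM_LAST   0x2D    /* assumed: last checksummed byte */
#define CSUM_HI     0x2E    /* assumed: checksum high byte */
#define CSUM_LO     0x2F    /* assumed: checksum low byte */
#define STAMP_BASE  0x28    /* hypothetical spare bytes inside the range */

/* 16-bit sum of the checksummed range. */
static uint16_t cmos_checksum(const uint8_t *cmos)
{
    uint16_t sum = 0;
    int i;

    for (i = CSUM_FIRST; i <= CSUM_LAST; i++)
        sum = (uint16_t)(sum + cmos[i]);
    return sum;
}

/* Store a 32-bit timestamp (little-endian) and refresh the checksum.
 * Storing 0 marks a clean shutdown, as in the scheme described above. */
static void cmos_store_stamp(uint8_t *cmos, uint32_t stamp)
{
    uint16_t sum;
    int i;

    for (i = 0; i < 4; i++)
        cmos[STAMP_BASE + i] = (uint8_t)(stamp >> (8 * i));
    sum = cmos_checksum(cmos);
    cmos[CSUM_HI] = (uint8_t)(sum >> 8);
    cmos[CSUM_LO] = (uint8_t)(sum & 0xFF);
}

/* Read the timestamp back; non-zero at boot means an unclean stop. */
static uint32_t cmos_load_stamp(const uint8_t *cmos)
{
    uint32_t stamp = 0;
    int i;

    for (i = 0; i < 4; i++)
        stamp |= (uint32_t)cmos[STAMP_BASE + i] << (8 * i);
    return stamp;
}

/* True when the stored checksum matches the current contents. */
static int cmos_checksum_ok(const uint8_t *cmos)
{
    uint16_t stored = (uint16_t)((cmos[CSUM_HI] << 8) | cmos[CSUM_LO]);

    return stored == cmos_checksum(cmos);
}
```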

Access to the realtime clock is made through an installed module
driver with appropriate locking.

Upon startup, the system time is set to the stored CMOS time value
if it is non-zero. The appropriate log entries are made, then the
system time is set to the real CMOS clock time before the usual startup
log entries are made. So, even though the crash occurred several
days ago (the system crashed on a weekend), the log entries show,
within a second, the time at which the actual crash occurred.

It is 'impressive' to hit the power switch of a server and, upon
reboot, see the actual time at which the power was lost in a log
file.

Oct 12 10:35:29 chaos monitor[11]: Power failure

This all runs fine except when a SMP machine is used that has
the RTC interrupt enabled for time-keeping. Since I can't control
all the applications that a user might run on this server, I need
to disable all access to that chip during the time that I am
using it.

Currently, I cache any contents of command and status register 'B',
disable any periodic interrupt, do my thing, then put it back. However,
although this 'seems' to work, there is still room for a race because
these things can't be done simultaneously. What I really need is
a way of (effectively) disabling all interrupts. I note that the
common interrupt code could (but does not) contain a semaphore that
could be used to cause a spin when some driver needs all interrupts
disabled.

This might be very useful for problems like this.

Cheers,
Dick Johnson

Penguin : Linux version 2.2.17 on an i686 machine (801.18 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.





Re: Linux's implementation of poll() not scalable?

2000-10-23 Thread Lyle Coder

Hi,
If you have a similar machine (in terms of machine configuration) for both your
Solaris and Linux machines... could you tell us what the difference in total
time for 100 and 10000 was?  i.e... don't compare Solaris with 100
descriptors vs Solaris with 10000 descriptors, but rather
Linux 100 descriptors vs. Solaris 100 descriptors  AND
Linux 10000 descriptors vs. Solaris 10000 descriptors.

That would be useful information... I think.

Thanks
Lyle

Re: Linux's implementation of poll() not scalable?

[ Small treatise on "scalability" included. People obviously do not
  understand what "scalability" really means. ]

In article <[EMAIL PROTECTED]>,
Dan Kegel  <[EMAIL PROTECTED]> wrote:
>I ran a benchmark to see how long a call to poll() takes
>as you increase the number of idle fd's it has to wade through.
>I used socketpair() to generate the fd's.
>
>Under Solaris 7, when the number of idle sockets was increased from 100 to
>10000, the time to check for active sockets with poll() increased by a
>factor of only 6.5.  That's a sublinear increase in time, pretty spiffy.

Yeah. It's pretty spiffy.

Basically, poll() is _fundamentally_ a O(n) interface. There is no way
to avoid it - you have an array, and there simply is _no_ known
algorithm to scan an array in faster than O(n) time. Sorry.

(Yeah, you could parallelize it.  I know, I know.  Put one CPU on each
entry, and you can get it down to O(1).  Somehow I doubt Solaris does
that.  In fact, I'll bet you a dollar that it doesn't).
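
[Editorial sketch.]  The O(n) shape of the interface is visible even
from user space: the caller hands poll() an array, and afterwards has
to walk that same array to find out which entries fired.  The helper
name count_readable() is made up for this note; poll() and struct
pollfd are the standard POSIX interface.

```c
#include <poll.h>

/* Poll n descriptors, then count how many are readable.  Both the
 * kernel's pass over fds[] and this loop touch every entry, so the
 * cost grows with n no matter how few descriptors are active. */
static int count_readable(struct pollfd *fds, nfds_t n, int timeout_ms)
{
    int ready = 0;
    nfds_t i;

    if (poll(fds, n, timeout_ms) < 0)
        return -1;
    for (i = 0; i < n; i++)
        if (fds[i].revents & POLLIN)
            ready++;
    return ready;
}
```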

So what does this mean?

Either

(a) Solaris has solved the faster-than-light problem, and Sun engineers
 should get a Nobel prize in physics or something.

(b) Solaris "scales" by being optimized for 10000 entries, and not
 speeding up sufficiently for a small number of entries.

You make the call.

Basically, for poll(), perfect scalability is that poll() scales by a
factor of 100 when you go from 100 to 10000 entries. Anybody who does
NOT scale by a factor of 100 is not scaling right - and claiming that
6.5 is a "good" scale factor only shows that you've bought into
marketing hype.

In short, a 6.5 scale factor STINKS. The only thing it means is that
Solaris is slow as hell on the 100 descriptor case.

>Under Linux 2.2.14 [or 2.4.0-test1-pre4], when the number of idle sockets 
>was increased from 100 to 10000, the time to check for active sockets with
>poll() increased by a factor of 493 [or 300, respectively].

So, what you're showing is that Linux actually is _closer_ to the
perfect scaling (Linux is off by a factor of 5, while Solaris is off by
a factor of 15 from the perfect scaling line, and scales down really
badly).
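
[Editorial sketch.]  The "off by a factor of" figures above are just
the ratio between the measured slowdown and the 100-fold growth in
descriptors, taken in whichever direction makes it at least 1.  The
function name off_linear() is made up for this note.

```c
/* How far a measured slowdown is from linear scaling.
 * size_ratio: how many times larger the input got (100 here).
 * time_ratio: how many times longer the call took.
 * Returns 1.0 for perfectly linear behaviour, larger otherwise. */
static double off_linear(double size_ratio, double time_ratio)
{
    return time_ratio >= size_ratio ? time_ratio / size_ratio
                                    : size_ratio / time_ratio;
}
```

With the numbers quoted in this thread, off_linear(100, 493) is
about 4.9 for Linux 2.2.14, off_linear(100, 300) is 3 for
2.4.0-test1-pre4, and off_linear(100, 6.5) is about 15.4 for
Solaris 7.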

Now, that factor of 5 (or 3, for 2.4.0) is still bad.  I'd love to see
Linux scale perfectly (which in this case means that 10000 fd's should
take exactly 100 times as long to poll() as 100 entries take).  But I
suspect that there are a few things going on, one of the main ones
probably being that the kernel data working set for 100 entries fits in
the cache or something like that.

>Please, somebody point out my mistake.  Linux can't be this bad!

I suspect we could improve Linux in this area, but I hope that I pointed
out the most fundamental mistake you did, which was thinking that
"scalability" equals "speed".  It doesn't.

Scalability really means that the effort to handle a problem grows
reasonably with the hardness of the problem. And _deviations_ from that
are indications of something being wrong.

Some people think that super-linear improvements in scalability are
signs of "goodness".  They aren't.  For example, the classical reason
for super-linear SMP improvement (with number of CPU's) that people get
so excited about really means that something is wrong on the low end.
Often the "wrongness" is lack of cache - some problems will scale better
than perfectly simply because with multiple CPU's you have more cache.

The "wrongness" is often also selecting the wrong algorithm: something
that "scales well" by just being horribly slow for the small case, and
being "less bad" for the big cases.

In the end, the notion of "scalability" is meaningless. The only
meaningful thing is how quickly something happens for the load you have.
That's something called "performance", and unlike "scalability", it
actually has real-life meaning.

Under Linux, I'm personally more worried about the performance of X etc,
and small poll()'s are actually common. So I would argue that the
Solaris scalability is going the wrong way. But as performance really
depends on the load, and maybe that 10000 entry load is what you
consider "real life", you are of course free to disagree (and you'd be
equally right ;)

Linus


Linus's poll variation

2000-10-31 Thread Lyle Coder

Hello,
Is someone working on Linus's poll variation discussed in this list a week
ago?

Thanks
Lyle




Re: malloc(1/0) ??

2000-11-07 Thread Lyle Coder

When a program does a malloc... glibc gets at least one page (via brk)
[actually, glibc determines if it needs to brk more memory from the kernel,
because it maintains its own pool]... so if you malloc 4 bytes, you can copy
more than 4 bytes to that pointer (up to a page size, e.g. 4K) without a
segfault... hope that answers one of your questions... as far as why
malloc(0) works... I dunno
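
[Editorial sketch.]  On glibc the amount of memory actually set aside
for a request can be inspected with malloc_usable_size(), a glibc
extension declared in <malloc.h>.  The wrapper name usable_bytes() is
made up for this note.  Whatever the number turns out to be, writing
past the size that was requested remains undefined behaviour in C;
the absence of a segfault only shows that the bytes happened to lie
in mapped memory.

```c
#include <malloc.h>     /* malloc_usable_size(): glibc extension */
#include <stdlib.h>

/* Return how many bytes the allocator reports as usable for a block
 * obtained with malloc(request), or 0 if the allocation failed. */
static size_t usable_bytes(size_t request)
{
    void *p = malloc(request);
    size_t n = p ? malloc_usable_size(p) : 0;

    free(p);
    return n;
}
```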

Best Wishes,
Lyle
- Original Message -
From: "David Schwartz" <[EMAIL PROTECTED]>
To: "RAJESH BALAN" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, November 06, 2000 11:54 PM
Subject: RE: malloc(1/0) ??


> > hi,
> > why does this program works. when executed, it doesnt
> > give a segmentation fault. when the program requests
> > memory, is a standard chunk is allocated irrespective
> > of the what the user specifies. please explain.
> >
> > main()
> > {
> >char *s;
> >s = (char*)malloc(0);
> >strcpy(s,"f");
> >printf("%s\n",s);
> > }
> >
> > NOTE:
> >   i know its a 'C' problem. but i wanted to know how
> > this works
>
> The program does not work. A program works if it does what it's supposed
> to do. If you want to argue that this program is supposed to print "ff"
> then explain to me why the 'malloc' contains a zero in parenthesis.
>
> The program can't possibly work because it invokes undefined behavior. It
> is impossible to determine what a program that invokes undefined behavior
> is 'supposed to do'.
>
> DS



Re: Pentium 4 and 2.4/2.5

2000-11-07 Thread Lyle Coder

Alan,
are you saying that rep;nop is not needed in the spinlocks? (because they
are for P4)
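
[Editorial sketch.]  For readers wondering where "rep;nop" sits in a
spinlock: it goes in the busy-wait loop, between attempts to take the
lock.  The byte sequence F3 90 is decoded as PAUSE on the Pentium 4
and as a plain NOP on earlier Intel processors.  The sketch below is
a user-space illustration with C11 atomics, not the kernel's
spinlock; toy_spinlock and its functions are hypothetical names, and
the name cpu_relax() is used here only as a label for the hint.

```c
#include <stdatomic.h>

/* Spin-wait hint: "rep; nop" on x86, nothing elsewhere. */
static inline void cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("rep; nop" ::: "memory");
#endif
}

typedef struct {
    atomic_flag locked;
} toy_spinlock;

/* Try once; returns 1 if the lock was taken, 0 if it was already held. */
static int toy_spin_trylock(toy_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

/* Spin until the lock is taken, hinting the CPU between attempts. */
static void toy_spin_lock(toy_spinlock *l)
{
    while (!toy_spin_trylock(l))
        cpu_relax();
}

static void toy_spin_unlock(toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

The lock is correct with or without the hint; the hint only changes
how the waiting CPU behaves while it spins.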

Thanks
Lyle
- Original Message -
From: "Alan Cox" <[EMAIL PROTECTED]>
To: "Andre Hedrick" <[EMAIL PROTECTED]>
Cc: "Frank Davis" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Tuesday, November 07, 2000 4:13 AM
Subject: Re: Pentium 4 and 2.4/2.5


> > Not to worry, some of us are working with the 'I' guys to do proper P4
> > detection.
>
> Be careful with the intel patches. The ones I've seen so far tried to call
> the cpu 'if86' breaking several tools that do cpu model checking off uname.
> They didnt fix the 2GHz CPU limit, they use 'rep nop' in the locks which is
> explicitly 'undefined behaviour' for non intel processors and they use the
> TSC without checking it had one.
>
> Hopefully they have improved since
>
> Alan



RE: SSE instructions

2000-09-28 Thread Lyle Coder

Hello,
2.4.0-test1 and higher.  make sure you select PIII as the CPU in the config.

Best Wishes,
Lyle

--
Which version of the kernel is needed in order to run the following
program on an PIII?

int main(void)
{
        __asm__ __volatile__("xorps %%xmm0, %%xmm1" ::: "memory");
        return 0;
}

astor

--
Alexander Kjeldaas                Mail:  [EMAIL PROTECTED]
finger [EMAIL PROTECTED] for OpenPGP key.



cann't dump info to user file from kernel

2007-10-02 Thread kernel coder
hi,
  I'm trying to dump some information from dev.c to a user space
file.  Following is the code which I'm using to write to the user space
file.  I'm using 2.6.22.x86_64 kernel.


#define _write(f, buf, sz) ((f)->f_op->write((f), (buf), (sz), &(f)->f_pos))
#define WRITABLE(f) ((f)->f_op && (f)->f_op->write)

int write_to_file(char *logfile, char *buf, int size)
{
    int ret = 0;
    struct file *f = NULL;
    mm_segment_t old_fs = get_fs();

    /* Allow the write method to accept a kernel-space buffer. */
    set_fs(get_ds());
    /* O_WRONLY is required: O_CREAT|O_APPEND alone opens the file read-only. */
    f = filp_open(logfile, O_WRONLY | O_CREAT | O_APPEND, 00600);
    if (IS_ERR(f)) {
        DPRINT("Error %ld opening %s\n", -PTR_ERR(f), logfile);
        ret = -1;
    } else {
        if (WRITABLE(f))
            _write(f, buf, size);
        else {
            DPRINT("%s does not have a write method\n", logfile);
            ret = -1;
        }

        if ((ret = filp_close(f, NULL)))
            DPRINT("Error %d closing %s\n", -ret, logfile);
    }
    /* Restore the original address limit. */
    set_fs(old_fs);

    return ret;
}


I'm calling this function from netif_receive_skb in dev.c

int netif_receive_skb(struct sk_buff *skb){

-
write_to_file("/root/kernel_log","hello_world",12);
--

}

But whenever this function is called, the kernel simply halts. Please
tell me what might be the reason.

I just want to dump some information to a user-space file from dev.c.
Is there some better way to do it?


thanks,
shahzad
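
A possible direction, offered as a sketch rather than a tested kernel
patch: netif_receive_skb runs in softirq context, where code must not
sleep, while filp_open() and a file's write method may sleep, so calling
write_to_file() there is a plausible cause of the hang. A common
alternative is to copy each record into a preallocated ring in the
receive path and let process context drain it to the file. The
user-space C below only illustrates the ring itself. log_ring, ring_put
and ring_get are hypothetical names, and the code assumes one producer
and one consumer with no concurrent access; real kernel code would need
proper locking or memory barriers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_SLOTS 8          /* must be a power of two */
#define RECORD_LEN 32         /* fixed-size records keep the fast path simple */

struct log_ring {
    char slots[RING_SLOTS][RECORD_LEN];
    unsigned int head;        /* next slot to write (producer side) */
    unsigned int tail;        /* next slot to read (consumer side) */
};

/* Producer: never blocks; returns 0 on success, -1 if the ring is full. */
static int ring_put(struct log_ring *r, const char *msg)
{
    char *slot;

    if (r->head - r->tail == RING_SLOTS)
        return -1;                         /* full: drop rather than wait */
    slot = r->slots[r->head & (RING_SLOTS - 1)];
    strncpy(slot, msg, RECORD_LEN - 1);    /* truncate over-long messages */
    slot[RECORD_LEN - 1] = '\0';
    r->head++;
    return 0;
}

/* Consumer: copies the oldest record into out; returns -1 if empty. */
static int ring_get(struct log_ring *r, char *out, size_t out_len)
{
    if (r->head == r->tail)
        return -1;
    assert(out_len >= RECORD_LEN);
    memcpy(out, r->slots[r->tail & (RING_SLOTS - 1)], RECORD_LEN);
    r->tail++;
    return 0;
}
```

The receive path would call ring_put() only, and a separate reader in
process context would call ring_get() and do the actual file write.
Dropping records when the ring is full is a deliberate choice here: the
packet path should never wait on the logger.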
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


profile code added to netif_receive_skb function

2007-11-25 Thread kernel coder
hi,

I have added some code to the netif_receive_skb function. As the Linux
kernel is multithreaded, there is no guarantee that my code is executed
to completion without being disturbed by any other process. The timer
interrupt handler is an example of code which might interrupt the
execution of my code.

I just want to observe which processes are disturbing my code. I
think I need to print EIP register values. How can I print cache
contents as well in the Linux kernel? Are there any tools available for
such a purpose?


thanks,
shahzad


increased number of cycles

2007-11-17 Thread kernel coder
hi,
I'm trying to add some code to the netif_receive_skb function in
dev.c. That code consumed around 16 cycles on a dual-core Opteron
machine. I have been working on it for the last 6 months and the
consumed cycles have always been around 16. I don't touch any other
part of the kernel.

But for the last 4 days the consumed cycles have suddenly increased to
around 35. I'm using the RDTSC instruction to profile the code. There
is no change in the code and the kernel version is also the same. I am
assuming that there must be something wrong with the hardware.

Please guide me on how I can figure out the root cause. What areas
should I look at to find the reason for the increased number of
cycles? I don't think there is any issue in the kernel, because the
kernel version and code are the same. Can the log messages during
system bootup help me diagnose the problem?


shahzad


tcp/ip stack question

2007-06-07 Thread kernel coder

hi,
  I am receiving a packet on eth1 and want to send it out through eth2.

I've written code in the netif_receive_skb function. This code changes
the MAC header in the sk_buff structure so that the packet can be sent
through the other interface card. But when I call the ip_dev_find
function to get the second interface structure, NULL is returned. I
checked the IP of the second ethernet card and it matched the one
passed to ip_dev_find, so why is NULL being returned?


Actually, if I get the correct dev structure from ip_dev_find, I'll
assign that dev structure to the current skbuff->dev and call the
dev_queue_xmit function, so that the packet is transmitted through the
second interface card. Is my approach correct?


shahzad


system call implementation for x86_64

2007-05-19 Thread kernel coder

hi,

I'm trying to implement a system call for x86_64. My processor is a
dual-core Opteron. There is very little material on the web about
implementing system calls for x86_64 on 2.6 series kernels. I tried to
implement a new system call by following the existing implementations,
but without success. Following are the file names and the changes
made.

//
file-> include/asm-x86_64/unistd.h

#define __NR_newcall 273
__SYSCALL(__NR_newcall, sys_newcall)

#define __NR_syscall_max __NR_newcall

//
file-> include/linux/syscalls.h

asmlinkage unsigned long sys_newcall(char __user *buf);

/
file--> fs/read_write.c

asmlinkage unsigned long sys_newcall(char __user * buf){

printk("new system call \n");
return 0;
}

EXPORT_SYMBOL_GPL(sys_write)


Please let me know where I'm going wrong. Following is the program
which is calling my system call


#include 
#include 
#include 
#include 

 long int ret;
  int num = 243;
 char  buffer[20];

int main() {

 /* the syscall instruction overwrites rcx and r11 */
 asm volatile ("syscall"
  : "=a" (ret)
  : "0" ((long) num),
"D" (buffer)
  : "rcx", "r11", "memory"
 );
return ret;
}

When I call this, nothing gets printed in /var/log/messages. Am I
missing something?

Actually I want to pass a pointer to the kernel from user space. Later
on, data will be copied to that memory location. I am thinking of using
copy_to_user for copying the data. The buffer passed through the system
call will be used by a kernel function as a circular ring, and portions
of this ring will get updated frequently even after the system call has
returned.

Is there any better way to do this?
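
For comparison, a minimal user-space sketch of invoking a system call
by number through the C library's syscall() wrapper instead of
hand-written inline assembly. It uses SYS_getpid only because that
number exists on every Linux system; a new call would be invoked the
same way with its own number, which has to equal the __NR_ value
registered in unistd.h (the post above registers 273 while the test
program passes 243). raw_getpid and raw_call_matches_libc are
hypothetical helper names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>   /* SYS_getpid */
#include <sys/types.h>
#include <unistd.h>        /* syscall(), getpid() */

/* Invoke getpid by raw system-call number; -1 with errno set on failure. */
static long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* Sanity check: the raw call and the libc wrapper must agree. */
static int raw_call_matches_libc(void)
{
    return raw_getpid() == (long) getpid();
}
```

When the kernel has no entry for the number passed, syscall() returns
-1 with errno set to ENOSYS, which is a quick way to tell whether the
running kernel actually contains the new call.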


AMD dual core opetron optimization

2007-04-30 Thread kernel coder

hi,

I'm trying to write some optimized code for an AMD dual-core Opteron
processor, but things are getting nowhere. I've installed Fedora 5
with a 2.6 series Linux kernel and GCC 4.

Following are a few lines of code which are consuming close to 100
cycles. Yes, this is not the forum for such questions, but I think
people on the Linux kernel and GCC lists are best placed to answer
them. I'm really getting frustrated and helpless; that's why I've put
the question on this forum.

/***/
/* these variables will be used for RDTSC instrucion */
uint64_t before, overhead, clocks;


/*ReadTsc funtion is given below */
before=ReadTsc();
before=ReadTsc();
before=ReadTsc();

overhead=ReadTsc()-before;

printf(" ReadTSC overhead is %lu ",overhead);


unsigned int test;
unsigned long buffer [128];
buffer[12]=8;
buffer[13]=0;
buffer[23]=6;

/* starting cycles */

before=ReadTsc();

/**Start of Targeted code
**/

test= buffer[12] | buffer[13] | buffer[23] ;

switch( test )

{

case  12:
   asm(" jmp proc_1");

case  13:
asm("jmp proc_2");
case  14:
asm("jmp proc_3");
case  15:
asm("jmp proc_4");
default :
asm("jmp proc_5");

}



asm(" proc_5:");

/**End of Targeted code **/
  /*current cycles */
  clocks=ReadTsc() ;

clocks=clocks - before;
printf("\n cycles consumed %lu \n",clocks - overhead);

/**/

The overhead generally varies from 360 to 395 cycles. Sometimes it
also drops close to 270 cycles.

The cycles consumed by the targeted code vary from 20 to 100.
Theoretically, I think the cycles consumed should be fewer than 20.
Then why so many cycles, and why does the output vary from 20 to 100
cycles? Sometimes it crosses 100 cycles as well.

Sometimes the cycles consumed by the targeted code become far fewer
than the RDTSC instruction overhead.

Is there a better way to write the above code? I even used the prefetch
instruction before the targeted code to make sure that buffer is in
the L1 cache, but with no success.

The code for ReadTsc() is as follows. Please also tell me if it's the
correct way to measure cycles.


/*/

typedef long long __int64;

__int64 ReadTSC() {
  int res[2];  // store 64 bit result here
  #if defined(__GNUC__) && !defined(__INTEL_COMPILER)
  // Inline assembly in AT&T syntax
  #if defined (_LP64)  // 64 bit mode
 __asm__ __volatile__  (   // serialize (save rbx)
 "xorl %%eax,%%eax \n push %%rbx \n cpuid \n"
  ::: "%rax", "%rcx", "%rdx");
 __asm__ __volatile__  (   // read TSC, store edx:eax in res
 "rdtsc\n"
  : "=a" (res[0]), "=d" (res[1]) );
 __asm__ __volatile__  (   // serialize again
 "xorl %%eax,%%eax \n cpuid \n pop %%rbx \n"
  ::: "%rax", "%rcx", "%rdx");
  #else// 32 bit mode
 __asm__ __volatile__  (   // serialize (save ebx)
 "xorl %%eax,%%eax \n pushl %%ebx \n cpuid \n"
  ::: "%eax", "%ecx", "%edx");
 __asm__ __volatile__  (   // read TSC, store edx:eax in res
 "rdtsc\n"
  : "=a" (res[0]), "=d" (res[1]) );
 __asm__ __volatile__  (   // serialize again
 "xorl %%eax,%%eax \n cpuid \n popl %%ebx \n"
  ::: "%eax", "%ecx", "%edx");
  #endif
  #else
  // Inline assembly in MASM syntax
 __asm {
xor eax, eax
cpuid  // serialize
rdtsc  // read TSC
mov dword ptr res, eax // store low dword in res[0]
mov dword ptr res+4, edx   // store high dword in res[1]
xor eax, eax
cpuid  // serialize again
 };
  #endif   // __GNUC__
  return *(__int64*)res;   // return result
}


/*/


thanks,
shahzad
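
A hedged sketch of one way to take the timestamps discussed above, not
a replacement for the poster's ReadTSC(). It assumes GCC or Clang on
x86; on other targets it falls back to clock_gettime() so that it still
builds. combine_tsc, read_cycles and min_read_overhead are hypothetical
names. It illustrates two points: the EDX:EAX halves are joined with a
shift rather than by casting an array pointer, which stays correct
where unsigned long is 64 bits wide, and the read overhead is taken as
the minimum over many back-to-back samples, because single samples are
inflated by interrupts and cache misses.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Join the two 32-bit halves returned by RDTSC (EDX = high, EAX = low). */
static uint64_t combine_tsc(uint32_t lo, uint32_t hi)
{
    return ((uint64_t) hi << 32) | lo;
}

/* Read a tick counter: the TSC on x86, a nanosecond clock elsewhere. */
static uint64_t read_cycles(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
    return combine_tsc(lo, hi);
#else
    /* Assumption: non-x86 fallback, counts nanoseconds instead of cycles. */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * 1000000000u + (uint64_t) ts.tv_nsec;
#endif
}

/* Smallest observed cost of two back-to-back reads over n samples. */
static uint64_t min_read_overhead(int n)
{
    uint64_t best = UINT64_MAX;
    int i;

    for (i = 0; i < n; i++) {
        uint64_t t0 = read_cycles();
        uint64_t t1 = read_cycles();
        /* Skip samples where the counter appears to go backwards. */
        if (t1 >= t0 && t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Subtracting min_read_overhead() from the minimum, rather than a single
value, of repeated measurements of the targeted code gives a more
stable estimate than one sample of each.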


bechmarking kernel code

2007-05-03 Thread kernel coder

hi,
I'm profiling some part of the kernel code. My profiling mechanism
is based on the rdtsc instruction.

Please tell me if I'm profiling correctly. I'm testing Linux kernel
2.6.15 and my system is a P4.


function(){

unsigned long long c1,c2,c3,c4,c5;
unsigned long long before,overhead;
before=readtsc();
before=readtsc();
before=readtsc();
overhead=readtsc()-before;
c1=testfunc();
c2=testfunc();
c3=testfunc();
c4=testfunc();
c5=testfunc();

printk(" \n *c1 = %llu c2= %llu c3 = %llu c4 = %llu c5 = %llu overhead =%llu",
c1,c2,c3,c4,c5,overhead);


}

/***/


unsigned long long readtsc(){
unsigned long res[2]={0,0};
 __asm__ __volatile__  (
  "rdtsc\n"
   : "=a" (res[0]), "=d" (res[1]) );

return *((unsigned long long *)res);

}

/***/

unsigned long long testfunc(){
unsigned long start_cycles[2]={0,0};
unsigned long end_cycles[2]={0,0};
unsigned long long total_cycles=0;
int i;
 __asm__ __volatile__  (   // read TSC, store edx:eax in res
  "rdtsc\n"
   : "=a" (start_cycles[0]), "=d" (start_cycles[1]) );


  /*  CODE TO BE PROFILED */


 __asm__ __volatile__  (   // read TSC, store edx:eax in res
  "rdtsc\n"
   : "=a" (end_cycles[0]), "=d" (end_cycles[1]) );
total_cycles=*((unsigned long long *)end_cycles) - *((unsigned long long *)start_cycles);
return total_cycles;
}


Re: [Xen-devel] [PATCH] xen: remove XEN_PRIVILEGED_GUEST

2014-02-24 Thread Vladimir 'φ-coder/phcoder' Serbinenko
On 24.02.2014 19:39, Konrad Rzeszutek Wilk wrote:
> On Tue, Feb 18, 2014 at 11:14:27AM +0100, Paul Bolle wrote:
>> On Mon, 2014-02-17 at 09:43 -0500, Konrad Rzeszutek Wilk wrote:
>>> On Mon, Feb 17, 2014 at 02:03:17PM +0100, Paul Bolle wrote:
>>>> On Mon, 2014-02-17 at 07:23 -0500, Konrad Rzeszutek Wilk wrote:
>>>>> On Feb 16, 2014 3:07 PM, Paul Bolle  wrote:
>>>>> Please look in the grub git tree. They have fixed their code to not do
>>>>> this anymore. This should be reflected in the patch description.
>>>>
>>>> Thanks, I didn't know that. That turned out to be grub commit
>>>> ec824e0f2a399ce2ab3a2e3353d372a236595059 ("Implement grub_file tool and
>>>> use it to implement generating of config"), see
>>>> http://git.savannah.gnu.org/cgit/grub.git/commit/util/grub.d/20_linux_xen.in?id=ec824e0f2a399ce2ab3a2e3353d372a236595059
>>
>> And that commit was reverted a week later in grub commit
>> faf4a65e1e1ce1d822d251c1e4b53d96ec7faec5 ("Revert grub-file usage in
>> grub-mkconfig."), see
>> http://git.savannah.gnu.org/cgit/grub.git/commit/util/grub.d/20_linux_xen.in?id=faf4a65e1e1ce1d822d251c1e4b53d96ec7faec5
>>  .
>>
>> That commit has no explanation (other than its one line summary). So
>> we're left guessing why this was done. Luckily, it doesn't matter here,
>> because the test for CONFIG_XEN_PRIVILEGED_GUEST is superfluous.
> 
> How about we ask Vladimir?
> 
> Vladimir - could you shed some light on it? Thanks!
> 
CONFIG_XEN_PRIVILEGED_GUEST is not present on Linux even though it
should be. The test was removed to accommodate this.
The usage of grub-file was removed because it wasn't release-ready.
>>
>> Anyhow, I hope to submit a second version of this patch later this day.
>>
>>
>> Paul Bolle
>>
>>
>> ___
>> Xen-devel mailing list
>> xen-de...@lists.xen.org
>> http://lists.xen.org/xen-devel
> 




signature.asc
Description: OpenPGP digital signature


Re: EFI and multiboot2 devlopment work for Xen

2013-10-28 Thread Vladimir 'φ-coder/phcoder' Serbinenko

>>> Will a multiboot2 tag with whole EFI memory map solve your problem?
>> I added such a tag in documentation and wrote a patch for it (attached).
>> Awaiting for someone to test it to commit
> 
> Great! I think from Xen perspective we first need to have Xen be able
> to understand multiboot2 - that is something Daniel had been working on.
> I will let Daniel talk more about it.
> 
> Seth, would you have any time to test the patch against Solaris to
> make sure it works?
> 
I've committed that patch. BTW do you want protected mode or long mode
entry point for x86_64 variant? Currently it's protected mode but I
planned to add long mode possibility but it wasn't a priority.






Re: Is: Wrap-up Was: Re: EFI and multiboot2 devlopment work for Xen

2013-10-30 Thread Vladimir 'φ-coder/phcoder' Serbinenko
On 30.10.2013 12:19, Daniel Kiper wrote:
> Hi,
> multiboot2 protocol requires some more changes. However, about 80% of code
> is ready. In this case Xen and modules are loaded by GRUB2 itself. It means
> that all images could be placed on any filesystem recognized by GRUB2. Options
> for Xen and modules are passed separately which simplifies command line editing
> in boot loader and parsing. multiboot2 protocol is very flexible and could be
> easily extended in the future if a need arises. Support for secure boot and
> shim loader could be added. However, it was not implemented yet. Probably
> linuxefi module could be used as a reference or even as a base for development.
> However, I do not know are there plans to support such solution by GRUB2
> community. Currently, support for native PE images signatures and GPG signatures
> is under development for GRUB2 upstream.
> 
GPG signatures are supported already. My plan is as follows:
- Implement PE signatures upstream.
- Uplift as much of secureboot to upstream as policy permits. I would
like to be in partnership over this with some distro people so that they
can carry remaining part (unless FSF allows secureboot per policy)
> There is still open question that ExitBootServices() should be called by GRUB2
> loader or by loaded image itself on EFI platform. UEFI spec 2.4 states in many
> places that it is "OS loader" or "Operating System" responsibility. However,
> I think that "OS loader" should be understood as a integral piece of "Operating
> System" responsible for its load into memory without usage of any additional
> loader like GRUB2.
"Operating system" isn't just kernel. Everything you get in base install
is "Operating system" including i.a. shell or bootloader.
However, this is the kind of decision that can't be taken based on the
spec alone. The bugs in real-world EFI implementations play a bigger
role in design decisions than the EFI specification does.
> There is also third solution for issues with ExitBootServices(). In case
> of multiboot2 protocol OS could request that EFI should be left as is.
> Solution was proposed by Vladimir and I think that it makes sense.
I will write the specification draft for it then but probably not today.
> However,
> this does not solve problem with ExitBootServices() in case of other
> boot loaders/protocols.
multiboot2 was designed in a way not to be limited to GRUB2. It can be
added to other bootloaders as well.
> So we should take a decision accordingly to above
> considerations in regards to linux, chainloader and similar stuff.
> 
> Daniel
> 






Re: EFI and multiboot2 devlopment work for Xen

2013-10-21 Thread Vladimir 'φ-coder/phcoder' Serbinenko
Mail is big, I think I got your essential points but I didn't read it whole.
On 21.10.2013 14:57, Daniel Kiper wrote:
> Hi,
> 
> During work on multiboot2 protocol support for Xen it was discovered
> that memory map passed via relevant tag could not represent wide range
> of memory types available on EFI platforms. Additionally, GRUB2
> implementation calls ExitBootServices() on them just before jumping
> into loaded image. In this situation loaded system could not clearly
> identify reserved memory regions, EFI runtime services regions and others.
> 
Will a multiboot2 tag with whole EFI memory map solve your problem?
> Additionally, it should be mentioned that there is no possibility or it could
> be very difficult to implement secure boot on EFI platforms using GRUB2 as boot
> loader because, as it was mentioned earlier, it calls ExitBootServices().
> 
GRUB has generic support for signing kernels, modules or anything else
using GnuPG signatures. You'd just have to ship xen.sig and kernel.sig.
This method doesn't have any of the controversy associated with EFI
stuff, but in this particular case it does exactly the same thing:
verify a signature. multiboot2 is mainly a memory structure
specification, so how the files are checked is probably outside of its
scope. But it's possible to add a specification on how to embed
signatures in the kernel.





Re: EFI and multiboot2 devlopment work for Xen

2013-10-21 Thread Vladimir 'φ-coder/phcoder' Serbinenko
On 21.10.2013 22:53, Seth Goldberg wrote:
> 
> 
> Quoting Daniel Kiper, who wrote the following on Mon, 21 Oct 2013:
> 
>> Hi,
>>
>> During work on multiboot2 protocol support for Xen it was discovered
>> that memory map passed via relevant tag could not represent wide range
>> of memory types available on EFI platforms. Additionally, GRUB2
>> implementation calls ExitBootServices() on them just before jumping
>> into loaded image. In this situation loaded system could not clearly
>> identify reserved memory regions, EFI runtime services regions and
>> others.
> 
>   Yes, that is exactly why we added full support to pass the entire UEFI
> memory map via a new tag.
> 
Can you send this patch? Or provide a link to publicly available
source? I think we can accept it with probably just minor changes.





Re: EFI and multiboot2 devlopment work for Xen

2013-10-22 Thread Vladimir 'φ-coder/phcoder' Serbinenko
On 21.10.2013 23:16, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
> Mail is big, I think I got your essential points but I didn't read it whole.
> On 21.10.2013 14:57, Daniel Kiper wrote:
>> Hi,
>>
>> During work on multiboot2 protocol support for Xen it was discovered
>> that memory map passed via relevant tag could not represent wide range
>> of memory types available on EFI platforms. Additionally, GRUB2
>> implementation calls ExitBootServices() on them just before jumping
>> into loaded image. In this situation loaded system could not clearly
>> identify reserved memory regions, EFI runtime services regions and others.
>>
> Will a multiboot2 tag with whole EFI memory map solve your problem?
I added such a tag to the documentation and wrote a patch for it (attached).
I'm waiting for someone to test it before I commit it.

=== modified file 'grub-core/loader/i386/multiboot_mbi.c'
--- grub-core/loader/i386/multiboot_mbi.c	2013-10-14 14:33:44 +
+++ grub-core/loader/i386/multiboot_mbi.c	2013-10-22 06:57:45 +
@@ -36,6 +36,10 @@
 #include 
 #include 
 
+#ifdef GRUB_MACHINE_EFI
+#include 
+#endif
+
 /* The bits in the required part of flags field we don't support.  */
 #define UNSUPPORTED_FLAGS			0xfff8
 
@@ -579,6 +583,12 @@
   ptrdest += sizeof (struct grub_vbe_mode_info_block);
 #endif
 
+#ifdef GRUB_MACHINE_EFI
+  err = grub_efi_finish_boot_services (NULL, NULL, NULL, NULL, NULL);
+  if (err)
+return err;
+#endif
+
   return GRUB_ERR_NONE;
 }
 

=== modified file 'grub-core/loader/multiboot.c'
--- grub-core/loader/multiboot.c	2013-09-23 11:35:33 +
+++ grub-core/loader/multiboot.c	2013-10-22 06:51:30 +
@@ -131,12 +131,6 @@
   if (err)
 return err;
 
-#ifdef GRUB_MACHINE_EFI
-  err = grub_efi_finish_boot_services (NULL, NULL, NULL, NULL, NULL);
-  if (err)
-return err;
-#endif
-
 #if defined (__i386__) || defined (__x86_64__)
   grub_relocator32_boot (grub_multiboot_relocator, state, 0);
 #else

=== modified file 'grub-core/loader/multiboot_mbi2.c'
--- grub-core/loader/multiboot_mbi2.c	2013-10-14 14:33:44 +
+++ grub-core/loader/multiboot_mbi2.c	2013-10-22 06:57:58 +
@@ -295,9 +295,55 @@
 #endif
 }
 
+#ifdef GRUB_MACHINE_EFI
+
+static grub_efi_uintn_t efi_mmap_size = 0;
+
+/* Find the optimal number of pages for the memory map. Is it better to
+   move this code to efi/mm.c?  */
+static void
+find_efi_mmap_size (void)
+{
+  efi_mmap_size = (1 << 12);
+  while (1)
+{
+  int ret;
+  grub_efi_memory_descriptor_t *mmap;
+  grub_efi_uintn_t desc_size;
+  grub_efi_uintn_t cur_mmap_size = efi_mmap_size;
+
+  mmap = grub_malloc (cur_mmap_size);
+  if (! mmap)
+	return;
+
+  ret = grub_efi_get_memory_map (&cur_mmap_size, mmap, 0, &desc_size, 0);
+  grub_free (mmap);
+
+  if (ret < 0)
+	return;
+  else if (ret > 0)
+	break;
+
+  if (efi_mmap_size < cur_mmap_size)
+	efi_mmap_size = cur_mmap_size;
+  efi_mmap_size += (1 << 12);
+}
+
+  /* Increase the size a bit for safety, because GRUB allocates more on
+ later, and EFI itself may allocate more.  */
+  efi_mmap_size += (3 << 12);
+
+  efi_mmap_size = ALIGN_UP (efi_mmap_size, 4096);
+}
+#endif
+
 static grub_size_t
 grub_multiboot_get_mbi_size (void)
 {
+#ifdef GRUB_MACHINE_EFI
+  if (!efi_mmap_size)
+find_efi_mmap_size ();
+#endif
   return 2 * sizeof (grub_uint32_t) + sizeof (struct multiboot_tag)
 + (sizeof (struct multiboot_tag_string)
+ ALIGN_UP (cmdline_size, MULTIBOOT_TAG_ALIGN))
@@ -318,6 +364,10 @@
 + ALIGN_UP (sizeof (struct multiboot_tag_old_acpi)
 		+ sizeof (struct grub_acpi_rsdp_v10), MULTIBOOT_TAG_ALIGN)
 + acpiv2_size ()
+#ifdef GRUB_MACHINE_EFI
++ ALIGN_UP (sizeof (struct multiboot_tag_efi_mmap)
+		+ efi_mmap_size, MULTIBOOT_TAG_ALIGN)
+#endif
 + sizeof (struct multiboot_tag_vbe) + MULTIBOOT_TAG_ALIGN - 1
 + sizeof (struct multiboot_tag_apm) + MULTIBOOT_TAG_ALIGN - 1;
 }
@@ -760,6 +810,28 @@
   }
 #endif
 
+#ifdef GRUB_MACHINE_EFI
+  {
+struct multiboot_tag_efi_mmap *tag = (struct multiboot_tag_efi_mmap *) ptrorig;
+grub_efi_uintn_t efi_desc_size;
+grub_efi_uint32_t efi_desc_version;
+
+tag->type = MULTIBOOT_TAG_TYPE_EFI_MMAP;
+tag->size = sizeof (*tag) + efi_mmap_size;
+
+err = grub_efi_finish_boot_services (&efi_mmap_size, tag->efi_mmap, NULL,
+	 &efi_desc_size, &efi_desc_version);
+if (err)
+  return err;
+tag->descr_size = efi_desc_size;
+tag->descr_vers = efi_desc_version;
+tag->size = sizeof (*tag) + efi_mmap_size;
+
+ptrorig += ALIGN_UP (tag->size, MULTIBOOT_TAG_ALIGN)
+  / sizeof (grub_properly_aligned_t);
+  }
+#endif
+
   {
 struct multiboot_tag *tag = (struct multiboot_tag *) ptrorig;
 tag->type = MULTIBOOT_TAG_TYPE_END;

=== modified file 'include/mu
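
The sizing loop in find_efi_mmap_size() above follows a common pattern:
start with one 4 KiB page, ask for the map, and grow the buffer until
the call succeeds, then add slack because the map can still grow before
boot services are exited. A stand-alone sketch of that pattern follows.
It is not GRUB code: query_fn, fake_get_map and find_map_size are
hypothetical names, the callback stands in for
grub_efi_get_memory_map() using the return convention the patch relies
on (positive on success, zero when the buffer is too small and the
needed size is written back, negative on error), and malloc() replaces
grub_malloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAP_PAGE 4096u
#define ROUND_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))

/* Hypothetical callback: >0 success, 0 too small (*size = needed), <0 error. */
typedef int (*query_fn)(size_t *size, void *buf);

/* Returns a padded, page-aligned buffer size, or 0 on failure. */
static size_t find_map_size(query_fn query)
{
    size_t size = MAP_PAGE;

    for (;;) {
        size_t cur = size;
        void *buf = malloc(cur);
        int ret;

        if (!buf)
            return 0;
        ret = query(&cur, buf);
        free(buf);

        if (ret < 0)
            return 0;
        if (ret > 0)
            break;              /* the map fits in 'size' bytes */

        if (size < cur)
            size = cur;         /* jump to the size the callee asked for */
        size += MAP_PAGE;       /* and leave room for growth */
    }

    size += 3 * MAP_PAGE;       /* safety margin, as in the patch */
    return ROUND_UP(size, MAP_PAGE);
}

/* Stand-in for the firmware: pretends the map needs 10000 bytes. */
static int fake_get_map(size_t *size, void *buf)
{
    const size_t needed = 10000;
    int fits = (*size >= needed);

    (void) buf;
    *size = needed;
    return fits ? 1 : 0;
}
```

With the stand-in above, the first 4096-byte attempt is too small, the
second attempt with 14096 bytes succeeds, and the padded result is
rounded up to a whole number of pages.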

Re: EFI and multiboot2 devlopment work for Xen

2013-10-22 Thread Vladimir 'φ-coder/phcoder' Serbinenko
On 22.10.2013 16:51, Konrad Rzeszutek Wilk wrote:
> If you use 'linux' module, it will call ExitBootService.
> If you use 'multiboot' module, it will call ExitBootService too.
> 
> So if you don't want to the module to call 'grub_efi_finish_boot_services'
> you need to use 'linuxefi' :-)
That's a very limited logic. Commands can be modified and protocols can
be extended.
There was only one e-mail explaining the needs and I answered with
proposing possible solutions yet the 2 e-mails in question were
completely ignored.
What's the need behind not calling ExitBootService? This is a point
which was never really explained to me. EFI specification specifically
tells to call ExitBootService.





Re: EFI and multiboot2 devlopment work for Xen

2013-10-22 Thread Vladimir 'φ-coder/phcoder' Serbinenko
On 22.10.2013 18:01, Daniel Kiper wrote:
> On Tue, Oct 22, 2013 at 03:42:42PM +, Woodhouse, David wrote:
>> On Tue, 2013-10-22 at 16:32 +0100, Matthew Garrett wrote:
>>>
>>> There are two problems with this:
>>>
>>> 1) The kernel will only boot if it's signed with a key in db, not a key
>>> in MOK.
>>> 2) grub will read the kernel, but the kernel will have to read the
>>> initramfs using EFI calls. That means your initramfs must be on a FAT
>>> partition.
>>>
>>> If you're happy with those limitations then just use the chainloader
>>> command. If you're not, use the linuxefi command.
>>
>> Well, we're talking about booting the Xen hypervisor aren't we?
>>
>> So yes, there are reasons the Linux kernel uses the 'boot stub' the way
>> it does, but I'm not sure we advocate that Xen should emulate that in
>> all its 'glory'?
> 
> Right, I think that sensible mixture of multiboot2 protocol (it is needed
> to pass at least modules list to Xen; IIRC, linuxefi uses Linux Boot protocol
> for it) with extension proposed by Vladimir and something similar to linuxefi
> command will solve our problem (I proposed it in my first email). Users which
> do not need SB may use upstream GRUB2 and others could use
> 'multiboot2efi extension'.
I think it's possible to handle secureboot with the same multiboot2 base.
Correct me if I'm wrong, but secureboot doesn't specify the format of
signatures, only that they should be present and checked.
So why not make the only difference between the secureboot-enabled and
non-secureboot-enabled versions that the former enforces signatures
even against the user's will? This would reduce the policy-charged
patch to about 100 lines.
The signature format to use can be discussed as well. My main problem
with PE signatures as used for EFI is their apparent complexity, but I
haven't looked into them yet.
> 
> Daniel
> 
> ___
> Grub-devel mailing list
> grub-de...@gnu.org
> https://lists.gnu.org/mailman/listinfo/grub-devel
> 






Re: EFI and multiboot2 devlopment work for Xen

2013-10-22 Thread Vladimir 'φ-coder/phcoder' Serbinenko
On 22.10.2013 18:14, Daniel Kiper wrote:
>> > Are you (going to be) in Edinburgh? Matthew was just explaining a bunch
>> > of this stuff to me, it might be useful for you to get it from the
>> > horses mouth instead of laundered through my brain (which is a bit
>> > addled afterwards ;-)).
What is happening in Edinburgh, and when? It's close enough to me and I
might be able to free myself if it's for collaboration with xen. I'd
also like to discuss the grant tables version issue for pvgrub2 (which
is almost ready)





Re: EFI and multiboot2 devlopment work for Xen

2013-10-22 Thread Vladimir 'φ-coder/phcoder' Serbinenko
On 22.10.2013 18:51, Daniel Kiper wrote:
> On Tue, Oct 22, 2013 at 04:36:04PM +, Maliszewski, Richard L wrote:
>> I may be off-base, but when I was wading through the grub2 code earlier
>> this year, it looked to me like it was going to refuse to launch anything
>> via MB1 or MB2 if the current state was a secure boot launch.
> 
> Are you talking about upstream GRUB2 or GRUB2 with tons of distros
> patches including linuxefi one. If later one it could be the case.
> 
> Daniel
> 
secureboot patch in its current state has only one goal: make microsoft
sign existing image and load linux. If we integrate it with GRUB
signatures check (as far as GNU policy permits but rest would be tiny)
then it will be a matter of choosing which way xen is going to be
signed. I'd recommend GnuPG detached signature (xen and xen.sig) but
don't insist on it.





Re: EFI and multiboot2 devlopment work for Xen

2013-10-22 Thread Vladimir 'φ-coder/phcoder' Serbinenko
On 22.10.2013 19:12, Andrey Borzenkov wrote:
> On Mon, 21 Oct 2013 23:16:24 +0200
> Vladimir 'φ-coder/phcoder' Serbinenko wrote:
> 
>> GRUB has generic support for signing kernels/modules/whatsoever using
>> GnuPG signatures. You'd just have to ship xen.sig and kernel.sig. This
>> method doesn't have any controversy associated with EFI stuff but at
>> this particular case does exactly the same thing: verify signature.
>> multiboot2 is mainly memory structure specification so probably how the
>> files are checked is outside of its scope. But it's possible to add
>> specification on how to embed signatures in kernel.
>>
> 
> I'm a bit skeptical here. Given that
> 
> - EFI secure boot will still be needed to handle Windows
> - kernel can be launched directly as EFI application
> - there are other bootloaders with secure boot support
> 
> distributions will likely need to carry on EFI secure boot support. At
> which point it is not clear what advantages second, parallel,
> infrastructure for the sake of single application will bring.
> 
Using PE signatures is possible as I already said which invalidates your
points.
> The most compelling reason would be allowing module loading (which is
> currently disabled by secure boot patches).
> 






Re: EFI and multiboot2 devlopment work for Xen

2013-10-23 Thread Vladimir 'φ-coder/phcoder' Serbinenko
On 23.10.2013 09:43, Daniel Kiper wrote:
> On Mon, Oct 21, 2013 at 11:16:24PM +0200, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
>> Mail is big, I think I got your essential points but I didn't read it whole.
>> On 21.10.2013 14:57, Daniel Kiper wrote:
>>> Hi,
>>>
>>> During work on multiboot2 protocol support for Xen it was discovered
>>> that memory map passed via relevant tag could not represent wide range
>>> of memory types available on EFI platforms. Additionally, GRUB2
>>> implementation calls ExitBootServices() on them just before jumping
>>> into loaded image. In this situation loaded system could not clearly
>>> identify reserved memory regions, EFI runtime services regions and others.
>>>
>> Will a multiboot2 tag with whole EFI memory map solve your problem?
>>> Additionally, it should be mentioned that there is no possibility or it could
>>> be very difficult to implement secure boot on EFI platforms using GRUB2 as boot
>>> loader because, as it was mentioned earlier, it calls ExitBootServices().
>>>
>> GRUB has generic support for signing kernels/modules/whatsoever using
>> GnuPG signatures. You'd just have to ship xen.sig and kernel.sig. This
>> method doesn't have any controversy associated with EFI stuff but at
>> this particular case does exactly the same thing: verify signature.
>> multiboot2 is mainly memory structure specification so probably how the
>> files are checked is outside of its scope. But it's possible to add
>> specification on how to embed signatures in kernel.
> 
> I think that EFI signatures should be supported because they are quite
> common right now. However, I think that it is also worth supporting
> GnuPG signatures. This way anybody will be able to choose a good solution
> for a given case.
> 
Agreed.

> Daniel
> 






Re: EFI and multiboot2 devlopment work for Xen

2013-10-23 Thread Vladimir 'φ-coder/phcoder' Serbinenko
On 23.10.2013 09:05, Daniel Kiper wrote:
> Thanks. Could you send me a pointer to current multiboot2 protocol docs?
It's managed as the "multiboot2" branch in our repo:
http://git.savannah.gnu.org/cgit/grub.git
Note: we're in the process of moving from bzr to git, which may cause the
link to change.





Re: [Xen-devel] EFI and multiboot2 devlopment work for Xen

2013-10-23 Thread Vladimir 'φ-coder/phcoder' Serbinenko
On 23.10.2013 15:13, Konrad Rzeszutek Wilk wrote:
>  - not make an ExitBootServices call - which it does right now in the Solaris
>    GRUB2 case and in the Fedora GRUB2 case.
What about having a special tag in the multiboot2 file header, "RKEBSIHE"
("request to keep EFI boot services"), after which the bootloader would pass
another info tag (empty other than its header): "Beware of EFI"?
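
To make the proposal concrete, here is a minimal sketch of the two tags. It
assumes the usual multiboot2 layouts (header tags are u16 type, u16 flags,
u32 size; info tags are u32 type, u32 size; both padded to 8 bytes). The tag
numbers and all function names are hypothetical, picked only for
illustration, and the fixed fields that precede the tag lists are left out.

```python
import struct

# Hypothetical tag numbers, chosen only for this illustration.
HEADER_TAG_KEEP_EFI_BS = 0x7F00  # "request to keep EFI boot services"
INFO_TAG_EFI_BS_ACTIVE = 0x7F01  # "Beware of EFI"
INFO_TAG_END = 0                 # terminator tag of the info tag list


def pad8(data: bytes) -> bytes:
    """Pad a tag with zero bytes up to the next 8-byte boundary."""
    return data + b"\0" * (-len(data) % 8)


def build_header_tag(tag_type: int, flags: int = 0, payload: bytes = b"") -> bytes:
    """Build one image header tag: u16 type, u16 flags, u32 size."""
    size = 8 + len(payload)
    return pad8(struct.pack("<HHI", tag_type, flags, size) + payload)


def build_info_tag(tag_type: int, payload: bytes = b"") -> bytes:
    """Build one boot information tag: u32 type, u32 size."""
    size = 8 + len(payload)
    return pad8(struct.pack("<II", tag_type, size) + payload)


def parse_info_tags(blob: bytes):
    """Yield (type, payload) for each info tag up to and including the end tag."""
    offset = 0
    while offset + 8 <= len(blob):
        tag_type, size = struct.unpack_from("<II", blob, offset)
        if size < 8:
            raise ValueError("corrupt tag: size smaller than its own header")
        yield tag_type, blob[offset + 8:offset + size]
        if tag_type == INFO_TAG_END:
            break
        offset += (size + 7) & ~7  # the next tag starts 8-byte aligned


def boot_services_kept(blob: bytes) -> bool:
    """True if the loader passed the (empty) "Beware of EFI" tag."""
    return any(t == INFO_TAG_EFI_BS_ACTIVE for t, _ in parse_info_tags(blob))
```

The image would carry build_header_tag(HEADER_TAG_KEEP_EFI_BS) in its
header, and the loaded kernel would call boot_services_kept() on the tag
list it receives before deciding whether boot services are still usable.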
>  - Do the signature verification (hand-waving which one - probably both).
Can someone send me a link to the EFI signature specification? I can't
find it right now.
>  - Pack the right bits in the multiboot2 structure.
As soon as my patch is tested for compatibility with Solaris, I'll commit
it. Tell me if you need something else.





Re: [Xen-devel] EFI and multiboot2 devlopment work for Xen

2013-10-23 Thread Vladimir 'φ-coder/phcoder' Serbinenko
> GrUB - which iiuc stays in memory
> after transferring control - could export its file system support to its
> descendants).

Xen shouldn't need to load any file after the multiboot2 entry point. The
needed files would already be in memory, with pointers to them passed.
If you insist on being able to load directly from EFI, then IMO the best
way is to have a PE executable with one of its sections containing Xen, plus
code which would load the remaining files into memory and call the common
entry point.






Re: [coreboot] [PATCH] x86: add coreboot framebuffer support

2014-09-05 Thread Vladimir 'φ-coder/phcoder' Serbinenko

> I'm not a fan of Coreboot having invented its own nonstandard hacks, but
> I guess it is pretty much unavoidable.
It's completely avoidable. The stub can copy this information to the
standard framebuffer info structure. The only missing thing is to apply
the patch by cjwatson or mjg59 (I'm not sure now who wrote it) for having an
ID for a linear framebuffer which implies no specific hardware (it can be
any) or firmware (coreboot doesn't provide any additional info or callback).
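
As a rough sketch of what "the stub can copy this information" could look
like: the field names below are modelled loosely on coreboot's framebuffer
record and on the kernel's lfb_* framebuffer fields, but they are
illustrative assumptions, not the real structure definitions. GENERIC_LFB_ID
is a hypothetical value standing in for the still-missing ID.

```python
from dataclasses import dataclass

# Hypothetical ID meaning "plain linear framebuffer, nothing else implied".
GENERIC_LFB_ID = 0xFF


@dataclass
class CorebootFramebuffer:
    """Illustrative subset of the framebuffer record found in the coreboot table."""
    physical_address: int
    x_resolution: int
    y_resolution: int
    bytes_per_line: int
    bits_per_pixel: int
    red_mask_pos: int
    red_mask_size: int
    green_mask_pos: int
    green_mask_size: int
    blue_mask_pos: int
    blue_mask_size: int


def to_standard_info(fb: CorebootFramebuffer) -> dict:
    """Copy the coreboot fields into a standard-style framebuffer description."""
    return {
        "video_type": GENERIC_LFB_ID,  # no hardware or firmware implied
        "lfb_base": fb.physical_address,
        "lfb_width": fb.x_resolution,
        "lfb_height": fb.y_resolution,
        "lfb_linelength": fb.bytes_per_line,
        "lfb_depth": fb.bits_per_pixel,
        "red_pos": fb.red_mask_pos,
        "red_size": fb.red_mask_size,
        "green_pos": fb.green_mask_pos,
        "green_size": fb.green_mask_size,
        "blue_pos": fb.blue_mask_pos,
        "blue_size": fb.blue_mask_size,
    }
```

With a translation like this done once in the stub, the rest of the kernel
would only see the standard description plus the generic ID and would need
no coreboot-specific code path.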

Please don't apply this patch: it will break SeaBIOS booting with VGABIOS.





Re: [coreboot] [PATCH] x86: add coreboot framebuffer support

2014-09-05 Thread Vladimir 'φ-coder/phcoder' Serbinenko
On 06.09.2014 00:18, ron minnich wrote:
> Vladimir can you point me to that patch? This sounds interesting.
> 
https://lkml.org/lkml/2010/8/25/190

> ron
> 






Re: [coreboot] [PATCH] x86: add coreboot framebuffer support

2014-09-05 Thread Vladimir 'φ-coder/phcoder' Serbinenko
On 06.09.2014 00:31, H. Peter Anvin wrote:
> On 09/05/2014 02:23 PM, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
>> On 06.09.2014 00:18, ron minnich wrote:
>>> Vladimir can you point me to that patch? This sounds
>>> interesting.
>>>
>> https://lkml.org/lkml/2010/8/25/190
>>
> 
> I believe *most* of this patch has already gotten merged as we now use
> simplefb on x86 as well.  So all we need is probably the ID.
>
Yes, efifb has the same semantics. We just need some ID with clear
documentation saying something like "implies framebuffer without anything
else" so that no one gets the idea to plug e.g. vga hooks into it.

>   -hpa
> 
> 



