On 2/25/2012 3:30 PM, sandeep kiran p wrote:
MSDN says

" To enumerate the heap or module states for all processes, specify TH32CS_SNAPALL and set /th32ProcessID/ to zero. "

So it presumably does the heap and module walk for all processes and not only for the current process.

Aha!  Missed that detail in this hard-to-read code.  I had
enough trouble untangling the crazy run-on lines and the
unconventional naming of function pointers very differently
than the pointed-to functions, not to mention the lack of
comments clarifying why it doesn't check for lack of a
pointer to the snapshot close function (there is a reason,
several pages further down in the code, but still no comment).
Do you think *CreateToolhelp32Snapshot's* lock on the read-only snapshot could be a possible culprit?
That was the guess, but just a guess, hard to know without
spending several days reverse engineering that particular
version of the heap code in ntdll.

I am now thinking about removing the calls to Heap32First and Heap32Next in rand_win.c and looking for alternate sources of entropy.

Thanks for your help.

Regards
Sandeep

On Sat, Feb 25, 2012 at 2:38 AM, Jakob Bohm <jb-open...@wisemo.com> wrote:

    On 2/24/2012 2:14 PM, sandeep kiran p wrote:

        You mentioned that OpenSSL is holding a "snapshot" lock in
        rand_win.c. I couldn't find anything like that in that file.
        Can you specifically point me to the code that you are
        referring to? I would also like to get an opinion on possible
        workarounds that I can enforce to avoid the deadlock.

    In OpenSSL 1.0.0 it is line 486 which says

            module_next && (handle = snap(TH32CS_SNAPALL,0))

    where snap is a pointer to KERNEL32.CreateToolhelp32Snapshot()


        1. Can I remove the heap traversal routines Heap32First and
        Heap32Next? Will it badly affect the PRNG output later on?

    It depends how good the other sources of random numbers are,
    more below.


        2. Can I replace Heap32First and Heap32Next calls with any
        other sources of entropy? What if I make a call to
        CryptGenRandom again in place of the heap traversal routines?

    Calling CryptGenRandom() twice isn't going to help much.

    If CryptGenRandom() is as good as it is "supposed to" be,
    the other entropy sources are not really needed.  But if
    CryptGenRandom() is somehow broken or untrustworthy,
    calling it a million times wouldn't help.

    Anyway, I have my doubts about the value of using the local
    heap walking functions as a source of entropy, as they
    reflect only the state of your own process.  Pretending that
    the address and size of each malloc()-ed memory block in
    your process contributes 3 to 5 bytes of additional entropy
    (which is what the comments say) is wildly optimistic and
    quite unrealistic.

    In a long-running web browser or a similarly long running
    web server, the net total of the memory layout effects of
    thousands of semi-chaotic previous network requests and
    user actions might contribute a total of 10 to 50 bits of
    entropy.  But in a typical freshly started process, the
    layout is going to be pretty deterministic (if the OS
    uses address layout randomization, it probably does so
    based on entropy sources already incorporated into its
    standard random source, i.e. CryptGenRandom() on Windows).
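
    To put numbers on the gap described above, here is a minimal
    arithmetic sketch. The 3 to 5 bytes per block and the 10 to 50
    bit total come from the paragraphs above; the function names and
    the block count of 1000 are hypothetical, chosen only for
    illustration, and are not taken from rand_win.c.

```c
/* Entropy, in bits, that a fixed per-block credit would claim
 * after walking `blocks` heap entries. */
unsigned long claimed_bits(unsigned long blocks, unsigned long bytes_per_block)
{
    return blocks * bytes_per_block * 8UL;
}

/* Conservative alternative (an assumption, not OpenSSL's behavior):
 * never credit more than `cap_bits` for the whole heap walk. */
unsigned long capped_bits(unsigned long claimed, unsigned long cap_bits)
{
    return claimed < cap_bits ? claimed : cap_bits;
}
```

    For 1000 blocks at 3 bytes each, claimed_bits(1000, 3) gives
    24000 bits, while capped_bits(24000, 50) gives 50 bits, the
    upper end of the estimate for a long-running process.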


        3. Any other possible ways out?

        Thanks,
        Sandeep

        On Thu, Feb 23, 2012 at 10:08 PM, Jakob Bohm
        <jb-open...@wisemo.com> wrote:

           From the evidence given, I would *almost* certainly
           characterize this as a deadlock bug in ntdll.dll, the
           deepest, most trusted user mode component of Windows!

           Specifically, nothing should allow regular user code such as
           OpenSSL to hold onto NT internal critical sections while not
           running inside NTDLL, and NTDLL should be designed not to
           deadlock against itself.

           There is one other possibility though:

           The OpenSSL code in rand_win.c holds on to a "snapshot" lock
           on some of the heap data while walking it.  It may be doing
           this in a way not permitted by the rules that are presumed
           by the deadlock avoidance design of the speed critical heap
           locking code.


           On 2/23/2012 2:11 PM, sandeep kiran p wrote:

               Hi,

               OpenSSL Version: 0.9.8o
               OS : Windows Server 2008 R2 SP1

               I am seeing a deadlock in a Windows application between two
               threads, one thread calling Heap32First from OpenSSL's
               RAND_poll and the other allocating memory on the heap.

               Here is the relevant stack trace from both the threads
               involved in deadlock.

               Thread 523
               ----------------
               ntdll!ZwWaitForSingleObject+a
               ntdll!RtlpWaitOnCriticalSection+e8
               ntdll!RtlEnterCriticalSection+d1
               ntdll!RtlpAllocateHeap+18a6
               ntdll!RtlAllocateHeap+16c
               ntdll!RtlpAllocateUserBlock+145
               ntdll!RtlpLowFragHeapAllocFromContext+4e7
               ntdll!RtlAllocateHeap+e4
               ntdll!RtlInitializeCriticalSectionEx+d2
               ntdll!RtlpActivateLowFragmentationHeap+181
               ntdll!RtlpPerformHeapMaintenance+27
               ntdll!RtlpAllocateHeap+1819
               ntdll!RtlAllocateHeap+16c


               Thread 454
               -----------------
               ntdll!NtWaitForSingleObject+0xa
               ntdll!RtlpWaitOnCriticalSection+0xe8
               ntdll!RtlEnterCriticalSection+0xd1
               ntdll!RtlLockHeap+0x3b
               ntdll!RtlpQueryExtendedHeapInformation+0xf4
               ntdll!RtlQueryHeapInformation+0x3c
               ntdll!RtlQueryProcessHeapInformation+0x3ad
               ntdll!RtlQueryProcessDebugInformation+0x3b0
               kernel32!Heap32First+0x71

               WinDBG reports that threads 523 and 454 both hold locks and
               are waiting for each other's locks, thereby resulting in a
               deadlock.
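
               The situation reported above (each thread owns one lock and
               blocks on the lock owned by the other) is a cycle in the
               wait-for graph. The sketch below is a simplified model of
               that check, not what WinDBG actually runs. The struct, the
               function names and the lock ids 0 and 1 are hypothetical;
               the stack traces do not say which critical sections are
               involved. It assumes each thread holds at most one lock.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: a thread owns at most one lock and waits on at most one. */
struct thread_state {
    int id;         /* thread number as shown by the debugger */
    int holds;      /* id of the lock it owns, or -1 for none */
    int waits_for;  /* id of the lock it is blocked on, or -1 for none */
};

/* Index of the thread owning `lock`, or -1 if nobody owns it. */
static int owner_of(const struct thread_state *t, size_t n, int lock)
{
    if (lock < 0)
        return -1;
    for (size_t i = 0; i < n; i++)
        if (t[i].holds == lock)
            return (int)i;
    return -1;
}

/* Follow "waits for lock -> owner of that lock" edges starting at thread
 * index `start`. Returns 1 if the chain leads back to `start`, else 0. */
int in_deadlock_cycle(const struct thread_state *t, size_t n, size_t start)
{
    assert(start < n);
    int cur = (int)start;
    for (size_t steps = 0; steps < n; steps++) {
        cur = owner_of(t, n, t[cur].waits_for);
        if (cur < 0)
            return 0;   /* chain ends: lock is free or thread is not waiting */
        if ((size_t)cur == start)
            return 1;   /* came back to where we started: deadlock */
    }
    return 0;           /* a cycle may exist, but `start` is not part of it */
}
```

               With thread 523 modeled as {523, 0, 1} and thread 454 as
               {454, 1, 0}, the function returns 1 for either thread. If
               thread 454 were not waiting (waits_for set to -1), it would
               return 0 for both.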

               On searching, I have found a couple of instances where such
               an issue has been reported with Heap32Next on Windows 7, but
               haven't found anything that helps me solve the problem. Most
               of the references I found conclude that this could be because
               of a possible bug in the heap traversal APIs. If someone has
               faced a similar problem, can you guide me to possible
               workarounds by which I can avoid the deadlock? Can I remove
               the heap traversal routines and find some other sources of
               entropy?

               Thanks for your help.


Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S.  http://www.wisemo.com
Transformervej 29, 2730 Herlev, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
User Support Mailing List                    openssl-users@openssl.org
Automated List Manager                           majord...@openssl.org
