On 2018-04-23 18:24, Hac4u wrote:
I have a raw data file of nearly 10GB. I would like to find a text string in it and 
print the memory address at which it is stored.

This is my code

import os
import re
filename="filename.dmp"
read_data=2**24
searchtext="bd:mongo:"
he=searchtext.encode('hex')
with open(filename, 'rb') as f:
     while True:
         data= f.read(read_data)
         if not data:
             break
         elif searchtext in data:
             print "Found"
             try:
                 offset=hex(data.index(searchtext))
                 print offset
             except ValueError:
                 print 'Not Found'
         else:
             continue


The address I am getting is
#0x2c0900
#0xb62300

But the actual positioning is
# 652c0900
# 652c0950

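The offsets you printed are relative to the start of each 2**24-byte chunk, not to the start of the file: 0x652c0900 is 0x65 * 2**24 + 0x2c0900. data.index() also reports only the first match in each chunk.

Here's a version that tracks where each chunk starts in the file and handles matches that straddle a chunk boundary.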

Try to keep in mind the distinction between bytestrings and text strings. It doesn't matter as much in Python 2, but it does in Python 3.
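
To illustrate that distinction, here is a small sketch written for Python 3. The sample bytes and variable names are made up for the example. A file opened in 'rb' mode yields bytestrings, so the pattern being searched for has to be a bytestring too.

```python
# Made-up sample standing in for one chunk read from a file opened in 'rb' mode.
data = b"\x00\x01bd:mongo:\x02"

pattern_text = "bd:mongo:"                     # text string (str in Python 3)
pattern_bytes = pattern_text.encode("ascii")   # bytestring, equal to b"bd:mongo:"

# Searching bytes within bytes works and returns the byte offset of the match.
position = data.find(pattern_bytes)

# Mixing the two types raises TypeError in Python 3. Python 2 lets it pass,
# because there a plain "..." literal is already a bytestring.
try:
    pattern_text in data
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Writing the pattern as a b"..." literal from the start, as in the code below, avoids the problem on both versions.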


filename = "filename.dmp"
chunk_size = 2**24
search_text = b"bd:mongo:"
chunk_start = 0
offset = 0
search_length = len(search_text)
overlap_length = search_length - 1
data = b''

with open(filename, 'rb') as f:
    while True:
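        # Read in more data. Stop at end of file; testing 'data' instead
        # would loop forever, because it still holds the overlap.
        chunk = f.read(chunk_size)
        if not chunk:
            break
        data += chunk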

        # Search this chunk.
        while True:
            offset = data.find(search_text, offset)
            if offset < 0:
                break

            print("Found at " + hex(chunk_start + offset))
            offset += search_length

        # We've searched this chunk. Discard all but a portion of overlap.
        chunk_start += len(data) - overlap_length

        if overlap_length > 0:
            data = data[-overlap_length : ]
        else:
            data = b''

        offset = 0
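
For reuse, the same idea could be wrapped in a generator. This is only a sketch and not part of the code above: the name find_all, its parameters, and the temporary demo file are assumptions made for illustration. It reports non-overlapping matches and uses a tiny chunk size in the demo so that one match straddles a chunk boundary.

```python
import os
import tempfile


def find_all(path, pattern, chunk_size=2**24):
    """Yield the absolute file offset of each non-overlapping match."""
    overlap = len(pattern) - 1
    data = b""
    chunk_start = 0    # file offset of data[0]
    search_from = 0    # index in data where the next search may begin
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            data += chunk

            pos = data.find(pattern, search_from)
            while pos >= 0:
                yield chunk_start + pos
                search_from = pos + len(pattern)
                pos = data.find(pattern, search_from)

            # Keep only the tail that could begin a match which the
            # next chunk completes.
            keep = min(overlap, len(data))
            dropped = len(data) - keep
            chunk_start += dropped
            data = data[dropped:]
            search_from = max(0, search_from - dropped)


# Demo on a small temporary file. With chunk_size=16 the second
# occurrence of the pattern straddles the boundary at byte 32.
payload = b"x" * 5 + b"bd:mongo:" + b"y" * 12 + b"bd:mongo:" + b"z" * 3
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
offsets = list(find_all(tmp.name, b"bd:mongo:", chunk_size=16))
os.remove(tmp.name)
print([hex(o) for o in offsets])
```

Because the function yields plain integers, the caller decides how to format them, for example with hex() as above.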
