On 2018-04-23 18:24, Hac4u wrote:
I have a raw data file of nearly 10 GB. I would like to find a text string in
it and print the address (file offset) at which it is stored.
This is my code:
import os
import re
filename="filename.dmp"
read_data=2**24
searchtext="bd:mongo:"
he=searchtext.encode('hex')
with open(filename, 'rb') as f:
    while True:
        data= f.read(read_data)
        if not data:
            break
        elif searchtext in data:
            print "Found"
            try:
                offset=hex(data.index(searchtext))
                print offset
            except ValueError:
                print 'Not Found'
        else:
            continue
The addresses I am getting are:
#0x2c0900
#0xb62300
But the actual positions are:
# 652c0900
# 652c0950
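The offsets you are printing come from data.index(), which counts from the start of the current 2**24-byte chunk, not from the start of the file. A quick check with the numbers from your post, as a rough sketch (the variable names are mine, only the values come from your message):

```python
chunk_size = 2**24      # read size used in the script above (0x1000000)
reported = 0x2c0900     # what data.index() returned, relative to the chunk
actual = 0x652c0900     # where the string really is in the file

# How many whole chunks lie between the reported and the actual position?
chunks_before, remainder = divmod(actual - reported, chunk_size)
print(chunks_before, remainder)
```

The difference is an exact multiple of the chunk size, so the script would need to add the number of bytes already read to each offset. data.index() also returns only the first match in a chunk, which would explain why 0x652c0950 never shows up.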
Here's a version that handles matches straddling a chunk boundary and
reports offsets relative to the start of the file.
Try to keep in mind the distinction between bytestrings and text
strings. It doesn't matter as much in Python 2, but it does in Python 3.
filename = "filename.dmp"
chunk_size = 2**24
search_text = b"bd:mongo:"
search_length = len(search_text)
overlap_length = search_length - 1
chunk_start = 0  # File offset of data[0].
data = b''
with open(filename, 'rb') as f:
    while True:
        # Read in more data. An empty read means end of file; testing
        # 'data' instead would never stop, because the overlap is kept.
        chunk = f.read(chunk_size)
        if not chunk:
            break
        data += chunk
        # Search this chunk.
        offset = 0
        while True:
            offset = data.find(search_text, offset)
            if offset < 0:
                break
            # This form of print works in both Python 2 and Python 3.
            print("Found at %s" % hex(chunk_start + offset))
            offset += search_length
        # We've searched this chunk. Discard all but a portion of overlap.
        keep = min(overlap_length, len(data))
        chunk_start += len(data) - keep
        data = data[len(data) - keep:]
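To convince yourself that the overlap handling works, it can help to run the same idea on a few bytes in memory with a deliberately tiny chunk size. The following is only a sketch: find_offsets and the sample data are made up for this illustration, and io.BytesIO stands in for the real dump file.

```python
import io

def find_offsets(stream, needle, chunk_size=2**24):
    """Yield the absolute offset of each non-overlapping match of needle."""
    if not needle:
        raise ValueError("needle must not be empty")
    overlap = len(needle) - 1
    chunk_start = 0  # Stream offset of data[0].
    data = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data += chunk
        offset = data.find(needle)
        while offset >= 0:
            yield chunk_start + offset
            offset = data.find(needle, offset + len(needle))
        # Keep only the tail that could still begin a match.
        keep = min(overlap, len(data))
        chunk_start += len(data) - keep
        data = data[len(data) - keep:]

# Hypothetical sample: two matches, at offsets 2 and 15.
sample = b"xxbd:mongo:yyyybd:mongo:z"
# A 4-byte chunk forces both matches to straddle chunk boundaries.
offsets = list(find_offsets(io.BytesIO(sample), b"bd:mongo:", chunk_size=4))
print([hex(o) for o in offsets])
```

For the real file you would pass the object returned by open(filename, 'rb') and leave chunk_size at its default.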