On Tuesday, April 24, 2018 at 12:54:43 AM UTC+5:30, MRAB wrote:
> On 2018-04-23 18:24, Hac4u wrote:
> > I have a raw data of size nearly 10GB. I would like to find a text string
> > and print the memory address at which it is stored.
> >
> > This is my code
> >
> > import os
> > import re
> > filename="filename.dmp"
> > read_data=2**24
> > searchtext="bd:mongo:"
> > he=searchtext.encode('hex')
> > with open(filename, 'rb') as f:
> >     while True:
> >         data= f.read(read_data)
> >         if not data:
> >             break
> >         elif searchtext in data:
> >             print "Found"
> >             try:
> >                 offset=hex(data.index(searchtext))
> >                 print offset
> >             except ValueError:
> >                 print 'Not Found'
> >         else:
> >             continue
> >
> >
> > The address I am getting is
> > #0x2c0900
> > #0xb62300
> >
> > But the actual positioning is
> > # 652c0900
> > # 652c0950
> >
> Here's a version that handles overlaps.
>
> Try to keep in mind the distinction between bytestrings and text
> strings. It doesn't matter as much in Python 2, but it does in Python 3.
>
>
> filename = "filename.dmp"
> chunk_size = 2**24
> search_text = b"bd:mongo:"
> chunk_start = 0
> offset = 0
> search_length = len(search_text)
> overlap_length = search_length - 1
> data = b''
>
> with open(filename, 'rb') as f:
>     while True:
>         # Read in more data.
>         data += f.read(chunk_size)
>         if not data:
>             break
>
>         # Search this chunk.
>         while True:
>             offset = data.find(search_text, offset)
>             if offset < 0:
>                 break
>
>             print "Found at", hex(chunk_start + offset)
>             offset += search_length
>
>         # We've searched this chunk. Discard all but a portion of overlap.
>         chunk_start += len(data) - overlap_length
>
>         if overlap_length > 0:
>             data = data[-overlap_length : ]
>         else:
>             data = b''
>
>         offset = 0

Thanks a lot for the code. I have two questions:

1. Why did you use an overlap, and in what conditions can it be counted on?
2. Your code does not end. It keeps on looking for something, though it otherwise worked well.

Here is my modified code (written with help from your code):

import os
import re
filename="filename.dmp"
read_data=2**24
offset=0
chunk_start=0
searchtext=b"bd:mongo:"
search_length=len(searchtext)
overlap_length = search_length - 1
he=searchtext.encode('hex')
with open(filename, 'rb') as f:
    while True:
        data= f.read(read_data)
        if not data:
            break
        while True:
            offset=data.find(searchtext,offset)
            # print offset
            if offset < 0:
                break
            print "Found at",hex(chunk_start+offset)
            offset+=search_length
        chunk_start += len(data)
        data=data[read_data:]
        offset=0

--
https://mail.python.org/mailman/listinfo/python-list
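
On the two questions, a rough sketch may help. The overlap only matters when an occurrence of the search string straddles a chunk boundary: with no carry-over, half of it sits at the end of one read and half at the start of the next, so neither `find` sees it. Keeping the last `len(needle) - 1` bytes is enough to catch that case without reporting any match twice. The quoted loop never ends because, after the last read, `data` still holds those carried-over bytes, so `if not data` is never true; testing the freshly read chunk instead avoids that. The modified code above does terminate, but since it carries nothing over it can miss a match that spans two reads. The function name `find_offsets` and its parameters are my own, not from the thread, and it is written with Python 3 syntax while taking any binary file object.

```python
import io


def find_offsets(f, needle, chunk_size=2**24):
    """Yield the absolute offset of every occurrence of `needle` in the
    binary file object `f`, reading `chunk_size` bytes at a time.

    Hypothetical helper for illustration; not from the original thread.
    """
    overlap = len(needle) - 1   # bytes carried over between reads
    tail = b""                  # end of the previous buffer
    base = 0                    # absolute offset of the buffer's first byte
    while True:
        chunk = f.read(chunk_size)
        if not chunk:           # test the fresh read, not the carried-over tail
            break
        data = tail + chunk
        pos = data.find(needle)
        while pos >= 0:
            yield base + pos
            # Step by one byte so the result does not depend on where
            # the chunk boundaries happen to fall.
            pos = data.find(needle, pos + 1)
        # Keep only the last `overlap` bytes for the next round.
        keep = min(overlap, len(data))
        base += len(data) - keep
        tail = data[len(data) - keep:]


# Small in-memory demo: with 8-byte reads the match spans two chunks.
sample = io.BytesIO(b"xxxxxbd:mongo:yyyy")
for off in find_offsets(sample, b"bd:mongo:", chunk_size=8):
    print("Found at", hex(off))  # prints: Found at 0x5
```

For the real dump this would be called as `find_offsets(open("filename.dmp", "rb"), b"bd:mongo:")`, ideally inside a `with` block so the file is closed afterwards.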