Hi,

On 14/02/2023 21:16, Flávio. wrote:
Thanks, I'm still trying to figure it out. It seems that when the PIL image is saved, it unfortunately writes a temp file to disk. My goal is to not write to disk, because this application will read a lot of files and I want to spare my SSD. My code receives byte data from a Dart program (I checked that it is correct). So far the .py file looks like this, but I'm not getting anything in return.

At this point I maybe ought to reply off list, but if you don't want to deal with stdin and still want the images to never hit the disk/SSD, you could also save them to a "tmpfs" on Linux: https://www.kernel.org/doc/html/v5.7/filesystems/tmpfs.html
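
As a rough sketch of the tmpfs route: this assumes /dev/shm is a tmpfs mount (common on Linux, but not guaranteed, so check your system), and the helper name write_to_tmpfs is made up for illustration.

```python
import os
import tempfile


def write_to_tmpfs(image_bytes, tmpfs_dir="/dev/shm", suffix=".png"):
    """Write image bytes to a new file under tmpfs_dir and return its path.

    tmpfs_dir defaults to /dev/shm, which is an assumption about the system.
    """
    # mkstemp creates a uniquely named file and returns an open descriptor
    fd, path = tempfile.mkstemp(suffix=suffix, dir=tmpfs_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(image_bytes)
    return path
```

You would then pass the returned path to tesseract as the input file instead of '-', and remove the file with os.unlink(path) once the OCR is done.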

In any case - when you write that PIL 'saves a temp file to disk', what led you to that conclusion? The code below really shouldn't do that.
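
If you want a quick (and admittedly crude) way to check this from Python, something like the following sketch could help. The function name files_created_by is hypothetical, and it only notices files that still exist after the call, in the one directory you point it at; a system-level tracer would be more thorough.

```python
import os
import tempfile


def files_created_by(func, directory=None):
    """Call func() and return the names that newly appeared in directory."""
    # Default to the system temp directory, where temp files usually land
    directory = directory or tempfile.gettempdir()
    before = set(os.listdir(directory))
    func()
    after = set(os.listdir(directory))
    return after - before
```

For example, files_created_by(lambda: pil_image.save(io.BytesIO(), format='PNG')) should return an empty set if the save really stays in memory, barring unrelated processes writing to the same directory at that moment.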

Regards,
Merlijn

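import base64
import io
import logging
import os
import subprocess
import sys

from PIL import Image

logger = logging.getLogger(__name__)
tesseractPath = 'tesseract'  # assumption: tesseract is on PATH; adjust if not


def main():
    # The caller sends the image as base64 text on stdin
    base64_image = sys.stdin.read()
    image_bytes = base64.b64decode(base64_image)
    with io.BytesIO(image_bytes) as image_input:
        pil_image = Image.open(image_input)
        with io.BytesIO() as png_buffer:
            # Saves into the in-memory buffer (the line reported as "using disk!")
            pil_image.save(png_buffer, format='PNG', compress=0, compress_level=0)
            png_bytes = png_buffer.getvalue()

    # Let's just use one core in tesseract
    env = os.environ.copy()
    env['OMP_THREAD_LIMIT'] = '1'

    p = subprocess.Popen([tesseractPath, '-', '-', '-l', 'por'],
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE,
                         env=env)
    output, stderr = p.communicate(png_bytes)
    stderr = stderr.decode('utf-8')

    if stderr:
        logger.warning('tesseract_baselines stderr: %s', stderr)
    # output is bytes, so decode it; write it even if stderr had warnings,
    # otherwise any warning from tesseract suppresses the OCR text
    sys.stdout.write(output.decode('utf-8').strip())


if __name__ == '__main__':
    main()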
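
The piping step in the script above can also be tried out separately from PIL and tesseract. Here is a minimal sketch, assuming a hypothetical helper name run_with_stdin; it returns the exit code so the caller can decide on success from that rather than from whether stderr is empty, since a program may well print warnings to stderr and still succeed.

```python
import os
import subprocess
import sys


def run_with_stdin(cmd, data, extra_env=None):
    """Send data (bytes) to cmd's stdin; return (returncode, stdout, stderr)."""
    env = os.environ.copy()
    env.update(extra_env or {})
    proc = subprocess.run(cmd, input=data,
                          stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE,
                          env=env)
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # Stand-in child process that echoes stdin back, so the sketch runs
    # without tesseract installed; swap in the real command to do OCR.
    echo = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]
    code, out, err = run_with_stdin(echo, b"hello", {"OMP_THREAD_LIMIT": "1"})
    print(code, out)
```

With tesseract, the call would look like run_with_stdin([tesseractPath, '-', '-', '-l', 'por'], png_bytes, {'OMP_THREAD_LIMIT': '1'}), where png_bytes holds the PNG data taken from the BytesIO buffer.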


On Tuesday, February 14, 2023 at 4:11:13 PM UTC-3 Merlijn Wajer wrote:


    Hi,

    On 14/02/2023 19:10, Flávio. wrote:
     > Sorry, how can I do that? I'm trying to send image binary data, not a
     > path. The goal is to not write a file to disk and use only memory. Could
     > you please write some code that sends the data (binary) to the stdin of
     > tesseract? It can be in Python, Dart or Java :( I've tried ChatGPT but
     > it is wrong and gets lost

    Normally I'd say 'left as an exercise for the reader', but I happen to
    have a snippet around that ought to give you a general idea.

    This uses io.BytesIO in Python 3 as an in-memory stream to save the
    image to; the stream then contains an uncompressed PNG (compression
    would just slow things down). It assumes that the variable "pil_image"
    contains a PIL.Image object.

    The code to use just one core in Tesseract is of course entirely
    optional. I haven't tested this exact version (I modified it a bit; the
    original works in another setting), but it should work in theory:

     > with io.BytesIO() as output:
     >     pil_image.save(output, format='PNG', compress=0, compress_level=0)
     >     output.seek(0)
     >
     >     # Let's just use one core in tesseract
     >     env = os.environ.copy()
     >     env['OMP_THREAD_LIMIT'] = '1'
     >
     >     p = subprocess.Popen(['tesseract', '-', '-'],
     >                          stdin=subprocess.PIPE,
     >                          stdout=subprocess.PIPE,
     >                          stderr=subprocess.PIPE,
     >                          env=env)
     >     output, stderr = p.communicate(output.read())
     >     stderr = stderr.decode('utf-8')
     >
     >     if stderr:
     >         logger.warning('tesseract_baselines stderr: %s', stderr)


    Regards,
    Merlijn

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected] <mailto:[email protected]>. To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/ebd9d42d-244c-4a6b-8ab8-c1efd87db501n%40googlegroups.com <https://groups.google.com/d/msgid/tesseract-ocr/ebd9d42d-244c-4a6b-8ab8-c1efd87db501n%40googlegroups.com?utm_medium=email&utm_source=footer>.

