Hi,

On 14/02/2023 21:59, Flávio. wrote:
I'll look into that Linux option :) As for the save method, I used the show method on the object and it had a path in the temp directory. So I asked ChatGPT how the file could be on disk, and it told me that the save method created it. If you're right, then it's just another hallucination by the model. I use it to teach me, as I'm learning alone. Thanks for the replies 🙂

The show() method very likely saves it to a temporary path just for the purpose of showing you the image. I'm pretty certain that the code you mailed (modified from mine) doesn't save the file to disk.

And yes, tmpfs is another option.

Let's take it off list if you have any further questions not related specifically to Tesseract. :-)

Regards,
Merlijn

On Tuesday, February 14, 2023 at 5:49:45 PM UTC-3 Merlijn Wajer wrote:

    Hi,

    On 14/02/2023 21:16, Flávio. wrote:
     > Thanks, I'm still trying to figure it out. It seems that when the PIL
     > image is saved, it unfortunately writes a temp file to disk. My goal
     > is to not write to disk, because this application will read a lot of
     > files and I want to spare my SSD. My code receives byte data from a
     > Dart program (I checked that it is correct). So far the .py file looks
     > like this, but I'm not getting anything back.

    At this point I maybe ought to reply off list, but you could also save
    the images to a "tmpfs" on Linux if you don't want to deal with stdin
    and have the images never hit the disk/SSD:
    https://www.kernel.org/doc/html/v5.7/filesystems/tmpfs.html
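As a rough illustration of the tmpfs route (a sketch, not something from the thread): on Linux, /proc/mounts lists mounted filesystems with the filesystem type in the third column, so a script can check whether a directory such as /dev/shm is tmpfs before writing temporary images there. The helper names and the /dev/shm default are assumptions; /dev/shm is tmpfs on many, but not all, distributions.

```python
import os
import tempfile


def tmpfs_mount_points(mounts_text):
    """Return the mount points whose filesystem type is tmpfs.

    mounts_text is expected in /proc/mounts format, one mount per line:
    "device mountpoint fstype options dump pass". Mount points containing
    spaces appear with octal escapes and are not unescaped here.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "tmpfs":
            points.append(fields[1])
    return points


def pick_ram_backed_dir(candidates=("/dev/shm",)):
    """Hypothetical helper: prefer a writable tmpfs directory, else the system temp dir."""
    try:
        with open("/proc/mounts") as f:
            mounted = set(tmpfs_mount_points(f.read()))
    except OSError:
        # Not Linux, or /proc is unavailable: nothing is known to be tmpfs.
        mounted = set()
    for path in candidates:
        if path in mounted and os.access(path, os.W_OK):
            return path
    return tempfile.gettempdir()
```

A file created with tempfile.NamedTemporaryFile(dir=pick_ram_backed_dir()) could then be handed to tesseract by path while its contents stay in RAM (tmpfs pages can still be swapped out under memory pressure).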

    In any case, when you write that PIL 'saves a temp file to disk', what
    led you to that conclusion? The code below really shouldn't do that.

    Regards,
    Merlijn

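     > import base64
     > import io
     > import logging
     > import os
     > import subprocess
     > import sys
     >
     > from PIL import Image
     >
     > logger = logging.getLogger(__name__)
     > tesseractPath = 'tesseract'  # assumption: tesseract is on PATH; adjust as needed
     >
     > def main():
     >     base64_image = sys.stdin.read()
     >     image_bytes = base64.b64decode(base64_image)
     >     with io.BytesIO(image_bytes) as input_buf:
     >         pil_image = Image.open(input_buf)
     >         with io.BytesIO() as png_buf:
     >             # Saves into the in-memory buffer, not to a file on disk
     >             pil_image.save(png_buf, format='PNG', compress_level=0)
     >             png_buf.seek(0)
     >
     >             env = os.environ.copy()
     >             env['OMP_THREAD_LIMIT'] = '1'
     >
     >             p = subprocess.Popen([tesseractPath, '-', '-', '-l', 'por'],
     >                                  stdin=subprocess.PIPE,
     >                                  stdout=subprocess.PIPE,
     >                                  stderr=subprocess.PIPE,
     >                                  env=env)
     >             output, stderr = p.communicate(png_buf.read())
     >             stderr = stderr.decode('utf-8')
     >
     >             # Tesseract can print warnings on stderr even when OCR
     >             # succeeds, so log them instead of discarding the result.
     >             if stderr:
     >                 logger.warning('tesseract_baselines stderr: %s', stderr)
     >             # output is bytes; decode it before writing it to stdout
     >             sys.stdout.write(output.decode('utf-8').strip())
     >
     >
     > if __name__ == '__main__':
     >     main()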
     >
     >
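To exercise a script like the one above without the Dart program, the calling side can be imitated in Python. This is a minimal sketch, assuming the script reads base64 text from stdin as main() does; `send_image_bytes` and the command list you pass to it are hypothetical names, not part of the thread.

```python
import base64
import subprocess


def send_image_bytes(image_bytes, script_cmd):
    """Base64-encode image_bytes, pipe them to script_cmd's stdin, return its stdout as text.

    script_cmd is a hypothetical argv list, e.g. [sys.executable, "ocr_stdin.py"].
    Nothing is written to disk: the data only travels through the pipe.
    """
    payload = base64.b64encode(image_bytes)  # ASCII-safe bytes
    result = subprocess.run(
        script_cmd,
        input=payload,
        capture_output=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    return result.stdout.decode("utf-8")
```

Because stdout and stderr are captured separately, a warning on stderr cannot hide the text the script prints on stdout.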
     > On Tuesday, February 14, 2023 at 4:11:13 PM UTC-3 Merlijn Wajer wrote:
     >
     >
     > Hi,
     >
     > On 14/02/2023 19:10, Flávio. wrote:
     > > Sorry, how can I do that? I'm trying to send image binary data,
     > > not a path. The goal is to not write a file to disk and to use only
     > > memory. Could you please write some code that sends the (binary)
     > > data to the stdin of tesseract? It can be in Python, Dart or Java :(
     > > I've tried ChatGPT, but it is wrong and gets lost.
     >
     > Normally I'd say 'left as an exercise for the reader', but I happen to
     > have a snippet around that ought to give you a general idea.
     >
     > This uses io.BytesIO in Python 3 as an in-memory stream to save the
     > image to; the stream contains an uncompressed PNG (compression would
     > just slow things down). It assumes that the variable "pil_image"
     > contains a PIL.Image object.
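To make "uncompressed PNG" concrete, here is a hand-rolled, stdlib-only sketch (not Pillow's code) that writes an 8-bit grayscale PNG into an io.BytesIO using zlib level 0, which is what Pillow's compress_level=0 requests. The function names are illustrative assumptions.

```python
import io
import struct
import zlib


def _chunk(out, tag, data):
    """Write one PNG chunk: length, type, data, then CRC-32 over type + data."""
    out.write(struct.pack(">I", len(data)))
    out.write(tag)
    out.write(data)
    out.write(struct.pack(">I", zlib.crc32(tag + data) & 0xFFFFFFFF))


def grayscale_png_bytes(pixels, width, height, level=0):
    """Encode 8-bit grayscale pixels (row-major bytes) as a PNG held in memory.

    level=0 stores the image data without compression, trading size for speed.
    """
    if len(pixels) != width * height:
        raise ValueError("pixel buffer does not match width * height")
    buf = io.BytesIO()
    buf.write(b"\x89PNG\r\n\x1a\n")  # PNG signature
    # IHDR: width, height, bit depth 8, colour type 0 (grayscale),
    # compression method 0, filter method 0, no interlace
    _chunk(buf, b"IHDR", struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0))
    # Every scanline starts with filter type 0 (None).
    raw = b"".join(
        b"\x00" + pixels[y * width:(y + 1) * width] for y in range(height)
    )
    _chunk(buf, b"IDAT", zlib.compress(raw, level))
    _chunk(buf, b"IEND", b"")
    return buf.getvalue()
```

Level 0 still wraps the pixels in zlib "stored" blocks, so the output is slightly larger than the raw data but costs almost no CPU to produce.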
     >
     > The code to use just one core in Tesseract is of course entirely
     > optional. I didn't *test* that this works (I modified it a bit; it
     > works in another setting), but it should work in theory:
     >
     > > import io
     > > import logging
     > > import os
     > > import subprocess
     > >
     > > logger = logging.getLogger(__name__)
     > >
     > > with io.BytesIO() as output:
     > >     pil_image.save(output, format='PNG', compress_level=0)
     > >     output.seek(0)
     > >
     > >     # Let's just use one core in tesseract
     > >     env = os.environ.copy()
     > >     env['OMP_THREAD_LIMIT'] = '1'
     > >
     > >     p = subprocess.Popen(['tesseract', '-', '-'],
     > >                          stdin=subprocess.PIPE,
     > >                          stdout=subprocess.PIPE,
     > >                          stderr=subprocess.PIPE,
     > >                          env=env)
     > >     output, stderr = p.communicate(output.read())
     > >     stderr = stderr.decode('utf-8')
     > >
     > >     if stderr:
     > >         logger.warning('tesseract_baselines stderr: %s', stderr)
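The same stdin/stdout round trip can also be written with subprocess.run, which wraps Popen and communicate() in a single call. A sketch under these assumptions: `ocr_png_bytes` is a hypothetical helper, the default command mirrors the `tesseract - -` invocation above, stderr output is logged rather than treated as failure, and only a non-zero exit code raises.

```python
import logging
import os
import subprocess

logger = logging.getLogger(__name__)


def ocr_png_bytes(png_bytes, cmd=("tesseract", "-", "-"), threads=1):
    """Feed in-memory image bytes to an OCR command's stdin; return its stdout as text.

    cmd defaults to the stdin-in, stdout-out invocation used in this thread;
    e.g. ("tesseract", "-", "-", "-l", "por") selects a language.
    """
    env = os.environ.copy()
    env["OMP_THREAD_LIMIT"] = str(threads)  # optional: limit OpenMP threads
    result = subprocess.run(
        list(cmd),
        input=png_bytes,
        capture_output=True,
        env=env,
    )
    stderr = result.stderr.decode("utf-8", errors="replace")
    if stderr:
        # Warnings on stderr do not necessarily mean OCR failed, so log them
        # and still return whatever arrived on stdout.
        logger.warning("tesseract stderr: %s", stderr)
    if result.returncode != 0:
        raise RuntimeError(f"OCR command failed with exit code {result.returncode}")
    return result.stdout.decode("utf-8").strip()
```

Passing the command as a parameter also makes the helper easy to try with a stand-in program when tesseract is not installed.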
     >
     >
     > Regards,
     > Merlijn
     >
     > --
     > You received this message because you are subscribed to the Google
     > Groups "tesseract-ocr" group.
     > To unsubscribe from this group and stop receiving emails from it, send
     > an email to [email protected].
     > To view this discussion on the web visit
     > https://groups.google.com/d/msgid/tesseract-ocr/ebd9d42d-244c-4a6b-8ab8-c1efd87db501n%40googlegroups.com.

