Hi,
On 14/02/2023 21:59, Flávio. wrote:
> I'll look into that Linux option :) as for the save method, I used the
> show method on the object and it had a path in the temp directory. So I
> asked ChatGPT how the file could be on disk and it told me that the save
> method created it. If you're right then it's just another hallucination
> by the model. I use it to teach me, as I'm learning alone. Thanks for
> the replies 🙂
The show() method very likely saves it to a temporary path just for the
purpose of showing you the image. I'm pretty certain that the code you
mailed (modified from mine) doesn't save the file to disk.
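To make the in-memory route concrete, here is a minimal sketch of the part that matters: the bytes go into a pipe connected to the child's stdin, and the result comes back over another pipe. The helper name pipe_bytes is made up for illustration, and the command is whatever you pass in; with the invocation used elsewhere in this thread that would be ['tesseract', '-', '-'].

```python
import subprocess
import sys


def pipe_bytes(cmd, data):
    """Send `data` to the stdin of `cmd`; return (stdout, stderr) as bytes."""
    proc = subprocess.Popen(cmd,
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    # communicate() writes `data` to the child's stdin, closes it, and
    # reads both output pipes until EOF. No file is created on our side.
    return proc.communicate(data)


def echo_command():
    """Stand-in child process that copies stdin to stdout (for trying it out)."""
    return [sys.executable, '-c',
            'import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())']
```

Since the pipes are opened in binary mode, both return values are bytes; decode them (for example with .decode('utf-8')) before writing to sys.stdout, or write them unchanged to sys.stdout.buffer.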
And yes, tmpfs is another option.
Let's take it off list if you have any further questions not related
specifically to Tesseract. :-)
Regards,
Merlijn
On Tuesday, February 14, 2023 at 5:49:45 PM UTC-3 Merlijn Wajer wrote:
Hi,
On 14/02/2023 21:16, Flávio. wrote:
> Thanks, I'm still trying to figure it out. It seems when the PIL image
> is saved, unfortunately it saves a temp file to disk. My goal is to not
> write to disk, because this application will read a lot of files and I
> want to spare my SSD. My code receives byte data from a Dart program (I
> checked it is correct). So far the py file looks like this but I'm not
> getting anything in return.
At this point I maybe ought to reply off list, but you could also save
the images to a "tmpfs" on Linux if you don't want to deal with stdin
and have the images never hit the disk/ssd:
https://www.kernel.org/doc/html/v5.7/filesystems/tmpfs.html
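A rough sketch of the tmpfs route, assuming /dev/shm is a tmpfs mount on your machine (that is common on Linux, but do check with "mount" or "df"). The helper names ram_backed_dir and write_scratch_file are made up for this example; if the directory is missing, the code falls back to the regular temp directory, which may well be on disk.

```python
import os
import tempfile


def ram_backed_dir(candidate='/dev/shm'):
    """Return `candidate` if it is a writable directory, else the default temp dir."""
    # Assumption: `candidate` is a tmpfs mount. This function only checks
    # that the directory exists and is writable, not the filesystem type.
    if os.path.isdir(candidate) and os.access(candidate, os.W_OK):
        return candidate
    return tempfile.gettempdir()


def write_scratch_file(data, suffix='.png'):
    """Write `data` to a new file in the scratch directory and return its path."""
    fd, path = tempfile.mkstemp(suffix=suffix, dir=ram_backed_dir())
    with os.fdopen(fd, 'wb') as handle:
        handle.write(data)
    return path
```

The returned path can be handed to tesseract like any other file name; the caller should os.remove() it afterwards, because files on a tmpfs occupy RAM until they are deleted.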
In any case - when you write that PIL 'saves a temp file to disk', what
led you to that conclusion? The code below really shouldn't do that.
Regards,
Merlijn
> import base64
> import io
> import logging
> import os
> import subprocess
> import sys
>
> from PIL import Image
>
> logger = logging.getLogger(__name__)
> tesseractPath = 'tesseract'  # assumption: tesseract is on PATH
>
>
> def main():
>     # Read the base64-encoded image from stdin and decode it in memory.
>     base64_image = sys.stdin.read()
>     image_bytes = base64.b64decode(base64_image)
>     with io.BytesIO(image_bytes) as input:
>         pil_image = Image.open(input)
>         with io.BytesIO() as output:
>             pil_image.save(output, format='PNG', compress=0,
>                            compress_level=0)  # using disk!
>             output.seek(0)
>
>             env = os.environ.copy()
>             env['OMP_THREAD_LIMIT'] = '1'
>
>             p = subprocess.Popen([tesseractPath, '-', '-', '-l', 'por'],
>                                  stdin=subprocess.PIPE,
>                                  stdout=subprocess.PIPE,
>                                  stderr=subprocess.PIPE,
>                                  env=env)
>             output, stderr = p.communicate(output.read())
>             stderr = stderr.decode('utf-8')
>
>             if stderr:
>                 logger.warning('tesseract_baselines stderr: %s', stderr)
>             else:
>                 # communicate() returns bytes, so decode before printing.
>                 sys.stdout.write(output.decode('utf-8').strip())
>
>
> if __name__ == '__main__':
>     main()
>
>
> On Tuesday, February 14, 2023 at 4:11:13 PM UTC-3 Merlijn Wajer wrote:
>
>
> Hi,
>
> On 14/02/2023 19:10, Flávio. wrote:
> > Sorry, how can I do that? I'm trying to send image binary data, not a
> > path. The goal is to not write a file to disk and use only memory. Could
> > you please write some code that sends the data (binary) to the stdin of
> > tesseract? It can be in Python, Dart or Java :( I've tried ChatGPT but
> > it is wrong and gets lost
>
> Normally I'd say 'left as an exercise to the reader' but I so happen to
> have a snippet around that ought to give you a general idea.
>
> This uses io.BytesIO in Python 3 to save the image (stream) to; it
> contains an uncompressed PNG (compression will just slow things down).
> It assumes that the variable "pil_image" contains a PIL.Image object.
>
> The code to use just one core in Tesseract is of course entirely
> optional. I didn't *test* this to work (I modified it a bit - it works
> in another setting), but it should work in theory:
>
> > with io.BytesIO() as output:
> >     pil_image.save(output, format='PNG', compress=0, compress_level=0)
> >     output.seek(0)
> >
> >     # Let's just use one core in tesseract
> >     env = os.environ.copy()
> >     env['OMP_THREAD_LIMIT'] = '1'
> >
> >     p = subprocess.Popen(['tesseract', '-', '-'],
> >                          stdin=subprocess.PIPE,
> >                          stdout=subprocess.PIPE,
> >                          stderr=subprocess.PIPE,
> >                          env=env)
> >     output, stderr = p.communicate(output.read())
> >     stderr = stderr.decode('utf-8')
> >
> >     if stderr:
> >         logger.warning('tesseract_baselines stderr: %s', stderr)
>
>
> Regards,
> Merlijn
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/ebd9d42d-244c-4a6b-8ab8-c1efd87db501n%40googlegroups.com