You can use a user-friendly frontend for Tesseract on Linux.
http://vietocr.sf.net
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
To post to this group, send email to tesseract-ocr@googlegroup
I successfully built the tesseract.sln solution without any problem
using VS2008. The size of resultant tesseract.exe is 918KB. It runs
faster than previous versions.
I did not try the tesseract_2008.sln. Is it used in any situation? If
not, consider its removal to avoid any confusion.
On Jun 3,
Tessnet2 2.04 works fine.
On Jun 5, 9:20 am, Remi Thomas wrote:
> Hi,
>
> You can take the .NET wrapper based on version
> 2.04 http://www.pixel-technology.com/freeware/tessnet2/bin.zip
> Two modifications.
>
> SetRootPath has been removed and merged with Init
> Init(string tessdataPath, string
10:20 pm, 74yrs old wrote:
> Perhaps your tesseract.exe of 918 KB is a "debug" build and not "release"? I also
> compiled using VS2008, by clicking on tesseract_2008.sln in the SVN tree with
> the setting set to "release". The result is a tesseract.exe of 735 KB.
>
> On Sa
Sorry, the FontDialog was specifically designed to show only fonts
that contain Vietnamese glyphs. I'll remove the filter to show all
system fonts in the next release.
It's not surprising since VietOCR GUI design was "inspired" by
FreeOCR. ;-)
By the way, what OS and version are you running it o
You can try VietOCR, a frontend program to Tesseract OCR engine.
http://vietocr.sf.net
On Jun 14, 8:00 am, eddiec wrote:
> Hi,
>
> I have some scanned images I am trying to OCR. The images are
> grayscale and scanned from microfilm. The packages I have tried to OCR
> the images with first conve
In what area can VietOCR be improved to make it more suitable? Please
be as specific as possible. Thanks.
On Jun 17, 8:24 am, svaram wrote:
> well, assuming that its for English text , why not to
> suggest the both and leave the choice to the user !!!
>
> personally, I feel vietOCR in its presen
(76yrsold)
>
> On Mon, Jun 15, 2009 at 1:37 AM, nguyenq wrote:
>
> > Sorry, the FontDialog was specifically designed to show only fonts
> > that contain Vietnamese glyphs. I'll remove the filter to show all
> > system fonts in the next release.
>
> > It
ses.
6. You can use the scrollbars of the picturebox to scroll part of the
image into visible view.
Thank you for your comments. I'll try to incorporate any enhancements,
if possible.
Quan
On Jun 19, 1:00 am, svaram wrote:
> ^ Many Thanks nguyenq for the quick response .
>
> And wil
fine.
> With Best Wishes.
> -sriranga(76yrsold)
>
> On Sun, Jun 21, 2009 at 9:19 AM, nguyenq wrote:
>
> > I've just released a new update with numerous improvements. The
> > FontDialog now shows all system fonts. Please check it out.
>
> >http://vietocr.sf.n
You may want to add an entry in ISO639-3.xml file under 'data' folder
to have a user-friendly name for your language code.
On Jun 21, 8:07 am, nguyenq wrote:
> I had no problems recognizing the given image, kannada-
> texttoimage.tif. I can't verify the text because my s
Perfectly understood. I was just trying to understand the requirements
and determine whether they should be implemented or implementable.
On Jun 21, 2:03 am, seshumiyapu...@yahoo.co.in wrote:
> Many Thanks Mr Nguyenq for the detailed response .
>
> perhaps I should have clarified that
layed.
> Even previous version tested - which also gives same problem. In fact I have
> installed jdk 6,.014 and jre6 but still gives trouble? Inspite of added in
> xml as test "<entry key="tel">Telugu</entry>" does not appear as "Telugu" in
> ocr lang's. window.
>
Does anyone know where I can find all the possible return (or error)
codes that Tesseract may output, and their corresponding descriptions?
For instance, 0 is returned for successful recognition. What about 1,
29, 31, or others? What do they mean? I want to display meaningful
error messages when T
A Java/.NET GUI frontend for Tesseract OCR engine.
Features:
* Multi-platform (Java version only)
    o Windows
    o Solaris
    o Linux/Unix
    o Mac OS X
    o Others
* TIFF, JPEG, GIF, PNG, BMP image formats
* Multi-page images
* Selection box
Version 1.2 Beta just uploaded provides support for PDF. The new
feature requires GPL Ghostscript. Please help test and post your
feedback here. Thanks.
http://sourceforge.net/projects/vietocr/files/
On Aug 1, 5:36 pm, nguyenq wrote:
> A Java/.NET GUI frontend for Tesseract OCR eng
The Java frontend should run on any OS that has Java 6 support. It has
been tested on Windows and Linux (Ubuntu and Fedora).
Some images give the Tesseract engine problems, possibly because of
defects in the images.
On Sep 8, 2:26 am, ashwani rawat wrote:
>
> Hello nguyenq
A little bit more info on your configuration (OS, Ghostscript version
installed, etc.) would be helpful in diagnosing the problem. Thanks.
Quan
On Sep 8, 7:33 am, ashwani rawat wrote:
>
> Hello,
> I have test this application. But when I give the path of PDF file it give
> me error.
>
> error is
>
RTM.
http://vietocr.sourceforge.net/usage.html
On Sep 9, 10:49 am, Naga raja wrote:
> can any one plz say how to run vietocr in UBUNTU .. just the steps ..
Version 1.2 Java & .NET have been released with PDF support
integrated. The feature requires GPL Ghostscript.
http://vietocr.sf.net
Delete the two small boxes and bring up the top of the big box to
enclose the diacritical mark. The end result would be a single
bounding box.
On Sep 12, 8:49 am, "M. Bashir Al-Noimi" wrote:
> Hi All,
>
> I tried to train Tesseract for new language but I noticed that all
> punctuation characters
Version 1.3 Java & .NET have been released with numerous new features
and improvements.
http://vietocr.sf.net
Ashwani,
I'm not sure if the program can be configured to work with Java Web
Start. Did you verify that the entire package, including Tesseract and
language data, was downloaded to the target machine? From the error
message, it seems that it had a problem loading JNA.
Hope someone with JNLP experi
Please help test version 1.4 Beta (both Java and .NET), which includes
the following new features:
* Publish OCR interim results to produce more responsive UI
performance, improving user experience
* Support for cancellation of running OCR tasks
* Merge PDF functionality
http://vietocr.sf.net
You can use Notepad++ or jEdit to find and replace characters using
Regex, as with Find "(.)" and Replace "$1 " or "\1 ", without the
quotes.
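
The same find-and-replace can be scripted when there are many files to process. A minimal Java sketch, assuming the text has already been read into a string (the class and method names are made up for illustration):

```java
final class SpaceInserter {
    // Appends one space after every character except line terminators,
    // mirroring Find "(.)" / Replace "$1 " in a regex-capable editor.
    static String insertSpaces(String text) {
        return text.replaceAll("(.)", "$1 ");
    }

    public static void main(String[] args) {
        String input = args.length > 0 ? args[0] : "abc";
        System.out.println(insertSpaces(input));
    }
}
```

Because "." does not match line terminators by default, line breaks are left alone, but every line ends up with a trailing space that may need trimming. In scripts that use combining marks, this also puts a space between a base character and its marks, which may or may not be what you want.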
On Nov 2, 1:14 am, 74yrs old wrote:
> Hi,
>
> I am facing problem how to *automate the function* of inserting space(white
> space) between characters in t
If you look in the tessdata folder at
http://tesseract-ocr.googlecode.com/svn/trunk/
, there are currently more than two dozen.
On Dec 2, 7:44 am, andmor wrote:
> Hi,
>
> Which languages will be supported with the 3.0 release ?
> On the TessearctProjects page, Ray says that Google are working o
University of Nevada, Las Vegas, USA
http://www.isri.unlv.edu/
On Dec 14, 1:13 am, raakeshvara rao wrote:
> Hi,
>
> If I am looking at a career in OCR and Vision research,
> which universities or research institutes in the world would be good
> places to look at (say for a PhD)?
> Especially for
A Java/.NET GUI frontend for Tesseract OCR engine.
The releases include support for:
- execution from command line
- paste image from clipboard (Note: Screenshots or screen captures are
typically only of 96 DPI, a resolution insufficient for OCR
requirements.)
- JPEG2000 and PNM image types (Java
From the Downloads page:
http://code.google.com/p/tesseract-ocr/downloads/list
On Jan 3, 5:30 pm, Rick wrote:
> Hello. I would like to use freeocr.net (which uses tesseract) for
> German documents. The program only came with files for English.
> Additionally, the link to download more languag
No, you would have to convert PDF to an image before feeding it to the
OCR engine. Ghostscript supports such PDF conversion tasks.
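
As a rough sketch of that conversion step, the Java snippet below builds a Ghostscript command line that renders each page of a PDF to a PNG file, which can then be fed to tesseract one page at a time. The executable name "gs", the file names, and the 300 DPI setting are assumptions; adjust them for your installation (Windows builds of Ghostscript use a different executable name).

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class PdfToImages {
    // Builds a Ghostscript command that renders every PDF page to its own PNG.
    // outputPattern should contain a page counter such as %03d.
    static List<String> ghostscriptCommand(String pdfPath, String outputPattern, int dpi) {
        return Arrays.asList(
            "gs",                          // assumed executable name (Linux/Unix)
            "-dNOPAUSE", "-dBATCH", "-dSAFER",
            "-sDEVICE=png16m",             // 24-bit colour PNG output
            "-r" + dpi,                    // rendering resolution in DPI
            "-sOutputFile=" + outputPattern,
            pdfPath);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical file names, for illustration only.
        List<String> cmd = ghostscriptCommand("input.pdf", "page-%03d.png", 300);
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(process.waitFor());
    }
}
```

Each resulting page image can then be recognized in the usual way, for example "tesseract page-001.png page-001 -l eng".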
On Jan 4, 7:09 am, Eitan wrote:
> Hi
>
> I am a newbie...
> Is there a standard way to extract text from PDF using tesseract-ocr ?
>
> Thanks
I think you can set the location of tessdata via the environment
variable TESSDATA_PREFIX for each instance.
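
If tesseract is launched from another program, the variable can be set per process rather than system-wide. A minimal Java sketch, assuming a "tesseract" executable on the PATH and hypothetical install paths; as far as I know, the prefix should name the directory that contains the tessdata folder and end with a path separator, so verify that against your build:

```java
import java.io.IOException;

final class TessdataPrefix {
    // Prepares (but does not start) a tesseract process that has its own
    // TESSDATA_PREFIX, leaving the parent environment untouched.
    static ProcessBuilder tesseractWithPrefix(String prefix, String image,
                                              String outputBase, String lang) {
        ProcessBuilder pb = new ProcessBuilder("tesseract", image, outputBase, "-l", lang);
        pb.environment().put("TESSDATA_PREFIX", prefix);
        return pb;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical prefix and file names, for illustration only.
        ProcessBuilder pb = tesseractWithPrefix("/opt/tesseract-2.04/", "page.tif", "page", "eng");
        System.exit(pb.inheritIO().start().waitFor());
    }
}
```

Starting one process per installed version, each with a different prefix, keeps their language data apart.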
On Jan 4, 2:11 am, 76yrsold wrote:
> I have downloaded all tesseract versions, viz. 2.03, 2.04 and 3.0,
> and compiled each of them individually on Ubuntu 9.04, and also installed
> Kannada tessd
> version?
>
> On Tue, Jan 5, 2010 at 12:57 AM, nguyenq wrote:
> > I think you can set the location of tessdata via the environment
> > variable TESSDATA_PREFIX for each instance.
>
> > On Jan 4, 2:11 am, 76yrsold wrote:
> > > I have downloaded tesseract version
tessdata_prefix = "* Sriranga/ *"
> ./configure
> make
> sudo make.
>
> 3)*for version 3.0*
> tessdata_prefix = " *Sriranga/* "
> ./configure
> make
> sudo make.
>
> With regards,
> -sriranga(77yrsold)
>
> On Wed, Jan 6, 2010 at 7:16 A
http://www.pixel-technology.com/freeware/tessnet2/
On Mar 9, 3:07 pm, mouthpiec wrote:
> Hi,
>
> Can you please help me from where I can download the tesseract
> ocr .net wrapper demo?
>
> I need to test this demo because currently I am developing a web-
> service that take an image that contain
I think I heard that this language auto-detection would be available
in v3.0, but I could be wrong.
In 2.0.x, you need to pick the correct language to get good
recognition rates.
On Mar 5, 2:29 am, Tule wrote:
> I'm sorry to hear that.
> This project really has potential. If the project is dead
I don't think Tesseract has this capability. Having language data
specifically trained on that font would certainly help improve
recognition rates.
On Mar 7, 8:55 am, Jonah wrote:
> Hi,
>
> If we know that our text is all, say, Arial, is it possible to tell
> tesseract this to improve accuracy
This is a known problem for 2.0.x. It's been mentioned numerous times
in the group.
On Mar 8, 11:12 am, Greg wrote:
> Hello -- a fabulous program. One tiny problem: if a filename isn't
> specified with the extension '.tif', ev
Take a look at this working application which uses tessnet2 library.
http://vietocr.sf.net
On Mar 16, 2:30 pm, dataintelligence wrote:
> You need to install the language data and put that in the 'tessdata
> path' text box. Look in the downloads for the language pack you want
> (the sample image
tessnet2 has a method that accepts as an argument a Rectangle object
that defines a region that you want to recognize. There's no need to
generate subimages.
tessnet2.Tesseract.DoOCR(image, rect)
On Mar 17, 10:47 am, dataintelligence wrote:
> I just realized that I have put this in the wrong gro
The source is available -- you did not look far enough.
On Mar 18, 5:48 am, Sandro Zahra wrote:
> Thanks but this contains only an installer :(
>
> On 18 March 2010 03:37, nguyenq wrote:
>
> > Take a look at this working application which uses tessnet2 library.
>
>
Those services are likely running Tesseract 3.0, whose supported
language packs are located in the tessdata folder at
http://tesseract-ocr.googlecode.com/svn/trunk/
On Mar 21, 4:53 pm, elninom wrote:
> I need more language pack like that:
> Bulgarian, Catalan, Czech, Danish, Dutch, English, Finnish,
A Java/.NET GUI frontend for Tesseract OCR engine.
The latest versions have been released with an emphasis on increased
usability, making the application more user friendly.
* Add provision to load UTF-8 text file into textbox
* Add Recent Files submenu
* Add Save button on toolbar
* Fix scale fa
From the page http://www.pixel-technology.com/freeware/tessnet2/
tessnet2.Tesseract ocr = new tessnet2.Tesseract();
ocr.SetVariable("tessedit_char_whitelist", "0123456789"); // If digit only
which is consistent with that described in
http://code.google.com/p/tesseract-ocr/wiki/FAQ
On Apr 16, 4
Hi Pierre,
The FAQ states that the SetVariable must be called before the Init
function.
Regards,
Quan
On Apr 18, 12:50 pm, MARTIN Pierre wrote:
> Dear NGuyenQ,
>
> > From the page http://www.pixel-technology.com/freeware/tessnet2/
> > tessnet2.Tesseract ocr = new
VietOCR v2.0 Beta has been released with Tesseract 3.0 Beta.
http://sourceforge.net/projects/vietocr/files/
You can perform some text manipulations in post-processing steps to
strip out diacritical marks to leave only the base ASCII characters
behind.
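
One way to do this in Java is to decompose the text with the standard java.text.Normalizer and then drop the combining marks. This is only a sketch (class and method names are made up), and it does not handle letters that have no decomposition, such as Vietnamese "đ", which would need an explicit mapping:

```java
import java.text.Normalizer;

final class DiacriticStripper {
    // Decomposes accented letters into base letter + combining marks (NFD),
    // then removes the marks, leaving the base characters behind.
    static String stripDiacritics(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        String input = args.length > 0 ? args[0] : "t\u00e9l\u00e9gramme";
        System.out.println(stripDiacritics(input));
    }
}
```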
On May 25, 3:34 pm, haratron wrote:
> http://www.linux.com/archive/feed/57222
> "Also, it can generate output only in the US-ASCII character set, so
> gl
Has anyone had success in getting the latest baseline, r376, to run
on Windows? The code compiled cleanly; however, the program
crashes upon execution.
I incorporated dtorne's patch to tesseractmain.cpp (Issue 304) and
have been able to get it to run and recognize images successfully, but
The problem is usually the other way around, but I'll need the
problematic image to investigate the issue. VietOCR 2.0 (currently in
beta) starts supporting Tesseract 3.0, which supports multi-column
documents. You might want to try that version to see if it would work
better for you.
Quan
On Jun
If your images are at least 200 DPI, you can use VietOCR, which can
accept various common image formats as input -- no conversion to TIFF
is needed.
http://vietocr.sf.net
On Jul 4, 4:47 pm, "fontenot.1031" wrote:
> > You'll need to upscale the image. Before reducing it to
>
> Thanks for respond
After splitting your CamusLetranger.pdf file into 50-page sections, I
fed them into VietOCR (2.0 Beta), which uses Ghostscript to convert PDF to
PNG format, and got this result, which seems acceptable:
I
AuJoURD'HU1, maman est morte. Ou peut-être
hier, je ne sais pas. fai reçu un télégramme
de l'asi1e :
It has been mentioned in one of the previous posts that the solution to
this problem is to install the Microsoft Visual C++ 2005 SP1
Redistributable Package.
http://www.microsoft.com/downloads/details.aspx?familyid=200b2fd9-ae1a-4a14-984d-389c36f85647&displaylang=en
On Oct 17, 11:26 pm, Hasnat <[EMAIL P
I read somewhere that screenshots would not be good enough for any OCR
because of their low resolutions, 96 DPI I think. Try images with 200
DPI or higher.
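
If rescanning is not an option, enlarging the screenshot is sometimes tried as a workaround. The factor is simply target DPI divided by source DPI, e.g. 300 / 96 = 3.125. A minimal Java sketch of that resampling, with made-up class and file names; interpolation adds no real detail, so whether it helps recognition is something to test on your own images:

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

final class DpiUpscaler {
    // Resamples an image so that its pixel dimensions match what a scan
    // at targetDpi would have produced, using bicubic interpolation.
    static BufferedImage upscale(BufferedImage src, int sourceDpi, int targetDpi) {
        double factor = (double) targetDpi / sourceDpi;
        int width = (int) Math.round(src.getWidth() * factor);
        int height = (int) Math.round(src.getHeight() * factor);
        BufferedImage dst = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file names, for illustration only.
        BufferedImage screenshot = ImageIO.read(new File("screenshot.png"));
        if (screenshot == null) {
            throw new IOException("Unsupported or unreadable image file");
        }
        ImageIO.write(upscale(screenshot, 96, 300), "png", new File("screenshot-300dpi.png"));
    }
}
```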
On Oct 10, 12:51 am, benj588 <[EMAIL PROTECTED]> wrote:
> > just screenshots of messageboxes from my system. I was using these to
> > test with a
VietOCR bundles a copy of Tesseract-2.03 Windows executable.
http://sourceforge.net/project/showfiles.php?group_id=230717&package_id=279650
On Oct 26, 12:24 pm, spackmann <[EMAIL PROTECTED]>
wrote:
> All,
>
> I've been searching the group and I found a few (what seemed to be)
> relevant posts on
You will need to first convert PDF to one of the supported image
formats (tif, bmp, etc.).
On Nov 3, 12:30 am, ABB <[EMAIL PROTECTED]> wrote:
> Hi
>
> I have given one pdf image as input ..i got this error in log file:
>
> "read_variables_file:Can't open C:/Documents and Settings/Desktop/
> VietO
Java & .NET GUI frontend for Tesseract OCR engine. Provides character
recognition support for TIFF, JPEG, GIF, PNG, BMP image formats, and
multi-page images.
The latest releases include:
* Integrated scanning support via WIA Automation Library v2.0
* Localization of UI
* Fixed an err
> Is it possible to use your Java GUI frontend for Kannada also, and if so,
> how? This is the first time I am using Java, which is new to me.
> With Regards,
> -sriranga(76yrsold)
>
>
> On Tue, Nov 4, 2008 at 4:56 AM, nguyenq <[EMAIL PROTECTED]> wrote:
>
>
WinMerge can compare two UTF-8-encoded text files with ease.
http://sourceforge.net/projects/winmerge/
On Nov 3, 11:23 pm, "74yrs old" <[EMAIL PROTECTED]> wrote:
> Hi,
> Whether compare two text files(utf-8) program is available?
> In the output text mispelling of characters with reference to i
so we too will
> benefit from the installation of Tesseract GUI in Java.
>
> Hussein Al-Hussein
>
> Date: Tue, 4 Nov 2008 10:29:15 +0530
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: VietOCR v0.9.6 & VietOCR.NET v0.6 Releases
>
> Hi Nguyenq,
> Thanks for the information. I shall
It's bundled with VietOCR (http://sf.net/projects/vietocr), or you can
download just it from
http://vietocr.svn.sourceforge.net/viewvc/vietocr/VietOCR/trunk/tesseract/
.
On Nov 16, 7:22 am, Vetkop <[EMAIL PROTECTED]> wrote:
> Hi do you guys know where I can get the Tesseract2.03 exe? I don't
> k
Take a look at http://vietocr.sf.net and http://code.google.com/p/tesjeract/
.
On Nov 17, 6:04 am, sam <[EMAIL PROTECTED]> wrote:
> i am new to ocr tesseract. i want know how to install tesseract-2.03.
> how to java application.
>
> thanks
> sam
Some people want to do OCR for Vietnamese classic script, Chữ Nôm,
also, but Tesseract has limitations that make it not possible for CJKV
characters, according to:
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract
On Dec 12, 3:24 pm, project2501 wrote:
> Hi all,
> I'm new to this
You can try FreeOCR (.NET) or VietOCR (Java & .NET).
http://www.softi.co.uk/freeocr.htm
http://vietocr.sourceforge.net/
On Jan 4, 2:57 am, sai wrote:
> Hi all
>
> I'm not quite familiar with the plugin usage of tesseract in my own
> project.
> Including the tesseract's .dll file is enough to e
Java & .NET GUI frontend for Tesseract OCR engine, providing character
recognition support for TIFF, JPEG, GIF, PNG, BMP image formats, and
multi-page images.
These releases add watch folder monitor for batch processing.
http://vietocr.sf.net
Take a look at VietOCR.NET, a C# GUI frontend integrating
Tessnet2 .NET wrapper for Tesseract 2.03 OCR engine.
http://vietocr.sf.net
On Feb 15, 10:47 am, Waspinator wrote:
> Hi,
>
> I would like to create a simple GUI for tesseract-ocr using C#.
>
> I'm not sure how to include the tesseract-ocr
Yes, increase memory for the JVM, something like:
java -Xms128m -Xmx512m -jar yourprogram.jar
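
To confirm that the setting took effect, the program can print the heap limit the JVM actually received. A small sketch (the class name is made up):

```java
final class HeapCheck {
    // Returns the maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

With -Xmx512m the reported value should be in the neighbourhood of 512, though the exact figure depends on the JVM.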
Take a look at VietOCR, a Java GUI frontend for Tesseract 2.03 OCR
engine.
http://vietocr.sf.net
On Feb 9, 3:07 pm, Mike wrote:
> I have the following code in Java running the prebuilt tesseract
> wi
> help
> Hiral
>
> On Feb 16, 8:28 am, nguyenq wrote:
>
> > Take a look at VietOCR.NET, a C# GUI frontend integrating
> > Tessnet2 .NET wrapper for Tesseract 2.03 OCR engine.
>
> > http://vietocr.sf.net
>
> > On Feb 15, 10:47 am, Waspinator wrote:
>
The VietOCR program includes a feature that monitors a watch folder for
new image (not PDF) files and automatically converts them to text
files. You can take it a step further by converting those text files to
PDF, which should be searchable.
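
For anyone who wants to script something similar, here is a simplified polling sketch in Java. It is not VietOCR's implementation: it only lists the image files in a folder that do not yet have a matching .txt file, and the caller would run the OCR step on each and repeat the scan periodically. The class name and the extension list are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class WatchFolderScan {
    // True for the common image extensions; PDF is deliberately excluded.
    static boolean isImage(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".tif") || name.endsWith(".tiff") || name.endsWith(".png")
            || name.endsWith(".jpg") || name.endsWith(".jpeg")
            || name.endsWith(".gif") || name.endsWith(".bmp");
    }

    // Maps e.g. scan01.png to scan01.txt in the same folder.
    static Path textFileFor(Path image) {
        String name = image.getFileName().toString();
        String base = name.substring(0, name.lastIndexOf('.'));
        return image.resolveSibling(base + ".txt");
    }

    // Lists images in the folder that have not been converted to text yet.
    static List<Path> pendingImages(Path folder) throws IOException {
        try (Stream<Path> entries = Files.list(folder)) {
            return entries
                .filter(Files::isRegularFile)
                .filter(WatchFolderScan::isImage)
                .filter(image -> !Files.exists(textFileFor(image)))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path folder = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path image : pendingImages(folder)) {
            System.out.println("Needs OCR: " + image);
        }
    }
}
```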
http://vietocr.sf.net
On Feb 24, 7:44 am, myworld wrote:
>