Re: pdf to spreadsheet advice

2010-09-06 Thread Jim Gibson
At 5:30 PM -0700 9/6/10, Matt Johnson wrote: Hello, I periodically receive PDFs with a table of member names, addresses, etc. in a badly formatted, hard-to-read PDF. I would like to open the PDF, extract the data, do a little reorganizing, and write it to an Excel spreadsheet. Perl seems like the

pdf to spreadsheet advice

2010-09-06 Thread Matt Johnson
Hello, I periodically receive PDFs with a table of member names, addresses, etc. in a badly formatted, hard-to-read PDF. I would like to open the PDF, extract the data, do a little reorganizing, and write it to an Excel spreadsheet. Perl seems like the best way to do this. I have searched CPAN and

Re: Workaround to a Unicode bug needed

2010-09-06 Thread Pierre Nugues
Dear Shawn, Thank you for your answer. However, this does not seem to work. I used two versions of Perl, the standard Mac installation 5.8.8 and ActivePerl 5.12.1, and neither produces the correct output. Here is what the output should be, one word per line. I only show the first words. Some

Re: Workaround to a Unicode bug needed

2010-09-06 Thread Shawn H Corey
On Mon, 2010-09-06 at 15:10 +0200, Pierre Nugues wrote:
> I wrote a simple tokenizer for texts containing Latin9 characters. It does not behave as expected with the Swedish text below and I would like to find a workaround.
Add these lines to top of your program: use strict; use warnings;

Workaround to a Unicode bug needed

2010-09-06 Thread Pierre Nugues
Dear All, I wrote a simple tokenizer for texts containing Latin9 characters. It does not behave as expected with the Swedish text below and I would like to find a workaround. More precisely, Perl does not properly remove the Swedish quotes: » (RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, U+00BB

Re: Set time in the file

2010-09-06 Thread John W. Krahn
sekhar wrote: Hi All, Hello, I am searching for a regular expression for the problem below. I have a large input file that contains lines of this sort: 00:02:58,262 --> 00:03:01,473 00:03:05,561 --> 00:03:07,771 i.e. hh:mm:ss,no --> hh:mm:ss,no. The problem is that there is a time difference, so I need to