On Jul 7, 2012, at 11:11 AM, David M. wrote:

> I've got a web-app currently partially working. The user uploads a .txt,
> .docx or .doc file to the server.
> 
> Currently the model handles those files, saves some metadata (the
> extension and original filename), then saves the file to the hard drive. Next
> it converts the doc and docx files to plain text and saves the output to
> a txt file.
> 
> My problem is I want to copy the plain text contents of those txt files
> to the :body field in my database, but by the time those files are
> written no more changes can be sent to the database (because all the
> file handling is done in after_save).
> 
> Where or how do I sanely get the contents of those TXT files into the
> database?

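I built this feature in my first commercial Rails app. I used Paperclip for my
file storage, which offers its own callback called 'after_post_process'. It
fires when the attachment is assigned and processed, before the record is
saved, so anything you set on the model there is written along with the rest
of the record. That worked out perfectly for me.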

First, I created a Paperclip processor to extract the text version of the
uploaded file (mine were all PDFs).

# /lib/paperclip_processors/text.rb

module Paperclip
  # Handles extracting plain text from PDF file attachments
  class Text < Processor

    attr_accessor :whiny

    # Creates a Text extract from PDF
    def make
      src = @file
      dst = Tempfile.new([@basename, 'txt'].compact.join("."))
      command = <<-end_command
        "#{ File.expand_path(src.path) }"
        "#{ File.expand_path(dst.path) }"
      end_command

      begin
        success = Paperclip.run("/usr/bin/pdftotext -nopgbrk", command.gsub(/\s+/, " "))
        Rails.logger.info "Processing #{src.path} to #{dst.path} in the text processor."
      rescue PaperclipCommandLineError
        raise PaperclipError, "There was an error processing the text for #{@basename}" if @whiny
      end
      dst
    end
  end
end
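
A side note on the command string in the processor above: the gsub collapses
the heredoc into a single line of two double-quoted paths, which copes with
ordinary spaces but not with a path containing a double quote, a dollar sign,
or a run of several spaces. A minimal sketch of building the same two
arguments with Ruby's standard Shellwords module follows. The helper name
pdftotext_args and the example paths are made up for illustration and are not
part of Paperclip; the expected string assumes a Unix-style filesystem.

```ruby
require "shellwords"

# Hypothetical helper (not part of Paperclip): returns the source and
# destination paths as one shell-escaped argument string.
def pdftotext_args(src_path, dst_path)
  [src_path, dst_path]
    .map { |path| Shellwords.escape(File.expand_path(path)) }
    .join(" ")
end

# Example paths are invented for this sketch.
puts pdftotext_args("/tmp/my report.pdf", "/tmp/out.txt")
```

The result could be passed as the second argument to Paperclip.run in place of
command.gsub(/\s+/, " "); whether that suits a given Paperclip version is
untested here.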

Then in my document.rb (model for the file attachment), I added the following 
bits:

  has_attached_file :pdf, :styles => { :text => { :fake => 'variable' } },
                    :processors => [:text]

  after_post_process :extract_text


  private
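  # Reads the text file the processor queued for writing and copies its
  # contents, reduced to ASCII, into the plain_text attribute.
  def extract_text
    plain_text = ""
    # The block form closes the file handle when reading is done.
    File.open(pdf.queued_for_write[:text].path, "r") do |file|
      file.each_line do |line|
        plain_text << Iconv.conv('ASCII//IGNORE', 'UTF8', line)
      end
    end
    self.plain_text = plain_text
  end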

And that was that. 
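
One caveat about the Iconv call in extract_text: Iconv is no longer in the
standard library as of Ruby 2.0. A rough equivalent using String#encode might
look like the sketch below. The helper name to_ascii is hypothetical, and this
is only an approximation of what 'ASCII//IGNORE' does (it drops any character
with no ASCII equivalent, along with invalid bytes).

```ruby
# Hypothetical helper: converts a UTF-8 string to ASCII, silently dropping
# invalid byte sequences and characters that ASCII cannot represent.
def to_ascii(line)
  line.encode("ASCII", "UTF-8", invalid: :replace, undef: :replace, replace: "")
end

# The accented characters are removed; plain ASCII passes through untouched.
puts to_ascii("caf\u00e9 na\u00efve text")
```

Inside extract_text, the line that appends to plain_text would then call
to_ascii(line) instead of Iconv.conv.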

Walter

> 
> See model attached:
> 
> Attachments:
> http://www.ruby-forum.com/attachment/7574/doc_file.rb
> 
> 
> -- 
> Posted via http://www.ruby-forum.com/.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Ruby on Rails: Talk" group.
> To post to this group, send email to rubyonrails-talk@googlegroups.com.
> To unsubscribe from this group, send email to 
> rubyonrails-talk+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/rubyonrails-talk?hl=en-US.
> 

