Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

Dr Eberhard Lisse Wed, 03 Jan 2024 04:26:57 -0800

If you do something like this

        for i in $(pandoc --list-output-formats); do
                pandoc -f docx -t "$i" -o "test.$i" \
                        "Now they want us to charge our electric cars from litter bins.docx"
        done

you get approximately 65 output files, one per format, from which you can
pick one that is easy to write a little parser for. The dokuwiki one, for
example, uses long lines, which makes parsing easier.
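
As a rough sketch of such a little parser, here is a shell function for the
Subject threshold described in the quoted message below. It is not tested
against real Lexis+ output. It assumes the converted text has one long line
of the form "Subject: TERM (NN%); TERM (NN%); ...", which is my guess at the
layout, and the name subjects_over_threshold is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the Subject terms whose coverage is >= 50%,
# or 'nil' when none qualify (or when there is no Subject line at all).
# Assumes one long line "Subject: TERM (NN%); TERM (NN%); ..." in file $1;
# that layout is an assumption, not something confirmed in this thread.
subjects_over_threshold() {
    awk -v min=50 '
        /^Subject:/ {
            sub(/^Subject:[ \t]*/, "")
            n = split($0, terms, /;[ \t]*/)
            out = ""
            for (i = 1; i <= n; i++) {
                # Find the "(NN%)" part and keep only the digits.
                if (match(terms[i], /\([0-9]+%\)/)) {
                    pct = substr(terms[i], RSTART + 1, RLENGTH - 3) + 0
                    if (pct >= min)
                        out = out (out == "" ? "" : "; ") terms[i]
                }
            }
            found = 1
            print (out == "" ? "nil" : out)
            exit
        }
        END { if (!found) print "nil" }
    ' "$1"
}
```

To get long lines in plain text, pandoc's --wrap=none option should help
(article.docx is a placeholder name):

        pandoc -f docx -t plain --wrap=none -o article.txt article.docx
        subjects_over_threshold article.txt
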

el

On 2023-12-30 13:57 , Andy wrote:
> Good idea, El - thanks.
>
> The link is
> https://docs.google.com/document/d/1QwuaWZk6tYlWQXJ3WLczxC8Cda6zVERk/edit?usp=sharing&ouid=103065135255080058813&rtpof=true&sd=true
>
>  This is helpful.
>
> From the article, which is typical of Lexis+ output, I want to
> extract the following fields and append to a Calc/Excel spreadsheet.
> Given the volume of articles I have to work through, if this can be
> iterative and semi-automatic, that would be a godsend and I might be
> able to do some actual research on the articles before I reach my
> pensionable age. :-)
>
> Title, Newspaper, Date, Section and page number, Length, Byline, Subject
> (only if the threshold of coverage for a specific subject of >=50% is
> reached, e.g. Greenwashing (51%); if not, enter 'nil' and move on to
> the next article in the folder)
>
> This is the ambition. I am clearly a long way short of that though.
>
> Many thanks. Andy

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
