If you do something like this

    for i in $(pandoc --list-output-formats); do
        pandoc -f docx -t "$i" -o "test.$i" Now\ they\ want\ us\ to\ charge\ our\ electric\ cars\ from\ litter\ bins.docx
    done

you get output in approximately 65 formats, from which you can pick one that you can write a little parser for. The dokuwiki one, for example, uses long lines, which makes parsing easier.

el

On 2023-12-30 13:57, Andy wrote:
> Good idea, El - thanks.
>
> The link is
> https://docs.google.com/document/d/1QwuaWZk6tYlWQXJ3WLczxC8Cda6zVERk/edit?usp=sharing&ouid=103065135255080058813&rtpof=true&sd=true
>
> This is helpful.
>
> From the article, which is typical of Lexis+ output, I want to
> extract the following fields and append to a Calc/Excel spreadsheet.
> Given the volume of articles I have to work through, if this can be
> iterative and semi-automatic, that would be a godsend and I might be
> able to do some actual research on the articles before I reach my
> pensionable age. :-)
>
> Title
> Newspaper
> Date
> Section and page number
> Length
> Byline
> Subject (only if the threshold of coverage for a specific subject,
> >=50%, is reached, e.g. Greenwashing (51%); if not, enter 'nil' and
> move on to the next article in the folder)
>
> This is the ambition. I am clearly a long way short of that though.
>
> Many thanks.
> Andy
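For what it is worth, here is a rough sketch of what such a little parser could look like, in the same shell as the loop above. It assumes the article has already been converted to plain text and that each field sits on a single line in "Label: value" form (Section:, Length:, Byline:, Subject:). That layout is my assumption about the Lexis+ export, not something I have checked against the real file. The function names get_field, subjects_over_threshold and article_row are made up for illustration. Title, Newspaper and Date are left out because I do not know where they sit in the converted text.

```shell
#!/bin/sh
# Sketch only: assumes one "Label: value" field per line (an assumption
# about the converted Lexis+ text, not a verified fact).

# Print the value of the first line on stdin that starts with "<label>:".
# Prints "nil" when no such line exists.
get_field() {
    awk -v label="$1" '
        index($0, label ":") == 1 {
            value = substr($0, length(label) + 2)
            sub(/^[ \t]+/, "", value)
            print value
            found = 1
            exit
        }
        END { if (!found) print "nil" }
    '
}

# $1 is a subject string such as "A (51%); B (49%)", $2 the minimum
# percentage (default 50). Prints the entries at or above the minimum,
# joined by "; ", or "nil" when none qualify.
subjects_over_threshold() {
    printf '%s\n' "$1" | tr ';' '\n' | awk -v min="${2:-50}" '
        match($0, /\([0-9]+%\)/) {
            pct = substr($0, RSTART + 1, RLENGTH - 3) + 0
            if (pct >= min) {
                sub(/^[ \t]+/, "")
                sub(/[ \t]+$/, "")
                out = out (out == "" ? "" : "; ") $0
            }
        }
        END { print (out == "" ? "nil" : out) }
    '
}

# $1 is a plain-text article file. Prints one tab-separated row:
# Section, Length, Byline, Subject (filtered by the 50% threshold).
article_row() {
    printf '%s\t%s\t%s\t%s\n' \
        "$(get_field Section < "$1")" \
        "$(get_field Length < "$1")" \
        "$(get_field Byline < "$1")" \
        "$(subjects_over_threshold "$(get_field Subject < "$1")")"
}
```

To use it on a folder, something like pandoc -f docx -t plain --wrap=none -o article.txt "$f" followed by article_row article.txt >> articles.tsv inside a for f in *.docx loop should give a tab-separated file that Calc or Excel can open. The --wrap=none option is there so that a long Subject line is not broken across several lines.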