Hi Eric,

Peter wrote...

>> The Journal site (journal.code4lib.org) is a lightly modified WordPress
>> site...

In case it's at all helpful, that means you can access the site via the
WordPress REST API:
<https://developer.wordpress.org/rest-api/>

...which shows you can do things like:
<http://journal.code4lib.org/wp-json/>

...and:
<http://journal.code4lib.org/wp-json/wp/v2/posts>

...and likely oodles of other things I'm not familiar with.

-b

---
Birkin James Diana
Digital Technologies Developer
Brown University Library
[email protected]

> On May 28, 2019, at 11:15 AM, Eric Lease Morgan <[email protected]> wrote:
>
> On May 28, 2019, at 11:01 AM, Peter Murray <[email protected]> wrote:
>
>> The Journal site (journal.code4lib.org) is a lightly modified WordPress
>> site, and the indexing is whatever comes with WordPress. (I would guess it
>> renders the HTML to flat text with no regard for authorship and reference
>> sections.) The issue is a WordPress category, the date is the WordPress
>> post date (I think), and Title is the WordPress title. Author is a field we
>> added to WordPress, and it is just a text field (authors are undistinguished
>> in the field). Abstract is the WordPress summary. I think the RSS feed
>> from the Journal might be a good place to get much of the information,
>> although in some cases (like Author), further processing would be required.
>> We also submit metadata to DOAJ (https://doaj.org/toc/1940-5758), the basis
>> of which comes from a custom plugin; see, for example,
>> http://journal.code4lib.org/issues/issue44/feed/doaj. (The coordinating
>> editor downloads that file, manually checks/corrects XML errors, and uploads
>> it to DOAJ.)
>
>
> Peter, thank you, and at first glance a more thorough indexing process would
> be to:
>
>   1. regularly retrieve the feed/doaj file
>   2. parse it
>   3. save the result as metadata
>   4. harvest full text
>   5. index full text this way, that way, and the other way
>   6. associate the result of Step #5 with the result of Step #3
>   7. present the result
>
> Hmmm... Interesting, and again, thank you.
>
> --
> Eric Morgan
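
P.S. For what it's worth, here is a rough Python sketch of how the posts
endpoint might stand in for the feed/doaj file in steps 1-6 of Eric's
list. It is only a sketch under assumptions: the field names (`id`,
`date`, `link`, `title.rendered`, `excerpt.rendered`,
`content.rendered`) are the standard WordPress REST post fields, and I
have not checked that the Journal site returns them unmodified; the
custom Author field may not be exposed at all. The function names are
made up for illustration, and the "index" is a toy word-to-post lookup,
not a real search engine.

```python
import html
import json
import re
from collections import defaultdict
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint from the message above; everything else here is illustrative.
POSTS_URL = "http://journal.code4lib.org/wp-json/wp/v2/posts"


def posts_url(page=1, per_page=10):
    """Build a paged request URL (page/per_page are standard WP REST parameters)."""
    return POSTS_URL + "?" + urlencode({"page": page, "per_page": per_page})


def to_text(markup):
    """Crudely flatten rendered HTML to plain text (a real harvester would parse it)."""
    return " ".join(html.unescape(re.sub(r"<[^>]+>", " ", markup)).split())


def metadata(post):
    """Steps 2-3: keep the fields Peter described (title, date, abstract)."""
    return {
        "id": post["id"],
        "date": post["date"],
        "link": post["link"],
        "title": to_text(post["title"]["rendered"]),
        "abstract": to_text(post["excerpt"]["rendered"]),
    }


def index_posts(posts):
    """Steps 4-6: index each post's full text, keyed by post id so that
    index hits can be tied back to the saved metadata."""
    records = {}
    index = defaultdict(set)
    for post in posts:
        records[post["id"]] = metadata(post)
        for word in to_text(post["content"]["rendered"]).lower().split():
            token = word.strip(".,;:!?()\"'")
            if token:
                index[token].add(post["id"])
    return records, index


def fetch_posts(page=1, per_page=10):
    """Step 1: retrieve one page of posts as parsed JSON (needs network access)."""
    with urlopen(posts_url(page, per_page)) as response:
        return json.load(response)
```

Something like `records, index = index_posts(fetch_posts())` would then
cover a single page of posts. WordPress normally reports the page count
in the `X-WP-TotalPages` response header, so a fuller harvest would loop
over `page` until that count is reached.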
