Hi Eric,

Peter wrote...

>> The Journal site (journal.code4lib.org) is a lightly modified WordPress 
>> site...

In case it's at all helpful, that means you can access the site via the 
WordPress REST API:
<https://developer.wordpress.org/rest-api/>

...which shows you can do things like:
<http://journal.code4lib.org/wp-json/>

...and:
<http://journal.code4lib.org/wp-json/wp/v2/posts>

...and likely oodles of other things I'm not familiar with.
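
For what it's worth, here is a minimal Python sketch of pulling a page of 
posts from that second URL. It assumes the endpoint returns the standard 
WordPress post objects (id, date, link, title.rendered) and accepts the usual 
page/per_page parameters. I haven't checked whether the custom Author field 
is exposed there, so it isn't included. The function names are just mine, for 
illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Posts endpoint from the URL above.
BASE = "http://journal.code4lib.org/wp-json/wp/v2/posts"


def posts_url(page=1, per_page=10):
    """Build a paged URL for the posts endpoint."""
    return BASE + "?" + urlencode({"page": page, "per_page": per_page})


def summarize(posts):
    """Reduce the API's post objects to a few fields of interest."""
    return [
        {
            "id": post["id"],
            "date": post["date"],
            "link": post["link"],
            # WordPress wraps the title as {"rendered": "..."}.
            "title": post["title"]["rendered"],
        }
        for post in posts
    ]


def fetch(page=1, per_page=10):
    """Retrieve one page of posts (needs network access)."""
    with urlopen(posts_url(page, per_page), timeout=30) as response:
        return summarize(json.load(response))


if __name__ == "__main__":
    for post in fetch():
        print(post["date"], post["title"], post["link"])
```

Bumping the page number should walk through the rest of the posts, assuming 
the site hasn't restricted the API.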

-b
---
Birkin James Diana
Digital Technologies Developer
Brown University Library
[email protected]


> On May 28, 2019, at 11:15 AM, Eric Lease Morgan <[email protected]> wrote:
> 
> On May 28, 2019, at 11:01 AM, Peter Murray <[email protected]> wrote:
> 
>> The Journal site (journal.code4lib.org) is a lightly modified WordPress 
>> site, and the indexing is whatever comes with WordPress. (I would guess it 
>> renders the HTML to flat text with no regard for authorship and reference 
>> sections.)  The issue is a WordPress category, the date is the WordPress 
>> post date (I think), and Title is the WordPress title.  Author is a field we 
>> added to WordPress, and it is just a text field (authors are undistinguished 
>> in the field).  Abstract is the WordPress summary.  I think the RSS feed 
>> from the Journal might be a good place to get much of the information, 
>> although in some cases (like Author), further processing would be required.  
>> We also submit metadata to DOAJ (https://doaj.org/toc/1940-5758), the basis 
>> of which comes from a custom plugin; see, for example, 
>> http://journal.code4lib.org/issues/issue44/feed/doaj. (The coordinating 
>> editor downloads that file, manually checks/corrects XML errors, and uploads 
>> it to DOAJ.)
> 
> 
> Peter, thank you, and at first glance a more thorough indexing process would 
> be to:
> 
>  1. regularly retrieve the feed/doaj file
>  2. parse it
>  3. save the result as metadata
>  4. harvest full text
>  5. index full text this way, that way, and the other way
>  6. associate the result of Step #5 with the result of Step #3
>  7. present the result
> 
> Hmmm... Interesting, and again, thank you.
> 
> --
> Eric Morgan
