Adventures in RSS Curation
I worked more on the RSS triage tool, abstracting both the star/save targets (currently Inkwell API, Pinboard, and Readwise) and the feed sources (currently just Inkwell API and Readwise feed items).
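The split looks roughly like this in Python. This is a hypothetical sketch of the abstraction, not the tool's actual code; all class and method names here are my own guesses:

```python
# Sketch of the feed-source / save-target split; names are illustrative.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Article:
    id: str
    url: str
    title: str


class FeedSource(Protocol):
    """Anything that can yield candidate articles (e.g. Inkwell API, Readwise)."""
    def fetch(self) -> list[Article]: ...


class SaveTarget(Protocol):
    """Anywhere a good article can be starred/saved
    (e.g. Inkwell API, Pinboard, Readwise)."""
    def save(self, article: Article) -> None: ...


class InMemoryTarget:
    """Trivial target, here only to exercise the interface."""
    def __init__(self) -> None:
        self.saved: list[Article] = []

    def save(self, article: Article) -> None:
        self.saved.append(article)
```

With the triage loop written against `FeedSource` and `SaveTarget`, adding a new reader or bookmarking service is just another small adapter class.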
Readwise is a nice addition because its API tracks your percent completed in an article, and the triage tool watches for that to hit “100” before it considers an article a good recommendation. Inkwell just had “read/unread” via whatever RSS reader, which sets things to “read” on open (or requires a manual mark-read action). There’s also some interesting signal to be mined from Readwise partial reads, so that’s on the list.
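The check itself is tiny. A minimal sketch, assuming each Readwise item comes back with a percent-complete value; the field name and the 0–100 convention are assumptions on my part, not the documented API shape:

```python
# "percent_complete" is a stand-in for whatever field Readwise actually returns.
def is_good_recommendation(item: dict) -> bool:
    """Count an article only when it was read all the way through,
    unlike plain RSS read/unread, which flips to "read" on open."""
    return item.get("percent_complete", 0) >= 100


def partial_read_signal(item: dict) -> float:
    """Partial reads (say, bailing at 40%) are weaker but nonzero signal."""
    return item.get("percent_complete", 0) / 100.0
```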
Honestly, this could just be called “The Oregonianator,” because most feeds I bother with aren’t so godawful on signal-to-noise. I might skip things because they’re not my cup of tea on another feed, but the Oregonian’s catch-all RSS seems designed to get people to swear off RSS: so much filler, junk, clickbait, and syndicated content. When I wrote to the outgoing EiC last year, she offered a half-hearted defense of it all that sounded more like a hostage statement, vaguely alluding to the idea that the O’s web operation isn’t beholden to the editorial side. It feels like it’s all premised on a 15-year-old conception of SEO that requires high-churn content. It feels familiar to me because I had to do it for a while.
Architecturally, I could reduce this to a worker on a cron job that passes each article through user rules, then passes the survivors on to beefed-up inference. Unlike RSS feeds, which carry no state, Readwise would support this approach because things it finds in RSS become articles that can be removed from your queue. The other idea I considered was creating and serving a shadow Oregonian feed that’s already been through triage.
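The cron-worker idea boils down to a two-stage filter: cheap, deterministic user rules first, then only the survivors hit the expensive inference call. A sketch, with every function name here being illustrative rather than anything that exists in the tool:

```python
# Two-stage pipeline: hard rules, then scoring on whatever survives.
def apply_user_rules(articles, rules):
    """Cheap deterministic filters, e.g. "drop syndicated content"."""
    return [a for a in articles if all(rule(a) for rule in rules)]


def triage(articles, rules, score):
    """score() stands in for the beefed-up inference step, which only
    ever sees articles that passed the user rules."""
    survivors = apply_user_rules(articles, rules)
    return [(a, score(a)) for a in survivors]
```

Ordering matters mostly for cost: the rules are free to run per article, so they should prune as much as possible before inference spends tokens on the rest.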
For now, though, after a few days of training, I’ve pivoted the tool’s primary interface to a review system: there’s enough training data that the tool’s 0.1-to-1.0 confidence scale is pretty reliable south of 0.15 and north of 0.7. So I let the app automatically remove articles scoring 0.15 or lower, and automatically promote those at 0.7 or higher. I’ll keep training and see what I can do to narrow that band. There’s also a holding queue where I can fish a mistaken deletion back out within 24 hours. The traditional triage queue remains, as well, for things that fall between those two scores.
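The routing logic is simple enough to sketch end to end. This is a minimal illustration of the thresholds and the 24-hour undo window described above, not the tool's real implementation; the data structures are assumptions:

```python
# Auto-remove at <= 0.15, auto-promote at >= 0.7, manual triage in between.
REMOVE_AT = 0.15
PROMOTE_AT = 0.70
UNDO_WINDOW = 24 * 60 * 60  # deletions stay recoverable for 24 hours


def route(article_id: str, confidence: float, now: float,
          deleted: dict, promoted: list, triage: list) -> None:
    """Send an article to one of the three queues based on its score."""
    if confidence <= REMOVE_AT:
        deleted[article_id] = now      # parked, not gone: timestamped for undo
    elif confidence >= PROMOTE_AT:
        promoted.append(article_id)
    else:
        triage.append(article_id)      # the in-between band gets manual review


def undo_delete(article_id: str, now: float,
                deleted: dict, triage: list) -> bool:
    """Fish a mistaken deletion back into the triage queue, if still in window."""
    ts = deleted.get(article_id)
    if ts is not None and now - ts <= UNDO_WINDOW:
        del deleted[article_id]
        triage.append(article_id)
        return True
    return False
```

Narrowing the band then just means nudging `REMOVE_AT` up and `PROMOTE_AT` down as the training data improves, shrinking the manual-review queue.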
It makes me want to add more sources, because I’ve got a better way to triage them.