Asset Scraping


By default, Talk scrapes the metadata of each Asset when it is loaded. This gives newsrooms a simple way to integrate their CMS with Talk. The scraper reads the meta tags listed below from the target page to extract each property.

Asset scraping is performed by the scraper job, which is enabled by default when you launch Talk. If your production site is behind a paywall or otherwise prevents scraping, you might need to configure a TALK_SCRAPER_PROXY_URL or custom TALK_SCRAPER_HEADERS.
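Before setting either variable, it can help to confirm that a given proxy and header combination actually reaches the article page. The following is a minimal sketch for that check, not part of Talk: the function names, the proxy address, and the header values are hypothetical, and the exact value format Talk expects for TALK_SCRAPER_PROXY_URL and TALK_SCRAPER_HEADERS is not described on this page, so the sketch simply takes a proxy URL string and a dict of headers.

```python
import urllib.request


def build_scrape_request(url, proxy_url=None, headers=None):
    """Return an (opener, request) pair for fetching `url`.

    `proxy_url` and `headers` are optional; both are illustrative inputs,
    not values read from Talk's configuration.
    """
    handlers = []
    if proxy_url:
        # Route both plain HTTP and HTTPS traffic through the same proxy.
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
        )
    opener = urllib.request.build_opener(*handlers)
    request = urllib.request.Request(url, headers=headers or {})
    return opener, request


def fetch_status(url, proxy_url=None, headers=None, timeout=10):
    """Fetch `url` and return the HTTP status code of the response."""
    opener, request = build_scrape_request(url, proxy_url, headers)
    with opener.open(request, timeout=timeout) as response:
        return response.status


# Example call (hypothetical proxy and token, requires network access):
# fetch_status(
#     "https://example.com/article",
#     proxy_url="http://proxy.internal.example:3128",
#     headers={"Authorization": "Bearer <token>"},
# )
```

If the call fails or returns an error status, the scraper job would likely hit the same obstacle with those settings.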

Asset Property      Selector
title               See metascraper-title
description         See metascraper-description
image               See metascraper-image
author              See metascraper-author
publication_date    See metascraper-date
modified_date       meta[property="article:modified"]
section             meta[property="article:section"]
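To illustrate what the two explicit selectors above match, here is a minimal sketch that extracts modified_date and section from a page's meta tags using only the Python standard library. It is not Talk's scraper. The class and function names and the sample page are hypothetical, and reading the tag's content attribute and returning an empty string for a missing tag are assumptions made for the example.

```python
from html.parser import HTMLParser

# The two explicit selectors from the table, reduced to the `property`
# value each one matches.
META_PROPERTIES = {
    "modified_date": "article:modified",
    "section": "article:section",
}


class MetaPropertyParser(HTMLParser):
    """Records the content of the first <meta property="..."> tag per property."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property")
        content = attributes.get("content")
        # Keep only the first occurrence of each property.
        if prop and content is not None and prop not in self.found:
            self.found[prop] = content


def extract_asset_properties(html):
    """Return modified_date and section, or "" where the tag is absent (assumed behavior)."""
    parser = MetaPropertyParser()
    parser.feed(html)
    return {name: parser.found.get(prop, "") for name, prop in META_PROPERTIES.items()}


# Hypothetical page head carrying both tags.
SAMPLE_PAGE = """
<html><head>
  <meta property="article:modified" content="2018-10-30T12:00:00Z">
  <meta property="article:section" content="Technology">
</head><body></body></html>
"""

print(extract_asset_properties(SAMPLE_PAGE))
# {'modified_date': '2018-10-30T12:00:00Z', 'section': 'Technology'}
```

A page that omits these tags yields empty values for both properties in this sketch.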

You can use the ./bin/cli assets debug <url> command to print the scraped metadata from that URL. For example:

 $ ./bin/cli assets debug https://www.washingtonpost.com/technology/2018/10/30/apple-event-october-ipad-pro-macbook-air/
┌──────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Property         │ Value                                                                                                                                                                            │
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ title            │ Apple redesigns the iPad Pro, breathes new life in the MacBook Air                                                                                                               │
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ description      │ Apple is unveiling new iPads and MacBooks at an event in New York starting at 10 a.m. Fowler is there and will report in with the news and hands-on analysis throughout the day. │
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ image            │ https://www.washingtonpost.com/resizer/JAwNQE2alL2JjiWrbXeJ46wZHqA=/1484x0/arc-anglerfish-washpost-prod-washpost.s3.amazonaws.com/public/G5TWBFW4LAI6RC5MX7QB7TODUY.jpg          │
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ author           │ Geoffrey A. Fowler                                                                                                                                                               │
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ publication_date │ 2018-10-30T10:40:00.000Z                                                                                                                                                         │
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ modified_date    │                                                                                                                                                                                  │
├──────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ section          │                                                                                                                                                                                  │
└──────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

You can use the ./bin/cli assets refresh [age] command to trigger scraping, or to rescrape assets where the scraper job was unsuccessful.