Facebook is demoting trashy news publishers and other websites that illicitly scrape and republish content from other sources with little or no modification. Today it exclusively told TechCrunch that it will show links less prominently in the News Feed if they have a combination of this new signal about content authenticity along with either clickbait headlines or landing pages overflowing with low-quality ads. The move comes after Facebook’s surveys and in-person interviews discovered that users hate scraped content.
If ill-gotten intellectual property gets less News Feed distribution, it will receive less referral traffic, earn less ad revenue and there’ll be less incentive for crooks to steal articles, photos and videos in the first place. That could create an umbrella effect that improves content authenticity across the web.
And just in case the scraped profile data stolen from 29 million users in Facebook’s recent massive security breach ended up published online, Facebook would already have a policy in place to make links to it effectively disappear from the feed.
Here’s an example of the type of site that might be demoted by Facebook’s latest News Feed change. “Latest Nigerian News” scraped one of my recent TechCrunch articles, and surrounded it by tons of ads.
“Starting today, we’re rolling out an update so people see fewer posts that ink out to low quality sites that predominantly copy and republish content from other sites without providing unique value. We are adjusting our Publish Guidelines accordingly,” Facebook wrote in an addendum to its May 2017 post about demoting sites stuffed with crappy ads. Facebook tells me the new publisher guidelines will warn news outlets to add original content or value to reposted content or invoke the social network’s wrath.
Personally, I think the importance of transparency around these topics warrants a new blog post from Facebook as well as an update to the original post linking forward to it.
So how does Facebook determine if content is stolen? Its systems compare the main text content of a page with all other text content to find potential matches. The degree of matching is used to predict that a site stole its content. It then uses a combined classifier merging this prediction with how clickbaity a site’s headlines are plus the quality and quantity of ads on the site.