New API versions released: Blog Search API v3 and Blog LiveFeed API v5

We have released new versions of our Blog APIs: Blog Search API v3 and Blog LiveFeed API v5.

Common improvements

Although Search and LiveFeed have always delivered the same blog posts, one by exposing a search index and the other a “firehose”, there have been slight differences in the data they return. With these new versions we have tried to make the APIs more similar and easier to use.

LiveFeed API v5 is now a plain HTTP GET API, just like Search API. No more SOAP.

The post element now includes the same tags for both APIs. It also contains more information about the blog post than the previous API versions did. The new fields added to post are:

  • id - our unique ID of the blog post (new only for Search v3)
  • blogId - our unique ID of the blog
  • author - the author of the blog post
  • locationCode - the location of the blog
  • inlinksCount - number of links found in other blog posts (only posts that are indexed by Twingly)
  • reindexedAt - timestamp when the post last was changed in our database/index
  • links - all links from the blog post to other resources
  • tags - blog post tags (may also be referred to as categories)

We have also prepared fields for future data. They are not populated yet, but we will be able to start delivering this data later without breaking any implementations:

  • images - image URLs from the posts
  • coordinates - geographical coordinates from the blog post (only a few posts have this information)
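
As an illustration of how a client might model the fields listed above, here is a minimal sketch. The class name `Post`, the snake_case attribute names and all Python types are assumptions made for this example; only the field names in the comments come from the lists above. The two prepared fields default to "empty" so that code written today keeps working when they start being populated.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Post:
    """Hypothetical client-side container for the new post fields.

    All types are assumptions; the API documentation is the authority.
    """
    id: str
    blog_id: str            # blogId
    author: str
    location_code: str      # locationCode
    inlinks_count: int      # inlinksCount
    reindexed_at: str       # reindexedAt, kept as the raw timestamp string
    links: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    # Prepared but currently unpopulated fields: default to "empty".
    images: List[str] = field(default_factory=list)
    coordinates: Optional[Tuple[float, float]] = None


# Made-up values, for demonstration only.
post = Post(id="1", blog_id="2", author="Jane", location_code="se",
            inlinks_count=0, reindexed_at="2017-01-01T00:00:00Z")
```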

Blog LiveFeed API

In addition to the common improvements, the following is now true for LiveFeed API:

  • Instant access to 30 days of data, no delays to accumulate data for backfill
  • Prepared nextTimestamp, no need to calculate the next timestamp
  • Impossible to “get stuck” if there are data ingestion issues
  • Our Ruby API client has gained support for Blog LiveFeed v5
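
The prepared nextTimestamp makes a polling loop straightforward: each response tells you where the next request should start. The sketch below shows the idea only. The `fetch` callable, the dictionary keys and the timestamp values are hypothetical stand-ins, not the actual response format; see the API documentation for the real request and response shapes.

```python
from typing import Callable, Dict, List


def poll_livefeed(fetch: Callable[[str], Dict], start: str,
                  max_requests: int) -> List[Dict]:
    """Collect posts by following nextTimestamp from response to response.

    `fetch` is a hypothetical function that performs one HTTP GET and returns
    a dict with the keys "posts" and "nextTimestamp" (an assumed shape).
    """
    posts: List[Dict] = []
    timestamp = start
    for _ in range(max_requests):
        response = fetch(timestamp)
        posts.extend(response["posts"])
        # Use the server-provided timestamp instead of calculating one locally.
        timestamp = response["nextTimestamp"]
    return posts


def fake_fetch(timestamp: str) -> Dict:
    """Stand-in for the real HTTP call, for demonstration only."""
    pages = {
        "t0": {"posts": [{"id": "a"}], "nextTimestamp": "t1"},
        "t1": {"posts": [{"id": "b"}], "nextTimestamp": "t2"},
        "t2": {"posts": [], "nextTimestamp": "t2"},
    }
    return pages[timestamp]


collected = poll_livefeed(fake_fetch, "t0", max_requests=3)
```

A real client would also sleep between requests and persist the latest timestamp so it can resume after a restart.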

Blog Search API

Besides the new data that Search API is now delivering, we’ve also made the following improvements.

The .NET client hasn’t been updated yet. If you need a .NET client, don’t hesitate to say so in the issue. We also gladly accept pull requests.

Migration

For each API, we have documented the changes from the previous version, and listed the steps required to move to the new version. Please check it out:

These new versions are available to all existing and new customers. Please upgrade to take advantage of all the improvements. If you need any assistance, don’t hesitate to contact us!

BlogRank changes

We have made a few changes to our BlogRank algorithm in order to make it easier to distinguish between the blogs at the top of the BlogRank scale.

The previous version of the algorithm had the trait of slowly collecting more and more blogs at the top of the BlogRank scale as our blog index grew larger.

This has now been adjusted so that only the few most popular blogs have BlogRank 10. For blogs with a low BlogRank this change won’t be noticeable, but a significant part of the high-ranking blogs will get a lower BlogRank, which can be seen in the graphs below:

Distribution of Swedish blogs with BlogRank 2-6 before and after the change.

Distribution of Swedish blogs with BlogRank 6-10 before and after the change.

For the exact numbers on how much the BlogRank distribution has changed for all Swedish blogs, see the table below:

BlogRank   Number of blogs before   Number of blogs after
1          101,380                  104,262
2          4,295                    3,995
3          2,081                    1,435
4          1,170                    609
5          694                      279
6          429                      129
7          299                      54
8          177                      19
9          141                      13
10         137                      8
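
To make the shift easier to see, the numbers in the table can be summed up with a few lines of Python. This is just arithmetic on the figures above: the total number of Swedish blogs is the same before and after (110,803), while the number of blogs with BlogRank 7 or higher drops from 754 to 94. The variable names are our own.

```python
# Number of Swedish blogs per BlogRank, copied from the table above.
before = {1: 101380, 2: 4295, 3: 2081, 4: 1170, 5: 694,
          6: 429, 7: 299, 8: 177, 9: 141, 10: 137}
after = {1: 104262, 2: 3995, 3: 1435, 4: 609, 5: 279,
         6: 129, 7: 54, 8: 19, 9: 13, 10: 8}

total_before = sum(before.values())
total_after = sum(after.values())

# Blogs at the top of the scale (BlogRank 7-10).
top_before = sum(n for rank, n in before.items() if rank >= 7)
top_after = sum(n for rank, n in after.items() if rank >= 7)

print(f"total: {total_before} -> {total_after}")
print(f"BlogRank 7-10: {top_before} -> {top_after}")
```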

For details about BlogRank, authority and top authority see the ranking documentation.

API improvements

In both of our APIs, LiveFeed and Search, we truncate the summary field of certain posts for stability and performance reasons. We recently discovered two problems with the truncation. First, it could leave undesired HTML entities in the document under certain circumstances. Second, it could cut off the last word when truncation occurred.

On 2016-10-11 we rolled out a fix which remedies both of these flaws. You should not find HTML entities in the summary field anymore (please tell us if you still find any!) and the last word of a truncated summary should be left intact. Note that the vast majority of posts are not, and will never be, truncated.
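
For readers who truncate text on their own side, the two properties described above (no leftover HTML entities, no word cut in half) can be sketched as follows. This is not our server-side implementation, only an illustration of the idea; the function name `truncate_summary` and the `limit` parameter are made up for the example.

```python
import html


def truncate_summary(summary: str, limit: int) -> str:
    """Decode HTML entities, then cut at a word boundary within `limit` chars."""
    # Decode first, so truncation can never split an entity such as "&amp;".
    text = html.unescape(summary)
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # If the cut landed inside a word, back up to the previous space.
    if not text[limit].isspace() and " " in cut:
        cut = cut[:cut.rfind(" ")]
    return cut.rstrip()


short = truncate_summary("Fish &amp; chips are great", 14)
```

With a limit of 14 characters the cut would land inside the word "are", so the sketch backs up and returns "Fish & chips".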

Another recent improvement is also related to undesired HTML entities. The tags field, only present in Search API, should not contain HTML as of 2016-09-08.

TLS 1.1 and TLS 1.2 fix

Due to a misconfiguration in our retrieval system we have not been able to ingest feeds using only the TLS 1.1 or TLS 1.2 encryption protocols. Since most encrypted sites still serve TLS 1.0, or even SSL 3.0, the misconfiguration has most likely not been noticeable in our ingestion statistics, but we expect to see more sites deprecating older protocols in favor of TLS 1.2.

The configuration has been updated and we are all feeling a bit more secure.
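
If you want to check which protocol version your own feed server negotiates, a small Python sketch along these lines can help. It is an illustration using Python's standard `ssl` module, not a description of our retrieval system; the function names are our own, and the host you pass in is up to you.

```python
import socket
import ssl


def make_tls12_context() -> ssl.SSLContext:
    """Client context that refuses anything older than TLS 1.2."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def negotiated_tls_version(host: str, port: int = 443) -> str:
    """Connect to host:port and return the negotiated protocol, e.g. 'TLSv1.2'."""
    context = make_tls12_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls_sock:
            return tls_sock.version()


context = make_tls12_context()
```

Calling `negotiated_tls_version` with your feed's host name performs a real network connection and raises an `ssl.SSLError` if the server only offers older protocols.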

.NET client for Search API

We have published a .NET client for the Search API. Grab it from NuGet.