Monolith to Microservice: Architecture Behind V2 Webhooks

Paulius Jakimavičius
Teamwork Engine Room
Sep 11, 2017


We’ve recently released an update to the Teamwork Projects webhooks feature, and I was asked to give a tech talk on the topic, a recording of which can be found below. This article is a rough summary of the talk, with further elaboration on the thought process behind some of our decisions. In short:

  • We have a microservice written in Go.
  • We now support multiple webhooks per event.
  • We send full datasets in a format of your choice (JSON, XML or Form).
  • We offer better security and integrity with sha256 checksums.
  • We have more frequent retry intervals.

What are Webhooks?

The basic idea of webhooks is simple: it’s an HTTP POST call triggered when some event occurs. You can think of it like a JavaScript callback, except it’s an HTTP POST that can be sent anywhere, to anyone, and is not limited to a single codebase or domain.

In the case of Teamwork Projects, we have a set list of possible events you can subscribe to, denoted in OBJECT.ACTION style, so for example TASK.CREATED or MESSAGE.DELETED.
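
To make the naming scheme concrete, here is a minimal Go sketch of how a receiver might split an event name into its object and action parts. The function name splitEvent is made up for illustration and is not part of the Projects API.

```go
package main

import (
	"fmt"
	"strings"
)

// splitEvent breaks an event name such as "TASK.CREATED" into its object
// and action parts. It reports false when the name does not follow the
// OBJECT.ACTION style. (Hypothetical helper, for illustration only.)
func splitEvent(name string) (object, action string, ok bool) {
	object, action, ok = strings.Cut(name, ".")
	if !ok || object == "" || action == "" {
		return "", "", false
	}
	return object, action, true
}

func main() {
	for _, name := range []string{"TASK.CREATED", "MESSAGE.DELETED", "bogus"} {
		object, action, ok := splitEvent(name)
		fmt.Printf("%q -> object=%q action=%q ok=%v\n", name, object, action, ok)
	}
}
```

A receiver could use the two parts to route, say, all TASK events to one handler regardless of the action.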

Previous Approach (v1)

Our original approach was no more complicated than the concept of webhooks itself: when any of the supported actions is taken in Projects, a “fireWebhook” function is called, which checks whether webhooks are enabled and whether a webhook is set up for that event. If so, we make a cfhttp call to the relevant URL with the item ID, item type and the action taken, all encoded as form data.
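
To give a feel for how small the v1 payload was, here is a rough Go sketch of a form-encoded body carrying those three values. The real v1 code is ColdFusion, and the field names itemId, itemType and action are assumptions made for this example, not the actual v1 field names.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// v1Body form-encodes the three values a v1 webhook carried.
// The field names are hypothetical; only the general shape
// (a handful of form fields, no object data) follows the article.
func v1Body(itemType, action string, itemID int) string {
	form := url.Values{}
	form.Set("itemId", strconv.Itoa(itemID))
	form.Set("itemType", itemType)
	form.Set("action", action)
	return form.Encode() // Encode sorts the fields by key
}

func main() {
	fmt.Println(v1Body("TASK", "CREATED", 123))
	// prints: action=CREATED&itemId=123&itemType=TASK
}
```

Anything beyond these identifiers, such as the task name or assignee, has to be fetched by the receiver with a separate API call.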

Now there are a few limitations with this approach which showed up as the number of webhook users increased over time:

  1. Only one webhook per event: although it made sense as a sort of anti-spam measure initially, this resulted in a few inconveniences for our customers. For example, suppose a user sets up a TASK.CREATED webhook that notifies a custom service, which adds the task ID to their internal database. They could not later set up, say, a Zapier zap using the TASK.CREATED event, as it is already “in use”.
  2. Limited dataset: sending only the event, object ID and user/account ID might be a great way to shave off a few bytes of data, but in most cases such limited information forces customers to make additional API calls to get the object data.
  3. Long retry intervals: if a webhook fails, we try it again only an hour later, and after 3 unsuccessful attempts in total the webhook is deactivated. This means that if there was a temporary issue with the receiving service, you’d have to wait a whole hour for your notification to come in. Hundreds of newer events could have gone through successfully in that time, so the order can get quite confusing on the receiving end.

Cue Webhooks v2

So the whole idea of the webhooks rework was to resolve these issues without disturbing the existing implementation, hence webhooks v2. The plan was to trigger a RabbitMQ event whenever v1 webhooks would be fired, and have a separate service listen for these events and make the relevant HTTP POST calls as needed.
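
The shape of that separate service can be sketched in a few lines of Go. This is not the actual service code: a Go channel stands in for the RabbitMQ consumer, the send function stands in for the HTTP POST, and the names Event and deliverAll are invented for the example.

```go
package main

import "fmt"

// Event stands in for a message consumed from RabbitMQ.
type Event struct {
	Name    string // e.g. "TASK.CREATED"
	Payload []byte
}

// deliverAll drains the events channel and calls send once for every URL
// subscribed to the event. It returns the number of successful deliveries.
func deliverAll(events <-chan Event, subs map[string][]string, send func(url string, body []byte) error) int {
	delivered := 0
	for ev := range events {
		for _, u := range subs[ev.Name] {
			if err := send(u, ev.Payload); err != nil {
				continue // a real service would schedule a retry here
			}
			delivered++
		}
	}
	return delivered
}

func main() {
	events := make(chan Event, 2)
	events <- Event{Name: "TASK.CREATED", Payload: []byte(`{"id":1}`)}
	events <- Event{Name: "MESSAGE.DELETED", Payload: []byte(`{"id":2}`)}
	close(events)

	// Two subscribers for TASK.CREATED, none for MESSAGE.DELETED.
	subs := map[string][]string{
		"TASK.CREATED": {"https://example.com/a", "https://example.com/b"},
	}
	// send is a stub; a real service would make an HTTP POST here.
	send := func(url string, body []byte) error {
		fmt.Printf("POST %s %s\n", url, body)
		return nil
	}
	fmt.Println("delivered:", deliverAll(events, subs, send))
}
```

Keeping the delivery loop separate from the code that raises the event is what lets the monolith stay untouched: it only has to publish, and everything slow or failure-prone happens in the consumer.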

Of course, there is no point reinventing the wheel, so a lot of the code has been “borrowed” from Teamwork Desk’s webhook implementation, and by “borrowed” I mean I copy-pasted their code base and wrangled it into shape to work with Projects. Right out of the box, their implementation already solved half of our issues and brought extras to the table:

  1. We are sending a whole data object as part of the webhook payload (with the exception of DELETED events), so in most cases no further API calls are needed. Furthermore, we can offer the payload not just as form data but also in JSON and XML formats.
  2. Failed webhooks are retried 3 times with increasing backoff: the first retry is after 1 minute, the next after 5 minutes and the last 10 minutes later. This is implemented using RabbitMQ queues with different TTLs that send the events to specific dead-letter queues.
  3. Improved security/integrity — we generate sha256 checksums using user-provided tokens and include them in the headers, meaning they can verify the integrity of the payload on their end.
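
As an illustration of the checksum idea, here is a minimal Go sketch of signing a payload with a shared token and verifying it on the receiving end. The article only says that sha256 checksums are generated from user-provided tokens, so the HMAC construction and the hex encoding used here are assumptions, as are the function names sign and verify.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns a hex-encoded HMAC-SHA256 of payload keyed with token.
// (Assumed scheme; the article does not spell out the exact construction.)
func sign(token, payload []byte) string {
	mac := hmac.New(sha256.New, token)
	mac.Write(payload)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the checksum and compares it in constant time.
func verify(token, payload []byte, signature string) bool {
	want, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, token)
	mac.Write(payload)
	return hmac.Equal(mac.Sum(nil), want)
}

func main() {
	token := []byte("user-provided-token")
	payload := []byte(`{"event":"TASK.CREATED"}`)

	sig := sign(token, payload)
	fmt.Println("untouched payload valid:", verify(token, payload, sig))
	fmt.Println("tampered payload valid: ", verify(token, []byte(`{"event":"TASK.DELETED"}`), sig))
}
```

The receiver would read the checksum from the request headers and run the equivalent of verify over the raw request body before trusting it.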

Aside from the Go service, not many changes were needed on the Projects side. We already had RabbitMQ set up that triggered on all webhook-related events so it was just a matter of sending a few extra headers to get it working.

Now we’re still left with the one-webhook-per-event problem, which I first had to resolve on the database side by dropping an extra unique index, and then by removing the limitation in the Projects ColdFusion codebase. We still have a limit of one webhook per event per URL, but that can easily be removed in the future. The great thing about this change is that it does not just affect v2 webhooks: users who are still using v1 webhooks can also create new webhooks for the same event, in whichever version they choose.
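
The remaining rule, one webhook per event per URL, can be illustrated with a small in-memory Go sketch. In Projects the rule lives in the database and the ColdFusion code, so the registry type and its add method below are only a stand-in invented for this example.

```go
package main

import (
	"errors"
	"fmt"
)

var errDuplicate = errors.New("a webhook already exists for this event and URL")

// registry maps an event name to the set of URLs subscribed to it.
// It is a hypothetical stand-in for the webhooks table.
type registry map[string]map[string]bool

// add registers url for event, rejecting an exact (event, URL) duplicate
// but allowing any number of different URLs per event.
func (r registry) add(event, url string) error {
	if r[event][url] {
		return errDuplicate
	}
	if r[event] == nil {
		r[event] = map[string]bool{}
	}
	r[event][url] = true
	return nil
}

func main() {
	r := registry{}
	fmt.Println(r.add("TASK.CREATED", "https://example.com/internal")) // <nil>
	fmt.Println(r.add("TASK.CREATED", "https://example.com/zapier"))   // <nil>
	fmt.Println(r.add("TASK.CREATED", "https://example.com/zapier"))   // duplicate error
}
```

Under the old rule the second call would already have failed, which is exactly the Zapier scenario described earlier.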

Lastly, Desk had a nice feature of being able to redeliver a specific webhook delivery, and it didn’t feel like the project would be complete if I didn’t implement it on the Projects side. Again, this involved triggering a RabbitMQ event marked as a “redelivery” and sent off to the retries queue.
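
Since both retries and redeliveries pass through the retry queues, here is a rough Go sketch of how the 1, 5 and 10 minute schedule could map onto RabbitMQ delay queues. The x-message-ttl and x-dead-letter-exchange keys are standard RabbitMQ queue arguments, but the function names and the exchange name are assumptions, and the sketch treats each delay as the wait before that particular retry.

```go
package main

import (
	"fmt"
	"time"
)

// retryDelays follows the schedule from the article: 1, 5 and 10 minutes.
var retryDelays = []time.Duration{time.Minute, 5 * time.Minute, 10 * time.Minute}

// nextDelay returns the wait before the given retry (1-based) and false
// once all retries have been used up.
func nextDelay(retry int) (time.Duration, bool) {
	if retry < 1 || retry > len(retryDelays) {
		return 0, false
	}
	return retryDelays[retry-1], true
}

// delayQueueArgs builds the arguments for a delay queue: messages sit in
// the queue until the TTL (in milliseconds) expires and are then
// dead-lettered to the given exchange, where the service picks them up.
func delayQueueArgs(ttl time.Duration, deadLetterExchange string) map[string]interface{} {
	return map[string]interface{}{
		"x-message-ttl":          ttl.Milliseconds(),
		"x-dead-letter-exchange": deadLetterExchange,
	}
}

func main() {
	for retry := 1; retry <= 4; retry++ {
		delay, ok := nextDelay(retry)
		if !ok {
			fmt.Printf("retry %d: no retries left\n", retry)
			continue
		}
		// "webhooks.deliver" is a hypothetical exchange name.
		fmt.Printf("retry %d: wait %v, args %v\n", retry, delay, delayQueueArgs(delay, "webhooks.deliver"))
	}
}
```

With one queue per delay, nothing in the service has to sleep or keep timers: a failed delivery is simply published to the queue matching its retry count, and RabbitMQ hands it back when the TTL runs out.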

In terms of deployment, once again we copied the Desk implementation: once something is pushed or merged to master, Codeship tests are run and, if they pass, an image is uploaded to Docker Hub and deployed to Amazon ECS with a new task definition on both EU and US servers. It’s been a smooth process so far. In fact, the service was running for 2 months in both regions before v2 webhooks were even released; it’s just that no one (apart from beta testers) could create v2 webhooks yet.

Possible Improvements & Future Plans

As for the future, there are a few possible improvements/changes:

  • Per-project webhooks: an option to only fire a webhook if the event is from within a specific project.
  • Different payload structures: for example, specific formats that 3rd party APIs expect (Microsoft, Google, etc.), as well as an API-consistent format once we release a new API version.
  • Sending more metadata in AMQP events: we already send some settings like “webhookEnabled”, which is cached on the Projects side and saves us an extra query on the Go side, but we could definitely reduce the number of queries and joins in the webhooks service if we sent more data, such as project and installation IDs.
  • Endpoint-based structure: we could restructure Projects webhooks to be even more like Desk by revolving them around the endpoint URL rather than the event, so the user enters a URL and then chooses the relevant events (and projects). This was the main difference between the Projects and Desk implementations, but it was skipped because of the major database changes needed, which would break backwards compatibility with v1 webhooks.
  • Deprecation of v1 webhooks — although this will be a slow and gradual process, we will eventually disable v1 webhooks. This will most likely start with any new accounts being limited to v2 and existing accounts only being able to add v2 but still edit v1 webhooks. This will make sure v1 webhooks are phased out smoothly and new users are discouraged from using the legacy option.

Resources

If you have any questions, ask in the comments below or feel free to get in touch via social media. :)

If you like the sound of working at Teamwork, get in touch — we’re hiring.
