Design Real Time Notification System (From Alex Xu)
date
Mar 28, 2024
slug
design-notification-system
status
Published
tags
System Design
Technical Interview
type
Post
URL
summary
Understand Requirements, High Level Design, Contact info gathering flow, Notification sending/receiving flow, Components, Overall Architecture, Design Deep dive, Reliability, Additional components and considerations, Finished Design, References, Ref
Understand Requirements
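- Supported notification types: push notification, SMS message, and email.
- Soft real-time system: users should receive notifications as soon as possible, but a slight delay is acceptable when the system is under high load.
- Notifications can be triggered by client applications. They can also be scheduled on the server-side.
- Scale: 10 million mobile push notifications, 1 million SMS messages, and 5 million emails per day.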
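As a rough sizing exercise, these volumes can be converted into average send rates. The sketch below assumes the figures are daily totals and that traffic is spread evenly over 24 hours. Real traffic is bursty, so peak rates will be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Volumes from the requirements, assumed to be per-day totals.
DAILY_VOLUME = {
    "push": 10_000_000,
    "sms": 1_000_000,
    "email": 5_000_000,
}


def average_qps(daily_count: int) -> float:
    """Average notifications per second if traffic were perfectly even."""
    return daily_count / SECONDS_PER_DAY


for channel, count in DAILY_VOLUME.items():
    print(f"{channel}: ~{average_qps(count):.0f}/s")
print(f"total: ~{average_qps(sum(DAILY_VOLUME.values())):.0f}/s")
```

Under these assumptions the averages come out to roughly 116 push notifications, 12 SMS messages, and 58 emails per second, or about 185 notifications per second in total.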
High Level Design
Contact info gathering flow
To send notifications, we need to gather mobile device tokens, phone numbers, or email addresses. When a user installs our app or signs up for the first time, API servers collect user contact info and store it in the database.
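One possible way to store this contact info is sketched below with an in-memory SQLite database. The table names, column names, and helper functions are assumptions made for illustration, not a schema prescribed by the design. The point is that one user can own several devices, so device tokens live in their own table.

```python
import sqlite3

# Hypothetical schema: email and phone number live with the user,
# device tokens live in a separate table (one user, many devices).
SCHEMA = """
CREATE TABLE users (
    user_id      INTEGER PRIMARY KEY,
    email        TEXT,
    phone_number TEXT
);
CREATE TABLE devices (
    device_id    INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL REFERENCES users(user_id),
    device_token TEXT NOT NULL UNIQUE,
    platform     TEXT NOT NULL
);
"""


def register_user(conn, email, phone_number):
    """Called on sign-up; returns the new user's id."""
    cur = conn.execute(
        "INSERT INTO users (email, phone_number) VALUES (?, ?)",
        (email, phone_number),
    )
    return cur.lastrowid


def register_device(conn, user_id, device_token, platform):
    """Called on app install; a token that is already stored is ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO devices (user_id, device_token, platform) "
        "VALUES (?, ?, ?)",
        (user_id, device_token, platform),
    )


def device_tokens(conn, user_id):
    """All device tokens for a user, oldest first."""
    rows = conn.execute(
        "SELECT device_token FROM devices WHERE user_id = ? ORDER BY device_id",
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```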

Notification sending/receiving flow

Components
- Service 1 to N:
A service can be a micro-service, a cron job, or a distributed system that triggers notification sending events. For example, a billing service sends emails to remind customers of their due payment or a shopping website tells customers that their packages will be delivered tomorrow via SMS messages.
- Notification system:
The notification system is the centerpiece of sending/receiving notifications. To start simple, only one notification server is used. It provides APIs for services 1 to N and builds notification payloads for third-party services.
- Third-party services:
Third-party services are responsible for delivering notifications to users. While integrating with third-party services, we need to pay extra attention to extensibility. Good extensibility means a flexible system in which a third-party service can easily be plugged in or unplugged. Another important consideration is that a third-party service might be unavailable in new markets or in the future. For instance, FCM is unavailable in China. Thus, alternative third-party services such as Jpush, PushY, etc. are used there.
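The extensibility point can be sketched as a small provider registry. Everything here is hypothetical: the `Provider` interface, the `ProviderRegistry` class, and the `"default"` region key are illustrative names, and `RecordingProvider` is a stand-in that only records calls instead of talking to a real vendor.

```python
from typing import Protocol


class Provider(Protocol):
    """Anything that can deliver a payload to a recipient."""

    def send(self, recipient: str, payload: dict) -> bool: ...


class RecordingProvider:
    """Stand-in for a real vendor client; it only records what it was asked to send."""

    def __init__(self, name: str):
        self.name = name
        self.sent = []

    def send(self, recipient: str, payload: dict) -> bool:
        self.sent.append((recipient, payload))
        return True


class ProviderRegistry:
    """Maps (channel, region) to a provider so vendors can be plugged in or out."""

    def __init__(self):
        self._providers = {}

    def register(self, channel: str, region: str, provider: Provider) -> None:
        self._providers[(channel, region)] = provider

    def unregister(self, channel: str, region: str) -> None:
        self._providers.pop((channel, region), None)

    def resolve(self, channel: str, region: str) -> Provider:
        # Prefer a region-specific provider, then fall back to the default one.
        for key in ((channel, region), (channel, "default")):
            if key in self._providers:
                return self._providers[key]
        raise LookupError(f"no provider for {channel} in {region}")


registry = ProviderRegistry()
fcm = RecordingProvider("fcm")
jpush = RecordingProvider("jpush")
registry.register("push", "default", fcm)
registry.register("push", "cn", jpush)
```

With this shape, swapping the vendor for one market is a single `register` or `unregister` call, and the rest of the system only ever calls `resolve(...).send(...)`.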
Overall Architecture

- Notification Service:
- Provide APIs for services to send notifications
- Carry out validations
- Query the DB or cache to fetch the data needed to render a notification (template, image, etc.)
- Put notifications into MQs for processing. Each fanout service can have a high-priority and a low-priority queue
- Cache:
- Caches user info, device info, notification templates, etc.
- DB:
- Stores the same data as the cache, plus extra metadata
- MQ:
- Removes the dependency between the notification servers and the fanout services.
- Acts as a buffer when high volumes of notifications are to be sent out.
- Each notification type is assigned a dedicated queue, served by one or more service providers. Failed notifications are put back into the queue for retry.
- Workers
- Pull notification events from the queues and send them to the corresponding vendors
- Overall flow
- A service calls APIs provided by notification servers to send notifications.
- Notification servers fetch metadata such as user info, device tokens, and notification settings from the cache or database.
- A notification event is sent to the corresponding queue for processing. For instance, an iOS push notification event is sent to the iOS PN queue.
- Workers pull notification events from message queues.
- Workers send notifications to third-party services.
- Third-party services send notifications to user devices.
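The flow above can be sketched end to end with in-memory queues. This is a minimal single-process illustration, not a production design: `USER_METADATA` stands in for the cache and DB, `QUEUES` for the message queues, and `submit` and `drain` are hypothetical names for the notification server's API and a worker loop.

```python
import queue
from dataclasses import dataclass


@dataclass
class NotificationEvent:
    event_id: str
    user_id: int
    channel: str          # e.g. "ios_push" or "email"
    body: str
    recipient: str = ""   # filled in from user metadata


# Stand-in for the cache/DB lookup of device tokens and addresses.
USER_METADATA = {42: {"ios_push": "device-token-abc", "email": "a@example.com"}}

# One dedicated queue per notification type.
QUEUES = {"ios_push": queue.Queue(), "email": queue.Queue()}


def submit(event: NotificationEvent) -> None:
    """Notification server: validate, enrich with metadata, then enqueue."""
    if event.channel not in QUEUES:
        raise ValueError(f"unsupported channel: {event.channel}")
    contact = USER_METADATA.get(event.user_id, {}).get(event.channel)
    if contact is None:
        raise LookupError(f"no {event.channel} contact for user {event.user_id}")
    event.recipient = contact
    QUEUES[event.channel].put(event)


def drain(channel: str, send) -> int:
    """Worker: pull events until the queue is empty and hand each to the vendor."""
    sent = 0
    while True:
        try:
            event = QUEUES[channel].get_nowait()
        except queue.Empty:
            return sent
        send(event.recipient, event.body)
        sent += 1
```

Here `send` is whatever callable wraps the third-party service for that channel, so the notification server never talks to a vendor directly.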
Design Deep dive
In the high-level design, we discussed different types of notifications, contact info gathering flow, and notification sending/receiving flow. We will explore the following in deep dive:
- Reliability.
- Additional components and considerations: notification template, notification settings, rate limiting, retry mechanism, security in push notifications, monitoring of queued notifications, and event tracking.
- Updated design.
Reliability
- How to prevent data loss
- Use SQS: if a worker does not delete a message after processing it, the message becomes visible again once its visibility timeout expires and is redelivered
- Or the notification system persists notification data in a database and implements a retry mechanism
- How to avoid duplicate notifications
- Retries mean that a notification can occasionally be delivered more than once. To reduce duplication, we introduce a dedupe mechanism and handle each failure case carefully.
- Here is a simple dedupe logic: When a notification event first arrives, we check whether it has been seen before by checking the event ID. If it has been seen before, it is discarded. Otherwise, we send out the notification.
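The dedupe check can be sketched as a bounded set of recently seen event IDs. The `Deduplicator` class and its `max_ids` bound are assumptions for illustration. With several workers, the seen-ID set would need to live in a shared store rather than in one process's memory.

```python
from collections import OrderedDict


class Deduplicator:
    """Remembers recently seen event IDs; the oldest IDs are evicted first."""

    def __init__(self, max_ids: int = 100_000):
        self._seen = OrderedDict()
        self._max_ids = max_ids

    def first_time(self, event_id: str) -> bool:
        """Return True if the event should be sent, False if it is a duplicate."""
        if event_id in self._seen:
            return False
        self._seen[event_id] = None
        if len(self._seen) > self._max_ids:
            self._seen.popitem(last=False)  # drop the oldest remembered ID
        return True
```

A worker would call `first_time(event.event_id)` before sending and discard the event when it returns `False`. Because the memory is bounded, a very old ID can be forgotten, so this reduces duplicates rather than eliminating them.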
Additional components and considerations
We have discussed how to collect user contact info, send, and receive a notification. A
notification system is a lot more than that. Here we discuss additional components including
template reusing, notification settings, event tracking, system monitoring, rate limiting, etc.
- Notification template
- A large notification system sends out millions of notifications per day, and many of these notifications follow a similar format. Notification templates are introduced to avoid building every notification from scratch. A notification template is a preformatted notification that you turn into a unique notification by customizing parameters, styling, tracking links, etc. Here is an example template for push notifications.
BODY:
You dreamed of it. We dared it. [ITEM NAME] is back — only until [DATE].
CTA:
Order Now. Or, Save My [ITEM NAME]
The benefits of using notification templates include maintaining a consistent format, reducing
the margin of error, and saving time.
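Filling in a template like the one above can be sketched with a small placeholder substitution. The `render` function and the rule that placeholders are upper-case names in square brackets are assumptions based on the example template. The body string used here is a shortened variant of that example.

```python
import re

# Placeholders look like [ITEM NAME] or [DATE]: upper-case words in brackets.
PLACEHOLDER = re.compile(r"\[([A-Z ]+)\]")


def render(template: str, params: dict[str, str]) -> str:
    """Replace every [PLACEHOLDER] with its value; fail loudly if one is missing."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing template parameter: {name}")
        return params[name]

    return PLACEHOLDER.sub(substitute, template)


BODY = "[ITEM NAME] is back. Only until [DATE]."
CTA = "Order Now. Or, Save My [ITEM NAME]"

params = {"ITEM NAME": "Pumpkin Latte", "DATE": "Oct 31"}
print(render(BODY, params))
print(render(CTA, params))
```

Raising on a missing parameter is a deliberate choice in this sketch: a notification that ships with a literal `[DATE]` in it is usually worse than one that fails before sending.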
- Notification setting
Users generally receive way too many notifications daily and they can easily feel
overwhelmed. Thus, many websites and apps give users fine-grained control over notification
settings. This information is stored in the notification setting table, with the following fields:
user_id bigInt
channel varchar # push notification, email or SMS
opt_in boolean # opt-in to receive notification
Before any notification is sent to a user, we first check whether the user has opted in to receive this type
of notification.
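The opt-in check can be sketched against the three fields listed above. `SettingsStore` is a hypothetical in-memory stand-in for the notification setting table. Treating a missing row as "not opted in" is an assumption, and a product could reasonably choose the opposite default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NotificationSetting:
    user_id: int
    channel: str   # "push", "email", or "sms"
    opt_in: bool


class SettingsStore:
    """In-memory stand-in for the notification setting table."""

    def __init__(self, rows):
        self._opt_in = {(row.user_id, row.channel): row.opt_in for row in rows}

    def is_opted_in(self, user_id: int, channel: str, default: bool = False) -> bool:
        # A user with no stored row for this channel gets the default.
        return self._opt_in.get((user_id, channel), default)


settings = SettingsStore([
    NotificationSetting(1, "email", True),
    NotificationSetting(1, "sms", False),
])
```

The notification server would call `settings.is_opted_in(user_id, channel)` before enqueueing an event and drop the event when it returns `False`.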
- Rate limiting
To avoid overwhelming users with too many notifications, we can limit the number of
notifications a user can receive. This is important because receivers could turn off
notifications completely if we send too often.
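A per-user limit can be sketched as a sliding-window counter. The class name, the limit, and the window length are all illustrative assumptions. The clock is injectable so the behavior can be checked without waiting in real time.

```python
import time
from collections import defaultdict, deque


class UserRateLimiter:
    """Allows at most `limit` notifications per user in any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._sent = defaultdict(deque)  # user_id -> timestamps of recent sends

    def allow(self, user_id: int) -> bool:
        now = self._clock()
        sent = self._sent[user_id]
        # Forget sends that have fallen out of the window.
        while sent and now - sent[0] >= self._window:
            sent.popleft()
        if len(sent) >= self._limit:
            return False
        sent.append(now)
        return True
```

For example, `UserRateLimiter(limit=5, window_seconds=3600)` would cap each user at five notifications per hour. Whether a rejected notification is dropped or delayed is a separate product decision that this sketch leaves open.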
- Retry mechanism
When a third-party service fails to send a notification, the notification will be added to the message queue for retrying. If the problem persists, an alert will be sent out to developers.
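The requeue-then-alert behavior can be sketched as a single worker step. `MAX_ATTEMPTS`, the `(notification, attempts)` queue item shape, and the `process_once` function are assumptions for illustration. A real system would typically also wait between attempts instead of retrying immediately.

```python
import queue

MAX_ATTEMPTS = 3  # assumed cap before giving up and alerting


def process_once(q: queue.Queue, send, alert) -> str:
    """Take one (notification, attempts) item and try to send it.

    On failure the item goes back on the queue with its attempt count bumped.
    Once the cap is reached, developers are alerted instead.
    """
    notification, attempts = q.get_nowait()
    try:
        send(notification)
        return "sent"
    except Exception as exc:
        attempts += 1
        if attempts >= MAX_ATTEMPTS:
            alert(f"giving up on {notification!r} after {attempts} attempts: {exc}")
            return "failed"
        q.put((notification, attempts))
        return "requeued"
```

Carrying the attempt count with the message keeps workers stateless, so any worker can pick up the retry.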
- Security in push notifications (Auth)
For iOS or Android apps, appKey and appSecret are used to secure push notification APIs. Only authenticated or verified clients are allowed to send push notifications using our APIs.
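One way an appKey/appSecret pair can authenticate a request is with an HMAC signature over the request body. This is an assumed scheme chosen for illustration, since the text does not say how the pair is used. The `APP_SECRETS` table, the demo credentials, and both function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical server-side lookup of appKey -> appSecret.
APP_SECRETS = {"demo-app-key": b"demo-app-secret"}


def sign(app_secret: bytes, body: bytes) -> str:
    """Client side: HMAC-SHA256 of the request body, keyed with the appSecret."""
    return hmac.new(app_secret, body, hashlib.sha256).hexdigest()


def is_authenticated(app_key: str, body: bytes, signature: str) -> bool:
    """Server side: recompute the signature and compare in constant time."""
    secret = APP_SECRETS.get(app_key)
    if secret is None:
        return False
    return hmac.compare_digest(sign(secret, body), signature)
```

With this scheme the appSecret never travels with the request, and a tampered body no longer matches its signature.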
- Monitor queued notifications (Auto scaling more workers)
- A key metric to monitor is the total number of queued notifications. If the number is large, the notification events are not processed fast enough by workers. To avoid delay in the notification delivery, more workers are needed.
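The scaling decision can be sketched as a simple calculation from queue depth. The per-worker throughput, the target drain time, and the worker bounds are all assumed inputs that would have to be measured or chosen for a real system.

```python
import math


def desired_workers(
    queued: int,
    per_worker_rate: float,        # notifications one worker sends per second
    target_drain_seconds: float,   # how quickly the backlog should be cleared
    min_workers: int = 1,
    max_workers: int = 100,
) -> int:
    """Workers needed to clear `queued` notifications within the target time."""
    needed = math.ceil(queued / (per_worker_rate * target_drain_seconds))
    return max(min_workers, min(max_workers, needed))
```

For example, with 30,000 queued notifications, workers that each send 50 per second, and a 60-second target, this asks for 10 workers.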
- Event tracking
Notification metrics, such as open rate, click rate, and engagement, are important in understanding customer behaviors. The analytics service implements event tracking. Integration between the notification system and the analytics service is usually required.
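How an analytics service might derive these metrics can be sketched from a stream of tracking events. The event names (`delivered`, `open`, `click`) and the definitions used here, with each rate computed as a fraction of delivered notifications, are assumptions. Teams define these rates in different ways.

```python
def engagement_rates(events):
    """Compute open and click rates from (notification_id, event_type) pairs.

    Each notification counts at most once per event type, and only
    notifications with a 'delivered' event are included in the rates.
    """
    ids_by_type = {"delivered": set(), "open": set(), "click": set()}
    for notification_id, event_type in events:
        if event_type in ids_by_type:
            ids_by_type[event_type].add(notification_id)

    delivered = ids_by_type["delivered"]
    if not delivered:
        return {"open_rate": 0.0, "click_rate": 0.0}
    return {
        "open_rate": len(ids_by_type["open"] & delivered) / len(delivered),
        "click_rate": len(ids_by_type["click"] & delivered) / len(delivered),
    }
```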

Finished Design

References
System design interviews
You Cannot Have Exactly-Once Delivery: https://bravenewgeek.com/you-cannot-have-exactly-once-delivery/
Security in Push Notifications: https://cloud.ibm.com/docs/services/mobilepush?topic=mobile-pushnotification-security-in-push-notifications
RabbitMQ: https://bit.ly/2sotIa6
Ref



