Introduction
The performance of a software application is a non-functional requirement that indicates the responsiveness of the operations it can perform. From the user's perspective, the responsiveness of an application should be high: fast response times are expected regardless of the user's location or device.
A user's satisfaction with an application is strongly influenced by its speed. For web and mobile applications, page load time is a major factor in page abandonment. This is evident when we look at metrics such as a site's bounce and conversion rates.
Bounce rate:
A bounce occurs when a user has just a single-page session on a site and leaves without visiting any of the other pages. The bounce rate is the percentage of users who bounce. As you would expect, as page load times increase, so too does the bounce rate.
Examples
of actions that result in a bounce include the user closing the browser
window/tab, clicking on a link to visit a different site, clicking the back
button to leave the site, navigating to a different site by typing in a new URL
or using a voice command, or having a session timeout occur.
Conversion rate:
The conversion rate is the percentage of site
visitors who ultimately take the desired conversion action. The desired
conversion action depends on the purpose of the site, but a few examples of
common ones include placing an order, registering for membership, downloading a
software product, or subscribing to a newsletter.
Page speed affects search rankings:
Page speed is a consideration in a site's mobile search ranking in Google search results. Currently, this criterion only affects pages with the slowest performance, but it shows the importance Google places on web page performance.
Terminology:
1. Latency
Latency is the amount of time (or delay) it takes
to send information from a source to a destination. Factors such as the type of
network hardware being utilized, the connection type, the distance that must be
traveled, and the amount of congestion on the network all affect latency.
2. Throughput
Throughput is a measure of the number of work items completed per unit of time. In the context of application logic, throughput is how much processing can be done in a given amount of time. An example of throughput in this context would be the number of transactions that can be processed per second.
3. Bandwidth
Bandwidth
is the maximum possible throughput for a particular logical or physical
communication path. Like throughput, it is typically measured in terms of a bit
rate, or the maximum number of bits that could be transferred in a given unit
of time.
4. Processing time
Processing
time is the length of time that it takes for a software system to process a
particular request, without including any time where messages are traveling
across the network (latency).
Factors that affect processing time include how the application code is written, the external software that works in conjunction with the application, and the characteristics of the hardware performing the processing.
5. Response time
Response
time is the total amount of time between the user making a particular request
and the user receiving a response to that request. For a given request,
response time is a combination of both the network latency and the processing
time.
6. Workload
Workload
represents the amount of computational processing a machine has been given to
do at a particular time. A workload uses up processor capacity, leaving less of
it available for other tasks. Some common types of workload that may be
evaluated are CPU, memory, I/O, and database workloads.
7. Utilization
Utilization
is the percentage of time that a resource is used when compared with the total
time that the resource is available for use. For example, if a CPU is busy
processing transactions for 45 seconds out of a one-minute timespan, the
utilization for that interval is 75%. Resources such as CPU, memory, and disk
should be measured for utilization in order to obtain a complete picture of an
application's performance. As utilization approaches 100%, response times rise.
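As a quick sanity check on these definitions, the response time and utilization calculations can be expressed in a few lines of Python (the numbers are made up purely for illustration):

```python
network_latency_ms = 40    # time spent traveling across the network
processing_time_ms = 110   # time the server spends handling the request

# Response time = network latency + processing time
response_time_ms = network_latency_ms + processing_time_ms
print(f"Response time: {response_time_ms} ms")  # 150 ms

# Utilization = busy time / total observed time
busy_seconds, window_seconds = 45, 60
print(f"Utilization: {busy_seconds / window_seconds:.0%}")  # 75%
```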
Systematic approach to Performance Improvements:
An iterative process that consists of the following steps can be used for performance improvement:
- Profiling the application
- Analyzing the results
- Implementing changes
- Monitoring changes
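As a minimal sketch of the first two steps, here is how the loop might start in Python using the standard library's cProfile and pstats modules; slow_task is a placeholder for real application code:

```python
import cProfile
import pstats

def slow_task():
    # Placeholder for real application work.
    return sum(i * i for i in range(1_000_000))

# Profiling the application
profiler = cProfile.Profile()
profiler.enable()
slow_task()
profiler.disable()

# Analyzing the results: show the five most expensive functions
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(5)
```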
Content Delivery Network (CDN)
Problem: On average, static resources make up about 80% of a website, so serving this static content from its origin server is time consuming and makes the website slower. The problem is aggravated when the website's visitors come from different geographic locations.
Solution:
A CDN is essentially a group of servers that are strategically placed across the globe with the purpose of accelerating the delivery of your static web content.
CDN servers cache static content like images, videos, CSS, and JavaScript files. Many CDNs also support caching HTML pages based on the request path, query strings, cookies, and request headers.
CDN users also benefit from the ability to scale up and down more easily in response to traffic spikes.
How does it work?
When a user visits a website, the CDN server closest to the user delivers the static content. Intuitively, the farther users are from CDN servers, the slower the website loads. For example, if the CDN servers are in San Francisco, users in Los Angeles will get content faster than users in Europe.
How does a CDN improve load time?
CDN workflow:
1. User A tries to get image.png by using an image URL. The URL's domain is provided by the CDN provider. The following two image URLs are samples that demonstrate what image URLs look like on Amazon and Akamai CDNs:
• https://mysite.cloudfront.net/logo.jpg
• https://mysite.akamai.com/image-manager/img/logo.jpg
2. If the CDN server does not have image.png in the cache, the CDN server requests the file from the origin, which can be a web server or online storage like Amazon S3.
3. The origin returns image.png to the CDN server, optionally including a Time-to-Live (TTL) HTTP header that describes how long the image should be cached.
4. The CDN caches the image and returns it to User A. The image remains cached in the CDN until the TTL expires.
5. User B sends a request to get the same image.
6. The image is returned from the cache as long as the TTL has not expired.
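The workflow above can be simulated with a toy edge cache. This is a minimal sketch under stated assumptions, not a real CDN: CdnEdgeServer and fetch_from_origin are made-up names, and the origin function stands in for a real web server or S3 bucket.

```python
import time

class CdnEdgeServer:
    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin
        self.cache = {}  # path -> (content, expires_at)

    def get(self, path):
        entry = self.cache.get(path)
        if entry and entry[1] > time.time():
            return entry[0]                                   # cache hit (steps 5-6)
        content, ttl_seconds = self.fetch_from_origin(path)   # steps 2-3
        self.cache[path] = (content, time.time() + ttl_seconds)  # step 4
        return content

def origin(path):
    print(f"origin hit for {path}")
    return f"<bytes of {path}>", 60  # content plus a 60-second TTL

edge = CdnEdgeServer(origin)
edge.get("/image.png")  # User A: cache miss, fetched from origin
edge.get("/image.png")  # User B: served from the edge cache
```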
Considerations of using a CDN:
• Cost: CDNs are run by third-party providers, and charges are incurred for data transfers in and out of the CDN. Caching infrequently used assets provides no significant benefit, so consider moving such assets out of the CDN.
• Setting an appropriate cache expiry: For time-sensitive content, setting a cache expiry time is important. The cache expiry time should be neither too long nor too short: if it is too long, the content might no longer be fresh; if it is too short, it can cause repeated reloading of content from origin servers to the CDN.
• CDN fallback: You should consider how your website/application copes with CDN failure. If there is a temporary CDN outage, clients should be able to detect the problem and request resources from the origin.
• Invalidating files: You can remove a file from the CDN before it expires by performing one of the following operations:
• Invalidate the CDN object using APIs provided by CDN vendors.
• Use object versioning to serve a different version of the object. To version an object, you can add a parameter to the URL, such as a version number. For example, version number 2 is added to the query string: image.png?v=2.
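A small illustration of the versioning approach; versioned_url is a hypothetical helper, and the cloudfront.net domain reuses the sample from the workflow above:

```python
from urllib.parse import urlencode

def versioned_url(base_url: str, path: str, version: int) -> str:
    # The CDN treats each distinct URL as a distinct object, so bumping
    # the version forces a fresh fetch without an explicit invalidation.
    return f"{base_url}/{path}?{urlencode({'v': version})}"

print(versioned_url("https://mysite.cloudfront.net", "image.png", 2))
# https://mysite.cloudfront.net/image.png?v=2
```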
Kinds of CDNs based on how the content is cached and refreshed
There are two kinds of CDNs based on how the content is cached and refreshed: Push CDNs and Pull CDNs.
Push CDNs -
In the push CDN model, content is pushed to the CDN cache servers whenever content is modified or new content is added on the origin server. The origin server is responsible for pushing the content to the CDN cache servers.
Pull CDNs -
In the pull CDN model, content is pulled by the CDN from the origin server when that content is first requested by a client. This content is then stored in the cache servers for a certain period of time, usually 1-30 days based on the TTL, so that subsequent requests for that content are served from the CDN cache server.
Popular CDNs
Cloud providers typically offer their own CDN solutions, since CDNs are popular and easy to integrate with their other service offerings. Some popular CDNs include Cloudflare CDN, AWS CloudFront, GCP Cloud CDN, Azure CDN, and Oracle CDN.
When not to use CDNs?
If the service's target users are in a specific region, there won't be much benefit in using a CDN; you can simply host your origin servers in that region instead.
CDNs are also not a good idea if the assets being served are dynamic and sensitive. You don't want to serve stale data in sensitive situations, such as when working with financial or government services.
Scenario
You're building Flipkart's product listing service, which serves a collection of product metadata and images to online shoppers' browsers. Where would a CDN fit in the following design?
CAP Theorem
Definitions:
Consistency: Consistency means all clients see the same data at the same time, no matter which node they connect to.
Availability: Availability means any client that requests data gets a response, even if some of the nodes are down.
Partition tolerance: A partition indicates a communication break between two nodes. Partition tolerance means the system continues to operate despite network partitions.
CAP theorem:
The CAP theorem states that it is impossible for a distributed system to simultaneously provide more than two of these three guarantees: consistency, availability, and partition tolerance.
In distributed systems, data is usually replicated multiple times. Assume data is replicated on three replica nodes, n1, n2, and n3, as shown below. In an ideal world, network partitions never occur: data written to n1 is automatically replicated to n2 and n3, and both consistency and availability are achieved.
In real-world distributed systems, partitions cannot be avoided, and when a partition occurs, we must choose between consistency and availability.
In the figure above, n3 goes down and cannot communicate with n1 and n2. If clients write data to n1 or n2, the data cannot be propagated to n3. If data is written to n3 but not yet propagated to n1 and n2, then n1 and n2 would have stale data.
Consistency over availability
If we choose consistency over availability (a CP system), we must block all write operations to n1 and n2 to avoid data inconsistency among these three servers, which makes the system unavailable. Bank systems usually have extremely high consistency requirements. For example, it is crucial for a bank system to display the most up-to-date balance information. If inconsistency occurs due to a network partition, the bank system returns an error until the inconsistency is resolved.
Availability over consistency
If we choose availability over consistency (an AP system), the system keeps accepting reads, even though it might return stale data. For writes, n1 and n2 keep accepting writes, and data is synced to n3 when the network partition is resolved. Choosing the CAP guarantees that fit your use case is an important step in building a distributed key-value store.
Database Sharding
As data grows over time, the database becomes overloaded, triggering a strong need to scale the data tier.
Database Scaling:
There are two ways to scale a database.
1. Vertical scaling
- Add more power (CPU, RAM, disk, etc.) to an existing machine.
- Drawbacks:
  - Hardware limits.
  - With a large user base, a single server is not enough.
  - Single point of failure.
  - The cost of vertical scaling is high; powerful servers are expensive.
2. Horizontal scaling
Horizontal scaling, also known as sharding, is the practice of adding more servers.
It separates large databases into smaller, more easily managed parts called shards. Each shard shares the same schema, even though the data on each shard is unique.
Consider the Tinder application, in which user data is allocated to a database server based on location. Anytime you access data, a hash function is used to find the corresponding shard.
Consider an example where the database servers are segregated based on user_id, with user_id % 4 as the hash function. If the result equals 0, shard 0 is used to store and fetch data; if the result equals 1, shard 1 is used, and so on.
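A minimal sketch of this routing, with four shard names standing in for real database connections:

```python
NUM_SHARDS = 4
shards = [f"shard_{i}" for i in range(NUM_SHARDS)]  # placeholders for DB connections

def shard_for(user_id: int) -> str:
    # user_id % 4 picks one of the four shards, as described above.
    return shards[user_id % NUM_SHARDS]

print(shard_for(8))   # shard_0
print(shard_for(13))  # shard_1
```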
Horizontal vs Vertical Partitions:
Advantages of Sharding
Sharding allows you to scale your database to handle increased load to a nearly unlimited degree by providing increased read/write throughput, storage capacity, and high availability. Let's look at each of those in a little more detail.
- Increased Read/Write Throughput — By distributing the dataset across multiple shards, both read and write operation capacity is increased, as long as read and write operations are confined to a single shard.
- Increased Storage Capacity — Similarly, by increasing the number of shards, you can also increase overall total storage capacity, allowing near-infinite scalability.
- High Availability — Finally, shards provide high availability in two ways. First, since each shard is a replica set, every piece of data is replicated. Second, since the data is distributed, even if an entire shard becomes unavailable, the database as a whole still remains partially functional, with part of the schema on different shards.
Important Considerations:
Sharding key: The most important factor to consider when implementing a sharding strategy is the choice of the sharding key. The sharding key (also known as a partition key) consists of one or more columns that determine how data is distributed. A sharding key allows you to retrieve and modify data efficiently by routing database queries to the correct shard.
Challenges with Sharding:
1. Resharding data:
Resharding is needed when 1) a single shard can no longer hold more data due to rapid growth, or 2) certain shards experience shard exhaustion faster than others due to uneven data distribution. When shard exhaustion happens, the sharding function must be updated and data must be moved around.
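A quick demonstration of why resharding with a simple modulo function is expensive: changing the shard count remaps most keys, so most rows must be physically moved. This sketch just counts the remapped keys:

```python
def moved_keys(num_keys: int, old_shards: int, new_shards: int) -> int:
    # A key must move when its old shard differs from its new shard.
    return sum(1 for key in range(num_keys)
               if key % old_shards != key % new_shards)

# Growing from 4 to 5 shards remaps 8,000 of 10,000 keys (80%).
print(moved_keys(10_000, 4, 5))
```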
2. Celebrity problem: This is also called a hotspot key problem. Excessive access to a specific shard could cause server overload. Imagine that the data for celebrity users X, Y, and Z all ends up on the same shard. For social applications, that shard will be overwhelmed with read operations. To solve this problem, we may need to allocate a shard for each celebrity. Each shard might even require further partitioning.
3. Join and de-normalization: Once
a database has been sharded across multiple servers, it is hard to perform join
operations across database shards. A common workaround is to denormalize the
database so that queries can be performed in a single table.
Alternatives to Sharding?
Sharding adds operational complexity, so it is usually performed only when dealing with very large amounts of data.
Below are the scenarios where it may be beneficial to shard a database:
- The amount of application data grows to exceed the storage capacity of a single database node.
- The volume of writes or reads to the database surpasses what a single node or its read replicas can handle, resulting in slowed response times or timeouts.
- The network bandwidth required by the application outpaces the bandwidth available to a single database node and any read replicas, resulting in slowed response times or timeouts.
Following are the alternatives to consider before resorting to sharding:
- Setting up a remote database: If you're working with a monolithic application in which all of its components reside on the same server, you can improve your database's performance by moving it over to its own machine. This doesn't add as much complexity as sharding since the database's tables remain intact. However, it still allows you to vertically scale your database apart from the rest of your infrastructure.
- Implementing caching: If your application's read performance is what's causing you trouble, caching is one strategy that can help to improve it. Caching involves temporarily storing data that has already been requested in memory, allowing you to access it much more quickly later on (see the first sketch after this list).
- Creating one or more read replicas: Another strategy that can help to improve read performance, this involves copying the data from one database server (the primary server) over to one or more secondary servers. Following this, every new write goes to the primary before being copied over to the secondaries, while reads are made exclusively to the secondary servers (see the second sketch after this list). Distributing reads and writes like this keeps any one machine from taking on too much of the load, helping to prevent slowdowns and crashes. Note that creating read replicas involves more computing resources and thus costs more money, which could be a significant constraint for some.
- Upgrading to a larger server: In most cases, scaling up one's database server to a machine with more resources requires less effort than sharding. As with creating read replicas, an upgraded server with more resources will likely cost more money. Accordingly, you should only go through with resizing if it truly ends up being your best option.
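For the caching alternative above, here is a minimal sketch using Python's built-in functools.lru_cache; query_database is a made-up stand-in for a slow database call:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def query_database(user_id: int) -> str:
    # Stand-in for an expensive database round trip.
    print(f"database hit for user {user_id}")
    return f"row for user {user_id}"

query_database(42)  # first call reaches the database
query_database(42)  # repeat call is answered from memory
```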
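And for the read-replica alternative, a sketch of routing writes to the primary and reads to the secondaries; the strings here stand in for real database connections:

```python
import random

class ReplicatedDatabase:
    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self.replicas = replicas

    def write(self, statement: str) -> str:
        # Every write goes to the primary and is later copied to replicas.
        return f"{self.primary} <- {statement}"

    def read(self, query: str) -> str:
        # Reads are spread across the secondaries to distribute load.
        return f"{random.choice(self.replicas)} -> {query}"

db = ReplicatedDatabase("primary", ["replica-1", "replica-2"])
print(db.write("INSERT INTO orders ..."))
print(db.read("SELECT * FROM orders ..."))
```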
Sharding Architectures
1. Key Based Sharding
This technique is also known as hash-based sharding. Here, we take the value of an attribute of an entity, such as a customer ID, customer email, IP address of a client, or zip code, and use this value as the input to a hash function. The hash function's output determines which shard the data is stored on.
Consider an example where you have 3 database servers and each request has an application id that is incremented by 1 every time a new application is registered. To determine which server the data should be placed on, we perform a modulo operation on the application id with the number 3. The remainder is then used to identify the server that stores the data.
The downside of this method is that it makes elastic load balancing difficult: if you try to add or remove database servers dynamically, most keys are remapped to different shards, making it a difficult and expensive process.
2. Vertical Sharding
In this method, we split entire columns from the table and put those columns into new, distinct tables. Data is totally independent from one partition to the others, and each partition holds both distinct rows and columns. Take the example of Twitter's features: we can split different features of an entity across different shards on different machines. On Twitter, a user might have a profile, a number of followers, and some tweets of their own. We can place the user profiles on one shard, followers on a second shard, and tweets on a third shard.
The main drawback of this scheme is that to answer some queries you may have to combine data from different shards, which unnecessarily increases the development and operational complexity of the system.
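A toy illustration of the Twitter example above, with plain dicts standing in for three separate database servers:

```python
# Each feature of the user entity lives on its own shard.
profile_shard = {}    # user_id -> profile data
follower_shard = {}   # user_id -> follower ids
tweet_shard = {}      # user_id -> tweets

def create_user(user_id: int, name: str) -> None:
    profile_shard[user_id] = {"name": name}
    follower_shard[user_id] = []
    tweet_shard[user_id] = []

create_user(1, "alice")
tweet_shard[1].append("hello world")

# A query that needs both the profile and the tweets must now touch
# two shards, which is the cross-shard cost noted above.
print(profile_shard[1], tweet_shard[1])
```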
3. Directory-Based Sharding
In this method, we create and maintain a lookup service or lookup table for the original database. Basically, we use a shard key to index the lookup table, with a mapping for each entity that exists in the database. This way we keep track of which database shards hold which data.
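A minimal sketch of such a lookup table, assuming a location-style shard key; unlike a fixed hash function, individual entries can be remapped when data moves:

```python
# Lookup table: shard key -> shard holding that key's data.
lookup_table = {
    "IN": "shard_a",
    "US": "shard_b",
    "UK": "shard_c",
}

def shard_for(shard_key: str) -> str:
    return lookup_table[shard_key]

print(shard_for("IN"))  # shard_a

# Moving a region to a new shard is a single entry update here,
# though the lookup service itself becomes a critical dependency.
lookup_table["IN"] = "shard_d"
```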