
Introduction

 

The performance of a software application is a non-functional requirement that indicates how responsive its operations are. From the user's perspective, applications should be highly responsive, and fast response times are expected regardless of the user's location or device.

A user's satisfaction with an application is strongly influenced by its speed. For web and mobile applications, page load time is a major factor in page abandonment. This is evident when we look at metrics such as a site's bounce rate and conversion rate.

Bounce rate:

A bounce occurs when a user has a single-page session on a site and leaves without visiting any of the other pages. The bounce rate is the percentage of sessions that bounce. As you would expect, as page load times increase, so too does the bounce rate.



Examples of actions that result in a bounce include the user closing the browser window/tab, clicking on a link to visit a different site, clicking the back button to leave the site, navigating to a different site by typing in a new URL or using a voice command, or having a session timeout occur.

Conversion rate:

 The conversion rate is the percentage of site visitors who ultimately take the desired conversion action. The desired conversion action depends on the purpose of the site, but a few examples of common ones include placing an order, registering for membership, downloading a software product, or subscribing to a newsletter.



Page speed affects search rankings:

 Page speed is a consideration in a site's mobile search ranking in Google search results. Currently, this criterion only affects pages with the slowest performance, but it shows the importance Google places on web page performance. 

Terminology:

1. Latency

 Latency is the amount of time (or delay) it takes to send information from a source to a destination. Factors such as the type of network hardware being utilized, the connection type, the distance that must be traveled, and the amount of congestion on the network all affect latency.

2. Throughput

Throughput is a measure of the number of work items completed per unit of time. In the context of application logic, throughput is how much processing can be done in a given amount of time. An example of throughput in this context would be the number of transactions that can be processed per second.

3. Bandwidth

Bandwidth is the maximum possible throughput for a particular logical or physical communication path. Like throughput, it is typically measured in terms of a bit rate, or the maximum number of bits that could be transferred in a given unit of time.

4. Processing time

Processing time is the length of time that it takes for a software system to process a particular request, without including any time where messages are traveling across the network (latency).

Factors affecting processing time include how the application code is written, the external software that works in conjunction with the application, and the characteristics of the hardware performing the processing.

5. Response time

Response time is the total amount of time between the user making a particular request and the user receiving a response to that request. For a given request, response time is a combination of both the network latency and the processing time.

6. Workload

Workload represents the amount of computational processing a machine has been given to do at a particular time. A workload uses up processor capacity, leaving less of it available for other tasks. Some common types of workload that may be evaluated are CPU, memory, I/O, and database workloads.

7. Utilization

Utilization is the percentage of time that a resource is used compared with the total time that the resource is available for use. For example, if a CPU is busy processing transactions for 45 seconds out of a one-minute timespan, the utilization for that interval is 75%. Resources such as CPU, memory, and disk should all be measured for utilization in order to obtain a complete picture of an application's performance. As utilization approaches 100%, response times rise sharply.
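To make that last point concrete, a simple single-server queueing model shows how quickly response times climb. The sketch below uses the standard M/M/1 formula; choosing this model and the 10 ms service time are assumptions for illustration, not figures from this article.

```python
# Illustrative only: in an M/M/1 queue, mean response time is
# service_time / (1 - utilization), which grows without bound
# as utilization approaches 100%.

service_time = 0.010  # assumed 10 ms of pure processing per request

for utilization in (0.50, 0.75, 0.90, 0.99):
    response_time = service_time / (1 - utilization)
    print(f"utilization {utilization:.0%}: mean response time "
          f"{response_time * 1000:.0f} ms")

# utilization 50%: mean response time 20 ms
# utilization 75%: mean response time 40 ms
# utilization 90%: mean response time 100 ms
# utilization 99%: mean response time 1000 ms
```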

 

Systematic approach to Performance Improvements:



An iterative process that consists of the following steps can be used for performance improvement:

  • Profiling the application
  • Analyzing the results
  • Implementing changes
  • Monitoring changes

 

 

Content Delivery Network

 

Content delivery network (CDN)

Problem: On average, about 80% of a website consists of static resources, so fetching this static content from the origin server is time-consuming and makes the website slower. The problem is aggravated when the website's visitors are spread across different geographic locations.

Solution:

A CDN is essentially a group of servers that are strategically placed across the globe with the purpose of accelerating the delivery of your static web content.

CDN servers cache static content such as images, videos, CSS, and JavaScript files. Some CDNs can also cache HTML pages, keyed on the request path, query strings, cookies, and request headers.

CDN users also benefit from being able to scale capacity up and down easily in response to traffic spikes.

How does it work?

When a user visits a website, the CDN server closest to the user delivers the static content. Intuitively, the farther users are from CDN servers, the slower the website loads. For example, if the CDN servers are in San Francisco, users in Los Angeles will get content faster than users in Europe.

How does a CDN improve load time?


CDN workflow:


1. User A requests image.png using an image URL. The URL's domain is provided by the CDN provider. The following two sample URLs demonstrate what image URLs look like on the Amazon and Akamai CDNs:

         • https://mysite.cloudfront.net/logo.jpg

         • https://mysite.akamai.com/image-manager/img/logo.jpg

2. If the CDN server does not have image.png in the cache, the CDN server requests the file from the origin, which can be a web server or online storage like Amazon S3.

3. The origin returns image.png to the CDN server. The response can include an optional HTTP header (such as Cache-Control) that specifies the Time-to-Live (TTL), which describes how long the image should be cached.

4. The CDN caches the image and returns it to User A. The image remains cached in the CDN until the TTL expires.

5. User B sends a request to get the same image.

6. The image is returned from the cache as long as the TTL has not expired.
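The whole pull-through flow above can be modeled in a few lines. The sketch below is illustrative only: the class names, the origin stub, and the 60-second default TTL are assumptions, not any CDN vendor's API.

```python
import time

# Minimal model of a pull-through CDN edge cache with TTL-based expiry.

class CdnEdgeCache:
    def __init__(self, fetch_from_origin, default_ttl=60):
        self.fetch_from_origin = fetch_from_origin  # callable: path -> bytes
        self.default_ttl = default_ttl              # seconds to keep an object
        self.cache = {}                             # path -> (content, expires_at)

    def get(self, path):
        entry = self.cache.get(path)
        if entry is not None:
            content, expires_at = entry
            if time.time() < expires_at:
                return content                      # cache hit: served from the edge
        content = self.fetch_from_origin(path)      # cache miss: pull from origin
        self.cache[path] = (content, time.time() + self.default_ttl)
        return content

def origin_fetch(path):
    # Stand-in for the origin web server or object storage (e.g. Amazon S3).
    return b"...bytes of " + path.encode()

edge = CdnEdgeCache(origin_fetch)
edge.get("/image.png")  # User A: miss, pulled from the origin and cached
edge.get("/image.png")  # User B: hit, served from the cache until the TTL expires
```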

Considerations of using a CDN:

Cost: CDNs are run by third-party providers, and charges are incurred for data transfers in and out of the CDN. Caching infrequently used assets provides no significant benefit, so you should consider moving such assets out of the CDN.

Setting an appropriate cache expiry: For time-sensitive content, setting a cache expiry time is important. The cache expiry time should be neither too long nor too short. If it is too long, the content might no longer be fresh. If it is too short, it can cause repeated reloading of content from the origin servers to the CDN.

 CDN fallback: You should consider how your website/application copes with CDN failure. If there is a temporary CDN outage, clients should be able to detect the problem and request resources from the origin.

Invalidating files: You can remove a file from the CDN before it expires by performing one of the following operations:

  • Invalidate the CDN object using APIs provided by CDN vendors.

  • Use object versioning to serve a different version of the object. To version an object, you can add a parameter to the URL, such as a version number. For example, version number 2 is added to the query string: image.png?v=2 (see the snippet below).
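On the client side, object versioning amounts to a tiny helper along these lines (a hypothetical function, not a CDN vendor API):

```python
# Appending a version query parameter makes the CDN treat the URL as a
# brand-new object, so clients bypass the stale cached copy.
def versioned_url(base_url: str, version: int) -> str:
    return f"{base_url}?v={version}"

print(versioned_url("https://mysite.cloudfront.net/image.png", 2))
# https://mysite.cloudfront.net/image.png?v=2
```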


Kinds of CDNs based on how content is cached and refreshed

There are two kinds of CDNs based on how content is cached and refreshed: push CDNs and pull CDNs.

Push CDNs: In the push model, content is pushed to the CDN cache servers whenever content is modified or added on the origin server. The origin server is responsible for pushing the content to the CDN cache servers.


Pull CDNs: In the pull model, content is pulled by the CDN from the origin server when that content is first requested by a client. The content is then stored on the cache servers for a certain period of time, usually 1-30 days based on the TTL, so that subsequent requests for that content are served from the CDN cache server.

Popular CDNs

Cloud providers typically offer their own CDN solutions, since CDNs are popular and integrate easily with their other service offerings. Some popular CDNs include Cloudflare CDN, AWS CloudFront, GCP Cloud CDN, Azure CDN, and Oracle CDN.

 When not to use CDNs?

If the service's target users are in a specific region, there may be little benefit to using a CDN; you can simply host your origin servers in that region instead.

CDNs are also not a good idea if the assets being served are dynamic and sensitive. You don’t want to serve stale data for sensitive situations, such as when working with financial/government services.

Scenario

You’re building Flipkart’s product listing service, which serves a collection of product metadata and images to online shoppers’ browsers. Where would a CDN fit in the following design?




CAP Theorem

 

Definitions:

Consistency: All clients see the same data at the same time, no matter which node they connect to.

Availability: Any client that requests data gets a response, even if some of the nodes are down.

Partition Tolerance: A partition indicates a communication break between two nodes. Partition tolerance means the system continues to operate despite network partitions.


CAP theorem:

The CAP theorem states that it is impossible for a distributed system to simultaneously provide more than two of these three guarantees: consistency, availability, and partition tolerance.


In distributed systems, data is usually replicated multiple times. Assume data is replicated across three replica nodes: n1, n2, and n3. In an ideal world, network partitions never occur: data written to n1 is automatically replicated to n2 and n3, and both consistency and availability are achieved.

In real world distributed systems, partitions cannot be avoided, and when a partition occurs, we must choose between consistency and availability. 



Now suppose n3 goes down and cannot communicate with n1 and n2. If clients write data to n1 or n2, the data cannot be propagated to n3. If data is written to n3 but not yet propagated to n1 and n2, then n1 and n2 have stale data.

Consistency over availability

If we choose consistency over availability (a CP system), we must block all write operations to n1 and n2 to avoid data inconsistency among these three servers, which makes the system unavailable. Bank systems usually have extremely high consistency requirements. For example, it is crucial for a bank system to display the most up-to-date balance information. If inconsistency occurs due to a network partition, the bank system returns an error until the inconsistency is resolved.

Availability over consistency

If we choose availability over consistency (an AP system), the system keeps accepting reads, even though it might return stale data. For writes, n1 and n2 keep accepting them, and the data is synced to n3 when the network partition is resolved. Choosing the CAP guarantees that fit your use case is an important step in building a distributed system, such as a distributed key-value store.
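The toy sketch below contrasts the two choices during a partition. The Replica, CPStore, and APStore classes are illustrative models invented for this example, not a real replication protocol.

```python
# Toy model of CP vs. AP behaviour during a network partition.

class PartitionError(Exception):
    pass

class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.reachable = True  # set to False to simulate a partition

class CPStore:
    """Consistency over availability: reject writes unless every replica acks."""
    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        if not all(r.reachable for r in self.replicas):
            raise PartitionError("write rejected to avoid inconsistency")
        for r in self.replicas:
            r.data[key] = value

class APStore:
    """Availability over consistency: accept writes on reachable replicas
    and replay them to the rest once the partition heals (reads may be stale)."""
    def __init__(self, replicas):
        self.replicas = replicas
        self.log = []  # writes to replay after the partition resolves

    def write(self, key, value):
        self.log.append((key, value))
        for r in self.replicas:
            if r.reachable:
                r.data[key] = value

    def heal(self):
        for r in self.replicas:
            for key, value in self.log:
                r.data[key] = value

n1, n2, n3 = Replica("n1"), Replica("n2"), Replica("n3")
n3.reachable = False                  # network partition isolates n3

ap = APStore([n1, n2, n3])
ap.write("balance", 100)              # accepted; n3 stays stale until ap.heal()

cp = CPStore([n1, n2, n3])
try:
    cp.write("balance", 200)
except PartitionError as e:
    print(e)                          # a CP system blocks the write instead
```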


Database Sharding



Database Sharding:

As data grows over time, the database becomes overloaded, which creates a strong need to scale the data tier.

Database Scaling:

There are two ways to scale a database.

1. Vertical scaling

  • Add more power (CPU, RAM, disk, etc.) to an existing machine.

  • Drawbacks:

    • Hardware limits.

    • With a large user base, a single server is not enough.

    • Single point of failure.

    • The cost of vertical scaling is high; powerful servers are expensive.





 

2. Horizontal scaling

Horizontal scaling, also known as sharding, is the practice of adding more servers.

Sharding separates a large database into smaller, more easily managed parts called shards. Each shard shares the same schema, even though the data on each shard is unique to it.

Consider the Tinder application, in which user data is allocated to a database server based on location. Anytime you access data, a hash function is used to find the corresponding shard.

Consider an example where the database servers are segregated based on user_id, with user_id % 4 as the hash function. If the result equals 0, shard 0 is used to store and fetch data; if the result equals 1, shard 1 is used, and so on.
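A minimal sketch of that routing logic (the shard hostnames are hypothetical placeholders):

```python
# Hash-based routing: user_id % 4 selects one of four shards.

SHARDS = [
    "db-shard-0.example.internal",
    "db-shard-1.example.internal",
    "db-shard-2.example.internal",
    "db-shard-3.example.internal",
]

def shard_for(user_id: int) -> str:
    return SHARDS[user_id % len(SHARDS)]

print(shard_for(27))  # 27 % 4 == 3 -> db-shard-3.example.internal
```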



Horizontal vs. Vertical Partitioning:


Advantages of Sharding

Sharding allows you to scale your database to handle increased load to a nearly unlimited degree by providing increased read/write throughput, storage capacity, and high availability. Let’s look at each of those in a little more detail.

  • Increased Read/Write Throughput — By distributing the dataset across multiple shards, both read and write operation capacity is increased as long as read and write operations are confined to a single shard.
  • Increased Storage Capacity — Similarly, by increasing the number of shards, you can also increase overall total storage capacity, allowing near-infinite scalability.
  • High Availability — Finally, shards provide high availability in two ways. First, if each shard is deployed as a replica set, every piece of data is replicated. Second, because the data is distributed, even if an entire shard becomes unavailable, the database as a whole remains partially functional, since the remaining shards continue to serve their portions of the data.

 

Important Considerations:

Sharding key: The most important factor to consider when implementing a sharding strategy is the choice of the sharding key. The sharding key (also known as a partition key) consists of one or more columns that determine how data is distributed. A sharding key allows you to retrieve and modify data efficiently by routing database queries to the correct shard.

Challenges with Sharding:

1. Resharding data: Resharding is needed when 1) a single shard can no longer hold more data due to rapid growth, or 2) certain shards experience shard exhaustion faster than others due to uneven data distribution. When shard exhaustion happens, the sharding function must be updated and data must be moved around.

 

2. Celebrity problem: This is also called the hotspot key problem. Excessive access to a specific shard can cause server overload. Imagine that the data for users X, Y, and Z ends up on the same shard; for a social application, that shard would be overwhelmed with read operations. To solve this problem, we may need to allocate a dedicated shard for each celebrity, and each such shard might even require further partitioning. One simple form of this mitigation is sketched below.
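A sketch of that mitigation (the celebrity list and shard names are assumptions for illustration):

```python
import zlib

# Route known hot ("celebrity") keys to dedicated shards, and fall back
# to normal hash routing for everyone else.

DEDICATED_SHARDS = {"user_X": "shard-x", "user_Y": "shard-y", "user_Z": "shard-z"}
NUM_SHARDS = 4

def route(user_key: str) -> str:
    if user_key in DEDICATED_SHARDS:   # hot key: its own shard
        return DEDICATED_SHARDS[user_key]
    # zlib.crc32 gives a hash that is stable across processes,
    # unlike Python's built-in hash() on strings.
    return f"shard-{zlib.crc32(user_key.encode()) % NUM_SHARDS}"

print(route("user_X"))      # shard-x
print(route("user_12345"))  # one of shard-0 .. shard-3
```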

 

 

3. Join and de-normalization: Once a database has been sharded across multiple servers, it is hard to perform join operations across shards. A common workaround is to de-normalize the database so that queries can be performed on a single table.

 

Alternatives to Sharding

 

Sharding adds operational complexity, so it is usually performed only when dealing with very large amounts of data.

Below are the scenarios where it may be beneficial to shard a database:

 

  • The amount of application data grows to exceed the storage capacity of a single database node.

  • The volume of writes or reads to the database surpasses what a single node or its read replicas can handle, resulting in slowed response times or timeouts.

  • The network bandwidth required by the application outpaces the bandwidth available to a single database node and any read replicas, resulting in slowed response times or timeouts.

 

The following are alternatives to consider before turning to sharding:

  • Setting up a remote database:

 If you’re working with a monolithic application in which all of its components reside on the same server, you can improve your database’s performance by moving it over to its own machine. This doesn’t add as much complexity as sharding since the database’s tables remain intact. However, it still allows you to vertically scale your database apart from the rest of your infrastructure.

  • Implementing caching:

 If your application's read performance is what's causing you trouble, caching is one strategy that can help improve it. Caching involves temporarily storing data that has already been requested in memory, allowing you to access it much more quickly later on.

  • Creating one or more read replicas:

 Another strategy that can help improve read performance, this involves copying the data from one database server (the primary server) over to one or more secondary servers. Following this, every new write goes to the primary before being copied over to the secondaries, while reads are made exclusively from the secondary servers. Distributing reads and writes like this keeps any one machine from taking on too much load, helping to prevent slowdowns and crashes. Note that creating read replicas involves more computing resources and thus costs more money, which could be a significant constraint for some. A minimal sketch of this read/write split appears after this list.

  • Upgrading to a larger server:

 In most cases, scaling up one’s database server to a machine with more resources requires less effort than sharding. As with creating read replicas, an upgraded server with more resources will likely cost more money. Accordingly, you should only go through with resizing if it truly ends up being your best option.
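As promised above, here is a minimal sketch of the read/write split behind read replicas. The connection classes are stand-ins invented for this example, not a real database driver:

```python
import random

# Writes go to the primary; reads are spread across the replicas.

class FakeConnection:
    """Stand-in for a real driver connection, for demonstration only."""
    def __init__(self, name):
        self.name = name

    def execute(self, sql, params=()):
        print(f"[{self.name}] {sql}")

class RoutingConnection:
    def __init__(self, primary, replicas):
        self.primary = primary      # receives every write
        self.replicas = replicas    # serve reads only

    def execute(self, sql, params=()):
        is_read = sql.lstrip().lower().startswith("select")
        target = random.choice(self.replicas) if is_read else self.primary
        return target.execute(sql, params)

db = RoutingConnection(
    FakeConnection("primary"),
    [FakeConnection("replica-1"), FakeConnection("replica-2")],
)
db.execute("SELECT * FROM products WHERE id = %s", (42,))      # hits a replica
db.execute("INSERT INTO products (name) VALUES (%s)", ("x",))  # hits the primary
```

Note that replication is often asynchronous, so a read routed to a replica immediately after a write may briefly see stale data.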

Sharding Architectures

1. Key-Based Sharding

This technique is also known as hash-based sharding. Here, we take the value of an entity, such as a customer ID, customer email, client IP address, or zip code, and use it as the input to a hash function. The hash value that this process generates determines which shard we should use to store the data.

For example, suppose you have three database servers, and each request carries an application ID that is incremented by 1 every time a new application is registered. To determine which server the data should be placed on, we take the application ID modulo 3 and use the remainder to identify the server that stores the data.

The downside of this method is that it makes elastic scaling difficult: if you add or remove database servers dynamically, the mapping changes for most keys, making rebalancing a difficult and expensive process, as the experiment below illustrates.
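A quick experiment (the shard counts and key range are arbitrary choices) shows the scale of the problem: going from 3 to 4 shards remaps roughly three quarters of all keys.

```python
# Demonstrates the resharding cost of modulo-based placement: changing
# the shard count moves most keys to a different shard.

def shard(app_id: int, num_shards: int) -> int:
    return app_id % num_shards

total = 10_000
moved = sum(1 for app_id in range(total)
            if shard(app_id, 3) != shard(app_id, 4))
print(f"{moved / total:.0%} of keys change shard when going from 3 to 4 shards")
# ~75% of keys move, and each move is a data migration.
```

Consistent hashing is a common technique for reducing this remapping cost.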

2. Vertical Sharding

In this method, we split columns from the table and put those columns into new, distinct tables. Data in one partition is totally independent of the data in the other partitions, and each partition holds both distinct rows and columns. Take the example of Twitter's features: we can split different features of an entity into different shards on different machines. On Twitter, a user has a profile, a number of followers, and tweets they have posted. We can place user profiles on one shard, followers on a second shard, and tweets on a third shard.

The main drawback of this scheme is that to answer some queries, you may have to combine data from different shards, which unnecessarily increases the development and operational complexity of the system.

 

3. Directory-Based Sharding

In this method, we create and maintain a lookup service or lookup table for the original database. We use a shard key to index the lookup table, which maps each entity in the database to the shard that holds it. This way, we keep track of which database shards hold which data.
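A minimal sketch of such a lookup table (the keys and shard names are invented for illustration):

```python
# Directory-based sharding: an explicit lookup table maps each shard key
# (here, a country code) to the shard holding its data.

lookup_table = {
    "IN": "shard-asia",
    "US": "shard-us-east",
    "DE": "shard-europe",
}

def shard_for_country(country_code: str) -> str:
    # One extra lookup per query, but very flexible: data can be moved
    # between shards by editing the table, without changing a hash function.
    return lookup_table[country_code]

print(shard_for_country("IN"))  # shard-asia
```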


