Tuesday, 22 December 2015

HBase Architecture

First we will look at an overview of the different components in the HBase architecture, and then we will see in detail what happens when a client writes data into HBase.

There are two types of HBase nodes:
HMaster and RegionServer
HMaster: Master server responsible for region server monitoring as well as assignment and load balancing of regions. It holds metadata and schemas.
RegionServer: Within a region server there are a number of regions, called HRegions. Each region contains a number of stores, called HStores, where each store corresponds to the storage of one column family. Each store in turn contains a memstore and HFiles. Whenever a client writes some data, it is first written to the WAL (write-ahead log) and then to the memstore. The memstore is an in-memory buffer that is flushed to HDFS (i.e. into an HFile) once it fills up. The WAL is required because if the region server goes down, the memstore contents are lost, and the WAL makes it possible to recover that data.
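The hierarchy and write order described above can be sketched as a toy model. This is only an illustration, not HBase code: the class names `Region` and `Store`, the `flush_threshold` parameter (counted in cells here), and the use of plain Python lists for the WAL and HFiles are all assumptions made for the example.

```python
class Store:
    """Toy store for one column family: a memstore plus flushed files."""

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold  # hypothetical limit, in cells
        self.memstore = {}                      # (row, qualifier) -> value
        self.hfiles = []                        # each flushed file is a sorted list

    def put(self, row, qualifier, value):
        self.memstore[(row, qualifier)] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the buffered cells out in sorted order, then start fresh.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}


class Region:
    """Toy region: one Store per column family, sharing one log."""

    def __init__(self, families, wal, flush_threshold=3):
        self.wal = wal  # stands in for the write-ahead log
        self.stores = {cf: Store(flush_threshold) for cf in families}

    def put(self, row, family, qualifier, value):
        self.wal.append((row, family, qualifier, value))  # 1. log the edit
        self.stores[family].put(row, qualifier, value)    # 2. buffer it in memory
```

With `flush_threshold=2`, the second put into a column family empties its memstore and produces one sorted file for that family only; other families are untouched in this simplified model.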
-ROOT- and .META. tables:
These are special tables which store schema information and region locations.


-ROOT- keeps track of where the .META. table is. The -ROOT- table structure is as follows:
Key:    .META. region key (.META.,,1)
  • info:regioninfo (serialized HRegionInfo instance of .META.)
  • info:server (server:port of the RegionServer holding .META.)
  • info:serverstartcode (start-time of the RegionServer process holding .META.)


The .META. table keeps a list of all regions in the system. The .META. table structure is as follows:
Key:  Region key of the format ([table],[region start key],[region id])
  • info:regioninfo (serialized HRegionInfo instance for this region)
  • info:server (server:port of the RegionServer containing this region)
  • info:serverstartcode (start-time of the RegionServer process containing this region)
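To illustrate how region keys that embed a start key let a client find the region for a given row, here is a minimal sketch. The `META` dictionary, the table name `users`, the server addresses, and the function `locate_region` are hypothetical stand-ins; a real client scans the .META. table rather than a Python dict.

```python
import bisect

# Hypothetical catalog contents: for each table, regions sorted by start key.
# The empty string is the start key of the first region.
META = {
    "users": [
        ("", "rs1.example.com:60020"),
        ("g", "rs2.example.com:60020"),
        ("p", "rs3.example.com:60020"),
    ],
}


def locate_region(table, row_key):
    """Return (region start key, server) for the region that holds row_key."""
    regions = META[table]
    start_keys = [start for start, _ in regions]
    # The owning region is the one with the greatest start key <= row_key.
    idx = bisect.bisect_right(start_keys, row_key) - 1
    return regions[idx]
```

For example, `locate_region("users", "alice")` falls into the first region, while a row key of exactly `"g"` belongs to the region that starts at `"g"`.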
ZooKeeper: ZooKeeper is used as the distributed coordination service. It manages master election and tracks server availability.


The above picture describes the sequence of events which happen when a client writes data into HBase.
When a RegionServer (RS) receives a write request, it directs the request to a specific Region. Each Region stores a set of rows. Row data can be separated into multiple column families (CFs). The data of a particular CF is stored in an HStore, which consists of a Memstore and a set of HFiles. The Memstore is kept in the RS main memory, while HFiles are written to HDFS. When a write request is processed, the data is written into the Memstore once it has been logged to the WAL. Then, when certain thresholds are met (main memory is limited, after all), the Memstore data gets flushed into an HFile.
The main reason for using a Memstore is the need to store data on HDFS ordered by row key. As HDFS is designed for sequential reads/writes, with no file modifications allowed, HBase cannot efficiently write data to disk as it is being received: the written data would not be sorted (when the input is not sorted), which means it would not be optimized for future retrieval. To solve this problem HBase buffers the most recently received data in memory (in the Memstore), “sorts” it before flushing, and then writes it to HDFS using fast sequential writes. Note that in reality an HFile is not just a simple list of sorted rows; it is much more than that.
Apart from solving the “non-ordered” problem, Memstore also has other benefits, e.g.:
  • It acts as an in-memory cache which keeps recently added data. This is useful in the many cases where the last written data is accessed more frequently than older data.
  • There are certain optimizations that can be applied to rows/cells while they are held in memory, before they are written to the persistent store. For example, when a CF is configured to store only one version of a cell and the Memstore contains multiple updates for that cell, only the most recent one needs to be kept; the older ones can be omitted (and never written to an HFile).
An important thing to note is that every Memstore flush creates one HFile per CF.
On the read side things are simple: HBase first checks whether the requested data is in the Memstore, then goes to the HFiles, and returns the merged result to the user.
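The buffering, sorting, and version-pruning ideas above can be sketched in a few lines. This is a simplified illustration only: the function name `flush_memstore`, the tuple layout, and the `max_versions` parameter are assumptions for the example, and the sort happens at flush time here purely to keep the sketch short.

```python
def flush_memstore(edits, max_versions=1):
    """Turn unsorted edits into a sorted, pruned list (a stand-in for one HFile).

    edits: list of (row, qualifier, timestamp, value) in arrival order.
    """
    by_cell = {}
    for row, qualifier, ts, value in edits:
        by_cell.setdefault((row, qualifier), []).append((ts, value))

    hfile = []
    for row, qualifier in sorted(by_cell):  # order cells by row key, then qualifier
        newest_first = sorted(by_cell[(row, qualifier)], reverse=True)
        # Keep only the newest max_versions versions; older ones never reach disk.
        for ts, value in newest_first[:max_versions]:
            hfile.append((row, qualifier, ts, value))
    return hfile
```

Edits that arrive as `row2`, `row1`, `row2` come out ordered by row key, and with `max_versions=1` only the newer of the two `row2` updates is written.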
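A rough sketch of that read path, under the assumption that "merged" means picking the newest version of a cell across all sources. The function `get` and the dictionary layout are hypothetical, and real HFiles are not in-memory dicts.

```python
def get(row, qualifier, memstore, hfiles):
    """Look up one cell in the memstore first, then in each flushed file.

    memstore and every entry of hfiles map (row, qualifier) -> (timestamp, value).
    Returns the value with the newest timestamp, or None if the cell is absent.
    """
    candidates = []
    for source in [memstore] + list(hfiles):
        if (row, qualifier) in source:
            candidates.append(source[(row, qualifier)])
    if not candidates:
        return None
    # Merge step: the version with the highest timestamp wins.
    return max(candidates, key=lambda version: version[0])[1]
```

A cell that exists both in an older flushed file and in the memstore resolves to whichever copy carries the later timestamp.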

WAL: Write Ahead Log:
The WAL is stored in HDFS under /hbase/.logs/, with one subdirectory per region server.

The WAL is the lifeline that is needed when disaster strikes. Similar to the binary log (binlog) in MySQL, it records all changes to the data. This is important in case something happens to the primary storage. If the server crashes, the log can be replayed to bring everything back to where the server should have been just before the crash. It also means that if writing the record to the WAL fails, the whole operation must be considered a failure.
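The two guarantees in this paragraph, replay after a crash and failing the whole write when the log append fails, can be shown with a toy model. `ToyServer`, `WalWriteError`, and the `fail_wal` switch are hypothetical names invented for this sketch; the Python list standing in for the WAL represents durable storage that outlives the server object.

```python
class WalWriteError(Exception):
    """Raised when an edit cannot be appended to the log."""


class ToyServer:
    def __init__(self, wal, fail_wal=False):
        self.wal = wal            # durable: survives a simulated crash
        self.memstore = {}        # volatile: lost on a simulated crash
        self.fail_wal = fail_wal  # hypothetical switch to simulate a log failure

    def put(self, key, value):
        if self.fail_wal:
            # No log record means no write: the memstore is left untouched.
            raise WalWriteError("could not append to WAL")
        self.wal.append((key, value))
        self.memstore[key] = value

    @classmethod
    def recover(cls, wal):
        """Rebuild the in-memory state by replaying the log in order."""
        server = cls(wal)
        for key, value in wal:
            server.memstore[key] = value
        return server
```

Discarding a `ToyServer` and calling `ToyServer.recover(wal)` on the same list yields a memstore identical to the one that was lost.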

HLog: The class which implements the WAL is called HLog, and there is only one instance of the HLog class per HRegionServer. When an HRegion is instantiated, the single HLog is passed as a parameter to the constructor of HRegion.
Central to HLog's functionality is the append() method, which internally eventually calls doWrite(). For performance reasons, put(), delete(), and incrementColumnValue() can be invoked with an extra option, setWriteToWAL(boolean); setting it to false skips the WAL write, at the cost of losing those edits if the server crashes.
Another important feature of the HLog is keeping track of the changes. This is done using a "sequence number". It uses an AtomicLong internally to be thread-safe and starts out either at zero or at the last known number persisted to the file system. As the region opens its storage files, it reads the highest sequence number, which is stored as a meta field in each HFile, and sets the HLog sequence number to that value if it is higher than what has been recorded before. So once all storage files have been opened, the HLog is initialized to reflect where persisting ended and where to continue.
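The sequence-number initialization can be sketched as follows. Python has no AtomicLong, so this illustration swaps in a `threading.Lock` to get the same thread-safety; the class `SequenceCounter` and its method names are hypothetical and are not the HLog API.

```python
import threading


class SequenceCounter:
    """Toy thread-safe sequence number, starting at zero."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def observe_store_file(self, max_seq_id):
        """Raise the counter if a storage file recorded a higher sequence number."""
        with self._lock:
            if max_seq_id > self._value:
                self._value = max_seq_id

    def next(self):
        """Hand out the next sequence number for a new edit."""
        with self._lock:
            self._value += 1
            return self._value
```

After observing storage files whose highest recorded numbers are 17, 42, and 8, the next edit is numbered 43, which is where persisting left off.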

