Posted by : Sushanth Tuesday 22 December 2015


Hbase Data model

Hbase Data model - These six concepts form the foundation of HBase.

Table— HBase organizes data into tables. Table names are Strings and composed of characters that are safe for use in a file system path.
Row— Within a table, data is stored according to its row. Rows are identified uniquely by their rowkey. Rowkeys don’t have a data type and are always treated as a
byte[].
Column family— Data within a row is grouped by column family. Column families also impact the physical arrangement of data stored in HBase. For this reason,they must be
defined up front and aren’t easily modified. Every row in a table has the same column families, although a row need not store data in all its families.Column family
names are Strings and composed of characters that are safe for use in a file system path.
Column qualifier— Data within a column family is addressed via its column qualifier,or column. Column qualifiers need not be specified in advance. Column qualifiers
need not be consistent between rows. Like rowkeys, column qualifiers don’t have a data type and are always treated as a byte[].
Cell— A combination of rowkey, column family, and column qualifier uniquely identifies a cell. The data stored in a cell is referred to as that cell’s value. Values
also don’t have a data type and are always treated as a byte[].
Version— Values within a cell are versioned. Versions are identified by their timestamp,a long. When a version isn’t specified, the current timestamp is used as the
basis for the operation. The number of cell value versions retained by HBase is configured via the column family. The default number of cell versions is three. Versions stored in decreasing order of timestamp.

Example:



For a particular row say row1 ,the row key is row1,first name is the column qualifier and its value is Rohit.This column is part of Column family 1(name).
Row2 has two column families,under column family 1 we have name and in column family 2 we have address.
Logical representation of HBASE cells:



This picture describes the logical representation of the HBASE data model. For example, row1 has two column families, Name family and Address family. Name family contains the First and Last as column qualifiers. By default, it maintains 3 versions with different timestamp. When a row is retrieved, it returns the latest version, in this case it returns T1 timestamp version for Row1.

Leave a Reply

Subscribe to Posts | Subscribe to Comments

- Copyright © Technical Articles - Skyblue - Powered by Blogger - Designed by Johanes Djogan -