HBase is a distributed database. This article introduces the HBase native API to give readers a preliminary understanding of how to work with HBase.

HBase Introduction

HBase is a distributed database that supports random and sequential reads and writes over large batches of data. The HBase architecture is shown in the following figure.

Fig. 1. HBase architecture

The HBase Master is the key component, in charge of managing the whole cluster: node heartbeats, metadata, load balancing, fault tolerance, and region management. The Client handles communication between the application and HBase. ZooKeeper is a distributed coordination service, in charge of master election and cluster state management. A RegionServer is a worker node of the cluster, in charge of data storage, region management, serving read and write requests, and so on.

The HBase data model is shown in the following figure.

Fig. 2. HBase data model

As the figure above shows, the model has four parts: rowkey, timestamp, column family, and qualifier. The rowkey uniquely identifies a row in an HBase table. Each stored value (cell) carries its own timestamp, which identifies when it was inserted or updated. A column family (CF) consists of several related columns (qualifiers); columns with different attributes are placed in different column families so that they are easier to manage.
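To make the four parts concrete, here is a minimal sketch of the data model as nested Python dictionaries. This is a toy in-memory picture, not an HBase client; the sample rows, the values, and the helper names `latest` and `columns_in_family` are assumptions made up for illustration.

```python
# Toy in-memory picture of the HBase data model (not an HBase client).
# Layout: rowkey -> "family:qualifier" -> {timestamp: value}
table = {
    "r1": {
        "cf:a": {1: "old", 2: "new"},  # two versions of the same cell
        "cf:b": {1: "x"},
    },
    "r2": {
        "cf:a": {3: "1"},
    },
}


def latest(table, rowkey, column):
    """Return the value with the highest timestamp for one cell."""
    versions = table[rowkey][column]
    return versions[max(versions)]


def columns_in_family(table, rowkey, family):
    """List the qualifiers stored under one column family of a row."""
    prefix = family + ":"
    return sorted(c[len(prefix):] for c in table[rowkey] if c.startswith(prefix))
```

In this sketch, `latest(table, "r1", "cf:a")` picks the version with the newest timestamp, which mirrors how a read normally returns the most recent version of a cell.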

HBase Shell API

How do we get row data from HBase? HBase provides two operations for this, Get and Scan. Get retrieves a single row by its rowkey, whereas Scan retrieves multiple rows over a rowkey range, for example all rows that share a rowkey prefix. The following sections introduce both.

First, Get is a standard HBase operation. To fetch the data stored under rowkey 'r1', the commands in the HBase shell look like this:

get 'table','r1'
get 'table', 'r1', 'cf:a'

The first command returns the data of all columns in the row (for example cf:a and cf:b), and the second command returns only the 'cf:a' data.
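The difference between the two forms of Get can be sketched with a small in-memory model. This is only an illustration of the semantics, not the HBase client API; the sample table and the function `get` are hypothetical.

```python
# Toy model of the two Get forms; this is not the HBase client API.
table = {
    "r1": {"cf:a": "1", "cf:b": "2"},
    "r2": {"cf:a": "3"},
}


def get(table, rowkey, column=None):
    """Return all columns of a row, or only `column` if one is given."""
    row = table.get(rowkey, {})
    if column is None:
        return dict(row)
    return {column: row[column]} if column in row else {}


all_cols = get(table, "r1")         # like: get 'table', 'r1'
one_col = get(table, "r1", "cf:a")  # like: get 'table', 'r1', 'cf:a'
```

Returning an empty result for a missing rowkey is a choice of this sketch; it keeps the caller free of exception handling.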

Second, Scan is also a standard HBase operation. If we want to get both 'r1' and 'r2', and assuming they share a common rowkey prefix, we can use scan like this:

scan 'table',{STARTROW=>'row-start-prefix',ENDROW=>'row-end-prefix'}

As this command shows, we specify two parameters, STARTROW and ENDROW, which define the rowkey range of the scan. The start row is inclusive and the end row is exclusive. If we only need a specific number of rows, another parameter lets us do this:

scan 'table',{STARTROW=>'row-start-prefix', ENDROW=>'row-end-prefix',LIMIT=>2}

The LIMIT parameter tells HBase to stop the scan after the given number of rows and return the result.
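The range and limit behaviour can be sketched over a sorted list of rowkeys. This is a toy model rather than HBase code: it assumes the usual scan convention of an inclusive start row and an exclusive end row, it compares Python strings where HBase compares raw bytes (equivalent for plain ASCII keys), and the sample rows and the function `scan` are made up for illustration.

```python
from bisect import bisect_left

# Hypothetical sample data; rowkeys and values are made up for illustration.
rows = {"r1": "a", "r2": "b", "r3": "c", "s1": "d"}


def scan(rows, start_row="", stop_row=None, limit=None):
    """Return (rowkey, value) pairs with start_row <= rowkey < stop_row, in key order."""
    keys = sorted(rows)
    lo = bisect_left(keys, start_row)
    hi = len(keys) if stop_row is None else bisect_left(keys, stop_row)
    selected = keys[lo:hi]
    if limit is not None:
        selected = selected[:limit]  # stop after `limit` rows
    return [(k, rows[k]) for k in selected]


in_range = scan(rows, "r1", "r3")        # r3 itself is excluded
limited = scan(rows, "r", "s", limit=2)  # all "r"-prefixed rows, capped at 2
```

Because rowkeys are kept in sorted order, a prefix scan is just a range scan from the prefix up to the next possible prefix, which is why `scan(rows, "r", "s")` covers every key starting with "r".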

If we only need the rows that hold a specific column value, we can use a filter, like this:

scan 'table',{STARTROW=>'row-start-prefix',ENDROW=>'row-end-prefix',FILTER=>"SingleColumnValueFilter('cf','a',=,'binary:1')"}

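This command returns the rows in the range whose 'cf:a' value equals 1. Note that, by default, SingleColumnValueFilter also lets through rows that do not contain the 'cf:a' column at all. HBase provides many other filters, such as PrefixFilter and the CompareFilter family (for example RowFilter and ValueFilter). In future articles, I will introduce filters in more detail.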
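The effect of an equality filter on a single column, including the default treatment of rows that lack the column, can be sketched as follows. This is a toy model and not the HBase filter API; the sample rows, the function name, and the `filter_if_missing` parameter are illustrative assumptions modelled on the filter's documented option of the same meaning.

```python
# Toy model of an equality filter on one column; not the HBase filter API.
rows = {
    "r1": {"cf:a": "1", "cf:b": "x"},
    "r2": {"cf:a": "2"},
    "r3": {"cf:b": "y"},  # no cf:a column at all
}


def single_column_value_equals(rows, column, expected, filter_if_missing=False):
    """Return rowkeys whose `column` equals `expected`.

    Rows lacking the column are kept unless filter_if_missing is True.
    """
    kept = []
    for rowkey in sorted(rows):
        row = rows[rowkey]
        if column not in row:
            if not filter_if_missing:
                kept.append(rowkey)
        elif row[column] == expected:
            kept.append(rowkey)
    return kept


default_result = single_column_value_equals(rows, "cf:a", "1")
strict_result = single_column_value_equals(rows, "cf:a", "1", filter_if_missing=True)
```

With the default setting, 'r3' is kept even though it has no 'cf:a' value; the strict variant keeps only 'r1'.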

Conclusion

In this article, I introduced only the basic HBase shell commands Get and Scan. In the next article, I will start to introduce how to read HBase data through the Java API.