Today, I want to try to write the first blog with english.Maybe someone feel doubtful for its necessity. Obviously, our way of thinking is inferior to those of foreginers, and is hard to organize the whole article.However, What I want to say is that writing need take a log of time to train and summerize. Theory of 10,000 hours training is available for any fields, though my level can not hold a candle to write a professional article so far, but I believe quantitative may cause a qualitative change.

Introduction of work

My career now is engaged in big data platform construction and operation. I have been focusing on the Hadoop ecosystem,such as hbase,spark,kafka and map/reduce for 3 years plus. My primary responsibilities include system programme, demand-focused, business access, conceptual design, data access and storage, data reconciliation, and data repair,etc.

As the rapid growth of finance business, the data scale has been reached P-level, larger than G-level or T-level data of passed time. The history data had been stored in multi MySQL databases, which exposed lots of problems, such as weak of horizontal expansion, complexity of archive policies, hard of business access, weak of fault tolerance, inconveniences of data move. These of problems have been plagued by colleagues for a long time, until we decided to use HBase as our storage platform of history data. Why HBase could resolve these remnant problems? Next, I will introduce HBase briefly.

Introduction of HBase

HBase is a distributed and column-oriented database based on hadoop, which is derived from Bigtable of Google. It is very popular in worldwide companies, and has lots of application scenarios. What characteristics it have been? I summerize the following five points:

  • Large-scale data storage, P-level, in contrast with T-level of MySQL;
  • High performance of read and write, random access, sorted write;
  • Flexibility of horizontal expansion, no influence for existed nodes;
  • High availability, Active/Standby Master, supporting seamless handover, ensured the healthy of cluster.
  • Bulit on cheap machines, greatly reducing deployment costs.

Anyway, the above of five points are basic, and there are also some other charactertics not being listed. As these advantages of HBase, our boss decided to try to use it as our data storage platform, and aimed at replacing the history databases, supporting data storing into HBase real-time, and querying data at ms-level.

At initial stage, I was totally unknown HBase. Fortunately, HBase has been used in other department, and they have accumulated many experiences. So we decided to learn from them, after one month learning, we basically mastered the deployment of HBase and some pits of HBase, especially the configuration of HBase, which has hundreds of items. Meanwhile, we used the real-time data sync platform, which named TDBANK internally. With the help of TDBank, we can access business data conviently. Its essence is data acquisition (similar canal of alibaba), data subscribe and data publish (similar of kafka), data consumer uses storm to write data into HBase streamly.

With the help of these perfect platform components, Data from MySQL to HBase is flexible, fast, and safe. It reduces our a lot of work, and make us focus on the business logic.


This article just introduced some work and something about HBase, there are still lots of aspects not being involved. Next time, I will introduce more details of HBase. Thank you for your reading,see you next time!