Analyse Mysqldump File with Python
A recent project required loading file data into HBase. Some points from that work are introduced here, and I hope they are helpful for everyone.
Nowadays big data technology is applied in many industrial and technical areas. As a finance technology company, we have tons of data to process each day. Many tools for this have been developed by internet companies such as Google, Amazon, Microsoft, and Netflix.
This chapter will not introduce those big data platforms; they may be covered in a later series of chapters. Here I will focus on processing file data with Python.
The data to be processed is stored in many files that were created with the mysqldump tool. The total size of the data is about 500 GB. My work focuses on parsing the files and extracting
INSERT records, and then dealing with the records by
As we know, a mysqldump file is plain text and contains a lot of miscellaneous information that we don’t have to process. So I need to work out how to parse out the useful information. Some points are listed below.
As we know, records in a mysqldump file have the form INSERT INTO .... VALUES.., and one INSERT record may contain multiple rows. The key point is to work out how many fields follow the VALUES keyword and how many rows there are in one INSERT record.
The first language I thought of was Python, which makes it convenient to build a parsing program.
The dump files are gigabytes in size and gzip-compressed (.gz suffix), so machine memory is one of the factors we need to watch in the program.
If you want to speed up parsing, you can use multiple threads.
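The memory point above can be sketched as follows. This is a minimal illustration, not the original program: it assumes each INSERT statement occupies a single line (as mysqldump normally writes them) and that the dump is UTF-8 text. The function name iter_insert_lines is hypothetical.

```python
import gzip


def iter_insert_lines(path):
    """Yield the INSERT statements of a gzip-compressed dump, one at a time."""
    # gzip.open in text mode decompresses as it reads, so only the current
    # line is held in memory rather than the whole multi-gigabyte file.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as dump:
        for line in dump:
            # Skip comments, DDL, LOCK/UNLOCK and other miscellaneous lines.
            if line.startswith("INSERT INTO"):
                yield line.rstrip("\n")
```

Because this is a generator, the caller can process one statement and discard it before the next one is read.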
The parsing program is written in Python.
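As a rough sketch of what such a parser can look like (an illustration under stated assumptions, not the actual program), the code below first splits the VALUES clause into rows with a small quote-aware scanner and then hands each row to the csv module to split it into fields. It assumes a single-line statement in which strings are single-quoted and escaped with backslashes. The names split_rows and parse_insert are hypothetical. Two known limitations: an unquoted NULL comes back as the string 'NULL', and escape sequences such as \n are not decoded (csv drops the backslash and keeps the following character).

```python
import csv


def split_rows(values):
    """Yield the text inside each top-level (...) group of a VALUES clause."""
    in_quote = False
    escaped = False
    start = None
    for i, ch in enumerate(values):
        if escaped:
            escaped = False           # the character after a backslash is literal
        elif ch == "\\":
            escaped = True
        elif ch == "'":
            in_quote = not in_quote   # entering or leaving a quoted string
        elif not in_quote:
            if ch == "(" and start is None:
                start = i + 1         # first character after the opening paren
            elif ch == ")" and start is not None:
                yield values[start:i]
                start = None


def parse_insert(statement):
    """Return (table, rows) for one INSERT ... VALUES statement."""
    head, sep, values = statement.partition(" VALUES ")
    if not sep or not head.startswith("INSERT INTO "):
        raise ValueError("not an INSERT ... VALUES statement")
    # The table name is the first token after INSERT INTO, minus backticks.
    table = head[len("INSERT INTO "):].split(" ", 1)[0].strip("`")
    reader = csv.reader(
        split_rows(values),           # each row is treated as one csv line
        delimiter=",",
        quotechar="'",
        escapechar="\\",
        doublequote=False,
        strict=True,                  # raise csv.Error on malformed input
    )
    return table, list(reader)
```

For example, parse_insert("INSERT INTO `t` VALUES (1,'a, b'),(2,NULL);") should give the table name t and two rows of two fields each, so len(rows) is the number of rows in the record and len(rows[0]) is the number of fields.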
Generally speaking, the parsing program uses Python's csv module to analyse the INSERT records of the dump file. If the dump file is not in the standard format, parsing will fail. If there are tons of dump files, you can consider using multiple processes or multiple threads.
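One way to spread many dump files over several workers is sketched below, using the standard concurrent.futures module with one file per task. This is only an assumption about how the work could be divided, not the original setup. The names count_inserts and count_all are hypothetical, and counting INSERT statements stands in for whatever per-file processing is really needed.

```python
import gzip
from concurrent.futures import ProcessPoolExecutor


def count_inserts(path):
    """Count the INSERT statements in one gzip-compressed dump file."""
    count = 0
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as dump:
        for line in dump:
            if line.startswith("INSERT INTO"):
                count += 1
    return path, count


def count_all(paths, workers=4, executor_cls=ProcessPoolExecutor):
    """Process the dump files in parallel and return {path: insert_count}."""
    # Pass ThreadPoolExecutor as executor_cls to use threads instead of
    # processes; the rest of the code stays the same.
    with executor_cls(max_workers=workers) as pool:
        return dict(pool.map(count_inserts, paths))
```

Processes are the default here because parsing is mostly CPU-bound work, which Python threads do not run in parallel. When using processes, call count_all from under an `if __name__ == "__main__":` guard so that worker processes can import the script safely.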
If you want to reprint this article, please credit the original author. Please let me know if you have any questions about the article. You are welcome to comment here or email firstname.lastname@example.org
Article Title: Analyse Mysqldump File with Python
Publish Time: 2019-07-26, 23:00:00
Last Updated: 2019-07-28, 21:14:55
Original Link: http://zendwind.com/2019/07/26/python-read-dump/
Copyright: "Signature-Non commercial-Reservation 4.0". Any reprint must retain attribution to the original author, thank you!