Load huge or large files into MySQL database

Load a very large file into a MySQL database table using MySQL's built-in bulk load utility (LOAD DATA) and Python, on an ordinary local computer. I am using a machine with 8 GB of RAM and a 4-core consumer processor.

Step 1:

Install required libraries:
Python library required: mysql-connector-python

pip3 install mysql-connector-python
OR
pip install mysql-connector-python

Step 2:

Setting the variables:

  1. Data File: the path to the data file.
  2. Delimiter: the field delimiter, e.g. ',' or '\t'.
  3. New Line (Line Termination) Character: the line terminator.
  4. Header: always set this to True. This tutorial assumes the very first line of the file contains the column names.
  5. MySQL Host: host name/IP of the MySQL server.
  6. MySQL Port: port of the same.
  7. MySQL DB: the MySQL database name.
  8. MySQL Table: the table name into which you want to import the data.
  9. MySQL User: username for the MySQL server.
  10. MySQL Password: password for that username.
# Variables/Configurations
data_file = r'/home/subha/Desktop/myworks/dataFiles/imdb/name.basics.tsv/data.tsv'
delimiter='\t'
newline = '\n'
header=True
mysql_host = "localhost"
mysql_port = "3306"
mysql_db = "myworks"
mysql_table = "imdb_name_basics_stg"
mysql_user = "your username"
mysql_password = "your password"

Note: I am using the IMDB name.basics dataset, which has around 12 million rows and is about 738 MB on disk. Here is the link to the file:
https://datasets.imdbws.com/name.basics.tsv.gz
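
The dataset downloads as a gzip archive, so it has to be decompressed to a plain .tsv before loading. A minimal sketch using Python's standard gzip module (the file paths in the example call are assumptions; adjust them to your setup):

```python
import gzip
import shutil

def gunzip(src_path, dst_path):
    # Stream-decompress a .gz file to disk without holding it all in memory
    with gzip.open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)

# Example (assumed paths):
# gunzip("name.basics.tsv.gz", "data.tsv")
```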

Step 3:
Import the python library

# import the mysql library
import mysql.connector as mysql

Step 4:
Read the first line for the header/columns.

# read the first line of the input file to create the header
with open(data_file, "r") as inFile:
    header = inFile.readline()

Step 5:
Extract column lists

# Extract column lists
column_list = header.split(delimiter)
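
For the IMDB file, the first line contains the six column names, so the split yields one entry per column. Note the trailing newline on the last entry, which the later .replace("\n", "") takes care of. A quick illustration with that header line hard-coded:

```python
# First line of name.basics.tsv (tab-separated column names)
sample_header = "nconst\tprimaryName\tbirthYear\tdeathYear\tprimaryProfession\tknownForTitles\n"
column_list = sample_header.split("\t")
print(column_list)
# ['nconst', 'primaryName', 'birthYear', 'deathYear', 'primaryProfession', 'knownForTitles\n']
```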

Step 6:
Construct Create Table Statement

# Construct Create table statement
create_table_statement = "CREATE TABLE " + mysql_db + "." + mysql_table + "("
for a_field in column_list:
    create_table_statement += a_field.replace("\n","") + " text,"
create_table_statement = create_table_statement[:-1] + ")"
print("Create Table Statement:")
print(create_table_statement)
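
One caveat with the loop above: a column name that clashes with a MySQL reserved word would break the DDL. A variant that backtick-quotes each identifier (a sketch with a hypothetical helper, not part of the original script):

```python
def build_create_statement(db, table, columns):
    # Strip the newline from the last column and backtick-quote every identifier
    cols = ", ".join("`" + c.replace("\n", "") + "` text" for c in columns)
    return "CREATE TABLE " + db + "." + table + " (" + cols + ")"

print(build_create_statement("myworks", "imdb_name_basics_stg",
                             ["nconst", "primaryName\n"]))
# CREATE TABLE myworks.imdb_name_basics_stg (`nconst` text, `primaryName` text)
```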

Step 7:
Connecting to the MySQL DB

# Connect to mysql DB
db = mysql.connect(
    host = mysql_host,
    user = mysql_user,
    passwd = mysql_password,
    database = mysql_db,
    allow_local_infile=True
)
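
Note that allow_local_infile=True enables LOCAL INFILE only on the client side. If the load later fails with an error such as "Loading local data is disabled", the server must also permit it; on servers where it is switched off (an assumption about your setup), an administrator can enable it with:

```sql
-- Run on the MySQL server with administrative privileges
SET GLOBAL local_infile = 1;
```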

Step 8:
Creating Cursor:

cursor = db.cursor()

Step 9:
Creating Table

# Creating table (DDL hence auto commit)
cursor.execute(create_table_statement)

Step 10:
Construct Load Statement

# Constructing load statement
load_statement = "LOAD DATA LOCAL INFILE '" + data_file + "'\nINTO TABLE " + mysql_db + "." + mysql_table + "\nFIELDS TERMINATED BY '" + delimiter + "'" + "\nLINES TERMINATED BY '" + newline + "'" + "\nIGNORE 1 LINES\n("
for a_field in column_list:
    load_statement += a_field.replace("\n","") + ","
load_statement = load_statement[:-1] + ");"
print("Load Table Statement:")
print(load_statement)
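
The same construction can be packaged as a standalone function, which is convenient to sanity-check before touching the database (a hypothetical helper, not part of the original script):

```python
def build_load_statement(data_file, db, table, columns, delimiter="\t", newline="\n"):
    # Join the column names (minus any trailing newline) into the column list
    cols = ",".join(c.replace("\n", "") for c in columns)
    return ("LOAD DATA LOCAL INFILE '" + data_file + "'\n"
            "INTO TABLE " + db + "." + table + "\n"
            "FIELDS TERMINATED BY '" + delimiter + "'\n"
            "LINES TERMINATED BY '" + newline + "'\n"
            "IGNORE 1 LINES\n(" + cols + ");")
```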

Step 11:
Loading the file and finishing

cursor.execute(load_statement)
db.commit()
db.close()
print("Complete")

The script issues the statement below to load the data file into the MySQL DB:

LOAD DATA LOCAL INFILE '/home/subha/Desktop/myworks/dataFiles/imdb/name.basics.tsv/data.tsv'
INTO TABLE myworks.imdb_name_basics_stg
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(nconst,primaryName,birthYear,deathYear,primaryProfession,knownForTitles);

The loading takes around 3 to 4 minutes for all 12M records. IMDB publishes even larger files, but the same method loads those in roughly 10 to 12 minutes at most.
Here is the entire script in one piece. Just change the configuration section and run it.

# Variables/Configurations
data_file = r'/home/subha/Desktop/myworks/dataFiles/imdb/name.basics.tsv/data.tsv'
delimiter='\t'
newline = '\n'
header=True
mysql_host = "localhost"
mysql_port = "3306"
mysql_db = "myworks"
mysql_table = "imdb_name_basics_stg"
mysql_user = "your username"
mysql_password = "your password"

# import the mysql library
import mysql.connector as mysql
# ----------------------------------
# read first line of the input file to create header
with open(data_file, "r") as inFile:
    header = inFile.readline()
# ---------------------------------------------------
# Extract column lists
column_list = header.split(delimiter)
# ---------------------------------------------------
# Construct Create table statement
create_table_statement = "CREATE TABLE " + mysql_db + "." + mysql_table + "("
for a_field in column_list:
    create_table_statement += a_field.replace("\n","") + " text,"
create_table_statement = create_table_statement[:-1] + ")"
print("Create Table Statement:")
print(create_table_statement)
print("---------------------------------------------------------------------------")
# ---------------------------------------------------
# Connect to mysql DB
db = mysql.connect(
    host = mysql_host,
    user = mysql_user,
    passwd = mysql_password,
    database = mysql_db,
    allow_local_infile=True
)
# ---------------------------------------------------
cursor = db.cursor()
# Creating table (DDL hence auto commit)
cursor.execute(create_table_statement)
# --------------------------------------
# Constructing load statement
load_statement = "LOAD DATA LOCAL INFILE '" + data_file + "'\nINTO TABLE " + mysql_db + "." + mysql_table + "\nFIELDS TERMINATED BY '" + delimiter + "'" + "\nLINES TERMINATED BY '" + newline + "'" + "\nIGNORE 1 LINES\n("
for a_field in column_list:
    load_statement += a_field.replace("\n","") + ","
load_statement = load_statement[:-1] + ");"
print("Load Table Statement:")
print(load_statement)
print("---------------------------------------------------------------------------")
# -------------------------------------
cursor.execute(load_statement)
db.commit()
db.close()
print("Complete")

Happy Coding!!