IMDB FOR TESTING MySQL


IMDB is a fairly large database of movies.

IMDb database can be downloaded using a Python script named IMDbPY.py



UBUNTU AND DEBIAN.

Simple install on Debian / Ubuntu:
sudo apt-get install python-imdbpy

Download the imdby2sql script from here:

http://sourceforge.net/p/imdbpy/code/ci/default/tree/bin/imdbpy2sql.py

Download plain text data files from:

ftp://ftp.fu-berlin.de/pub/misc/movies/database/

Additional mirrors:
http://www.imdb.com/interfaces/
You can use wget by creating a new directory and running:

wget -r ftp://ftp.fu-berlin.de/pub/misc/movies/database/

Create tables and populate the database:
imdbpy2sql.py -d /dir/with/plainTextDataFiles/ -u 'mysql://root:root@localhost/imdb'

Additional documentation available here:
http://imdbpy.sourceforge.net/docs/README.sqldb.txt

CENTOS AND REDHAT.

Installing IMDbPy and SQLObject

1. Install required packages using yum:
$ yum install -y gcc python python-devel openssl-devel libxml2-devel libxslt-devel zlib-devel MySQL-python python-setuptools python-pip
2. Install SQLObject using Python EasyInstall:
$ easy_install -U SQLObject
3. Download IMDbPy from this page into the MySQL server, extract it and start the installation process:
$ wget http://prdownloads.sourceforge.net/imdbpy/IMDbPY-4.9.tar.gz
$ tar -xzf IMDbPY-4.9.tar.gz
$ cd IMDbPY-*
$ python setup.py install

Importing Data

1. Create a directory to dump all the data files that we will download:
$ mkdir /root/data
$ cd /root/data
2. Download only .gz file from the IMDb mirror site to /root/data :
$ wget -r --accept="*.gz" --no-directories --no-host-directories --level 1 ftp://ftp.fu-berlin.de/pub/misc/movies/database/
3. Create a database in MySQL called ‘imdb’, with user ‘imdb’ and password ‘imdb’. We will then GRANT the user to the designated database:
mysql> CREATE DATABASE imdb;
mysql> GRANT ALL PRIVILEGES ON imdb.* TO 'imdb'@'localhost' IDENTIFIED BY 'imdb';
mysql> FLUSH PRIVILEGES;
3. Start the import process with -u and -d flag:
$ imdbpy2sql.py -d /root/data/ -u 'mysql://imdb:imdb@localhost/imdb'
Take note that -d is the directory of the .gz dump files are located and -u is the connection string for our MySQL database server. You can change the connection string to any of SQLObject’s supported database such as PostgreSQL, SQLite, Firebird and MAX DB. Please refer to this documentation for details.
You will see similar output as below which indicates the importing process has started:
SCANNING movies: Last Sunset (2006) (movieID: 2130001)
SCANNING movies: Legend of Hell (2012) (movieID: 2140001)
SCANNING movies: Lifestyles of Squirrels (2011) (movieID: 2150001)
SCANNING movies: Los signos del tiempo (1983) (movieID: 2160001)
SCANNING movies: Madame T (2012) (movieID: 2170001)
SCANNING movies: Marijji ringu (2007) (movieID: 2180001)
SCANNING movies: Menculik miyabi (2010) (movieID: 2190001)
* FLUSHING MoviesCache...
Wait up until it finish and you will have large sample data to play around in your MySQL server!


Comentarios