Tuesday, June 13, 2017

SpatialHadoop Installation on Multi-Node Cluster

SpatialHadoop is a framework that adds spatial data processing support to each layer of Hadoop, namely the Storage, MapReduce, Operations, and Language layers. In this blog post, I will explain the configuration of SpatialHadoop on a 4-node Hadoop cluster. If you want to learn more about what SpatialHadoop is and how it works, check out the following:

  1. A. Eldawy, M. F. Mokbel, and C. Jonathan. "HadoopViz: A MapReduce Framework for Extensible Visualization of Big Spatial Data". IEEE ICDE 2016.
  2. A. Eldawy and M. F. Mokbel. "SpatialHadoop: A MapReduce Framework for Spatial Data". IEEE ICDE 2015.
  3. A. Eldawy and M. F. Mokbel. "Pigeon: A Spatial MapReduce Language". IEEE ICDE 2014.

Prerequisite: a running Hadoop cluster.

SpatialHadoop

(1) Download the latest version of SpatialHadoop from http://spatialhadoop.cs.umn.edu/
(2) Extract the downloaded archive into the Hadoop home directory, i.e., merge the SpatialHadoop files with Hadoop.
(3) Set JAVA_HOME in etc/hadoop/hadoop-env.sh (if you have not set it already).
(4) You can test your installation by running the examples given at http://spatialhadoop.cs.umn.edu/
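For step (3) above, the JAVA_HOME setting in hadoop-env.sh typically looks like the following; the JDK path shown here is only an example, so adjust it to wherever Java is installed on your nodes:

```shell
# In $HADOOP_HOME/etc/hadoop/hadoop-env.sh
# (example path for OpenJDK on Ubuntu; use your own JDK location)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```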
Pig
(1) Download a recent stable release of Pig from https://pig.apache.org/
(2) Unpack the downloaded Pig distribution and add the following environment variables to ~/.bashrc:
export PIG_HOME=/path/to/hadoop/pig-0.16.0
export PATH=$PATH:$PIG_HOME/bin
export PIG_CLASSPATH=$HADOOP_CONF_DIR
(3) Test the Pig installation:
pig -version
pig -help
(4) Test run: run a Pig script using Hadoop MapReduce.
  • Suppose we have a text file (student.txt) containing the following information:
001,Rajiv,Reddy,21,984802233,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata
003,Rajesh,Khanna,22,9848022339,Delhi
004,Preethi,Agarwal,21,9848022330,Pune
005,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar
006,Archana,Mishra,23,9848022335,Chennai
007,Komal,Nayak,24,9848022334,trivendram
008,Bharathi,Nambiayar,24,9848022333,Chennai
  • And a Pig script (student.pig) with the following commands:
std = LOAD './pig/student.txt' USING PigStorage(',') as (id:int, firstname:chararray, lastname:chararray, age:int, phone:chararray, city:chararray);
name = FOREACH std GENERATE firstname;
DUMP name;
  • Start the Hadoop cluster and test by running the following commands:
$ start-all.sh
$ hadoop dfs -mkdir /path/to/pig
$ hadoop dfs -copyFromLocal /path/to/pig/student.txt /path/to/pig
$ cd pig   (go to the folder where you keep your Pig script)
~/pig$ pig student.pig
(Rajiv)
(siddarth)
(Rajesh)
(Preethi)
(Trupthi)
(Archana)
(Komal)
(Bharathi)
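For readers newer to Pig Latin, the LOAD / FOREACH...GENERATE / DUMP pipeline above is just a column projection. The same logic can be sketched in plain Python (the two sample rows below are taken from student.txt; the field order matches the script's schema):

```python
import csv
import io

# Two sample rows in the same comma-delimited layout as student.txt:
# id, firstname, lastname, age, phone, city
data = """001,Rajiv,Reddy,21,984802233,Hyderabad
002,siddarth,Battacharya,22,9848022338,Kolkata"""

# LOAD ... USING PigStorage(',') -> split each line on commas
rows = list(csv.reader(io.StringIO(data)))

# FOREACH std GENERATE firstname -> project the second column
names = [row[1] for row in rows]

# DUMP name -> Pig prints one parenthesized tuple per line
for n in names:
    print(f"({n})")
```

Pig runs the same projection as a MapReduce job over HDFS instead of a local loop, which is why the DUMP output above appears as parenthesized tuples.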

Pigeon

  • Download the Pigeon source code and build the JAR with Maven:
mvn assembly:assembly
  • Also, you need to download the following two JAR files (the SpatialHadoop package already includes them under spatialhadoop-2.4.2-bin/share/hadoop/common/lib):
jts-1.13.jar
esri-geometry-api-1.2.1.jar
  • Create a folder (say, pigeon) and keep these JARs in it.
  • Also keep all the data and Pig scripts in this folder.
  • The trajectory.pig script contains the following lines:
REGISTER 'pigeon-0.2.2.jar';
REGISTER 'esri-geometry-api-1.2.1.jar';
REGISTER 'jts-1.13.jar';

IMPORT 'pigeon_import.pig';

points = LOAD './pigeon/trajectory.tsv' AS (type, time: datetime, lat:double, lon:double);

s_points = FOREACH points GENERATE ST_MakePoint(lat, lon) AS point, time;
points_by_time = ORDER s_points BY time;

points_grouped = GROUP points_by_time ALL;

lines = FOREACH points_grouped GENERATE ST_AsText(ST_MakeLine(points_by_time));

STORE lines INTO 'line';
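Conceptually, the script sorts the points chronologically and folds them into a single linestring. The grouping logic can be sketched in Python as follows (the sample coordinates and timestamps below are made up for illustration, and the WKT string is hand-built rather than produced by Pigeon's actual ST_MakeLine/ST_AsText internals):

```python
# Each record: (type, time, lat, lon), mirroring the trajectory.tsv schema
points = [
    ("gps", "2017-06-13T10:02:00", 24.90, 91.87),
    ("gps", "2017-06-13T10:00:00", 24.89, 91.86),
    ("gps", "2017-06-13T10:01:00", 24.91, 91.88),
]

# ORDER s_points BY time -> sort the records chronologically
points_by_time = sorted(points, key=lambda p: p[1])

# GROUP ... ALL, then ST_MakeLine + ST_AsText -> join the ordered
# coordinates into a single WKT LINESTRING
coords = ", ".join(f"{lat} {lon}" for _, _, lat, lon in points_by_time)
line = f"LINESTRING ({coords})"
print(line)
```

The GROUP ... ALL step in the Pig script is what collapses all points into one bag so that ST_MakeLine can emit a single geometry; without it, each point would stay in its own record.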
  • Start the Hadoop cluster and do the following:
$ start-all.sh
$ hadoop dfs -mkdir /path/to/pigeon
$ hadoop dfs -copyFromLocal /path/to/pigeon/trajectory.tsv /path/to/pigeon
$ cd pigeon   (go to the folder where you keep your Pig script and the JARs)
~/pigeon$ pig trajectory.pig
~/pigeon$ hadoop dfs -cat /path/to/line/part-r-00000

Thanks...Mahbub