Hadoop 2.8.0 – Debian 8.8 – IBM Power – part IV

In this series of articles, parts I through IV, we build a 3-node cluster to run Hadoop 2.8.0. We will use an IBM POWER5 9117-570 machine. This environment is a study lab only. The virtual machines will be configured as follows:

  • lindev01 – NameNode
  • lindev02 – DataNode1
  • lindev03 – DataNode2
  • lindev04 – ClientNode

At this point we already have Hadoop 2.8.0 installed and the NameNode configured; now let's add the DataNodes to the master. Below is the NameNode's /etc/hosts configuration with the DataNodes' IPs:

# Hadoop environment hosts
192.168.0.13   lindev01  # Name Node
192.168.0.14   lindev02  # Data Node
192.168.0.15   lindev03  # Data Node
192.168.0.16   lindev04  # Client Node

Now let's tell the NameNode (master) the hostnames of the DataNodes. This is configured in the $HADOOP_HOME/etc/hadoop/slaves file:

$ cat $HADOOP_HOME/etc/hadoop/slaves
lindev02
lindev03
$ 

After adding the DataNodes to the master, let's configure each of the DataNodes.

Since the DataNodes were set up exactly like the NameNode, they have the same packages, the same environment variables, and even Hadoop installed under /srv/hadoop. Now we need to make sure they have the same /etc/hosts file:

# Hadoop environment hosts
192.168.0.13   lindev01  # Name Node
192.168.0.14   lindev02  # Data Node
192.168.0.15   lindev03  # Data Node
192.168.0.16   lindev04  # Client Node

Hadoop configuration on the DataNodes

Let's exchange SSH keys between the master node and the DataNodes for the hadoop user. Copy the contents of the ~/.ssh/id_rsa.pub file from the master (lindev01) and append it to the ~/.ssh/authorized_keys file on each DataNode. Once the keys have been copied, log in to each of the DataNodes one by one:

hadoop@lindev01:/$ ssh hadoop@lindev02
The authenticity of host 'lindev02 (192.168.0.14)' can't be established.
ECDSA key fingerprint is 37:03:c4:71:07:f9:bf:07:91:f9:14:7e:c2:c8:cb:8d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'lindev02,192.168.0.14' (ECDSA) to the list of known hosts.

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sun May 14 13:17:32 2017 from lindev01
hadoop@lindev02:~$ exit
logout
Connection to lindev02 closed.
hadoop@lindev01:/$
hadoop@lindev01:/$ ssh hadoop@lindev03
The authenticity of host 'lindev03' can't be established.
ECDSA key fingerprint is 03:0d:65:6c:aa:f2:45:56:f9:8d:e4:01:17:2c:98:39.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'lindev03,192.168.0.15' (ECDSA) to the list of known hosts.

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright. 

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent 
permitted by applicable law. 
Last login: Sun May 14 13:17:40 2017 from 
hadoop@lindev03:~$ exit 
logout 
Connection to lindev03 closed. 
hadoop@lindev01:/$ 
hadoop@lindev01:/$ ssh hadoop@lindev04 
The authenticity of host 'lindev04' can't be established. 
ECDSA key fingerprint is 85:e8:74:ac:48:af:de:29:be:52:fa:cf:c1:22:30:4a. 
Are you sure you want to continue connecting (yes/no)? yes 
Warning: Permanently added 'lindev04,192.168.0.16' (ECDSA) to the list of known hosts.

The programs included with the Debian GNU/Linux system are free software; 
the exact distribution terms for each program are described in the 
individual files in /usr/share/doc/*/copyright. 

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent 
permitted by applicable law. 
Last login: Sun May 14 13:17:47 2017 from  
hadoop@lindev04:~$ exit
logout 
Connection to lindev04 closed. 
hadoop@lindev01:/$ 
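The manual key copy described above can also be scripted with OpenSSH's ssh-copy-id (a minimal sketch, assuming ssh-copy-id is installed on the NameNode and run there as the hadoop user):

```shell
# On the Name Node (lindev01), as the hadoop user.
# Generate a key pair if one does not exist yet (no passphrase, lab use only).
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Append the public key to ~/.ssh/authorized_keys on each Data Node
# and on the Client Node.
for node in lindev02 lindev03 lindev04; do
    ssh-copy-id "hadoop@${node}"
done
```

ssh-copy-id also creates ~/.ssh on the remote side with the right permissions, which avoids the most common cause of passwordless login failing.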

Create the directory structure under /srv/hadoop_work on each of the DataNodes:

# mkdir -p /srv/hadoop_work/hdfs/datanode
# mkdir -p /srv/hadoop_work/yarn/local
# mkdir -p /srv/hadoop_work/yarn/log
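Since the NameNode now has passwordless SSH to each DataNode, the same directories can also be created remotely in one loop (a sketch, assuming the hadoop user may write under /srv on the DataNodes):

```shell
# Run on the Name Node; creates the work directories on each Data Node
# over SSH instead of logging in to each machine by hand.
for node in lindev02 lindev03; do
    ssh "${node}" 'mkdir -p /srv/hadoop_work/hdfs/datanode \
                            /srv/hadoop_work/yarn/local \
                            /srv/hadoop_work/yarn/log'
done
```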

Now we will copy the Hadoop installation from the NameNode to the DataNodes. This transfers all the installation files as well as the XML configuration files:

$ scp -r /srv lindev02:/
$ scp -r /srv lindev03:/
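For later configuration changes, re-copying the whole tree is unnecessary; rsync can push only the changed configuration files (a sketch, assuming rsync is installed on all nodes and $HADOOP_HOME is the same path everywhere):

```shell
# Re-sync only the Hadoop configuration directory to each Data Node.
# The trailing slash copies the directory's contents; --delete keeps
# the remote copy identical to the master's.
for node in lindev02 lindev03; do
    rsync -a --delete "$HADOOP_HOME/etc/hadoop/" "${node}:$HADOOP_HOME/etc/hadoop/"
done
```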

With the copy finished, we can start Hadoop!

$ $HADOOP_HOME/sbin/start-dfs.sh
17/05/14 14:34:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [lindev01]
The authenticity of host 'lindev01 (127.0.1.1)' can't be established.
ECDSA key fingerprint is d1:db:a9:df:71:26:d5:75:68:b9:5d:30:76:3f:09:6e.
Are you sure you want to continue connecting (yes/no)? yes
lindev01: Warning: Permanently added 'lindev01' (ECDSA) to the list of known hosts.
lindev01: starting namenode, logging to /srv/hadoop-2.8.0/logs/hadoop-hadoop-namenode-lindev01.out
lindev02: starting datanode, logging to /srv/hadoop-2.8.0/logs/hadoop-hadoop-datanode-lindev02.out
lindev03: starting datanode, logging to /srv/hadoop-2.8.0/logs/hadoop-hadoop-datanode-lindev03.out
Starting secondary namenodes 
...
hadoop@lindev01:/$ 

Below we can check the daemons running on the NameNode and on the DataNodes:

hadoop@lindev01:/$ jps
1494 Jps
1200 NameNode
1349 SecondaryNameNode
hadoop@lindev01:/$ 
hadoop@lindev02:~$ jps
1141 Jps
1072 DataNode
hadoop@lindev02:~$ 

hadoop@lindev03:/$ jps
1054 DataNode
1122 Jps
hadoop@lindev03:/$

Now let's create some directories in the Hadoop filesystem. These HDFS directories will be used for YARN MapReduce staging, YARN logs, and the Job History Server:

hadoop@lindev01:/$ hadoop fs -mkdir /tmp
hadoop@lindev01:/$ hadoop fs -chmod -R 1777 /tmp

hadoop@lindev01:/$ hadoop fs -mkdir /user
hadoop@lindev01:/$ hadoop fs -chmod -R 1777 /user

hadoop@lindev01:/$ hadoop fs -mkdir /user/app
hadoop@lindev01:/$ hadoop fs -chmod -R 1777 /user/app

hadoop@lindev01:/$ hadoop fs -mkdir -p /var/log/hadoop-yarn
hadoop@lindev01:/$ hadoop fs -chmod -R 1777 /var/log/hadoop-yarn

hadoop@lindev01:/$ hadoop fs -mkdir -p /var/log/hadoop-yarn/apps
hadoop@lindev01:/$ hadoop fs -chmod -R 1777 /var/log/hadoop-yarn/apps
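The five mkdir/chmod pairs above follow one pattern, so they could also be written as a short loop (a sketch over the same paths; hadoop must be on the PATH and HDFS must be running):

```shell
# Create each HDFS directory and set the sticky bit (1777) on it,
# matching the permissions used above for YARN staging and log aggregation.
for dir in /tmp /user /user/app /var/log/hadoop-yarn /var/log/hadoop-yarn/apps; do
    hadoop fs -mkdir -p "$dir"
    hadoop fs -chmod -R 1777 "$dir"
done
```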
# Now let's list the directory structure created in HDFS
hadoop@lindev01:/$ hadoop fs -ls -R /
drwxrwxrwt   - hadoop supergroup          0 2017-05-14 14:40 /tmp
drwxrwxrwt   - hadoop supergroup          0 2017-05-14 14:42 /user
drwxrwxrwt   - hadoop supergroup          0 2017-05-14 14:42 /user/app
drwxr-xr-x   - hadoop supergroup          0 2017-05-14 14:43 /var
drwxr-xr-x   - hadoop supergroup          0 2017-05-14 14:43 /var/log
drwxrwxrwt   - hadoop supergroup          0 2017-05-14 14:44 /var/log/hadoop-yarn
drwxrwxrwt   - hadoop supergroup          0 2017-05-14 14:44 /var/log/hadoop-yarn/apps
hadoop@lindev01:/$

Now let's start YARN:

hadoop@lindev01:/$ $HADOOP_HOME/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /srv/hadoop-2.8.0/logs/yarn-hadoop-resourcemanager-lindev01.out
lindev03: starting nodemanager, logging to /srv/hadoop-2.8.0/logs/yarn-hadoop-nodemanager-lindev03.out
lindev02: starting nodemanager, logging to /srv/hadoop-2.8.0/logs/yarn-hadoop-nodemanager-lindev02.out
hadoop@lindev01:/$ 

Now let's start the MapReduce Job History Server:

hadoop@lindev01:/$ $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /srv/hadoop/logs/mapred-hadoop-historyserver-lindev01.out
hadoop@lindev01:/$

Below we can check the running daemons on the NameNode and on the DataNodes once again:

hadoop@lindev01:/$ jps
1200 NameNode
2210 JobHistoryServer
2262 Jps
1349 SecondaryNameNode
1910 ResourceManager
hadoop@lindev01:/$ 

hadoop@lindev02:~$ jps
1173 NodeManager
1072 DataNode
1302 Jps
hadoop@lindev02:~$ 

hadoop@lindev03:/$ jps
1054 DataNode
1282 Jps
1163 NodeManager
hadoop@lindev03:/$

Very well! The cluster is running! Let's test the Hadoop filesystem by uploading a file to HDFS:

hadoop@lindev01:/$ hadoop fs -mkdir /analysis
hadoop@lindev01:/$ hadoop fs -ls /
Found 4 items
drwxr-xr-x   - hadoop supergroup          0 2017-05-14 14:57 /analysis
drwxrwxrwt   - hadoop supergroup          0 2017-05-14 14:40 /tmp
drwxrwxrwt   - hadoop supergroup          0 2017-05-14 14:42 /user
drwxr-xr-x   - hadoop supergroup          0 2017-05-14 14:43 /var
hadoop@lindev01:/$ 

hadoop@lindev01:/$ ls -l
total 4
-rw-r--r-- 1 hadoop hadoop 21 May 15 11:26 teste.csv
hadoop@lindev01:/$

hadoop@lindev01:/$ hadoop fs -put $(pwd)/teste.csv /analysis/teste.csv
hadoop@lindev01:/$

hadoop@lindev01:/$ hadoop fs -ls /analysis
Found 1 items
-rw-r--r--   2 hadoop supergroup         21 2017-05-15 12:08 /analysis/teste.csv
hadoop@lindev01:/$ hadoop fs -tail /analysis/teste.csv
Teste, alguma, coisa
hadoop@lindev01:/$

Douglas Ribas de Mattos
E-mail: douglasmattos0@gmail.com
LinkedIn: https://www.linkedin.com/in/douglasmattos0/
Blog: http://www.douglasmattos.com
Twitter: @douglasmattos0
