Install on Mac OS X
Notes on installing Hadoop 2.6 on Mac OS X Yosemite
This post combines two articles on installing Hadoop on Mac OS X: how-to-install-hadoop-on-mac-os-x and how-to-setup-hadoop-on-mac-os-x-10-9-mavericks. I followed the combined steps successfully on my own machine, so I am writing them up here as a record and in the hope that they help someone else.
Introduction
Hadoop is an Apache Foundation project for processing very large data sets in a distributed computing environment. It can run in three modes:
Standalone:
Hadoop runs everything in a single JVM with no daemons. This mode is suited to testing and debugging MapReduce programs during development.
Pseudo-distributed:
The Hadoop daemons run on the local machine, simulating a small cluster.
Fully distributed:
The Hadoop daemons run on a real cluster of machines.
Prerequisites
Java 1.6+
Hadoop requires Java 1.6 or later. Run java -version in your terminal to see which version is installed:
➜ Downloads java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
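If you later need to point Hadoop at a particular JDK, OS X ships a helper that prints the JDK home directory. A quick check; the path shown is only an example and will differ on your machine:
$ /usr/libexec/java_home
/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home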
SSH keys
First make sure Remote Login is enabled under System Preferences > Sharing. If ssh localhost responds in a terminal, your SSH keys are already configured; if not, generate a new key pair:
ssh-keygen -t rsa -P ""
To avoid being asked for your password every time, authorize your public key for login to your own machine:
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
You should now be able to SSH into your own machine. Run:
ssh localhost
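If ssh localhost still prompts for a password, overly open permissions on the key files are a common cause. A minimal fix, assuming the default key locations:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys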
Step 1: Install Homebrew; skip to step 2 if it is already installed.
ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"
Step 2: Install Hadoop with brew; we will assume it installs Hadoop 2.6.0.
brew install hadoop
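To confirm the installation and double-check which version brew picked up, you can run the following; the exact output depends on the formula version:
$ hadoop version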
Step 3: Configure Hadoop
cd /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop
Add the following line to hadoop-env.sh:
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
hadoop-env.sh
The file is located at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/hadoop-env.sh.
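Some setups also need JAVA_HOME set explicitly in hadoop-env.sh. This is optional and not part of the original instructions; on OS X one way to do it is:
export JAVA_HOME="$(/usr/libexec/java_home)"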
Change
# Extra Java runtime options.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
to
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
Add the following lines to core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Add the following lines to hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Add the following lines to mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
That is most of the configuration done. Before starting the daemons we must format the newly installed HDFS: formatting creates an empty file system by creating the storage directories and initializing the metadata. Run:
hadoop namenode -format
(In Hadoop 2.x this form is deprecated in favour of hdfs namenode -format, but both still work.)
Starting the daemons
Make sure you can SSH into your local machine first. HDFS is started with start-dfs.sh and YARN (which runs MapReduce) with start-yarn.sh.
Next, change into the Hadoop installation directory /usr/local/Cellar/hadoop/2.6.0/sbin and run ./start-dfs.sh and ./start-yarn.sh to start Hadoop. You will see a warning here:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
This has no effect on how Hadoop runs; more on the warning later. To avoid having to change into the installation directory and run ./start-dfs.sh and ./start-yarn.sh every time you start Hadoop, edit ~/.profile and add the following two lines:
alias hstart="/usr/local/Cellar/hadoop/2.6.0/sbin/start-dfs.sh;/usr/local/Cellar/hadoop/2.6.0/sbin/start-yarn.sh"
alias hstop="/usr/local/Cellar/hadoop/2.6.0/sbin/stop-yarn.sh;/usr/local/Cellar/hadoop/2.6.0/sbin/stop-dfs.sh"
Then run $ source ~/.profile to pick up the change. After that you can start and stop Hadoop with the simple hstart and hstop aliases.
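Once the daemons are up you can confirm that they are all running with jps. On a working pseudo-distributed setup the listing usually includes the processes below; the process IDs are placeholders and will differ on your machine:
$ jps
1234 NameNode
1301 DataNode
1388 SecondaryNameNode
1456 ResourceManager
1523 NodeManager
1600 Jps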
Monitoring
We can monitor HDFS and MapReduce through the web interfaces. HDFS Administrator: http://localhost:50070 (reachable in this run). MapReduce Administrator: http://localhost:50030 (not reachable here; the likely reason is that in Hadoop 2.x the old JobTracker UI on port 50030 has been replaced by the YARN ResourceManager UI at http://localhost:8088).
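A quick command-line way to verify that the NameNode web UI is actually serving, as a convenience check rather than part of the original steps:
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070
A printed 200 means the UI is up.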
Running an example
Once installation is done you will want to check that it actually works; Hadoop ships with example jobs.
$ hadoop jar <path to the hadoop-examples file> pi 10 100
$ hadoop jar /usr/local/Cellar/hadoop/2.6.0/libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 5
The output will look something like this:
Wrote input for Map #0
Wrote input for Map #1
Starting Job
...
Job Finished in 1.685 seconds
Estimated value of Pi is 3.60000000000000000000
You can then monitor the job through the web interfaces.
HDFS NameNode: http://localhost:50070
Resource Manager: http://localhost:8088
Specific Node Information: http://localhost:8042
Through these you can browse the HDFS filesystem and fetch the output files.
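You can also inspect HDFS from the command line instead of the web UI. A minimal sketch, where the paths are only examples:
$ hadoop fs -mkdir -p /user/$(whoami)
$ hadoop fs -ls /
$ hadoop fs -ls /user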
Other references
How to Install Hadoop on Mac OS X (the original English article is reproduced below for reference)
STEP 1: INSTALL HOMEBREW
$ ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"
STEP 2: INSTALL HADOOP
$ brew install hadoop
Let’s assume that brew installs Hadoop 1.1.2.
STEP 3: CONFIGURE HADOOP
$ cd /usr/local/Cellar/hadoop/1.1.2/libexec
Add the following line to conf/hadoop-env.sh:
export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
Add the following lines to conf/core-site.xml inside the configuration tags:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
Add the following lines to conf/hdfs-site.xml inside the configuration tags:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
Add the following lines to conf/mapred-site.xml inside the configuration tags:
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
STEP 4: ENABLE SSH TO LOCALHOST
Go to System Preferences > Sharing.
Make sure “Remote Login” is checked.
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
STEP 5: FORMAT HADOOP FILESYSTEM
$ bin/hadoop namenode -format
STEP 6: START HADOOP
$ bin/start-all.sh
Make sure that all Hadoop processes are running:
$ jps
Run a Hadoop example:
$ bin/hadoop jar /usr/local/Cellar/hadoop/1.1.2/libexec/hadoop-examples-1.1.2.jar pi 10 100
Hadoop logs: /usr/local/Cellar/hadoop/1.1.2/libexec/logs/
Web interface for Hadoop NameNode: http://localhost:50070/
Web interface for Hadoop JobTracker: http://localhost:50030/
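The original post does not cover shutting down, but the counterpart of start-all.sh for stopping all of the daemons is:
$ bin/stop-all.sh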