A Simple Getting-Started Hadoop Example (Linux Environment, Java Code)

Published: 2022-03-26. Source: the web.

Overview

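This article uses a Linux desktop development environment (UbuntuKylin 16.04) and the Eclipse IDE. It does not cover configuring the Hadoop or Java environments; it covers how to start and stop Hadoop, and a simple HDFS example written in Java.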

Starting and stopping Hadoop HDFS

First, locate the Hadoop installation directory. In this example, Hadoop is installed at /home/ubuntu/data/hadoop, i.e. ~/data/hadoop.

Hadoop ships with a number of executable scripts; only three of them are covered here:

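sbin/start-dfs.sh: starts the HDFS services.
sbin/stop-all.sh: stops all Hadoop services. (Hadoop 2.x marks this script as deprecated in favor of sbin/stop-dfs.sh and sbin/stop-yarn.sh, but it still works.)
bin/hdfs: used to manage HDFS, for example to list the files in the file system.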

Start HDFS:

$ cd ~/data/hadoop
$ sbin/start-dfs.sh

Stop HDFS:

$ cd ~/data/hadoop
$ sbin/stop-all.sh

Checking whether HDFS started correctly

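To check whether HDFS has started, see whether port 9000 is in use (9000 is the port in hdfs://localhost:9000, the fs.defaultFS address used by the Java code in this article). Run the following command and look for a listening entry on port 9000: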

$ netstat -tunple
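If netstat is not at hand, roughly the same check can be done from Java by trying to open a TCP connection to localhost:9000. This is a minimal sketch, not part of the original tutorial: the class name PortCheck and the 1-second timeout are illustrative assumptions. A successful connection only shows that something is listening on the port, not that it is the NameNode.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs milliseconds. */
    public static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // 9000 matches fs.defaultFS = hdfs://localhost:9000 used in this article.
        boolean up = isListening("localhost", 9000, 1000);
        System.out.println(up ? "Port 9000 is in use (HDFS is probably running)"
                              : "Nothing is listening on port 9000");
    }
}
```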

Changing Eclipse's default JDK

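After creating a project named HelloWorld, right-click the project and choose Build Path > Configure Build Path….

Click the Libraries tab at the top center, then click Add Library… on the right.

Select the first option, JRE System Library, and click Next at the bottom.

On the next page, choose the JDK the project should use (for example, by selecting it under Alternate JRE).

Click Finish, and you are done.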
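To confirm which JDK the project actually runs on after this change, a small class can print the standard system properties java.version, java.vendor and java.home. This is a sketch for verification only; the class name JdkInfo is hypothetical.

```java
public class JdkInfo {

    /** Returns a one-line summary of the JVM that is running this code. */
    public static String describe() {
        String version = System.getProperty("java.version");
        String vendor = System.getProperty("java.vendor");
        String home = System.getProperty("java.home");
        return "Java " + version + " (" + vendor + ") at " + home;
    }

    public static void main(String[] args) {
        // Run from Eclipse (Run As > Java Application); java.home should point
        // into the JDK selected above rather than a runtime bundled with Eclipse.
        System.out.println(describe());
    }
}
```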

Adding dependency jar files to the Eclipse project

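After creating the project, make sure it uses the installed JDK rather than the one Eclipse uses by default. Adding the Hadoop jars is then straightforward. First, locate the Hadoop directory: /home/ubuntu/data/hadoop.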

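Under the Hadoop directory, go to share/hadoop/hdfs.

Open its lib folder; all of the jar files in it need to be added to the project dependencies.

hadoop-hdfs-2.7.5.jar (directly in share/hadoop/hdfs) also needs to be added to the project dependencies.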

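Under the Hadoop directory, go to share/hadoop/common.

Open its lib folder; all of the jar files in it need to be added to the project dependencies. hadoop-common-2.7.5.jar (directly in share/hadoop/common) also needs to be added.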
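The four locations above can also be gathered programmatically, which helps to check that nothing was missed, or to compile outside Eclipse with javac -cp. This sketch assumes the install path /home/ubuntu/data/hadoop and the 2.7.5 jar names used in this article; the class JarCollector and its methods are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class JarCollector {

    /** Adds every *.jar directly inside dir (non-recursive) to out, if dir exists. */
    static void addJarsIn(Path dir, List<Path> out) throws IOException {
        if (!Files.isDirectory(dir)) {
            return;
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : stream) {
                out.add(jar);
            }
        }
    }

    /** Collects share/hadoop/{hdfs,common}/lib/*.jar plus hadoop-{hdfs,common}-VERSION.jar. */
    public static List<Path> collect(Path hadoopHome, String version) throws IOException {
        List<Path> jars = new ArrayList<>();
        for (String module : new String[] {"hdfs", "common"}) {
            Path moduleDir = hadoopHome.resolve("share/hadoop").resolve(module);
            addJarsIn(moduleDir.resolve("lib"), jars);
            Path mainJar = moduleDir.resolve("hadoop-" + module + "-" + version + ".jar");
            if (Files.isRegularFile(mainJar)) {
                jars.add(mainJar);
            }
        }
        return jars;
    }

    public static void main(String[] args) throws IOException {
        List<Path> jars = collect(Paths.get("/home/ubuntu/data/hadoop"), "2.7.5");
        List<String> parts = new ArrayList<>();
        for (Path jar : jars) {
            parts.add(jar.toString());
        }
        // A classpath string usable with javac -cp / java -cp.
        System.out.println(String.join(File.pathSeparator, parts));
    }
}
```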

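Next, import all of the dependency jars found above into the project. In Eclipse, right-click the project, choose Build Path > Configure Build Path…, and on the Libraries tab click Add External JARs… to select them; Baidu Jingyan also has step-by-step guides for this.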
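After importing, a quick way to confirm that the jars really are on the build path is to try loading a few Hadoop classes by name. Configuration and FileSystem are used by the HDFSApi code; DistributedFileSystem comes from hadoop-hdfs-2.7.5.jar. The helper class ClasspathCheck is hypothetical.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be found on the current classpath. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only locate and load the class, don't run static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.fs.FileSystem",
            "org.apache.hadoop.hdfs.DistributedFileSystem"
        };
        for (String name : required) {
            System.out.println((isOnClasspath(name) ? "found:   " : "MISSING: ") + name);
        }
    }
}
```

Any class reported as MISSING points to a jar that was not added in the previous step.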

Writing the Java code

import org.apache.hadoop.fs.*;
import org.apache.hadoop.conf.Configuration;
import java.io.*;

public class HDFSApi {

    /**
     * test: check whether a file exists in HDFS
     * @param conf: configuration
     * @param path: the file to check
     * @return true if the file exists
     * @throws IOException
     */
    public static boolean test(Configuration conf, String path) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        return fs.exists(new Path(path));
    }

    /**
     * copyFromLocalFile: copy a local file to HDFS
     * @param conf: configuration
     * @param localFilePath: local file path
     * @param remoteFilePath: destination path in HDFS
     * @throws IOException
     */
    public static void copyFromLocalFile(Configuration conf, String localFilePath, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path localPath = new Path(localFilePath);
        Path remotePath = new Path(remoteFilePath);
        // delSrc = false (keep the local file), overwrite = true
        fs.copyFromLocalFile(false, true, localPath, remotePath);
        fs.close();
    }

    /**
     * appendToFile: append the contents of a local file to the end of an HDFS file
     * @param conf: configuration
     * @param localFilePath: local file path
     * @param remoteFilePath: destination path in HDFS
     * @throws IOException
     */
    public static void appendToFile(Configuration conf, String localFilePath, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path remotePath = new Path(remoteFilePath);
        // try-with-resources closes both streams even if the copy fails
        try (FileInputStream in = new FileInputStream(localFilePath);
             FSDataOutputStream out = fs.append(remotePath)) {
            byte[] data = new byte[1024];
            int read;
            // copy in 1024-byte chunks until read() returns -1 (end of file)
            while ((read = in.read(data)) != -1) {
                out.write(data, 0, read);
            }
        }
        fs.close();
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        String localFilePath = "/home/ubuntu/text.txt";
        String remoteFilePath = "/user/hadoop/text.txt";
        // what to do if the remote file already exists: "overwrite" or "append"
        String choice = "overwrite";
        try {
            // check whether the file already exists in HDFS
            boolean fileExists = HDFSApi.test(conf, remoteFilePath);
            if (fileExists) {
                System.out.println(remoteFilePath + " already exists.");
            } else {
                System.out.println(remoteFilePath + " does not exist.");
            }
            if (!fileExists) {
                // not in HDFS yet: upload it
                HDFSApi.copyFromLocalFile(conf, localFilePath, remoteFilePath);
                System.out.println(localFilePath + " uploaded to " + remoteFilePath);
            } else if (choice.equals("overwrite")) {
                HDFSApi.copyFromLocalFile(conf, localFilePath, remoteFilePath);
                System.out.println(localFilePath + " overwrote " + remoteFilePath);
            } else if (choice.equals("append")) {
                HDFSApi.appendToFile(conf, localFilePath, remoteFilePath);
                System.out.println(localFilePath + " appended to " + remoteFilePath);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
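The chunked copy inside appendToFile is ordinary java.io and does not depend on HDFS: FSDataOutputStream is an OutputStream, so the loop behaves the same for any pair of streams. The sketch below isolates that loop so it can be tried without a running cluster; the class StreamCopy is hypothetical and is not part of the original program.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class StreamCopy {

    /** Copies all bytes from in to out in 1024-byte chunks; returns the number of bytes copied. */
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[1024];
        long total = 0;
        int read;
        // read() returns -1 at end of stream and may return fewer bytes than the buffer holds.
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = "hello hdfs\n".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long n = copy(new ByteArrayInputStream(source), sink);
        System.out.println(n + " bytes: " + sink.toString("UTF-8"));
    }
}
```

In HDFSApi the same call would be copy(in, out), with the FileInputStream as in and the stream returned by fs.append(remotePath) as out.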

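On the first run, the program may report that /user/hadoop/text.txt does not exist and then print an exception stack trace.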

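This happens because the local source file /home/ubuntu/text.txt does not exist yet, and /user/hadoop/text.txt does not exist in HDFS either. Create a text.txt file in /home/ubuntu/ by hand, write anything in it, and run the program again.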
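Instead of creating the file by hand, the local test file can also be written from Java with java.nio.file.Files. The path /home/ubuntu/text.txt matches localFilePath in HDFSApi; the class PrepareLocalFile and the sample lines are illustrative assumptions only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class PrepareLocalFile {

    /** Creates (or overwrites) a small UTF-8 text file at the given path and returns the path. */
    public static Path writeSample(Path file, String... lines) throws IOException {
        Path parent = file.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // With no OpenOptions, Files.write creates the file or truncates an existing one.
        return Files.write(file, Arrays.asList(lines), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Same path as localFilePath in HDFSApi.main.
        Path file = writeSample(Paths.get("/home/ubuntu/text.txt"), "hello hadoop", "second line");
        System.out.println("Wrote " + Files.size(file) + " bytes to " + file);
    }
}
```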