# Spark Environment Setup

### Server Environment

    ~$ cat /proc/version
    Linux version 4.4.0-24-generic (buildd@lcy01-30) (gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2.1) ) #43-Ubuntu SMP Wed Jun 8 19:25:16 UTC 2016

### Spark Installation Tutorial

#### JDK Installation

    ~$ java -version
    java version "1.8.0_92"
    Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
    Java HotSpot(TM) Server VM (build 25.92-b14, mixed mode)
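
Spark's launch scripts locate the JVM through `JAVA_HOME` when it is set. If it is not already configured, an entry like the following can sit alongside the Scala and Spark entries below; the install path here is only an assumption based on the version shown above, so point it at your actual JDK directory:

    # assumed JDK location -- adjust to your own install path
    export JAVA_HOME=/home/coldface/software/jdk1.8.0_92
    export PATH=${JAVA_HOME}/bin:$PATH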

#### Scala Installation

- Version: 2.10.5
- Download link
- Environment variable configuration:

      export SCALA_HOME=/home/coldface/software/scala-2.10.5
      export PATH=${SCALA_HOME}/bin:$PATH

- Run `source /etc/profile` to apply the changes.
- Type `scala` at the command line; output like the following indicates a successful installation (a quick smoke test follows the transcript):
      ~$ scala
      Welcome to Scala 2.10.5 (Java HotSpot(TM) Server VM, Java 1.8.0_92).
      Type in expressions for evaluation. Or try :help.

      scala>
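
Any expression typed at the prompt is evaluated immediately, so a one-liner confirms the REPL is working:

    scala> 1 + 1
    res0: Int = 2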

#### Spark Installation

- Version: spark-1.6.1-bin-hadoop2.6
- Download link
- Environment variable configuration:

      export SPARK_HOME=/home/coldface/software/spark-1.6.1-bin-hadoop2.6
      export PATH=${SPARK_HOME}/bin:$PATH

- Run `source /etc/profile` to apply the changes.
- Run `./spark-shell` from the `bin` directory of the Spark installation; the startup banner should appear (a quick sanity check follows the transcript):
      Welcome to
            ____              __
           / __/__  ___ _____/ /__
          _\ \/ _ \/ _ `/ __/  '_/
         /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
            /_/

      Using Scala version 2.10.5 (Java HotSpot(TM) Server VM, Java 1.8.0_92)
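
`spark-shell` starts with a ready-made `SparkContext` bound to the variable `sc`, so a one-line job is enough to confirm the shell is healthy (the `res` numbers will vary with your session):

    scala> sc.version
    res0: String = 1.6.1

    scala> sc.parallelize(1 to 100).reduce(_ + _)
    res1: Int = 5050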

#### Running the Spark WordCount Example

- Start Spark by running `./spark-shell` at the command line.
- Load the input file into an RDD:

      scala> val textFile = sc.textFile("/home/coldface/data.txt")

- Perform the MapReduce-style operations: split each line into words, map each word to a count of 1, then sum the counts per key:

      scala> val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)

- View the result, i.e. the number of occurrences of each word (a cleanup step follows the list):

      scala> wordCounts.collect()
      res2: Array[(String, Int)] = Array((Hamstra,1), (Zhu	Databricks,1), (Liang	Hortonworks,1), (Das,1), (Sharma,1), (Haberman,1), (Webtrends,1), (Dai,1), (Andy,1), (Graves,1), (Bizo,1), (Dave,1), (IBM,2), (Databricks,,1), (Alluxio,,1), (Dudziak,1), (DB,1), (Or	Databricks,1), (B.V.,1), (Intel,2), (Tsai	Netflix,1), (ClearStory,1), (Vanzin	Cloudera,1), (Data,2), (MIT,1), (Arbor,1), (University,1), (Zaharia,1), (Ram,1), (Wenchen,1), (Alibaba,1), (Robert,1), (Fan	Databricks,1), (Thomas,2), (Yin,1), (UC,4), ("",52), (Meng	Databricks,1), (McNamara,1), (Health,1), (Databricks,6), (Patrick,1), (Josh,1), (Huai	Databricks,1), (Ryza	Clover,1), (Charles,1), (Reiss,1), (Matei,1), (Xia,1), (Ryan,1), (Bradley	Databricks,1), (Davies,1), (Facebook,1), (Michigan,,1), (Mridul,1), (Michael,1), (Rashid,1), (Marc...
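
The raw result shows two quirks of splitting on a single space: an empty-string token, `("",52)`, and affiliation fields that keep their tab characters. As a small follow-up, this sketch (assuming the same `wordCounts` RDD from above; the output path is hypothetical) drops empty tokens, lists the most frequent words with the standard RDD `sortBy`, and persists the result:

    scala> // drop empty tokens, then take the five most frequent words
    scala> wordCounts.filter { case (word, _) => word.nonEmpty }.sortBy(-_._2).take(5)

    scala> // write one part file per partition under the given directory (hypothetical path)
    scala> wordCounts.saveAsTextFile("/home/coldface/wordcount_out")

Splitting with `line.split("\\s+")` instead of `line.split(" ")` would avoid both quirks at the source.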

Categories: spark Tags: spark