DataSphere


Common links

https://github.com/WeBankFinTech/DataSphereStudio-Doc/blob/main/zh_CN/%E5%AE%89%E8%A3%85%E9%83%A8%E7%BD%B2/DSS%E5%8D%95%E6%9C%BA%E9%83%A8%E7%BD%B2%E6%96%87%E6%A1%A3.md


Linkis 1.0.2 installation and usage guide: https://www.jianshu.com/p/d0e8b605c4ce

WeDataSphere FAQ (covering DSS, Linkis, etc.) QA document: https://docs.qq.com/doc/DSGZhdnpMV3lTUUxq

systemctl stop firewalld.service     # stop the firewall
systemctl disable firewalld.service  # keep the firewall from starting at boot
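
An optional check that the firewall is really off:

systemctl is-active firewalld     # expect "inactive"
systemctl is-enabled firewalld    # expect "disabled"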

http://192.168.74.135:50070/

http://192.168.74.135:8080

http://192.168.74.135:20303/

http://192.168.74.135:8088

Installation preparation

yum -y install yum-utils
yum-config-manager --disable mysql80-community	
yum-config-manager --enable mysql57-community
yum repolist enabled | grep mysql
yum install -y  mysql-community-server
yum install -y telnet tar sed dos2unix unzip expect

http://nginx.org/en/linux_packages.html#RHEL-CentOS

touch /etc/yum.repos.d/nginx.repo
vi /etc/yum.repos.d/nginx.repo
[nginx-stable]
name=nginx stable repo
baseurl=http://nginx.org/packages/centos/$releasever/$basearch/
gpgcheck=1
enabled=1
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true

[nginx-mainline]
name=nginx mainline repo
baseurl=http://nginx.org/packages/mainline/centos/$releasever/$basearch/
gpgcheck=1
enabled=0
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true

yum install -y nginx
whereis nginx

perl

https://www.perl.org/get.html — see 项目部署#安装nginx依赖 to check whether Perl is already installed.

wget https://www.cpan.org/src/5.0/perl-5.34.1.tar.gz
tar -xzf perl-5.34.1.tar.gz
cd perl-5.34.1
mv /usr/bin/perl /usr/bin/perl.bak
./Configure -des -Dprefix=/usr/local/perl
make && make install

ln -s /usr/local/perl/bin/perl /usr/bin/perl
perl -v

mysql

https://www.cnblogs.com/milton/p/15418572.html

wget -c http://dev.mysql.com/get/mysql57-community-release-el7-10.noarch.rpm
yum -y install mysql57-community-release-el7-10.noarch.rpm
yum -y install mysql-community-server

b. Then download it manually:

wget http://dev.mysql.com/get/mysql-community-release-el7-5.noarch.rpm

c. Then install the repository rpm package:

rpm -ivh mysql-community-release-el7-5.noarch.rpm

d. Install mysql-server:

yum install mysql-server

(2) Uninstall mysql-community-release-el7-5.noarch:

rpm -e --nodeps mysql-community-release-el7-5.noarch

The CentOS 8 repository mirrors are no longer maintained; after reinstalling CentOS everything worked, using the MySQL 8 rpm bundle:

wget https://cdn.mysql.com//Downloads/MySQL-8.0/mysql-8.0.28-1.el8.x86_64.rpm-bundle.tar
tar -xvf mysql-8.0.28-1.el8.x86_64.rpm-bundle.tar
rpm -ivh mysql-community-common-8.0.28-1.el8.x86_64.rpm
rpm -ivh mysql-community-client-plugins-8.0.28-1.el8.x86_64.rpm
rpm -ivh mysql-community-libs-8.0.28-1.el8.x86_64.rpm
rpm -ivh mysql-community-client-8.0.28-1.el8.x86_64.rpm
rpm -ivh  mysql-community-icu-data-files-8.0.28-1.el8.x86_64.rpm
rpm -ivh mysql-community-server-8.0.28-1.el8.x86_64.rpm


Running mysqld --console and checking the log showed the problem was the data directory. After deleting the data directory manually, mysqld --initialize-insecure regenerates the data directory and its contents automatically; then reinstall with mysqld -install.

rm -rf /var/lib/mysql
mysqld --initialize-insecure
mysqld -install
tail -f /var/log/mysqld.log
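
On a systemd host the service is then typically started as sketched below (the unit name mysqld and the log path are the defaults for the rpm install; the temporary root password only appears if you used --initialize rather than --initialize-insecure):

systemctl start mysqld
systemctl enable mysqld
grep 'temporary password' /var/log/mysqld.log
mysql -uroot -p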

Install Hadoop 2.7.2

http://192.168.74.135:50070/

http://192.168.74.135:8088/


https://blog.csdn.net/qq_44665283/article/details/121329554

mkdir /datasphere
cd /datasphere
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
 tar -zxvf hadoop-2.7.2.tar.gz -C /datasphere

vi /datasphere/hadoop-2.7.2/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/datasphere/jdk1.8.0_91

vi /datasphere/hadoop-2.7.2/etc/hadoop/core-site.xml

<configuration>
    <!-- NameNode (HDFS master) RPC address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.74.135:9000</value>
    </property>
    <!-- Directory for files generated while Hadoop is running -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/datasphere/hadoop-2.7.2/tmp</value>
    </property>
</configuration>

vi /datasphere/hadoop-2.7.2/etc/hadoop/hdfs-site.xml

<configuration>
    <!-- HDFS replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

SSH passwordless login

  1. Go to the root home directory:
cd /root
  2. Generate a key pair (press Enter three times to accept the defaults):
ssh-keygen -t rsa
  3. Append the public key to the first node's authorized keys:
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.74.135
  4. Answer yes when prompted.
  5. Enter the first node's login password; this copies the node's public key to the first node. A quick check follows below.
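
A quick check that passwordless login works (using the same host as above):

ssh root@192.168.74.135 hostname    # should print the hostname without asking for a password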

Configure environment variables

vim /etc/profile
export HADOOP_HOME=/datasphere/hadoop-2.7.2/
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$MAVEN_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile
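
A quick sanity check that the Hadoop binaries are now on the PATH:

which hdfs
hadoop version    # should report Hadoop 2.7.2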

HDFS start and stop

Format the NameNode before the first start (only needed the very first time; do not re-run it afterwards):

hdfs namenode -format

Start HDFS:

start-dfs.sh
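
A quick way to confirm the HDFS daemons came up:

jps    # expect NameNode, DataNode and SecondaryNameNode
hdfs dfsadmin -report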

(9) Open port 50070

Add a permanently open port:

firewall-cmd --add-port=50070/tcp --permanent
firewall-cmd --reload

http://192.168.74.135:50070/

(10) Configure and start YARN

1. Configure mapred-site.xml

cd /datasphere/hadoop-2.7.2/etc/hadoop/
mv mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
<configuration>
    <!-- Tell the MapReduce framework to run on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

2. Configure yarn-site.xml

<configuration>
    <!-- Reducers fetch data via mapreduce_shuffle -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

3. Start YARN

start-yarn.sh
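
To verify YARN, check the daemons and optionally run the example job shipped with Hadoop 2.7.2:

jps    # expect ResourceManager and NodeManager in addition to the HDFS daemons
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 2 10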

Access from a browser (open port 8088 in the firewall):

firewall-cmd --add-port=8088/tcp --permanent
firewall-cmd --reload

http://192.168.74.135:8088/

At this point the single-node Hadoop setup is complete.

Install Hive 2.3.3

Reference: https://blog.csdn.net/qq_44665283/article/details/121147347. Download from:

http://archive.apache.org/dist/hive/hive-2.3.3/

wget http://archive.apache.org/dist/hive/hive-2.3.3/apache-hive-2.3.3-bin.tar.gz
tar -zxvf apache-hive-2.3.3-bin.tar.gz -C /datasphere
mv /datasphere/apache-hive-2.3.3-bin /datasphere/hive-2.3.3

1 Unpack and configure environment variables

  1. Configure environment variables:
sudo vi /etc/profile

Append at the end:

export  HIVE_HOME=/datasphere/hive-2.3.3
export  PATH=$PATH:$HIVE_HOME/bin

Reload the file so the variables take effect:

source /etc/profile

2 Configure Hive files

2.1 Edit hive-env.sh

cd /datasphere/hive-2.3.3/conf
cp hive-env.sh.template hive-env.sh


# HADOOP_HOME=${bin}/../../hadoop
Uncomment and set: HADOOP_HOME=/datasphere/hadoop-2.7.2
# export HIVE_CONF_DIR=
Uncomment and set: HIVE_CONF_DIR=/datasphere/hive-2.3.3/conf

2.2 Edit hive-log4j2.properties

Change Hive's log directory to /datasphere/hive-2.3.3/logs:

cp hive-log4j2.properties.template hive-log4j2.properties


vi hive-log4j2.properties

Find: property.hive.log.dir = ${sys:java.io.tmpdir}/${sys:user.name}

Change to: property.hive.log.dir = /datasphere/hive-2.3.3/logs

3 Configure MySQL as the metastore

By default, Hive stores its metadata in the embedded Derby database, but production environments usually use MySQL for the Hive metastore.

Install MySQL, then copy mysql-connector-java-5.1.47.jar into $HIVE_HOME/lib. Note that the 5.1.x connector provides com.mysql.jdbc.Driver; the com.mysql.cj.jdbc.Driver class used in hive-site.xml below ships with Connector/J 8.x, so the jar and the driver class must match.
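
For example (a sketch, assuming the connector jar was downloaded to the current directory):

cp mysql-connector-java-5.1.47.jar $HIVE_HOME/lib/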

3.2 Edit the configuration file

Parameter reference: https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin

Copy hive-default.xml.template to hive-site.xml, remove the existing properties inside <configuration>, and add the MySQL connection settings below.

cp hive-default.xml.template hive-site.xml
vi hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration> 
<!-- HDFS root directory for Hive jobs -->
<property>
    <name>hive.exec.scratchdir</name>
    <value>/user/hive/tmp</value>
</property>
<!-- Write permission for the Hive job root directory on HDFS -->
<property>
    <name>hive.scratch.dir.permission</name>
    <value>733</value>
</property>
<!-- Hive warehouse directory on HDFS -->
<property>  
  <name>hive.metastore.warehouse.dir</name>  
  <value>/user/hive/warehouse</value>   
</property>
<!-- Metastore database connection URL -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://rm-8vbe87b5295dz08zhxo.mysql.zhangbei.rds.aliyuncs.com:3306/hive?createDatabaseIfNotExist=true</value>  
</property>  
<!-- JDBC driver class -->
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>  
  <value>com.mysql.cj.jdbc.Driver</value>  
</property> 
<!-- Database user name -->
<property>  
  <name>javax.jdo.option.ConnectionUserName</name>  
  <value>lingcloud</value>
</property> 
<!-- Database password -->
<property>  
  <name>javax.jdo.option.ConnectionPassword</name>  
  <value>Wb19831010!</value>
</property>
<!-- Show column headers in the CLI -->
 <property>
  <name>hive.cli.print.header</name>
  <value>true</value>
</property>
<!-- Show the current database name in the CLI prompt -->
<property>
  <name>hive.cli.print.current.db</name>
  <value>true</value>
</property> 
</configuration>
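
The HDFS directories referenced above (hive.exec.scratchdir and hive.metastore.warehouse.dir) generally need to exist and be writable before Hive is used; a minimal sketch with HDFS already running:

hdfs dfs -mkdir -p /user/hive/tmp /user/hive/warehouse
hdfs dfs -chmod -R 733 /user/hive/tmp
hdfs dfs -chmod g+w /user/hive/warehouse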

3.3 Create the Hive database and user in MySQL

Note: on MySQL 8.0 the GRANT ... IDENTIFIED BY form is no longer supported, so create the user first and then grant privileges.

CREATE DATABASE hive;
USE hive;
CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';
GRANT ALL ON hive.* TO 'hive'@'localhost';
GRANT ALL ON hive.* TO 'hive'@'%';
FLUSH PRIVILEGES;

4.1 Initialize the database

Starting with Hive 2.1, the schematool command below must be run as an initialization step; here "mysql" is used as the db type.

schematool -dbType mysql -initSchema

Once it succeeds, you can use Navicat Premium (or any MySQL client) to check that the hive metastore database has been created.
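
Optionally, the schema can also be verified from the command line:

schematool -dbType mysql -info    # prints the metastore schema version if initialization succeeded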

4.2 Start the Hive client

With the Hadoop services running, start the Hive CLI (Hive command-line interface); hive --service cli is equivalent to plain hive. Enter the following command in a terminal:

hive
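
A minimal smoke test once the CLI works (the table name here is arbitrary):

hive -e "show databases; create table if not exists smoke_test(id int); show tables; drop table smoke_test;"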

Install Spark

https://spark.apache.org/downloads.html

a. Download the package:
 wget https://dlcdn.apache.org/spark/spark-3.0.3/spark-3.0.3-bin-hadoop2.7.tgz
b. Unpack the archive:

 tar -zxvf spark-3.0.3-bin-hadoop2.7.tgz
c. Edit the spark-env.sh file (under conf/):

 cp spark-env.sh.template spark-env.sh
Append the following at the end:

export JAVA_HOME=/datasphere/jdk1.8.0_91
export SPARK_MASTER_IP=192.168.74.135
export SPARK_WORKER_MEMORY=2g
export SPARK_WORKER_CORES=2
export SPARK_WORKER_INSTANCES=1

d. Configure environment variables:
vim /etc/profile

export SPARK_HOME=/datasphere/spark-3.0.3-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin

source /etc/profile
e. Start the master:

./sbin/start-master.sh
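
To attach a worker to this master and run a bundled example job, something like the following should work (7077 is the default master port; in Spark 3.0.x the worker script is still named start-slave.sh):

./sbin/start-slave.sh spark://192.168.74.135:7077
./bin/run-example SparkPi 10    # runs locally unless --master is supplied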

Install DSS

Environment variables for the hadoop user:

vim /etc/profile
export JAVA_HOME=/datasphere/jdk1.8.0_91
export JRE_HOME=$JAVA_HOME/jre
export JAVA_BIN=$JAVA_HOME/bin
export JAVA_LIB=$JAVA_HOME/lib
export CLASSPATH=.:$CLASSPATH:$JAVA_LIB/tools.jar:$JAVA_LIB/dt.jar
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

export HADOOP_HOME=/datasphere/hadoop-2.7.2/
export PATH=$PATH:$MAVEN_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export  HIVE_HOME=/datasphere/hive-2.3.3
export  PATH=$PATH:$HIVE_HOME/bin

export SPARK_HOME=/datasphere/spark-3.0.3-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HIVE_CONF_DIR=$HIVE_HOME/conf
export FLINK_CONF_DIR=$FLINK_HOME/conf
export FLINK_LIB_DIR=$FLINK_HOME/lib
export SPARK_CONF_DIR=$SPARK_HOME/conf
source /etc/profile


unzip -o DSS-Linkis全家桶20220223.zip -d dss

If something goes wrong, remove the user and start over: userdel -r hadoop

adduser hadoop
passwd hadoop
usermod -a -G hadoop hadoop

cat /etc/passwd | grep hadoop

useradd hadoop -g hadoop
vi /etc/sudoers
hadoop ALL=(ALL) NOPASSWD: ALL


vim /home/hadoop/.bashrc

Same contents as /etc/profile above.

Check the environment

ENGINECONN_ROOT_PATH is a local directory that the user must create in advance and grant permissions on (chmod -R 777 <directory>). For Linkis 1.0.2 this is not required up front; the scripts and programs create and authorize it automatically.

HDFS_USER_ROOT_PATH is a path on HDFS that must be created in advance and granted permissions (hadoop fs -chmod -R 777 <directory>), as sketched below.
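
For illustration only, with hypothetical paths (use whatever ENGINECONN_ROOT_PATH and HDFS_USER_ROOT_PATH are actually set to in config.sh):

mkdir -p /appcom/tmp && chmod -R 777 /appcom/tmp                          # hypothetical local ENGINECONN_ROOT_PATH
hadoop fs -mkdir -p /tmp/linkis && hadoop fs -chmod -R 777 /tmp/linkis    # hypothetical HDFS_USER_ROOT_PATH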

yum install -y gcc zlib

sh bin/checkEnv.sh

dnf install python3
alternatives --set python /usr/bin/python3
dnf install python2
alternatives --set python /usr/bin/python2
pip install --upgrade pip
python -m pip install matplotlib
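
A quick check that the interpreter and matplotlib are usable:

python --version
python -c "import matplotlib; print(matplotlib.__version__)"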

To undo the default python selection, run: alternatives --auto python

Configuration

vi conf/db.sh
MYSQL_HOST=rm-8vbe87b5295dz08zhxo.mysql.zhangbei.rds.aliyuncs.com
MYSQL_PORT=3306
MYSQL_DB=dss
MYSQL_USER=lingcloud
MYSQL_PASSWORD=Wb19831010!

## Hive metastore database settings
HIVE_HOST=rm-8vbe87b5295dz08zhxo.mysql.zhangbei.rds.aliyuncs.com
HIVE_PORT=3306
HIVE_DB=hive
HIVE_USER=lingcloud
HIVE_PASSWORD=Wb19831010!


vi conf/config.sh
###HADOOP CONF DIR #/appcom/config/hadoop-config
HADOOP_CONF_DIR=/datasphere/hadoop-2.7.2/etc/hadoop
###HIVE CONF DIR  #/appcom/config/hive-config
HIVE_CONF_DIR=/datasphere/hive-2.3.3/conf
###SPARK CONF DIR #/appcom/config/spark-config
SPARK_CONF_DIR=/datasphere/spark-3.0.3-bin-hadoop2.7/conf


Startup

Start Hadoop:

start-dfs.sh

Run this only on the first installation; do not run it again afterwards:

/datasphere/dss/bin/install.sh

Start all services:

/datasphere/dss/bin/start-all.sh

Stop all services:

/datasphere/dss/bin/stop-all.sh

Start a single service:

cd /datasphere/dss/dss/sbin
sh dss-daemon.sh start dss-framework-project-server
sh dss-daemon.sh start dss-framework-orchestrator-server


cd /datasphere/dss/linkis/logs
cd /datasphere/dss/dss/logs
tail -f /datasphere/dss/dss/logs/dss-framework-project-server.out
tail -f /datasphere/dss/linkis/logs/linkis-ps-publicservice.log


tail -f /datasphere/dss/dss/logs/dss-framework-orchestrator-server.out

Troubleshooting

javafx.util.Pair class not found (OpenJDK builds do not bundle JavaFX):

rpm -qa | grep java
rpm -qa | grep java | xargs rpm -e --nodeps
java -version

Install the Oracle JDK (it bundles JavaFX).