DataSphere
Contents
Useful links
Linkis 1.0.2 installation and usage guide: https://www.jianshu.com/p/d0e8b605c4ce
WeDataSphere FAQ (covering DSS, Linkis, etc.) Q&A document: https://docs.qq.com/doc/DSGZhdnpMV3lTUUxq
systemctl stop firewalld.service      # stop firewalld
systemctl disable firewalld.service   # keep firewalld from starting at boot
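A quick check that the firewall really is off (assuming firewalld is the only firewall on this host; note the firewall-cmd port rules later in this guide only take effect if firewalld is left running):
systemctl status firewalld      # should show inactive (dead)
firewall-cmd --state            # should print "not running"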
Installation prerequisites
yum -y install yum-utils
yum-config-manager --disable mysql80-community
yum-config-manager --enable mysql57-community
yum repolist enabled | grep mysql
yum install -y mysql-community-server
yum install -y telnet tar sed dos2unix unzip expect
http://nginx.org/en/linux_packages.html#RHEL-CentOS
touch /etc/yum.repos.d/nginx.repo
vi /etc/yum.repos.d/nginx.repo
[nginx-stable]
name=nginx stable repo
baseurl=http://nginx.org/packages/centos/$releasever/$basearch/
gpgcheck=1
enabled=1
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true
[nginx-mainline]
name=nginx mainline repo
baseurl=http://nginx.org/packages/mainline/centos/$releasever/$basearch/
gpgcheck=1
enabled=0
gpgkey=https://nginx.org/keys/nginx_signing.key
module_hotfixes=true
yum install yum-utils
yum install -y nginx
whereis nginx
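A minimal smoke test for the nginx install (a sketch; assumes systemd manages the nginx service):
systemctl enable --now nginx    # start nginx and enable it at boot
nginx -v                        # print the installed version
curl -I http://127.0.0.1        # the default page should answer with HTTP 200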
perl
https://www.perl.org/get.html (perl is an nginx build dependency for this deployment; first check whether it is already installed)
wget https://www.cpan.org/src/5.0/perl-5.34.1.tar.gz
tar -xzf perl-5.34.1.tar.gz
cd perl-5.34.1
mv /usr/bin/perl /usr/bin/perl.bak
./Configure -des -Dprefix=/usr/local/perl
make && make install
ln -s /usr/local/perl/bin/perl /usr/bin/perl
perl -v
mysql
https://www.cnblogs.com/milton/p/15418572.html
wget -c http://dev.mysql.com/get/mysql57-community-release-el7-10.noarch.rpm
yum -y install mysql57-community-release-el7-10.noarch.rpm
yum -y install mysql-community-server
b. Then download the release package manually:
wget http://dev.mysql.com/get/mysql-community-release-el7-5.noarch.rpm
c. Install the release rpm package:
rpm -ivh mysql-community-release-el7-5.noarch.rpm
d. Install mysql-server:
yum install mysql-server
(2) To uninstall mysql-community-release-el7-5.noarch:
rpm -e --nodeps mysql-community-release-el7-5.noarch
The CentOS 8 mirror repos are no longer maintained; after reinstalling CentOS everything worked fine.
wget https://cdn.mysql.com//Downloads/MySQL-8.0/mysql-8.0.28-1.el8.x86_64.rpm-bundle.tar
tar -xvf mysql-8.0.28-1.el8.x86_64.rpm-bundle.tar
rpm -ivh mysql-community-common-8.0.28-1.el8.x86_64.rpm
rpm -ivh mysql-community-client-plugins-8.0.28-1.el8.x86_64.rpm
rpm -ivh mysql-community-libs-8.0.28-1.el8.x86_64.rpm
rpm -ivh mysql-community-client-8.0.28-1.el8.x86_64.rpm
rpm -ivh mysql-community-icu-data-files-8.0.28-1.el8.x86_64.rpm
rpm -ivh mysql-community-server-8.0.28-1.el8.x86_64.rpm
Checking the log with mysqld --console showed the problem was the data directory. After deleting the data directory manually, mysqld --initialize-insecure regenerates the data folder and its contents automatically; then reinstall with mysqld -install:
rm -rf /var/lib/mysql
mysqld --initialize-insecure
mysqld -install
tail -f /var/log/mysqld.log
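The service still has to be started after the rpm install; a typical first-login sequence (a sketch, assuming the systemd unit is named mysqld as in the official community packages). Depending on how the data directory was initialized, the root password is either empty (--initialize-insecure) or a temporary one written to the log:
systemctl start mysqld
systemctl enable mysqld
grep 'temporary password' /var/log/mysqld.log   # temporary root password, if one was generated
mysql -uroot -p                                 # log in, then set a real password with ALTER USER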
Install Hadoop 2.7.2
https://blog.csdn.net/qq_44665283/article/details/121329554
mkdir /datasphere
cd /datasphere
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
tar -zxvf hadoop-2.7.2.tar.gz -C /datasphere
vi /datasphere/hadoop-2.7.2/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/datasphere/jdk1.8.0_91
vi /datasphere/hadoop-2.7.2/etc/hadoop/core-site.xml
<configuration>
<!-- NameNode (HDFS master) RPC address -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.74.135:9000</value>
</property>
<!-- Storage path for files Hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/datasphere/hadoop-2.7.2/tmp</value>
</property>
</configuration>
vi /datasphere/hadoop-2.7.2/etc/hadoop/hdfs-site.xml
<configuration>
<!-- HDFS replication factor -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Passwordless SSH login
- Go to the root home directory:
cd /root
- Generate a key pair:
ssh-keygen -t rsa
- Press Enter three times to accept the defaults
- Copy the public key and append it to the first node's authorized_keys:
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.74.135
- Answer yes
- Enter the first node's login password (this copies the public key to the first node); verify as shown below
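To verify, an SSH login to the node should no longer prompt for a password (assuming 192.168.74.135 is the target node, as above):
ssh root@192.168.74.135 hostname    # should print the hostname without asking for a password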
Configure environment variables
vim /etc/profile
export HADOOP_HOME=/datasphere/hadoop-2.7.2/
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$MAVEN_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile
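A quick check that the new variables took effect:
which hadoop      # should resolve under /datasphere/hadoop-2.7.2
hadoop version    # should report Hadoop 2.7.2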
Start and stop HDFS
Format the NameNode before the first start (do this only once; avoid re-running it):
hdfs namenode -format
Start HDFS:
start-dfs.sh
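If the start succeeded, the HDFS daemons should be visible (a rough check, assuming the JDK's jps is on the PATH):
jps                     # expect NameNode, DataNode and SecondaryNameNode
hdfs dfsadmin -report   # should list one live DataNode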
(9) Open port 50070
Add a permanently open port:
firewall-cmd --add-port=50070/tcp --permanent
firewall-cmd --reload
(10) Configure and start YARN
1. Configure mapred-site.xml
cd /datasphere/hadoop-2.7.2/etc/hadoop/
mv mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
<configuration>
<!-- Tell the MapReduce framework to use YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
2. Configure yarn-site.xml
<configuration>
<!-- Reducers fetch data via mapreduce_shuffle -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
3. Start YARN
start-yarn.sh
To access the web UI in a browser, open port 8088 in the firewall:
firewall-cmd --add-port=8088/tcp --permanent
firewall-cmd --reload
At this point the single-node Hadoop setup is complete.
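A rough end-to-end check of the single-node setup, assuming the ports above were opened:
jps    # expect NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager
# Web UIs (adjust the IP to your host):
#   HDFS: http://192.168.74.135:50070
#   YARN: http://192.168.74.135:8088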
Install Hive 2.3.3
Reference: https://blog.csdn.net/qq_44665283/article/details/121147347
Download address:
http://archive.apache.org/dist/hive/hive-2.3.3/
wget http://archive.apache.org/dist/hive/hive-2.3.3/apache-hive-2.3.3-bin.tar.gz
tar -zxvf apache-hive-2.3.3-bin.tar.gz -C /datasphere
mv apache-hive-2.3.3-bin hive-2.3.3
1. Extract and configure environment variables
- Configure environment variables:
sudo vi /etc/profile
Append at the end:
export HIVE_HOME=/datasphere/hive-2.3.3
export PATH=$PATH:$HIVE_HOME/bin
Reload the profile so the changes take effect:
source /etc/profile
2. Configure Hive files
2.1 Edit hive-env.sh
cp hive-env.sh.template hive-env.sh
# Uncomment "HADOOP_HOME=${bin}/../../hadoop" and change it to:
HADOOP_HOME=/datasphere/hadoop-2.7.2
# Uncomment "export HIVE_CONF_DIR=" and change it to:
HIVE_CONF_DIR=/datasphere/hive-2.3.3/conf
2.2 Edit hive-log4j2.properties
Change Hive's log directory to /datasphere/hive-2.3.3/logs:
cp hive-log4j2.properties.template hive-log4j2.properties
vi hive-log4j2.properties
Find: property.hive.log.dir = ${sys:java.io.tmpdir}/${sys:user.name}
Change it to: property.hive.log.dir = /datasphere/hive-2.3.3/logs
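Creating the log directory up front does no harm and avoids any permission surprises (a small sketch):
mkdir -p /datasphere/hive-2.3.3/logs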
3. Configure MySQL as the metastore
By default Hive keeps its metadata in the embedded Derby database, but production environments normally store the Hive metadata in MySQL.
Install MySQL, then copy mysql-connector-java-5.1.47.jar into $HIVE_HOME/lib. (Note: the driver class configured below, com.mysql.cj.jdbc.Driver, ships with the 8.x connector; with the 5.1.x connector use com.mysql.jdbc.Driver instead.)
3.2 Edit the configuration file
Configuration reference: https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin
Copy hive-default.xml.template to hive-site.xml, delete the existing entries inside <configuration>, and add the MySQL connection settings below.
cp hive-default.xml.template hive-site.xml    # or simply: touch hive-site.xml
vi hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- HDFS root directory for Hive job scratch files -->
<property>
<name>hive.exec.scratchdir</name>
<value>/user/hive/tmp</value>
</property>
<!-- Permissions for the Hive scratch directory on HDFS -->
<property>
<name>hive.scratch.dir.permission</name>
<value>733</value>
</property>
<!-- Hive warehouse directory (table data) on HDFS -->
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<!-- Metastore database JDBC URL -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://rm-8vbe87b5295dz08zhxo.mysql.zhangbei.rds.aliyuncs.com:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<!-- JDBC driver class -->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<!-- Database user name -->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>lingcloud</value>
</property>
<!-- Database password -->
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>Wb19831010!</value>
</property>
<!-- Show column headers in the CLI -->
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<!-- Show the current database name in the CLI prompt -->
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>
</configuration>
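The HDFS paths referenced above (hive.exec.scratchdir and hive.metastore.warehouse.dir) should exist and be writable before Hive is used; a common preparation, assuming HDFS is already running (tighten the permissions to suit your environment):
hadoop fs -mkdir -p /user/hive/tmp /user/hive/warehouse
hadoop fs -chmod -R 777 /user/hive/tmp /user/hive/warehouse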
3.3 Create the hive database and user in MySQL
CREATE DATABASE hive;
USE hive;
CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';
GRANT ALL ON hive.* TO 'hive'@'localhost' IDENTIFIED BY 'hive';
GRANT ALL ON hive.* TO 'hive'@'%' IDENTIFIED BY 'hive';
FLUSH PRIVILEGES;
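Note that GRANT ... IDENTIFIED BY is MySQL 5.7 syntax; if the metastore database runs on MySQL 8 (as installed earlier), the user has to be created first and granted separately. A hedged equivalent, run from the shell:
mysql -uroot -p <<'SQL'
CREATE DATABASE IF NOT EXISTS hive;
CREATE USER IF NOT EXISTS 'hive'@'%' IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
FLUSH PRIVILEGES;
SQL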
4.1 Initialize the database. Starting with Hive 2.1, the schematool command below must be run as an initialization step; here "mysql" is used as the db type.
schematool -dbType mysql -initSchema
After it completes successfully, you can use Navicat Premium to check that the hive metastore database was created.
4.2 Start the Hive client
Start the Hadoop services, then use the Hive CLI (Hive command line interface). hive --service cli is equivalent to plain hive; enter the following in a terminal:
hive
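A minimal smoke test once the metastore is initialized (runs the same check non-interactively):
hive -e "show databases;"    # should at least list the default database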
Install Spark
https://spark.apache.org/downloads.html
a. Download:
wget https://dlcdn.apache.org/spark/spark-3.0.3/spark-3.0.3-bin-hadoop2.7.tgz
b. Extract the package:
tar -zxvf spark-3.0.3-bin-hadoop2.7.tgz
c. Edit spark-env.sh:
cp spark-env.sh.template spark-env.sh
Append the following at the end:
export JAVA_HOME=/datasphere/jdk1.8.0_91
export SPARK_MASTER_IP=192.168.74.135
export SPARK_WORKER_MEMORY=2g
export SPARK_WORKER_CORES=2
export SPARK_WORKER_INSTANCES=1
d. Configure environment variables:
vim /etc/profile
export SPARK_HOME=/datasphere/spark-3.0.3-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin
source /etc/profile
e. Start the master:
./sbin/start-master.sh
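To confirm the master is up: its web UI listens on port 8080 by default, and the bundled examples give a quick local test (a sketch; the master URL assumes the SPARK_MASTER_IP set above):
# Master web UI: http://192.168.74.135:8080
./bin/run-example SparkPi 10                          # quick local sanity check
# Optionally attach a worker (the script is named start-worker.sh in Spark 3.1+):
./sbin/start-slave.sh spark://192.168.74.135:7077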
Install DSS
hadoop user (deployment user) environment:
vim /etc/profile
export JAVA_HOME=/datasphere/jdk1.8.0_91
export JRE_HOME=$JAVA_HOME/jre
export JAVA_BIN=$JAVA_HOME/bin
export JAVA_LIB=$JAVA_HOME/lib
export CLASSPATH=.:$CLASSPATH:$JAVA_LIB/tools.jar:$JAVA_LIB/dt.jar
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
export HADOOP_HOME=/datasphere/hadoop-2.7.2/
export PATH=$PATH:$MAVEN_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HIVE_HOME=/datasphere/hive-2.3.3
export PATH=$PATH:$HIVE_HOME/bin
export SPARK_HOME=/datasphere/spark-3.0.3-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HIVE_CONF_DIR=$HIVE_HOME/conf
export FLINK_CONF_DIR=$FLINK_HOME/conf
export FLINK_LIB_DIR=$FLINK_HOME/lib
export SPARK_CONF_DIR=$SPARK_HOME/conf
source /etc/profile
unzip -o DSS-Linkis全家桶20220223.zip -d dss
If something goes wrong, delete the user and start over:
userdel -r hadoop
adduser hadoop
passwd hadoop
usermod -a -G hadoop hadoop
cat /etc/passwd | grep hadoop
useradd hadoop -g hadoop
vi /etc/sudoers
# add the following line:
hadoop ALL=(ALL) NOPASSWD: ALL
vim /home/hadoop/.bashrc
Same content as /etc/profile above.
Check the environment
ENGINECONN_ROOT_PATH is a local directory that must be created in advance and authorized with chmod -R 777 <dir>. With Linkis 1.0.2 this is not required; the install scripts create and authorize it automatically.
HDFS_USER_ROOT_PATH is a path on HDFS that must be created in advance and authorized with hadoop fs -chmod -R 777 <dir>; a sketch of both steps follows below.
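Purely as an illustration, if the two variables in conf/config.sh pointed at the hypothetical locations below, the preparation would look like this (substitute whatever config.sh actually contains):
# Hypothetical paths -- replace with the real ENGINECONN_ROOT_PATH / HDFS_USER_ROOT_PATH from conf/config.sh
mkdir -p /appcom/tmp/engineConn && chmod -R 777 /appcom/tmp/engineConn      # local ENGINECONN_ROOT_PATH
hadoop fs -mkdir -p /tmp/linkis && hadoop fs -chmod -R 777 /tmp/linkis      # HDFS_USER_ROOT_PATH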
yum install -y gcc zlib
sh bin/checkEnv.sh
dnf install python3
alternatives --set python /usr/bin/python3
dnf install python2
alternatives --set python /usr/bin/python2
pip install --upgrade pip
python -m pip install matplotlib
To remove the manual default python selection, run: alternatives --auto python
Configuration
vi conf/db.sh
MYSQL_HOST=rm-8vbe87b5295dz08zhxo.mysql.zhangbei.rds.aliyuncs.com
MYSQL_PORT=3306
MYSQL_DB=dss
MYSQL_USER=lingcloud
MYSQL_PASSWORD=Wb19831010!
## Hive metastore database settings
HIVE_HOST=rm-8vbe87b5295dz08zhxo.mysql.zhangbei.rds.aliyuncs.com
HIVE_PORT=3306
HIVE_DB=hive
HIVE_USER=lingcloud
HIVE_PASSWORD=Wb19831010!
vi conf/config.sh
###HADOOP CONF DIR #/appcom/config/hadoop-config
HADOOP_CONF_DIR=/datasphere/hadoop-2.7.2/etc/hadoop
###HIVE CONF DIR #/appcom/config/hive-config
HIVE_CONF_DIR=/datasphere/hive-2.3.3/conf
###SPARK CONF DIR #/appcom/config/spark-config
SPARK_CONF_DIR=/datasphere/spark-3.0.3-bin-hadoop2.7/conf
Startup
Start Hadoop first:
start-dfs.sh
Run this only on the first install, never afterwards:
/datasphere/dss/bin/install.sh
Start:
/datasphere/dss/bin/start-all.sh
Stop:
/datasphere/dss/bin/stop-all.sh
Start individual services
cd /datasphere/dss/dss/sbin
sh dss-daemon.sh start dss-framework-project-server
sh dss-daemon.sh start dss-framework-orchestrator-server
Log directories:
cd /datasphere/dss/linkis/logs
cd /datasphere/dss/dss/logs
tail -f /datasphere/dss/dss/logs/dss-framework-project-server.out
tail -f /datasphere/dss/linkis/logs/linkis-ps-publicservice.log
tail -f /datasphere/dss/dss/logs/dss-framework-orchestrator-server.out
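A generic way to confirm the services actually came up and to spot startup errors (a rough sketch; exact service names vary by version):
ps -ef | grep -E 'dss-|linkis-' | grep -v grep
grep -iE 'error|exception' /datasphere/dss/linkis/logs/*.log | tail -n 20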
Troubleshooting
javafx.util.Pair class missing (OpenJDK builds typically do not ship JavaFX); remove the OpenJDK packages and install the Oracle JDK:
rpm -qa | grep java
rpm -e --nodeps java
java -version
Install the Oracle JDK