by Velucid ⏲ June 09, 2021
OCI OKE(Oracle Kubernetes Engine) 따라하기
OCI OKE 운영을 위한 기본 환경 구성
OCI OKE에 애플리케이션 배포하기
Docker에 Zeppelin과 Pyspark 환경을 구성하고 이를 OCI OKE Cluster에 배포하는 과정을 보여드립니다.
OCI Registry에 Docker Image를 등록하기 때문에, OCI Registry: Container Image 등록 및 안전한 배포 과정의 Docker Login과 K8S secret 생성을 사전에 완료해야 합니다.
Docker Image의 구성 목표는 Zeppelin UI를 기반으로 %pyspark 인터페이스를 사용하여 Spark-PySpark와 연동하고 ML 분석을 수행하는 것입니다.
Apache/Zeppelin 최신 릴리즈인 0.9.0은 Kubernetes Cluster 구성 환경이 너무 복잡하고 구성이 어려워서, 바로 이전 버전인 0.8.2를 사용하여 구성하였습니다.
본 예제에서 사용된 제품들의 버전은 다음과 같습니다.
Docker가 설치된 Bastion Server에서 수행합니다.
xxxxxxxxxx
$ docker pull ubuntu:20.04
xxxxxxxxxx
$ docker images
xxxxxxxxxx
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu 20.04 69040ea16df0 6 weeks ago 72.7MB
Docker Container 구동 (수행 위치 : bastion server as opc)
xxxxxxxxxx
$ docker run -it --rm -p 80:8080 --name zeppelin-setup ubuntu:20.04 &
Ubuntu OS에 Zeppelin과 Saprk-Pyspark 설치를 위한 기본 환경을 구성합니다.
xxxxxxxxxx
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
bc74f469ee54 ubuntu:20.04 "/bin/bash" 9 seconds ago Up 6 seconds 80->8080/tcp zeppelin
xxxxxxxxxx
$ docker exec -it bc74f469ee54 /bin/bash
xxxxxxxxxx
apt update -y
apt install -y software-properties-common wget curl vim dos2unix
add-apt-repository -y ppa:deadsnakes/ppa
software-properties-common 설치 도중, 혹시(?) 지역/타임존 입력을 요구하면 ASIA/Seoul로 설정
- Geographic area (ASIA) : 6
- Time zone(Seoul): 69
add-apt-repository 수행 로그에, 다음과 같이 deadsnales에 404 오류가 발생하면, Python3.7 설치가 불가능하므로, Ubuntu Image를 다른 버전으로 재수행해야 합니다. (저의 경우, 20.10에서 발생함)
xxxxxxxxxx
...
Err:5 http://ppa.launchpad.net/deadsnakes/ppa/ubuntu groovy Release
404 Not Found [IP: 91.189.95.85 80]
...
Ubuntu OS에 Zeppelin과 Pyspark 설치하는 과정을 보여드립니다. 설치 스크립트(install-zeppelin-pyspark-on-ubuntu.sh)는 Github에 업로드 되어 있으며, 이를 직접 다운 받아 설치 합니다.
xxxxxxxxxx
cd /root
wget http://velucid.github.io/oci/oke/data/install-zeppelin-pyspark-on-ubuntu.sh
dos2unix install-zeppelin-pyspark-on-ubuntu.sh
bash -x install-zeppelin-pyspark-on-ubuntu.sh
xxxxxxxxxx
# velucid 2021.06.08
#######################################
##### Install Open-JDK 8 #####
#######################################
apt update -y
apt install -y openjdk-8-jdk
#######################################
##### Install Python 3.7 #####
#######################################
apt install -y python3.7
apt install -y python3.7-venv
python3.7 -m venv /opt/python
source /opt/python/bin/activate
apt install -y python3-pip
python3.7 -m pip install --upgrade pip
pip --version
#######################################
##### Install Python Packages #####
#######################################
pip3 install ipython ipykernel
pip3 install keras tensorflow matplotlib
pip3 install scikit-learn pandas
#pip3 install jupyter-client grpcio protobuf
#pip3 install pyspark
#pip3 install --upgrade ipython
#######################################
##### Install Spark 2.4.8 #####
#######################################
cd /opt
wget https://mirror.navercorp.com/apache/spark/spark-2.4.8/spark-2.4.8-bin-hadoop2.7.tgz
tar -xvf spark-2.4.8-bin-hadoop2.7.tgz
mv spark-2.4.8-bin-hadoop2.7 spark
rm -f spark-2.4.8-bin-hadoop2.7.tgz
#######################################
##### Config Spark Environment #####
#######################################
cp /opt/spark/conf/spark-env.sh.template /opt/spark/conf/spark-env.sh
echo "" >> /opt/spark/conf/spark-env.sh
echo "export PYTHON_HOME=/opt/python" >> /opt/spark/conf/spark-env.sh
echo "export SPARK_HOME=/opt/spark" >> /opt/spark/conf/spark-env.sh
echo "export PATH=/opt/python/bin:/opt/spark/bin:\$PATH" >> /opt/spark/conf/spark-env.sh
echo "" >> /opt/spark/conf/spark-env.sh
#######################################
##### Install Zeppelin 0.8.2 #####
#######################################
cd /opt
wget https://mirror.navercorp.com/apache/zeppelin/zeppelin-0.8.2/zeppelin-0.8.2-bin-all.tgz
tar -xvf zeppelin-0.8.2-bin-all.tgz
mv zeppelin-0.8.2-bin-all zeppelin
rm -f zeppelin-0.8.2-bin-all.tgz
#######################################
##### Config Zeppelin Environment #####
#######################################
cp /opt/zeppelin/conf/zeppelin-env.sh.template /opt/zeppelin/conf/zeppelin-env.sh
echo "" >> /opt/zeppelin/conf/zeppelin-env.sh
echo "export PYTHON_HOME=/opt/python" >> /opt/zeppelin/conf/zeppelin-env.sh
echo "export SPARK_HOME=/opt/spark" >> /opt/zeppelin/conf/zeppelin-env.sh
echo "export PYSPARK_PYTHON=/opt/python/bin/python" >> /opt/zeppelin/conf/zeppelin-env.sh
echo "export PYSPARK_DRIVER_PYTHON=/opt/python/bin/python" >> /opt/zeppelin/conf/zeppelin-env.sh
echo "export ZEPPELIN_ADDR=0.0.0.0" >> /opt/zeppelin/conf/zeppelin-env.sh
echo "export PATH=/opt/python/bin:/opt/zeppelin/bin:\$PATH" >> /opt/zeppelin/conf/zeppelin-env.sh
echo "" >> /opt/zeppelin/conf/zeppelin-env.sh
#######################################
##### Setup Bash Environment #####
#######################################
echo "export PYTHON_HOME=/opt/python" >> ~/.bashrc
echo "export SPARK_HOME=/opt/spark" >> ~/.bashrc
echo "export ZEPPELIN_HOME=/opt/zeppelin" >> ~/.bashrc
echo "export PATH=\$PYTHON_HOME/bin:\$ZEPPELIN_HOME/bin:\$SPARK_HOME/bin:\$PATH" >> ~/.bashrc
echo "" >> ~/.bashrc
xxxxxxxxxx
# ls -al /opt/python/bin | grep 3.7
-rwxr-xr-x. 1 root root 239 Jun 8 23:16 easy_install-3.7
-rwxr-xr-x. 1 root root 225 Jun 8 23:18 f2py3.7
-rwxr-xr-x. 1 root root 230 Jun 8 23:17 pip3.7
lrwxrwxrwx. 1 root root 9 Jun 8 23:16 python -> python3.7
lrwxrwxrwx. 1 root root 9 Jun 8 23:16 python3 -> python3.7
lrwxrwxrwx. 1 root root 18 Jun 8 23:16 python3.7 -> /usr/bin/python3.7
# /opt/python/bin/python --version
Python 3.7.10
# /opt/python/bin/pip --version
pip 21.1.2 from /opt/python/lib/python3.7/site-packages/pip (python 3.7)
xxxxxxxxxx
cat web-ingress.json
xxxxxxxxxx
[
{
"source": "0.0.0.0/0",
"protocol": "6",
"tcpOptions": {
"destinationPortRange": {
"max": 80,
"min": 80
}
},
"isStateless": "false",
"Description": "Web Access"
},
{
"source": "0.0.0.0/0",
"protocol": "6",
"tcpOptions": {
"destinationPortRange": {
"max": 22,
"min": 22
}
},
"isStateless": "false",
"Description": "SSH Access"
}
]
xxxxxxxxxx
$ oci network security-list update --security-list-id ocid1.securitylist.oc1.ap-seoul-1.aaaaaaaa*****rbq --ingress-security-rules file://./web-ingress.json --force
{
"data": {
"compartment-id": "ocid1.compartment.oc1..aaaaaaaa*******wdwq",
"defined-tags": {},
"display-name": "k8s-bastion-seclist",
"egress-security-rules": [
{
"description": "HTTS (OCI API)",
"destination": "0.0.0.0/0",
...
xxxxxxxxxx
/opt/zeppelin/bin/zeppelin-daemon.sh start
Apache/Zeppelin 구동 (수행 위치 : web browser)
Docker Container의 변경된 내용을 commit하여 신규 이미지로 생성합니다.
xxxxxxxxxx
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
bc74f469ee54 ubuntu:20.04 "/bin/bash" 2 hours ago Up 2 hours 80->8080/tcp zeppelin
xxxxxxxxxx
$ docker commit bc74f469ee54 zeppelin:0.8.2
xxxxxxxxxx
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu 20.04 69040ea16df0 6 weeks ago 72.7MB
zeppelin 0.8.2 899e164bf639 31 seconds ago 4.47GB
Zeppling/PySpark 이미지를 OCI Repository에 등록하는 과정을 보여드립니다.
OCI Console을 통해 "zeppelin"라는 이름으로 OCI Repository를 생성합니다.
Repository 생성
Zeppelin 이미지를 OCI Repository로 업로드합니다. (수행 위치 : bastion server as opc)
스크립트 내의 <tenancy> 는 현재 실습을 수행하고 있는 OCI tenancy 이름으로 변경해야 합니다.
xxxxxxxxxx
$ docker tag zeppelin:0.8.2 icn.ocir.io/<tenency>/zeppelin:0.8.2
OCIR image name 규칙: <region>.ocir.io/<namespace>/<image name>/<tag>
- <namespace> = <tenancy name>
- <image name> = <ocir repository name>
xxxxxxxxxx
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu 20.04 69040ea16df0 6 weeks ago 72.7MB
zeppelin 0.8.2 899e164bf639 5 minutes ago 4.47GB
icn.ocir.io/**/zeppelin 0.8.2 899e164bf639 5 minutes ago 4.47GB
xxxxxxxxxx
$ docker push icn.ocir.io/<tenancy>/zeppelin:0.8.2
The push refers to repository [icn.ocir.io/<tenancy>/zeppelin]
162be6d5e374: Pushing [======> ] 530.1MB/4.4GB
3ddefbdbdfbe: Pushed
5a6ab33e4b09: Pushed
f99824acacbf: Pushed
OCI Repository에 업로드된 Zeppelin 이미지를 OCI OKE Cluster로 배포합니다.
xxxxxxxxxx
$ cat deploy-zeppelin.yaml
xxxxxxxxxx
kind Deployment
metadata
name zeppelin
labels
app zeppelin
spec
selector
matchLabels
app zeppelin
replicas1
template
metadata
labels
app zeppelin
spec
containers
name zeppelin
image icn.ocir.io/apackrsct01/zeppelin0.8.2
command"/bin/bash" "-c" "/opt/zeppelin/bin/zeppelin.sh"
ports
containerPort8080
env
name ZEPPELIN_ADDR
value0.0.0.0
imagePullSecrets
name ocir-icn-secret
xxxxxxxxxx
$ kubectl create -f deploy-zeppelin.yaml
xxxxxxxxxx
$ kubectl get pods -l app=zeppelin
NAME READY STATUS RESTARTS AGE
zeppelin-55846bc849-zh9c2 1/1 Running 0 3m26s
Zeppelin을 외부 인터넷으로 접속하기 위한 Load Balancer Service를 생성 합니다.
xxxxxxxxxx
$ cat service-zeppelin.yaml
xxxxxxxxxx
apiVersion v1
kind Service
metadata
name apache-zeppelin-lb
labels
app apache-zeppelin
spec
type LoadBalancer
ports
port80
protocol TCP
targetPort8080
selector
app apache-zeppelin
xxxxxxxxxx
$ kubectl create -f service-zeppelin.yaml
xxxxxxxxxx
$ kubectl get services -l app=zeppelin
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
zeppelin-lb LoadBalancer 10.96.243.190 146.**.***.223 80:32371/TCP 2m10s
Zeppelin/Pyspark를 이용하여 "적립 카드 발급 대상 분석" 예제를 Import 합니다.
Zeppelin 접속
Notebook URL 입력 및 Import
순차적으로 "Play" 버튼을 클릭하여 ML 분석 과정 수행
Zeppelin 이미지로의 직접 접속 방법
xxxxxxxxxx
$ kubectl get pods -l app=zeppelin
NAME READY STATUS RESTARTS AGE
zeppelin-55846bc849-zh9c2 1/1 Running 0 21m
xxxxxxxxxx
$ kubectl exec -it zeppelin-55846bc849-zh9c2 -- /bin/bash
xxxxxxxxxx
root@zeppelin:/# curl localhost:8080
<!doctype html><!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
...