Apache Spark on Kubernetes series:
- Introduction to Spark on Kubernetes
- Scaling Spark made simple on Kubernetes
- The anatomy of Spark applications on Kubernetes
- Monitoring Apache Spark with Prometheus
- Spark History Server on Kubernetes
- Spark scheduling on Kubernetes demystified

Apache Zeppelin on Kubernetes series:
- Running Zeppelin Spark notebooks on Kubernetes
- Running Zeppelin Spark notebooks on Kubernetes - deep dive

Apache Kafka on Kubernetes series:
- Kafka on Kubernetes - using etcd
Note: The Pipeline CI/CD module mentioned in this post is outdated and no longer available; you can integrate Pipeline into your CI/CD solution through the Pipeline API (contact us for details). There is also a newer version of Pipeline, 0.3.0, so parts of this post may be out of date; we highly recommend checking Pipeline's documentation for the latest how-to.

This blog post quickly and concisely presents how to perform a basic CI/CD workflow setup using our platform. We have also prepared a few projects in our GitHub repo that can be used as starting points (see the links at the end of this post).
These example projects make use of resources (Spark clusters on k8s on AWS) provisioned on-demand during their CI/CD workflow. However, some workflow steps vary considerably (build details, run details).
This example focuses on Spark, but note that Pipeline is a generic microservice platform that is not tied exclusively to big data workloads; it is able to run any containerized, distributed workflow. Our next example in this series will involve databases, as we are moving towards JEE support, a key component of which is a persistent datastore.

To hook a Spark project into the CI/CD workflow of Banzai Cloud Pipeline, watch the video we've provided or follow the instructions below.
Create a GitHub OAuth application for the Pipeline Control Plane and fill in the Authorization callback URL field with a dummy value; this field will be updated with an IP address or DNS name once the Control Plane is up and running. Take note of the Client ID and Client Secret, as these are required to launch the Pipeline Control Plane.
To launch the Control Plane on AWS, choose Specify an Amazon S3 template URL and add the URL of our template: https://s3-eu-west-1.amazonaws.com/cf-templates-grr4ysncvcdl-eu-west-1/2017340oCy-new.templatei5xlidcwt4p
Fill in the stack parameters:

- Client Id - the Client ID of the GitHub OAuth application
- Client Secret - the Client Secret of the GitHub OAuth application
- the Pipeline version - set it to 0.3.0 to use the current stable Pipeline release

Once the stack has been created, take note of the public IP (or DNS name) of the Control Plane instance. Go back to the GitHub OAuth application and set its Authorization callback URL field to http://{control_plane_public_ip}/authorize.
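If you prefer the command line to the CloudFormation console, the same stack can be launched with the AWS CLI. The sketch below is only an illustration: the parameter keys (GitHubClientId, GitHubClientSecret, PipelineImageTag) are assumptions, so check the template for the actual parameter names before running it.

```bash
# Sketch: launch the Pipeline Control Plane stack from the AWS CLI.
# The ParameterKey names below are assumptions -- look them up in the template.
aws cloudformation create-stack \
  --stack-name pipeline-control-plane \
  --template-url https://s3-eu-west-1.amazonaws.com/cf-templates-grr4ysncvcdl-eu-west-1/2017340oCy-new.templatei5xlidcwt4p \
  --parameters \
      ParameterKey=GitHubClientId,ParameterValue=<client-id> \
      ParameterKey=GitHubClientSecret,ParameterValue=<client-secret> \
      ParameterKey=PipelineImageTag,ParameterValue=0.3.0 \
  --capabilities CAPABILITY_IAM

# Once the stack is up, list its outputs to find the Control Plane public IP
# (this assumes the template exposes it as a stack output).
aws cloudformation describe-stacks \
  --stack-name pipeline-control-plane \
  --query 'Stacks[0].Outputs'
```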
Define the .pipeline.yml pipeline workflow configuration for your Spark application. The workflow is described in the .pipeline.yml file, which must be placed in the root directory of the Spark application's source code. The file has to be pushed to the GitHub repo along with the source files of the application.
Here's an example Spark application, spark-pi-example, which can be used to try out the CI/CD pipeline.
Note: To accomplish this, fork that repository into your own! In order to set up your own Spark application for the workflow, start by customizing the .pipeline.yml configuration file in spark-pi-example.
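If you prefer working locally, a minimal sketch of getting your fork onto your machine follows; the repository path is an assumption based on the example's name, so substitute your own GitHub user and fork URL.

```bash
# Clone your fork of the example project (the URL is an assumption -- replace
# <your-user> and adjust the repository name if your fork differs).
git clone https://github.com/<your-user>/spark-pi-example.git
cd spark-pi-example
# Edit .pipeline.yml here, then commit and push as shown at the end of this post.
```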
The following sections need to be modified:
- the command used to build your project:

      remote_build:
        ...
        original_commands:
          - mvn clean package -s settings.xml

- the main class of your Spark application:

      run:
        ...
        spark_class: banzaicloud.SparkPi

- the name of your Spark application:

      run:
        ...
        spark_app_name: sparkpi

- the path to your application's jar; this jar is generated by the build command above:

      run:
        ...
        spark_app_source: target/spark-pi-1.0-SNAPSHOT.jar

- the arguments of your Spark application:

      run:
        ...
        spark_app_args: 1000
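Putting these pieces together, a customized .pipeline.yml might contain sections along the following lines. Only the keys listed above are taken from the example; the overall layout (the top-level pipeline: key and the omitted fields) is an assumption, so compare it against the actual file in spark-pi-example.

```yaml
# Sketch of the customized sections of .pipeline.yml.
# Only the keys shown in the list above come from the example project;
# the surrounding structure is an assumption.
pipeline:
  remote_build:
    original_commands:
      - mvn clean package -s settings.xml
  run:
    spark_class: banzaicloud.SparkPi
    spark_app_name: sparkpi
    spark_app_source: target/spark-pi-1.0-SNAPSHOT.jar
    spark_app_args: 1000
```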
Once the Control Plane is running, open http://{control_plane_public_ip} in your web browser and grant access for the organizations that contain the GitHub repositories you want to hook into the CI/CD workflow. Then click authorize access.
It may take some time for all of Pipeline's services to fully initialize, and so the page may not load at first. If this happens, please wait a little and retry.
After authorizing, you are redirected back to http://{control_plane_public_ip}, which brings you to the CI/CD user interface. Select Repositories from the top left menu; this lists all the repositories that Pipeline has access to. Select the repositories you want to hook into the CI/CD flow.
For each repository hooked into the flow, specify the following settings:

- plugin_endpoint - specify http://{control_plane_public_ip}/pipeline/api/v1
- plugin_username - specify the same user name you used for Pipeline Credentials
- plugin_password - specify the same password you used for Pipeline Credentials

From this point on, the workflow described in the repository's .pipeline.yml file is what the CI/CD flow executes. The repository list is also reachable directly at http://{control_plane_public_ip}/account/repos.
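As a final illustration, pushing the customized .pipeline.yml (together with your source changes) to the hooked repository is what triggers the workflow. The commands below are a sketch; the branch name is an assumption, so adjust it to your repository's default branch.

```bash
# From the root of your fork: commit the customized workflow file and push it.
# The push to the hooked repository kicks off the CI/CD flow on the Control Plane.
git add .pipeline.yml
git commit -m "Customize CI/CD workflow for SparkPi"
git push origin master   # branch name is an assumption -- use your default branch
```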