Batch processing has been a challenging area of computer science since its inception in the early days of punch cards and magnetic tapes. Nowadays, the modern cloud computing era comes with a whole new set of challenges for how to develop and operate batch workloads efficiently in a cloud environment. In this blog post, I introduce some of the challenges a batch developer or architect may face when designing and running batch applications at scale and show how Spring Batch, Spring Boot, and Kubernetes can tremendously simplify this task.
Designing cloud-native batch applications might seem easier than designing web applications, but this is not true: batch developers face many challenges of their own.
Batch processes typically interact with other services (such as databases, message brokers, web services, and others) which are, by nature, flaky in cloud environments. Moreover, even the nodes on which those processes run can die at any time and be replaced with healthy nodes. Cloud-native batch applications should be designed in a fault-tolerant way.
It is not uncommon for the human error of running a batch job twice to have serious financial consequences (as happened to Walgreens, ANZ Bank, and NatWest, to name a few). Moreover, some platforms, such as Kubernetes, have known limitations that can result in the same job being run twice. A cloud-native batch application should be ready to deal with this kind of issue by design.
Cloud infrastructure is billed by CPU/memory/bandwidth usage. In case of failure, it would be inefficient to be unable to restart a job from where it left off and to "lose" the CPU/memory/bandwidth usage of the previous run (and hence be billed twice or more!).
Any modern batch architecture should be able to report, at any point in time, key metrics such as the currently active jobs, their read/write rates, and the failed jobs.
Being able to have these KPIs at a glance on a dashboard is vital for efficient operations.
We are dealing with unprecedented amounts of data, which are impossible to handle on a single machine anymore. Correctly processing large volumes of distributed data is probably the most challenging point. Cloud-native batch applications should be scalable by design.
All these aspects should be taken into consideration when designing and developing cloud-native batch applications. This is a considerable amount of work on the developer's side. Spring Batch takes care of most of these issues. I explain the details in the next section.
Spring Batch is the de facto batch processing framework on the JVM. Entire books have been written on the rich feature set provided by Spring Batch, but I would like to highlight the most relevant features that address the previously mentioned challenges in the context of cloud-native development:
Spring Batch provides fault-tolerance features, such as transaction management and skip and retry mechanisms, which are useful when batch jobs interact with flaky services in a cloud environment.
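For illustration, here is a minimal sketch (not taken from the job developed later in this post) of how skip and retry can be declared on a chunk-oriented step; the exception types shown are just plausible examples:

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileParseException;
import org.springframework.dao.TransientDataAccessException;

public Step faultTolerantStep(StepBuilderFactory steps,
        ItemReader<String> reader, ItemWriter<String> writer) {
    return steps.get("faultTolerantStep")
            .<String, String>chunk(10)
            .reader(reader)
            .writer(writer)
            .faultTolerant()
            // retry transient failures (for example, a flaky database) a few times
            .retry(TransientDataAccessException.class)
            .retryLimit(3)
            // skip malformed input lines instead of failing the whole job
            .skip(FlatFileParseException.class)
            .skipLimit(10)
            .build();
}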
Spring Batch uses a centralized transactional job repository, which prevents duplicate job executions by design, whether they are caused by human error or by platform limitations that may lead to running the same job twice.
Spring Batch jobs maintain their state in an external database, which makes it possible to restart failed jobs from where they left off. This is cost-effective, compared to other solutions that would redo the work from the beginning and, hence, would be billed twice or more!
Spring Batch provides integration with Micrometer, which is key in terms of observability. A Spring Batch-based batch infrastructure provides key metrics, such as the currently active jobs, read/write rates, failed jobs, and others. It can even be extended with custom metrics.
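The built-in metrics can also be complemented with custom ones. Here is a hypothetical sketch of an item write listener that counts written items with a custom Micrometer counter (the class name and the custom.items.written metric are made up for illustration, not part of Spring Batch); it could be attached to a step with the builder's listener method:

import java.util.List;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.batch.core.ItemWriteListener;

public class ImportedItemsCounter<T> implements ItemWriteListener<T> {

    private final Counter counter;

    public ImportedItemsCounter(MeterRegistry registry) {
        this.counter = registry.counter("custom.items.written");
    }

    @Override
    public void beforeWrite(List<? extends T> items) {
    }

    @Override
    public void afterWrite(List<? extends T> items) {
        // increment the counter by the number of successfully written items
        counter.increment(items.size());
    }

    @Override
    public void onWriteError(Exception exception, List<? extends T> items) {
    }
}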
As already mentioned, Spring Batch jobs maintain their state in an external database. As a result, they are stateless processes from the twelve-factor methodology point of view. This stateless nature makes them suitable for being containerized and executed in cloud environments in a scalable way. Moreover, Spring Batch provides several vertical and horizontal scaling techniques, such as multi-threaded steps and remote partitioning/chunking of data, to scale batch jobs in an efficient way.
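As a quick illustration of the vertical option, here is a minimal sketch of a multi-threaded step, assuming the reader and writer in use are thread-safe (the bean and thread names are illustrative):

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

public Step multiThreadedStep(StepBuilderFactory steps,
        ItemReader<String> reader, ItemWriter<String> writer) {
    return steps.get("multiThreadedStep")
            .<String, String>chunk(10)
            .reader(reader)
            .writer(writer)
            // process chunks concurrently, with at most 4 concurrent chunks
            .taskExecutor(new SimpleAsyncTaskExecutor("batch-thread-"))
            .throttleLimit(4)
            .build();
}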
Spring Batch provides other features, but the ones mentioned above are very helpful when designing and developing cloud-native batch processes.
Kubernetes is the de facto container orchestration platform for the cloud. Operating a batch infrastructure at scale is far from being a trivial task, and Kubernetes really is a game changer in this space. Before the cloud era, in one of my previous jobs, I played the role of a batch operator and had to manage a cluster of four machines dedicated to batch jobs, with many routine tasks that I had to either do manually or find a way to automate with (bash!) scripts.
All these tasks are obviously inefficient and error prone, and they left four dedicated machines under-utilized due to poor resource management. If you are still doing such tasks in 2021 (either manually or via scripts), I believe it's a good time to think about migrating your batch infrastructure to Kubernetes: it lets you perform all these tasks with a single command against the entire cluster, which is a huge difference from an operational point of view.
In this section, I take the same job developed in Spring Batch's getting started guide (which is a data ingestion job that loads some person data from a CSV file into a relational database table), containerize it, and deploy it on Kubernetes. If you want to go a step further by wrapping this job in a Spring Cloud Task and deploying it in a Spring Cloud Data Flow server, see Deploy a Spring Batch application by Using Data Flow.
I use a MySQL database to store Spring Batch metadata. The database lives outside the Kubernetes cluster, and this is on purpose: it mimics a realistic migration path, where only stateless workloads are migrated to Kubernetes as a first step. For many companies, migrating databases to Kubernetes is not an option yet (and this is a reasonable decision). To start the database server, run the following commands:
$ git clone [email protected]:benas/spring-batch-lab.git
$ cd blog/spring-batch-kubernetes
$ docker-compose -f src/docker/docker-compose.yml up
This will create a MySQL container pre-populated with Spring Batch's technical tables as well as the business table, PEOPLE. We can check this, as follows:
$ docker exec -it mysql bash
root@0a6596feb06d:/# mysql -u root test -p # the root password is "root"
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 8.0.21 MySQL Community Server - GPL
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show tables;
+------------------------------+
| Tables_in_test |
+------------------------------+
| BATCH_JOB_EXECUTION |
| BATCH_JOB_EXECUTION_CONTEXT |
| BATCH_JOB_EXECUTION_PARAMS |
| BATCH_JOB_EXECUTION_SEQ |
| BATCH_JOB_INSTANCE |
| BATCH_JOB_SEQ |
| BATCH_STEP_EXECUTION |
| BATCH_STEP_EXECUTION_CONTEXT |
| BATCH_STEP_EXECUTION_SEQ |
| PEOPLE |
+------------------------------+
10 rows in set (0.01 sec)
mysql> select * from PEOPLE;
Empty set (0.00 sec)
Go to start.spring.io and generate a project with the following dependencies: Spring Batch and the MySQL driver. After unzipping the project and loading it in your favorite IDE, you can change the main class, as follows:
package com.example.demo;

import java.net.MalformedURLException;

import javax.sql.DataSource;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.Resource;
import org.springframework.core.io.UrlResource;

@SpringBootApplication
@EnableBatchProcessing
public class DemoApplication {

    public static void main(String[] args) {
        System.exit(SpringApplication.exit(
                SpringApplication.run(DemoApplication.class, args)));
    }

    @Bean
    @StepScope
    public Resource resource(@Value("#{jobParameters['fileName']}") String fileName) throws MalformedURLException {
        return new UrlResource(fileName);
    }

    @Bean
    public FlatFileItemReader<Person> itemReader(Resource resource) {
        return new FlatFileItemReaderBuilder<Person>()
                .name("personItemReader")
                .resource(resource)
                .delimited()
                .names("firstName", "lastName")
                .targetType(Person.class)
                .build();
    }

    @Bean
    public JdbcBatchItemWriter<Person> itemWriter(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<Person>()
                .dataSource(dataSource)
                .sql("INSERT INTO PEOPLE (FIRST_NAME, LAST_NAME) VALUES (:firstName, :lastName)")
                .beanMapped()
                .build();
    }

    @Bean
    public Job job(JobBuilderFactory jobs, StepBuilderFactory steps,
            DataSource dataSource, Resource resource) {
        return jobs.get("job")
                .start(steps.get("step")
                        .<Person, Person>chunk(3)
                        .reader(itemReader(resource))
                        .writer(itemWriter(dataSource))
                        .build())
                .build();
    }

    public static class Person {

        private String firstName;
        private String lastName;

        // default constructor + getters/setters omitted for brevity
    }
}
The @EnableBatchProcessing annotation sets up all the infrastructure beans required by Spring Batch (job repository, job launcher, and others) as well as some utilities, such as JobBuilderFactory and StepBuilderFactory, that facilitate the creation of steps and jobs. In the snippet above, I used those utilities to create a job with a single chunk-oriented step composed of:

- An item reader that streams data through a UrlResource. In some cloud environments, file systems are read-only or do not even exist, so the ability to stream data without downloading it is almost an essential requirement. Fortunately, Spring Batch has you covered! All file-based item readers (for flat files, XML files, and JSON files) work against the powerful Spring Framework Resource abstraction, so any implementation of Resource should work. In this example, I use a UrlResource to read data directly from the remote URL of sample-data.csv at GitHub without downloading it. The file name is passed in as a job parameter.
- An item writer that writes Person items to the PEOPLE table in MySQL.

That's it. Let's package the job and create a Docker image for it by using Spring Boot's Maven plugin:
$ mvn package
...
$ mvn spring-boot:build-image -Dspring-boot.build-image.imageName=benas/bootiful-job
[INFO] Scanning for projects...
[INFO]
…
[INFO] --- spring-boot-maven-plugin:2.4.1:build-image (default-cli) @ demo ---
[INFO] Building image 'docker.io/benas/bootiful-job:latest'
…
[INFO] Successfully built image 'docker.io/benas/bootiful-job:latest'
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
The image should now be correctly built, but let's check that:
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
benas/bootiful-job latest 52244b284f08 41 seconds ago 242MB
Note how Spring Boot created a Docker image without the need to create a Dockerfile! Josh Long wrote a complete blog post about this awesome feature: YMNNALFT: Easy Docker Image Creation with the Spring Boot Maven Plugin and Buildpacks. Now let's run this job in a Docker container to check that everything is working as expected:
$ docker run \
-e SPRING_DATASOURCE_URL=jdbc:mysql://192.168.1.53:3306/test \
-e SPRING_DATASOURCE_USERNAME=root \
-e SPRING_DATASOURCE_PASSWORD=root \
-e SPRING_DATASOURCE_DRIVER-CLASS-NAME=com.mysql.cj.jdbc.Driver \
benas/bootiful-job \
fileName=https://raw.githubusercontent.com/benas/spring-batch-lab/master/blog/spring-batch-kubernetes/data/sample1.csv
You should see something like:
. ____ _ __ _ _
/\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
\\/ ___)| |_)| | | | | || (_| | ) ) ) )
' |____| .__|_| |_|_| |_\__, | / / / /
=========|_|==============|___/=/_/_/_/
:: Spring Boot :: (v2.4.1)
2021-01-08 17:03:15.009 INFO 1 --- [ main] com.example.demo.DemoApplication : Starting DemoApplication v0.0.1-SNAPSHOT using Java 1.8.0_275 on 876da4a1cfe0 with PID 1 (/workspace/BOOT-INF/classes started by cnb in /workspace)
2021-01-08 17:03:15.012 INFO 1 --- [ main] com.example.demo.DemoApplication : No active profile set, falling back to default profiles: default
2021-01-08 17:03:15.899 INFO 1 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Starting...
2021-01-08 17:03:16.085 INFO 1 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Start completed.
2021-01-08 17:03:16.139 INFO 1 --- [ main] o.s.b.c.r.s.JobRepositoryFactoryBean : No database type set, using meta data indicating: MYSQL
2021-01-08 17:03:16.292 INFO 1 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : No TaskExecutor has been set, defaulting to synchronous executor.
2021-01-08 17:03:16.411 INFO 1 --- [ main] com.example.demo.DemoApplication : Started DemoApplication in 1.754 seconds (JVM running for 2.383)
2021-01-08 17:03:16.414 INFO 1 --- [ main] o.s.b.a.b.JobLauncherApplicationRunner : Running default command line with: [fileName=https://raw.githubusercontent.com/benas/spring-batch-lab/master/blog/spring-batch-kubernetes/data/sample1.csv]
2021-01-08 17:03:16.536 INFO 1 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [SimpleJob: [name=job]] launched with the following parameters: [{fileName=https://raw.githubusercontent.com/benas/spring-batch-lab/master/blog/spring-batch-kubernetes/data/sample1.csv}]
2021-01-08 17:03:16.596 INFO 1 --- [ main] o.s.batch.core.job.SimpleStepHandler : Executing step: [step]
2021-01-08 17:03:17.481 INFO 1 --- [ main] o.s.batch.core.step.AbstractStep : Step: [step] executed in 884ms
2021-01-08 17:03:17.501 INFO 1 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [SimpleJob: [name=job]] completed with the following parameters: [{fileName=https://raw.githubusercontent.com/benas/spring-batch-lab/master/blog/spring-batch-kubernetes/data/sample1.csv}] and the following status: [COMPLETED] in 934ms
2021-01-08 17:03:17.513 INFO 1 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown initiated...
2021-01-08 17:03:17.534 INFO 1 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown completed.
The job is now completed, and we can check that the data has been successfully loaded into the database:
mysql> select * from PEOPLE;
+----+------------+-----------+
| ID | FIRST_NAME | LAST_NAME |
+----+------------+-----------+
| 1 | Jill | Doe |
| 2 | Joe | Doe |
| 3 | Justin | Doe |
| 4 | Jane | Doe |
| 5 | John | Doe |
+----+------------+-----------+
5 rows in set (0.00 sec)
That's it! Now let's deploy this job on Kubernetes. Before moving on, however, I want to show two things: how Spring Batch prevents duplicate job executions and how it prevents concurrent executions of the same job instance.
If you want to see how Spring Batch prevents duplicate job executions, you can try to re-run the job with the same command. The application should fail to start with the following error:
2021-01-08 20:21:20.752 ERROR 1 --- [ main] o.s.boot.SpringApplication : Application run failed
java.lang.IllegalStateException: Failed to execute ApplicationRunner
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:798) [spring-boot-2.4.1.jar:2.4.1]
at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:785) [spring-boot-2.4.1.jar:2.4.1]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:333) [spring-boot-2.4.1.jar:2.4.1]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1309) [spring-boot-2.4.1.jar:2.4.1]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1298) [spring-boot-2.4.1.jar:2.4.1]
at com.example.demo.DemoApplication.main(DemoApplication.java:30) [classes/:0.0.1-SNAPSHOT]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_275]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_275]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_275]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_275]
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49) [workspace/:na]
at org.springframework.boot.loader.Launcher.launch(Launcher.java:107) [workspace/:na]
at org.springframework.boot.loader.Launcher.launch(Launcher.java:58) [workspace/:na]
at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:88) [workspace/:na]
Caused by: org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException: A job instance already exists and is complete for parameters={fileName=https://raw.githubusercontent.com/benas/spring-batch-lab/master/blog/spring-batch-kubernetes/data/sample1.csv}. If you want to run this job again, change the parameters.
…
Spring Batch does not let the same job instance be re-run after it has successfully completed. This is by design, to prevent duplicate job executions due to either human error or a platform limitation, as explained in the previous section.
In the same spirit, Spring Batch prevents concurrent executions of the same job instance. To test it, add an item processor that does a Thread.sleep to slow down the processing (a minimal sketch of such a processor is shown below, after the stack trace) and try to run a second job execution (in a separate terminal) while the first one is running. The second (concurrent) attempt fails with:
2021-01-08 20:59:04.201 ERROR 1 --- [ main] o.s.boot.SpringApplication : Application run failed
java.lang.IllegalStateException: Failed to execute ApplicationRunner
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:798) [spring-boot-2.4.1.jar:2.4.1]
at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:785) [spring-boot-2.4.1.jar:2.4.1]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:333) [spring-boot-2.4.1.jar:2.4.1]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1309) [spring-boot-2.4.1.jar:2.4.1]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1298) [spring-boot-2.4.1.jar:2.4.1]
at com.example.demo.DemoApplication.main(DemoApplication.java:31) [classes/:0.0.1-SNAPSHOT]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_275]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_275]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_275]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_275]
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:49) [workspace/:na]
at org.springframework.boot.loader.Launcher.launch(Launcher.java:107) [workspace/:na]
at org.springframework.boot.loader.Launcher.launch(Launcher.java:58) [workspace/:na]
at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:88) [workspace/:na]
Caused by: org.springframework.batch.core.repository.JobExecutionAlreadyRunningException: A job execution for this job is already running: JobExecution: id=1, version=1, startTime=2021-01-08 20:58:46.434, endTime=null, lastUpdated=2021-01-08 20:58:46.435, status=STARTED, exitStatus=exitCode=UNKNOWN;exitDescription=, job=[JobInstance: id=1, version=0, Job=[job]], jobParameters=[{fileName=https://raw.githubusercontent.com/benas/spring-batch-lab/master/blog/spring-batch-kubernetes/data/sample1.csv}]
…
Thanks to the centralized job repository, Spring Batch can detect currently running executions (based on the job status in the database) and prevent concurrent executions, either on the same node or on any other node of the cluster, by throwing a JobExecutionAlreadyRunningException.
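If you want to try this yourself, a minimal sketch of such a slowing processor could look like the following (the one-second delay is arbitrary); it would live alongside the job definition and be registered on the step with the builder's processor method:

import org.springframework.batch.item.ItemProcessor;

public class SlowItemProcessor implements ItemProcessor<Person, Person> {

    @Override
    public Person process(Person person) throws Exception {
        // artificially slow down each item to leave time for a concurrent attempt
        Thread.sleep(1000);
        return person;
    }
}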
Setting up a Kubernetes cluster is beyond the scope of this post, so I assume you already have a Kubernetes cluster up and running and can interact with it by using kubectl. In this post, I use the single-node local Kubernetes cluster provided by the Docker Desktop application.
First, I create a service for the external database, as described in "Scenario 1: Database outside cluster with IP address" from Kubernetes best practices: mapping external services. Here is the service definition, along with a Secret that holds the database credentials:
kind: Service
apiVersion: v1
metadata:
  name: mysql
spec:
  type: ClusterIP
  ports:
    - port: 3306
      targetPort: 3306
---
kind: Endpoints
apiVersion: v1
metadata:
  name: mysql
subsets:
  - addresses:
      - ip: 192.168.1.53 # This is my local IP, you might need to change it
    ports:
      - port: 3306
---
apiVersion: v1
kind: Secret
metadata:
  name: db-secret
type: Opaque
data:
  # base64 of "root" ($ echo -n "root" | base64)
  db.username: cm9vdA==
  db.password: cm9vdA==
This service can be applied to Kubernetes, as follows:
$ kubectl apply -f src/kubernetes/database-service.yaml
Now, since we have already created a Docker image for our job, deploying it to Kubernetes is a matter of defining a Job resource with the following manifest:
apiVersion: batch/v1
kind: Job
metadata:
  name: bootiful-job-$JOB_NAME
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: bootiful-job
          image: benas/bootiful-job
          imagePullPolicy: Never
          args: ["fileName=$FILE_NAME"]
          env:
            - name: SPRING_DATASOURCE_DRIVER-CLASS-NAME
              value: com.mysql.cj.jdbc.Driver
            - name: SPRING_DATASOURCE_URL
              value: jdbc:mysql://mysql/test
            - name: SPRING_DATASOURCE_USERNAME
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: db.username
            - name: SPRING_DATASOURCE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: db.password
This manifest follows the same approach as creating jobs based on a template, as suggested by the Kubernetes docs. This job template serves as a base for creating a job for each input file to ingest. I have already ingested the sample1.csv file, so I create a job for another remote file, named sample2.csv, by using the following command:
$ JOB_NAME=sample2 \
FILE_NAME="https://raw.githubusercontent.com/benas/spring-batch-lab/master/blog/spring-batch-kubernetes/data/sample2.csv" \
envsubst < src/k8s/job.yaml | kubectl apply -f -
This command substitutes variables in the job template to create a job definition for the given file and then submits it to Kubernetes. Let's check the job and pod resources in Kubernetes:
$ kubectl get jobs
NAME COMPLETIONS DURATION AGE
bootiful-job-sample2 0/1 97s 97s
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
bootiful-job-sample2-n8mlb 0/1 Completed 0 7s
$ kubectl logs bootiful-job-sample2-n8mlb
. ____ _ __ _ _
/\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
\\/ ___)| |_)| | | | | || (_| | ) ) ) )
' |____| .__|_| |_|_| |_\__, | / / / /
=========|_|==============|___/=/_/_/_/
:: Spring Boot :: (v2.4.1)
2021-01-08 17:48:42.053 INFO 1 --- [ main] com.example.demo.BootifulJobApplication : Starting BootifulJobApplication v0.1 using Java 1.8.0_275 on bootiful-job-person-n8mlb with PID 1 (/workspace/BOOT-INF/classes started by cnb in /workspace)
2021-01-08 17:48:42.056 INFO 1 --- [ main] com.example.demo.BootifulJobApplication : No active profile set, falling back to default profiles: default
2021-01-08 17:48:43.028 INFO 1 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Starting...
2021-01-08 17:48:43.180 INFO 1 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Start completed.
2021-01-08 17:48:43.231 INFO 1 --- [ main] o.s.b.c.r.s.JobRepositoryFactoryBean : No database type set, using meta data indicating: MYSQL
2021-01-08 17:48:43.394 INFO 1 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : No TaskExecutor has been set, defaulting to synchronous executor.
2021-01-08 17:48:43.541 INFO 1 --- [ main] com.example.demo.BootifulJobApplication : Started BootifulJobApplication in 1.877 seconds (JVM running for 2.338)
2021-01-08 17:48:43.544 INFO 1 --- [ main] o.s.b.a.b.JobLauncherApplicationRunner : Running default command line with: [fileName=https://raw.githubusercontent.com/benas/spring-batch-lab/master/blog/spring-batch-kubernetes/data/sample2.csv]
2021-01-08 17:48:43.677 INFO 1 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [SimpleJob: [name=job]] launched with the following parameters: [{fileName=https://raw.githubusercontent.com/benas/spring-batch-lab/master/blog/spring-batch-kubernetes/data/sample2.csv}]
2021-01-08 17:48:43.758 INFO 1 --- [ main] o.s.batch.core.job.SimpleStepHandler : Executing step: [step]
2021-01-08 17:48:44.632 INFO 1 --- [ main] o.s.batch.core.step.AbstractStep : Step: [step] executed in 873ms
2021-01-08 17:48:44.653 INFO 1 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [SimpleJob: [name=job]] completed with the following parameters: [{fileName=https://raw.githubusercontent.com/benas/spring-batch-lab/master/blog/spring-batch-kubernetes/data/sample2.csv}] and the following status: [COMPLETED] in 922ms
2021-01-08 17:48:44.662 INFO 1 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown initiated...
2021-01-08 17:48:44.693 INFO 1 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown completed.
You can then check the newly added persons in the PEOPLE table:
mysql> select * from PEOPLE;
+----+------------+-----------+
| ID | FIRST_NAME | LAST_NAME |
+----+------------+-----------+
| 1 | Jill | Doe |
| 2 | Joe | Doe |
| 3 | Justin | Doe |
| 4 | Jane | Doe |
| 5 | John | Doe |
| 6 | David | Doe |
| 7 | Damien | Doe |
| 8 | Danny | Doe |
| 9 | Dorothy | Doe |
| 10 | Daniel | Doe |
+----+------------+-----------+
10 rows in set (0.00 sec)
That's it, our job is successfully running in Kubernetes!
Before concluding this post, I wanted to share some tips and tricks that are worth considering when migrating Spring Batch jobs to the cloud on Kubernetes.
Running more than one Spring Batch job in a single container or pod is not a good idea: it does not follow cloud-native development best practices, or the Unix philosophy in general. Running one job per container or pod keeps each job independently deployable, schedulable, restartable, and observable.
A successful Spring Batch job instance cannot be restarted. In the same way, a successful Kubernetes job cannot be restarted. This makes designing a Kubernetes job per Spring Batch job instance a perfect match! As a consequence, correctly choosing the identifying job parameters in Spring Batch becomes a crucial task, as doing so determines the identity of job instances and, consequently, the design of Kubernetes jobs (see point 3). This choice affects both how job instances are identified by the framework and whether they can be restarted.
Batch processing is about processing fixed, immutable data sets. If the input data is not fixed, then a stream-processing tool is more appropriate. The identifying job parameters in Spring Batch should represent a uniquely identifiable, immutable data set. A good hint for correctly choosing a set of identifying job parameters is to calculate their hash (or, more precisely, the hash of the data they represent) and make sure that hash is stable. Here are some examples:

| Job parameters | Good/Bad | Comments |
|---|---|---|
| fileName=log.txt | Bad | An ever-growing log file is not a fixed data set |
| fileName=transactions-2020-08-20.csv | Good | As long as the file content is fixed |
| folderName=/in/data | Bad | A folder with variable content is not a fixed data set |
| folderName=/in/data/2020/12/20 | Good | A folder with the files of all orders received on a given day |
| jmsQueueName=events | Bad | Items are removed from the queue, so this is not a fixed data set |
| orderDate=2020-08-20 | Good | If used, for example, in a database select query on D+1 |
Unfortunately, many people fail at designing good identifying job parameters and end up adding a timestamp or a random number as an additional identifying job parameter to act as a job instance discriminator. Using an ever-growing "run.id" parameter is a symptom of such a failure.
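As a side note, Spring Batch lets you explicitly mark a parameter as non-identifying, so that metadata such as a trigger source can be recorded without changing the job instance's identity. A minimal sketch (the parameter names and values are illustrative):

import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;

JobParameters jobParameters = new JobParametersBuilder()
        // identifying parameter: defines the job instance
        .addString("fileName", "transactions-2020-08-20.csv", true)
        // non-identifying parameter: does not create a new instance
        .addString("triggeredBy", "scheduler", false)
        .toJobParameters();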
The Kubernetes documentation provides a whole section, called Job patterns, which describes how to choose the right job deployment pattern. In this post, I followed the "Parallel processing using expansions" approach to create a job per file from a template. While this approach allows for processing multiple files in parallel, it can put pressure on Kubernetes when there are many files to ingest, as this results in many Kubernetes job objects being created. If all your files have a similar structure and you want to ingest them in one shot, you can use the MultiResourceItemReader provided by Spring Batch and create a single Kubernetes job. Another option is to use a single job with a partitioned step, where each worker step handles a file (this can be achieved by using the built-in MultiResourcePartitioner).
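Here is a minimal sketch of that second option, assuming a hypothetical /in/data input directory and a workerStep bean whose reader is bound to the fileName key of its step execution context:

import java.io.IOException;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.partition.support.MultiResourcePartitioner;
import org.springframework.core.io.support.PathMatchingResourcePatternResolver;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

public Step partitionedStep(StepBuilderFactory steps, Step workerStep) throws IOException {
    MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
    // one partition per file; each partition exposes the file URL
    // under the "fileName" key in its step execution context
    partitioner.setResources(
            new PathMatchingResourcePatternResolver().getResources("file:/in/data/*.csv"));
    return steps.get("partitionedStep")
            .partitioner("workerStep", partitioner)
            .step(workerStep)
            // run partitions concurrently (locally here; workers could also be remote)
            .taskExecutor(new SimpleAsyncTaskExecutor())
            .build();
}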
When a Spring Batch job execution fails, you can restart it if the job instance is restartable. You can automate this, as long as the job execution is shut down gracefully, since this gives Spring Batch a chance to correctly set the job execution's status to FAILED and its END_TIME to a non-null value. However, if the job execution fails abruptly, the job execution's status is still set to STARTED and its END_TIME is null. When you try to restart such a job execution, Spring Batch thinks (since it only looks at the database status) that a job execution is currently running for this instance and fails with a JobExecutionAlreadyRunningException. In such cases, the metadata tables should be updated to allow the restart of such a failed execution, with something like:
> update BATCH_JOB_EXECUTION set status = 'FAILED', END_TIME = '2020-01-15 10:10:28.235' where job_execution_id = X;
> update BATCH_STEP_EXECUTION set status = 'FAILED' where job_execution_id = X and step_name='failed step name';
Graceful or abrupt shutdown of Spring Batch jobs is directly related to the Kubernetes job restart policy. For example, with restartPolicy=OnFailure, when a pod fails abruptly and the job controller creates a new pod immediately after, you cannot update the database in a timely manner, and the new Spring Batch job execution fails with a JobExecutionAlreadyRunningException. The same happens with the third pod and so on, until the pod reaches the CrashLoopBackOff state and gets deleted once the backoffLimit is exceeded.
Now, if you follow the best practice of running your Spring Boot batch application with System.exit(SpringApplication.exit(SpringApplication.run(MyBatchApplication.class, args)));, as shown in the snippet above, Spring Boot (and, in turn, Spring Batch) can correctly handle SIGTERM signals and gracefully shut down your application when Kubernetes starts the pod termination process. With this in place, when pods are gracefully shut down, the Spring Batch job instance can automatically restart until completion. Unfortunately, graceful shutdown of Kubernetes pods is not guaranteed, and you should take this into consideration when you set the restart policy and backoffLimit values, to ensure you have enough time to update the job repository as needed for failed jobs.
It should be noted that the shell form of Docker's ENTRYPOINT does not forward Unix signals to the sub-process running in the container. So, in order for the Spring Batch job running in a container to correctly receive Unix signals, the exec form of ENTRYPOINT should be used. This is also directly related to the Kubernetes pod termination process mentioned above. More details about this matter can be found in the Kubernetes best practices: terminating with grace blog post.
As I pointed out earlier, Spring Batch prevents concurrent executions of the same job instance. So, if you follow the "Kubernetes job per Spring Batch job instance" deployment pattern, setting the job's spec.parallelism to a value higher than 1 does not make sense, as this starts multiple pods in parallel, and all but one of them will certainly fail with a JobExecutionAlreadyRunningException. However, setting spec.parallelism to a value higher than 1 makes perfect sense for a partitioned job. In this case, partitions can be executed in parallel pods. Correctly choosing the concurrency policy is tightly related to the job pattern you choose (as explained in point 3).
Deleting a Kubernetes job deletes its corresponding pods. Kubernetes provides a way to automatically clean up completed jobs by using the ttlSecondsAfterFinished parameter. However, there is no equivalent to this in Spring Batch: you should clean up the job repository manually. You should take this into consideration for any serious production batch infrastructure, as job instances and executions can grow very quickly, depending on the frequency and number of deployed jobs. I see a good opportunity here to create a Kubernetes Custom Resource Definition that deletes Spring Batch's metadata when the corresponding Kubernetes job is deleted.
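As a workaround, here is a hypothetical cleanup sketch that uses plain JDBC against the default metadata tables on MySQL, deleting child rows first to respect foreign keys; treat it as a starting point and test it carefully before pointing it at a production job repository:

import java.time.LocalDateTime;

import org.springframework.jdbc.core.JdbcTemplate;

public class JobRepositoryCleaner {

    private final JdbcTemplate jdbcTemplate;

    public JobRepositoryCleaner(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // purge all executions created before the given cutoff (MySQL multi-table
    // delete syntax; a JDBC 4.2 driver is assumed for LocalDateTime binding)
    public void purgeExecutionsBefore(LocalDateTime cutoff) {
        jdbcTemplate.update(
            "DELETE sec FROM BATCH_STEP_EXECUTION_CONTEXT sec " +
            "JOIN BATCH_STEP_EXECUTION se ON sec.STEP_EXECUTION_ID = se.STEP_EXECUTION_ID " +
            "JOIN BATCH_JOB_EXECUTION je ON se.JOB_EXECUTION_ID = je.JOB_EXECUTION_ID " +
            "WHERE je.CREATE_TIME < ?", cutoff);
        jdbcTemplate.update(
            "DELETE se FROM BATCH_STEP_EXECUTION se " +
            "JOIN BATCH_JOB_EXECUTION je ON se.JOB_EXECUTION_ID = je.JOB_EXECUTION_ID " +
            "WHERE je.CREATE_TIME < ?", cutoff);
        jdbcTemplate.update(
            "DELETE jec FROM BATCH_JOB_EXECUTION_CONTEXT jec " +
            "JOIN BATCH_JOB_EXECUTION je ON jec.JOB_EXECUTION_ID = je.JOB_EXECUTION_ID " +
            "WHERE je.CREATE_TIME < ?", cutoff);
        jdbcTemplate.update(
            "DELETE jep FROM BATCH_JOB_EXECUTION_PARAMS jep " +
            "JOIN BATCH_JOB_EXECUTION je ON jep.JOB_EXECUTION_ID = je.JOB_EXECUTION_ID " +
            "WHERE je.CREATE_TIME < ?", cutoff);
        jdbcTemplate.update(
            "DELETE FROM BATCH_JOB_EXECUTION WHERE CREATE_TIME < ?", cutoff);
        // finally, remove job instances that no longer have any execution
        jdbcTemplate.update(
            "DELETE ji FROM BATCH_JOB_INSTANCE ji " +
            "LEFT JOIN BATCH_JOB_EXECUTION je ON ji.JOB_INSTANCE_ID = je.JOB_INSTANCE_ID " +
            "WHERE je.JOB_EXECUTION_ID IS NULL");
    }
}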
I hope this post has shed some light on the challenges of designing, developing, and running batch applications in the cloud and on how Spring Batch, Spring Boot, and Kubernetes can tremendously simplify this task. This post showed how to go from start.spring.io to Kubernetes in three simple steps, thanks to the productivity of the Spring ecosystem, but it only scratches the surface of the matter. This post is the first part of a blog series in which I will cover other aspects of running Spring Batch jobs on Kubernetes. In the next posts, I will tackle job observability with Micrometer and Wavefront and then how to scale Spring Batch jobs on Kubernetes. Stay tuned!