Monday, February 24, 2020

Hadoop and Spark Locally

As a continuation of the last post, we now look at how to make our program deployable in a proper Spark/Hadoop cluster. We will not go into the details of setting up these clusters themselves, but rather into how to make sure that a program we developed earlier can run as a job in a cluster.
We will continue with the setup on our local machine. I am using a Mac, so the instructions are written for that platform, but most of them apply to any other platform as well.
If Spark is processing data from a database and writing into Hive, pretty much what we did in the last post would work. The problem arises if some of the data being processed exists as flat files. If we want to submit our jobs to a Spark cluster, we cannot use local files, because the jobs are not running against the local file system.
The best approach is to either use an existing HDFS cluster or deploy a single-node HDFS on your machine. Here I am enumerating the steps to set up a single-node HDFS cluster on a Mac OS X machine.
  • Download the Hadoop distribution for your machine from the Apache Hadoop downloads page.
  • The Hadoop distribution is available as a .tar.gz file, which you can expand into some directory on your machine. The expansion creates a directory of the form hadoop-x.y.z, assuming your Hadoop version is x.y.z. Set the environment variable HADOOP_HOME to the full pathname of this Hadoop directory.
  • Add $HADOOP_HOME/bin to the PATH variable.
  • Now we need to update the configuration files for Hadoop.
$ cd $HADOOP_HOME/etc/hadoop
$ vi core-site.xml

We update the file with the following properties.
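For a single-node setup, the essential entry here is the default file system URI; something like this (the port is an assumption, adjust as needed):

<configuration>
 <property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
 </property>
</configuration>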

$ vi hdfs-site.xml

We update the file with the following properties.
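For a single node there is no replication to speak of, so a minimal hdfs-site.xml would look like this (a replication factor of 1 is the usual single-node choice):

<configuration>
 <property>
  <name>dfs.replication</name>
  <value>1</value>
 </property>
</configuration>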
$ vi mapred-site.xml

We update the file with the following properties.
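The key property here tells MapReduce to run on YARN; a minimal mapred-site.xml could be:

<configuration>
 <property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
 </property>
</configuration>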
$ vi yarn-site.xml

We update the file with the following properties.
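A minimal yarn-site.xml enables the shuffle auxiliary service:

<configuration>
 <property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
 </property>
</configuration>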
Now we start Hadoop.
$ cd $HADOOP_HOME
$ sbin/start-all.sh
Now we can access files stored in HDFS from our Spark jobs.
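For example, once the daemons are up, a Spark job can refer to files by their hdfs:// URL instead of a local path (the port, path, and file name below are assumptions for illustration):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HdfsReadExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("hdfs-read-example")
                .getOrCreate();

        // Read a CSV file previously copied into HDFS,
        // e.g. with: hdfs dfs -put data.csv /data/data.csv
        Dataset<Row> df = spark.read()
                .option("header", "true")
                .csv("hdfs://localhost:9000/data/data.csv");

        df.show();
        spark.stop();
    }
}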
The next post will go into more detail on how to process files in Spark.

Tuesday, February 18, 2020

Hive and Spark

In this blog post, we take a slight deviation from core issues related to the Spring framework and look at a problem that Spring programmers might face regularly. Recently I was working with Spark and needed to read data from MySQL, do some processing, and write it back to a Hive instance. We will look at that in this post.
We start by making sure our Hive instance is backed by a database. To do this, we do the following.
$ cp $HIVE_HOME/conf/hive-default.xml.template $HIVE_HOME/conf/hive-site.xml
$ vi $HIVE_HOME/conf/hive-site.xml

We edit the hive-site.xml file and make sure it is configured as below.
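The important entries are the ones that point the Hive metastore at MySQL; a typical fragment looks like this (the database name, user, and password are examples, substitute your own):

 <property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.cj.jdbc.Driver</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
 </property>
 <property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
 </property>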
We can adjust the values to suit our needs, but make sure the MySQL username, password, and database URL exist and that the user has the relevant permissions.
Now we create another database in MySQL which will contain the data that we need to process. I am calling this database mystuff, with username mystuff and password mystuff123. We need to run the following commands in MySQL to make sure everything exists and the permissions are appropriate.
mysql> create database mystuff;
Query OK, 1 row affected (0.00 sec)

mysql> create user mystuff@localhost identified by 'mystuff123';
Query OK, 0 rows affected (0.00 sec)

mysql> create user mystuff@'%' identified by 'mystuff123';
Query OK, 0 rows affected (0.01 sec)

mysql> grant all on mystuff.* to mystuff@localhost;
Query OK, 0 rows affected (0.00 sec)

mysql> grant all on mystuff.* to mystuff@'%';
Query OK, 0 rows affected (0.00 sec)

Now we create a plain Java project in IntelliJ with the following pom.xml file.
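The pom mainly needs the Spark SQL, Spark Hive, and MySQL connector dependencies; roughly along these lines (the versions are assumptions, pick the ones matching your cluster):

 <dependencies>
  <dependency>
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-sql_2.12</artifactId>
   <version>2.4.5</version>
  </dependency>
  <dependency>
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-hive_2.12</artifactId>
   <version>2.4.5</version>
  </dependency>
  <dependency>
   <groupId>mysql</groupId>
   <artifactId>mysql-connector-java</artifactId>
   <version>8.0.19</version>
  </dependency>
 </dependencies>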
Now to look at the problem at hand. We have a table in MySQL with the following structure.
mysql> desc mydata;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id    | int          | YES  |     | NULL    |       |
| k     | varchar(10)  | YES  |     | NULL    |       |
| v     | varchar(255) | YES  |     | NULL    |       |
+-------+--------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

We want to flatten this table such that each key gets converted to a column for each id. So assume our current data is as below.
mysql> select * from mydata;
+------+-------+-------------------+
| id   | k     | v                 |
+------+-------+-------------------+
|    1 | NAME  | John Doe.         |
|    1 | EMAIL | jd@example.com    |
|    2 | NAME  | Jane Doe          |
|    2 | EMAIL | janed@example.com |
+------+-------+-------------------+
4 rows in set (0.00 sec)
We want to load this data and convert it into a flattened table that has three columns, i.e. id, name, and email. Then we want to populate a table person in Hive with the flattened data.
hive (default)> select * from person;
OK
person.id person.email person.name
1 jd@example.com.  John Doe
2 janed@example.com Jane Doe
Time taken: 0.094 seconds, Fetched: 2 row(s)

The following code will perform the above conversion.
Lines 11 through 17 create a SparkSession for Hive operations; the key instruction here is enableHiveSupport. Lines 20 through 26 create a SparkSession that will be used for MySQL operations. Lines 29 through 37 load the complete contents of the table. Lines 39 through 42 group the results by id and pivot the table on the field k. Lines 46 through 49 create a data frame for the Hive session and write the contents into a table named person.
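A condensed sketch of the same flow is shown below; here a single session with Hive support handles both the JDBC read and the Hive write, the connection details are the ones used earlier in this post, and the class name is an assumption:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.first;

public class FlattenJob {
    public static void main(String[] args) {
        // Session with Hive support so we can write the flattened table
        SparkSession hiveSession = SparkSession.builder()
                .appName("flatten-mydata")
                .enableHiveSupport()
                .getOrCreate();

        // Load the key/value table from MySQL over JDBC
        Dataset<Row> mydata = hiveSession.read()
                .format("jdbc")
                .option("url", "jdbc:mysql://localhost:3306/mystuff")
                .option("dbtable", "mydata")
                .option("user", "mystuff")
                .option("password", "mystuff123")
                .load();

        // One row per id, one column per distinct key (NAME, EMAIL)
        Dataset<Row> flattened = mydata
                .groupBy("id")
                .pivot("k")
                .agg(first("v"));

        // Write the result into the Hive table person
        flattened.write().mode("overwrite").saveAsTable("person");
    }
}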

Friday, April 12, 2019

Spring and more Kafka

In the last post, we saw how to integrate Kafka with a Spring Boot application. That post was a very simple implementation of Kafka. The real world is much more complex: you have to deal with multiple topics, and you need multiple partitions. In this post, we explore more details of a Spring Boot application with Kafka.
Let's start with the fact that we have multiple topics that we are dealing with. The very first thing that we need to do is to separate out the properties for each of the topics.
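For instance, application.properties could carry one entry per topic (the property keys here are made up for illustration):

myapp.kafka.first-topic=FirstTopic
myapp.kafka.second-topic=SecondTopic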

We can no longer use the default KafkaTemplate; we will have to create our own. The best way is to define a KafkaProducer. To create a KafkaProducer, we need to create a KafkaProducerConfig. Here are two KafkaProducerConfig classes, one for each topic.
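A sketch of what one of these producer configs might look like (the bean name and the bootstrap-servers property are assumptions):

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;

@Configuration
public class FirstTopicProducerConfig {

    @Value("${spring.kafka.bootstrap-servers}")
    private String bootstrapServers;

    // Connection and serializer settings for this topic's producer
    private ProducerFactory<String, String> firstTopicProducerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        return new DefaultKafkaProducerFactory<>(props);
    }

    // Qualified by name so callers can pick the right template for the topic
    @Bean(name = "firstTopicKafkaTemplate")
    public KafkaTemplate<String, String> firstTopicKafkaTemplate() {
        return new KafkaTemplate<>(firstTopicProducerFactory());
    }
}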


Looking at the producer configs, we can observe that we create two beans which are qualified by name and return an appropriate KafkaTemplate, which is later used to send the message. Similar to the producer configs, we need to create consumer configs. The consumer config is used by the listener to listen for messages.


As we can see in each of the consumer configs, we create a bean which returns a ConcurrentKafkaListenerContainerFactory. The beans are qualified by a name so that we can use an appropriate container factory for receiving messages.
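A rough sketch of one such consumer config (the group id and bean name are assumptions):

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;

@Configuration
public class FirstTopicConsumerConfig {

    @Value("${spring.kafka.bootstrap-servers}")
    private String bootstrapServers;

    private ConsumerFactory<String, String> firstTopicConsumerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "first-topic-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        return new DefaultKafkaConsumerFactory<>(props);
    }

    // Named container factory so a listener can reference it in @KafkaListener
    @Bean(name = "firstTopicListenerContainerFactory")
    public ConcurrentKafkaListenerContainerFactory<String, String> firstTopicListenerContainerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(firstTopicConsumerFactory());
        return factory;
    }
}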
We also modify the MyTopicMessage class and add a member variable topicName that will help us distinguish the topic to which the message needs to be sent.

We also modify the endpoint so that the message can be sent to the appropriate topic.

Now we modify the listener to integrate everything so that the appropriate message can be received. Looking at the @KafkaListener annotation, we pass the listener container factory as an argument to receive messages from a queue.
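A sketch of such a listener, wired to container factories like the ones above (topic and factory names are assumptions):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class KafkaMessageListener {

    private static final Logger log = LoggerFactory.getLogger(KafkaMessageListener.class);

    // Each listener method points at the container factory built for its topic
    @KafkaListener(topics = "FirstTopic", containerFactory = "firstTopicListenerContainerFactory")
    public void listenFirstTopic(String message) {
        log.info("Received FirstTopic message {}", message);
    }

    @KafkaListener(topics = "SecondTopic", containerFactory = "secondTopicListenerContainerFactory")
    public void listenSecondTopic(String message) {
        log.info("Received SecondTopic message {}", message);
    }
}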
Now we can test the server. The flow of the service is as below.

  1. The message is posted to the endpoint as a POST request
  2. The @RestController receives the message and, based on the topicName in the request, sends the message to the topic with the same name.
  3. The listener receives the message from the appropriate queue.
$ curl -X POST \
>   'http://localhost:8081/send?token=3193fa24-a0ba-451b-83ff-eb563c3fd43b-cdf12811-7e41-474b-8fa6-e8fefd4a738c' \
>   -H 'Content-Type: application/json' \
>   -H 'Postman-Token: 15fbe075-9c80-4af9-a797-6b5e0979fd1b' \
>   -H 'cache-control: no-cache' \
>   -H 'token: 3193fa24-a0ba-451b-83ff-eb563c3fd43b-cdf12811-7e41-474b-8fa6-e8fefd4a738c' \
>   -d '{
> "message" : "This is my message!",
> "topicName" : "FirstTopic"
> }'
Success!

The message receipt is indicated in the server log.
2019-04-12 14:50:47.142  INFO 51396 --- [ntainer#0-0-C-1] i.s.b.t.listeners.KafkaMessageListener   : Received FirstTopic message for partition 0 This is my message!
Similarly, we can send a message to SecondTopic.
$ curl -X POST   'http://localhost:8081/send?token=3193fa24-a0ba-451b-83ff-eb563c3fd43b-cdf12811-7e41-474b-8fa6-e8fefd4a738c'   -H 'Content-Type: application/json'   -H 'Postman-Token: 15fbe075-9c80-4af9-a797-6b5e0979fd1b'   -H 'cache-control: no-cache'   -H 'token: 3193fa24-a0ba-451b-83ff-eb563c3fd43b-cdf12811-7e41-474b-8fa6-e8fefd4a738c'   -d '{
"message" : "This is another message!",
"topicName" : "SecondTopic"
}'
Success!

The message receipt is indicated in the server log.
2019-04-12 14:53:14.972  INFO 51396 --- [ntainer#1-0-C-1] i.s.b.t.listeners.KafkaMessageListener   : Received SecondTopic message for partition 0 This is another message!
The complete code base for this tutorial can be found in my GitHub repository, tagged v1.4.

Wednesday, April 10, 2019

Kafka and Spring

Kafka has become a very popular platform and is being used as a streaming, journaling, and even eventing system. In this post, we explore how to integrate Kafka with a Spring framework application. First, we add the Kafka bootstrap server details in the application.properties file.
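For a local broker this is typically a single property (adjust the host and port to your setup):

spring.kafka.bootstrap-servers=localhost:9092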

Let's also add dependencies in pom.xml.
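The essential addition is the spring-kafka dependency, along these lines (the version is managed by Spring Boot):

<dependency>
  <groupId>org.springframework.kafka</groupId>
  <artifactId>spring-kafka</artifactId>
</dependency>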

Now, for each Kafka topic, we create a listener class. The listener class provides a callback method that is called whenever a message is received on that topic.
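A minimal listener along these lines would do; the class name matches the log output shown below, while the topic name is an assumption:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class MyTopicKafkaListener {

    private static final Logger log = LoggerFactory.getLogger(MyTopicKafkaListener.class);

    // Invoked by spring-kafka whenever a message arrives on the topic
    @KafkaListener(topics = "MyTopic")
    public void listen(String message) {
        log.info("Received message {}", message);
    }
}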

Now we create an endpoint through which we inject a message in the queue. The message is sent to the queue and is retrieved by the listener.

We autowire a KafkaTemplate instance that is used to send the message to the queue.
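A bare-bones version of that endpoint could look like this (the /send path matches the curl call below; the class name and topic are assumptions):

import java.util.Map;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class MessageEndpoint {

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    // Forwards the "message" field of the posted JSON to the Kafka topic
    @PostMapping("/send")
    public String send(@RequestBody Map<String, String> body) {
        kafkaTemplate.send("MyTopic", body.get("message"));
        return "Message sent successfully!";
    }
}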

$ curl -X POST \
>   'http://localhost:8081/send?token=3193fa24-a0ba-451b-83ff-eb563c3fd43b-cdf12811-7e41-474b-8fa6-e8fefd4a738c' \
>   -H 'Content-Type: application/json' \
>   -H 'Postman-Token: e281e3c5-0dae-4bb7-ac8d-6555f66a18c6' \
>   -H 'cache-control: no-cache' \
>   -H 'token: 3193fa24-a0ba-451b-83ff-eb563c3fd43b-cdf12811-7e41-474b-8fa6-e8fefd4a738c' \
>   -d '{
> "message" : "This is my message!"
> }'
Message sent successfully!.
The receipt of the message is indicated in the Spring server log.

2019-04-10 14:28:03.969  INFO 31091 --- [ntainer#0-0-C-1] i.s.b.t.listeners.MyTopicKafkaListener   : Received Promise message This is my message!

Sunday, March 10, 2019

Spring Boot and Docker Containers

With microservices-based deployment, the first step is to dockerize your software. In our case, we want to create Docker images for each of our microservices so that we can orchestrate them better. I have decided to use the container registry provided by Google to build and upload the images.

The first thing we need to do is add a dependency to the spring-cloud-dependencies pom. Then we add a dependency on spring-cloud-config-server. Then we add the dockerfile-maven-plugin. Keep in mind that in the configuration for dockerfile-maven-plugin we need to provide the repository. You can see that our repository starts with gcr.io/. This makes sure that the image, after creation, is pushed to the container registry hosted by Google. If you want to use some other registry, you need to provide it in the form hostname:portnumber.
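The plugin section ends up looking roughly like this; the version and repository match the build log below, while the execution details are assumptions:

   <plugin>
    <groupId>com.spotify</groupId>
    <artifactId>dockerfile-maven-plugin</artifactId>
    <version>1.4.10</version>
    <executions>
     <execution>
      <id>default</id>
      <goals>
       <goal>build</goal>
       <goal>push</goal>
      </goals>
     </execution>
    </executions>
    <configuration>
     <repository>gcr.io/myproject/rae/customer</repository>
     <tag>${project.version}</tag>
    </configuration>
   </plugin>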
Now we can issue the following command and the Docker image will be built and pushed to the Google registry.

$ mvn deploy -DskipTests
[INFO] 
[INFO] --- dockerfile-maven-plugin:1.4.10:push (default) @ customer ---
[INFO] Using Google application default credentials
[INFO] loaded credentials for user account with clientId=764086051850-6qr4p6gpi6hn506pt8ejuq83di341hur.apps.googleusercontent.com
[INFO] The push refers to repository [gcr.io/myproject/rae/customer]
[INFO] Image 967d96afcc46: Preparing
[INFO] Image 36e051842720: Preparing
[INFO] Image d1646aaa6540: Preparing
[INFO] Image 19382582b926: Preparing
[INFO] Image 41715d8d7d2b: Preparing
[INFO] Image f3a38968d075: Preparing
[INFO] Image a327787b3c73: Preparing
[INFO] Image 5bb0785f2eee: Preparing
[INFO] Image f3a38968d075: Waiting
[INFO] Image a327787b3c73: Waiting
[INFO] Image 5bb0785f2eee: Waiting
[INFO] Image 36e051842720: Layer already exists
[INFO] Image 41715d8d7d2b: Layer already exists
[INFO] Image d1646aaa6540: Layer already exists
[INFO] Image 19382582b926: Layer already exists
[INFO] Image 967d96afcc46: Pushing
[INFO] Image a327787b3c73: Layer already exists
[INFO] Image 5bb0785f2eee: Layer already exists
[INFO] Image f3a38968d075: Layer already exists
[INFO] Image 967d96afcc46: Pushed
[INFO] 0.0.1-SNAPSHOT: digest: sha256:f6bad4811f867dd75225797bee684ea43c0ddaf2b83de1b419a9f75e9a3941bc size: 2001
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 47.638 s
[INFO] Finished at: 2019-03-10T19:58:27+05:30
[INFO] Final Memory: 55M/857M
[INFO] ------------------------------------------------------------------------

We can see below that the image is now built and tagged for the Google Container Registry.
$ docker images
REPOSITORY                     TAG                 IMAGE ID            CREATED             SIZE
gcr.io/myproject/rae/customer   0.0.1-SNAPSHOT      a566e2f28705        19 seconds ago      518MB
gcr.io/myproject/rae/customer                       8c61d1a5aef4        13 minutes ago      518MB

Tuesday, February 12, 2019

12. Adding GIT release information

In the previous post, we saw how to enable actuator endpoints on our spring server. Once we have done that, it is a good idea to add GIT release information to the server in order to get the information related to the currently deployed server at runtime.
We add the git-commit-id-plugin to our pom.xml.

   <plugin>
    <groupId>pl.project13.maven</groupId>
    <artifactId>git-commit-id-plugin</artifactId>
    <version>2.2.1</version>
   </plugin>
The next thing to do is to create a git.properties file in the resources directory of the project.
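One common approach is to let Maven resource filtering fill in the values that the plugin exposes; assuming filtering is enabled for src/main/resources, the file can be as small as this (the exact keys may vary):

git.branch=${git.branch}
git.commit.id=${git.commit.id.abbrev}
git.commit.time=${git.commit.time}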

The git commit id is added by the Maven plugin, so it is a good idea to build the project using Maven.

$ mvn clean package -DskipTests
Now we can run the server and check the /manage/info endpoint using the curl command.
$ curl -X GET http://localhost:9091/manage/info
{"git":{"branch":"master","commit":{"id":"6fb94c0","time":1549868235.000000000}}

11. Spring Actuators

Spring provides actuators, a helpful set of tools to debug the application at runtime. Here is how to enable them. We first add the actuator dependency in the pom.xml.
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Now we define a prefix for all the actuator endpoints by adding the following lines to the application.properties file. This enables all the actuator endpoints; we can enable specific endpoints instead by providing a comma-delimited list. We also deploy the management endpoints on a separate port so that we can block access to them at something like an ELB.
management.endpoints.web.base-path=/manage
management.server.port=9091
management.endpoints.web.exposure.include=*
Since we already have a security filter defined, we need to exempt the health and info endpoints from the security check. We add the following URLs in the SecurityConfiguration configure method.
@Override
    public void configure(WebSecurity web) throws Exception {
        web.ignoring().antMatchers("/manage/health");
        web.ignoring().antMatchers("/manage/info");
        web.ignoring().antMatchers("/webjars/**");
        web.ignoring().antMatchers("/error");
        web.ignoring().antMatchers("/swagger-ui.html");
        web.ignoring().antMatchers("/v2/api-docs/**");
        web.ignoring().antMatchers("/swagger-resources/**");
    }
Here we have added paths related to error, actuator, and swagger.
This enables the actuator endpoints for our server. We can query these endpoints; the following is a sample response.

$ curl -X GET http://localhost:9091/manage/health
{"status":"UP"}

Monday, February 11, 2019

10. Application with multiple datasources

Many times it is a practical requirement to have multiple databases for a single application. These databases could be at different locations on the cloud and different entities in your application may be dealing with these databases.
Multi-tenancy is a common scenario in which multiple data sources are needed; many tenants may insist on having their own databases. Here we present how we can configure a Spring application to interact with multiple databases.
The first step is to look at our application.properties file. We have a list of properties defined for the default dataSource, which we will need to replicate for our second database. Let's assume we are going to use two data sources, the first one called user and the second one called other.
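The two blocks of properties could look like this (the credentials match the databases queried later in this post, adjust them to your environment):

spring.datasource.url=jdbc:mysql://localhost:3306/tutorial
spring.datasource.username=tutorial
spring.datasource.password=tutorial123
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver

other.datasource.url=jdbc:mysql://localhost:3306/other
other.datasource.username=other
other.datasource.password=other123
other.datasource.driver-class-name=com.mysql.cj.jdbc.Driver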

As we can see above, we have replicated all the data source properties and given them a new prefix, other. These two data sources can now have completely independent settings and could point to totally different databases. Here we have just changed the database name, username, and password.
To make this work, we will have to split the repositories and the entity objects for each of the data sources. Here we create the following hierarchy of packages for each of the data sources.
src/main/java
- in.springframework.blog.tutorials
  - user
    - domain
    - repository
  - other
    - domain
    - repository
As we can see for each of the data sources, we have a domain package that would contain the entities and a repository package that would contain the repository class. This is needed so that each of the entity managers only searches for its own classes.
Now we need to define a configuration for each of the data sources. The first data source will contain the user table and will also be the primary data source.
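A rough sketch of such a primary configuration might look like this; the package names follow the layout above, while the bean names are assumptions:

import javax.persistence.EntityManagerFactory;
import javax.sql.DataSource;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.autoconfigure.jdbc.DataSourceProperties;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.boot.orm.jpa.EntityManagerFactoryBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;
import org.springframework.data.jpa.repository.config.EnableJpaRepositories;
import org.springframework.orm.jpa.JpaTransactionManager;
import org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.annotation.EnableTransactionManagement;

@Configuration
@EnableTransactionManagement
@EnableJpaRepositories(
        basePackages = "in.springframework.blog.tutorials.user.repository",
        entityManagerFactoryRef = "userEntityManagerFactory",
        transactionManagerRef = "userTransactionManager")
public class UserDataSourceConfiguration {

    // Binds everything under spring.datasource.* (url, username, password, ...)
    @Primary
    @Bean
    @ConfigurationProperties("spring.datasource")
    public DataSourceProperties userDataSourceProperties() {
        return new DataSourceProperties();
    }

    @Primary
    @Bean
    public DataSource userDataSource() {
        return userDataSourceProperties().initializeDataSourceBuilder().build();
    }

    // Entity manager that only scans the user domain package
    @Primary
    @Bean(name = "userEntityManagerFactory")
    public LocalContainerEntityManagerFactoryBean userEntityManagerFactory(EntityManagerFactoryBuilder builder) {
        return builder
                .dataSource(userDataSource())
                .packages("in.springframework.blog.tutorials.user.domain")
                .persistenceUnit("user")
                .build();
    }

    @Primary
    @Bean(name = "userTransactionManager")
    public PlatformTransactionManager userTransactionManager(
            @Qualifier("userEntityManagerFactory") EntityManagerFactory entityManagerFactory) {
        return new JpaTransactionManager(entityManagerFactory);
    }
}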

As we can see in the class above, it uses all the properties prefixed with spring.datasource and scans the directories related to the user data source. Now let's look at the configuration for the other data source.

The thing to note in the other data source configuration is that the @Primary annotation is absent, because we can have only one set of primary beans of a type. Also, the directories to be scanned are those of the other data source's domain and repository packages.
At this time we also move the old User and UserRepository classes to their respective subdirectories. We also create Other and OtherRepository classes in their respective subdirectories. We also change CrudRepository to JpaRepository in each of the repository classes.
Now our application is set up to use two different data sources, and we can verify that by running the application. Since we have set the ddl-auto property to update, it should create a new schema when the application is run.
$ mysql -u tutorial -ptutorial123 tutorial
mysql: [Warning] Using a password on the command line interface can be insecure.
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 115
Server version: 8.0.12 Homebrew

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show tables;
+--------------------+
| Tables_in_tutorial |
+--------------------+
| hibernate_sequence |
| user               |
+--------------------+
2 rows in set (0.00 sec)

mysql> desc user;
+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| id         | bigint(20)   | NO   | PRI | NULL    |       |
| email      | varchar(255) | YES  | UNI | NULL    |       |
| fullname   | varchar(255) | YES  |     | NULL    |       |
| password   | varchar(255) | YES  |     | NULL    |       |
| username   | varchar(255) | YES  | UNI | NULL    |       |
| auth_token | varchar(255) | YES  | UNI | NULL    |       |
| expiry     | datetime     | YES  |     | NULL    |       |
| mask       | bigint(20)   | NO   |     | NULL    |       |
| authToken  | varchar(255) | YES  |     | NULL    |       |
+------------+--------------+------+-----+---------+-------+
9 rows in set (0.00 sec)

$ mysql -u other -pother123 other
mysql: [Warning] Using a password on the command line interface can be insecure.
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 116
Server version: 8.0.12 Homebrew

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show tables;
+--------------------+
| Tables_in_other    |
+--------------------+
| hibernate_sequence |
| other              |
+--------------------+
2 rows in set (0.00 sec)

mysql> desc other
    -> ;
+-----------+--------------+------+-----+---------+-------+
| Field     | Type         | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| id        | bigint(20)   | NO   | PRI | NULL    |       |
| otherData | varchar(255) | YES  |     | NULL    |       |
+-----------+--------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
The source code for this tutorial is available in the git repository, tagged v1.2.

Wednesday, January 23, 2019

9. Role based data access

Many times we need to return data from an endpoint based on the caller's role. The Spring framework provides an easy mechanism for us to do that. Let's take an example. Continuing from the previous post, we want to add a GET method in UserEndpoint that returns all the users. For any secure system, we want to return all the users if the role is ADMIN, but if the role is USER then we want to return only that particular user. We don't want to write multiple methods for that purpose.
    @RequestMapping(method = RequestMethod.GET, produces = MediaType.APPLICATION_JSON_VALUE)
    @PostFilter("hasAuthority('ADMIN') or filterObject.authToken == authentication.name")
    public Iterable<User> getUsers() {
        Iterable<User> users = userRepository.findAll();
        return users;
    }
The method is described above. As we can see, it is an extremely simple method: it calls the findAll method on the repository, which returns all the valid user records, and returns an Iterable collection back. The interesting aspect of this method is the @PostFilter annotation. Let's try to understand it. The first condition is hasAuthority('ADMIN'). It implies that if the authenticated role is ADMIN then the records are returned as they are. The next bit of the filter condition uses an object called filterObject. This is an automatically defined expression variable that we can use in @PostFilter expressions; a complete list of the built-in expressions is available in the Spring Security documentation. The expression filterObject is evaluated for each element of the collection that is returned from the endpoint. Since we know that when this endpoint is called we would only be authenticating using an authToken, the name of the authentication in the security context is set to the token itself. We can verify this in the code in the AuthenticationFilter class.
            else {

                Optional<String> token = getOptionalHeader(httpRequest,"token");
                TokenPrincipal authTokenPrincipal = new TokenPrincipal(token);
                processTokenAuthentication(authTokenPrincipal);
            }

The code fragment described above creates a TokenPrincipal object with the token as its name. That is the reason we have the condition filterObject.authToken == authentication.name. Taking the complete @PostFilter condition together, it returns everything unconditionally if the role is ADMIN; otherwise it returns only the users whose authToken matches the currently authenticated name.
We have two users defined in the system. One has the role USER and the other has the role ADMIN. Here are examples of what happens when we call the endpoint as each of these users.
The first example is with a user with role USER.
$ curl -X GET "http://localhost:8081/user" -H "accept: application/json" -H "token: 3d47912d-73a0-4c4c-95e6-0486273d6221-28fa4f38-0f1b-4740-8e1d-3228288de631" | python -m json.tool
[
    {
        "authToken": "3d47912d-73a0-4c4c-95e6-0486273d6221-28fa4f38-0f1b-4740-8e1d-3228288de631",
        "email": "user@springframework.in",
        "expiry": 1548345909000,
        "fullname": "User",
        "id": 2,
        "mask": 1,
        "password": "User123",
        "username": "user"
    }
]

The second example is with a user with role ADMIN.
$ curl -X GET "http://localhost:8081/user" -H "accept: application/json" -H "token: a137dd09-11e4-4dcf-a141-0b235d39a505-60d43bf4-3674-4248-be1d-c2669f14589f" | python -m json.tool
[
    {
        "authToken": "3d47912d-73a0-4c4c-95e6-0486273d6221-28fa4f38-0f1b-4740-8e1d-3228288de631",
        "email": "user@springframework.in",
        "expiry": 1548345909000,
        "fullname": "User",
        "id": 2,
        "mask": 1,
        "password": "User123",
        "username": "user"
    },
    {
        "authToken": "a137dd09-11e4-4dcf-a141-0b235d39a505-60d43bf4-3674-4248-be1d-c2669f14589f",
        "email": "admin@springframework.in",                                                                                         
        "expiry": 1548348263000,                                                                                                     
        "fullname": "Administrator",                                                                                                 
        "id": 3,                                                                                                                     
        "mask": 4,                                                                                                                   
        "password": "Admin123",                                                                                                      
        "username": "admin"                                                                                                          
    }
]

As we can see above, the call with the role USER only returns the object related to that particular user, while the call with the role ADMIN returns all the users present in the system. This is how we can achieve role-based object access without writing multiple endpoints.

Tuesday, January 22, 2019

8. That little matter of creating a user

Now that our simplified authentication system is in place, we are faced with the little matter of how to create a new user. Since we don't have a username, password, or token we can't really create a new user.
To accomplish that, we need to make some modifications to our AuthenticationFilter.  In the doFilter method, we add a special if block to take care of user creation.
            if (httpRequest.getRequestURI().toString().equals("/user") && httpRequest.getMethod().equals("POST")) {

                Optional<String> username = getOptionalHeader(httpRequest,"username");
                UsernamePasswordPrincipal usernamePasswordPrincipal = new UsernamePasswordPrincipal(username, username, true);
                processUsernameAuthentication(usernamePasswordPrincipal);
            }
            else if (httpRequest.getRequestURI().toString().equals("/authenticate") && httpRequest.getMethod().equals("POST")) {

                Optional<String> username = getOptionalHeader(httpRequest,"username");
                Optional<String> password = getOptionalHeader(httpRequest,"password");
                UsernamePasswordPrincipal usernamePasswordPrincipal = new UsernamePasswordPrincipal(username, password);
                processUsernameAuthentication(usernamePasswordPrincipal);
            }

As we can see in the code fragment, if the call is made to the /user endpoint with the POST method, we look for a username header and trigger Spring authentication. We also need to make a change in UsernamePasswordPrincipal to take a flag that tells us whether the user is a new user or an existing user.
Now that we have modified the principal and filter, we need to handle this in the provider. The provider that gets invoked for username and password authentication is UsernamePasswordAuthenticationProvider.
As we can see in the provider's authenticate method, we have added an if block that checks whether this is a new user. If it is, we authenticate this user with the role NEWUSER. The create-user endpoint is only allowed to be called with the role NEWUSER.
Now that we have stitched the path for authentication of a new user to create user endpoint, we can see the endpoint itself.
    @RequestMapping(method = RequestMethod.POST, produces = MediaType.APPLICATION_JSON_VALUE)
    @PreAuthorize(MyConstants.ANNOTATION_ROLE_NEWUSER)
    public Optional<User> createUser(@RequestHeader(value="username") String username, @RequestBody User user) {
        user.setMask(Role.USER.ordinal());
        User storedUser = userRepository.save(user);
        storedUser.setPassword(null);
        return Optional.of(storedUser);
    }
As we can see, we have added a @PreAuthorize with the NEWUSER role. We also set the role of the user to USER. Before returning the response, we set the password to null.
This fixes our service to allow the creation of a new user. The complete code is available tagged as v1.1 for this and previous posts.