Spring Data Redis: High-Availability with Sentinel

1. Overview

For high-availability with Redis, we can use Spring Data Redis’ support for Redis Sentinel. With Sentinel, we can create a Redis deployment that automatically resists certain failures.

Redis Sentinel also provides related capabilities such as monitoring and notifications, and acts as a configuration provider for clients.

At a high level, Sentinel’s capabilities are:

  • Automated failover. When a master is not working as expected, Sentinel starts a failover process for us where a slave is promoted to master. Additionally, the other slaves are reconfigured to use the new master and the applications using the Redis server are informed about the new address to use.
  • Configuration source. When a failover happens, Sentinels will report the new address. This is because Sentinel functions as a source of authority for clients. When clients do service discovery, they connect to Sentinels to request the address of the current Redis master responsible for a given service.
  • Monitoring. Sentinel periodically checks if our master and slave instances are working as they are intended to.
  • Notifying. Sentinel can be configured to notify a variety of targets when an error occurs with one of the Redis instances. These targets include other applications, a sysadmin, or an API.

2. How to Run Sentinel

A stable release of Sentinel has shipped with Redis since Redis 2.8.

Starting Sentinel is easy. When we reviewed Spring Data Redis (with Spring Boot) in my previous article, we installed Redis using Homebrew on macOS. This command allows us to run Sentinel with that installation:

redis-sentinel /path/to/sentinel.conf

If we are using the redis-sentinel executable (or if we have a symbolic link with that name to the redis-server executable), we can run Sentinel with the above command.

Alternatively, we can use the redis-server executable and start it in Sentinel mode, like this:

redis-server /path/to/sentinel.conf --sentinel

3. Key Concepts to Know Before Deploying Sentinel

Some concepts we should review before deploying Sentinel include:

  1. We require at least three Sentinel instances for a durable Redis deployment.
  2. We should place the three Sentinel instances into computers or virtual machines that are believed to fail independently rather than together. For instance, this could mean different availability zones.
  3. Redis uses asynchronous replication and therefore cannot guarantee that acknowledged writes are kept during failures, even when using Sentinel. However, we can deploy Sentinel in a way that limits the window of time during which writes can be lost.
  4. Any high-availability setup must be tested periodically and Sentinel is no different. We need to test in both development environments and in our production environments. By planning and testing for failure, we limit our failures.

4. Configuration in Spring Data

When we use a Sentinel-based configuration, we do not provide the Redis host/port information to Spring Data Redis. Instead, we provide the name of the master and a list of Sentinel URLs. Each Sentinel process has its own configuration file that lists the master Redis server, such as (the trailing 2 in the monitor line is the quorum: the number of Sentinels that must agree the master is unreachable before a failover starts):

sentinel monitor themaster 127.0.0.1 6379 2
sentinel down-after-milliseconds themaster 60000
sentinel failover-timeout themaster 180000
sentinel parallel-syncs themaster 1

Once we have configured our master, slaves, and Sentinels, we need to change the Spring Data Redis configuration in our application to work with the Sentinels.

4.1 Java Configuration

The Java configuration can be done using both Jedis and Lettuce:

/**
 * Jedis
 */
@Bean
public RedisConnectionFactory jedisConnectionFactory() {
  RedisSentinelConfiguration sentinelConfig = new RedisSentinelConfiguration()
      .master("themaster")
      .sentinel("127.0.0.1", 26579)
      .sentinel("127.0.0.1", 26580);
  return new JedisConnectionFactory(sentinelConfig);
}

/**
 * Lettuce
 */
@Bean
public RedisConnectionFactory lettuceConnectionFactory() {
  RedisSentinelConfiguration sentinelConfig = new RedisSentinelConfiguration()
      .master("themaster")
      .sentinel("127.0.0.1", 26579)
      .sentinel("127.0.0.1", 26580);
  return new LettuceConnectionFactory(sentinelConfig);
}

4.2 Properties Configuration

A PropertySource, such as application.properties, can be used for the configuration. For example, if we run everything on localhost:

# Name of our Redis master
spring.redis.sentinel.master=themaster
# Comma-separated list of Sentinel host:port pairs
spring.redis.sentinel.nodes=localhost:26579,localhost:26580,localhost:26581
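If the application uses YAML configuration instead, the same settings might look like this in application.yml (a sketch; the keys mirror the spring.redis.sentinel.* properties above):

```yaml
spring:
  redis:
    sentinel:
      master: themaster
      nodes: localhost:26579,localhost:26580,localhost:26581
```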

5. Conclusion

Today we reviewed how high availability can be achieved with Redis by using Sentinel and how Spring Data Redis supports this in our Spring applications. For more information about Sentinel, the Redis website is a good source.

On my site, there’s also information starting with Spring Data Redis and Spring Boot and several articles about the Spring Framework in general.

AngularJS Scope

1. Overview

In this post, we are reviewing scope in AngularJS. It is passed as an argument when we create a controller:

<script>
var myApp = angular.module('myApp', []);

myApp.controller('myController', function($scope) {
    $scope.name = "Red Dead Redemption";
});
</script>

Scope is actually an object that refers to the model in an application architecture such as Model-View-Controller (MVC).

It provides definitions – also known as context – for JavaScript-like code snippets called expressions. Scopes are structured in a hierarchy that mimics the Document Object Model (DOM) structure of the application. Scopes can watch expressions and propagate events in a similar way to DOM events.

2. Scope is a Data Model

What does it mean for Scope to be a data model? It is a JavaScript object with properties and methods that can be accessed by both the view and controller:

[Diagram: how the $scope object is accessed in AngularJS]

Here, we have an example that demonstrates how modifying the view can affect the controller and model:

<!DOCTYPE html>
<html>
   <script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.7.5/angular.min.js"></script>
<body>

<div ng-app="demoApp" ng-controller="demoCtrl">

<input ng-model="word">

<h1>Hello, {{word}}!</h1>

</div>

<script>
var app = angular.module('demoApp', []);
app.controller('demoCtrl', function($scope) {
    $scope.word = "word";
});
</script>

<p>If we change the word in the input field, the change will affect the model and the word property in the controller.</p>

</body>
</html>

Feel free to copy and paste that code into your favorite text editor, or download the file from my GitHub.

In this simple example, if we modify the input on the web page, we see the value change for the greeting inside the <h1> tags.

3. Root Scope & Hierarchies

Each AngularJS application has only one root scope, but can have any number of child scopes.

An application can have several scopes because directives can create new child scopes. When new scopes are created, they become children of their parent scope. This creates a tree structure which parallels the DOM where they’re attached.

The example below, which is available on my GitHub, shows how multiple scopes work in an application, as well as prototypal inheritance of properties:

<html>
<head>
  <script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.7.5/angular.min.js"></script>
  <script>
  (function(angular) {
    'use strict';
    angular.module('scopeExample', [])
      .controller('HelloController', ['$scope', '$rootScope', function($scope, $rootScope) {
        $scope.name = 'World';
        $rootScope.department = 'Red Dead Redemption 2';
      }])
      .controller('ListController', ['$scope', function($scope) {
        $scope.names = ['Arthur', 'Dutch', 'Bill'];
      }]);
  })(window.angular);
  </script>
</head>
<body ng-app="scopeExample">
  <div class="show-scope-demo">
    <div ng-controller="HelloController">
      Hello {{name}}!
    </div>
    <div ng-controller="ListController">
      <ol>
        <li ng-repeat="name in names">{{name}} from {{department}}</li>
      </ol>
    </div>
  </div>
</body>
</html>

The page will display like this:

[Screenshot: multiple scopes in AngularJS, showing prototypal inheritance]

In the code shown above, when {{name}} is evaluated by AngularJS, it first looks for the name property in the scope associated with the controller given in the ng-controller attribute. This is why it says “Hello World!” on top rather than a character name from Red Dead Redemption 2.

If the property is not found, it searches the parent scope and so on until the root scope is reached. The AngularJS documentation calls this prototypical inheritance, while others like Mozilla call it prototypal inheritance.

Regardless of what it’s truly called, what we need to know is JavaScript only has one construct: objects. Every object has a private property that holds a link to another object called its prototype.
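The prototype-chain lookup described above can be sketched with plain JavaScript objects (these are illustrative objects, not real AngularJS scopes): a child object first checks its own properties, then walks up to its prototype, just as a child scope falls back to its parent.

```javascript
// Plain-object sketch of scope property lookup via the prototype chain.
var rootScope = { department: 'Red Dead Redemption 2' };

// childScope's prototype is rootScope, mirroring a child scope's parent
var childScope = Object.create(rootScope);
childScope.name = 'World';

console.log(childScope.name);       // found as an own property
console.log(childScope.department); // not an own property; found on the prototype
```

Running this with Node.js prints "World" and then "Red Dead Redemption 2", because `department` is resolved on the prototype (the "root scope") rather than on the child object itself.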

4. Conclusion

Today we reviewed the core concepts of scope in AngularJS. We reviewed that it is an object that refers to the model of an application, that it is hierarchical and has prototypal inheritance, and that there can be only one root scope.

AngularJS can be used effectively with the Spring Framework. To learn more AngularJS, I will be checking out courses on Udemy and Treehouse.

DevSecOps for Authorization

1. Overview

What is DevSecOps? DevSecOps refers to the strategy of development, security, and operations teams working hand-in-hand on their projects, rather than working in isolation. Each component of DevSecOps – development, security, and operations – is meant to be integrated into the processes of its fellow components. For example, security should be integrated into the DevOps lifecycle rather than handled separately at the end.

If we are to apply DevOps to security, we must treat security as code. In this article, we will review how by treating authorization policies as code, we can effectively bring authorization into the strategy of DevSecOps.  

2. Centralized and Externalized Access Control

In order to practice the agility and responsiveness that the strategy of DevSecOps calls for, access control must be centralized and externalized from applications, similar to what is described in the eXtensible Access Control Markup Language (XACML). Centralizing and externalizing authorization also makes an organization safer, because the security policies live in one place rather than being baked into every application. This means we have to review one set of policies rather than several!

A modern trend is microservices. A common issue is a microservice implementing its own authorization, violating the principle of single responsibility. Both monolithic and microservice applications need to externalize and centralize their authorization.

3. Version Control of Policies

In order to treat security as code, we need to apply version control to our authorization policies. The benefits of using version control on our policies include:

  • The ability to roll back to a previous policy if an issue is encountered with a new version.
  • The ability to deploy policy consistently across development, QA, and production.
  • The ability to collaborate effectively within a security policy team, since we can compare policies, identify differences, and merge changes as we see fit.

As we can see, the benefits of using version control amount to more agility and responsiveness, which are cornerstones of DevSecOps. 
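To make this concrete, here is a minimal policies-as-code workflow sketch. The repository name and the policy syntax are illustrative assumptions; the point is that every policy change becomes a commit we can review, diff, and roll back:

```shell
# Create a repository dedicated to authorization policies
git init -q policy-repo
cd policy-repo
git config user.email "secteam@example.com"
git config user.name "Sec Team"

# Version 1 of the policy (illustrative syntax)
echo 'permit role=admin on /reports' > authz-policy.txt
git add authz-policy.txt
git commit -q -m "Add initial authorization policy"

# Version 2: grant auditors access; the change is now reviewable as a diff
echo 'permit role=auditor on /reports' >> authz-policy.txt
git commit -q -am "Grant auditors read access to reports"

git log --oneline   # two revisions we can diff, compare, or roll back
```

From here, `git revert` gives us the rollback benefit, and branches plus merge requests give us the collaboration benefit described above.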

4. Automation

By integrating our externalized and centralized authorization software with an automation server, such as Jenkins, we can automate:

  • Deployment of policies from our version control system, such as Git.
  • Acceptance tests that ensure that critical authorization errors aren’t part of the new policy. 

5. Conclusion

By following the DevSecOps principles we discussed here today, we can greatly improve the efficiency and responsiveness of authorization. The benefits of implementing these changes lead to a more secure organization. 

To read more about modern authorization, check out my posts Authorizing Resources Based On Who Created Them and Expression-Based Access Control.

Grails with Spring Security

1. Overview of Spring Security Integration with Grails

Spring Security offers authentication, authorization, instance-based security, and various other features that make it attractive for securing applications.

With this in mind, because Grails uses Spring’s Inversion of Control container and MVC setup, developers sought to use Spring Security to secure Grails applications.

This has resulted in two notable plugins: Spring Security Core Plugin and Spring Security ACL Plugin.

We will be reviewing the capabilities of these Spring Security plugins and making comparisons to using Spring Security for a plain old Spring application. 

2. Spring Security Core Plugin

This plugin provides practical defaults with many configuration options for customization. 

2.1 Domain Classes

The Spring Security Core Plugin uses the default Grails domain classes. In order to use the standard lookup for the plugin, we need at a minimum a Person and Authority domain class. 

If we want to store URL <==> Role mappings in the database (which is one of several approaches for defining the mappings), we need a Requestmap domain class. If we use the recommended approach for mapping the many-to-many relationship between Person and Authority, we also need a domain class to map the join table.

To use the user/group lookup, we’ll also need a Group domain class. If we are using the recommended approach for mapping many-to-many relationship between Person and Group and between Group and Authority we’ll need a domain class for each to map the join tables. We can still additionally use Requestmap with this approach.

We can use the s2-quickstart script to generate domain classes. The syntax is quite simple:

grails s2-quickstart DOMAIN_CLASS_PACKAGE USER_CLASS_NAME ROLE_CLASS_NAME [REQUESTMAP_CLASS_NAME] [--groupClassName=GROUP_CLASS_NAME]

An example with Person, Authority, and Requestmap:

grails s2-quickstart com.ourapp Person Authority Requestmap

2.2 Configuring Request Mappings for Securing URLs

We can choose among the following approaches to configure request mappings for securing URLs:

  • @Secured annotations on controllers and controller actions (the default approach)
  • Requestmap domain class instances stored in the database
  • Static request mappings defined in the application configuration

We can only use one approach at a time.

For example, here is use of the @Secured annotation with Spring Expression Language (SpEL): 

class SecureController {

   @Secured("hasRole('ROLE_USER')")
   def someRandomAction() {
      ...
   }
}

2.3 Various Other Features

Some other notable features of the Spring Security Core Plugin include:

  • Helper classes for dealing with lower levels of Spring Security, such as a SecurityTagLib that provides GSP tags to support conditional display based on whether the user is authenticated and/or has the required role to perform a particular action.
  • Events, including event notifications, event listeners, and callback closures.
  • Filters, including the ability to define which filters are applied to different URL patterns.

3. Spring Security ACL Plugin

The Spring Security ACL Plugin adds Domain Object Security support to a Grails application that uses the aforementioned Spring Security Core Plugin. So, we need to have that other plugin already in our build.gradle.
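For reference, pulling both plugins into a Grails 3 project might look something like this in build.gradle. The coordinates are the plugins' published IDs, but the versions shown are illustrative; check the Grails plugin portal for the versions matching your Grails release:

```groovy
dependencies {
    // Spring Security Core Plugin: the required base
    compile 'org.grails.plugins:spring-security-core:3.2.0'
    // Spring Security ACL Plugin: adds domain object security on top
    compile 'org.grails.plugins:spring-security-acl:3.2.0'
}
```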

What does it mean to add Domain Object Security support? The Spring Security Core plugin and its extension plugins support restricting access to URLs via rules that check a user’s authentication status, roles, and so on. The ACL plugin extends this by adding support for restricting access to individual domain class instances.

3.1 Method Security

The four annotations typically available in Spring Security are available for use with Spring Expression Language (SpEL) expressions to perform Expression-Based Access Control:

  • @PreAuthorize
  • @PreFilter
  • @PostAuthorize
  • @PostFilter

The above annotations are all documented in the Method Security Expressions portion of the Spring Security documentation. 

The ability to use method security is a very significant difference between the Spring Security ACL Plugin and the Spring Security Core Plugin. If we are to implement fine-grained access control, the Spring Security ACL Plugin is a must-have for this reason.

Thankfully, besides the syntax differences between Groovy and Java, the code really looks the same:

@PreAuthorize("hasRole('ROLE_USER')")
@PostFilter("hasPermission(filterObject, read) or " +
            "hasPermission(filterObject, admin)")
List getAllDocs(params = [:]) {
    Report.list(params)
}

3.2 Domain Classes

Like the Spring Security Core Plugin, the Spring Security ACL Plugin uses domain classes for appropriate structuring.

The domain classes, in this case, are used to manage database state. They are created with table and column names that match what the standard JDBC-based Spring Security ACL code expects.

The persistence-related classes in this plugin use these domain classes by default. However, we can override them by running the s2-create-acl-domains script:

grails s2-create-acl-domains

The script generates the same domain classes in our application’s grails-app/domain folder, where we can customize them.

3.3 Various Other Features

Some various features of the Spring Security ACL Plugin include:

  • Run-As-Authentication Replacement: this is a temporary authentication switch that only lasts for one method invocation.
  • Custom permissions: there are five permissions available from the BasePermission class: READ, WRITE, CREATE, DELETE, and ADMINISTRATION. We can add our own permissions if we need more.
  • Tag library (taglib) for permit and deny. 

4. Conclusion

The Spring Security Core Plugin offers a number of very useful features for securing Grails applications with Spring Security, but to implement more sophisticated, fine-grained authorization, we need to use the Spring Security ACL Plugin in conjunction with it.

Recommended reading: Authorizing Resources Based On Who Created Them (Spring Security) and my posts about the Spring Framework for general Spring knowledge.

Software Engineer Salary

1. Overview

A concern for many – if not all – software engineers when considering employment is: what is a competitive software engineer salary? Whether you are an aspiring software engineer or a seasoned one, we will be reviewing software engineer salaries across various employers and locations.

2. Google Software Engineer

This is more or less the highest salary, and likely the highest total compensation, you will get, depending on your experience level.

Here we have a screenshot from Glassdoor showing the salary and total compensation for a Google software engineer in San Francisco, CA:

[Screenshot from Glassdoor: average total pay of $127K for a Google software engineer]

There are 314 salaries reported for a Google software engineer in San Francisco, so let’s assume this is the average, basically mid-level compensation. Does $127,000 sound like a lot of money to you? With the median home price reported last year as $1.61 million, it won’t take you very far in terms of purchasing a home, even if your spouse makes the same amount as you.

If you are making $200,000 or more, as a single earner, you may be able to afford a $1 million home. If your spouse makes at least about $150,000, you can successfully purchase a home around the median house price of $1.61 million in San Francisco, according to Redfin.

If you work in New York City as a software engineer, your salary may be on average $1,000 lower than your colleagues’ in San Francisco.

But the compensation you’d receive in New York City may actually go further than it would in San Francisco. That’s because although the salary on average may be lower, the cost of living in NYC is lower too. It is reported that you need $40,000 more annually in San Francisco to live comfortably than in NYC. That’s a HUGE difference!

3. Morgan Stanley Software Engineer

If you end up working at a large financial company like Morgan Stanley as a software engineer, don’t worry, the compensation looks just fine. You’ll be making on average 96% of what a Google software engineer makes.

You may notice the Average Base Pay is noticeably lower than that of a Google software engineer. The difference is made up through the bonuses software engineers receive at Morgan Stanley. I’ve heard this is normal for various positions in the financial industry, but of course, I encourage you to do your due diligence if you are considering employment with Morgan Stanley or a similar company.

4. REI Systems Software Engineer

The Washington D.C. area is about 10% less expensive than New York City. With this in mind, the average salary paid by REI Systems for a software engineer in the Washington D.C. Metro Area is not impressive.

You cannot live comfortably in Washington D.C. on less than $90,000, according to the article I shared above, and believe me, this number is really only accurate for a single person with no responsibilities. They mention that people live in Maryland and Virginia to save on costs.

Well, in order to get a reasonable single-family home in the same city as REI Systems’ headquarters, Sterling, you’ll need to have double the income. 

Likewise, you’ll need nearly double the income to be a middle-class household in the county where REI Systems resides.

5. Amazon Web Services Software Engineer

As we previously saw, a single person needs approximately $90,000 in the Washington D.C. area to live comfortably. That sounds more or less accurate to me as someone who lives in the area. Consider that as responsible individuals we need to max out our 401(k)s and save other money as well.

Amazon pays their entry-level software engineers on average nearly $30,000 more than REI Systems.

That’s a huge difference, particularly if you consider that money growing over time, if you invest it. 

6. Conclusion

Different employers pay different salaries to their software engineers. Your friend who lives in your same town may make $30,000 more than you for doing the exact same work.

So what’s the moral of the story? Be sure to investigate the compensation for your line of work when looking at employment opportunities. Also, if you are considering moving, evaluate the cost of living in each area versus the salary that is offered.

We didn’t cover this, but also be sure to look at benefits. 401(k) matching, health care, and other benefits are all part of your total compensation package that you need to closely evaluate.

If you are looking at ways to improve your chances of getting hired, consider starting a blog to market yourself and taking courses on Udemy to keep your knowledge current.

Expression-Based Access Control

1. Overview

Today, we’ll be reviewing the differences between Expression-Based Access Control (EBAC), Role Based Access Control (RBAC), and Attribute Based Access Control (ABAC), with a deeper focus on EBAC.

2. What is Expression-Based Access Control?

Simply put, Expression-Based Access Control is the use of expressions to write authorization.

The phrase Expression-Based Access Control (EBAC) is currently most commonly associated with the use of the Spring Expression Language expressions to write authorization.

Spring Security 3.0 introduced the ability to use Spring EL expressions as an authorization mechanism, in addition to the simple use of configuration attributes and access-decision voters.

However, using expressions for access control is NOT limited to just Spring Security! This blog post is partially a request to the greater community to recognize the use of expressions in authorization as Expression-Based Access Control (EBAC), since it is distinct from other forms of access control: its expressiveness lets us implement other models such as RBAC and ABAC.

Other examples of EBAC include the Access Control Expressions (ACE) in MapR and Dynamic Access Control in Windows. There may be others as well, such as in the PHP framework Symfony.

Is Expression-Based Access Control (EBAC) Equivalent to Attribute Based Access Control (ABAC)?

No, but ABAC can be implemented with EBAC.

Here is a high-level definition of ABAC according to NIST Special Publication 800-162:

“An access control method where subject requests to perform operations on objects are granted or denied based on assigned attributes of the subject, assigned attributes of the object, environment conditions, and a set of policies that are specified in terms of those attributes and conditions.”

With this in mind, we could write our own ABAC rules using an expression language, such as Spring Expression Language based expressions, and apply them through the existing @PreAuthorize, @PostAuthorize, @PreFilter and @PostFilter annotations, sec:authorize tags, and even intercept-url conditions.

Is Expression-Based Access Control (EBAC) Equivalent to Role Based Access Control (RBAC)?

No, EBAC is not equivalent to RBAC, but RBAC comes built-in to certain expression languages such as Spring EL. For instance, there are these two common expressions that allow us to implement RBAC with ease:

  • hasRole([role])
  • hasAnyRole([role1,role2])

However, when writing fine-grained authorization rules, we easily begin to write expressions that surpass the granularity level of RBAC.

3. Web Security Expressions

EBAC implementations, such as Spring Security, allow us to secure URLs. The expressions should evaluate to true or false, defining whether or not access is granted. Here is an example of restricting access in a RESTful application based on userId, in a Java configuration:

http
    .authorizeRequests()
    .antMatchers("/user/{userId}/**")
        .access("@webSecurity.checkUserId(authentication, #userId)")
    ...
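The @webSecurity bean referenced in that expression is one we write ourselves. Stripped of Spring, the idea behind a hypothetical checkUserId method is simply an equality check between the authenticated principal and the {userId} path variable. The class and method names below are illustrative assumptions, not Spring Security APIs:

```java
// Plain-Java sketch of the logic a custom "webSecurity" bean might implement:
// grant access only when the authenticated user's id matches the userId
// extracted from the URL.
public class WebSecurityCheck {

    public static boolean checkUserId(String authenticatedUserId, String pathUserId) {
        // Never grant access when there is no authenticated principal
        return authenticatedUserId != null && authenticatedUserId.equals(pathUserId);
    }

    public static void main(String[] args) {
        System.out.println(checkUserId("42", "42")); // ids match: access granted
        System.out.println(checkUserId("42", "7"));  // ids differ: access denied
    }
}
```

In a real application, the same check would read the id from the Authentication object and the #userId path variable that Spring Security passes into the expression.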

4. Method Security Expressions

Method security is more complicated than a simple permit or deny.

For example, in Spring Security, there are four annotations that take expression attributes to perform pre and post-invocation authorization checks and also to support filtering of submitted collection arguments or return values.

@PreAuthorize, which is the most commonly used, decides whether a method can actually be invoked or not.

@PostAuthorize, an uncommonly used annotation, performs an access-control check after the method has been invoked.

With @PostFilter, Spring Security iterates through the returned collection and removes any items for which the provided expression is false.

@PreFilter allows us to filter before the method call, but this is less commonly used.

Below we have an example of combining @PreAuthorize with @PostFilter for more fine-grained security:


@PreAuthorize("hasRole('USER')")
@PostFilter("hasPermission(filterObject, 'read') or hasPermission(filterObject, 'admin')")
public List<Contact> getAll();

5. When to Use Expression-Based Access Control (EBAC)?

If the security we need requires more granularity than simple Access Control Lists (ACLs), then we need to use EBAC. How we decide to implement EBAC is a matter of what resources we have available to us. For instance, in an organization that already uses Spring Security, why not use its Spring EL support? Likewise, if we have MapR, then we’d use its Access Control Expressions.

In other situations, in order to meet the needs of the organization, it may be necessary to write our own expression language in our language of choice in order to implement EBAC. The reason we’d spend time doing this, of course, is to allow us to implement whatever kind of access control we want, with the conditions we want. Once we have an expression language adequate to accomplish this, another benefit is that we are less likely to rely on others, whether commercial off-the-shelf products or open source.

6. Conclusion

Various software comes with the ability to write authorization using expressions, such as MapR, Windows, and, of course, Spring Security. If fine-grained access control can be accomplished using the expressions, I refer to it – and suggest you refer to it – as Expression-Based Access Control (EBAC). By giving it a name, we are more likely to use it to secure our systems over traditional RBAC. This is good because fine-grained access control, when done properly, is more likely to prevent breaches.


Resource and Dependency Injection in Java EE 7

1. Overview

Contexts and Dependency Injection (CDI), included in Java EE 6 and higher, is a feature of Java EE that helps meld the web tier and the transactional tier of the platform. From a technical perspective, this means that CDI offers a dependency injection framework and also manages the injected dependencies’ lifecycle.

In this tutorial today, we will be covering CDI for Java EE 7.

1.1 Contexts and Dependency Injection Specification

As mentioned on Oracle’s Java EE 7 website, Java EE 7 uses CDI 1.1, which is outlined in JSR 346.

CDI 1.1 brought many major changes, as mentioned in this blog post by the CDI lead Pete Muir, such as:

  • Global enablement of interceptors, global enablement of decorators, and alternatives using the @Priority annotation
  • Support for @AroundConstruct lifecycle callback for constructors
  • EventMetadata to allow inspection of event metadata
  • Allowing binding interceptors to constructors

The remaining significant changes are covered in that blog post, and reviewing them all is encouraged.

2. Comparing Dependency Injection and Resource Injection

Injection Type       | Can Inject JNDI Resources Directly | Can Inject Regular Classes Directly | Resolves By   | Typesafe
Resource Injection   | True                               | False                               | Resource name | No
Dependency Injection | False                              | True                                | Type          | Yes

2.1 Dependency Injection

Dependency injection allows us to turn regular Java classes into managed objects and to inject those managed objects into other managed objects. The hurdle is ensuring we are providing the correct managed object at the right time.

Here we have an @Inject annotation that denotes that we will be providing – also known as injecting – a dependency to this constructor:

@Inject
public MaskingDataProcessor(MaskingData maskingData) {
    this.maskingData = maskingData;
}

So, where does this dependency come from?

We have two classes in this example: SSNDataMasker and BirthdayMasker, and they both implement the same interface.

SSNDataMasker is annotated to be the default and therefore will be chosen by default if available:

@Default
public class SSNDataMasker implements MaskingData {
    ...
}

BirthdayMasker is annotated as an alternative and therefore will only be chosen when it is explicitly enabled, for example in beans.xml:

@Alternative
public class BirthdayMasker implements MaskingData {
    ...
}
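Enabling the alternative might look like this in a hypothetical beans.xml (the com.example package name is an assumption for illustration):

```xml
<beans xmlns="http://xmlns.jcp.org/xml/ns/javaee"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
                           http://xmlns.jcp.org/xml/ns/javaee/beans_1_1.xsd"
       bean-discovery-mode="all">
    <alternatives>
        <class>com.example.BirthdayMasker</class>
    </alternatives>
</beans>
```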

2.2 Resource Injection

Resource injection allows us to inject any resource available in the JNDI namespace into any object managed by the container. For instance, we can use resource injection to inject connectors, data sources, or any other resources available in the JNDI namespace.

In the code below, we inject a data source object into a field and this kind of resource injection is appropriately called field-based injection:

public class MyClass {
    @Resource(name="java:comp/SomeDataSource")
    private DataSource myDataBase;
    ...
}


Another way of injecting resources is method-based injection. In method-based injection, the parameter that is passed is injected with the resource:

public class MyClass {

    private DataSource myDataBase;
    ...

    @Resource(name="java:comp/SomeDataSource")
    public void setMyDataSource(DataSource dataSource) {
        myDataBase = dataSource;
    }
}


3. What’s the Difference Between EJB and CDI?

As this article on the Oracle website states, the “C” in CDI is the main difference between EJB beans and CDI beans. EJB components might be stateful, but are not inherently contextual. When we reference a stateful component instance, it must be explicitly passed between clients and destroyed by the application. CDI improves the EJB component model with contextual lifecycle management. However, there are times when we want to use one over the other.

3.1 When to Use EJB

There are several useful container services that are available only if we make our CDI bean also an EJB by adding @Stateful, @Stateless, or @Singleton.

Examples include:

  • When we are exposing a JAX-WS @WebService, making it an EJB allows us to skip listing and mapping it as a servlet in the web.xml deployment descriptor. This is available to @Stateless and @Singleton.
  • When we are exposing a JAX-RS resource via @Path. When the RESTful service is an EJB, we get automatic discovery and don’t need to add it to a JAX-RS Application subclass or elsewhere. This is available to @Stateless and @Singleton.
  • When we are working in parallel, the @Asynchronous method invocation is useful. As we know, having too many threads can degrade performance. The @Asynchronous annotation allows us to parallelize work using the container’s thread pool. This is available to @Stateful, @Stateless and @Singleton.
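Conceptually, @Asynchronous hands the invocation to the container’s thread pool and immediately returns a Future. Here is a plain-Java approximation of that behavior; the processReport method and the pool size are invented for this sketch, and a real EJB would simply annotate the method:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncSketch {

    // stands in for the container-managed thread pool
    static final ExecutorService POOL = Executors.newFixedThreadPool(4);

    // stands in for a business method annotated with @Asynchronous
    static Future<String> processReport(String name) {
        return POOL.submit(() -> "processed " + name);
    }

    public static void main(String[] args) throws Exception {
        Future<String> result = processReport("Q3"); // returns immediately
        System.out.println(result.get());            // blocks until done: "processed Q3"
        POOL.shutdown();
    }
}
```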

3.2 When to Use CDI

Simply put, we should use CDI when we benefit from its use: when we need injection, events, interceptors, decorators, lifecycle tracking, or the other features that CDI offers.

4. Conclusion

To quickly test the concepts we reviewed regarding CDI, let’s add Weld to a Maven project:

<dependency>
    <groupId>org.jboss.weld.se</groupId>
    <artifactId>weld-se-core</artifactId>
    <version>2.4.1.Final</version>
</dependency>

Assuming we already have code to test – such as the code previously mentioned in the blog post – we just need to execute Weld, like:

public static void main(String[] args) {
    Weld weld = new Weld();
    WeldContainer container = weld.initialize();
    MaskingDataProcessor maskingDataProcessor = container.select(MaskingDataProcessor.class).get();
    container.shutdown();
}

Authorizing Resources Based On Who Created Them

A colleague of mine pointed me to an interesting question on StackOverflow and suggested it may be a good one for me to answer because of my experience with Spring.

The question was, “How to authorize specific resources based on users who created those in REST, using annotations.”

The gist of it is this:

What I’m trying to do is create an annotation named @Authorize and use it on methods which needs user authorization in order to perform some action( the user is already authenticated at this point). eg. I have an order service with a getOrder() method. I want only the user who created this order to access it.

My Answer on StackOverflow

To implement authorization controls on methods in Java, I highly recommend Spring Security with an eXtensible Access Control Markup Language (XACML) implementation that has a Spring Security API.

Spring Security

Spring Security provides two main means to protect access to methods:

  • Preauthorization: this allows for certain conditions/constraints to be checked before the execution of the method is allowed. Failure to verify these conditions will result in the failure to call the method.
  • Postauthorization: this allows for certain conditions/constraints to be checked after the method returns. This is used less often than the preauthorization check, but can be used to provide extra security around complex interconnected business tier methods, especially around constraints related to the object returned by the method.

Say, for example, that one of the access control rules is that a user must have the ROLE_ADMIN authority before being able to invoke the method getEvent(). Within the Spring Security framework, we would use the @PreAuthorize annotation as below:

public interface Sample {
    ...
    @PreAuthorize("hasRole('ROLE_ADMIN')")
    Event getEvent();
}

In essence, Spring Security uses a runtime Aspect Oriented Programming (AOP) pointcut to execute a before advice on the method and throw an o.s.s.access.AccessDeniedException if the specified security constraints are not met.
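To illustrate the idea behind such an annotation-driven check without Spring Security itself, here is a minimal sketch using a JDK dynamic proxy. The Authorize annotation, the OrderService interface, and the role strings are all invented for this example; Spring Security’s actual machinery is far more capable:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Proxy;

public class AuthzSketch {

    @Retention(RetentionPolicy.RUNTIME)
    @interface Authorize {
        String role();
    }

    interface OrderService {
        @Authorize(role = "ROLE_ADMIN")
        String getOrder();
    }

    // wraps the target so every call is checked against the @Authorize role first
    static OrderService secure(OrderService target, String currentRole) {
        return (OrderService) Proxy.newProxyInstance(
            OrderService.class.getClassLoader(),
            new Class<?>[] { OrderService.class },
            (proxy, method, args) -> {
                Authorize auth = method.getAnnotation(Authorize.class);
                if (auth != null && !auth.role().equals(currentRole)) {
                    throw new SecurityException("Access denied");
                }
                return method.invoke(target, args);
            });
    }

    public static void main(String[] args) {
        OrderService service = secure(() -> "order-123", "ROLE_USER");
        try {
            service.getOrder();
        } catch (SecurityException e) {
            System.out.println("denied"); // ROLE_USER lacks ROLE_ADMIN
        }
    }
}
```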

More can be found about Spring Security’s Method Level Security in section 27.3 of this documentation.

eXtensible Access Control Markup Language (XACML) – a policy language for ABAC

Spring Security does a great job of implementing access control with its expression-based access control, but attribute-based access control (ABAC) allows finer-grained control of access and is recommended by the National Institute of Standards and Technology (NIST).

To address the limitations of Role Based Access Control (RBAC), NIST came up with a new model called ABAC (Attribute Based Access Control). In ABAC, you can now use more metadata / parameters. You can for instance consider:

  • a user’s identity, role, job title, location, department, date of birth…
  • a resource’s type, location, owner, value, department…
  • contextual information, e.g. the time of day or the action the user is attempting on the resource

All these are called attributes. Attributes are the foundation of ABAC, hence the name. You can assemble these attributes into policies. Policies are a bit like the secret sauce of ABAC. Policies can grant and deny access. For instance:

  • An employee can view a record if the employee and the record are in the same region
  • Deny access to reading records between 5pm and 8am.

Policies can be used to express advanced scenarios e.g.

  • segregation of duty
  • time-based constraints (see above)
  • relationship-based access control (see above)
  • delegation rules, e.g. delegating Bob access to Alice’s document.
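As a rough, non-XACML illustration, such policies can be thought of as predicates over attribute maps. The attribute names below are invented:

```java
import java.util.Map;
import java.util.function.Predicate;

public class PolicySketch {

    // "An employee can view a record if the employee and the record are in the same region"
    static final Predicate<Map<String, String>> SAME_REGION =
        attrs -> attrs.get("user.region").equals(attrs.get("record.region"));

    static String decide(Predicate<Map<String, String>> policy, Map<String, String> attributes) {
        return policy.test(attributes) ? "Permit" : "Deny";
    }

    public static void main(String[] args) {
        System.out.println(decide(SAME_REGION,
            Map.of("user.region", "EMEA", "record.region", "EMEA"))); // Permit
        System.out.println(decide(SAME_REGION,
            Map.of("user.region", "EMEA", "record.region", "APAC"))); // Deny
    }
}
```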

There are two main syntaxes available to write policies:

  • the XML syntax defined by the XACML standard itself
  • ALFA, the Abbreviated Language for Authorization

ABAC also comes with an architecture that defines how the policies will get evaluated and enforced.

(Diagram: the ABAC architecture, showing the PEP, PDP, and PIP.)

The architecture contains the following components:

  • the Policy Enforcement Point (PEP): this is the component that secures the API / application you want to protect. The PEP intercepts the flow, analyzes it, and sends an authorization request to the PDP (see below). It then receives a decision (Permit/Deny), which it enforces.
  • the Policy Decision Point (PDP) receives an authorization request (e.g. can Alice view record #123?) and evaluates it against the set of policies it has been configured with. It eventually reaches a decision which it sends back to the PEP. During the evaluation process, the PDP may need additional metadata e.g. a user’s job title. To that effect, it can turn to policy information points (PIP)
  • the Policy Information Point (PIP) is the interface between the PDP and underlying data sources e.g. an LDAP, a database, a REST service which contain metadata about users, resources, or other. You can use PIPs to retrieve information the PDP may need at runtime e.g. a risk score, a record’s location, or other.
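The flow between these components can be sketched in a few lines of toy Java. Every name here is illustrative; none of this is a real XACML SDK:

```java
public class FlowSketch {

    // PIP: supplies metadata the PDP lacks, e.g. a user's department
    interface Pip {
        String lookup(String attributeId);
    }

    // PDP: evaluates the request against its policies, consulting the PIP as needed
    static String pdpDecide(String user, String action, Pip pip) {
        String department = pip.lookup("department:" + user);
        return "HR".equals(department) && "view".equals(action) ? "Permit" : "Deny";
    }

    // PEP: intercepts the call, asks the PDP, and enforces the decision
    static String pepIntercept(String user, String action, Pip pip) {
        if (!"Permit".equals(pdpDecide(user, action, pip))) {
            throw new SecurityException("Deny");
        }
        return "record #123";
    }

    public static void main(String[] args) {
        Pip directory = id -> "department:alice".equals(id) ? "HR" : "unknown";
        System.out.println(pepIntercept("alice", "view", directory)); // record #123
    }
}
```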

Implementations of XACML

Full disclosure – I am on the XACML Technical Committee and work for Axiomatics, a provider of dynamic authorization that implements XACML.

Axiomatics provides a Spring Security SDK for its Axiomatics Policy Server, and it provides four expressions that can be used to query the PDP as part of protecting a method invocation:

  1. xacmlDecisionPreAuthz, called with @PreAuthorize
  2. xacmlDecisionPostAuthz, called with @PostAuthorize
  3. xacmlDecisionPreFilter, called with @PreFilter
  4. xacmlDecisionPostFilter, called with @PostFilter

The exact signatures for these methods are as follows:

  1. xacmlDecisionPreAuthz(Collection<String> attributeCats, Collection<String> attributeTypes, Collection<String> attributeIds, ArrayList<Object> attributeValues)
  2. xacmlDecisionPostAuthz(Collection<String> attributeCats, Collection<String> attributeTypes, Collection<String> attributeIds, ArrayList<Object> attributeValues)
  3. xacmlDecisionPreFilter(Collection<String> attributeCats, Collection<String> attributeTypes, Collection<String> attributeIds, ArrayList<Object> attributeValues)
  4. xacmlDecisionPostFilter(Collection<String> attributeCats, Collection<String> attributeTypes, Collection<String> attributeIds, ArrayList<Object> attributeValues)

For an entire list of XACML implementations, you can check this list on Wikipedia.

Selenium with Java: Google Search

1. Overview

In this tutorial, we will be exploring the basics of how to use Selenium with Java. We will use Selenium to open Google, search, and click a URL.

The code is available on Github.

2. What is Selenium?

Selenium automates web browsers. That’s really it.

Selenium enables us to emulate user interaction with a web page. There are two Selenium products we can use: Selenium WebDriver and Selenium IDE. We will be using WebDriver.

What is WebDriver? WebDriver is an official W3C specification, and in essence it is a way of interacting with a web browser. Previously, with Selenium RC, Selenium operated the browser by injecting JavaScript to interact with elements. With the adoption of the WebDriver specification, companies like Google, Mozilla, and Microsoft release their browsers with the ability to be controlled by a hook that Selenium can tap into. This hook enables Selenium to interact with the web browser in the same way that humans do.

We will be using Google Chrome, and therefore we need to download chromedriver.

After downloading the driver, we need to execute the file. On a Mac, for instance, we can simply run:

./chromedriver

3. pom.xml

I use Spring Tool Suite and created a new Spring Starter project. That wasn’t necessary, but I tend to like Spring. As a result, the Selenium version is actually managed by the Spring Boot starter parent; it resolves to 2.53.1.

<!-- typical pom beginning-->
	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>1.5.10.RELEASE</version>
		<relativePath /> <!-- lookup parent from repository -->
	</parent>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
		<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
		<java.version>1.8</java.version>
	</properties>

	<dependencies>

		<dependency>
			<groupId>com.h2database</groupId>
			<artifactId>h2</artifactId>
			<scope>runtime</scope>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
		</dependency>
		
		<dependency>
			<groupId>org.seleniumhq.selenium</groupId>
			<artifactId>selenium-java</artifactId>
		</dependency>

	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-maven-plugin</artifactId>
			</plugin>
		</plugins>
	</build>
<!-- typical pom ending-->

4. Open Chrome and Search

For this step, we will be establishing the connection to the chromedriver, opening the browser, and searching for “Selenium”.

The port we target on localhost is 9515 because chromedriver listens on port 9515 by default.

RemoteWebDriver implements WebDriver, and WebDriver’s goal is to supply an object-oriented API that supports modern advanced web-app testing problems. So we can tell from this that RemoteWebDriver is the implementation that allows the use of a remote browser. The benefits include separating where the tests run from where the browser runs, and the ability to test with browsers not available on the current OS. The cons include needing an external server process to be running, and possible added latency if an exception is thrown.

            // create a Chrome Web Driver
            URL local = new URL("http://localhost:9515");
            WebDriver driver = new RemoteWebDriver(local, DesiredCapabilities.chrome());
            // open the browser and go to open google.com
            driver.get("https://www.google.com"); 
            
            driver.findElement(By.id("lst-ib")).sendKeys("Selenium");
            driver.findElement(By.name("btnK")).click();
            driver.manage().window().maximize();

5. Get Pages and Click

WebDriver gives us the findElement and findElements methods to locate element(s) on a web page. These methods accept a By object as a parameter. By has methods to locate elements within a document with the help of a locator value. Selenium has documented its API well.

Once we understand how Selenium is used to identify elements, it is easy to read any of the driver.findElements(By…) methods. But we need to know how to write them too. Using a browser like Chrome, we can right click (or the equivalent) to Inspect an element and get its HTML/CSS information. We can also “View Source” to get more complete information.

To demonstrate how to scroll on a web page, we call jse.executeScript("window.scrollBy(0,250)", "").
As the name suggests, JavascriptExecutor executes JavaScript. JavascriptExecutor is an interface provided by Selenium WebDriver. It provides two methods, executeScript and executeAsyncScript, to run JavaScript against the selected window or current page.

With the code below, it may be possible to create a more comprehensive bot to search Google and click URLs for several pages.

            // get the number of pages
            int size = driver.findElements(By.cssSelector("[valign='top'] > td")).size();
            for(int j = 1 ; j < size ; j++) {
                if (j > 1) {// we don't need to navigate to the first page
                    driver.findElement(By.cssSelector("[aria-label='Page " + j + "']")).click(); // navigate to page number j
                }

                String pagesearch = driver.getCurrentUrl();

                List<WebElement> findElements = driver.findElements(By.xpath("//*[@id='rso']//h3/a"));
                System.out.println(findElements.size());

                for(int i=0;i<findElements.size();i++){
                    findElements= driver.findElements(By.xpath("//*[@id='rso']//h3/a"));                
                    findElements.get(i).click(); 

                    driver.navigate().to(pagesearch);
                    JavascriptExecutor jse = (JavascriptExecutor) driver;
                    //Scroll vertically downward by 250 pixels
                    jse.executeScript("window.scrollBy(0,250)", "");
                }
            }

6. Conclusion

This was a basic introduction to Selenium with Java. As we discovered, in Selenium WebDriver, locators like XPath, CSS, etc. are used to identify and perform operations on a web page. It is also possible to execute arbitrary JavaScript.
The complete code can be found on GitHub.

Converting HTML to RichTextString for Apache POI

1. Overview

In this tutorial, we will be building an application that takes HTML as an input and creates a Microsoft Excel Workbook with a RichText representation of the HTML that was provided. To generate the Microsoft Excel Workbook, we will be using Apache POI. To analyze the HTML, we will be using Jericho.

The full source code for this tutorial is available on Github.

2. What is Jericho?

Jericho is a Java library that allows analysis and manipulation of parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognized or invalid HTML. It also provides high-level HTML form manipulation functions. It is an open source library released under the following licenses: the Eclipse Public License (EPL), the GNU Lesser General Public License (LGPL), and the Apache License.

I found Jericho to be very easy to use for achieving my goal of converting HTML to RichText.

3. pom.xml

Here are the required dependencies for the application we are building. Please take note that this application requires Java 9. This is because we use the java.util.regex.Matcher.appendReplacement overload that accepts a StringBuilder, which has only been available since Java 9.

<parent>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-starter-parent</artifactId>
	<version>1.5.9.RELEASE</version>
	<relativePath /> <!-- lookup parent from repository -->
</parent>

<properties>
	<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
	<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
	<java.version>9</java.version>
</properties>

<dependencies>
	<dependency>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-batch</artifactId>
	</dependency>
	<dependency>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-thymeleaf</artifactId>
	</dependency>

	<dependency>
		<groupId>com.h2database</groupId>
		<artifactId>h2</artifactId>
		<scope>runtime</scope>
	</dependency>
	<dependency>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-test</artifactId>
		<scope>test</scope>
	</dependency>
	<!-- https://mvnrepository.com/artifact/org.apache.commons/commons-lang3 -->
	<dependency>
		<groupId>org.apache.commons</groupId>
		<artifactId>commons-lang3</artifactId>
		<version>3.7</version>
	</dependency>
	<dependency>
		<groupId>org.springframework.batch</groupId>
		<artifactId>spring-batch-test</artifactId>
		<scope>test</scope>
	</dependency>
	<dependency>
		<groupId>org.apache.poi</groupId>
		<artifactId>poi</artifactId>
		<version>3.15</version>
	</dependency>

	<dependency>
		<groupId>org.apache.poi</groupId>
		<artifactId>poi-ooxml</artifactId>
		<version>3.15</version>
	</dependency>
	<!-- https://mvnrepository.com/artifact/net.htmlparser.jericho/jericho-html -->
	<dependency>
		<groupId>net.htmlparser.jericho</groupId>
		<artifactId>jericho-html</artifactId>
		<version>3.4</version>
	</dependency>
	<dependency>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-configuration-processor</artifactId>
		<optional>true</optional>
	</dependency>
	<!-- legacy html allow -->
	<dependency>
		<groupId>net.sourceforge.nekohtml</groupId>
		<artifactId>nekohtml</artifactId>
	</dependency>
</dependencies>

4. Web page – Thymeleaf

We use Thymeleaf to create a basic webpage that has a form with a textarea. The source code for the Thymeleaf page is available on GitHub. This textarea could be replaced with a RichText editor such as CKEditor if we like; we just have to be mindful to format the data for AJAX correctly, using an appropriate setData method. There is a previous tutorial about CKEditor titled AJAX with CKEditor in Spring Boot.

5. Controller

In our controller, we autowire JobLauncher and a Spring Batch job we are going to create called GenerateExcel. Autowiring these two classes allows us to run the Spring Batch job GenerateExcel on demand when a POST request is sent to “/export”.

Another thing to note is that to ensure the Spring Batch job will run more than once, we include unique parameters with this code: addLong("uniqueness", System.nanoTime()).toJobParameters(). Without unique parameters an error may occur, because only unique JobInstances may be created and executed, and Spring Batch would otherwise have no way of distinguishing the first JobInstance from the second.

@Controller
public class WebController {

    private String currentContent;

    @Autowired
    JobLauncher jobLauncher;
    
    @Autowired
    GenerateExcel exceljob; 

    @GetMapping("/")
    public ModelAndView getHome() {
        ModelAndView modelAndView = new ModelAndView("index");
        return modelAndView;

    }
    

    @PostMapping("/export")
    public String postTheFile(@RequestBody String body, RedirectAttributes redirectAttributes, Model model)
        throws IOException, JobExecutionAlreadyRunningException, JobRestartException, JobInstanceAlreadyCompleteException, JobParametersInvalidException {


        setCurrentContent(body);

        Job job = exceljob.ExcelGenerator();
        jobLauncher.run(job, new JobParametersBuilder().addLong("uniqueness", System.nanoTime()).toJobParameters()
            );

        return "redirect:/";
    }

    //standard getters and setters

}



6. Batch Job

In Step1 of our Batch job, we call the getCurrentContent() method to get the content that was passed into the Thymeleaf form, create a new XSSFWorkbook, specify an arbitrary Microsoft Excel sheet tab name, and then pass all three variables into the createWorkSheet method that we will write in the next step of the tutorial:

@Configuration
@EnableBatchProcessing
@Lazy
public class GenerateExcel {
    
    List<String> docIds = new ArrayList<String>();

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    WebController webcontroller;
    
    @Autowired
    CreateWorksheet createexcel;

    @Bean
    public Step step1() {
        return stepBuilderFactory.get("step1")
            .tasklet(new Tasklet() {
                @Override
                public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext) throws Exception, JSONException {

                    String content = webcontroller.getCurrentContent();
                    
                    System.out.println("content is ::" + content);
                    Workbook wb = new XSSFWorkbook();
                    String tabName = "some";
                    createexcel.createWorkSheet(wb, content, tabName);

                    return RepeatStatus.FINISHED;
                }
            })
            .build();
    }

    @Bean
    public Job ExcelGenerator() {
        return jobBuilderFactory.get("ExcelGenerator")
            .start(step1())
            .build();

    }

}

We have covered Spring Batch in other tutorials such as Converting XML to JSON + Spring Batch and Spring Batch CSV Processing.

7. Excel Creation Service

We use a variety of classes to create our Microsoft Excel file. Order matters when dealing with converting HTML to RichText, so this will be a focus.

7.1 RichTextDetails

A class with two fields: a String holding the content that will become RichText, and a font map.

public class RichTextDetails {
    private String richText;
    private Map<Integer, Font> fontMap;

    public RichTextDetails(String richText, Map<Integer, Font> fontMap) {
        this.richText = richText;
        this.fontMap = fontMap;
    }
    //standard getters and setters

    @Override
    public int hashCode() {
        // The goal is a cheaper hashCode than the default one.
        return richText.hashCode();
    }
}

7.2 RichTextInfo

A POJO that keeps track of the location of the RichText and related details:

public class RichTextInfo {
    private int startIndex;
    private int endIndex;
    private STYLES fontStyle;
    private String fontValue;
    // standard getters and setters, and the like

7.3 Styles

An enum containing the HTML tags we want to process. We can add more as necessary:

public enum STYLES {
    BOLD("b"), 
    EM("em"), 
    STRONG("strong"), 
    COLOR("color"), 
    UNDERLINE("u"), 
    SPAN("span"), 
    ITALLICS("i"), 
    UNKNOWN("unknown"),
    PRE("pre");

    private final String value;

    STYLES(String value) {
        this.value = value;
    }

    // maps a tag name back to a style; used by getRichTextInfo below
    public static STYLES fromValue(String value) {
        for (STYLES style : values()) {
            if (style.value.equalsIgnoreCase(value)) return style;
        }
        return UNKNOWN;
    }

7.4 TagInfo

A POJO to keep track of tag info:

public class TagInfo {
    private String tagName;
    private String style;
    private int tagType;
    // standard getters and setters

7.5 HTML to RichText

This is not a small class, so let’s break it down by method.

Essentially, we surround the arbitrary HTML with a div tag so we know what we are looking for. Then we look for all elements within the div tag, add each to an ArrayList of RichTextDetails, and pass the whole ArrayList to the mergeTextDetails method. mergeTextDetails returns a RichTextString, which is what we need to set a cell value:

   public RichTextString fromHtmlToCellValue(String html, Workbook workBook){
       Config.IsHTMLEmptyElementTagRecognised = true;
       
       Matcher m = HEAVY_REGEX.matcher(html);
       String replacedhtml =  m.replaceAll("");
       StringBuilder sb = new StringBuilder();
       sb.insert(0, "<div>");
       sb.append(replacedhtml);
       sb.append("</div>");
       String newhtml = sb.toString();
       Source source = new Source(newhtml);
       List<RichTextDetails> cellValues = new ArrayList<RichTextDetails>();
       for(Element el : source.getAllElements("div")){
           cellValues.add(createCellValue(el.toString(), workBook));
       }
       RichTextString cellValue = mergeTextDetails(cellValues);

       
       return cellValue;
   }

As we saw above, we pass an ArrayList of RichTextDetails to this method. Jericho has a boolean setting, Config.IsHTMLEmptyElementTagRecognised, that makes it recognize empty element tags such as <br/>. This can be important when dealing with online rich text editors, so we set it to true. Because we need to keep track of the order of the elements, we use a LinkedHashMap instead of a HashMap.

    private static RichTextString mergeTextDetails(List<RichTextDetails> cellValues) {
        Config.IsHTMLEmptyElementTagRecognised = true;
        StringBuilder textBuffer = new StringBuilder();
        Map<Integer, Font> mergedMap = new LinkedHashMap<Integer, Font>(550, .95f);
        int currentIndex = 0;
        for (RichTextDetails richTextDetail : cellValues) {
            //textBuffer.append(BULLET_CHARACTER + " ");
            currentIndex = textBuffer.length();
            for (Entry<Integer, Font> entry : richTextDetail.getFontMap()
                .entrySet()) {
                mergedMap.put(entry.getKey() + currentIndex, entry.getValue());
            }
            textBuffer.append(richTextDetail.getRichText())
                .append(NEW_LINE);
        }

        RichTextString richText = new XSSFRichTextString(textBuffer.toString());
        for (int i = 0; i < textBuffer.length(); i++) {
            Font currentFont = mergedMap.get(i);
            if (currentFont != null) {
                richText.applyFont(i, i + 1, currentFont);
            }
        }
        return richText;
    }
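A quick standalone demonstration of why the map choice matters here: LinkedHashMap preserves insertion order, while HashMap makes no ordering guarantee:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class OrderDemo {
    public static void main(String[] args) {
        Map<Integer, String> linked = new LinkedHashMap<>();
        Map<Integer, String> hashed = new HashMap<>();
        for (int key : new int[] { 42, 7, 19 }) {
            linked.put(key, "font-" + key);
            hashed.put(key, "font-" + key);
        }
        System.out.println(linked.keySet()); // [42, 7, 19] -- insertion order preserved
        System.out.println(hashed.keySet()); // iteration order is not guaranteed
    }
}
```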

As mentioned above, we are using Java 9 in order to use java.util.regex.Matcher.appendReplacement with a StringBuilder. Why? Because StringBuffer is slower than StringBuilder for these operations: StringBuffer’s methods are synchronized for thread safety, and we do not need that here.
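A stdlib-only illustration of the overload in question, mirroring how createCellValue below strips each matched tag out of the text:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppendReplacementDemo {
    public static void main(String[] args) {
        Matcher matcher = Pattern.compile("<b>|</b>").matcher("a<b>bold</b>c");
        StringBuilder text = new StringBuilder(); // the StringBuilder overload is Java 9+
        while (matcher.find()) {
            matcher.appendReplacement(text, ""); // copy text up to the tag, drop the tag
        }
        matcher.appendTail(text);
        System.out.println(text); // aboldc
    }
}
```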

We are using Deque instead of Stack because a more complete and consistent set of LIFO stack operations is provided by the Deque interface:

    static RichTextDetails createCellValue(String html, Workbook workBook) {
        Config.IsHTMLEmptyElementTagRecognised  = true;
        Source source = new Source(html);
        Map<String, TagInfo> tagMap = new LinkedHashMap<String, TagInfo>(550, .95f);
        for (Element e : source.getChildElements()) {
            getInfo(e, tagMap);
        }

        StringBuilder sbPatt = new StringBuilder();
        sbPatt.append("(").append(StringUtils.join(tagMap.keySet(), "|")).append(")");
        String patternString = sbPatt.toString();
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(html);

        StringBuilder textBuffer = new StringBuilder();
        List<RichTextInfo> textInfos = new ArrayList<RichTextInfo>();
        ArrayDeque<RichTextInfo> richTextBuffer = new ArrayDeque<RichTextInfo>();
        while (matcher.find()) {
            matcher.appendReplacement(textBuffer, "");
            TagInfo currentTag = tagMap.get(matcher.group(1));
            if (START_TAG == currentTag.getTagType()) {
                richTextBuffer.push(getRichTextInfo(currentTag, textBuffer.length(), workBook));
            } else {
                if (!richTextBuffer.isEmpty()) {
                    RichTextInfo info = richTextBuffer.pop();
                    if (info != null) {
                        info.setEndIndex(textBuffer.length());
                        textInfos.add(info);
                    }
                }
            }
        }
        matcher.appendTail(textBuffer);
        Map<Integer, Font> fontMap = buildFontMap(textInfos, workBook);

        return new RichTextDetails(textBuffer.toString(), fontMap);
    }

We can see where RichTextInfo comes in to use here:

    private static Map<Integer, Font> buildFontMap(List<RichTextInfo> textInfos, Workbook workBook) {
        Map<Integer, Font> fontMap = new LinkedHashMap<Integer, Font>(550, .95f);

        for (RichTextInfo richTextInfo : textInfos) {
            if (richTextInfo.isValid()) {
                for (int i = richTextInfo.getStartIndex(); i < richTextInfo.getEndIndex(); i++) {
                    fontMap.put(i, mergeFont(fontMap.get(i), richTextInfo.getFontStyle(), richTextInfo.getFontValue(), workBook));
                }
            }
        }

        return fontMap;
    }

Where we use STYLES enum:

    private static Font mergeFont(Font font, STYLES fontStyle, String fontValue, Workbook workBook) {
        if (font == null) {
            font = workBook.createFont();
        }

        switch (fontStyle) {
        case BOLD:
        case EM:
        case STRONG:
            font.setBoldweight(Font.BOLDWEIGHT_BOLD);
            break;
        case UNDERLINE:
            font.setUnderline(Font.U_SINGLE);
            break;
        case ITALLICS:
            font.setItalic(true);
            break;
        case PRE:
            font.setFontName("Courier New");
            break;
        case COLOR:
            if (!isEmpty(fontValue)) {
                // note: always maps to black rather than parsing the CSS color value
                font.setColor(IndexedColors.BLACK.getIndex());
            }
            break;
        default:
            break;
        }

        return font;
    }

We are making use of the TagInfo class to track the current tag:

    private static RichTextInfo getRichTextInfo(TagInfo currentTag, int startIndex, Workbook workBook) {
        RichTextInfo info = null;
        switch (STYLES.fromValue(currentTag.getTagName())) {
        case SPAN:
            if (!isEmpty(currentTag.getStyle())) {
                for (String style : currentTag.getStyle()
                    .split(";")) {
                    String[] styleDetails = style.split(":");
                    if (styleDetails != null && styleDetails.length > 1) {
                        if ("COLOR".equalsIgnoreCase(styleDetails[0].trim())) {
                            info = new RichTextInfo(startIndex, -1, STYLES.COLOR, styleDetails[1]);
                        }
                    }
                }
            }
            break;
        default:
            info = new RichTextInfo(startIndex, -1, STYLES.fromValue(currentTag.getTagName()));
            break;
        }
        return info;
    }

We process the HTML tags:

    private static void getInfo(Element e, Map<String, TagInfo> tagMap) {
        tagMap.put(e.getStartTag()
            .toString(),
            new TagInfo(e.getStartTag()
                .getName(), e.getAttributeValue("style"), START_TAG));
        if (e.getChildElements()
            .size() > 0) {
            List<Element> children = e.getChildElements();
            for (Element child : children) {
                getInfo(child, tagMap);
            }
        }
        if (e.getEndTag() != null) {
            tagMap.put(e.getEndTag()
                .toString(),
                new TagInfo(e.getEndTag()
                    .getName(), END_TAG));
        } else {
            // Handling self closing tags
            tagMap.put(e.getStartTag()
                .toString(),
                new TagInfo(e.getStartTag()
                    .getName(), END_TAG));
        }
    }

7.6 Create Worksheet

Using StringBuilder, we create the String that will be written to a FileOutputStream. In a real application this should be user-defined. The folder path and file name are appended on two different lines. Please change the file path to your own.

sheet.createRow(0) creates a row on the very first line and dataRow.createCell(0) creates a cell in column A of the row.

public void createWorkSheet(Workbook wb, String content, String tabName) {
        StringBuilder sbFileName = new StringBuilder();
        sbFileName.append("/Users/mike/javaSTS/michaelcgood-apache-poi-richtext/");
        sbFileName.append("myfile.xlsx");
        String fileMacTest = sbFileName.toString();
        try {
            this.fileOut = new FileOutputStream(fileMacTest);
        } catch (FileNotFoundException ex) {
            Logger.getLogger(CreateWorksheet.class.getName())
                .log(Level.SEVERE, null, ex);
        }

        Sheet sheet = wb.createSheet(tabName); // Create new sheet w/ Tab name

        sheet.setZoom(85); // Set sheet zoom: 85%
        

        // content rich text
        RichTextString contentRich = null;
        if (content != null) {
            contentRich = htmlToExcel.fromHtmlToCellValue(content, wb);
        }


        // begin insertion of values into cells
        Row dataRow = sheet.createRow(0);
        Cell A = dataRow.createCell(0); // Row Number
        A.setCellValue(contentRich);
        sheet.autoSizeColumn(0);
        
        
        try {
            /////////////////////////////////
            // Write the output to a file
            wb.write(fileOut);
            fileOut.close();
        } catch (IOException ex) {
            Logger.getLogger(CreateWorksheet.class.getName())
                .log(Level.SEVERE, null, ex);
        }


    }

8. Demo

We visit localhost:8080.

We input some text containing some HTML:

(Screenshot: example input for the Apache POI RichText app.)

We open up our Excel file and see the RichText we created:

(Screenshot: the RichText rendered in Excel.)

9. Conclusion

We can see that it is not trivial to convert HTML to Apache POI’s RichTextString class; however, for business applications, converting HTML to RichTextString can be essential because readability is important in Microsoft Excel files. There is likely room to improve the performance of the application we built, but we have covered the foundation of building such an application.

The full source code is available on Github.