The Art of Being Expressive: Compiler Assumptions

Despite its relative ease of use, MVEL has a fairly sophisticated optimizing compiler that I'm sure most people never take full advantage of.

From what I see, most people throw some variables into a Map and use MVEL's convenience methods to evaluate and/or compile their expression.

The truth is, you can double, and in some cases triple, the performance of MVEL evaluations in a production environment if you can provide MVEL's compiler with more information up front. (It turns out that compilers like safe assumptions.)

Since MVEL supports dynamic typing (unless you've turned strong typing on), the compiler has to compile expressions defensively. That means that when it comes to assuming what the types of x or y will be at runtime, MVEL needs to hedge its bets -- maybe they'll be strings, integers, or booleans.

In order to cope with this indeterminism, MVEL employs a comprehensive type coercion scheme. Even if x ends up being a boolean and y ends up being an int, the runtime result of x + y will not be a glorious stacktrace. Instead you might get a resulting value like: "true138" . As in, "when all else fails, just treat everything as a string."
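To make the "treat everything as a string" fallback concrete, here is a minimal sketch of what a defensive, coercing addition could look like. The class and method names (DynamicAdd, add) are made up for illustration, and this is not MVEL's actual coercion code; it only shows the shape of the type checks a dynamic evaluator has to pay for on every call.

```java
// Hypothetical sketch of a defensive, coercing "+" operator.
// Illustrative only: this is NOT how MVEL implements coercion internally.
final class DynamicAdd {

    static Object add(Object left, Object right) {
        // Both operands are integers: plain integer addition.
        if (left instanceof Integer && right instanceof Integer) {
            return (Integer) left + (Integer) right;
        }
        // Mixed numeric types: widen both sides to double.
        if (left instanceof Number && right instanceof Number) {
            return ((Number) left).doubleValue() + ((Number) right).doubleValue();
        }
        // When all else fails, just treat everything as a string.
        return String.valueOf(left) + String.valueOf(right);
    }

    public static void main(String[] args) {
        System.out.println(add(5, 10));     // prints 15
        System.out.println(add(true, 138)); // prints true138
    }
}
```

Every call walks through those instanceof checks before doing any real work, which is exactly the overhead the compiler can skip once it knows the types in advance.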

But as I alluded to already, such indeterminism comes at a cost: MVEL must actively check these types at runtime and defer optimizations it could otherwise perform, if not disqualify them altogether.

It turns out that MVEL can often get dynamically typed expressions up to the same speed as strongly typed expressions by doing a little runtime analysis. Or, at least, much closer to the same speed.

One of the simplest ways it does this is through brute force, coupled with a dynamic deoptimization fallback. For instance, the optimizer might just optimize "x + y" based on the types it observes at runtime, with a deoptimization hook inserted for the case where x suddenly turns into something other than a boolean. If this violation of the optimizer's assumption occurs, MVEL will deoptimize and fall back to safe reflection-based execution and fully-hedged coercion checks.
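The guard-and-fall-back idea is easier to see in code. What follows is a rough sketch under my own assumptions: SpeculativeAdd, eval, and slowAdd are hypothetical names, the speculation here is on Integer operands rather than booleans, and none of this is lifted from MVEL's internals.

```java
// Hypothetical sketch of speculative optimization with a deoptimization guard.
// Names and structure are illustrative; this is not MVEL's internal code.
final class SpeculativeAdd {
    private boolean intSpecialized = false; // true once we bet on Integer operands
    int deoptCount = 0;                     // how many times that bet has failed

    boolean isSpecialized() {
        return intSpecialized;
    }

    Object eval(Object x, Object y) {
        if (intSpecialized) {
            // Guard: the fast path is valid only while both operands stay Integers.
            if (x instanceof Integer && y instanceof Integer) {
                return (Integer) x + (Integer) y;
            }
            // Assumption violated: deoptimize, then fall through to the safe path.
            intSpecialized = false;
            deoptCount++;
        } else if (x instanceof Integer && y instanceof Integer) {
            // Integers observed at runtime: specialize for subsequent calls.
            intSpecialized = true;
        }
        return slowAdd(x, y);
    }

    // Fully hedged path: checks types and coerces on every call.
    private static Object slowAdd(Object x, Object y) {
        if (x instanceof Integer && y instanceof Integer) {
            return (Integer) x + (Integer) y;
        }
        if (x instanceof Number && y instanceof Number) {
            return ((Number) x).doubleValue() + ((Number) y).doubleValue();
        }
        return String.valueOf(x) + String.valueOf(y);
    }
}
```

Calling eval(1, 2) and then eval(3, 4) takes the specialized path on the second call; a later eval(true, 138) trips the guard, bumps deoptCount, and still returns a sane result through the slow path.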

But sometimes safe assumptions are just not possible at runtime, for reasons I won't get into here. Here I'm going to impress upon you the importance of telling MVEL, whenever possible, the types of all the variables you're about to inject.

Consider the following example.


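     import java.io.Serializable;
     import java.util.HashMap;
     import java.util.Map;
     import org.mvel2.MVEL;

     Map<String, Object> vars = new HashMap<>();
     vars.put("x", 5);
     vars.put("y", 10);
     vars.put("z", 20);

     // compile once, with no type information supplied
     Serializable s = MVEL.compileExpression("x + y * z");

     long start = System.currentTimeMillis();
     for (int i = 0; i < 2000000; i++) {
           MVEL.executeExpression(s, vars);
     }

     // report the elapsed time in milliseconds
     System.out.println("time: " + (System.currentTimeMillis() - start));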

Let's run this. On my laptop, I get the result "time: 347".

So it took about 1/3 of a second to run that expression 2 million times. Not bad. But let's give the compiler some working assumptions up front. Let's tell the compiler, at compile time, the types of all the variables we're going to inject.


     import java.io.Serializable;
     import java.util.HashMap;
     import java.util.Map;
     import org.mvel2.MVEL;
     import org.mvel2.ParserContext;

     Map<String, Object> vars = new HashMap<>();
     vars.put("x", 5);
     vars.put("y", 10);
     vars.put("z", 20);

     // create a parser context to hold the compiler configuration.
     ParserContext context = ParserContext.create();

     // simply iterate over our 'vars' map and inject each var name and its type using context.addInput();
     for (Map.Entry<String, Object> entry : vars.entrySet()) {
          context.addInput(entry.getKey(), entry.getValue().getClass());
     }

     Serializable s = MVEL.compileExpression("x + y * z", context);

     long start = System.currentTimeMillis();
     for (int i = 0; i < 2000000; i++) {
           MVEL.executeExpression(s, vars);
     }

     // report the elapsed time in milliseconds
     System.out.println("time: " + (System.currentTimeMillis() - start));

Now let us run that again. This time, on my little MacBook, I get the result "time: 97".

Wow, that's a pretty big difference. We are still in dynamic mode here, but we told the compiler up front what the types we'd be injecting are, and we managed more than a three-fold improvement in runtime performance.

What magic is at work here? Well, since the compiler now knows that x, y, and z are all going to be integers, it can detect the opportunity to bypass all coercion and produce a far more efficient execution tree. In fact, MVEL has special optimizations just for integers -- being the most common primitive -- that make for some very snappy execution in this case.

As a general rule, you should always strive to tell the MVEL compiler as much about what you're throwing at it as possible to get the best performance. Your applications will thank you.
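One practical wrinkle when deriving types from a variable map: calling getValue().getClass() on an entry whose value is null throws a NullPointerException. Here is a small sketch of a helper that collects name-to-type pairs and simply skips null values. InputTypes and infer are hypothetical names of my own, not part of MVEL, and skipping nulls is just one possible policy; the resulting entries could then be handed to context.addInput() one by one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of MVEL): derives a name -> type map
// from the variables you plan to inject, skipping null values.
final class InputTypes {

    static Map<String, Class<?>> infer(Map<String, ?> vars) {
        Map<String, Class<?>> types = new LinkedHashMap<>();
        for (Map.Entry<String, ?> entry : vars.entrySet()) {
            Object value = entry.getValue();
            // A null value carries no type information, so leave it untyped.
            if (value != null) {
                types.put(entry.getKey(), value.getClass());
            }
        }
        return types;
    }
}
```

Variables left out this way stay dynamically typed, so the compiler still has to hedge for them; if you know their intended type, it is better to register it explicitly.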

Sunday, February 6, 2011
