Recently, I have noticed that companies are asking about SPECjAppServer2004 results for our Enterprise Application Platform. While this isn't entirely new, it seems to be happening more often. In most cases, prospective customers don't actually understand that what they are asking for isn't necessarily relevant to their environment. So, exactly what's wrong with this benchmark?
Well, before answering that question directly, let's first talk about benchmarks in general. Industry standard benchmarks are created through a consensus process within the respective organization. That process involves vendors that all have a stake in the outcome, so they all bring an agenda that plays to their products' strengths (and avoids their products' weaknesses). Of course, no one vendor gets everything it wants, but the benchmarks are not created independently, without regard for any individual vendor's products. The vendors have influence, and that skews the benchmark from the beginning. Besides the vendor influence on the creation of benchmarks, the actual implementation of the benchmarks is not very realistic compared to a business application. At least not any business application that I have ever written or seen.
I was in IT, developing custom business applications, for over 21 years before joining JBoss (now the middleware business unit of Red Hat), and I can tell you from experience that benchmarks do not reflect real world business applications. In fact, they rarely have anything but trivial business logic in them. They also don't reflect the technology that gets used by customers. They reflect the technology that is either in a specification, or the technology that particular vendors would like to push. This is especially true when the vendors don't have other alternatives of their own.
The other problem with benchmarks, in general, is that the numbers that any vendor creates with them will not translate directly to anything meaningful in your business. Is something that holds no direct meaning to your business a criterion that you should use in making a decision?
After contemplating my last, albeit rhetorical, question, let's get back to the initial question. So, what is wrong with the SPECjAppServer2004 benchmark?
The SPECjAppServer2004 benchmark suffers from all the ills that almost all benchmarks suffer from. First, it looks a lot like TPC-C, as it appears to be modeled after it, but written in Java, using J2EE 1.4 technologies. So, it's not a realistic representation of a business application. Second, its business logic is trivial, and in no way compares to the typical complexities of business logic in real world applications.
All the real-world applications that I have worked on had millions of lines of business logic, along with millions of lines of code that were more technical in nature, for interfaces to other systems, persistence, transaction processing, etc. Third, it is clear that the author, or authors, of this benchmark have never written a business application in their lives. Or if they have, it was a very poor one indeed. The code uses floating point numbers to represent dollars and cents! Ouch! Real world business applications written in Java would use the BigDecimal class to represent dollars and cents. Using native types certainly makes the benchmark faster than using BigDecimal, which uses arbitrary precision arithmetic, but it creates results that aren't correct. The more complex the real world calculations are in your business application, the larger the calculation errors will be. For example, I developed an application where there were complex discounting schemes with discount percentages carried out to four decimal places. The calculation had to be applied back to the original charges (not just a total), and in doing that you had to go back over every detail charge (there could be millions and millions per account). In order to do that you had to truncate values, roll the remainder down, and only round at the end, so that it came out correctly, and no matter what kind of slicing and dicing you did, reporting wise, the totals would always foot. You simply cannot do those kinds of calculations with floating point numbers with any accuracy at all! Finally, the benchmark is heavily dependent on EJB 2.x Container Managed Persistence (CMP).
This is the perfect example of a benchmark utilizing technology that is not relevant to customers. With the extreme limitations of EJB QL for EJB 2.x CMP, there aren't many real world applications that can use CMP. In our own customer base, the vast majority have turned to alternative ORM technology like Hibernate. We also know that this is true in WebSphere and WebLogic shops as well. I think it's also illustrated by the fact that BEA (years before the acquisition by Oracle) announced support of Hibernate with WebLogic! Do you think they would do that if their customers were using CMP? Just like ours, their customer base turned its back on EJB 2.x entity beans with CMP a long time ago.
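To make the earlier point about floating point money concrete, here is a minimal sketch in Java. It is not code from the benchmark or from the application I described; the class name, method names, sample charges, and the 12.3456% discount rate are all made up for illustration. It shows two things: a sum of cents drifting when held in a double, and one simple way to truncate each detail line, roll the remainder down, and round only at the end so the lines always foot to the total.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from the benchmark.
class MoneyDemo {

    // Add the same charge n times using binary floating point.
    static double sumWithDouble(double charge, int n) {
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            total += charge;
        }
        return total;
    }

    // Add the same charge n times using exact decimal arithmetic.
    static BigDecimal sumWithBigDecimal(BigDecimal charge, int n) {
        BigDecimal total = BigDecimal.ZERO;
        for (int i = 0; i < n; i++) {
            total = total.add(charge);
        }
        return total;
    }

    // Apply a discount rate to each detail charge. Every line but the last is
    // truncated to whole cents, the truncated fraction is carried to the next
    // line, and rounding happens only on the final line.
    static List<BigDecimal> allocateDiscount(List<BigDecimal> charges, BigDecimal rate) {
        List<BigDecimal> discounts = new ArrayList<>();
        BigDecimal carry = BigDecimal.ZERO;
        for (int i = 0; i < charges.size(); i++) {
            BigDecimal exact = charges.get(i).multiply(rate).add(carry);
            boolean last = (i == charges.size() - 1);
            BigDecimal cents = exact.setScale(2, last ? RoundingMode.HALF_UP : RoundingMode.DOWN);
            carry = exact.subtract(cents);
            discounts.add(cents);
        }
        return discounts;
    }

    public static void main(String[] args) {
        // Ten charges of ten cents each: the double total is not exactly 1.0.
        System.out.println(sumWithDouble(0.10, 10));
        System.out.println(sumWithBigDecimal(new BigDecimal("0.10"), 10));

        List<BigDecimal> charges = List.of(
                new BigDecimal("10.01"), new BigDecimal("20.02"), new BigDecimal("30.03"));
        BigDecimal rate = new BigDecimal("0.123456"); // assumed rate: 12.3456%
        System.out.println(allocateDiscount(charges, rate));
    }
}
```

With non-negative charges, the per-line discounts produced this way add up to the discount on the total rounded to cents, which is the "totals always foot" property. A real billing system would have more rules than this, so treat it as a sketch of the idea rather than a recipe.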
When you wrap all these things up, what value does a SPECjAppServer2004 result actually provide? Well, I think we can safely say that it doesn't provide any value. In my experience, the best thing that any customer can do is run their own application against the various middleware platforms, and compare those results. It is what I always did when I was in IT. Industry standard benchmarks might be a tempting shortcut, but in this case, they really aren't going to tell you anything meaningful.
There simply is no substitute for seeing your own workload running on the potential solutions!

10 comments:
I think your comments on the benchmark would have some meaning if you at least published one single result that was comparable to other solutions.
I think you miss the point with the benchmark. It is designed to test YOUR platform and how efficiently it manages resources, and not the backends that you mention in relation to integration. Yes, a lot of code in production today is complex (and not for the reasons one might think), and yes, this code could drag the system down and make the platform costs look small, but this benchmark is about testing YOUR platform.
I am sure if you had figures better than the competition you would have published a long time ago.
William
Your first point is one of the problems. None of the published numbers are actually comparable to each other (at least none that I have seen to date). In order for them to be comparable, they would have to run on the exact same hardware, the exact same database, OS, storage, etc. No vendor (at least that I have seen) has ever published a truly comparable benchmark.
In terms of missing the point, we test OUR platform for performance and scalability with each release. We just do it with our own internal code, which is more realistic than any benchmark. After all, it's about the customer, and what's relevant to them, not about some silly vendor promotion saying I can top your number on something that is not relevant to a true customer workload.
In terms of the code complexity, business logic is complex, regardless of the integration. That was the main point, and benchmarks have trivial business logic. That's not what customers have in their environments. They have complex business logic, and again, it's about the customer's needs, not about vendors spouting numbers against each other with a benchmark that doesn't represent a true workload that customers would actually run, using technologies the customer would actually use.
If you focus on what the customer actually needs, you won't be spending much time on benchmarks that aren't relevant to true customer workloads.
Let me state this again. Yes, customers have complex code, but that would more than likely be the same on each platform, especially if using the standard correctly. A benchmark such as SPECjAppServer is trying to determine the costs for various operations the container would perform during typical service execution (security, resource management, persistence, session management, ...).
The benchmark is not perfect but it is a good goal to aim for. You do have your own benchmarks? With the introduction of the SPECj benchmark there were many performance gains made in all participating containers that resulted in true savings in customer applications. Customers benefit from the race irrespective of the winner. But if you are not in then ......
Look, JBoss was never the fastest container and I do not think you ever claimed it to be or intended it to be. But please stop with the knocking of a benchmark without publishing an alternative benchmark and figures across the JBoss releases.
The benchmark doesn't accurately reflect costs of the container, since it is dependent on EJB 2.x CMP for the persistence, which most customers abandoned long ago. All benchmarks are flawed, and this one is so flawed, in relation to what customers actually use, that it cannot be reliably counted on as a comparison of performance or cost.
It's really quite simple. I'm not sure what vested interest you have in defending the benchmark, and certainly not publishing results does hurt JBoss in the market, but spending engineering resources to tune to something that customers don't use is just a waste of time.
Yes the benchmark is based on older Java EE technology but the benchmark was published in 2001, and republished (changed) in 2002 and 2004.
JBoss was meant to have published figures then. Not 4 years later.
Yes, it's old, but even then people were quickly learning that the technology was not good (EJB QL and EJB CMP to be specific). It's really not a question of age, but of capability. The limitations of EJB 2.x EJB QL are so severe that very few real world applications can use it. That also plays a large role in why the benchmark is not a good representation of any real world workload.
You are really in self denial.
I know many applications in production that have succeeded in making EJB 2.1 work within their application. The success factors included:
* The skill and experience within the team to think in terms of components when designing and implementing the application.
* The selected CMP engine implementation.
* The selected Application server platform.
* The selected database.
JBoss never had a good persistence story until JPA/Hibernate.
The specification was not great but the same could be said for many of the implementations. This was also the case with EJB 1.x, where Borland's AppServer CMP outperformed JBoss and all other vendors by a significant margin.
Can you stop avoiding the issue and publish some figures on the base JBoss operations across the various releases of JBoss? You did say that you did do some form of performance testing and benchmarking? Otherwise....well I think you can guess this.
If you are not interested in having the lowest footprint and runtime overhead then that is fine but stop knocking others looking to deliver this level of quality for those customers who appreciate it and possibly require it.
I'm not in self denial at all. In fact, I have direct experience with a successful CMP implementation, but it was the only one that we could ever do. The narrowness of the applicability of the technology is so severe that it just cannot be used in most circumstances. That's just a pure and simple fact.
Within most enterprises of any size, you don't have the freedom to redesign everything from scratch, and have to work with many already existing database schemas. EJB 2.x CMP just cannot cut it in that environment. Again, this is just a pure and simple fact. The fact that you haven't actually worked in an IT environment (outside of consulting), probably means you just haven't had to live through these realities. That probably explains why you think the opposite of me. Having said that, when looking at the market, the market clearly turned away from this technology, for all the reasons I have been pointing out.
I did an analysis of our customer base, within the first six months of being a part of JBoss, to see if we should invest in our EJB 2.x CMP engine. The conclusion of that analysis was that less than 2% of our customers were using our CMP engine. That number has fallen substantially as we have grown. When you look at our Hibernate subscribers, many of whom are WebLogic and WebSphere shops and not JBoss shops, you see just the opposite: wide adoption across the entire customer base, and wide adoption within competing containers. Any benchmark that doesn't reflect the realities of the market simply does not serve the interest of our customers. That's been the main point all along in this blog.
Now, you seem to continue to want to jump in and say that I'm only knocking others, but I haven't done that at all. All I have done is describe the problems with a specific benchmark. Maybe you had a hand in writing it, I don't know, but you shouldn't take it personally if you did.
If you think the SPECjAppServer2004 benchmark is a good tool for tuning a platform runtime for optimal performance for customers, then great. We can agree to disagree. It simply doesn't match what our customers use, and what we see being used even within our competitors' containers. That's simply not useful to JBoss, and in my opinion, not useful to customers or prospective customers.
Now, in terms of publishing our internal benchmarks, I simply can't do that in this blog. We do show those figures to customers and prospects in pre-sales, but it's based on our internal code that I cannot release to the public. In terms of publishing results on SPECjAppServer2004, we haven't spent the time to optimize for this benchmark, because of the very issues with the benchmark that I have outlined. It just doesn't make sense to do so.
Now, you can take that any way you want. I cannot control what you think, but I am very confident in JBoss's ability to process real world workloads, as I was a long-time JBoss customer prior to joining the company, have done many load tests with JBoss, and have also seen many of our customers' deployments.
Maybe, the next version of this benchmark will be more useful for us. In fact, I am trying to get engaged in that process right now. I have recently been made the SPEC contact for JBoss.
"The fact that you haven't actually worked in an IT environment (outside of consulting), probably means you just haven't had to live through these realities."
Andrig, please do not even begin to think you know what I know and have done. It only makes you look even more foolish than you have been up to this point.
This comment wasn't meant as an insult. I was just pointing out that our personal experiences are different. Maybe I was mistaken, but from what I could tell, you haven't worked in an IT environment, building business applications.
In any case, my apologies.
So exactly why is it that you feel this benchmark is useful?