How we became the 2nd fastest online retail shop in Germany

Dynatrace Performance Benchmark about website performance

Source: Dynatrace Performance Benchmark

I’ve been working for my client HSE24 for quite some time now on optimizing their website’s load performance with their team. In this article I’d like to share how and what we optimized to become the 2nd fastest retail online shop in Germany according to Dynatrace Performance Benchmark.

Actions to improve website performance

Disclaimer: This isn’t a comprehensive list. It’s just what I could remember we’ve been working on for the past 2,5 years.

Remove unnecessary event listeners

We removed all bindings to the document for scroll, touchmove, touchstart, touchend and tap as these are fired constantly while using the site.

Remove unused and unnecessary JavaScript libraries

Since Webkit supports fast clicks on mobile now, we were able to remove libs like Fastclick.js. Also, browsers nowadays have support for many advanced features like e.g. sticky positioning which makes libs like Sticky.js redundant.

Use lazy loading

We use lazyloadxt.js to load images only when they become visible to the viewport.

Hardware-accelerate animations via GPU

jQuery’s $.animate() method is convenient but very slow. We replaced as many animations with CSS3 translate3D(x,y,z) as possible. This will force the device to render via GPU and will thus be much smoother than with software rendering.

Reduce CSS effects like gradients and transparency

Both have a huge impact on performance. Use with care and remove as much as possible. Talk to your designer. Explain this in a friendly way and he or she will understand.

Reduce the amount of HTTP requests

Nowadays, bandwidth on mobile devices is no problem due to HSDPA+ and LTE. The bottleneck is latency on mobile networks. Thus, you have to reduce the number of HTTP requests to a minimum. Put all your JS code into one file, same for CSS (unless you’re on HTTP/2).

Reduce the amount of setInterval() and setTimeout()

Just don’t use these at all if possible.

Init carousels on demand

We have a lot of carousels to show our products. Some are hidden behind tab components and are thus not visible unless you click on one of them. Hence, we only initialize those which can be seen in the first place. When the user clicks a tab, we init carousels on demand.

Use frameworks/libs which care about performance

We replaced OwlCarousel with Swiper.js. What a huge difference!

Drop support for legacy browsers

Optimizing for older browsers has a huge impact on the possibilities to improve performance. Thus, we dropped support for Internet Explorer versions older than 11. Also, on mobile we only support Chrome and Safari. Buggy browsers like Android Stock Browser or Samsung’s built-in desaster is not supported.

Reduce the amount of reflows

It’s a little old but the legendary Paul Irish did a great video about reflows. Drop everything you’re doing and watch it NOW!

Load as few external scripts as possible

Scripts from external sources are a nightmare. They cause additional DNS resolution calls and you make your site dependent to external servers which can lead to slowdown and sometimes even malfunction. Thus, we try to host as much on our own as possible. We do not use CDN’s even if they have some advantages but we like to keep our destiny in our own hands.

Reference Scripts at the end of the document

Never reference your scripts in the head section of your html document. Put them at the end of <body> if possible. Also, try to use defer or async. Use with caution though!

Remove dead JavaScript code

JavaScript is a nightmare to maintain. You never know which functions are still called and which are not. Thus, we created a logger which writes all function calls during runtime to a database via Kibana. With this list we’re able to determine if a function is still called or not and remove it.

Develop for the weakest devices

Go to eBay and buy an iPhone5 as well as a Samsung Galaxy S5 (or S4 if you wanna go hardcore). When you’re done developing a feature, test it with these two devices. If everything runs smoothly, you’re good because faster devices will run it at least as good as your minimum testing devices.

Reduce images size

Our CMS is able to scale down uploaded images. So we defined templates for screen sizes of width 320px, 480px, 768px, 1024px, 1280px and 1920px. Anything beyond this will be scaled down. Also, JPEG should be compressed with setting “85” (equals 15% compression) which is a good trade-off between image size and quality.

Switch from Apache to NGINX

NGINX is great at delivering many small static files in a short period of time. You should use it, if you can.

Use a static subdomain

We deliver all our static media (scripts, stylesheets, fonts, images…) via an own subdomain static.hse24-dach.net because we do not set any cookies for it (static media does not require cookie information at all). Hence, the cookie overhead for each request should be 0.

Improve cache configuration

We set max-age to 30 days in the HTTP headers for all static media to optimize browser caching. When a file changes, we also change the reference URL to notify the browser that it should get a fresh copy of the ressource. For example, if foo.jpg has changed, we automatically also increment our cid parameter: <img src="foo.jpg?cid=2">

Pre-generate images

We used to have a web application which generates scaled variants of images on demand. This is to slow because the application has to be called for every request and server caching is quite tricky in this case. Thus, we pre-generate all scaled versions of the images we have and deploy them statically on our NGINX servers which is blazing fast. This requires multiple terabytes of data but remember that storage these days is cheap compared to CPU power.

Improve the build process

We cleaned up our Grunt build file and optimized it to get the fastest performance possible. Also, we replaced our SASS compiler which was based on Ruby with a much faster C++ version. This doesn’t have direct impact on website load performance, but the faster devs can work, the faster and better you get the job done. Also, it’s more fun :-)

Use server side includes for static HTML

We use SSI to include static HTML (like e.g. header, footer, navigation..) into our templates in order to be able to cache these HTML snippets. We only render the part of the HTML which is dynamically created by our web application server like e.g. personalized content. This is way faster than composing the whole page for each request.

Simplify content

We used to have a lot of content on our homepage. It’s still not as lightweight as I think it should be but it got much better. Check all components on your page and ask yourself: Do I really need this? Does it benefit either me or the user? If not, trash it.

Care about performance

Our technical lead used to be a developer, so he cares about performance which is a good thing. If you’re looking for a new job, I can highly recommend to check the background of your next supervisor. It’s much easier to work with a technical person than with a manager type guy.

Conclusion

You have to keep pushing forward. Don’t rest. When you stop optimizing your site will end in a big slow mess in a few years since requirements for new features continue to grow which usually eats more CPU power as well as network bandwidth. There are still many more things we’re working on to push load times even faster, like e.g. removing dead CSS code. Unfortunately this is not an easy task and there are no best practices that I’m aware of. If you have any suggestions, let me know :-)

A scientific performance comparison: Flex/Flash vs. JavaFX vs. Silverlight vs. JavaScript

I finally finished my diploma thesis I mentioned before about performance comparisons between Flex/Flash, JavaFX, Silverlight and various JavaScript engines.

Why this analysis?

During an internship at IBM Germany back in 2009, I had to develop a Visualizer based on Flex that heavily relied on its charting library API. Even on strong machines, it was not possible to create more than 20 charts on one screen at the same time. If tried, the application terminated with a timeout exception after 60 seconds because it simply took the rendering engine to long to draw all the charts at once. These experiences lead to thoughts about questions why the Flash Player sometimes performs so slowly and if other technologies like JavaFX or Silverlight could do any better. While looking for answers, I encountered two benchmarks. One is Alexey Gavrilov’s Bubblemark test which moves around bitmaps on the screen capturing the current fps. The other one is Sean Christmann’s GUIMark, which simulates a common website layout and lets it scale up and down. While Gavrilov’s attempt is rather simple, Christmann’s benchmark is a bit more complex including aspects like transparency and overlapping layers. Both tests include technologies like Flash/Flex, JavaFX, Silverlight and Javascript. All these attempts have one thing in common though: They represent only one big benchmark instead of cutting down the issue into multiple aspects. This leads to the problem that one cannot clearly see what the reason is why solution A is faster or slower than B.

For example: Moving around bitmaps, as shown in Gavrilov’s Bubblemark benchmark, may sound simple but heavily relies on multiple aspects of a RIA runtime: First, to display images, a graphic-buffer needs to be filled with the bitmap data. Then it needs to be drawn to a canvas-like component and finally shown on the screen. To move around the images, mathematical calculations are required to let the balls bounce from the walls. Furthermore, some kind of data structure like (dynamic) lists or arrays must be used in order store each ball-object in. While running the test, one never knows what was the cause for performance decreases. Was it the »physics engine«, the image processing calls, the array/list operations or something else?

This lead to the idea of developing a series of tests to drill down to the core of performance issues, which leads to two benefits: One is that developers who already know their requirements for their applications can choose the RIA technology that fits best for their needs, based on the result of these test series. The other one is that RIA manufacturers can optimize their virtual machines and browser plug-ins based on the conclusions of this thesis.

The tests

Run the tests, download the source and view the results here.

Feel free to download everything and play around with it. Most of the sources are released under the MIT license. Some others use GPL or BSD so make sure to check the license agreement in the header sections of each project/file but in general you don’t really have to worry about them since they’re all open source licenses. Just watch out for the copyleft agreement in GPL.

Flash Player 10.1 performance explosion

I am currently writing my diploma thesis at the University of Ulm on performance issues regarding Rich Internet Application technologies like Adobe Flash/Flex, JavaFX, Silverlight and various JavaScript solutions.
(Update: It’s done! :-) )

I started back in December when I first noticed that Adobe’s Flash Player seriously has some performance issues. It always was by far the slowest of all technologies.
Today, I retried some of my self-written benchmarks using the new Flash Player 10.1 RC4 and I was absolutely blown away. The new version is so fast, it’s absolutely incredible.
I am not done yet with my thesis until mid-July, so I won’t publish to much about it here but I thought it might be interesting to show just one benchmark-result here.

I don’t want to go to much into detail regarding the test implementations since these might change til July, so if you want to know more about the exact details on my benchmark, you gotta wait til I am done with my thesis. So long, think of this as a “preview” ;-) since some things might still change. The final test/benchmark will be revealed in 2 months when it’s 100% finished. I just didn’t want to post anything that’s not done yet. As already said, this is just a little teaser for the final benchmark.

Test setup
All tests were run on a Macbook Pro with an Intel Core 2 Duo at 2.53 GHz and 8 GB of RAM.
Tests, which require a plugin were running using Safari.
Additionally, I also ran some JavaScript-based tests on Firefox, Google Chrome and Safari.
The code base for all tests is basically the same, except for differences regarding Syntax, of course. This makes all test results comparable.

Results (Click to enlarge):

All values are in milliseconds [ms] => Less is better

In numbers:

  • Flash Player 10.1: 528 ms
  • JavaScript (Safari 4.0.5): 1015 ms
  • JavaScript (Google Chrome 5.0.375.29: 1039 ms
  • Silverlight 4.0.50401.0: 1300 ms
  • JavaFX 1.3 (JRE 1.6.0_17): 1392 ms
  • Firefox 3.6.3: 1449 ms
  • JavaFX 1.2 (JRE 1.6.0_17): 1635 ms
  • Flash Player 10.0: 3201 ms

Adobe, what the hell did you do???

Note: Please be aware, that I am basically testing FP 10.1 vs FP 10.0 here. This has nothing to do with the latest RC4 version of FP 10.1 since I haven’t done any tests with older release candidates yet.

Update: Added test results for JavaFX 1.3 and fixed a mistake regarding the test results for Firefox.

Update II: According to a blog entry of Tinic Uro, an engineer at Adobe Systems, the reason why Flash Player 10.1 works so well on Mac OS is Apple’s Core Animation Framework (Thanks to Matthew for the link!)
What I really liked about this blog post is the following:

“You might have noticed that Core Animation is a Cocoa API. Yes, Flash Player 10.1 is a true Cocoa app now (with a Carbon fallback to support Firefox and Opera which are not Cocoa yet).”

Flash Player is a true Cocoa application now? Nice.