Upcoming Webinar: Best Practices for Optimizing Performance with GeoServer
Dear Reader,
One of the main goals of GeoSolutions customers is to improve the performance of their geospatial servers. We have been improving open source tools (e.g. GeoServer and GeoWebCache) to allow for the proper dissemination of large geospatial datasets in private and public cloud environments. In this post I highlight some tips and tricks to improve performance, and I also cordially invite you to a webinar on June 10th, 11:00-12:00 EDT / 15:00 GMT, led by our GeoServer lead developer Andrea Aime. Check other time zones here.
We want to enable a client to play smoothly with maps backed by millions of geometries or terabytes of images. Proper configuration of both GeoServer and the data, in particular for large datasets, will keep us in the happy zone!
GeoServer has been improving over the years, and more options are now available for administrators to tweak its configuration. A blog post from two years ago provides some tuning details, but a lot has changed since then. For example, back then the Marlin renderer was not part of the Java JDK; it has been included from JDK 9 onwards (see the JavaOne presentation by Laurent Bourgès on this topic), so the extra configuration is no longer needed there. If you are running on JDK 8, follow the recommendations in that post to improve rendering performance.
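As a point of reference, on JDK 8 enabling Marlin boils down to adding a couple of JVM options to GeoServer's startup environment. A minimal sketch follows; the jar version and path are illustrative assumptions, so check the blog post and the Marlin releases for the exact jars your setup needs:

```
# Illustrative only: jar name, version and path are assumptions for this sketch.
# None of this is needed on JDK 9+, where Marlin is the default renderer.
export JAVA_OPTS="$JAVA_OPTS \
  -Xbootclasspath/a:/opt/marlin/marlin-0.9.2-Unsafe.jar \
  -Dsun.java2d.renderer=org.marlin.pisces.MarlinRenderingEngine"
```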
So, what do you do if you have half a terabyte of OSM data and want to use it as a basemap, or you want to display areas of interest tessellated into very high-resolution grid cells, or you want to cache layer groups containing thousands of GeoTIFFs (e.g. see this thread on gis.stackexchange)?
The strategies and tricks revolve around minimizing the work GeoServer and the database have to perform when a request is made. This means preparing as much of the data as possible in advance, selecting the best formats, caching, and so on. In this post I will outline the key strategies as an introduction to the topic.
Can I portray millions of geometries in my web client?
Let’s say you have an area composed of 1 million geometries. You are not always going to show everything; you pick and choose what to present depending on the zoom level. This can be tricky, because you may also need to process the attribute data of the contained geometries (e.g. summing, averaging). The goal is to maintain the illusion that the user is getting all the data all the time, while a smart setup on the backend minimizes the features actually being retrieved. There are several approaches to tackle this:
1) Reduce the number of features being returned. If you want super fast performance, present at most 1,000 features at a time. Two ways to do this:
- Pre-configure and select what to show at different zoom levels (e.g. via style optimizations). As the user zooms in, the data presented changes: new features and new labels may show up, while others disappear.
- Create bigger geometries composed of the smaller ones, so that which geometries are returned depends on the zoom level or other information in the request. When done properly, discrete grids are an incredibly effective mechanism for computing fast analytics. A good read on this topic is the OGC Discrete Global Grid System (DGGS) work; there is also related work going on, which we are helping to advance, as part of OGC Testbed-16.
2) Simplify the vertices of the geometries. For example, a polygon that needs 500,000 vertices at high zoom levels can be represented at lower zoom levels with just a few hundred vertices, and the end user will not notice the difference. The tutorial for the GeoTools feature-pregeneralized module explains this in detail; a minimal sketch of the idea follows below.
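To give a feel for what generalization buys you, here is a small standalone sketch using the JTS TopologyPreservingSimplifier that ships with GeoTools (package names as in recent JTS releases). The WKT polygon and the 0.05 tolerance are made-up example values; in the pregeneralized setup the simplified copies are computed ahead of time, not per request:

```java
import org.locationtech.jts.geom.Geometry;
import org.locationtech.jts.io.WKTReader;
import org.locationtech.jts.simplify.TopologyPreservingSimplifier;

public class SimplifyDemo {
    public static void main(String[] args) throws Exception {
        // A small hand-made polygon stands in for a real feature with many vertices.
        Geometry detailed = new WKTReader().read(
            "POLYGON ((0 0, 1 0.01, 2 0, 2 1.99, 1 2, 0 2, 0 0))");

        // The tolerance is in map units: vertices closer than this to the
        // simplified outline get dropped. 0.05 is an arbitrary example value.
        Geometry simplified = TopologyPreservingSimplifier.simplify(detailed, 0.05);

        System.out.println("vertices before: " + detailed.getNumPoints());
        System.out.println("vertices after:  " + simplified.getNumPoints());
    }
}
```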
Can I serve petabytes of raster data with GeoServer?
The answer is yes, but you need to configure the data properly. Regarding formats, GeoTIFF is the champion: it is very flexible, can be tiled, and can be fine-tuned for performance.
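To make "fine-tuned" concrete: a raster is usually rewritten as an inner-tiled, compressed GeoTIFF and then given overviews. A typical GDAL recipe looks like the sketch below, where the file names, block size, compression and overview levels are example values rather than prescriptions:

```
# Rewrite as an inner-tiled GeoTIFF with compression (values are examples)
gdal_translate -of GTiff -co TILED=YES -co BLOCKXSIZE=512 -co BLOCKYSIZE=512 \
  -co COMPRESS=DEFLATE input.tif output.tif

# Precompute overviews so zoomed-out requests read far less data
gdaladdo -r average output.tif 2 4 8 16 32
```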
How should you structure the GeoTIFF data? There are several options: a single GeoTIFF, an ImageMosaic, or an ImagePyramid. The right choice basically depends on the size of the data and its dimensions.
- If single granules are < 20 GB, a single GeoTIFF is a good choice.
- If files are > 20 GB, or you have many dimensions (e.g. numerical models may have multiple times, elevations and more), then use an ImageMosaic (see the configuration sketch after this list).
- If the dataset is tremendously large and the data needs to be served at different resolutions, then an ImagePyramid is the best option.
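To make the ImageMosaic option a bit more concrete: a time-enabled mosaic is typically driven by a small indexer.properties file placed alongside the granules. A minimal sketch follows, assuming granule file names carry a timestamp; the attribute name and the regex are illustrative, so check the GeoServer ImageMosaic documentation for the full set of options:

```
# indexer.properties (illustrative): index granules and expose a time dimension
TimeAttribute=ingestion
Schema=*the_geom:Polygon,location:String,ingestion:java.util.Date
PropertyCollectors=TimestampFileNameExtractorSPI[timeregex](ingestion)

# timeregex.properties (illustrative): how to extract the timestamp from file names
regex=[0-9]{8}T[0-9]{6}
```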
With the above approaches we have served satellite imagery covering the entire planet, as well as long time series of meteorological models, and the like, up to several petabytes of backend data.
As said before, I have only provided a few tips and tricks in this post. Don’t miss our webinar: register, and you will have the opportunity to hear more about this topic and ask Andrea anything you want.
Hope to see you virtually on June 10th, meanwhile stay safe and keep strong!
Cordially,
Luis