Sunday, August 18, 2019

Unsupervised remote land use detection of small Caribbean islands: a non-exhaustive exploration with Google Earth Engine

Remote land use detection is a complex problem, regardless of whether your approach is a form of supervised or unsupervised classification.

The intention of this article is to share my exploration with Google Earth Engine, in the hope that my learnings turn out to be somewhat useful if you are dealing with the same problem 🙂. For this short exploration, I have limited my scope to the detection of urban and suburban land use, i.e. developed land that is or was in use by humans, within the Caribbean geographic region of small islands as defined by the United Nations Department of Economic and Social Affairs.

In the widget below you can preview a subset of small Caribbean islands and visualise the results of time series detections of accumulating objects (the red dots) from satellite imagery. These red dots represent, for the most part, manmade objects. Note, however, that they do not necessarily represent accumulating land use changes; they mostly represent objects that already existed and were detected in subsequent passes by the same satellite. Additionally, only a couple of small islands are included in this exploration. Further below I will dive a bit deeper and elaborate on how exactly these results were produced, what the current limitations are and what the possible next steps could be.

Anguilla land use detection timeseries

Time series of a small island in the Caribbean showing the accumulation of structures detected from radar backscatter data, from June 2016 until February 2019. The percentage is the number of red pixels divided by the total number of masked pixels. Note that Google Earth Engine averages results per zoom level, so pixels might appear larger than they actually are.
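
As a rough illustration of how the percentage in the caption could be computed, here is a minimal sketch in plain Python. It assumes that "masked pixels" means the pixels inside the island mask and that both grids are simple nested lists of 0/1 values; the function name `detected_share` and the tiny grids are made up for this example and are not taken from the actual widget.

```python
def detected_share(detected, mask):
    """Return the percentage of masked pixels that are flagged as detected."""
    total = 0
    hits = 0
    for det_row, mask_row in zip(detected, mask):
        for det, inside in zip(det_row, mask_row):
            if inside:            # only count pixels inside the (assumed) island mask
                total += 1
                if det:           # a "red" pixel
                    hits += 1
    return 100.0 * hits / total if total else 0.0


# Hypothetical 3 x 4 island mask (1 = land) and detections (1 = red pixel).
mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
detected = [
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

share = detected_share(detected, mask)
print(f"{share:.1f}% of masked pixels are flagged")  # 25.0% of masked pixels are flagged
```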


Why detect land use?

Remote land use detection allows areas to be monitored, which aids both risk management related to climate change and conservation efforts in the face of ongoing economic development. With a robust system one can detect changes in land use remotely, given that there are operational sensors such as satellites in space that continuously collect electromagnetic radiation data. Assuming that we can establish a relationship with additional independent data sources, it should also be possible to predict these changes within a certain degree of accuracy.

The geographic region of small islands in the Caribbean

The Caribbean is known for its beautiful beaches and is in general an appealing region to spend your vacation (I can attest to this, coming from Aruba). However, Caribbean islands are surrounded by the ocean and, like many cities elsewhere, have human populations concentrated near the coast. Depending on the elevation, coastal populations can be negatively impacted by rising sea levels and natural disasters in the near future. Additionally, this group of small islands is diverse, which poses a challenge in itself when it comes to land use detection. Think about the detection of small town areas compared to nations with large urban zones, different types of vegetation and seasonal patterns, topographies, infrastructure, etc. For these reasons, I find the Caribbean a good place to start exploring the development of a robust land use detection system.

Google Earth Engine

Google Earth Engine (GEE) has its pros and cons. While the use of Google's cloud brings scalable geospatial analysis to your browser, the learning curve can be steep if you are not a programmer, and the engine is also somewhat of a black box. The latter is likely a trade-off made to provide cloud-based speed and scalability, as the servers are where most of the heavy lifting is carried out. From my experience it's very different from working with datasets locally with a library such as NumPy. You'll get used to it eventually and you'll come to appreciate its speed for prototyping ideas.

Unsupervised remote land use detection

Ok, let us get a bit more into the details. As mentioned at the beginning of the article, there are a couple of approaches to detecting land use. Within this context we consider detection a classification problem. Classification can be done by unsupervised, semi-supervised or fully supervised learning. It can have a single-class output (one score for developed land) or a multi-class output (several scores for multiple land cover types, for example bare, vegetation or water). With a supervised approach you are given a vector dataset with labelled boundaries in geographical coordinates that define areas of developed land. Your satellite images are your unlabelled raster dataset. The task at hand is then to train a model on the imagery together with its labels, with the goal of predicting developed land areas in new or unseen data. Supervised approaches can be time consuming and tedious, because increasing accuracy requires creating a large number of labels. A close relative of the supervised approach is the semi-supervised approach, where both labelled and unlabelled data are used for training and combined in a way that produces a higher degree of accuracy. Finally, we have unsupervised training, where there aren't any labels and training relies on clustering algorithms that extract features through kernel convolutions or similarity functions (e.g. SNIC).

Because this exploration is meant to be short and non-exhaustive, I've opted to focus solely on experimenting with unsupervised approaches and a single-class output. GEE provides a comprehensive catalog of public datasets to explore. I've focused on the ones that provide the best resolution for both multispectral and radar sensors. These are currently the satellites of the European Sentinel missions (10 m per pixel), and Sentinel-1 in particular.

Synthetic Aperture Radar

Each of the two satellites of the Sentinel-1 constellation, 1A and 1B, carries a Synthetic Aperture Radar (SAR) instrument on board. Having experimented with Sentinel-2 data, which produces multispectral (coloured) images, I tend to agree that a supervised approach would be better suited to these types of images, for example by using Deep Neural Networks as classifiers. This is because the visual patterns that represent manmade structures such as roads and buildings are complex and not easy to cluster with shallow techniques.

One general difficulty when dealing with multispectral sensors is that they collect radiation passively and at wavelengths that are obstructed by clouds. While the Caribbean is known for its abundant sun, a cloud-free image usually entails compositing images over an entire year. This timespan can lower your change detection rate, as changes over the course of a 12-month period are aggregated into a single image. In terms of training a model, there is also the challenge of seasonal landscape changes (e.g. varying degrees of rainfall-induced vegetation) that have to be dealt with. I thus opted to focus instead on radar, an active sensor that can penetrate clouds and relies on backscatter information.

In case you are familiar with electromagnetic polarimetry, some satellites are able to measure up to four different polarisation channels and are referred to as quad-polarised satellites. These allow for fancy decomposition techniques that can distinguish between things like single and double-bounce scattering. Unfortunately, Sentinel-1 is dual-polarised instead of quad-polarised, and the HH-HV dual polarisation channels are only available at the poles for the monitoring of sea ice. Next to that, because of how GEE works, the Sentinel-1 data that is available is solely Ground Range Detected and thus only contains the amplitude (backscatter coefficient) of the microwave signals.

Finally, there weren't any VH-polarised images available in descending orbit (satellite 1B), at least from my experience with GEE while doing spot checks within the Caribbean. Satellite 1A, however, provides both the VV co-polarised and the VH cross-polarised channel for almost all destinations. Thus I was left with dual-band VV and VH SAR images from satellite 1A (ascending orbits). Nonetheless, this turned out to be enough to continue prototyping ideas in new GEE territory.

Increasing the signal-to-noise ratio

The core idea behind this exploration is that SAR imagery, when thresholded, is similar to a point cloud. Each pass of the satellite produces a slightly different image, so as more passes are carried out the "signal" continues to accumulate over time. Due to the limitations described in the previous section, there is one parameter that has to be adjusted manually per destination: the minimum used for thresholding. This minimum is set after normalising the pixel values and applying some histogram matching between the first and the follow-up images. As a side note, there isn't any native function in GEE that does histogram matching at the time of writing, so I conjured up my own and have asked here whether there is a better way to accomplish this on GEE servers.
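
To make the steps above a bit more concrete, here is a minimal local sketch in plain Python. It is not the code that runs on the GEE servers: the min-max normalisation, the quantile-based histogram matching, the choice to match every follow-up pass to the first one, and all names (`normalise`, `match_histogram`, `accumulate`, the tiny 2 x 2 "passes") are assumptions made purely for illustration.

```python
from bisect import bisect_right


def normalise(img):
    """Min-max scale the pixel values of a 2D grid to the range [0, 1]."""
    flat = [v for row in img for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant image
    return [[(v - lo) / span for v in row] for row in img]


def match_histogram(source, reference):
    """Map each source pixel to the reference value at the same quantile."""
    src_sorted = sorted(v for row in source for v in row)
    ref_sorted = sorted(v for row in reference for v in row)
    n_src, n_ref = len(src_sorted), len(ref_sorted)

    def lookup(v):
        # Empirical CDF of v within the source image, in (0, 1].
        q = bisect_right(src_sorted, v) / n_src
        # Index of the reference value sitting at that same quantile.
        idx = min(n_ref - 1, max(0, round(q * n_ref) - 1))
        return ref_sorted[idx]

    return [[lookup(v) for v in row] for row in source]


def accumulate(passes, minimum):
    """Count, per pixel, in how many passes the value reaches `minimum`."""
    reference = normalise(passes[0])
    rows, cols = len(reference), len(reference[0])
    counts = [[0] * cols for _ in range(rows)]
    for i, img in enumerate(passes):
        norm = normalise(img)
        # Assumption: follow-up passes are matched to the first pass.
        matched = norm if i == 0 else match_histogram(norm, reference)
        for r in range(rows):
            for c in range(cols):
                if matched[r][c] >= minimum:
                    counts[r][c] += 1
    return counts


# Three made-up passes over the same 2 x 2 area (values loosely resembling dB).
passes = [
    [[-20, -18], [-5, -19]],
    [[-25, -4], [-8, -24]],
    [[-22, -21], [-3, -20]],
]
counts = accumulate(passes, minimum=0.5)
print(counts)  # [[0, 1], [2, 0]]
```

Any pixel with a count above zero would end up as a red dot. In this toy example the second pass has a second fairly bright pixel that would cross the minimum after normalisation alone, but histogram matching pulls it back to the distribution of the first pass, which is the reason for matching before thresholding.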

Limitations and future work

Using Sentinel-1 in GEE does have its limitations for unsupervised remote land use detection but it also carries potential. I've jotted down a list of items that in my opinion merit further exploration:
  • Reduction of false positives. For example, on the islands of Bonaire and Antigua steep hills are detected as manmade structures. No radiometric terrain flattening is applied in GEE, so looking further into the backscatter coefficients of Sentinel-1 could be useful here. Besides misclassified dihedral angles, there is also the issue of scalability in GEE. As the area of interest gets larger, the scale of the GEE Reducers has to be adjusted to higher (coarser) levels. While this ensures that the server request for histogram matching does not fail, it does reduce the overall accuracy, as pixel values are incrementally averaged. This is the reason why, for example, the Bahamas group of islands is not present and why the scope of this exploration is limited to relatively small islands.
  • Because of how electromagnetic waves bounce off flat surfaces, roads, which are manmade structures, can get misclassified as water in SAR, and this can lead to significant gaps in accuracy. Think, for example, of the exclusion of the airport landing strip of a tiny island such as Saba. Similarly, farmland is underrepresented in the results. For these two examples, either looking deeper into backscatter decomposition techniques or resorting to a semi-supervised approach could be useful.
  • The island of Barbuda, which suffered tragic widespread damage from Hurricane Irma, has to be looked at more closely: the images returned have a narrow range of values with lots of noise.
  • Sentinel-1A was launched in April 2014 and has a designed lifespan of 7 years. In relative terms, 2014 is therefore still quite recent. Ideally one would look 30 years back in time, but then one would have to resort to Landsat collections, which offer at best 30 m per pixel.
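
The scale trade-off from the first item in the list can be illustrated with a small sketch in plain Python. It only mimics the general effect of averaging pixels at a coarser scale; how GEE actually aggregates pixels internally may differ, and the function `mean_downsample`, the grid and the minimum of 0.5 are all made up for this example.

```python
def mean_downsample(img, factor):
    """Average non-overlapping factor x factor blocks of a 2D grid."""
    rows, cols = len(img), len(img[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [
                img[rr][cc]
                for rr in range(r, min(r + factor, rows))
                for cc in range(c, min(c + factor, cols))
            ]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# Hypothetical normalised backscatter with one bright (manmade) pixel.
fine = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
minimum = 0.5

coarse = mean_downsample(fine, 2)
hits_fine = sum(v >= minimum for row in fine for v in row)
hits_coarse = sum(v >= minimum for row in coarse for v in row)
print(hits_fine, hits_coarse)  # 1 0
```

The single bright pixel is averaged with its three darker neighbours, drops below the minimum and disappears from the result, which is the kind of accuracy loss that kept larger island groups out of scope.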

Thanks for making it this far, and in case this information was helpful to you then great! If not, then I'm sorry that you had to read it all through. The disclaimer here is that I'm currently not studying remote sensing, but I obviously do find it a very interesting field with the potential to help steer decision making about our home planet, not just for our own protection but for the general wellbeing of the planet itself. Any feedback is highly appreciated, thanks!
