Friday, 30 January 2026

Spatial Capabilities in Enso Analytics - Creating Spatial Points and Spatial Joins

In our last post on spatial capabilities we looked at visualizations, inputting spatial files and spatial expressions. In this post we will look at creating points and spatial joins.

A Spatial Question?

Let's do something more interesting and answer a spatial question that needs a join. The question we are going to answer is which county in the UK has the most castles? I will start by showing you the answer and then talk through the steps.



With the dataset that I have the answer is Lincolnshire with 13 castles. 

How did we get there? Lets walk through it step by step.

Creating Spatial Objects Using st_point

Like in the last blog post we got the castle points using the overpass-api. Looking inside the get_castles component in our main workflow we see this.



This calls the overpass api with an overpass query to get the castles in the UK and then extracts the castle name from the json that is returned from that API.

Then back in the parent workflow we trim the data down to the three columns we are interested in:  Name, lat and long


Right now this is still just a standard table of data. To do some spatial processing on it we need to load it into DuckDB and convert those lat longs to spatial objects, which we do with the new st_point component. This component takes a longitude and latitude column and returns a DuckDB spatial object as we can see below:


Reading In the Boundary Data from a Shapefile

I found the UK county boundaries as a shape file on the geoboundaries website (https://www.geoboundaries.org/countryDownloads.html) which I downloaded to my local machine and unzipped.

We then read this in using the read_spatial_file component on our DuckDB connection


I wrapped this into a User Defined Component called get_county_boundaries to keep our parent workflow looking clean and easy to read.


Spatial Join

Now we have both our boundaries and our points as spatial objects in DuckDB we can use the DuckDB spatial engine to do a Spatial join. We do this using the same join component that we would do a regular data join with, but because these are spatial tables we get new spatial join options!


Here we choose Contains, as we want to match the counties to the castles when the castle is contained by the counties boundary. We choose a Left_Outer join so we can keep all of our counties even if they don't contain any castles.


After the join we now have all of our castle data with their containing county attached.

The Answer

The last thing to answer our question, is to aggregate the data to county level, count how many castles and then finally sort:


Other Questions

Of course now we have built the workflow we can re-use it to answer other questions? Like which county has the most pubs?

This time I found a data file at https://www.getthedata.com/open-pubs and loaded it directly into DuckDB using read_file


The rest of the workflow is the same as the castle one.

In the next post we will look at what DuckDB gives us in terms of handling larger data files, including my favourite csv file: the UK companies house dataset!

No comments:

Post a Comment