Day 1: Points

Day one of the 30 day map challenege is points!

One of the datasets I use all the time in testing is named ns-water_point from GeoArrow Data. When I made this test data I didn’t think very hard about it…the Nova Scotia Geospatial Data Directory has a hydrological section and I spent the better part of a decade there working on lakes…it seemed like fun! Other than a few sanity checks I never really looked at the data, though, even though I read in the points one all the time as an example because it’s the smallest.

Day one of the 30 day map challenege seemed like a good time to resolve this. What is ns-water_point, anyway!?

Let’s roll. I’ll be using SedonaDB, so let’s load it:

# pip install "apache-sedona[db]"
import sedona.db

sd = sedona.db.connect()
sd.options.interactive = True

Next, let’s get the GeoParquet url from the GeoArrow Data page:

url = "https://github.com/geoarrow/geoarrow-data/releases/download/v0.2.0/ns-water_water-point_geo.parquet"

Next I’ll read it in as a view. Views in SedonaDB are lazy and this is really more like giving it a name to make the SQL less verbose.

sd.read_parquet(url).to_view("points", overwrite=True)

Next let’s use lonboard to get a handle on what exactly this is!

import lonboard

lonboard.viz(sd.view("points"))
/Users/dewey/gh/dewey.dunnington.ca3/.venv/lib/python3.13/site-packages/lonboard/_geoarrow/ops/reproject.py:116: UserWarning: Input being reprojected to EPSG:4326 CRS.
Lonboard is only able to render data in EPSG:4326 projection.
  warnings.warn(

It took a little zooming to figure this one out but…they’re rocks. Like, the rocks that are in the middle of a lake that you really don’t want to bump into when you’re boating. I figured this out by zooming in to Gaspereau Lake, which is a place I’ve done a bit of canoeing. A fun part about the rocks on that lake is that most of them are just below the water’s surface. The water is tea-coloured and you can see about 6 inches into it on a good day, which means that canoeing in most parts of Gaspereau Lake is a an exerecise in hitting a lot of rocks.

Ok…this is a map challenege, so let’s make a proper map. I got myself a box in lon/lat space from WKTMap and we’ll get ourselves the points for the Gaspereau Lake area. One slight hiccup is that we need our intersecting box that we want to filter on to be in the same CRS as the data. Luckily, we can use ST_Transform() to get our box in the right coordinate space, and we can introspect the schema to get the right CRS to project it to.

box = "POLYGON ((-64.624157 44.941838, -64.509315 44.941838, -64.509315 44.991027, -64.624157 44.991027, -64.624157 44.941838))"
points_crs_json = sd.view("points").schema.field("geometry").type.crs.to_json()

sd.sql(f"""
SELECT * FROM '{url}'
WHERE ST_Intersects(
  geometry,
  ST_Transform(ST_SetSRID(ST_GeomFromWKT('{box}'), 4326), '{points_crs_json}')
)
""").to_memtable().to_view("df_points_subset")

I used to_memtable() here since views are lazy and I actually want to resolve this in memory so that I don’t have to download the data multiple times while I’m fidding with the map.

Next we’ll need a backdrop. I’ll use the ns-water_poly layer for the background which is (mostly) lakes. Here I’ll use a cool feature of SedonaDB…you can use a subquery as an argument to ST_Intersects(). So here I’ll only select the lakes that are relevant to the rectangle of the points we’ll be plotting.

poly_url = "https://github.com/geoarrow/geoarrow-data/releases/download/v0.2.0/ns-water_water-poly_geo.parquet"

sd.sql(f"""
SELECT geometry FROM '{poly_url}'
WHERE ST_Intersects(geometry, (SELECT ST_Envelope_Aggr(geometry) FROM df_points_subset))
""").to_memtable().to_view("poly_backdrop", overwrite=True)

Next I’ll do some plotting with geopandas. This is matplotlib and not the prettiest of plots, but the data are already in a meters-based coordinate system so they plot with the right proportions.

import matplotlib.pyplot as plt

ax = sd.view("poly_backdrop").to_pandas().plot(color="lightblue")
sd.view("df_points_subset").to_pandas().plot(color="black", ax=ax)
ax.set_xticks([])
ax.set_yticks([])

plt.rcParams['figure.dpi'] = 600
plt.rcParams['savefig.dpi'] = 600

png

And there we have it! It turns out my test data has been rocks for all these years and I’ve even run into some of them.

Software Engineer & Geoscientist