This release of the s2 package for R is an exciting one! Most users will use s2 implicity through its use in sf; however, the s2 package can be used on its own and these improvements will likely be available in future releases of sf.

Most of the hours spent on this release went into a massive refactor of the initial implementation so that the internals are more useful elsewhere (e.g., maybe a Python package), but that refactor also opened up opportunities to speed things up and add a few nice features. Let’s get to it!

### Unions and within-distance queries are faster

A long standing issue with the current release of s2 is that `s2_union_agg()`

are slow. The initial implementation accumulated geographies as it went; the new version starts by grouping geographies into pairs and recursively merges the pairs together until one final feature has been built. This is about 10 times faster than the previous version and should scale much better.

```
bench::mark(s2_union_agg(s2_data_countries()))
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:> <dbl> <bch:byt> <dbl>
#> 1 s2_union_agg(s2_data_countries()) 205ms 205ms 4.88 871KB 0
```

Another long standing issue was that the within-distance query was unusably slow. This is an important type of query because it’s much like checking for an intersection but with a tolerance. In GEOS you can create a buffered feature; in S2 that functionality isn’t implemented yet, and the workaround used a lot of memory. The new release adds `s2_prepared_dwithin()`

and fixes `s2_dwithin_matrix()`

such that it efficiently uses preselection based on a simplified S2 cell covering. For the test case I was using it resulted in a query that was 30 times faster although I’m sure timing will vary depending on how useful preselection is for a given use case.

```
points <- s2_point(runif(1e4, -1, 1), runif(1e4, -1, 1), runif(1e4, -1, 1))
geog <- s2::as_s2_geography(as_s2_lnglat(points))
countries <- s2_data_countries()
bench::mark(
s2_dwithin_matrix(countries, geog, 1e6),
s2_dwithin_matrix(geog, countries, 1e6),
check = FALSE
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc
#> <bch:expr> <bch:tm> <bch:t> <dbl> <bch:byt>
#> 1 s2_dwithin_matrix(countries, geog, 1e+06) 162ms 165ms 6.08 133.2KB
#> 2 s2_dwithin_matrix(geog, countries, 1e+06) 414ms 416ms 2.40 80.7KB
#> # … with 1 more variable: `gc/sec` <dbl>
```

For the pairwise `s2_dwithin()`

it’s sometimes faster to use preselection based on an index and sometimes not. For most calls where the length of the second argument is 1 (common, I think), you can now use the new `s2_prepared_dwithin()`

to speed up your query.

```
bench::mark(
s2_dwithin(geog, countries[1], 1e6),
s2_prepared_dwithin(geog, countries[1], 1e6)
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl>
#> 1 s2_dwithin(geog, countries[1], 1e+06) 9.23ms 9.52ms 105.
#> 2 s2_prepared_dwithin(geog, countries[1], 1e+06) 3.38ms 3.48ms 282.
#> # … with 2 more variables: mem_alloc <bch:byt>, `gc/sec` <dbl>
```

### Plotting

Testing spherical geometry operations is hard, particularly because most plotting tools are Cartesian-centric and make it hard to visualize geodesic edges like those in S2 geographies. The S2 library has tools that make projecting spherical geometries relatively simple and I was tired of “flying blind” when trying to make sure an operation “worked”. Thus, enter `s2_plot()`

! It only does one projection (orthographic with customizable centre) and is not designed for presentation-ready plots but was a total game changer for me debugging the features in this release.

```
s2_plot(s2_data_countries("Antarctica"))
```

### Convex hulls and point-on-surface on the sphere

Thanks to Sylvain Piry, the underlying S2 library’s convex hull implementation is now hooked up in R! You can now do spherical convex hulls item-wise (`s2_convex_hull()`

) or as an aggregate function (`s2_convex_hull_agg()`

).

```
s2_plot(s2_convex_hull(s2_data_countries("Antarctica")))
s2_plot(s2_data_countries("Antarctica"), add = TRUE)
```

Thanks to Kyle Butts, the s2 R package has an implementation of “point on surface” (`s2_point_on_surface()`

), which is originally a GEOS function that returns a deterministic point that definitely intersects a feature and is sort of in the middle of it (perhaps as a starting point for label placement). The implementation for points and lines is readily translated from GEOS; however, the polygon implementation required a new approach (we use the centre of the largest S2 cell that will fit inside the polygon).

```
s2_plot(s2_data_countries())
s2_plot(s2_point_on_surface(s2_data_countries()), add = TRUE)
```

### Distance-constrained k-nearest neighbours

Almost a year ago, Roger Bivand opened an issue noting that distance-constrained k-nearest neighbours was slow when using S2 as a backend. This helped push a few features forward (including the faster `s2_dwithin_matrix()`

implementation), but the direct fix was to add a `max_distance`

argument to match the `min_distance`

argument in `s2_closest_edges()`

.

```
cities <- s2_data_cities()
city_names <- s2_data_tbl_cities$name
knn_1000_km <- s2_closest_edges(
cities[city_names == "Paris"], cities,
k = 20,
min_distance = 0,
max_distance = 1e6
)
city_names[knn_1000_km[[1]]]
#> [1] "Ljubljana" "San Marino" "Prague" "The Hague" "Dublin"
#> [6] "Berlin" "Andorra" "Geneva" "Vaduz" "Monaco"
#> [11] "London" "Bern" "Luxembourg" "Amsterdam" "Brussels"
```

### Cells and Cell Unions

The previous release quietly added an `s2_cell()`

vector class; however, the most used feature of the underlying s2 library that involved s2 cells (the “covering”) wasn’t implemented. This release adds a `s2_cell_union()`

vector class and `s2_covering_cell_ids()`

, which implements the covering logic returning cell union vector. The covering is a simplified version of a geography that is kind of like a bounding box but potentially more accurate and sphere-friendly.

```
(covering <- s2_covering_cell_ids(s2_data_countries("Canada")))
#> <s2_cell_union[1]>
#> List of 1
#> $ : s2_cell[1:8] 4b , 4d , 4f , 51 , 53 , 55 , 56b, 89
s2_plot(s2_cell_polygon(unlist(covering)), border = "red")
s2_plot(s2_data_countries(), add = T)
```

The cell union class supports predicates like contains and intersects that are usually faster than their geographical counterparts. You can use transformers like intersection and union to approximate the result an operation rather than actually compute it which is potentially much faster.

```
countries <- s2_data_countries()
canada <- s2_data_countries("Canada")
countries_approx <- s2_covering_cell_ids(s2_data_countries())
may_intersect_canada <- s2_cell_union_intersects(countries_approx, covering)
s2_data_tbl_countries$name[may_intersect_canada]
#> [1] "The Bahamas" "Canada"
#> [3] "Cuba" "Greenland"
#> [5] "Iceland" "Russia"
#> [7] "United States of America"
bench::mark(
s2_cell_union_intersects(countries_approx, covering),
s2_intersects(countries, canada),
check = FALSE
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec`
#> <bch:expr> <bch:t> <bch:t> <dbl>
#> 1 s2_cell_union_intersects(countries_approx, covering) 23.3µs 24.1µs 39981.
#> 2 s2_intersects(countries, canada) 584.5µs 593.6µs 1664.
#> # … with 2 more variables: mem_alloc <bch:byt>, `gc/sec` <dbl>
```

Another compelling use case for s2 cell manipulation is grouping geographical features such that groups contain relatively nearby features. This kind of grouping is useful because it means that the bouding box/covering of a group can be used more effectively for preselection.

```
# use the aggregate union to find cells to use for groups
aggregate_union <- s2_covering_cell_ids_agg(s2_data_countries(), max_cells = 10)
groups_cell <- unlist(aggregate_union)
# assign every feature exactly one group
group_id <- rep(NA_integer_, length(countries_approx))
for (i in seq_along(groups_cell)) {
group_id[s2_cell_union_intersects(countries_approx, groups_cell[i])] <- i
}
# sample a group to see if they're vageuly close together
split(s2_data_tbl_countries$name, group_id)[[9]]
#> [1] "Argentina" "Antarctica"
#> [3] "French Southern and Antarctic Lands" "Australia"
#> [5] "Chile" "Falkland Islands"
#> [7] "New Zealand"
```

The cell and cell union vector classes are still experimental and feedback about missing features or rough edges (pun intended) are welcome!

### Planar import and export

Many of the issues related to using S2 as a geometry backend for spherical coordinates are in some way or another due to the difference between edges in planar space and spherical space. Almost every data set in longitude/latitude assumes GeoJSON-like handling of edges, meaning that points are connected by straight lines assuming normal planar rules. For example, the following polygon is totally valid in most GIS software’s definition of “valid geometry”:

```
poly <- "POLYGON ((-70 50, 0 50, 0 70, -35 52, -70 70, -70 50))"
wk::wk_plot(wk::wkt(poly), border = "red")
```

If you try to import this into s2, you’ll get the dreaded error:

```
s2_geog_from_text(poly)
#> Error in wk_handle.wk_wkt(wkt, s2_geography_writer(oriented = oriented, : Loop 0 is not valid: Edge 0 crosses edge 2
```

That’s because s2 interprets the polygon’s boundary like this:

```
poly_boundary <- wk::wk_linestring(wk::wkt(poly))
s2_plot(s2_geog_from_text(poly_boundary), col = "red")
s2_plot(s2_data_countries(), add = TRUE)
```

The latest release of s2 adds the ability to keep the usual GIS rules for edges when importing features by adding points to edges to keep them within a given tolerance of the original. Using a smaller tolerance will add more points, so using the largest possible tolerance for your use-case is recommended.

```
s2_plot(
s2_geog_from_text(
poly,
planar = TRUE,
tessellate_tol_m = 1e3
),
border = "red"
)
s2_plot(s2_data_countries(), add = TRUE)
```

Similarly, when exporting geographies from s2 into other GIS software, the previous release made it difficult to export edges in the way that other GIS software expected them. One example of this is exporting s2 cells, which by definition have geodesic edges.

```
cell <- s2_cell_parent(as_s2_cell(s2_lnglat(-64, 45)), 2)
cell_geod <- s2_as_binary(s2_cell_polygon(cell))
cell_planar <- s2_as_binary(s2_cell_polygon(cell), planar = TRUE)
wk::wk_plot(wk::as_wkb(cell_geod))
wk::wk_plot(wk::as_wkb(cell_planar), border = "red", add = TRUE)
```

Usually vertices of normal GIS layers are close enough together that this won’t make a meaningful difference in the output. If you are getting invalid geometry errors from s2 input or output, though, it’s another tool you can use to work around the problem.

### Internal refactoring

The major change in this release was a big refactor of the initial implementation to make it possible for other to re-use some of the code we’d written to make something like simple features work with s2 as a backend. The result was a standalone library and a lot of code quality improvements! There is interest in the Python geospatial community to add Python bindings and perhaps a set of geoarrow/Arrow query engine bindings in the future!

This release also eliminates a dependency on an old version of the wk package and commits to the infrastructure in recent wk releases. The most useful thing this does is add the ability to speed up conversions between sf and s2 representations of a geometry (particularly in the sf direction).

```
fast_as_s2 <- function(x) {
wk::wk_handle(x, s2_geography_writer())
}
fast_as_sfc <- function(geog) {
wk::wk_handle(geog, wk::sfc_writer())
}
nc_sfc <- sf::read_sf(system.file("shape/nc.shp", package = "sf"))$geometry
nc_big_sfc <- rep(nc_sfc, 100)
nc_big_s2 <- as_s2_geography(nc_big_sfc)
bench::mark(
fast_as_s2(nc_big_sfc),
sf::st_as_s2(nc_big_sfc),
check = FALSE
)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 fast_as_s2(nc_big_sfc) 203ms 204ms 4.88 81.6KB 0
#> 2 sf::st_as_s2(nc_big_sfc) 224ms 224ms 4.46 5.68MB 2.23
bench::mark(
fast_as_sfc(nc_big_s2),
sf::st_as_sfc(nc_big_s2),
check = FALSE
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 fast_as_sfc(nc_big_s2) 15.1ms 15.8ms 55.9 4.36MB 9.31
#> 2 sf::st_as_sfc(nc_big_s2) 744.4ms 744.4ms 1.34 155.52MB 14.8
```

The handler/writer implementations allow some other wk infrastructure to work out of the box:

```
wk::wk_flatten(as_s2_geography("MULTIPOINT (0 1, 2 3, 4 5)"))
#> <geodesic s2_geography[3] with CRS=OGC:CRS84>
#> [1] POINT (0 1) POINT (2 3) POINT (4 5)
```

### What’s next?

In addition to maintaining the features added in this release, there are some exciting features coming up! Buffers on the sphere were just added to the upstream google/s2 release, and the ability to simplify features together (e.g., polygon coverages) has always been available in s2 but never exposed in the R package. These plus modernizing the s2 package build system is up next!