wk version 0.7.0!

wk version 0.7.0!

Another release of the wk package is upon us! This release is a mix of exciting new features and bugfixes for some corner cases I encountered when working on the geoarrow package. Let’s get to it! Sit back, relax, and enjoy the…well, let just do this thing.

library(wk)
packageVersion("wk")
#> [1] '0.7.0'

Grids support (i.e., raster, sort of)

A few months ago I spent some time prototyping a package that I called grd. At the time I saw wk and grd as handling separate problems, but we’re now up to two kids and the extra maintenance of another package seemed daunting. Also, I found myself typing grd::grd(nx = something, ny = something) a lot, which suggested that it might have a place in the wk package, whose scope is something along the lines of “all the nuts and bolts that user-facing spatial packages need to do their thing”.

This release adds the grd_rct() and grd_xy() classes, which represent a grid of rectangles and a grid of points, respectively. They are stored as a bounding box (as an rct()) and data, which is an array() or anything that can be indexed like one. In the common case where you just want a bunch of evenly-spaced points or rectangles, you can use the shortcut grd():

bbox <- rct(0, 0, 10, 10)
grid <- grd(bbox, nx = 5, ny = 6)
plot(grid, border = "black")

Under the hood, the grid is just the bbox plus an array with six rows and five columns.

grid
#> <wk_grd_rct [6 x 5 x 0] => [0 0 10 10]>
#> List of 2
#>  $ data: logi[1:6, 1:5, 0 ] 
#>  $ bbox: wk_rct[1:1] [0 0 10 10]
#>  - attr(*, "class")= chr [1:2] "wk_grd_rct" "wk_grd"

By default grd() creates a grid of rectangles, but you can also create a grid of points:

grd(bbox, nx = 5, ny = 6, type = "centers")
#> <wk_grd_xy [6 x 5 x 0] => [1 0.8333333 9 9.166667]>
#> List of 2
#>  $ data: logi[1:6, 1:5, 0 ] 
#>  $ bbox: wk_rct[1:1] [1 0.833 9 9.17]
#>  - attr(*, "class")= chr [1:2] "wk_grd_xy" "wk_grd"

Both of these are compact representations of potentially huge numbers of features. If you need to explode them into a one-feature-per-cell form, you can use as_rct() or as_xy():

as_rct(grid)
#> <wk_rct[30]>
#>  [1] [0 8.333333  2 10.000000] [0 6.666667  2  8.333333]
#>  [3] [0 5.000000  2  6.666667] [0 3.333333  2  5.000000]
#>  [5] [0 1.666667  2  3.333333] [0 0.000000  2  1.666667]
#>  [7] [2 8.333333  4 10.000000] [2 6.666667  4  8.333333]
#>  [9] [2 5.000000  4  6.666667] [2 3.333333  4  5.000000]
#> [11] [2 1.666667  4  3.333333] [2 0.000000  4  1.666667]
#> [13] [4 8.333333  6 10.000000] [4 6.666667  6  8.333333]
#> [15] [4 5.000000  6  6.666667] [4 3.333333  6  5.000000]
#> [17] [4 1.666667  6  3.333333] [4 0.000000  6  1.666667]
#> [19] [6 8.333333  8 10.000000] [6 6.666667  8  8.333333]
#> [21] [6 5.000000  8  6.666667] [6 3.333333  8  5.000000]
#> [23] [6 1.666667  8  3.333333] [6 0.000000  8  1.666667]
#> [25] [8 8.333333 10 10.000000] [8 6.666667 10  8.333333]
#> [27] [8 5.000000 10  6.666667] [8 3.333333 10  5.000000]
#> [29] [8 1.666667 10  3.333333] [8 0.000000 10  1.666667]
as_xy(grid)
#> <wk_xy[30]>
#>  [1] (1 9.1666667) (1 7.5000000) (1 5.8333333) (1 4.1666667) (1 2.5000000)
#>  [6] (1 0.8333333) (3 9.1666667) (3 7.5000000) (3 5.8333333) (3 4.1666667)
#> [11] (3 2.5000000) (3 0.8333333) (5 9.1666667) (5 7.5000000) (5 5.8333333)
#> [16] (5 4.1666667) (5 2.5000000) (5 0.8333333) (7 9.1666667) (7 7.5000000)
#> [21] (7 5.8333333) (7 4.1666667) (7 2.5000000) (7 0.8333333) (9 9.1666667)
#> [26] (9 7.5000000) (9 5.8333333) (9 4.1666667) (9 2.5000000) (9 0.8333333)

As a reminder, the xy() and rct() classes are record-style vectors and can go zero-copy into data frames:

head(as.data.frame(as_rct(grid)))
#>   xmin     ymin xmax      ymax
#> 1    0 8.333333    2 10.000000
#> 2    0 6.666667    2  8.333333
#> 3    0 5.000000    2  6.666667
#> 4    0 3.333333    2  5.000000
#> 5    0 1.666667    2  3.333333
#> 6    0 0.000000    2  1.666667

grd() objects also have st_as_sfc() and st_geometry() methods, so you can use them to create grids there, too. This is much like st_make_grid() but is usually faster (although as_rct() and as_xy() are much much much faster, so use them when you can!).

bench::mark(
  as_rct = as_rct(grd(bbox, nx = 1e2, ny = 1e2)),
  st_as_sfc = st_as_sfc(grd(bbox, nx = 1e2, ny = 1e2)),
  st_make_grid = st_make_grid(bbox, n = c(1e2, 1e2)),
  check = FALSE
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 3 × 6
#>   expression        min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 as_rct        122.6µs 146.16µs   5549.       326KB     40.0
#> 2 st_as_sfc       6.7ms   6.88ms    140.       736KB     12.0
#> 3 st_make_grid  150.8ms 164.06ms      5.71     984KB     30.5

If you have a matrix or array whose first two dimensions are y and x, the grd_rct()/grd_xy() classes can help you keep your data anchored in space. For example, the ubiquitous R dataset volcano with an approximate bounding box:

bbox <- rct(
  5917000,       1757000 + 870,
  5917000 + 610, 1757000,
  crs = "EPSG:2193"
)

(grid <- grd_rct(volcano, bbox))
#> <wk_grd_rct [87 x 61] => [5917000 1757000 5917610 1757870] with crs=EPSG:2193>
#> List of 2
#>  $ data: num [1:87, 1:61] 97 97 98 98 99 99 99 99 100 100 ...
#>  $ bbox: wk_rct[1:1] [5917000 1757000 5917610 1757870]
#>  - attr(*, "class")= chr [1:2] "wk_grd_rct" "wk_grd"
plot(grid)

You can do matrix-like subsetting on grids and it will calculate the appropriate bounding box for your subset:

plot(grid[1:10, ])

The grd subsetting and plot also knows about R nativeRaster objects, which means you can attach bounding boxes to images read using (e.g.) png::readPNG(..., native = TRUE):

grid_png <- grd_rct(png::readPNG("mt-eden.png", native = TRUE), wk_bbox(grid))
plot(grid_png)

The operators that help with bounding box calculation, downsampling, and index math are fairly extensive but I would stop short of calling it “raster support”. The wk package is about nuts and bolts: the nuts and bolts around grids and bounding boxes are finicky to get right but not really all that hard. In its current form grid support is probably not all that useful but hopefully in the next release I’ll be able to fill in the missing pieces.

Example data

When writing geoarrow I needed a lot of data with very specific corner cases: empties, nulls, geometries with and with out CRSes, and nested geometry collections. When revamping the s2 package I needed them as well, and when playing with lightweight GeoPackage support I used pretty much the same data. All of those package used wk for various bits, so I just added the data there. You can access it via wk_example() or in a more raw form via wk_example_wkt.

plot(wk_example("nc"))

PROJJSON CRS support

The wk package doesn’t do any CRS calculation on its own but does provide tools for CRS propagation (e.g., when subsetting an xy(), the result has the same CRS as the original) and comparison (i.e., when combining two xy() vectors, the two vectors must have the same CRS). When prototyping the geoarrow package I ran into another problem, which was that the GeoParquet metadata standard requires PROJJSON. This release adds the wk_crs_projjson() generic, which always gives a PROJJSON string from a CRS representation if one can be calculated.

cat(wk_crs_projjson(st_crs(4326)))
#> {
#>   "$schema": "https://proj.org/schemas/v0.4/projjson.schema.json",
#>   "type": "GeographicCRS",
#>   "name": "WGS 84",
#>   "datum_ensemble": {
#>     "name": "World Geodetic System 1984 ensemble",
#>     "members": [
#>       {
#>         "name": "World Geodetic System 1984 (Transit)"
#>       },
#>       {
#>         "name": "World Geodetic System 1984 (G730)"
#>       },
#>       {
#>         "name": "World Geodetic System 1984 (G873)"
#>       },
#>       {
#>         "name": "World Geodetic System 1984 (G1150)"
#>       },
#>       {
#>         "name": "World Geodetic System 1984 (G1674)"
#>       },
#>       {
#>         "name": "World Geodetic System 1984 (G1762)"
#>       },
#>       {
#>         "name": "World Geodetic System 1984 (G2139)"
#>       }
#>     ],
#>     "ellipsoid": {
#>       "name": "WGS 84",
#>       "semi_major_axis": 6378137,
#>       "inverse_flattening": 298.257223563
#>     },
#>     "accuracy": "2.0"
#>   },
#>   "coordinate_system": {
#>     "subtype": "ellipsoidal",
#>     "axis": [
#>       {
#>         "name": "Geodetic latitude",
#>         "abbreviation": "Lat",
#>         "direction": "north",
#>         "unit": "degree"
#>       },
#>       {
#>         "name": "Geodetic longitude",
#>         "abbreviation": "Lon",
#>         "direction": "east",
#>         "unit": "degree"
#>       }
#>     ]
#>   },
#>   "scope": "Horizontal component of 3D system.",
#>   "area": "World.",
#>   "bbox": {
#>     "south_latitude": -90,
#>     "west_longitude": -180,
#>     "north_latitude": 90,
#>     "east_longitude": 180
#>   },
#>   "id": {
#>     "authority": "EPSG",
#>     "code": 4326
#>   }
#> }

For non-sf CRS representations (e.g., a string like "OGC:CRS84"), the wk package now contains a cache (basically just so that geoarrow can write geoparquet files without a PROJ installation):

cat(wk_crs_projjson("OGC:CRS84"))
#> {"type":"GeographicCRS","name":"WGS 84 (CRS84)","datum_ensemble":{"name":"World Geodetic System 1984 ensemble","members":[{"name":"World Geodetic System 1984 (Transit)","id":{"authority":"EPSG","code":1166}},{"name":"World Geodetic System 1984 (G730)","id":{"authority":"EPSG","code":1152}},{"name":"World Geodetic System 1984 (G873)","id":{"authority":"EPSG","code":1153}},{"name":"World Geodetic System 1984 (G1150)","id":{"authority":"EPSG","code":1154}},{"name":"World Geodetic System 1984 (G1674)","id":{"authority":"EPSG","code":1155}},{"name":"World Geodetic System 1984 (G1762)","id":{"authority":"EPSG","code":1156}},{"name":"World Geodetic System 1984 (G2139)","id":{"authority":"EPSG","code":1309}}],"ellipsoid":{"name":"WGS 84","semi_major_axis":6378137,"inverse_flattening":298.257223563},"accuracy":"2.0","id":{"authority":"EPSG","code":6326}},"coordinate_system":{"subtype":"ellipsoidal","axis":[{"name":"Geodetic longitude","abbreviation":"Lon","direction":"east","unit":"degree"},{"name":"Geodetic latitude","abbreviation":"Lat","direction":"north","unit":"degree"}]},"scope":"Not known.","area":"World.","bbox":{"south_latitude":-90,"west_longitude":-180,"north_latitude":90,"east_longitude":180},"id":{"authority":"OGC","code":"CRS84"}}

Record class operators

The xy(), rct(), and crc() classes are very efficient representations of point, rectangle, and circle geometries, respectively. R is really good at the data.frame, and that’s basically how xy() and friends are able to be so much faster than list-based representations like geos::geos_geometry(), sf::st_sfc(), and s2::s2_geography(). This release adds accessors functions to get components of these objects and a few handy rectangle operators that I’d written for another package a few years ago:

points <- xy(1:5, 1:5)
xy_x(points)
#> [1] 1 2 3 4 5

box <- rct(2, 2, 4, 4)
rct_xmin(box)
#> [1] 2
rct_contains(box, points)
#> [1] FALSE  TRUE  TRUE  TRUE FALSE
rct_intersection(box, rct(3, 3, 5, 5))
#> <wk_rct[1]>
#> [1] [3 3 4 4]

What’s next?

I feel fortunate to have cleared out the open issue queue for this release, although I’m sure that as the geoarrow package moves out of the prototype phase that more will pop up! In the meantime, I’m excited about what the wk package has become: a package that takes care of the nuts and bolts so that other packages can focus on big picture problems.

Avatar
Dewey Dunnington
Geoscientist, Programmer, Educator