PostGIS Schema Design for Excavation Units
Excavation units form the foundational spatial framework for archaeological field recording, stratigraphic sequencing, and artifact provenience tracking. A production-grade implementation within the Artifact & Feature Spatial Database Design framework must enforce geometric validity, maintain strict coordinate reference system (CRS) alignment, and support automated ingestion from total stations, RTK-GNSS, and UAV photogrammetry pipelines. This guide details the schema architecture, version-pinned transformation workflows, and database-level constraints required for audit-ready spatial integrity.
1. Core Schema Architecture & Spatial Constraints
The excavation_units table enforces non-overlapping boundaries, mandatory metadata, and spatial indexing optimized for bounding-box queries. Database-level constraints replace fragile application-side validation, ensuring that only topologically sound geometries are committed.
-- Requires PostGIS >= 3.2, PostgreSQL >= 14
CREATE TABLE IF NOT EXISTS excavation_units (
    unit_id       VARCHAR(20) PRIMARY KEY,
    phase_code    VARCHAR(10) NOT NULL,
    supervisor    VARCHAR(100),
    recorded_date DATE NOT NULL DEFAULT CURRENT_DATE,
    geom          GEOMETRY(Polygon, 32633) NOT NULL,
    -- Enforce valid, non-self-intersecting polygons with measurable area
    CONSTRAINT valid_geom CHECK (ST_IsValid(geom) AND ST_IsSimple(geom)),
    CONSTRAINT positive_area CHECK (ST_Area(geom) > 0.001),
    -- Reject any unit whose bounding box intersects that of an existing unit.
    -- && is a bounding-box test, so this is stricter than true polygon overlap.
    CONSTRAINT no_overlap EXCLUDE USING gist (geom WITH &&)
);
-- Spatial index for rapid bounding-box queries and spatial joins
-- (the no_overlap constraint is already backed by its own GiST index on geom,
-- so this explicit index is optional)
CREATE INDEX IF NOT EXISTS idx_excavation_units_geom ON excavation_units USING GIST (geom);
-- Index for phase-filtered queries
CREATE INDEX IF NOT EXISTS idx_excavation_units_phase ON excavation_units (phase_code);
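The EXCLUDE USING gist (geom WITH &&) constraint rejects any INSERT or UPDATE whose bounding box intersects the bounding box of an existing unit. Because && compares bounding boxes rather than exact geometries, it rules out true overlaps but is stricter than a polygon-level test: units that share an edge or corner, or rotated units whose outlines are disjoint but whose boxes intersect, are rejected as well. Sites with contiguous grid squares therefore need an exact check instead, for example a trigger that tests the new geometry against existing rows with ST_Overlaps. Combined with ST_IsValid() and the area threshold, the constraint reduces post-hoc topology cleaning and stratigraphic ambiguity. For sites requiring multi-polygon unit definitions, replace GEOMETRY(Polygon, 32633) with GEOMETRY(MultiPolygon, 32633) and adjust the ST_IsSimple() check accordingly.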
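Since the && operator works on bounding boxes, it can help to see which unit layouts the constraint treats as conflicts. The sketch below is a plain-Python imitation of a 2D bounding-box intersection test, not PostGIS itself; the BBox type and bbox_intersects function are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box in projected metres (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def bbox_intersects(a: BBox, b: BBox) -> bool:
    """Imitate a 2D bounding-box intersection test: True when the boxes
    share any point, including a common edge or corner."""
    return (
        a.xmin <= b.xmax and b.xmin <= a.xmax
        and a.ymin <= b.ymax and b.ymin <= a.ymax
    )


unit_a = BBox(0.0, 0.0, 1.0, 1.0)  # 1 m x 1 m unit
unit_b = BBox(1.0, 0.0, 2.0, 1.0)  # neighbour sharing the x = 1 edge
unit_c = BBox(1.5, 0.0, 2.5, 1.0)  # separated from unit_a by a 0.5 m baulk

adjacent_conflict = bbox_intersects(unit_a, unit_b)
separated_conflict = bbox_intersects(unit_a, unit_c)

print("edge-sharing units conflict:", adjacent_conflict)
print("baulk-separated units conflict:", separated_conflict)
```

Under this test, two grid squares that merely share an edge count as a conflict, which is why a bounding-box exclusion suits layouts with baulks between units better than contiguous grids.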
2. Version-Pinned CRS Validation & Transformation Pipeline
Field-collected coordinates routinely arrive in mixed projections (e.g., EPSG:4326 WGS84 from GNSS receivers, local arbitrary grids from legacy surveys, national grids such as EPSG:3003 Monte Mario / Italy zone 1, or UTM zones such as EPSG:32632). Ingesting mismatched CRS data corrupts spatial joins and invalidates area calculations. The following pipeline validates incoming GeoDataFrames, enforces a site-standard SRID, and logs each transformation before database insertion.
Pinned Dependencies (requirements.txt)
geopandas==0.14.4
psycopg2-binary==2.9.9
pyproj==3.6.1
pandas==2.2.2
shapely==2.0.4
Transformation & Validation Module
import logging

import geopandas as gpd
import pandas as pd
import psycopg2
import shapely
from psycopg2 import sql, IntegrityError

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)

TARGET_SRID = 32633  # UTM Zone 33N (replace with site-specific EPSG)
DB_CONN_STRING = "postgresql://user:pass@host:5432/archaeology_db"  # placeholder credentials


def validate_and_transform_crs(
    gdf: gpd.GeoDataFrame,
    target_srid: int = TARGET_SRID,
) -> gpd.GeoDataFrame:
    """Validate CRS metadata, transform to target projection, and log discrepancies."""
    if gdf.crs is None:
        raise ValueError("Input GeoDataFrame lacks CRS definition. Assign before ingestion.")

    source_crs = gdf.crs.to_epsg()
    if source_crs is None:
        raise ValueError("CRS cannot be resolved to an EPSG code. Verify input metadata.")

    if source_crs != target_srid:
        logging.info(f"Transforming CRS from EPSG:{source_crs} to EPSG:{target_srid}")
        gdf = gdf.to_crs(epsg=target_srid)
    else:
        logging.info(f"CRS already matches target EPSG:{target_srid}. No transformation required.")

    # Enforce 2D geometry for excavation units (strip Z if present).
    # The pinned shapely 2.0.x has no M-coordinate support, so only Z is checked.
    if gdf.geometry.has_z.any():
        gdf.geometry = gdf.geometry.apply(shapely.force_2d)
        logging.warning("Z coordinates stripped to enforce 2D compliance.")

    return gdf


def insert_units_to_postgis(gdf: gpd.GeoDataFrame, conn_str: str = DB_CONN_STRING):
    """Batch insert validated units with explicit error routing."""
    geom_col = gdf.geometry.name  # active geometry column ("geometry" unless renamed)
    upsert = sql.SQL("""
        INSERT INTO excavation_units (unit_id, phase_code, supervisor, recorded_date, geom)
        VALUES (%s, %s, %s, %s, ST_GeomFromWKB(%s, %s))
        ON CONFLICT (unit_id) DO UPDATE SET
            geom = EXCLUDED.geom,
            phase_code = EXCLUDED.phase_code,
            supervisor = EXCLUDED.supervisor,
            recorded_date = EXCLUDED.recorded_date
    """)
    with psycopg2.connect(conn_str) as conn:
        with conn.cursor() as cur:
            for _, row in gdf.iterrows():
                # One savepoint per row: a failed row is undone on its own.
                # A plain conn.rollback() here would also discard every
                # earlier row inserted in the same transaction.
                cur.execute("SAVEPOINT unit_row")
                try:
                    cur.execute(
                        upsert,
                        (
                            row["unit_id"],
                            row["phase_code"],
                            row.get("supervisor"),
                            row.get("recorded_date", pd.Timestamp.now().date()),
                            row[geom_col].wkb,
                            TARGET_SRID,
                        ),
                    )
                except IntegrityError as e:
                    logging.error(f"Constraint violation for unit {row['unit_id']}: {e}")
                    cur.execute("ROLLBACK TO SAVEPOINT unit_row")
                except Exception as e:
                    logging.error(f"Unexpected error inserting unit {row['unit_id']}: {e}")
                    cur.execute("ROLLBACK TO SAVEPOINT unit_row")
                else:
                    cur.execute("RELEASE SAVEPOINT unit_row")
        conn.commit()
    logging.info("Batch ingestion complete.")
3. Pipeline Routing & Conflict Resolution
A production spatial ingestion pipeline must route data through deterministic stages to prevent silent corruption. The recommended flow is:
- Raw Acquisition → Total station/GNSS exports (CSV, Shapefile, GeoJSON) or photogrammetric orthomosaics.
- Schema Alignment → Map field columns to unit_id, phase_code, supervisor, recorded_date.
- CRS Validation → Execute validate_and_transform_crs(). Reject datasets with unresolvable EPSG codes.
- Topology Pre-Check → Run gdf.geometry.is_valid.all() and gdf.geometry.area.min() > 0.001 in-memory before database commit.
- Database Insertion → Execute insert_units_to_postgis() with ON CONFLICT routing for idempotent updates.
- Audit Logging → Capture transformation deltas, constraint violations, and commit timestamps in a separate ingestion_audit table.
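The schema-alignment and topology pre-check stages can be sketched without any GIS dependency. The example below is a simplified stand-in for the GeoPandas checks named above: it assumes each unit arrives as a dict with a closed ring of (x, y) tuples in projected metres, computes area with the shoelace formula, and does not detect self-intersections. The names REQUIRED_FIELDS, shoelace_area, and precheck_unit are hypothetical.

```python
REQUIRED_FIELDS = ("unit_id", "phase_code", "ring")  # assumed record layout
MIN_AREA_M2 = 0.001  # mirrors the positive_area threshold in the schema


def shoelace_area(ring):
    """Absolute area of a closed ring given as a list of (x, y) tuples."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def precheck_unit(record):
    """Return a list of problems found in one unit record (empty means pass)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, "", []):
            problems.append(f"missing {field}")
    ring = record.get("ring") or []
    if ring:
        if len(ring) < 4 or ring[0] != ring[-1]:
            problems.append("ring is not closed")
        elif shoelace_area(ring) <= MIN_AREA_M2:
            problems.append("area below threshold")
    return problems


square = {
    "unit_id": "U-001",
    "phase_code": "P1",
    "ring": [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)],
}
sliver = {
    "unit_id": "U-002",
    "phase_code": "P1",
    "ring": [(0, 0), (1, 0), (1, 0.0001), (0, 0)],
}

print(precheck_unit(square))
print(precheck_unit(sliver))
```

Returning a list of problems instead of raising on the first one lets the audit stage record every reason a unit was held back.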
When EXCLUDE USING gist triggers an overlap violation, the pipeline should route the offending geometry to a quarantine table for manual review rather than halting the entire batch. This preserves field productivity while maintaining database integrity.
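A minimal sketch of that routing, with the database call replaced by a stand-in so it runs on its own. The route_batch and fake_insert names, the grid-cell conflict rule, and the shape of the quarantine records are all assumptions for illustration; a real pipeline would likely catch psycopg2.errors.ExclusionViolation around the actual INSERT and write the rejected row to the quarantine table.

```python
def route_batch(rows, insert_one):
    """Try each row independently; collect failures instead of aborting."""
    accepted, quarantined = [], []
    for row in rows:
        try:
            insert_one(row)
        except Exception as exc:  # in production, narrow this to database errors
            quarantined.append({"row": row, "reason": str(exc)})
        else:
            accepted.append(row)
    return accepted, quarantined


occupied = set()


def fake_insert(row):
    """Stand-in for the real INSERT: refuse a grid cell that is already taken."""
    if row["cell"] in occupied:
        raise ValueError(f"unit {row['unit_id']} overlaps cell {row['cell']}")
    occupied.add(row["cell"])


batch = [
    {"unit_id": "U1", "cell": (0, 0)},
    {"unit_id": "U2", "cell": (0, 0)},  # conflicts with U1
    {"unit_id": "U3", "cell": (1, 0)},
]
accepted, quarantined = route_batch(batch, fake_insert)

print("accepted:", [r["unit_id"] for r in accepted])
print("quarantined:", [(q["row"]["unit_id"], q["reason"]) for q in quarantined])
```

Keeping the failure reason next to the rejected row gives the reviewer enough context to decide whether the geometry or the existing unit needs correcting.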
4. Integration with Heritage Workflows
Once unit boundaries are committed, downstream processes rely on precise spatial joins to associate contexts, features, and artifacts. Automated synchronization pipelines, such as those detailed in Automating Artifact Attribute Synchronization, depend on the excavation_units.geom column remaining immutable after initial validation. The ON CONFLICT ... DO UPDATE clause in insert_units_to_postgis() overwrites geom in place, so re-ingesting a unit with a revised outline breaks that assumption unless the previous boundary is stored first. Any boundary adjustments should be versioned through a temporal extension (e.g., pg_temporal or application-level audit tables) to preserve stratigraphic provenance.
Spatial queries linking units to adjacent sections, trench profiles, and find distributions are optimized through the GIST index. For complex relational modeling, refer to Spatial Relationship Modeling in Heritage DBs to implement ST_Contains, ST_Touches, and ST_DWithin joins that enforce archaeological context rules without degrading query performance.
5. Operational Compliance & Performance Tuning
- CRS Authority: Always verify the target EPSG against the official EPSG Geodetic Parameter Dataset. UTM zones must match the site’s longitudinal band; using an incorrect zone introduces systematic meter-scale distortion.
- Index Maintenance: Run REINDEX INDEX idx_excavation_units_geom quarterly or after bulk boundary revisions to prevent index bloat. On PostgreSQL 12 and later, REINDEX INDEX CONCURRENTLY rebuilds the index without blocking writes.
- Query Optimization: Use SET enable_seqscan = off; during development to confirm that a query can use the spatial index (the setting penalizes sequential scans rather than forbidding them), and reset it to the default in production. With up-to-date statistics, the PostgreSQL planner generally selects GiST scans for ST_Intersects and ST_Within predicates.
- Documentation: Consult the official PostGIS Reference Manual for function signatures and constraint behaviors specific to your database version.
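The zone check in the CRS Authority item can be sketched as a small helper that derives the WGS 84 / UTM EPSG code from a site's approximate longitude and latitude. This uses the standard 6-degree zone arithmetic and the EPSG 326xx (north) / 327xx (south) numbering; it ignores the special zone boundaries around Norway and Svalbard, and utm_epsg_for is a hypothetical name. Treat the result as a cross-check against the EPSG registry, not a replacement for it.

```python
import math


def utm_epsg_for(lon: float, lat: float) -> int:
    """Return the WGS 84 / UTM EPSG code for a point given in decimal degrees."""
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude must be within [-180, 180]")
    if not -80.0 <= lat <= 84.0:
        raise ValueError("UTM is defined only between 80°S and 84°N")
    # Zones are 6 degrees wide, numbered 1..60 eastward from 180°W.
    zone = min(int(math.floor((lon + 180.0) / 6.0)) + 1, 60)
    base = 32600 if lat >= 0 else 32700  # northern vs southern hemisphere
    return base + zone


site_epsg = utm_epsg_for(15.0, 41.9)  # a site at 15°E, 41.9°N
print(site_epsg)  # 32633, matching the schema's SRID
```

Comparing this value with TARGET_SRID before ingestion is a cheap guard against a site being configured with a neighbouring zone.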
By enforcing strict geometric constraints, version-pinned transformation routines, and deterministic pipeline routing, excavation unit schemas become reliable spatial anchors for multi-season archaeological programs. This architecture scales from single-trench surveys to landscape-scale heritage management systems while maintaining reproducible, audit-ready spatial provenance.