Geospatial Data Processing with Python: A Network Data Case Study

Posted on September 1, 2024 • 3 min read • 621 words
Tags: Python, gis, Network Data, Utility Data
 

Learn how to handle missing values and mismatched data in network datasets.


This blog post details a Python exercise that mimics a real-world scenario: transforming messy network data into a format suitable for a business system.

The Challenge  

The provided data contains:

  • Spans (line segments between poles) lacking proper alignment with poles.
  • Extraneous poles cluttering the dataset, some with missing values.

Our objective is to generate two GeoJSON files with the following specifications:

spans.geojson:

  • circuitID: Identifier for the circuit the span belongs to.
  • spanID: Unique identifier for the span.
  • phasingType: Phasing type of the span (single-phase, two-phase, three-phase).
  • pointA: Pole ID of the first pole in the span.
  • pointB: Pole ID of the second pole in the span.

poles.geojson:

  • poleID: Unique identifier for the pole.
  • heightInFt: Height of the pole in feet.

The Approach  

We’ll use Python geospatial libraries to accomplish the data wrangling. Here’s a breakdown of the key steps:

  • Matching Spans to Poles: For each span endpoint, we find the nearest pole and snap the endpoint to it. This establishes the pointA and pointB values for each span.
  • Property Mapping: We map properties from the input files to the properties required by Overstory, so that every required field is populated.
  • Missing Value Imputation: When a value is missing (e.g., pole height), we fill it in with the value from the geographically nearest pole that has one. In this dataset only pole height is affected.
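
The property mapping step is not shown in the script further down, so here is a minimal sketch of how it could look for the poles layer. The input column names poleID and height are the ones the script uses; the owner column, the POLE_COLUMNS dictionary and the map_properties helper are hypothetical and only serve the illustration.

```python
import geopandas as gpd
from shapely.geometry import Point

# Assumed mapping from input column names to the required output names.
POLE_COLUMNS = {"poleID": "poleID", "height": "heightInFt"}

def map_properties(gdf, mapping):
    """Keep only the mapped columns plus geometry, renamed to the output names."""
    missing = [col for col in mapping if col not in gdf.columns]
    if missing:
        raise KeyError(f"Input is missing expected columns: {missing}")
    keep = list(mapping) + [gdf.geometry.name]
    return gdf[keep].rename(columns=mapping)

# Toy poles layer with one extra column that the output does not need.
sample_poles = gpd.GeoDataFrame(
    {"poleID": ["P1", "P2"], "height": [30.0, 35.0], "owner": ["a", "b"]},
    geometry=[Point(0, 0), Point(10, 0)],
)
mapped_poles = map_properties(sample_poles, POLE_COLUMNS)
print(list(mapped_poles.columns))
```

The same helper could be reused for the spans layer with a second dictionary, once the input column names for circuitID, spanID and phasingType are known.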

Libraries and Tools  

We’ll use open-source libraries to streamline the process:

  • GeoPandas: For geospatial data manipulation.
  • Shapely: For geometric object construction.

Deliverables  

The final product is a Python script that creates two GeoJSON files with aligned lines and the required data columns. The deliverables are:

  • Necessary scripts or infrastructure to execute the code (e.g., Makefile).
  • The two generated GeoJSON files: spans.geojson and poles.geojson.
  • Clear instructions or scripts for easy execution (ideally within five minutes).

Working with vector data

Import Libraries

import geopandas as gpd
from shapely.geometry import LineString, Point
import os
import pandas as pd

Find the current working directory, create the output directory and read the input layers

#get current working directory
cwd = os.getcwd()

#create output directory if it does not exist
os.makedirs(f"{cwd}/output/", exist_ok=True)

#read poles and lines layers
poles = gpd.read_file(f'{cwd}/data/input_data.gpkg', layer='poles')
lines = gpd.read_file(f'{cwd}/data/input_data.gpkg', layer='lines')

Iterate through each line and snap its endpoints to the nearest poles, adding Point A and Point B columns to the lines layer

###Stage 1 snap spans to nearest poles and add Point A Point B columns
print('Snapping lines')
for index, line in lines.iterrows():
    
    line_coords = list(line.geometry.coords)
    
    #find nearest pole for endpoint A
    end_point_1 = Point(line.geometry.coords[0]) #line endpoint A
    nearest_pole_id_1 = poles.distance(end_point_1).idxmin() #nearest pole id to point A
    nearest_pole_1 = poles.loc[nearest_pole_id_1] #nearest pole
    lines.at[index, 'Point A'] = nearest_pole_1['poleID']
    line_coords[0] = nearest_pole_1.geometry.coords[0]
    
    #find nearest pole for endpoint B
    end_point_2 = Point(line.geometry.coords[-1])
    nearest_pole_id_2 = poles.distance(end_point_2).idxmin() #nearest pole id to point B
    nearest_pole_2 = poles.loc[nearest_pole_id_2] #nearest pole
    lines.at[index, 'Point B'] = nearest_pole_2['poleID']
    line_coords[-1] = nearest_pole_2.geometry.coords[0]
    
    #update line geometry with snapped coordinates
    lines.at[index, "geometry"] = LineString(line_coords)
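
The loop above computes the distance to every pole for every endpoint. As an alternative, the lookup of the nearest pole IDs can be sketched with geopandas' sjoin_nearest, which uses a spatial index. This is only a sketch on made-up toy data: the nearest_pole_ids helper is hypothetical, the columns are named pointA and pointB after the target specification, and the geometry snapping itself is left out.

```python
import geopandas as gpd
from shapely.geometry import LineString, Point

# Toy data standing in for the real layers (coordinates are made up).
poles = gpd.GeoDataFrame(
    {"poleID": ["P1", "P2", "P3"]},
    geometry=[Point(0, 0), Point(10, 0), Point(20, 0)],
)
lines = gpd.GeoDataFrame(
    {"spanID": ["S1", "S2"]},
    geometry=[
        LineString([(0.4, 0.3), (9.8, -0.2)]),
        LineString([(10.1, 0.2), (19.7, 0.1)]),
    ],
)

def nearest_pole_ids(lines, poles, position):
    """Return the poleID nearest to each line's vertex at `position` (0 or -1)."""
    endpoints = gpd.GeoDataFrame(
        {"geometry": [Point(geom.coords[position]) for geom in lines.geometry]},
        index=lines.index,
        crs=lines.crs,
    )
    joined = gpd.sjoin_nearest(endpoints, poles[["poleID", "geometry"]], how="left")
    # Equidistant poles yield several rows per endpoint; keep the first match.
    joined = joined[~joined.index.duplicated(keep="first")]
    return joined["poleID"].reindex(lines.index)

lines["pointA"] = nearest_pole_ids(lines, poles, 0)
lines["pointB"] = nearest_pole_ids(lines, poles, -1)
print(lines[["spanID", "pointA", "pointB"]])
```

sjoin_nearest also accepts a max_distance argument, which could be used to leave an endpoint unmatched when no pole is reasonably close.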

Identify missing values in the poles layer and populate them

###Stage 2 populate missing values from the nearest pole (only applicable for poles height)
#lines[lines.isna().any(axis=1)]

poles_nan = poles[poles['height'].isna()] #find nan rows

for index, pole in poles_nan.iterrows():
    
    distances = poles.distance(pole.geometry) #calculate distances to all other poles
    sorted_distances = distances[distances > 0].sort_values() #sort distances and filter out the pole itself
    
    #check the nearest non-NaN pole
    for nearest_pole_id in sorted_distances.index:
        nearest_pole = poles.loc[nearest_pole_id]
        if pd.notna(nearest_pole['height']):
            poles.at[index, "height"] = nearest_pole['height']#assign nearest pole height
            break  #stop once we've assigned the nearest valid height
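
A slightly different way to express the same idea is to search only among poles that already have a height, which removes the need to sort distances and skip NaN neighbours. The sketch below assumes the same height column; the fill_height_from_nearest helper and the toy coordinates are hypothetical.

```python
import geopandas as gpd
import numpy as np
from shapely.geometry import Point

def fill_height_from_nearest(poles, column="height"):
    """Return a copy where missing values are taken from the nearest pole that has one."""
    filled = poles.copy()
    known = filled[filled[column].notna()]
    if known.empty:
        return filled  # nothing to copy from
    for idx in filled.index[filled[column].isna()]:
        # Distances from this pole to every pole with a known value.
        nearest_idx = known.distance(filled.geometry.loc[idx]).idxmin()
        filled.loc[idx, column] = known.loc[nearest_idx, column]
    return filled

# Toy poles: two of them have no recorded height.
toy_poles = gpd.GeoDataFrame(
    {"poleID": ["P1", "P2", "P3", "P4"], "height": [30.0, np.nan, 40.0, np.nan]},
    geometry=[Point(0, 0), Point(10, 0), Point(12, 0), Point(100, 0)],
)
filled_poles = fill_height_from_nearest(toy_poles)
print(filled_poles[["poleID", "height"]])
```

Because candidates are limited to originally known heights, a value that was itself imputed is never propagated to another pole, which differs slightly from the in-place loop above.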

Save the output GeoJSON files

#save output files
print("Writing output")

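try:
    #name the driver explicitly so the output format does not depend on extension detection
    lines.to_file(f"{cwd}/output/spans.geojson", driver="GeoJSON")
    poles.to_file(f"{cwd}/output/poles.geojson", driver="GeoJSON")
except Exception as e:
    print(f"Error saving files: {e}")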
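
Once the files are written, it can be useful to confirm that every feature carries the properties listed in the specification. The following is a minimal sketch using only the standard library; the REQUIRED dictionary mirrors the specification above, while the find_missing_properties helper and the inline sample are hypothetical.

```python
import json

# Required properties per output file, taken from the specification.
REQUIRED = {
    "spans": ["circuitID", "spanID", "phasingType", "pointA", "pointB"],
    "poles": ["poleID", "heightInFt"],
}

def find_missing_properties(feature_collection, required):
    """Return (feature position, property name) pairs that are absent or null."""
    problems = []
    for position, feature in enumerate(feature_collection.get("features", [])):
        properties = feature.get("properties") or {}
        for name in required:
            if properties.get(name) is None:
                problems.append((position, name))
    return problems

# Inline sample standing in for the contents of poles.geojson.
sample = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"poleID": "P1", "heightInFt": 30.0},
     "geometry": {"type": "Point", "coordinates": [0, 0]}},
    {"type": "Feature",
     "properties": {"poleID": "P2", "heightInFt": null},
     "geometry": {"type": "Point", "coordinates": [10, 0]}}
  ]
}
""")
problems = find_missing_properties(sample, REQUIRED["poles"])
print(problems)
```

For the real outputs, the sample would be replaced by the result of json.load on output/poles.geojson or output/spans.geojson.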

Conclusion  

With this exercise, learners gain experience in data cleaning, geospatial analysis and data preparation for specific systems. These skills carry over to many data-driven domains, since the exercise covers the full path from a raw format to a business-compatible data format.
