Data Engineer

Remote
Full Time
Mid Level

This is a demanding role on a small, high-leverage team. You'll be one of a handful of people responsible for the data behind IPinfo's location and context products, working on ambiguous problems with messy inputs and owning your pipelines end to end - including understanding every line you ship.

What you'll do

  • Make sense of large, unfamiliar datasets: publicly contributed (and therefore inconsistent) sources like OpenStreetMap and Overture, as well as error-prone device datasets with sometimes dozens of poorly documented columns. Your job is to wade through these datasets, figure out what is going on, and extract a meaningful signal.
  • Maintain and extend BigQuery data pipelines, writing efficient, transparent code that achieves complex data tasks while avoiding bloat and spaghetti.
  • Bring particular expertise in geospatial data: know the suite of BigQuery geospatial tools like the back of your hand, and deal with the headaches and challenges that geospatial data poses. You'll occasionally work in Python as well.
  • Use AI tooling to move quickly while fully owning every line in your PRs.
  • Communicate problems and solutions clearly using our internal issue-tracking platform, writing concise, reproducible records of the problem, the proposed solutions, and why you made the calls you did, so others can follow and build on them.
  • Work occasionally on web-based dashboards that give data engineers and others at the company visibility into our data pipelines.

What we're looking for

Must have

  • Advanced SQL - window functions, CTEs, query restructuring for performance, and an understanding of why a query is slow and how to fix it. BigQuery is a strong plus.
  • Strong communication skills - you know how to talk and write about complex problems and data pipelines productively.
  • A track record of turning messy, ambiguous data into reliable, interpretable signals, with the judgment to explain your calls.
  • A public record of significant experience as a data scientist or engineer (on GitHub, Stack Overflow, in the academic literature, or on a personal blog), or strong references to back up a track record on proprietary codebases.
  • Clean-code discipline: you don't ship code without tests, code review, and readable abstractions. You prefer subtractive solutions to additive ones.
  • Fast learning - comfort becoming productive in unfamiliar domains (internet measurement, geospatial reasoning, internal tooling) with little hand-holding.
  • AI-assisted development paired with full ownership - you can read, debug, and defend everything the tools produce.
  • Geospatial fundamentals: coordinate systems, spatial joins, containment, polygon operations.

Nice to have

  • Cloud tooling and workflow orchestration (CI/CD, Docker, Airflow, etc.).
  • JavaScript and web dashboards (e.g. Retool, Mapbox, internal validation and visualization tooling).
  • Exposure to the science of internet measurement: BGP/ASN, rDNS, RTT-based geolocation, CGNAT, mobile vs. fixed-line IP behavior, geofeeds.
  • Strong Python for geospatial data work - comfortable with the data and geospatial stack (pandas, geopandas, shapely) and writing code that holds up in production, not just in a notebook.