Apr 23, 2025

Personal Data Pipeline Update

👋 I'm currently looking for a new role. Hire me!

I started my journey with the Personal Data Pipeline project early last year, and you can see in that post my initial motivations and how far I got with the concept. What I wanted out of getting this live was a sense of whether this idea would actually work and whether it resonated with "the world" at large. I didn't get a 6-figure Patreon sponsorship or anything, but I connected with a few like-minded folks, learned about some interesting pieces of technology out there, and got the attention of a company that offered me a job. All in all, I would call that a success!

Now, almost a year after that post went live, I'm thinking about this project again and what comes next. I'm still a big believer in being mindful about generating data for other companies, I still want a local copy of as much of that data as I can get, and I still want to figure out how to get the most out of that data combined with all the notes that I take. The values I outlined in that post are the same and feel just as important as before:

It feels good that the reasons I originally started working on this are the same as before, but those good intentions can't, on their own, materialize something that works or exists. So I've been thinking more about what I would actually want out of a system like this. In no particular order:

It sort of expands to infinity from there, but it all centers around gathering, organizing, storing, and managing all of my data right here on the laptop where I'm writing this post in a local Markdown file. The common first step for all of these things is "get the data and store it in a useful format."

So, to that end, I'm going to focus on working with DuckDB under the hood to pull in the raw JSON and store it in a tabular format that's easy to work with and reason about. All that logic is currently represented in a recipe, like this one. But that method overloads the output step quite a bit and would require duplication across recipes when source data needs cleanup or when tables are joined. It also leaves out an opportunity to work with your data using raw queries once it's all stored in a table format.
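As a rough sketch of what that middle step could look like (raw JSON in, a plain table out, then raw queries on top), here is a minimal example. It is an illustration under assumptions, not the project's actual code: it swaps in Python's standard-library `sqlite3` for DuckDB so it runs without extra installs, and the `events` table, its columns, and the sample records are all hypothetical. With DuckDB, its built-in JSON reader would likely replace the manual parsing here.

```python
import json
import sqlite3

# Hypothetical raw export. The record shape and field names are invented
# for illustration and do not come from any real service.
RAW_JSON = """
[
  {"id": 1, "service": "example-music", "occurred_at": "2025-04-01T08:15:00Z", "title": "Track A"},
  {"id": 2, "service": "example-music", "occurred_at": "2025-04-01T08:19:00Z", "title": "Track B"},
  {"id": 3, "service": "example-books", "occurred_at": "2025-04-02T21:00:00Z", "title": "Book C"}
]
"""


def load_events(conn, raw_json):
    """Parse raw JSON records and store them as rows in an `events` table."""
    records = json.loads(raw_json)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id INTEGER PRIMARY KEY, service TEXT, occurred_at TEXT, title TEXT)"
    )
    # INSERT OR REPLACE keeps re-runs from duplicating rows with the same id.
    conn.executemany(
        "INSERT OR REPLACE INTO events (id, service, occurred_at, title) "
        "VALUES (:id, :service, :occurred_at, :title)",
        records,
    )
    conn.commit()
    return len(records)


def count_by_service(conn):
    """A raw query against the stored table, independent of any output step."""
    rows = conn.execute(
        "SELECT service, COUNT(*) FROM events GROUP BY service ORDER BY service"
    ).fetchall()
    return dict(rows)


# sqlite3 is a stand-in for DuckDB here (an assumption made for portability).
conn = sqlite3.connect(":memory:")
loaded = load_events(conn, RAW_JSON)
counts = count_by_service(conn)
print(loaded, counts)
```

The point of the split is that cleanup and joins happen once against the stored table, so an output step only needs a query rather than its own copy of that logic.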

With this middle step, we have something more like this:

[Diagram, made with D2]

A few notes on the above:

This feels like a good milestone for this project and a place where I can start thinking about what I really want to do with the data that I have.

Beyond that ... I definitely don't have all the answers, and if I'm going to solve even a small part of this, I have to keep talking to people who care about this in their own lives and operate with similar values. To that end, I'm going to start working on this project in a more formalized way and writing more about what I'm thinking, what I've found, and where I'm going with this.

As always, please reach out if you have any questions, comments, or ideas!
