In the weeks since my previous post on Working with Arrow and DuckDB in Rust,
I've found a few gripes that I'd like to address.
Memory usage of query_arrow and stream_arrow
In the previous post, I use...
How (and why) to work with Arrow and DuckDB in Rust
My day job involves wrangling a lot of data very fast.
Recently, I've heard a lot of people raving about technologies like DuckDB,
(Geo)Parquet, and Apache Arrow.
But despite being an "ear...
Over the past two weeks, I've been focused on optimizing some data pipelines.
I inherited some old ones which seemed especially slow,
and I finally hit a limit where an overhaul made sense.
The pipeli...
Today I had a rather peculiar need to search through TIGER features
for those matching specific attributes.
These files are not CSV or JSON, but rather ESRI Shapefiles.
Shapefiles are a binary format which...
I've lived in South Korea for quite some time,
and during my stay here I've become reasonably fluent in the language.
People often ask how long it took to become fluent
and if I have any tips for thei...
I just listened to a fantastic Two's Complement podcast episode
(transcript)
in which Matt and Ben discussed a data structure I'd never heard of before:
the sequence lock.
It is not very well known,
b...
Copying and Unarchiving From a Server Without a Temp File
Sometimes I want to copy files from a remote machine--usually a server I control.
Easy; just use scp, right?
Well, today I had a subtly different twist to the usual problem.
I needed to transfer a ~10...
Yesterday I had the pleasure of attending KWDC 24,
an Apple developer conference modeled after WWDC,
but for the Korean market.
Regrettably, I only heard about it a few days prior
through a friend at ...