Onto Something

When we started building Halcyon a couple of years ago, the vision was simple: aggregate energy regulatory data and make it accessible. From the very beginning – and still today – practitioners across sectors have told us that navigating the many sources and repositories is, to put it frankly, pretty much a nightmare. To this day, I've not had a single person say, "Oh, don't do that, it's really not a problem." This gave us a lot of encouragement and strength – the type that comes from knowing you are on a worthy mission. That feeling that you are onto something.

It is fair to say we have learned a lot as we've worked towards this vision. Building and maintaining crawlers that reliably ingest more than 2k documents per day was a big lift. Reliably ingesting not just that volume, but the full scope of 50 states and a handful of interstate and federal entities, was another milestone, and it allowed us to start building recurring data subscriptions. Today, we provide monthly views of the US gas power plant market, battery energy storage, large load tariffs, and utility cost of capital for dozens of customers (with more to come next year). Along the way, we built an alerting service to address a very common use case: "I just need a simple way to track what is going on." We also developed a novel format for viewing a docket, standardized across regulatory bodies – we call it the Docket Profile Page in a nod to the familiar structure of social media.

All the while, our minds kept coming back to another use case – Search. An early customer quote stuck with me: "You will never find what you're looking for if you don't know exactly where to look." That seemed crazy to me. For our very first customer demo, we built a simple prototype that allowed a chat-type interaction with Pacific Northwest utility Avista's 2023 Integrated Resource Plan. The customer told us, "I feel like I am talking to the author."

Our first iteration of the product, released in April of 2024, landed with a thud. Part of the issue was that we had only collected data from 6 states at the time, and it was hard for a user to know what we did and did not have. But the bigger issue was the UI: it was a blank text box with little guidance on what to ask or where to start. So, we decided to walk a mile in our customers’ shoes. For the rest of 2024 and the first half of 2025, we ran queries on their behalf. Our team of data scientists took on that work directly – hearing firsthand from customers where we were wrong, and feeding that information back to the engineers, explaining what they needed from the product. In the next 9 months, through July of 2025, we completed more than 300 research projects for customers. With each one, we got better and better. And with each improvement came more of that familiar feeling of being onto something.

Here is the key lesson we learned from this process: first search for the right materials, and only after finding those materials, interrogate them. Internally, our teams would start with keyword searches and filters to find the materials they wanted to interrogate, and only then would they run a query (semantic search) that used vector search to retrieve relevant passages and an LLM on the backend to answer from them. This two-step process was a bit more complicated, but it worked extraordinarily well. Our confidence grew with each turn of the crank. Onto something more.
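For readers who like to see the shape of an idea, here is a minimal sketch of that two-step flow in Python. It is an illustration only, not our production code: the function names and sample documents are made up, a simple bag-of-words cosine similarity stands in for real vector embeddings, and the LLM step is left out entirely.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?'\"()"

def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def keyword_filter(docs, keyword):
    """Step 1: keep only documents that contain the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [d for d in docs if needle in d["text"].lower()]

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_by_similarity(docs, question):
    """Step 2: order the filtered documents by similarity to the question.

    Word counts are a toy stand-in for embeddings (an assumption of this sketch).
    """
    q = Counter(tokenize(question))
    scored = [(cosine(Counter(tokenize(d["text"])), q), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored]

# Hypothetical sample documents, invented for illustration.
docs = [
    {"id": "a", "text": "Integrated resource plan forecasts battery storage additions through 2030."},
    {"id": "b", "text": "Rate case testimony on cost of capital and return on equity."},
    {"id": "c", "text": "Integrated resource plan discusses new gas plant retirements."},
]

candidates = keyword_filter(docs, "integrated resource plan")
ranked = rank_by_similarity(candidates, "How much battery storage is planned?")
```

The point of the ordering is that the question is only ever asked of material that already passed the keyword filter, so an irrelevant document (here, the rate case testimony) never competes for attention in the second step.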

Turns out making this work in the product (as opposed to internal tooling) is quite challenging. We now have more than 5M documents and counting. We've broken this material into more than a billion searchable text snippets, making it possible to surface exactly the documents you’re looking for almost instantly. To do that, we needed a system that could answer the question: find me all the documents with the keyword 'Meta', 'data center', or both, in a fraction of a second. We built our search infrastructure entirely around Vespa, which is a modern miracle of HPC (high-performance computing), and cranked up our Kubernetes instance in Google Cloud. With some time, effort, and iteration, it worked beautifully. We started to feel like we really were onto something!

Six weeks ago, we began testing the new system, which internally we call QFS (queries from search). The name connotes the idea that it's a two-step process: first, search and find the source material you care about; second, query that material to surface exactly what you're looking for. With lots of feedback from our beta testers, we have been iterating – some wanted to filter on "any of" and others wanted "all of" the keywords. We settled on both.
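To make the "any of" versus "all of" distinction concrete, here is a small hedged sketch in Python. The function names, the `mode` parameter, and the sample text are hypothetical and chosen for illustration; this is not how the product implements filtering, and plain substring matching is a simplification (it would, for example, also match "Meta" inside "metadata").

```python
def matches(text, keywords, mode="any"):
    """Return True if the text contains any (or all) of the keywords.

    Matching is a case-insensitive substring test, a deliberate simplification.
    """
    haystack = text.lower()
    hits = [kw.lower() in haystack for kw in keywords]
    if mode == "any":
        return any(hits)
    if mode == "all":
        return all(hits)
    raise ValueError(f"mode must be 'any' or 'all', got {mode!r}")

def filter_docs(docs, keywords, mode="any"):
    """Keep only the documents that satisfy the keyword filter."""
    return [d for d in docs if matches(d, keywords, mode)]

# Hypothetical sample snippets, invented for illustration.
docs = [
    "Meta requests service for a new data center.",
    "Data center load forecast for the utility.",
    "Gas plant retirement schedule.",
]

any_hits = filter_docs(docs, ["Meta", "data center"], mode="any")
all_hits = filter_docs(docs, ["Meta", "data center"], mode="all")
```

With these samples, "any of" keeps the first two snippets while "all of" keeps only the first, which is why some testers wanted one behavior and some the other.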

After two years of work, we are proud of the capability we’ve built. You can see it at work in our data subscriptions, which are essentially an expression of this capability in a structured format. But you can also experience it for yourself. We’re giving meaningful and, more importantly, useful answers to hundreds of queries a week. It’s open, it’s easy, and it’s already becoming part of workflows. If you’ve not yet given it a try, we invite you to! https://app.halcyon.io/search

As I wrap up Halcyon’s last post of the year, I’ll say it one last time: we are onto something. As we head into 2026, we will be on to more as well. Thank you to all those already on the journey with us, and an advance thanks to those who will join. Wishing all a happy and restful end of 2025.

Bruce