By Mick Hittesdorf, OneTick Senior Cloud Architect
Since my last post on open-source data formats, I have continued to hear stories from OneTick customers and prospects alike about the increasing demand for open, scalable, future‑proof data formats.
As leading financial institutions seek to modernize their data platforms, it has become clear: to collect, process, and store immense volumes of tick data, market participants require data formats that are fast, compact, and interoperable.
With open data formats, such as Parquet and Iceberg, developers, data engineers, and quants benefit from lower storage costs, freedom from vendor lock-in, and the ability to query the same data from whatever tools they prefer.
Legacy siloed formats and proprietary systems lock data away inside a single vendor’s database, creating technical debt, hindering data access and collaboration, and adding to storage costs when data must be copied from one data store to another.
Where I see firms struggle most is in migrating or combining data from multiple sources, especially when different teams need or prefer a specific technology stack or toolchain. Bespoke ETL pipelines are expensive to develop, operate, and maintain.
The solution? Store your data once in open formats like Parquet and Iceberg, then query it with the best tool for each job. A quant can use OneTick for time-series analysis, while an operations analyst can use Databricks or Snowflake to query the same physical data for daily reporting or business intelligence dashboards.
Adoption of open data formats avoids vendor lock‑in, reduces friction in cross‑team workflows, and future-proofs your strategy.
Apache Parquet is already a cornerstone of modern data engineering, so I won’t spend too much time talking about it. It should be your go-to format for analytic data sets, especially for write-once-read-many (WORM) use cases, where data is rarely (if ever) updated once persisted and where interoperability and portability across tech stacks and Cloud storage tiers are paramount.
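As a quick illustration of the write-once pattern, here is a minimal sketch using pyarrow. The tick schema, file name, and compression codec are illustrative assumptions, not OneTick specifics:

```python
# A minimal sketch of a write-once Parquet data set. The schema, file
# name, and compression codec below are illustrative assumptions.
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

ticks = pd.DataFrame({
    "timestamp": pd.to_datetime(["2025-06-02 09:30:00", "2025-06-02 09:30:01"]),
    "symbol": ["AAPL", "AAPL"],
    "price": [195.31, 195.33],
    "size": [100, 250],
})

# Write once with compression; the resulting file is readable from
# Spark, Trino, OneTick, or any other Parquet-aware engine.
pq.write_table(pa.Table.from_pandas(ticks), "ticks.parquet", compression="zstd")
```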
Parquet’s columnar storage is optimized for analytics-heavy workloads, which is great for read-intensive tasks like VWAP calculations, time-window aggregations, or cross-instrument backtesting at scale. When it comes to efficiency, Parquet’s standard compression and encoding schemes shrink storage footprints and speed up queries.
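Reading that data back is where the columnar layout pays off. The sketch below, again with illustrative file and column names, projects just three columns from the file written above and computes a one-minute VWAP with pandas:

```python
# A sketch of a read-intensive Parquet workload: a per-minute VWAP.
# File and column names (timestamp, price, size) are assumptions.
import pandas as pd

# Column projection: thanks to the columnar layout, only these three
# columns are read from disk, not the full tick record.
trades = pd.read_parquet("ticks.parquet", columns=["timestamp", "price", "size"])

# Time-window aggregation: one-minute VWAP = sum(price * size) / sum(size)
trades["notional"] = trades["price"] * trades["size"]
by_minute = trades.resample("1min", on="timestamp")
vwap = by_minute["notional"].sum() / by_minute["size"].sum()
print(vwap.head())
```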
These technical strengths of Parquet are compelling, but what’s most important, in my opinion, is that Parquet enables interoperability and data integration in the modern data ecosystem: Spark, Python/Pandas, Trino, and the OneTick tick analytics platform all speak Parquet fluently.
And this is why I argue that Parquet is the backbone for open, scalable data strategies, as discussed with colleagues from FMADIO, Databento, and The Open Markets Initiative at the 2025 STAC Summit in Chicago.
That said, for use cases that go beyond WORM, especially where change data capture (CDC) strategies are required, Parquet alone is insufficient. Any change to a Parquet table requires rewriting the underlying files, which can be a lengthy, expensive, and compute-intensive process.
Enter the new generation of open AND transactional data formats, such as Delta Lake and Apache Iceberg.
Apache Iceberg is fundamentally a transactional and metadata-management layer that sits on top of standard Parquet files. It addresses Parquet’s limitations by adding safe UPDATE, UPSERT, and DELETE operations, making it suitable for a full range of analytical and operational use cases.
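To make that concrete, here is a minimal sketch of those row-level operations using PySpark and the Iceberg Spark runtime. The catalog configuration, table name (local.ticks.trades), and correction data are all hypothetical, the trades table is assumed to already exist, and the runtime jar should match your Spark version:

```python
# A sketch of Iceberg's transactional operations over Parquet via Spark SQL.
# Catalog, warehouse path, and table names are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "warehouse")
    .getOrCreate()
)

# A batch of trade corrections, e.g. from a CDC feed (illustrative data).
corrections = spark.createDataFrame(
    [(1001, 195.32), (1002, 195.34)], ["trade_id", "price"]
)
corrections.createOrReplaceTempView("corrections")

# UPSERT: apply the batch atomically; readers never see a half-applied commit.
spark.sql("""
    MERGE INTO local.ticks.trades t
    USING corrections c
    ON t.trade_id = c.trade_id
    WHEN MATCHED THEN UPDATE SET t.price = c.price
    WHEN NOT MATCHED THEN INSERT (trade_id, price) VALUES (c.trade_id, c.price)
""")

# DELETE: purge cancelled trades without hand-rewriting the whole table.
spark.sql("DELETE FROM local.ticks.trades WHERE status = 'CANCELLED'")
```

Under the hood, Iceberg records each operation as a new table snapshot over immutable Parquet files, which is what makes these commits atomic and keeps concurrent readers consistent.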
The firms that will succeed are those that invest in high-quality data and modern data infrastructure first. You can’t build a stable house on a shaky foundation, and you can’t create intelligent data solutions on fragmented, siloed data of questionable quality.
Likewise, with an antiquated market data platform, managing your market data and extracting maximum value from it will be increasingly challenging. You need a modern market data platform that embraces open data formats and Cloud-native architecture, and that addresses the unique challenges of real-time and historical time-series analytics.
At OneTick, we have made Parquet (and soon Apache Iceberg) a native data format for OneTick archives. We’re also in the midst of converting all the data in OneTick Cloud to Parquet to make it easier to share data with our customers.
Whether you use tick data for research and analysis, backtesting, or compliance, in your Cloud or ours, open data formats should be a cornerstone of your modern data platform strategy.
Is your firm still managing multiple proprietary data silos? Are you tired of importing, exporting, and wrangling data rather than focusing on alpha, compliance, or insight? If so, it’s time to ask: what are we locking ourselves into? Open formats offer a clear path forward.
Let’s talk about how to break down silos, take control of your own data, and set yourself up for long‑term success. Schedule a OneTick demo today.
— Mick Hittesdorf
OneTick Senior Cloud Architect