By Mick Hittesdorf, Senior Cloud Architect at OneMarketData
Over the last year, through my interactions with OneTick customers seeking to migrate their data platforms to the Cloud, a recurring theme has emerged: the importance of open, scalable, future-proof data formats.
Every day, the world's financial exchanges publish hundreds of billions of messages. To collect, process, and store this immense volume of market data, market participants require data formats that are fast, compact, and interoperable.
Why Interoperability Matters
Legacy siloed formats and proprietary systems lock data away inside a single vendor’s database, creating technical debt, hindering data access and collaboration, and adding to storage costs when data must be copied from one data store to another.
Where I see firms struggle the most is when migrating or combining data from multiple sources, especially when different teams need or prefer a specific technology stack or tool chain. The solution? Open data formats like Parquet and Iceberg.
Adoption of open data formats avoids vendor lock‑in, reduces friction in cross‑team workflows, and future-proofs your strategy.
Parquet: Columnar Analytics Done Right
Apache Parquet is already a cornerstone of modern data engineering, so I won't spend too much time talking about it. It should be your go-to for analytic data sets, especially for write-once-read-many (WORM) use cases, where data is rarely (if ever) updated once persisted and interoperability and portability across tech stacks and Cloud storage tiers are paramount.
Parquet's columnar storage is optimized for analytics-heavy workloads, which makes it great for read-intensive tasks like VWAP (volume-weighted average price) calculations, time-window aggregations, or cross-instrument backtesting at scale: a query reads only the columns it needs. When it comes to efficiency, Parquet's standard compression and encoding schemes shrink storage and speed up queries.
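To make that concrete, here is a minimal sketch of a one-minute VWAP computed from a Parquet file with Pandas. The file name and columns (timestamp, symbol, price, size) are illustrative, not a OneTick schema:

```python
# Minimal VWAP sketch over a hypothetical trades.parquet file with
# columns: timestamp (datetime), symbol, price, size.
import pandas as pd

# Read only the columns the query needs; Parquet's columnar layout
# means the untouched columns are never read off disk.
trades = pd.read_parquet(
    "trades.parquet",
    columns=["timestamp", "symbol", "price", "size"],
)

# One-minute VWAP per symbol: sum(price * size) / sum(size) per window.
trades["notional"] = trades["price"] * trades["size"]
bars = (
    trades.set_index("timestamp")
          .groupby("symbol")
          .resample("1min")[["notional", "size"]]
          .sum()
)
bars["vwap"] = bars["notional"] / bars["size"]
print(bars["vwap"].head())
```

The same file could be queried, unchanged, from Spark, Trino, or OneTick, which brings us to interoperability.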
These technical strengths of Parquet are compelling, but what’s most important, in my opinion, is that Parquet enables interoperability and data integration in the modern data ecosystem: Spark, Python/Pandas, Trino, and the OneTick tick analytics platform all speak Parquet fluently.
This is why I argue that Parquet is the backbone of open, scalable data strategies, a point I discussed with colleagues from FMADIO, Databento, and The Open Markets Initiative at this year's STAC Summit.
Apache Iceberg: Adding Transactional Behavior to Parquet
That said, for use cases that go beyond WORM, especially where change data capture (CDC) strategies are required, Parquet is insufficient. Parquet files are immutable, so making any kind of change to a Parquet table means rewriting the affected files, which can be a lengthy, expensive, and compute-intensive process.
Enter the new generation of open AND transactional data formats, such as Delta Lake and Apache Iceberg.
Iceberg has been getting a lot of attention lately, especially given recent announcements regarding the unification of Delta Lake and Iceberg via the latest Iceberg v3 spec.
Apache Iceberg is fundamentally a transactional and metadata management layer that sits on top of standard Parquet files. Iceberg addresses the limitations of Parquet by adding safe UPDATE, UPSERT, and DELETE operations, each committed atomically as a new table snapshot, making it suitable for a full range of analytics and operational use cases.
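Here is a minimal sketch of what that looks like through Iceberg's Spark SQL integration. The catalog, namespace, table, and columns are all placeholders I've made up for illustration, and it assumes the Iceberg Spark runtime jar is on the classpath; it is not OneTick's implementation:

```python
# Sketch of Iceberg's transactional operations (UPSERT/DELETE) via Spark SQL.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-sketch")
    # Iceberg's SQL extensions enable MERGE/UPDATE/DELETE statements.
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # A local, file-based catalog for the sketch; production setups
    # would point at a REST, Glue, or Hive catalog instead.
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

spark.sql("CREATE NAMESPACE IF NOT EXISTS demo.mkt")
spark.sql("""CREATE TABLE IF NOT EXISTS demo.mkt.trades
             (trade_id BIGINT, price DOUBLE, status STRING) USING iceberg""")

# A small batch of corrections, e.g. arriving from a CDC feed.
spark.createDataFrame(
    [(101, 45.17, "OK"), (102, 45.20, "OK")],
    ["trade_id", "price", "status"],
).createOrReplaceTempView("corrections")

# UPSERT: update matching rows, insert the rest. No full-table rewrite;
# the change commits atomically as a new table snapshot.
spark.sql("""
    MERGE INTO demo.mkt.trades t
    USING corrections c
    ON t.trade_id = c.trade_id
    WHEN MATCHED THEN UPDATE SET t.price = c.price
    WHEN NOT MATCHED THEN INSERT *
""")

# DELETE: remove cancelled trades in place.
spark.sql("DELETE FROM demo.mkt.trades WHERE status = 'CANCELLED'")
```

Under the hood, the data files remain ordinary Parquet; Iceberg's metadata layer is what makes these statements safe.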
Key Benefits of Open Data Formats
With open data formats, such as Parquet and Iceberg, developers, data engineers, and quants benefit from:
- Portability: Data lakes built on open formats can be easily consumed by the OneTick time series database, as well as open source tools like Pandas, Cloud data warehouses like Databricks, Snowflake, and BigQuery, and AI/ML tools like TensorFlow, PyTorch, and Ray.
- Governance: Tables registered in a central data catalog (e.g. the Iceberg REST catalog) enforce consistent structure and semantics, enable discovery, and help with version tracking, so there is no surprise schema drift (see the sketch after this list).
- Longevity: If you choose Parquet or Iceberg today, your data remains queryable in the future, regardless of vendor or toolchain.
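As a sketch of the governance point, here is how a consumer might discover and inspect a table through a central catalog with PyIceberg. The endpoint, namespace, and table name are placeholders:

```python
# Minimal discovery sketch with PyIceberg against an Iceberg REST catalog.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "prod",
    **{
        "type": "rest",
        "uri": "https://iceberg-catalog.example.com",  # placeholder endpoint
    },
)

# Discovery: list what is published in a namespace, then load a table.
print(catalog.list_tables("market_data"))  # e.g. [("market_data", "trades")]
table = catalog.load_table("market_data.trades")

# The schema comes from the catalog, not from whoever wrote the files,
# so every consumer sees the same structure and semantics.
print(table.schema())
```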
Final Thoughts
The firms that will succeed are those that invest in high-quality data and modern data infrastructure first. You can't build a stable house on a shaky foundation, and you can't create intelligent data solutions on fragmented, siloed data of questionable quality.
Likewise, with an antiquated market data platform, managing your market data and extracting maximum value from it will be increasingly challenging. You need a modern market data platform that embraces open data formats and Cloud-native architecture and addresses the unique challenges of real-time and historical time-series analytics.
At OneMarketData, we have made Parquet (and soon Apache Iceberg) a native data format for OneTick archives. We’re also in the midst of converting all the data in OneTick Cloud to Parquet to make it easier to share data with our customers.
Whether you use tick data for research and analysis, backtesting, or compliance, in your Cloud or ours, open data formats should be a cornerstone of your modern data platform strategy.
Want to learn more?
Is your firm still managing multiple proprietary data silos? Are you tired of importing, exporting, and wrangling data rather than focusing on alpha, compliance, or insight? If so, it's time to ask: what are we locking ourselves into? Open formats offer a clear path forward.
Let’s talk about how to break down silos, take control of your own data, and set yourself up for long‑term success. Schedule a OneTick demo today.
— Mick Hittesdorf
OneTick Cloud Product Architect at OneMarketData | OneTick