By Armen Avetisyan, Senior Software Engineer at OneTick
onetick-py is a high-performance Python library for tick data processing, built on OneTick’s powerful analytics engine. It features a Pandas-like API, enabling operations on tick data using native Python expressions, built-ins, and vectorized computations. Under the hood, it translates Python operations into OneTick’s query language, executing on a specialized tick server optimized for low-latency, high-throughput analytics. With access to data from 200+ exchanges via our on-demand market data service, it supports advanced time-series analysis, aggregation, filtering, and event-driven computations at scale.
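To give a flavor of the API, here is a minimal illustrative sketch of a tick query; the database name, tick type, symbol, fields, and date range are placeholders, and exact parameters may differ per installation:

```python
# Minimal illustrative onetick-py query (database, symbol, and fields are placeholders).
import onetick.py as otp

# Read trade ticks for one symbol from a hypothetical tick database.
trades = otp.DataSource("NYSE_TAQ", tick_type="TRD", symbols="AAPL")

# Column arithmetic and filtering read much like Pandas expressions.
trades["NOTIONAL"] = trades["PRICE"] * trades["SIZE"]
trades = trades[trades["SIZE"] > 100]  # keep larger trades only

# Execute the query on the tick server for a one-day window.
df = otp.run(trades, start=otp.dt(2024, 1, 2), end=otp.dt(2024, 1, 3))
print(df.head())
```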
onetick-py is used internally at OneMarketData and is also distributed to select clients alongside our products. Because the pool of developers writing onetick-py code is small, they rely primarily on documentation and direct communication via Slack or other channels. This creates a bottleneck when errors occur: debugging often means extra time spent navigating documentation or waiting for a response.
While its interface resembles Pandas, effectively writing onetick-py queries requires a deep understanding of tick data and the overall OneTick architecture to ensure well-structured and optimized code. The complexities of analyzing time-series data across multiple symbols further add to the learning curve, making query writing challenging for developers new to onetick-py.
So why do we need coding assistance?
Unlike general-purpose programming languages, domain-specific languages (DSLs) like onetick-py have specialized syntax, semantics, and execution models tailored to specific use cases. However, this specialization also introduces challenges when leveraging large language models (LLMs) for code generation.
Major LLM vendors, such as OpenAI, do not include onetick-py in their training datasets. As a result, out-of-the-box models lack an understanding of its unique constructs and execution behavior. Consequently, attempts to generate onetick-py code using generic LLMs often produce incorrect or inefficient queries that fail to fully utilize OneTick’s capabilities.
To address this limitation, DSLs like onetick-py must develop their own knowledge bases. This involves integrating domain-specific documentation, examples, and best practices to ensure that generated code is accurate, optimized, and aligned with OneTick’s query language.
How the coding assistant works
To streamline development and support both internal teams and clients working with onetick-py, an AI-powered Coding Assistant was introduced. The Coding Assistant provides a web-based interface where users can ask coding questions and receive onetick-py specific code snippets. By leveraging generative AI, it helps developers write, debug, and optimize queries more efficiently, reducing reliance on manual documentation and peer assistance.
The Coding Assistant is particularly valuable for onboarding new employees responsible for writing onetick-py code, accelerating their learning process. Additionally, it significantly enhances the experience for clients using onetick-py, making it easier for them to develop well-structured queries without deep prior expertise.
The Coding Assistant is built on LangChain/LangGraph, a framework that enables seamless integration with various language models. This architecture provides flexibility in selecting and fine-tuning models, allowing the assistant to adapt to different use cases. LangGraph excels at constructing workflows that can revisit and reuse specific components, enhancing efficiency and adaptability.
Langfuse is used to trace queries and their results, providing transparency and insight into the assistant’s operations.
Leveraging generative AI models from OpenAI, the assistant translates natural language prompts into functional onetick-py scripts. This empowers developers to generate code efficiently, reducing manual effort and accelerating the development process.
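As a rough sketch of this part of the pipeline, the generation step chains a system prompt, retrieved context, and the user’s question into an OpenAI chat model, with Langfuse attached as a LangChain callback so every run is traced. The model name, prompt wording, and placeholder context below are illustrative assumptions, not the production configuration:

```python
# Sketch of the generation step: prompt -> OpenAI chat model, traced by Langfuse.
# Model name, prompt wording, and placeholder context are illustrative assumptions.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langfuse.callback import CallbackHandler  # Langfuse's LangChain callback (import path may vary by SDK version)

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are an assistant that writes onetick-py code. "
     "Use only the API shown in the provided documentation.\n\n{context}"),
    ("human", "{question}"),
])

llm = ChatOpenAI(model="gpt-4o", temperature=0)  # hypothetical model choice
chain = prompt | llm

answer = chain.invoke(
    {
        "context": "<retrieved documentation snippets>",
        "question": "Compute one-minute traded volume for AAPL",
    },
    config={"callbacks": [CallbackHandler()]},  # each invocation appears as a trace in Langfuse
)
print(answer.content)
```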
System Architecture
The overall system architecture is represented in the following diagram:
Index builder
The Index Builder retrieves and updates documentation from multiple sources, including:
- onetick-py documentation webpage
- Slack
- Jira
- Confluence
It uses an embedding model to generate embeddings of these documents and stores them in a pgvector vector database. The component re-runs on a weekly schedule, so the index stays in sync with the latest documentation.
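A minimal sketch of what such an indexing step can look like with LangChain’s pgvector integration is shown below; the loader URL, connection string, collection name, and embedding model are placeholders, and the production Index Builder also covers the Slack, Jira, and Confluence sources:

```python
# Sketch of an index-building step: load docs, chunk them, embed them, store vectors in pgvector.
# The URL, connection string, collection name, and embedding model are placeholders.
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import PGVector

docs = WebBaseLoader("https://example.com/onetick-py/docs").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

PGVector.from_documents(
    documents=chunks,
    embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
    collection_name="onetick_py_docs",
    connection_string="postgresql+psycopg2://user:pass@localhost:5432/vectors",
)
```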
Workflow of the coding assistant
The Coding Assistant’s workflow consists of several key steps:
1. User Interaction via Web Interface
   - The Streamlit web application serves as the primary user interface.
   - Users enter their queries through the chat input.
2. Retrieval and Context Generation
   - The retriever embeds the user’s question and, using these embeddings, identifies the most relevant documentation.
   - The retrieved documents are combined with system prompts that provide a detailed explanation of onetick-py.
   - Few-shot learning is applied, supplying example onetick-py code snippets.
3. Code Generation
   - All relevant information is passed to the LLM for evaluation.
   - The model generates onetick-py code as its output.
4. Code Evaluation and Execution
   - The generated code is checked with pylint to detect syntax errors.
   - The code is then executed in an isolated environment to identify runtime errors.
5. Error Handling and Reflection Mechanism
   - If no errors are found, the generated code is returned to the user via the web UI.
   - If errors are detected, the assistant engages its reflection mechanism: the generated code, the error messages, and the original prompt are sent back to the code generation step (step 3).
   - This loop repeats up to two more times if errors persist; a simplified sketch of it follows below.
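The following is a minimal sketch of how such a generate → check → reflect loop can be wired up in LangGraph. The node implementations, the stubbed LLM call, the retry limit handling, and the simple pylint/subprocess checks are illustrative assumptions rather than the production graph:

```python
# Sketch of a generate/evaluate/reflect loop in LangGraph (illustrative, not the production graph).
import subprocess
import tempfile
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class AssistantState(TypedDict):
    question: str
    code: str
    error: str
    attempts: int


def call_llm(question: str, previous_code: str, error: str) -> str:
    """Placeholder for the real LLM call (prompt + retrieved docs + few-shot examples)."""
    return "import onetick.py as otp  # ... generated code ..."


def generate(state: AssistantState) -> dict:
    # On retries, the previous code and its error message are fed back to the model.
    code = call_llm(state["question"], state.get("code", ""), state.get("error", ""))
    return {"code": code, "attempts": state.get("attempts", 0) + 1}


def evaluate(state: AssistantState) -> dict:
    # Lint the generated code, then run it in a separate process as a crude
    # stand-in for an isolated execution environment.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(state["code"])
        path = f.name
    lint = subprocess.run(["pylint", "--errors-only", path], capture_output=True, text=True)
    if lint.returncode != 0:
        return {"error": lint.stdout}
    run = subprocess.run(["python", path], capture_output=True, text=True, timeout=60)
    return {"error": run.stderr if run.returncode != 0 else ""}


def route(state: AssistantState) -> str:
    # Return the code when it is clean, or once the retry budget is exhausted.
    if not state["error"] or state["attempts"] >= 3:
        return "done"
    return "retry"


graph = StateGraph(AssistantState)
graph.add_node("generate", generate)
graph.add_node("evaluate", evaluate)
graph.add_edge(START, "generate")
graph.add_edge("generate", "evaluate")
graph.add_conditional_edges("evaluate", route, {"retry": "generate", "done": END})
app = graph.compile()
```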
We are currently experimenting with integrating our Coding Assistant into Jupyter AI to enable seamless functionality within the Jupyter Lab environment.
Evaluating the coding assistant
Evaluation tests are conducted to assess the performance of models and prompts, ensuring the reliability and accuracy of code generation. This process helps determine how changes in documentation, prompts, or models impact the overall functionality of the Coding Assistant. By systematically evaluating these factors, we can transition between models while measuring their effect on code generation quality.
We run evaluation tests by comparing the generated code’s output against an expected result. A tracking mechanism in Langfuse logs each outcome as True (successful) or False (failed); in case of failure, the system also records the error message for further analysis.
Based on test results, prompts can be refined and the evaluation test suite can be re-run to continuously improve the Coding Assistant’s performance.
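A rough sketch of how such an outcome can be recorded against a trace with the Langfuse Python SDK is shown below; the score name, the equality-based comparison, and the helper function are simplified assumptions, and the exact scoring call may vary by SDK version:

```python
# Sketch of logging an evaluation outcome to Langfuse (simplified; names are placeholders).
from langfuse import Langfuse

langfuse = Langfuse()  # reads LANGFUSE_* credentials from the environment


def record_eval(trace_id: str, generated_output, expected_output, error_message: str = "") -> bool:
    """Compare generated output to the expected result and attach a score to the trace."""
    passed = generated_output == expected_output
    langfuse.score(
        trace_id=trace_id,
        name="code_generation_correct",
        value=1.0 if passed else 0.0,               # True/False outcome stored as 1/0
        comment=None if passed else error_message,   # keep the error message for later analysis
    )
    return passed
```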
Conclusion
The onetick-py Coding Assistant represents a significant advancement in simplifying and accelerating the development of onetick-py queries. By leveraging generative AI models, LangGraph for workflow optimization, and Langfuse for tracing, the assistant provides an intuitive interface for developers to generate code efficiently.
With its ability to retrieve relevant documentation, apply few-shot learning, and iterate on code corrections using a reflection mechanism, the assistant significantly reduces the learning curve for new developers and enhances productivity for experienced users. The ongoing integration with Jupyter AI will further extend its usability, making onetick-py development more seamless and accessible.
Through continuous evaluation and refinement of models and prompts, the Coding Assistant is designed to evolve, ensuring that it remains a reliable tool for both internal teams and clients. As we further enhance its capabilities, it will continue to bridge the gap between natural language queries and optimized onetick-py code, streamlining the development process and improving overall efficiency.
Want to try it out for yourself? Set up a meeting with the OneTick team today to guide you through the process.
— Armen Avetisyan
Senior Software Engineer at OneTick