Trip.com is one of the world's largest online travel agencies, with over 400 million users worldwide. Artnova is Trip.com's reporting platform, supporting all business units, including hotels, flights, corporate travel, vacations, marketing, and train tickets.
Trip.com's Artnova platform initially used Apache Hive as its data lake with Trino as its query engine. However, given the vast volume of data involved and the need for low latency and high concurrency, Trino could not meet many of Trip.com's critical use cases. To address this, Trip.com had to replicate and transfer its data into StarRocks, which served as its high-performance data warehouse.
This approach, while solving performance issues, unfortunately introduced additional problems:
- Data freshness still suffered even though StarRocks' ingestion was relatively fast, which limited the flexibility and timeliness of queries.
- Maintaining additional ingestion tasks and designing table schemas and indexes introduced complexity to the entire data pipeline.
Duplicating data to a proprietary data warehouse was complex and expensive, and Trip.com was only able to move its most business-critical workloads to StarRocks. This was not acceptable, and an architectural overhaul was necessary.