Why Apache Iceberg is a Threat to Snowflake
And What Impact It Could Have on Salesforce (CRM) & Confluent (CFLT)
Apache Iceberg is in the headlines again after Databricks's acquisition of Tabular (rumored to be ~$2bn). In my opinion, Apache Iceberg is a big threat to $SNOW because it undermines what has, up until now, likely been one of $SNOW's biggest strengths and a pillar of the bull case: data gravity. Iceberg essentially makes the game a compute/analytics game… and this is where $SNOW is lacking, especially in the faster-growing use cases that Databricks serves.
What is Apache Iceberg? Like many Apache projects, Iceberg originally started out inside $NFLX to help manage their multi-petabyte tables. Iceberg stores all of its metadata in files alongside the data, whereas older table formats (e.g., Hive) track metadata in a relational metastore, which became a bottleneck at scale. Iceberg helps customers save on both storage and compute.
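To make the metadata-in-files point concrete, here is a minimal sketch using PySpark, assuming the Iceberg Spark runtime package is on the classpath (e.g., launched with --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2); the catalog name, warehouse path, and table are all hypothetical:

```python
from pyspark.sql import SparkSession

# A "hadoop"-type catalog keeps every piece of table metadata -- schema,
# snapshots, manifests, file-level stats -- as plain files under the
# warehouse path. No relational metastore is involved.
spark = (
    SparkSession.builder
    .appName("iceberg-sketch")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "file:///tmp/iceberg-warehouse")
    .getOrCreate()
)

# Partitioning uses a transform (days) over a normal column; engines can
# later prune files using this plus per-file column statistics.
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.analytics.events (
        event_id BIGINT,
        event_ts TIMESTAMP,
        user_id  STRING
    ) USING iceberg
    PARTITIONED BY (days(event_ts))
""")
```

After this runs, everything an engine needs to plan a query over the table sits under the warehouse path as JSON and Avro files, which is exactly why no single vendor's metastore has to sit in the middle.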
Why is Apache Iceberg Better?
1) Customers don't need to copy data to vendor-specific locations for different compute engines. Flink, Spark, and $SNOW can all access the same Iceberg tables in one place (illustrated in the sketch after this list). This also means you don't need to pay extra storage costs for each of these vendors/use cases.
2) Given the way Iceberg stores your data and its metadata, it has advanced filtering features (partition pruning plus per-file column statistics) that let engines skip data files that aren't relevant to your SQL query, resulting in a noticeable performance boost.
3) Schema flexibility is crucial, and Iceberg allows customers to change fields (schema evolution) without rewriting data or disrupting existing queries.
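A rough illustration of points 1-3, continuing the hypothetical session above. Any engine with an Iceberg connector (Flink, Trino, $SNOW's Iceberg tables, etc.) could be pointed at the same table; here I just use Spark for both the pruned read and the schema change:

```python
# Point 2: the filter on the partition column lets the engine consult
# Iceberg's manifest files and skip data files whose min/max stats can't
# match -- the irrelevant files are never read at all.
spark.sql("""
    SELECT user_id, COUNT(*) AS events
    FROM demo.analytics.events
    WHERE event_ts >= TIMESTAMP '2024-01-01 00:00:00'
    GROUP BY user_id
""").show()

# Point 3: schema evolution is a metadata-only operation; no data files
# are rewritten, and queries against the existing columns keep working.
spark.sql("ALTER TABLE demo.analytics.events ADD COLUMN country STRING")
```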
What is the Impact to $SNOW?
$SNOW's CFO laid out the bear case perfectly at the Barclays TMT Conference a while back: "Customers all want Iceberg, they all want to be able to have their own data… you can understand why, so they don't have to pay for the cost of storage twice…. Nobody wants to get locked in on their data. Everyone wants to have their data in open file formats, that it's easy to move the data out and in…. makes it cheaper to run your queries on that."
The two biggest impacts are that customers will:
1) No longer store data on $SNOW, which impacts storage revenue but, more importantly, erodes the data gravity that helps retention.
2) Run more performant queries thanks to the metadata architecture described above, which also means less consumption/compute, and therefore less $SNOW revenue.
Final Thoughts: Impacts to $CRM & $CFLT
$CRM: It will be interesting to see if $CRM's vision for Data Cloud plays out successfully, as one of its key pitches is that most of the data in a data warehouse is CRM data, so why pay $SNOW double storage costs if CRM already houses it? I wonder if Iceberg vs. CRM becomes a debate in the future, or if this simply validates CRM's proposition.
$CFLT: I continue to be perplexed about where $CFLT stands longer term amid all these shifts. In theory this should help Flink's ease of use, as Flink queries on top of Iceberg are more performant and easier to scale. But I also worry that $CFLT lacks a real storage layer (Kafka isn't built for that) and still needs to commercialize Flink against Databricks's Spark (not an easy task). $CFLT was rumored to be in the mix for this acquisition, so they must have seen something they were missing.