"If I have seen further, it is by standing on the shoulders of ducks..."
--Isaac Newton (probably... I may have misremembered the quote...)
Why DuckDB?
In the first article in this series about Enso Analytics and DuckDB we looked at why those two pieces of technology fit so well together. But why did we want to add DuckDB to Enso Analytics at all? Why go to the effort of adding a whole new, complex piece of technology to the product?
DuckDB has functionality and capabilities that Enso Analytics did not have before the last release. And not only do we gain all of the functionality DuckDB has built so far, we will also gain all of the innovation and new features that DuckDB adds in the future. That is why it is such an accelerator for what Enso can do, both today and tomorrow.
So in this article we will look, at a high level, at what some of that functionality looks like.
Spatial
This is the big one, and in many ways what started us on the journey to DuckDB. Our users have been asking us to add spatial capabilities to Enso for a while now (after all, everything happens somewhere), and it was going to take us too long to build that functionality from scratch. Bringing in a piece of technology that has a rich set of existing functionality, and that is open source so we can build on top of it, lets us accelerate both our own and our users' spatial journeys.
If you want to read more about the details, I'd suggest reading some of the DuckDB documentation here: https://duckdb.org/2023/04/28/spatial
Because Enso is a dual textual and visual language, you don't have to write raw SQL to use the spatial capabilities: you can build your workflow using Enso's easy-to-use visual programming language.
More on what this looks like in a future post in the series dedicated to spatial.
File Formats
Another feature request we have had is reading and writing Parquet files. Well, again, with DuckDB we get this functionality for free (and with some rather nice performance too!).
Compressed CSV files? ✅
From the DuckDB website:
"[...] .csv.gz). DuckDB can decompress these files on the fly. In fact, this is typically faster than decompressing the files first and loading them due to reduced IO."
Sounds good! And it is now available in Enso too!
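To make "on the fly" a bit more concrete, here is a minimal sketch in plain Python using only the standard library. It is not how DuckDB implements its reader, and the file name and helper functions are made up for illustration. It just shows the general idea: the CSV parser pulls rows straight out of the gzip stream, so an uncompressed copy of the file is never written to disk.

```python
import csv
import gzip
import os
import tempfile


def write_gzip_csv(path, header, rows):
    """Write rows to a gzip-compressed CSV file (hypothetical helper for this demo)."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def count_rows_streaming(path):
    """Return the header and the number of data rows in a .csv.gz file.

    The file is decompressed incrementally as the CSV parser consumes it,
    so no uncompressed copy is ever written to disk.
    """
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)  # the first line holds the column names
        row_count = sum(1 for _ in reader)
    return header, row_count


# Build a tiny sample file in a temporary directory, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "companies.csv.gz")
    write_gzip_csv(
        sample_path,
        ["name", "number"],
        [["Example Ltd", "00000001"], ["Sample plc", "00000002"]],
    )
    header, row_count = count_rows_streaming(sample_path)

print(header, row_count)
```

The compressed file is smaller than the raw one, so less data has to come off disk, which is the "reduced IO" the quote refers to.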
Fast CSV Reader
And while we are on file formats, the DuckDB CSV reader is rather good too. It is less forgiving than the built-in Enso CSV reader, but for well-formatted CSV files it is very fast. Take for example my favourite CSV file: the UK Companies House file (https://download.companieshouse.gov.uk/en_output.html). This is a 2.7 GB CSV file with 5.6 million rows. Prior to this release I would have said files this size belong in a database. Today I still do, but that database is DuckDB and it ships directly with Enso.
What Does Enso Bring to DuckDB?
Now all this is great, but you might be saying to yourself, "I can do all of this in DuckDB already. Why do I need Enso?"
Well, in the same way that DuckDB has features that Enso doesn't, Enso has features that DuckDB doesn't. Using the two pieces of technology together means you get to use *all* of that functionality, including:
- a rich visual programming environment with live updates showing the results of your changes as you make them
- easy interaction with web APIs: use Enso to query a web API and then combine the results of that with data in DuckDB
- version control and workflow sharing: easily keep track of your analytical pipeline and share it with your team members.
And more! Check out what makes Enso Analytics unique at our website www.ensoanalytics.com

