What is Amazon Athena?

The answers companies need from their data can be elusive. Data is abundant, especially with the expansion into cloud storage, but the tools to analyze and process it are not always easy to use, readily accessible, or even that effective. The problem? Data has to reside somewhere, and most companies have to think about how it is stored, who will access it, how to keep it secure, and, most importantly, how to make data access reliable and fast.

That’s where Amazon Athena can help. It’s a query service: companies can run SQL queries against data stored in the cloud as though it resided in a local data center. It’s serverless, so you don’t have to manage any infrastructure or run database software. And it’s fast. Your staff can run SQL queries and expect results in a matter of seconds, even on large datasets.

To use Amazon Athena, you first store the data in Amazon S3 (Simple Storage Service), an object storage service that runs in the cloud. Amazon S3 keeps the data accessible and safe, while Amazon Athena is the query service that derives the results you need from it. This means you don't need to design and load a separate database before you can query the data.

One way to think about Athena? It’s somewhat similar to a Google search. You know the data is out there, but it’s often hard to find the data sets you actually need. Just as you choose the terms for a search, you define the parameters of the SQL query you need to run. The difference is that the query runs against your own data in cloud storage rather than against a search engine's index.

Athena requires very little setup or configuration. A local data store typically requires far more, often including an ETL (Extract, Transform, Load) process that prepares data for querying by extracting it, reshaping it, and loading it into a database. With Athena, your query can run without ETL, which simplifies the process, and you run it from an easy-to-use web console. You point to your data in S3, define the schema, and start the query.
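
As a rough illustration of those three steps, the Python sketch below builds a table definition over files in S3 and shows how it could be submitted. It assumes the `start_query_execution` call of the boto3 Athena client and a comma-separated file layout; the bucket, database, table, and column names are hypothetical.

```python
def build_table_ddl(table, columns, s3_location):
    """Return a CREATE EXTERNAL TABLE statement for comma-separated files in S3.

    `columns` maps column names to Athena (Hive) types, e.g. {"sku": "string"}.
    """
    column_sql = ",\n  ".join(
        f"`{name}` {sql_type}" for name, sql_type in columns.items()
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {column_sql}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"
    )


def start_query(client, sql, database, output_location):
    """Submit a statement to Athena and return its query execution ID.

    `client` is expected to behave like boto3.client("athena").
    """
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# Hypothetical names throughout: swap in your own bucket, table, and schema.
ddl = build_table_ddl(
    "sales",
    {"sku": "string", "quantity": "int", "sold_on": "date"},
    "s3://example-bucket/sales/",
)
print(ddl)

# With AWS credentials configured, the statement could be submitted like this:
#   import boto3
#   athena = boto3.client("athena")
#   start_query(athena, ddl, "retail", "s3://example-bucket/athena-results/")
```

The table definition only describes files that already sit in S3; no data is copied or loaded, which is why no ETL step appears here.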

One example of how this might work involves a retail company that sells a large number of products, with thousands or even millions of SKUs (stock-keeping units). The company might want to know whether there are SKUs that should be retired. Normally, this might require building a complex ETL process to prepare the data for SQL queries. Because Athena reads directly from object storage in Amazon S3 and integrates with other Amazon Web Services (such as AWS Glue Data Catalog), the queries work with little or no prep.

This means companies can run a query over point-of-sale transactions, like the one that identifies SKUs to retire, or perform other queries faster and with less preparation.
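
To make the retired-SKU example concrete, here is a small sketch of the kind of query involved. It runs against an in-memory SQLite database purely so it can be tried locally; with Athena, the same idea would be expressed as SQL over tables defined on S3 data. The `products` and `sales` tables, their columns, the cutoff date, and the sample rows are all invented for illustration.

```python
import sqlite3

# In-memory SQLite stands in for Athena here so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (sku TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE sales (sku TEXT, quantity INTEGER, sold_on TEXT);

    INSERT INTO products VALUES
        ('A100', 'Desk lamp'),
        ('B200', 'Floor lamp'),
        ('C300', 'Lamp shade');
    INSERT INTO sales VALUES
        ('A100', 3, '2023-01-15'),
        ('A100', 1, '2023-06-02'),
        ('B200', 2, '2021-11-20');
    """
)

# SKUs with no sales on or after the cutoff date are candidates for retirement.
RETIREMENT_CANDIDATES = """
    SELECT p.sku, p.name, MAX(s.sold_on) AS last_sold
    FROM products AS p
    LEFT JOIN sales AS s ON s.sku = p.sku
    GROUP BY p.sku, p.name
    HAVING MAX(s.sold_on) IS NULL OR MAX(s.sold_on) < ?
    ORDER BY p.sku
"""

candidates = conn.execute(RETIREMENT_CANDIDATES, ("2023-01-01",)).fetchall()
for sku, name, last_sold in candidates:
    print(sku, name, last_sold or "never sold")
```

The `LEFT JOIN` keeps products that have never sold at all, which an inner join would silently drop from the list.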

Benefits of Amazon Athena

As with most Amazon Web Services, the major benefit of using Amazon Athena is flexibility in how you run queries without added complexity. One example is a pharmaceutical company using the cloud for genomic research. Its staff might decide to run multiple queries against the data set, but normally each one requires setup and configuration to create a cloud database that can accept the queries. With Athena, the staff can run multiple queries concurrently and expect results within seconds. Those results give the company reliable data to make better decisions and continue its research.
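
The concurrent-query pattern can be sketched as follows, assuming a client that behaves like boto3's Athena client (`start_query_execution` and `get_query_execution`). Submitting a query returns an execution ID without waiting for results, so several queries can be started before any of them finishes. The function name, queries, database name, and output location are hypothetical.

```python
import time

FINISHED_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_concurrently(client, queries, database, output_location, poll_seconds=1.0):
    """Submit every query up front, then poll until each reaches a final state.

    Returns a dict mapping each SQL string to its final Athena state.
    Assumes the SQL strings in `queries` are distinct.
    """
    # Submission does not wait for results, so all queries start close together.
    pending = {}
    for sql in queries:
        response = client.start_query_execution(
            QueryString=sql,
            QueryExecutionContext={"Database": database},
            ResultConfiguration={"OutputLocation": output_location},
        )
        pending[response["QueryExecutionId"]] = sql

    final_states = {}
    while pending:
        for execution_id in list(pending):
            status = client.get_query_execution(QueryExecutionId=execution_id)
            state = status["QueryExecution"]["Status"]["State"]
            if state in FINISHED_STATES:
                final_states[pending.pop(execution_id)] = state
        if pending:
            time.sleep(poll_seconds)
    return final_states


# With AWS credentials configured (all names below are placeholders):
#   import boto3
#   states = run_concurrently(
#       boto3.client("athena"),
#       ["SELECT COUNT(*) FROM variants", "SELECT COUNT(*) FROM samples"],
#       "genomics",
#       "s3://example-bucket/athena-results/",
#   )
```

A production version would likely add a timeout and read the failure reason for any query that does not succeed; both are left out to keep the sketch short.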

Another related advantage of Athena is lower cost. Companies don’t have to manage the infrastructure behind the datasets, so if they run many queries or need to make decisions based on a vast trove of data, they don’t first have to upgrade their IT infrastructure or reconfigure their data storage to handle the higher number of requests. Athena scales the resources behind each query up and down as needed.

As mentioned earlier, Athena is flexible enough to handle a variety of query tasks. It runs standard SQL and supports standard data formats such as CSV, JSON, ORC, Avro, and Parquet. Athena uses Presto, an open-source SQL query engine with ANSI SQL support, so it is not a proprietary query language users have to learn from the ground up. Athena lets you run quick SQL queries but also supports complex joins and array data types.

In the end, Amazon Athena's power comes from working directly on data in Amazon S3, so the benefits of that object storage platform carry over to Athena: reduced complexity, the security and performance companies need, and the ability to run multiple queries without managing or configuring infrastructure. Companies can focus on the actual queries and results, not on the platform itself.
