Securing Amazon Athena: IAM, Lake Formation, and Encryption

Securing Amazon Athena: IAM, Lake Formation, and Encryption#

When you run a query in Amazon Athena, more is happening behind the scenes than meets the eye. Data flows across multiple AWS services—from S3 buckets holding your data, through the Glue Catalog for metadata, into Athena’s query engine, and back out to a results location. At each step, security decisions matter. Getting them wrong means exposing sensitive data, losing audit trails, or accidentally granting a junior developer access to production analytics that should be off-limits. Getting them right means your team can move fast while sleeping soundly at night.

This article walks you through the complete security picture for Amazon Athena: the IAM permissions you actually need, the encryption options that protect data at rest and in transit, and the fine-grained access controls offered by AWS Lake Formation. We’ll move beyond theoretical concepts and dig into practical scenarios you’ll encounter as you build analytics pipelines and share data across teams.

Understanding the Athena Security Perimeter#

Before we discuss permissions, let’s map out what Athena touches. When you execute a query, several things happen in sequence. First, Athena needs permission to read the query definition itself and start the execution. Then it needs to read table metadata from the Glue Data Catalog. Next, it reads the actual data files from S3—often from multiple locations. Finally, it writes the query results to an S3 result location, typically something like s3://my-results-bucket/athena-results/.

Each of these operations requires explicit IAM permissions. Moreover, data can be encrypted at multiple points: in the S3 buckets where source data lives, in the result location, and even within Athena’s query execution process. On top of that, Lake Formation can layer additional access controls that work in conjunction with IAM, creating a defense-in-depth approach to data governance.

Understanding this architecture is crucial because a single missing permission can silently break a query, while overly broad permissions can accidentally expose data you meant to keep private.

The Foundation: IAM Permissions for Athena Queries#

Let’s start with the most basic question: what IAM permissions does a principal need to run a query in Athena?

The primary permission is athena:StartQueryExecution. This allows a user or role to initiate a query execution. However, this permission alone isn’t sufficient—it’s just the entry point. Think of it as having the keys to the front door, but you still need access to every room inside the house.

Once a query starts, Athena must read metadata about your tables from the Glue Data Catalog. This requires the glue:GetDatabase and glue:GetTable permissions. If you’re creating tables dynamically or modifying them, you’d also need glue:CreateTable, glue:UpdateTable, and related permissions, but for read-only query execution, these two are essential. The Glue Catalog tells Athena the schema, location, and serialization format of your data—without access to this metadata, Athena can’t even understand what you’re querying.

Next comes the actual data. Athena needs s3:GetObject permissions on the S3 paths where your source data lives. This is where many security implementations stumble. You might grant someone permission to run Athena queries broadly, but if the underlying S3 buckets use restrictive policies, those queries will fail partway through or return incomplete results. The S3 path matters too—you can be precise about which prefixes are queryable by specifying them in the resource ARN.

Finally, Athena needs to write the query results somewhere. This requires s3:PutObject permission on the results bucket. By default, Athena writes a .csv or .parquet file (depending on your output format) plus metadata and query logs. All of these go to the location you specify in the workgroup’s result configuration.

Let me show you a practical example. Here’s a minimal IAM policy for a user who needs to run queries against a specific S3 dataset:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "athena:StartQueryExecution",
        "athena:GetQueryExecution",
        "athena:GetQueryResults",
        "athena:StopQueryExecution"
      ],
      "Resource": "arn:aws:athena:us-east-1:123456789012:workgroup/primary"
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetTable"
      ],
      "Resource": [
        "arn:aws:glue:us-east-1:123456789012:catalog",
        "arn:aws:glue:us-east-1:123456789012:database/my_database",
        "arn:aws:glue:us-east-1:123456789012:table/my_database/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::my-data-bucket/analytics/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::my-results-bucket/athena-results/*"
    }
  ]
}

Notice a few things here. First, the athena:StartQueryExecution action is scoped to a specific workgroup—we’ll talk more about workgroups shortly, but this is important for isolation. Second, the glue:GetTable permission is scoped to the specific database and tables, not all tables in the catalog. Third, the S3 permissions use path-level granularity. You’re not granting access to the entire bucket, just to the specific prefixes where data lives or results should land.

You might also notice I included athena:GetQueryExecution and athena:GetQueryResults. These let the user retrieve the status and results of their queries after they finish executing. Without these, starting a query would feel like shouting into the void—you wouldn’t be able to see what happened.

Workgroups as Security Boundaries#

Amazon Athena workgroups deserve special attention because they’re one of your strongest tools for isolating teams and enforcing consistent security policies. A workgroup is a logical grouping of Athena resources—queries, result locations, and configuration settings. You might have a finance-analytics workgroup, a marketing-analytics workgroup, and a product-analytics workgroup, each with its own result location, encryption settings, and IAM policies.

Workgroups serve as a scope for IAM permissions. Rather than granting someone blanket access to run queries across your entire Athena installation, you grant them access to specific workgroups. This means a data analyst in the marketing team can run queries only in the marketing workgroup, and any queries they execute automatically write results to the marketing results bucket.

Each workgroup also has its own IAM policy that acts as an additional filter. Let’s say your organization grants a developer the IAM permission to run queries in any workgroup. But the marketing workgroup has an attached policy that explicitly denies access to certain S3 prefixes where sensitive financial data lives. The developer still won’t be able to query that data, because the workgroup policy acts as an additional gate.

Here’s an example of a workgroup-level policy that restricts which S3 buckets queries can access:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::marketing-data/*",
        "arn:aws:s3:::marketing-results/*"
      ]
    },
    {
      "Effect": "Deny",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::financial-data/*"
    }
  ]
}

This policy allows queries to read from the marketing data bucket and write to the marketing results bucket, but explicitly denies access to the financial data bucket, no matter what the user’s principal IAM policy says. This two-layer approach—principal policy plus workgroup policy—gives you control at multiple levels and makes it harder to accidentally over-grant permissions.

Encrypting Query Results#

The data sitting in your S3 result bucket needs protection just like any other sensitive data. Athena supports three encryption options for query results: SSE-S3 (server-side encryption with S3-managed keys), SSE-KMS (server-side encryption with AWS Key Management Service), and CSE-KMS (client-side encryption with KMS).

SSE-S3 is the simplest option. S3 automatically encrypts your results when you write them, and automatically decrypts them when you read them. You don’t manage encryption keys—AWS handles it for you. This provides encryption at rest, protecting data if someone gains physical access to AWS infrastructure or if a disk is decommissioned. However, SSE-S3 doesn’t give you fine-grained control over who can decrypt results; anyone with S3 read permissions can access the data.

SSE-KMS gives you more control. You specify a KMS key, and S3 uses that key to encrypt results. Now, in addition to S3 permissions, a user needs KMS permissions to decrypt results. Specifically, they need kms:Decrypt permission with the key ARN. This creates a layered security model: even if someone gains S3 access, they can’t read the results without also having KMS decrypt permissions.

CSE-KMS is the most restrictive option. Athena encrypts the results on the client side before they ever reach S3, using a KMS key you specify. Only users with KMS permissions can decrypt these results. This is useful when you want to ensure that even AWS service personnel or cloud administrators can’t access the data.

To configure encryption for a workgroup, you set it in the workgroup’s result configuration. Here’s what that looks like in the AWS Management Console flow, expressed as a configuration:

For SSE-S3:

EncryptionConfiguration:
  EncryptionOption: SSE_S3

For SSE-KMS:

EncryptionConfiguration:
  EncryptionOption: SSE_KMS
  KmsKey: arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012

For CSE-KMS:

EncryptionConfiguration:
  EncryptionOption: CSE_KMS
  KmsKey: arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012

Which option should you choose? SSE-S3 is fine for non-sensitive analytics where you’re confident that S3 permissions are sufficiently restrictive. SSE-KMS is the sweet spot for most organizations—it provides strong encryption and fine-grained access control without significant operational overhead. CSE-KMS makes sense for highly sensitive data where you want maximum assurance, though it does add some performance overhead since encryption happens on Athena’s compute rather than being offloaded to the S3 service layer.

One important detail: the encryption configuration applies to query results, not to the source data being queried. If your source data in S3 is encrypted (which it should be), that’s a separate encryption configuration on those S3 buckets. Athena can read encrypted source data just fine—as long as the principal executing the query has the appropriate KMS permissions for that data’s encryption key.

Fine-Grained Access Control with AWS Lake Formation#

IAM policies and Athena workgroups handle broad access control, but they’re not surgical instruments. If you have a table with 50 columns and you want to grant a user access to only 10 of those columns, an IAM policy won’t help you. Similarly, if you have a table with 1 million rows but you want a user to see only rows where department = 'Marketing', Athena alone can’t enforce that at the permission level.

This is where AWS Lake Formation enters the picture. Lake Formation is AWS’s data governance service, and it sits on top of the Glue Data Catalog to provide column-level, row-level, and tag-based access controls.

Column-level access control is straightforward. With Lake Formation, you can grant a user permission to read a table, but only certain columns within that table. If your customers table has columns like customer_id, name, email, phone, credit_card_number, and ssn, you might grant a marketing analyst access to customer_id, name, and email, but deny access to credit_card_number and ssn. When that analyst queries the table, Athena enforces this at the query engine level—if their query tries to select a column they don’t have access to, the query fails.

Row-level filtering is more sophisticated. Lake Formation allows you to define row-level filters based on expressions. For example, you might grant a regional sales manager access to the sales table, but only rows where region = 'US-West'. When they query the table, Lake Formation injects a filter condition into their query, so they see only the rows they’re authorized to see. This happens transparently—the user doesn’t need to write the filter themselves; Lake Formation applies it automatically.

Tag-based access control brings another dimension. You tag columns and tables with business-relevant labels—for example, PII (personally identifiable information), Financial, Public, or Confidential. Then you grant permissions based on these tags. If a user is tagged as FinanceTeam, they get access to all columns and rows tagged as Financial. If they’re tagged as PublicAnalytics, they can only access columns and rows tagged as Public. This approach scales well as your data grows because you don’t need to update permissions for every new table; you just tag it and the permission rules apply automatically.

To use Lake Formation with Athena, you first need to set up Lake Formation and register your S3 data lake location. Then you grant permissions through the Lake Formation console, not through IAM. The permissions look different from IAM policies—they’re more declarative and data-centric.

Here’s a conceptual example: you register the s3://my-data-lake/ location with Lake Formation. You create a database marketing_db within Lake Formation. You grant a principal named marketing_analyst these Lake Formation permissions:

Database: marketing_db — SELECT
Table: marketing_db.campaigns — SELECT
Columns: campaign_id, campaign_name, start_date, budget — SELECT

If the analyst tries to query all columns:

SELECT * FROM marketing_db.campaigns;

Athena will return only the columns they have access to. If they try to query restricted columns:

SELECT cost_per_click, competitor_analysis FROM marketing_db.campaigns;

Athena returns an access denied error because they lack permission for those columns.

Lake Formation integrates deeply with Athena. When you run a query, Athena checks both the principal’s IAM permissions (for the workgroup, for Glue access, for S3 access) and the principal’s Lake Formation permissions. If either denies access, the query fails. This is called the “greedy union” model—your permissions must pass both gates.

Combining IAM, Workgroups, and Lake Formation#

The real power emerges when you combine these three security mechanisms. Here’s a realistic scenario:

Your organization has a data lake in S3 containing customer data, sales data, and financial data. You have three teams: Marketing, Sales, and Finance. Each team should have a dedicated Athena workgroup. The Finance team’s workgroup can access all data; the Marketing team’s workgroup can access only marketing and customer data; the Sales team’s workgroup can access only sales and customer data.

Within the customer data, there are sensitive columns like SSN and credit card information that even the Finance team shouldn’t see in raw form (they should use redacted views instead).

Here’s how you’d structure this:

IAM Layer: Grant each team’s principal role permission to run queries in their workgroup:
- arn:aws:iam::123456789012:role/MarketingAnalyst → can run in marketing-workgroup
- arn:aws:iam::123456789012:role/SalesAnalyst → can run in sales-workgroup
- arn:aws:iam::123456789012:role/FinanceAnalyst → can run in finance-workgroup
Workgroup Layer: Attach policies to each workgroup restricting S3 access:
- marketing-workgroup: Can read from s3://datalake/marketing/* and s3://datalake/customers/*, write to s3://results/marketing/
- sales-workgroup: Can read from s3://datalake/sales/* and s3://datalake/customers/*, write to s3://results/sales/
- finance-workgroup: Can read from s3://datalake/*, write to s3://results/finance/
Lake Formation Layer: Grant column-level permissions:
- Marketing: Access to customers table but only columns customer_id, name, email, account_status
- Sales: Access to customers table and only columns customer_id, name, company_size, industry
- Finance: Access to all columns in all tables
- All teams: Denied access to credit_card_number and ssn columns (enforced via Lake Formation even if someone bypasses other controls)

With this setup, if a Marketing analyst tries to access the Finance workgroup (maybe they have the IAM permission), the workgroup policy blocks them. If they somehow access the Sales workgroup, the workgroup policy blocks their S3 access. And even if someone with the right workgroup access tries to select the SSN column, Lake Formation blocks them at the query engine level.

Practical Considerations and Common Pitfalls#

When implementing Athena security, several practical issues come up repeatedly:

The Glue Catalog permission gap: Many teams forget that querying data requires Glue Catalog access. They grant S3 permissions generously but restrict Glue permissions, and queries mysteriously fail with “table not found” errors. Always include glue:GetDatabase and glue:GetTable in your IAM policies.

Encryption key access: When using SSE-KMS or CSE-KMS for result encryption, remember that users need both S3 permissions (to write the encrypted file) and KMS permissions (to encrypt and decrypt). A user might have S3 write permissions but lack kms:Decrypt permission, and suddenly they can’t decrypt their own query results. Similarly, when querying encrypted source data, users need KMS permissions for the keys that encrypted that data.

Workgroup result location ownership: The result location for a workgroup should typically be owned by a service role, not individual users. Otherwise, when Athena writes results, permission errors occur. The workgroup itself (via its attached IAM policy) should have write permissions to the result location.

Lake Formation delegation: Lake Formation has a concept of “data lake admins” who manage permissions for others. If you’re not a data lake admin, you can’t grant Lake Formation permissions. Many teams struggle with this when first setting up Lake Formation because they assume their principal IAM permissions transfer to Lake Formation. They don’t—Lake Formation has its own permission model.

Testing permissions: Use the IAM policy simulator to validate permissions before deploying. Try to predict what a user can and can’t access, then test it. Better to discover a missing permission in development than have a critical query fail in production.

Conclusion#

Securing Amazon Athena isn’t a single decision but a layered approach. IAM provides the foundational access control, determining who can run queries and access which S3 and Glue resources. Athena workgroups add a second layer, scoping permissions to logical team boundaries and result locations. Encryption protects data at rest, giving you choices from simple S3-managed encryption to KMS-based approaches with fine-grained key management. And AWS Lake Formation adds surgical precision for data governance, enabling column-level and row-level access control that IAM alone can’t express.

As you build your analytics infrastructure, start with a clear security model: which teams need access to which data, and at what granularity? Then work outward from IAM to workgroups to Lake Formation, adding each layer as your requirements demand. Document your decisions, test thoroughly with realistic scenarios, and regularly audit permissions to ensure they still align with your intentions.

The effort you invest in security at the start pays dividends as your data lake grows. Queries run faster when they’re not blocked by missing permissions. Data remains protected. Teams move with confidence. And you sleep better knowing that sensitive data is genuinely secured, not just hoped-to-be secured.