How to list files in an S3 bucket folder using boto3 and Python

If you want to list the files/objects inside a specific folder within an S3 bucket, you will need to use the list_objects_v2 method with the Prefix parameter in boto3.

Below are three example codes showing how to list the objects in an S3 bucket folder.

Each example gets all the files/objects inside the S3 bucket named radishlogic-bucket within the folder named s3_folder/ and adds their keys to a Python list (s3_object_key_list). It then prints each of the object keys in the list, as well as the number of files in the folder.

This will list the object keys recursively. If there is a subfolder within the target folder, the Python scripts below will also list the files inside the subfolder.
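If you only want the objects directly inside the target folder and not those in its subfolders, one approach is to filter the returned keys by depth after listing. Below is a minimal sketch using plain Python string handling; the helper name and sample keys are my own, not part of boto3. (S3 can also do this grouping server-side via the Delimiter parameter of list_objects_v2, which moves subfolder entries into CommonPrefixes.)

```python
def keys_directly_in_folder(keys, prefix):
    """Keep only keys directly under prefix, excluding keys in subfolders."""
    return [
        key for key in keys
        if key.startswith(prefix) and '/' not in key[len(prefix):]
    ]


# Hypothetical keys, in the shape returned by the listing examples below
sample_keys = [
    's3_folder/report.txt',
    's3_folder/subfolder/data.csv',
    's3_folder/notes.md',
]

print(keys_directly_in_folder(sample_keys, 's3_folder/'))
# ['s3_folder/report.txt', 's3_folder/notes.md']
```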

The code below will work even if the target subdirectory has over 1,000 items, which is the maximum number of keys a single list_objects_v2 call returns.

Example 1: Code to list all S3 object keys in a directory using boto3 resource

import boto3

# Initialize boto3 to use S3 resource
s3_resource = boto3.resource('s3')

# Get the S3 Bucket
s3_bucket = s3_resource.Bucket(name='radishlogic-bucket')

# Get the iterator from the S3 objects collection
s3_object_iterator = s3_bucket.objects.filter(Prefix='s3_folder/')

# Initialize the resulting s3_object_key_list to an empty list
s3_object_key_list = []

# loop through all the objects inside the S3 bucket
for s3_object in s3_object_iterator:

    # Get the key of each S3 object
    s3_object_key = s3_object.key

    # Add the s3_object_key to the list of S3 object keys
    s3_object_key_list.append(s3_object_key)

# Print each S3 object key inside the list
for s3_object_key in s3_object_key_list:
    print(s3_object_key)

# Print the number of objects inside the S3 Bucket
print('Number of files in folder:', len(s3_object_key_list))

This code is similar to listing all the files inside an S3 bucket using the boto3 resource. Instead of using s3_bucket.objects.all(), we use s3_bucket.objects.filter(Prefix='s3_folder/') to limit the results to the target folder.


Example 2: Code to list all S3 object keys in a directory using boto3 client paginator

import boto3

# Initialize boto3 to use s3 client
s3_client = boto3.client('s3')


# Get the paginator for list_objects_v2
s3_paginator = s3_client.get_paginator('list_objects_v2')

# Set the S3 Bucket to the paginator
s3_page_iterator = s3_paginator.paginate(
    Bucket='radishlogic-bucket',
    Prefix='s3_folder/'
)

# Initialize the resulting s3_object_key_list to an empty list
s3_object_key_list = []

# Get the S3 response for each page of the iterator
for s3_page_response in s3_page_iterator:

    # Get the list of S3 objects for each page response.
    # Use .get() because the 'Contents' key is absent when no objects match the prefix.
    for s3_object in s3_page_response.get('Contents', []):

        # Get the key of each S3 object
        s3_object_key = s3_object['Key']

        # Add the s3_object_key to the list of S3 object keys
        s3_object_key_list.append(s3_object_key)


# Print each S3 object key inside the list
for s3_object_key in s3_object_key_list:
    print(s3_object_key)

# Print the number of objects inside the S3 Bucket
print('Number of files in folder:', len(s3_object_key_list))

This is the same code as listing all S3 objects inside a bucket using the boto3 client paginator, except that we add the Prefix parameter when calling the list_objects_v2 paginator to limit the results to the target directory.


Example 3: Code to list all S3 object keys in a directory using boto3 client NextContinuationToken

import boto3

# Initialize boto3 to use s3 client
s3_client = boto3.client('s3')

# Initialize the resulting s3_object_key_list to an empty list
s3_object_key_list = []

# Arguments to be used for list_objects_v2
operation_parameters = {
    'Bucket': 'radishlogic-bucket',
    'Prefix': 's3_folder/'
}

# Indicator whether to stop the loop or not
done = False

# while loop implemented as a do-while loop
while not done:

    # Calling list_objects_v2 function using the unpacked operation_parameters
    s3_response = s3_client.list_objects_v2(**operation_parameters)

    # Get the list of S3 objects for every s3_response.
    # Use .get() because the 'Contents' key is absent when no objects match the prefix.
    for s3_object in s3_response.get('Contents', []):

        # Get the S3 object key
        s3_object_key = s3_object['Key']

        # Add the s3_object_key to the list of S3 object keys
        s3_object_key_list.append(s3_object_key)

    # Get the next continuation token
    nextContinuationToken = s3_response.get('NextContinuationToken')

    if nextContinuationToken is None:
        # If the next continuation token does not exist, set the done indicator to True to exit the loop
        done = True
    else:
        # If the next continuation token exists, update the operation_parameters
        operation_parameters['ContinuationToken'] = nextContinuationToken

# Print each S3 object key inside the list
for s3_object_key in s3_object_key_list:
    print(s3_object_key)

# Print the number of objects inside the S3 Bucket
print('Number of files in folder:', len(s3_object_key_list))

This is similar to the code for listing all S3 objects within a bucket using the boto3 client’s NextContinuationToken. The only difference is the addition of the Prefix parameter in the list_objects_v2 call to narrow down the results to our target folder.


Output Format

When using the list_objects_v2 method in boto3 for an S3 bucket, the resulting object keys include the entire path of the objects stored in the bucket. For instance, if an object named ‘example.txt’ is stored in a folder named ‘s3_folder’, the object key will appear as ‘s3_folder/example.txt’ in the output.
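If you need just the file name without the folder path, you can strip the prefix from the key. A minimal sketch, using the same hypothetical key as above:

```python
s3_object_key = 's3_folder/example.txt'  # hypothetical key from a listing
prefix = 's3_folder/'

# Remove the folder prefix to get the bare file name
if s3_object_key.startswith(prefix):
    file_name = s3_object_key[len(prefix):]
else:
    file_name = s3_object_key

print(file_name)  # example.txt
```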


Subfolder within a Folder

To list objects in a subfolder within a folder, you will need to put the path of the subfolder in the Prefix parameter.

For example, if the subfolder ‘folder1’ is within ‘s3_folder’, then Prefix will have the value of ‘s3_folder/folder1/’.

The output format is the same as above; it will list the full path of the S3 object, such as ‘s3_folder/folder1/sample.csv’.
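When building nested prefixes from folder names, it helps to guarantee a single trailing slash, since a prefix like ‘s3_folder/folder1’ (no trailing slash) would also match a sibling folder such as ‘s3_folder/folder123/’. A small helper sketch (the function name is my own, not part of boto3):

```python
def build_prefix(*folders):
    """Join folder names into an S3 prefix with a single trailing slash."""
    return '/'.join(part.strip('/') for part in folders) + '/'


print(build_prefix('s3_folder', 'folder1'))  # s3_folder/folder1/
```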


Use the methods above to manage file/object keys within S3 Bucket folders. If you have any questions or issues, don’t hesitate to ask in the comments below.
