If you want to list the files/objects inside a specific folder within an S3 bucket, you will need to use the list_objects_v2 method with the Prefix parameter in boto3.
Below are three example codes on how to list the objects in an S3 bucket folder.
- Example 1: List all S3 object keys in a directory using boto3 resource
- Example 2: List all S3 object keys in a directory using boto3 client paginator
- Example 3: List all S3 object keys in a directory using boto3 client nextContinuationToken
Each code example gets all the files/objects inside the S3 bucket named radishlogic-bucket under the folder named s3_folder/ and adds their keys to a Python list (s3_object_key_list). It then prints each object key in the list, as well as the number of files in the folder.
These scripts list the object keys recursively: if there is a subfolder within the target folder, the files inside that subfolder will also be listed. The code will work even if the target folder contains more than 1,000 objects.
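If you only need the keys directly under the target folder (a non-recursive listing), list_objects_v2 also accepts a Delimiter parameter. Below is a minimal sketch; the helper name list_folder_non_recursive is my own, and it assumes you pass in a boto3 S3 client such as boto3.client('s3'):

```python
def list_folder_non_recursive(s3_client, bucket, prefix):
    """List keys directly under prefix without descending into subfolders.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client('s3').
    """
    keys = []
    subfolders = []
    paginator = s3_client.get_paginator('list_objects_v2')

    # Delimiter='/' makes S3 group deeper keys under CommonPrefixes
    # instead of returning them in Contents
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter='/'):
        for s3_object in page.get('Contents', []):
            keys.append(s3_object['Key'])
        for common_prefix in page.get('CommonPrefixes', []):
            subfolders.append(common_prefix['Prefix'])

    return keys, subfolders
```

Calling list_folder_non_recursive(boto3.client('s3'), 'radishlogic-bucket', 's3_folder/') would return the keys directly under s3_folder/, plus the prefixes of its immediate subfolders.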
Example 1: Code to list all S3 object keys in a directory using boto3 resource
import boto3

# Initialize boto3 to use S3 resource
s3_resource = boto3.resource('s3')

# Get the S3 Bucket
s3_bucket = s3_resource.Bucket(name='radishlogic-bucket')

# Get the iterator from the S3 objects collection
s3_object_iterator = s3_bucket.objects.filter(Prefix='s3_folder/')

# Initialize the resulting s3_object_key_list to an empty list
s3_object_key_list = []

# Loop through all the objects inside the S3 bucket folder
for s3_object in s3_object_iterator:

    # Get the key of each S3 object
    s3_object_key = s3_object.key

    # Add the s3_object_key to the list of S3 object keys
    s3_object_key_list.append(s3_object_key)

# Print each S3 object key inside the list
for s3_object_key in s3_object_key_list:
    print(s3_object_key)

# Print the number of objects inside the folder
print('Number of files in folder:', len(s3_object_key_list))
This code is similar to listing all the files inside an S3 bucket using the boto3 resource. Instead of using s3_bucket.objects.all(), we use s3_bucket.objects.filter(Prefix='s3_folder/') to filter the results to the target folder.
Example 2: Code to list all S3 object keys in a directory using boto3 client paginator
import boto3

# Initialize boto3 to use S3 client
s3_client = boto3.client('s3')

# Get the paginator for list_objects_v2
s3_paginator = s3_client.get_paginator('list_objects_v2')

# Set the S3 Bucket and folder prefix for the paginator
s3_page_iterator = s3_paginator.paginate(
    Bucket='radishlogic-bucket',
    Prefix='s3_folder/'
)

# Initialize the resulting s3_object_key_list to an empty list
s3_object_key_list = []

# Get the S3 response for each page of the iterator
for s3_page_response in s3_page_iterator:

    # Get the list of S3 objects for each page response
    # (.get() avoids a KeyError if the folder is empty and
    # the response has no 'Contents' key)
    for s3_object in s3_page_response.get('Contents', []):

        # Get the key of each S3 object
        s3_object_key = s3_object['Key']

        # Add the s3_object_key to the list of S3 object keys
        s3_object_key_list.append(s3_object_key)

# Print each S3 object key inside the list
for s3_object_key in s3_object_key_list:
    print(s3_object_key)

# Print the number of objects inside the folder
print('Number of files in folder:', len(s3_object_key_list))
This is the same code as listing all S3 objects inside a bucket using the boto3 client paginator, except that we add the Prefix parameter when calling the list_objects_v2 paginator to limit the results to the target directory.
Example 3: Code to list all S3 object keys in a directory using boto3 client nextContinuationToken
import boto3

# Initialize boto3 to use S3 client
s3_client = boto3.client('s3')

# Initialize the resulting s3_object_key_list to an empty list
s3_object_key_list = []

# Arguments to be used for list_objects_v2
operation_parameters = {
    'Bucket': 'radishlogic-bucket',
    'Prefix': 's3_folder/'
}

# Indicator whether to stop the loop or not
done = False

# while loop implemented as a do-while loop
while not done:

    # Call list_objects_v2 using the unpacked operation_parameters
    s3_response = s3_client.list_objects_v2(**operation_parameters)

    # Get the list of S3 objects for every s3_response
    # (.get() avoids a KeyError if the folder is empty)
    for s3_object in s3_response.get('Contents', []):

        # Get the S3 object key
        s3_object_key = s3_object['Key']

        # Add the s3_object_key to the list of S3 object keys
        s3_object_key_list.append(s3_object_key)

    # Get the next continuation token
    nextContinuationToken = s3_response.get('NextContinuationToken')

    if nextContinuationToken is None:
        # If there is no next continuation token, set the done
        # indicator to True to exit the loop
        done = True
    else:
        # If the next continuation token exists, update the
        # operation_parameters for the next call
        operation_parameters['ContinuationToken'] = nextContinuationToken

# Print each S3 object key inside the list
for s3_object_key in s3_object_key_list:
    print(s3_object_key)

# Print the number of objects inside the folder
print('Number of files in folder:', len(s3_object_key_list))
This is similar to the code for listing all S3 objects within a bucket using the boto3 client’s nextContinuationToken. The only difference is the addition of the Prefix parameter in the list_objects_v2 call to narrow down the results to our target folder.
Output Format
When using the list_objects_v2 method in boto3 for an S3 bucket, the resulting object keys include the entire path of the objects stored in the bucket. For instance, if an object named ‘example.txt’ is stored in a folder named ‘s3_folder’, the object key will appear as ‘s3_folder/example.txt’ in the output.
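Since the keys carry the full path, you can slice off the prefix if you only want the paths relative to the target folder. A small sketch, using hypothetical keys that match the folder layout described in this post:

```python
# Hypothetical keys as returned by list_objects_v2 for Prefix='s3_folder/'
s3_object_key_list = [
    's3_folder/example.txt',
    's3_folder/folder1/sample.csv',
]

prefix = 's3_folder/'

# Remove the folder prefix to get paths relative to the target folder
relative_keys = [key[len(prefix):] for key in s3_object_key_list]
print(relative_keys)  # ['example.txt', 'folder1/sample.csv']
```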
Subfolder within a Folder
To list objects in a subfolder within a folder, put the path of the subfolder in the Prefix parameter. For example, if the subfolder ‘folder1’ is within ‘s3_folder’, then Prefix will have the value of ‘s3_folder/folder1/’. The output format is the same as above: it will list the full path of each S3 object, such as ‘s3_folder/folder1/sample.csv’.
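As a sketch, the arguments from Example 3 would only change in the Prefix value (the bucket and folder names here follow this post's examples):

```python
# Arguments for list_objects_v2 targeting the subfolder 'folder1'
# inside 's3_folder'
operation_parameters = {
    'Bucket': 'radishlogic-bucket',
    'Prefix': 's3_folder/folder1/',
}
print(operation_parameters['Prefix'])
```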
Use the methods above to work with file/object keys within S3 bucket folders. If you have any questions or run into issues, don’t hesitate to ask in the comments below.