To get the versions of a single object in AWS S3, we need to use the list_object_versions() method of boto3. Below are 3 methods to list all the versions of a single S3 object using Python and boto3.
- Example 1: Code to list all versions of an S3 object file using boto3 resource
- Example 2: Code to list all versions of an S3 object file using boto3 client paginator
- Example 3: Code to list all versions of an S3 object file using boto3 client Next Marker
You can scroll to the code below to quickly access the Python scripts.
All 3 scripts do the same thing. The function get_file_versions_list() accepts the bucket name (bucket_name) and the target S3 object key (object_key), then uses boto3 to get the list of object versions and delete markers of the target S3 object. It then sorts that list from latest to oldest. Finally, the script prints each version and counts the number of versions the object has.
- Glossary
- Example Codes
- Output
- Working with the Prefix Parameter
- Working with the Delete Markers
- Sorting the Response
Getting the list of versions and the delete markers was not as straightforward as I thought it would be. In the latter part of this article, I will discuss the complexity of the prefix parameter and the delete markers.
Glossary
If you are new to Amazon Web Services (AWS), S3, and boto3, here are some terms that might help you understand the code below.
Amazon Web Services (AWS) is a comprehensive, widely-used cloud computing platform offering various services like computing power, storage, and databases among others, provided by Amazon. S3 is under the storage services of AWS.
AWS S3 (Simple Storage Service) is a scalable, secure, and highly available object storage service offered by Amazon Web Services (AWS). In simple terms, you can put your files here.
AWS S3 bucket is a container within the Amazon Simple Storage Service (S3) that stores objects such as files and data, accessible via unique URLs.
AWS S3 object refers to the individual data entity stored within an S3 bucket, consisting of the data itself, metadata, and a unique key identifier. In short, objects are files inside S3. I will interchange the terms object and file since they are the same thing in AWS S3.
S3 object key is a unique identifier that acts like a file name within an AWS S3 bucket, specifying the location of a particular object.
boto3 is the official AWS software development kit (SDK) for Python, facilitating the interaction with and management of Amazon Web Services (AWS) resources through code.
Example Codes
Example 1: Code to list all versions of an S3 object file using boto3 resource
import boto3


# Define a function to get the list of versions of target S3 object
def get_file_versions_list(bucket_name, object_key):
    # Initialize boto3 to use S3 resource
    s3_resource = boto3.resource('s3')
    # Get the S3 Bucket
    s3_bucket = s3_resource.Bucket(name=bucket_name)
    # Get the iterator of versions whose key starts with the object key
    s3_object_versions_iterator = s3_bucket.object_versions.filter(
        Prefix=object_key
    )
    # Initialize the resulting list of object versions
    s3_object_versions_list = []
    # Loop through all the object versions returned by the iterator
    for s3_object_version in s3_object_versions_iterator:
        # Make sure that only the target object key will be included
        if object_key == s3_object_version.key:
            # Build the object version dictionary
            object_version = {
                'Key': s3_object_version.key,
                'VersionId': s3_object_version.version_id,
                'LastModified': s3_object_version.last_modified,
                'IsLatest': s3_object_version.is_latest
            }
            # Check if delete marker (delete markers have no size)
            if s3_object_version.size is not None:
                # Not Delete Marker
                object_version['Size'] = s3_object_version.size
                object_version['StorageClass'] = s3_object_version.storage_class
                object_version['IsDeleteMarker'] = False
            else:
                # Delete Marker
                object_version['IsDeleteMarker'] = True
            # Add the version dictionary to the list of versions
            s3_object_versions_list.append(object_version)
    # Sort the result as object version list from latest to oldest
    s3_object_versions_list = sorted(s3_object_versions_list, key=lambda x: x['LastModified'], reverse=True)
    # Return the sorted list of versions of the target object
    return s3_object_versions_list


# Set the Bucket Name and Object Key
BUCKET_NAME = 'radishlogic-bucket2'
OBJECT_KEY = 's3_folder/file.txt'

# Call the function that gets the list of versions of target S3 object file
file_versions_list = get_file_versions_list(
    bucket_name=BUCKET_NAME,
    object_key=OBJECT_KEY
)

# Print the versions of the target S3 object file
for file_version in file_versions_list:
    print(file_version)

print(f'S3 object {OBJECT_KEY} in bucket {BUCKET_NAME} has {len(file_versions_list)} versions.')
Example 2: Code to list all versions of an S3 object file using boto3 client paginator
import boto3


# Define a function to get the list of versions of target S3 object
def get_file_versions_list(bucket_name, object_key):
    # Initialize boto3 S3 Client
    s3_client = boto3.client('s3')
    # Get the paginator for list_object_versions
    s3_paginator = s3_client.get_paginator('list_object_versions')
    # Set the S3 Bucket and Prefix for the paginator
    s3_page_iterator = s3_paginator.paginate(
        Bucket=bucket_name,
        Prefix=object_key
    )
    # Initialize the resulting list of object versions
    s3_object_versions_list = []
    # Get the S3 response for each page of the iterator
    for s3_page_response in s3_page_iterator:
        # Get the versions of the file, if there are no versions use an empty list
        for s3_object_version in s3_page_response.get('Versions', []):
            # Make sure that only the target object key will be included
            if object_key == s3_object_version['Key']:
                # Build the object version dictionary
                object_version = {
                    'Key': s3_object_version['Key'],
                    'VersionId': s3_object_version['VersionId'],
                    'LastModified': s3_object_version['LastModified'],
                    'Size': s3_object_version['Size'],
                    'StorageClass': s3_object_version['StorageClass'],
                    'IsLatest': s3_object_version['IsLatest'],
                    'IsDeleteMarker': False
                }
                # Add the version dictionary to the list of versions
                s3_object_versions_list.append(object_version)
        # Get the delete markers of the file, if there are none use an empty list
        for s3_object_delete_marker in s3_page_response.get('DeleteMarkers', []):
            # Make sure that only the target key will be included
            if object_key == s3_object_delete_marker['Key']:
                # Build the object version dictionary, but tag as Delete Marker
                object_version = {
                    'Key': s3_object_delete_marker['Key'],
                    'VersionId': s3_object_delete_marker['VersionId'],
                    'LastModified': s3_object_delete_marker['LastModified'],
                    'IsLatest': s3_object_delete_marker['IsLatest'],
                    'IsDeleteMarker': True
                }
                # Add the version dictionary to the list of versions
                s3_object_versions_list.append(object_version)
    # Sort the result as object version list from latest to oldest
    s3_object_versions_list = sorted(s3_object_versions_list, key=lambda x: x['LastModified'], reverse=True)
    # Return the sorted list of versions of the target object
    return s3_object_versions_list


# Set the Bucket Name and Object Key
BUCKET_NAME = 'radishlogic-bucket2'
OBJECT_KEY = 's3_folder/file.txt'

# Call the function that gets the list of versions of target S3 object file
file_versions_list = get_file_versions_list(
    bucket_name=BUCKET_NAME,
    object_key=OBJECT_KEY
)

# Print the versions of the target S3 object file
for file_version in file_versions_list:
    print(file_version)

print(f'S3 object {OBJECT_KEY} in bucket {BUCKET_NAME} has {len(file_versions_list)} versions.')
Example 3: Code to list all versions of an S3 object file using boto3 client Next Marker
import boto3


# Define a function to get the list of versions of target S3 object
def get_file_versions_list(bucket_name, object_key):
    # Initialize boto3 to use s3 client
    s3_client = boto3.client('s3')
    # Initialize the resulting list of object versions
    s3_object_versions_list = []
    # Arguments to be used for list_object_versions
    operation_parameters = {
        'Bucket': bucket_name,
        'Prefix': object_key
    }
    # Indicator whether to stop the loop or not
    done = False
    # while loop implemented as a do-while loop
    while not done:
        # Calling list_object_versions function using the unpacked operation_parameters
        s3_response = s3_client.list_object_versions(**operation_parameters)
        # Get the versions of the file, if there are no versions use an empty list
        for s3_object_version in s3_response.get('Versions', []):
            # Make sure that only the target object key will be included
            if object_key == s3_object_version['Key']:
                # Build the object version dictionary
                object_version = {
                    'Key': s3_object_version['Key'],
                    'VersionId': s3_object_version['VersionId'],
                    'LastModified': s3_object_version['LastModified'],
                    'Size': s3_object_version['Size'],
                    'StorageClass': s3_object_version['StorageClass'],
                    'IsLatest': s3_object_version['IsLatest'],
                    'IsDeleteMarker': False
                }
                # Add the version dictionary to the list of versions
                s3_object_versions_list.append(object_version)
        # Get the delete markers of the file, if there are none use an empty list
        for s3_object_delete_marker in s3_response.get('DeleteMarkers', []):
            # Make sure that only the target key will be included
            if object_key == s3_object_delete_marker['Key']:
                # Build the object version dictionary, but tag as Delete Marker
                object_version = {
                    'Key': s3_object_delete_marker['Key'],
                    'VersionId': s3_object_delete_marker['VersionId'],
                    'LastModified': s3_object_delete_marker['LastModified'],
                    'IsLatest': s3_object_delete_marker['IsLatest'],
                    'IsDeleteMarker': True
                }
                # Add the version dictionary to the list of versions
                s3_object_versions_list.append(object_version)
        if s3_response['IsTruncated']:
            # Set the Next Marker for reference to next call of list_object_versions
            operation_parameters['KeyMarker'] = s3_response['NextKeyMarker']
            operation_parameters['VersionIdMarker'] = s3_response['NextVersionIdMarker']
        else:
            done = True
    # Sort the result as object version list from latest to oldest
    s3_object_versions_list = sorted(s3_object_versions_list, key=lambda x: x['LastModified'], reverse=True)
    # Return the sorted list of versions of the target object
    return s3_object_versions_list


# Set the Bucket Name and Object Key
BUCKET_NAME = 'radishlogic-bucket2'
OBJECT_KEY = 's3_folder/file.txt'

# Call the function that gets the list of versions of target S3 object file
file_versions_list = get_file_versions_list(
    bucket_name=BUCKET_NAME,
    object_key=OBJECT_KEY
)

# Print the versions of the target S3 object file
for file_version in file_versions_list:
    print(file_version)

print(f'S3 object {OBJECT_KEY} in bucket {BUCKET_NAME} has {len(file_versions_list)} versions.')
Output
Below is the output I got when I did my test. The output is the same regardless of which of the Python scripts you run.
{'Key': 's3_folder/file.txt', 'VersionId': '3Y0p1GhLtylx99oksIwP8YmwoS1srktL', 'LastModified': datetime.datetime(2023, 12, 2, 3, 14, 37, tzinfo=tzutc()), 'Size': 23, 'StorageClass ': 'STANDARD', 'IsLatest': True, 'IsDeleteMarker': False}
{'Key': 's3_folder/file.txt', 'VersionId': '_hbw3KP72Rojw5arBYbU6sKAqgHZC2iT', 'LastModified': datetime.datetime(2023, 12, 2, 3, 14, 32, tzinfo=tzutc()), 'Size': 23, 'StorageClass ': 'STANDARD', 'IsLatest': False, 'IsDeleteMarker': False}
{'Key': 's3_folder/file.txt', 'VersionId': 'hb2KW7LgYPjJU6mXwqjAgEAs9l1UFUgM', 'LastModified': datetime.datetime(2023, 12, 2, 3, 13, 50, tzinfo=tzutc()), 'Size': 23, 'StorageClass ': 'STANDARD', 'IsLatest': False, 'IsDeleteMarker': False}
{'Key': 's3_folder/file.txt', 'VersionId': 'CkHlJBnJRAj3k69bWC70OTTX9aljEfdO', 'LastModified': datetime.datetime(2023, 12, 2, 3, 13, 29, tzinfo=tzutc()), 'IsLatest': False, 'IsDeleteMarker': True}
{'Key': 's3_folder/file.txt', 'VersionId': 'dkDVdwW0GQSI.wHVZBT2Da4RZZJsrofO', 'LastModified': datetime.datetime(2023, 12, 2, 2, 34, 51, tzinfo=tzutc()), 'Size': 23, 'StorageClass ': 'STANDARD', 'IsLatest': False, 'IsDeleteMarker': False}
{'Key': 's3_folder/file.txt', 'VersionId': 'znJMUZWVTFmjl8ud2DJVKUtawarIkl72', 'LastModified': datetime.datetime(2023, 12, 2, 2, 33, 53, tzinfo=tzutc()), 'IsLatest': False, 'IsDeleteMarker': True}
{'Key': 's3_folder/file.txt', 'VersionId': 'd_s2rOvz0dSGEg3pr0KQQY8mCcQHgeCq', 'LastModified': datetime.datetime(2023, 12, 2, 2, 33, 24, tzinfo=tzutc()), 'Size': 23, 'StorageClass ': 'STANDARD', 'IsLatest': False, 'IsDeleteMarker': False}
{'Key': 's3_folder/file.txt', 'VersionId': 'ZjKSQj3hjo7B7JLpBnY43.C4NAEOnWmp', 'LastModified': datetime.datetime(2023, 12, 2, 2, 33, 18, tzinfo=tzutc()), 'IsLatest': False, 'IsDeleteMarker': True}
{'Key': 's3_folder/file.txt', 'VersionId': 'gXtm5.Z8CbQfwOuT_4cwTXwbHeAvNWaD', 'LastModified': datetime.datetime(2023, 12, 2, 2, 33, 15, tzinfo=tzutc()), 'Size': 23, 'StorageClass ': 'STANDARD', 'IsLatest': False, 'IsDeleteMarker': False}
{'Key': 's3_folder/file.txt', 'VersionId': '4OjZlzcRVsviuYXr617eS0fd7isy6JL9', 'LastModified': datetime.datetime(2023, 12, 2, 2, 29, 27, tzinfo=tzutc()), 'Size': 23, 'StorageClass ': 'STANDARD', 'IsLatest': False, 'IsDeleteMarker': False}
{'Key': 's3_folder/file.txt', 'VersionId': 'AW1OmzXHuh2mLv7kxRyD.JS3.F6jxy_1', 'LastModified': datetime.datetime(2023, 12, 2, 2, 28, 51, tzinfo=tzutc()), 'Size': 23, 'StorageClass ': 'STANDARD', 'IsLatest': False, 'IsDeleteMarker': False}
S3 object s3_folder/file.txt in bucket radishlogic-bucket2 has 11 versions.
Working with the Prefix Parameter
I was planning for this to be only a quick tutorial on using Python boto3 to get the list of versions of a file in S3. As it turns out, it was not as easy as I imagined.
The list_object_versions() method does not filter S3 objects by their exact key; it filters them by the prefix of the object key.
Luckily, you can specify the whole key of your target object as the Prefix parameter of list_object_versions(). Unluckily, the target object key might be a prefix of another existing object's key in the S3 Bucket, in which case the boto3 method will also return the versions of that other object.
An example of this is the following files.
Object Key | Comments |
s3_folder/file.doc | Target object’s key |
s3_folder/file.docx | The target object’s key is a prefix of this object key. |
To filter this in the Python scripts above, we will only add the object version if the key is equal to our target object’s key. This is the same for the delete markers.
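The exact-match check described above can be tried without touching S3. The following is a minimal sketch, assuming entries shaped like the items in the 'Versions' list; the helper name keep_exact_key and the sample VersionId values are made up for illustration.

```python
def keep_exact_key(entries, object_key):
    """Return only the entries whose 'Key' equals object_key exactly."""
    return [entry for entry in entries if entry['Key'] == object_key]


# Hypothetical entries, shaped like items returned under 'Versions'.
# Both keys start with the prefix 's3_folder/file.doc'.
sample_entries = [
    {'Key': 's3_folder/file.doc', 'VersionId': 'v1'},
    {'Key': 's3_folder/file.docx', 'VersionId': 'v2'},
    {'Key': 's3_folder/file.doc', 'VersionId': 'v3'},
]

matches = keep_exact_key(sample_entries, 's3_folder/file.doc')
print([entry['VersionId'] for entry in matches])  # ['v1', 'v3']
```

The entry for s3_folder/file.docx is dropped even though it matches the prefix, which is the behavior the scripts above rely on.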
Working with the Delete Markers
I consider a delete marker to be a version of the object. That is why I find it weird that the delete markers are grouped separately in the response of list_object_versions(). In the response, object versions are located in the 'Versions' dictionary key while delete markers are located in the 'DeleteMarkers' dictionary key.
In another twist, when using boto3 resource, delete markers are in the same group as the object versions. The catch is that there is no clear indicator of whether a version is a delete marker. I did find out that if the .size and the .storage_class attributes of the ObjectVersion are None (null), then that version is a delete marker.
In the code above for getting the object versions using boto3 resource, I only checked the .size attribute.
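As a small illustration of that check, here is a sketch that tests both attributes instead of only .size. It is based on the observation above rather than on a documented guarantee, and it uses SimpleNamespace objects as stand-ins for boto3 ObjectVersion resources; the helper name is_delete_marker is an assumption for this example.

```python
from types import SimpleNamespace


def is_delete_marker(object_version):
    """Treat an entry as a delete marker when size and storage_class are both None."""
    return object_version.size is None and object_version.storage_class is None


# Stand-ins for boto3 ObjectVersion resources (hypothetical values)
regular_version = SimpleNamespace(size=23, storage_class='STANDARD')
delete_marker = SimpleNamespace(size=None, storage_class=None)

print(is_delete_marker(regular_version))  # False
print(is_delete_marker(delete_marker))    # True
```

Note that a zero-byte object has a size of 0, not None, so comparing against None (rather than testing truthiness) keeps empty files from being mistaken for delete markers.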
Interestingly, an object can have multiple delete markers. To create this, I deleted an object, then wrote the file again, then deleted it, and wrote the file again. AWS S3 was able to show the historical events when this happened. In fact, in one of my test objects, I created a total of 4 delete markers that exist at the same time for a single object.
Sorting the Response
I did some tests on boto3 client's method list_object_versions(), and here's what I found out.
The object versions (without the delete markers) are sorted based first by the keys in ascending order, and second by the Last Modified date in descending order. In short, for a single object, the latest version will appear first.
This is the same for Delete Markers, sorted first by the object keys in ascending order, then by the Last Modified date in descending order.
Then there's the MaxKeys parameter, which has a default value of 1000. What the MaxKeys parameter does is limit the number of entries returned per call of list_object_versions().
This raises the question: does the MaxKeys limit apply to the combination of the object versions and the delete markers? The answer is yes. The total number of object versions and delete markers in a single response will not exceed 1000.
What if a single object has more than 1000 versions? Then the response will be truncated, that is, cut into chunks of at most 1000 entries. You will need to call list_object_versions() repeatedly, either through a paginator or by passing the Next Marker, as shown in Examples 2 and 3, respectively.
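To see the truncation loop in isolation, here is a rough sketch that wraps the Next Marker logic in a generator. FakeS3Client is a made-up stand-in that serves two canned pages so the loop can run without AWS credentials; it is not part of boto3. With a real client created by boto3.client('s3'), the same generator should work, assuming the response fields used in Example 3.

```python
def iter_version_pages(s3_client, bucket_name, prefix, max_keys=1000):
    """Yield each list_object_versions response until IsTruncated is False."""
    params = {'Bucket': bucket_name, 'Prefix': prefix, 'MaxKeys': max_keys}
    while True:
        response = s3_client.list_object_versions(**params)
        yield response
        if not response.get('IsTruncated'):
            break
        # Continue from where the previous page stopped
        params['KeyMarker'] = response['NextKeyMarker']
        params['VersionIdMarker'] = response['NextVersionIdMarker']


class FakeS3Client:
    """Hypothetical stand-in that serves two canned pages; not part of boto3."""

    def __init__(self):
        self.calls = []

    def list_object_versions(self, **params):
        # Record a copy of the parameters of each call for inspection
        self.calls.append(dict(params))
        if 'KeyMarker' not in params:
            return {
                'Versions': [{'Key': 'file.txt', 'VersionId': 'v2'}],
                'IsTruncated': True,
                'NextKeyMarker': 'file.txt',
                'NextVersionIdMarker': 'v2',
            }
        return {
            'DeleteMarkers': [{'Key': 'file.txt', 'VersionId': 'v1'}],
            'IsTruncated': False,
        }


fake_client = FakeS3Client()
pages = list(iter_version_pages(fake_client, 'example-bucket', 'file.txt', max_keys=1))
print(len(pages))  # 2
```

The second call carries the KeyMarker and VersionIdMarker taken from the first response, which is what lets S3 resume the listing.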
Once we have combined the object versions and delete markers in a single list, we need to sort all of them again in order to properly order them from latest to earliest.
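The combine-then-sort step can be sketched on its own as follows. The sample response is hypothetical and trimmed to the fields needed for sorting; the helper name merge_and_sort is an assumption for this example.

```python
from datetime import datetime, timezone


def merge_and_sort(response):
    """Combine 'Versions' and 'DeleteMarkers' into one list, newest first."""
    combined = []
    for version in response.get('Versions', []):
        combined.append({**version, 'IsDeleteMarker': False})
    for marker in response.get('DeleteMarkers', []):
        combined.append({**marker, 'IsDeleteMarker': True})
    return sorted(combined, key=lambda entry: entry['LastModified'], reverse=True)


# Hypothetical response: each group is already newest-first on its own,
# but the delete marker falls between the two versions in time.
sample_response = {
    'Versions': [
        {'VersionId': 'v3', 'LastModified': datetime(2023, 12, 2, 3, 0, tzinfo=timezone.utc)},
        {'VersionId': 'v1', 'LastModified': datetime(2023, 12, 2, 1, 0, tzinfo=timezone.utc)},
    ],
    'DeleteMarkers': [
        {'VersionId': 'v2', 'LastModified': datetime(2023, 12, 2, 2, 0, tzinfo=timezone.utc)},
    ],
}

ordered = merge_and_sort(sample_response)
print([entry['VersionId'] for entry in ordered])  # ['v3', 'v2', 'v1']
```

Simply concatenating the two groups would give v3, v1, v2, which is why the extra sort is needed.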
Another interesting part is when we get the object versions via boto3 S3 resource. This method automatically combines the object versions and the delete markers in a single iterator. Unfortunately, it does not sort them properly by Last Modified date.
The return order of s3_bucket.object_versions.filter() is the object versions first, then the delete markers last. Since the combined result is not sorted by Last Modified date, it still needs to go through another sort.
To demonstrate, here’s the output of the code using boto3 resource without sorting the result. As you can see the delete markers are all at the end.
{'Key': 's3_folder/file.txt', 'VersionId': '3Y0p1GhLtylx99oksIwP8YmwoS1srktL', 'LastModified': datetime.datetime(2023, 12, 2, 3, 14, 37, tzinfo=tzutc()), 'Size': 23, 'StorageClass ': 'STANDARD', 'IsLatest': True, 'IsDeleteMarker': False}
{'Key': 's3_folder/file.txt', 'VersionId': '_hbw3KP72Rojw5arBYbU6sKAqgHZC2iT', 'LastModified': datetime.datetime(2023, 12, 2, 3, 14, 32, tzinfo=tzutc()), 'Size': 23, 'StorageClass ': 'STANDARD', 'IsLatest': False, 'IsDeleteMarker': False}
{'Key': 's3_folder/file.txt', 'VersionId': 'hb2KW7LgYPjJU6mXwqjAgEAs9l1UFUgM', 'LastModified': datetime.datetime(2023, 12, 2, 3, 13, 50, tzinfo=tzutc()), 'Size': 23, 'StorageClass ': 'STANDARD', 'IsLatest': False, 'IsDeleteMarker': False}
{'Key': 's3_folder/file.txt', 'VersionId': 'dkDVdwW0GQSI.wHVZBT2Da4RZZJsrofO', 'LastModified': datetime.datetime(2023, 12, 2, 2, 34, 51, tzinfo=tzutc()), 'Size': 23, 'StorageClass ': 'STANDARD', 'IsLatest': False, 'IsDeleteMarker': False}
{'Key': 's3_folder/file.txt', 'VersionId': 'd_s2rOvz0dSGEg3pr0KQQY8mCcQHgeCq', 'LastModified': datetime.datetime(2023, 12, 2, 2, 33, 24, tzinfo=tzutc()), 'Size': 23, 'StorageClass ': 'STANDARD', 'IsLatest': False, 'IsDeleteMarker': False}
{'Key': 's3_folder/file.txt', 'VersionId': 'gXtm5.Z8CbQfwOuT_4cwTXwbHeAvNWaD', 'LastModified': datetime.datetime(2023, 12, 2, 2, 33, 15, tzinfo=tzutc()), 'Size': 23, 'StorageClass ': 'STANDARD', 'IsLatest': False, 'IsDeleteMarker': False}
{'Key': 's3_folder/file.txt', 'VersionId': '4OjZlzcRVsviuYXr617eS0fd7isy6JL9', 'LastModified': datetime.datetime(2023, 12, 2, 2, 29, 27, tzinfo=tzutc()), 'Size': 23, 'StorageClass ': 'STANDARD', 'IsLatest': False, 'IsDeleteMarker': False}
{'Key': 's3_folder/file.txt', 'VersionId': 'AW1OmzXHuh2mLv7kxRyD.JS3.F6jxy_1', 'LastModified': datetime.datetime(2023, 12, 2, 2, 28, 51, tzinfo=tzutc()), 'Size': 23, 'StorageClass ': 'STANDARD', 'IsLatest': False, 'IsDeleteMarker': False}
{'Key': 's3_folder/file.txt', 'VersionId': 'CkHlJBnJRAj3k69bWC70OTTX9aljEfdO', 'LastModified': datetime.datetime(2023, 12, 2, 3, 13, 29, tzinfo=tzutc()), 'IsLatest': False, 'IsDeleteMarker': True}
{'Key': 's3_folder/file.txt', 'VersionId': 'znJMUZWVTFmjl8ud2DJVKUtawarIkl72', 'LastModified': datetime.datetime(2023, 12, 2, 2, 33, 53, tzinfo=tzutc()), 'IsLatest': False, 'IsDeleteMarker': True}
{'Key': 's3_folder/file.txt', 'VersionId': 'ZjKSQj3hjo7B7JLpBnY43.C4NAEOnWmp', 'LastModified': datetime.datetime(2023, 12, 2, 2, 33, 18, tzinfo=tzutc()), 'IsLatest': False, 'IsDeleteMarker': True}
I hope the above helps you get the list of versions of a single object file using Python boto3.
If you have questions or you want to share your experience working on this, let us know in the comments below.