ProGet 6.0.4: can't remove Docker image blob via API



  • We have a custom retention policy. To enforce it, I wrote a program that calls the API to remove Docker images from ProGet. It works like this:

    1. I get the image digest from the "Docker-Content-Digest" response header of a request to

    https://proget.server.com/v2/<feed name>/library/<repository name>/manifests/<image tag>

    2. I send a DELETE request to

    https://proget.server.com/v2/<feed name>/library/<repository name>/manifests/<digest>

    We also have a custom retention rule on the feed:
    "cached connector packages prerelease packages not requested for 30 days with all package usage removed for more than 30 days from the feed"

    What am I doing wrong? Why aren't the blobs being removed from disk?
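
The two-step flow described above could be sketched roughly as follows. This is a minimal illustration, not the poster's actual program: the function names, the injectable `opener` parameter, and the choice of a HEAD request with a schema 2 `Accept` header are assumptions, and authentication is left out. If the server does not answer HEAD on manifests, a GET should return the same header.

```python
import urllib.request

# Media type of a Docker image manifest (v2, schema 2). Sending it as Accept
# asks the registry for the digest of that manifest format.
MANIFEST_V2 = "application/vnd.docker.distribution.manifest.v2+json"


def manifest_url(base_url, feed, repository, reference):
    """Build the manifest URL for a tag or digest, following the pattern above."""
    return f"{base_url.rstrip('/')}/v2/{feed}/library/{repository}/manifests/{reference}"


def delete_image(base_url, feed, repository, tag, opener=urllib.request.urlopen):
    """Resolve a tag to its digest, then delete the manifest by digest.

    `opener` is a hypothetical hook so the flow can be exercised without a
    server. Authentication headers are omitted and would need to be added.
    """
    # Step 1: request the tag and read the Docker-Content-Digest header.
    head = urllib.request.Request(
        manifest_url(base_url, feed, repository, tag),
        method="HEAD",
        headers={"Accept": MANIFEST_V2},
    )
    with opener(head) as response:
        digest = response.headers["Docker-Content-Digest"]

    # Step 2: delete the manifest addressed by that digest.
    delete = urllib.request.Request(
        manifest_url(base_url, feed, repository, digest),
        method="DELETE",
    )
    with opener(delete) as response:
        return digest, response.status
```

Note that this only removes the manifest; the layer blobs it referenced stay on disk until something else cleans them up.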


  • inedo-engineer

    Hi @araxnid_6067,

    This behavior is expected, and it's handled via Garbage Collection for Docker Registries:

    Unlike packages, a Docker image is not self-contained: it is a reference to a manifest blob, which in turn references a number of layer blobs. These layer blobs may be referenced by other manifests in the registry, which means that you can't simply delete referenced layer blobs when deleting a manifest blob.
    This is where garbage collection comes in; it's the process of removing blobs from the package store when they are no longer referenced by a manifest. ProGet performs garbage collection on Docker registries through the "FeedCleanUp" scheduled job.

    So basically, the blobs will get deleted when the corresponding FeedCleanUp job runs. By default it runs every night, and you can see the logs on the Admin > Manage Feed page.
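
To make the idea concrete, here is a small sketch of why deleting a manifest does not immediately free its layers. It is a toy model with made-up digests, not ProGet's implementation: a blob becomes collectable only once no remaining manifest references it.

```python
def find_unreferenced_blobs(manifests, stored_blobs):
    """Return stored blob digests that no manifest references.

    manifests: mapping of manifest digest -> iterable of layer digests
    stored_blobs: iterable of blob digests present in the blob store
    """
    referenced = set()
    for layers in manifests.values():
        referenced.update(layers)
    return sorted(set(stored_blobs) - referenced)


# Two images share a base layer (all digests here are invented).
manifests = {
    "sha256:m1": ["sha256:base", "sha256:app1"],
    "sha256:m2": ["sha256:base", "sha256:app2"],
}
stored = ["sha256:base", "sha256:app1", "sha256:app2", "sha256:old"]

print(find_unreferenced_blobs(manifests, stored))  # ['sha256:old']

# Deleting image m1 frees its private layer, but the shared base layer
# is still referenced by m2 and must stay.
del manifests["sha256:m1"]
print(find_unreferenced_blobs(manifests, stored))  # ['sha256:app1', 'sha256:old']
```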

    Cheers,
    Alana



  • @atripp thanks. Yes, it works and cleaned up some blobs from disk.

    But I found some very old blobs on disk that were created more than a year ago. Is it possible to see in the database or through the API where these blobs are used?


  • inedo-engineer

    Hi @araxnid_6067,

    You can see where those blobs are used by querying the [DockerImageLayers_Extended] view in SQL Server. The blob's file name should match the [Blob_Digest] column, and the matching rows will show you which repository uses it. If you need to see tags too, join that with the [DockerRepositoryTags_Extended] view on the [DockerRepository_Id] column.

    Hope this helps!

    Thanks,
    Dan



  • @Dan_Woolf thank you.

    So if I don't find a blob in DockerImageLayers_Extended, can it be removed from disk manually?


  • inedo-engineer

    @araxnid_6067 is it in the [DockerBlobs] table? If not, then ProGet doesn't know about it, and it's safe to delete.

    Otherwise, it might still be referenced by a manifest, but ProGet doesn't have that relation in the database.

    You'd have to parse [ManifestJson_Bytes] to find out. If you're comfortable with SQL, you could do a "hack" query to convert that column to a VARCHAR, then use OPENJSON or a LIKE query to search all manifests for that digest.

    However, that's what ProGet does during feed cleanup.
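
As an alternative to the SQL approach, the same check could be done outside the database. The sketch below assumes the manifests have been exported as pairs of an image identifier and the raw [ManifestJson_Bytes] value (how you export them is up to you), and that they follow the standard Docker manifest layouts: `config` and `layers` for schema 2, `manifests` for manifest lists, `fsLayers` for schema 1. The function names are hypothetical.

```python
import json


def referenced_digests(manifest_bytes):
    """Collect every blob or manifest digest a manifest JSON document refers to."""
    manifest = json.loads(manifest_bytes)
    digests = set()

    # Schema 2 image manifest: one config blob plus a list of layer blobs.
    config = manifest.get("config")
    if isinstance(config, dict) and "digest" in config:
        digests.add(config["digest"])
    for layer in manifest.get("layers") or []:
        digests.add(layer["digest"])

    # Manifest list ("fat" image): refers to other manifests, not layers.
    for entry in manifest.get("manifests") or []:
        digests.add(entry["digest"])

    # Legacy schema 1 manifest: layers are listed under fsLayers/blobSum.
    for layer in manifest.get("fsLayers") or []:
        digests.add(layer["blobSum"])

    return digests


def manifests_referencing(blob_digest, manifests):
    """Return identifiers of the manifests that reference blob_digest.

    manifests: iterable of (image_identifier, manifest_bytes) pairs
    """
    return [
        image
        for image, raw in manifests
        if blob_digest in referenced_digests(raw)
    ]
```

An empty result from `manifests_referencing` would mean no exported manifest mentions the digest, which is only as reliable as the export is complete.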



  • @atripp @Dan_Woolf thank you both so much!


  • inedo-engineer

    Hi @araxnid_6067,

    We are always happy to help. Please let us know if you have any other questions for us!

    Thanks,
    Dan



  • I'm facing the same problem (probably).
    Even if I delete all images that use a blob, the blob isn't deleted from disk during feed cleanup.
    I checked the digest in [DockerBlobs_Usage]: no entries. I parsed [DockerImages].[ManifestJson_Bytes] for that digest: no entries.
    It exists only in [DockerBlobs], and when I look for it in any other table that I can join via [DockerBlobs].[DockerBlob_Id], I find no records.

    And it's not just one blob; it looks like unused blobs are never deleted from storage.
    I use Windows images, which are huge, so I want to reclaim the space from deleted images.

    I use ProGet 6.0.10.


  • inedo-engineer

    Hi @pariv_0352,

    Are you able to see the results of the "DockerGarbageCollection" job?

    This is actually what's responsible for deleting those blobs, and it runs nightly by default.

    Let me share the code for it; since you already understand the database structure, hopefully it will help you identify why it's not working, and what you might look for in the logs to help troubleshoot:

    [ScheduledTaskProperties(
        ScheduledTaskTypes.DockerGarbageCollection,
        "Deletes unreferenced Docker blobs.")]
    public sealed class DockerGarbageCollectionTask : ScheduledTaskBase
    {
        public override Task ExecuteAsync(ScheduledTaskContext context) => this.GarbageCollectAsync();
    
        private async Task GarbageCollectAsync()
        {
            using var db = new DB.Context();
    
            this.PercentComplete = 0;
    
            var usedBlobs = new HashSet<DockerDigest>();
    
            this.LogDebug("Gathering list of all rooted blobs...");
    
            foreach (var image in db.DockerImages_GetImages(Feed_Id: null))
            {
                if (image.ContainerConfigBlob_Id.HasValue)
                    usedBlobs.Add(image.ContainerConfigBlobDigest);
    
                DockerManifest manifest;
                try
                {
                    manifest = new DockerManifest(image.ManifestJson_Bytes);
                }
                catch (Exception ex)
                {
                    this.LogError($"Image {image.Image_Digest} has invalid manifest: {ex.Message}");
                    continue;
                }
    
                // "Fat images" do not have blobs as layers.
                if (manifest.Layers == null)
                    continue;
    
                foreach (var l in manifest.Layers)
                    usedBlobs.Add(l.Digest);
            }
    
            this.LogDebug($"Found total of {usedBlobs.Count} rooted blobs; finding unreferenced blobs...");
    
            var allBlobs = (await db.DockerBlobs_GetBlobsAsync(Feed_Id: null))
                .Where(b => b.Feed_Id == null)
                .Select(b => DockerDigest.Parse(b.Blob_Digest));
    
            var unreferencedBlobs = allBlobs
                .Where(d => !usedBlobs.Contains(d))
                .ToList();
    
            this.LogDebug($"Found {unreferencedBlobs.Count} unreferenced blobs.");
    
            using var fileSystem = new DirectoryFileSystem(ProGetConfig.Storage.DockerBlobStorageLibrary);
    
            for (int i = 0; i < unreferencedBlobs.Count; i++)
            {
                this.PercentComplete = (i + 1) * 100 / unreferencedBlobs.Count;
    
                var digest = unreferencedBlobs[i];
                this.LogInformation($"Deleting blob {digest}...");
                if (!ProGetConfig.Feeds.RetentionDryRun)
                {
                    await db.DockerBlobs_DeleteBlobAsync(Feed_Id: null, Blob_Digest: digest.ToString());
                    await fileSystem.DeleteDockerBlobAsync(digest);
                }
            }
        }
    }


  • Hi @atripp.

    Looks like I misunderstood the logic of the "FeedCleanup" job. Every time I run that job, it reports "Found 0 unreferenced blobs." and does nothing.
    But the "DockerGarbageCollection" job (which I hadn't noticed before you mentioned it) really did clean the unused blobs from storage.

    What is the FeedCleanup job for, then? I checked the logs, and it never found anything unreferenced.


  • inedo-engineer

    Hi @pariv_0352,

    The "FeedCleanup" job does two main things: it clears the Multipart Upload Temp files and runs retention rules. For Docker feeds, it also deletes blobs, but only feed-specific blobs. If you are using common blob storage (which is enabled by default in feeds created in ProGet 5.3+), then that is where the "DockerGarbageCollection" job comes into play. That one handles cleaning up and deleting the blobs that are shared across multiple feeds.

    Hope this helps!

    Thanks,
    Rich



  • Thank you, @rhessinger, for the clarification.

    Yes, I use common blob storage, and all that makes sense now.

