Coding for Performance – Getting All Files

In my last Coding for Performance article, I explained the best way to get File objects from a set of File ID values.  But what if you don't have the IDs?  What if you want to get all the files in the Vault, and you don't have any information to start with?  I came up with 3 techniques you can use to get this information.  Let's see which one performs the best.

Technique 1: Start at the root folder and recursively scan through each folder one level at a time, gathering up the files along the way.  The cost is going to be 2 API calls per folder in the Vault.  For each folder, we will be calling GetLatestFilesByFolderId and GetFoldersByParentId with the recurse parameter set to false.
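To make the call count concrete, here is a minimal sketch of Technique 1 in Python.  It does not use the real Vault SDK: MockVault is a hypothetical in-memory stand-in whose two methods only mimic GetLatestFilesByFolderId and GetFoldersByParentId (with recurse set to false) and count how often they are called.  The walk uses a queue rather than actual recursion, which visits the same folders.

```python
from collections import deque


class MockVault:
    """Hypothetical in-memory stand-in for a Vault server; it only counts calls."""

    def __init__(self, children, files):
        self.children = children  # folder id -> list of child folder ids
        self.files = files        # folder id -> list of file names
        self.api_calls = 0

    def get_latest_files_by_folder_id(self, folder_id):
        # Models GetLatestFilesByFolderId: files in one folder.
        self.api_calls += 1
        return list(self.files.get(folder_id, []))

    def get_folders_by_parent_id(self, folder_id):
        # Models GetFoldersByParentId with recurse=False: direct children only.
        self.api_calls += 1
        return list(self.children.get(folder_id, []))


def get_all_files_by_walking(vault, root_id):
    """Technique 1: visit every folder, making two calls per folder."""
    all_files = []
    pending = deque([root_id])
    while pending:
        folder_id = pending.popleft()
        all_files.extend(vault.get_latest_files_by_folder_id(folder_id))
        pending.extend(vault.get_folders_by_parent_id(folder_id))
    return all_files


# Four folders (1, 2, 3, 4), so the walk costs 2 * 4 = 8 calls.
vault = MockVault(
    children={1: [2, 3], 2: [4]},
    files={1: ["a.ipt"], 2: ["b.ipt", "c.iam"], 4: ["d.idw"]},
)
files = get_all_files_by_walking(vault, root_id=1)
print(files, vault.api_calls)
```

Each call returns only one folder's worth of data, so no single response grows with the size of the Vault.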

Technique 2:  Gather up all the folders and get all the files in all the folders with 1 API call.  You can get all the folders by calling GetFolderRoot then calling GetFoldersByParentId with the recurse parameter set to true.  Next, call GetLatestFilesByFolderIds and pass in all folder IDs.  The result will be all files.  So we get it all in 3 API calls.
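A minimal sketch of Technique 2, again against a hypothetical in-memory MockVault rather than the real Vault SDK.  The three methods only mimic GetFolderRoot, GetFoldersByParentId (with recurse set to true) and GetLatestFilesByFolderIds, and the sketch assumes the root folder ID is passed along with its descendants so that files in the root are included.

```python
class MockVault:
    """Hypothetical in-memory stand-in for a Vault server; it only counts calls."""

    def __init__(self, children, files, root_id):
        self.children = children  # folder id -> list of child folder ids
        self.files = files        # folder id -> list of file names
        self.root_id = root_id
        self.api_calls = 0

    def get_folder_root(self):
        # Models GetFolderRoot.
        self.api_calls += 1
        return self.root_id

    def get_folders_by_parent_id_recursive(self, folder_id):
        # Models GetFoldersByParentId with recurse=True: every descendant at once.
        self.api_calls += 1
        descendants = []
        stack = list(self.children.get(folder_id, []))
        while stack:
            current = stack.pop()
            descendants.append(current)
            stack.extend(self.children.get(current, []))
        return descendants

    def get_latest_files_by_folder_ids(self, folder_ids):
        # Models GetLatestFilesByFolderIds: files for many folders in one response.
        self.api_calls += 1
        return [name for fid in folder_ids for name in self.files.get(fid, [])]


def get_all_files_in_bulk(vault):
    """Technique 2: three calls total, however many folders exist."""
    root_id = vault.get_folder_root()
    folder_ids = [root_id] + vault.get_folders_by_parent_id_recursive(root_id)
    return vault.get_latest_files_by_folder_ids(folder_ids)


vault = MockVault(
    children={1: [2, 3], 2: [4]},
    files={1: ["a.ipt"], 2: ["b.ipt", "c.iam"], 4: ["d.idw"]},
    root_id=1,
)
files = get_all_files_in_bulk(vault)
print(sorted(files), vault.api_calls)
```

The call count stays at 3, but the last two responses carry every folder and every file in the Vault, so their size grows with the Vault.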

Technique 3: Do a file search with no search conditions, which will result in finding all files.  FindFilesBySearchConditions is the function that I will be using.  I will be passing in null for the folderIds parameter, which will result in a search across all folders.  The number of API calls depends on the paging setup.  Divide the total number of files by the paging size, round up, and that's how many API calls are made.
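The paging loop can be sketched as follows.  This is a simplified model, not the real Vault SDK: MockVault.find_files is a hypothetical stand-in for FindFilesBySearchConditions, the bookmark is modeled as a plain offset, and the page size of 100 is an assumed value chosen for the example.

```python
import math


class MockVault:
    """Hypothetical in-memory stand-in for a paged, Vault-wide file search."""

    def __init__(self, file_names, page_size):
        self.file_names = file_names
        self.page_size = page_size  # assumed server-side page size
        self.api_calls = 0

    def find_files(self, bookmark):
        # Returns one page, the bookmark for the next page, and the total hit count.
        self.api_calls += 1
        start = bookmark or 0
        page = self.file_names[start:start + self.page_size]
        return page, start + len(page), len(self.file_names)


def get_all_files_by_search(vault):
    """Technique 3: keep requesting pages until every hit has been collected."""
    all_files = []
    bookmark = None
    while True:
        page, bookmark, total_hits = vault.find_files(bookmark)
        all_files.extend(page)
        if not page or len(all_files) >= total_hits:
            return all_files


vault = MockVault([f"part{i}.ipt" for i in range(250)], page_size=100)
files = get_all_files_by_search(vault)
# 250 files at 100 per page: ceil(250 / 100) = 3 calls.
print(len(files), vault.api_calls, math.ceil(250 / 100))
```

Every response is capped at one page, so the work per call stays bounded no matter how large the Vault is.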

NOTE: I'm going to assume that there are no file shares for this test.  In other words, each file lives in one and only one folder. 

NOTE: Technically there are 2 variables here, the number of folders and the number of files.  To simplify things, I'll assume a constant ratio between files and folders.  This is usually the case in the real world.  The more files you have, the more folders you will probably have.  I will be using a ratio of about 10 files for every 1 folder.
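With that ratio fixed, the call counts described for the three techniques reduce to simple arithmetic.  The sketch below is only an estimate: the function name is hypothetical, and the default page size of 100 is an assumption, since the actual paging size depends on the server setup.

```python
import math


def estimated_api_calls(total_files, files_per_folder=10, page_size=100):
    """Estimate API calls per technique; page_size is an assumed value."""
    folders = max(1, math.ceil(total_files / files_per_folder))
    return {
        "technique_1": 2 * folders,  # two calls per folder
        "technique_2": 3,            # root, all folders, all files
        "technique_3": max(1, math.ceil(total_files / page_size)),  # one per page
    }


print(estimated_api_calls(10_000))
# {'technique_1': 2000, 'technique_2': 3, 'technique_3': 100}
```

Call count is only part of the cost.  The size of each response matters too, which a count like this does not capture.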

 

Here is the resulting graph

So what happened here?  It looked like Technique 2 was slightly better than Technique 3, but it curved up at the end.  The reason is that Technique 2 tries to get too much information in a single call.  First, it gets every folder in the Vault with a single call to GetFoldersByParentId.  Next, it gets every file in the Vault with a single call to GetLatestFilesByFolderIds.

As I mentioned in my last performance article, you need an upper bound on these calls.  You can't just get millions of objects in a single call.  Both Technique 1 and Technique 3 have boundaries built in, which keeps things nice and linear.  Technique 3 wins out because it makes fewer API calls than Technique 1.  Another nice thing is that Technique 3 is folder independent.  It will have the same performance regardless of the number of folders or the folder structure.


Comments

4 responses to “Coding for Performance – Getting All Files”

  1. nathaniel.dickerson@adecco.com

    Doug,
    I have an application that needs to query my vault. If I know the local (absolute) folder, I want to return a list of all the vault files in it so I know what file names are in it.
    Is there a quick way of scoping the search condition to all files in a folder?
    Thx – NLD

  2. would like to remove the address from the name. Thx

  3. If you know the Vault path, you can call FindFilesBySearchConditions and pass in the Folder Id. The search will only be applied to objects in that folder and any sub-folders.
    Going from a local folder on disk to a Vault folder is a bit tricky. I think most people start at the root Vault folder, get the working folder (Connection.WorkingFoldersManager.GetWorkingFolder()) and use that to map between local path and Vault path. This only works if the sub-folders don’t define their own working folder in another location.

  4. What are you asking? You want to remove your email address from your posts in the comments section?
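The working-folder mapping described in the third reply can be sketched in Python as below.  This is an illustration only: the function name is hypothetical, the working folder is passed in as a plain string instead of being read from the connection, and it assumes Vault paths take the form $/Folder/SubFolder with $ as the root.  As the reply notes, it only holds when no sub-folder defines its own working folder elsewhere.

```python
from pathlib import PureWindowsPath


def local_to_vault_path(local_path, working_folder, vault_root="$"):
    """Map a local path under the root working folder to a Vault-style path.

    Raises ValueError if local_path is not inside working_folder.
    """
    relative = PureWindowsPath(local_path).relative_to(PureWindowsPath(working_folder))
    return "/".join([vault_root, *relative.parts])


print(local_to_vault_path(r"C:\Work\Designs\Pump", r"C:\Work"))
# $/Designs/Pump
```

Once the Vault path is known, the matching folder ID can be passed to the search to scope it to that folder and its sub-folders.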
