Replication – The Vault Mirror Bug

Replication is one of those "game changer" features.  It's a very powerful feature, but it has the potential to impact any customizations you write.  Code that works perfectly well in a single workgroup environment may behave incorrectly in a multi-workgroup environment.  Even worse, the incorrect behavior may be intermittent, making it hard to isolate what the exact problem is.

To illustrate my point, I'd like to go over a bug that can show up when running Vault Mirror in a multi-workgroup environment.  First, let me quickly go over how the Partial Mirror command works.  When you sync data in Vault Mirror, it remembers the time.  When you later run a Partial Mirror command, it searches for all files created since the last sync.  This way it downloads only the new files.
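Here is a rough sketch of that logic in Python.  It is not the Vault Mirror source or the Vault API: `VaultFile`, `PartialMirror` and `run()` are hypothetical names, and the database search is modeled as a simple filter over an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class VaultFile:
    # Hypothetical stand-in for a file record in the Vault database.
    name: str
    create_date: datetime


class PartialMirror:
    """Hypothetical sketch of the Partial Mirror idea: remember when the last
    sync happened and download only files created after that time."""

    def __init__(self, last_sync: datetime):
        self.last_sync = last_sync

    def run(self, files: list[VaultFile], now: datetime) -> list[VaultFile]:
        # Search for files whose CreateDate falls in the window (last_sync, now].
        new_files = [f for f in files if self.last_sync < f.create_date <= now]
        # Remember this run's time; it becomes the start of the next window.
        self.last_sync = now
        return new_files


mirror = PartialMirror(last_sync=datetime(2024, 1, 1, 4, 4))
files = [VaultFile("old.ipt", datetime(2024, 1, 1, 4, 3)),
         VaultFile("new.ipt", datetime(2024, 1, 1, 4, 4, 30))]
print([f.name for f in mirror.run(files, now=datetime(2024, 1, 1, 4, 5))])  # ['new.ipt']
```

The key point is that the only thing the search looks at is CreateDate.  Whether that is safe depends on CreateDate reflecting when a file actually became visible to this workgroup.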

On a single workgroup, Partial Mirror works perfectly.  It will always find all the new files that were added.  However, in a replicated environment, it may miss files.  In these cases there are no errors or messages; it just doesn't download all the files it's supposed to.

Let me describe the issue in detail.  In this example, there are 2 workgroups, A and B.  Vault Mirror is on workgroup A and is set to run a Partial Mirror every minute.  Also, in this example, a minute is about how long it takes to replicate SQL data between A and B.  Let's now add a file to workgroup B and see if Vault Mirror detects the new file.  Here is an example timeline:

  • 4:05:00 – Partial Mirror runs on site A. Scans for files with a CreateDate between 4:04:00 and 4:05:00.
    Result: No hits
  • 4:05:32 – File X added to site B.
  • 4:06:00 – Partial Mirror runs on site A. Scans for files with a CreateDate between 4:05:00 and 4:06:00.
    Result: No hits
  • 4:06:28 – File X replicated to site A. CreateDate is still 4:05:32.
  • 4:07:00 – Partial Mirror runs on site A. Scans for files with a CreateDate between 4:06:00 and 4:07:00.
    Result: No hits

Did you see what happened there?  Because there is a delay replicating the file data from B to A, Vault Mirror was not able to detect the new file.  By the time file X reached site A, its CreateDate of 4:05:32 was already behind the next search window (4:06:00 to 4:07:00).  Another way to look at it is that the CreateDate doesn't accurately reflect the time when the file was added to the local database.  There is no data field that tells you the time of replication.
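The timeline can be replayed as a small simulation.  Each run on site A can only search rows that have already replicated to it, and it filters on CreateDate exactly as before.  The `arrives_on_a` value exists only in this simulation; as noted above, Vault has no such field.  The date and the helper names are arbitrary.

```python
from datetime import datetime


def at(hour, minute, second=0):
    # Timestamps from the timeline above, on an arbitrary date.
    return datetime(2024, 1, 1, hour, minute, second)


# File X is created on site B at 4:05:32 and replicated to site A at 4:06:28.
# The CreateDate travels with the file; the arrival time on A is not stored.
files = [{"name": "X", "create_date": at(4, 5, 32), "arrives_on_a": at(4, 6, 28)}]


def partial_mirror_on_a(window_start, window_end):
    # Site A can only search rows that have already replicated to it.
    visible = [f for f in files if f["arrives_on_a"] <= window_end]
    return [f["name"] for f in visible
            if window_start < f["create_date"] <= window_end]


last_sync = at(4, 4)
downloaded = []
for run_time in (at(4, 5), at(4, 6), at(4, 7)):
    hits = partial_mirror_on_a(last_sync, run_time)
    print(run_time.time(), hits)
    downloaded.extend(hits)
    last_sync = run_time

print("downloaded:", downloaded)
```

All three runs come back empty, and `downloaded` stays empty, even though X is sitting on site A from 4:06:28 onward.  At 4:06 the file is in the window but not yet on site A; at 4:07 it is on site A but no longer in the window.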

 

How to fix

In my mind, Vault Mirror already has a fix: the Full Mirror command.  It will catch any files that slip through the cracks.  In other words, if you run Full Mirror on site A at 4:05, it's guaranteed to pick up all the files that are on site A at 4:05.
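Full Mirror avoids the problem because it doesn't rely on dates at all.  A minimal sketch, assuming a mirror can be reduced to comparing two lists of file names (a real mirror also has to deal with versions, folders and so on); `full_mirror` and the file names are hypothetical.

```python
def full_mirror(files_on_site, already_mirrored):
    """Hypothetical sketch: return every file on the site that is not yet
    mirrored locally. CreateDate is never consulted, so a file that arrived
    late through replication cannot fall outside a search window."""
    return sorted(set(files_on_site) - set(already_mirrored))


# At 4:07, file X has replicated to site A, but the Partial Mirror runs missed it.
files_on_site_a = ["A1.ipt", "B7.ipt", "X.ipt"]
already_mirrored = ["A1.ipt", "B7.ipt"]
print(full_mirror(files_on_site_a, already_mirrored))  # ['X.ipt']
```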

Another solution is to construct 2 queries, one for replicated data and one for local data.  Watch Folder has this same bug, and I used this approach to fix it.  The program remembers 2 separate dates and uses them to run 2 separate queries:

  • Query 1: Remember the last time the command was run, and search for all new files owned by the local workgroup since that time.
  • Query 2: Remember the greatest CreateDate of any replicated file seen so far, and search for all files not owned by the local workgroup with a CreateDate after that saved date.
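The two queries can be sketched like this.  This is an illustration of the approach, not the Watch Folder code: `FileRecord`, its `owner` field, `TwoQueryWatcher` and the choice to start both saved dates at the same time are all assumptions.  The `files` argument to `run()` stands for whatever has already replicated to the local site at that moment.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FileRecord:
    name: str
    create_date: datetime
    owner: str  # workgroup that owns the file (hypothetical field)


class TwoQueryWatcher:
    """Hypothetical sketch of the two-query approach: one saved date for
    locally owned files, a separate one for files owned elsewhere."""

    def __init__(self, local_workgroup: str, start: datetime):
        self.local_workgroup = local_workgroup
        self.last_run = start             # saved date for Query 1
        self.max_replicated_date = start  # saved date for Query 2

    def run(self, files: list[FileRecord], now: datetime) -> list[FileRecord]:
        # Query 1: locally owned files created since the last run.
        local_hits = [f for f in files
                      if f.owner == self.local_workgroup
                      and self.last_run < f.create_date <= now]
        # Query 2: files owned by other workgroups whose CreateDate is newer
        # than the greatest replicated CreateDate seen so far.
        remote_hits = [f for f in files
                       if f.owner != self.local_workgroup
                       and f.create_date > self.max_replicated_date]
        self.last_run = now
        if remote_hits:
            self.max_replicated_date = max(f.create_date for f in remote_hits)
        return local_hits + remote_hits


def at(hour, minute, second=0):
    return datetime(2024, 1, 1, hour, minute, second)


x = FileRecord("X", at(4, 5, 32), owner="B")
watcher = TwoQueryWatcher("A", start=at(4, 4))
print(watcher.run([], at(4, 6)))                     # [] (X not on site A yet)
print([f.name for f in watcher.run([x], at(4, 7))])  # ['X'] (found after replication)
```

Replaying the earlier timeline, X is now picked up at 4:07 because Query 2 compares against the last replicated CreateDate it saw rather than the time of the last run.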

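The second solution is not 100% reliable, but I thought it was good enough for Watch Folder.  It incorrectly assumes that every workgroup replicates at the same time.  If one workgroup replicates faster than another, its newer files can push the saved date past an older file that is still in transit, and that file will never be found.  Manually transferring ownership of files will also break this algorithm.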
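To see how unequal replication timing defeats Query 2, here is a small simulation with two remote workgroups, B and C, replicating to site A.  The file names, timings and the `arrives_on_a` field are made up for illustration.

```python
from datetime import datetime
from typing import NamedTuple


def at(hour, minute, second=0):
    return datetime(2024, 1, 1, hour, minute, second)


class Replicated(NamedTuple):
    name: str
    owner: str
    create_date: datetime
    arrives_on_a: datetime  # simulation only; Vault does not record this


# Two remote workgroups with different replication delays (hypothetical data).
files = [
    Replicated("Y", "C", at(4, 5, 10), at(4, 8, 0)),   # C replicates slowly
    Replicated("Z", "B", at(4, 5, 40), at(4, 6, 20)),  # B replicates quickly
]

max_replicated_date = at(4, 4)  # Query 2's saved date on site A
found = []
for run_time in (at(4, 7), at(4, 9)):
    visible = [f for f in files if f.arrives_on_a <= run_time]
    # Query 2: remote files newer than the greatest CreateDate seen so far.
    hits = [f for f in visible
            if f.owner != "A" and f.create_date > max_replicated_date]
    if hits:
        max_replicated_date = max(f.create_date for f in hits)
    found.extend(f.name for f in hits)

print(found)  # ['Z']: Y is never reported
```

At 4:07 only Z has arrived, so the saved date jumps to 4:05:40.  When Y finally lands at 4:08, its CreateDate of 4:05:10 is already behind the saved date, so no later run will ever return it.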

 

Conclusion

These are the only 2 solutions I can think of at the moment, but I'm sure there are more.  The bigger concern is that there are other potential bugs that can happen in a replicated environment.  This is just one example of such a bug, and the purpose is to get you thinking in terms of replicated data.

The best way to locate and fix these issues is to test in a replicated environment.  I don't think it's possible to catch these issues on a single workgroup.

