Wednesday, May 19, 2010

Understanding ReadDirectoryChangesW - Part 1

The longest, most detailed description in the world of how to successfully use ReadDirectoryChangesW.

This is Part 1 of 2. This part describes the theory and Part 2 describes the implementation.

Download the sample code for this article.

I have spent this week digging into the barely-documented world of ReadDirectoryChangesW and I hope this article saves someone else some time. I believe I've read every article I could find on the subject, as well as numerous code samples. Almost all of the examples, including the one from Microsoft, either have significant shortcomings or contain outright mistakes.

You'd think that this problem would have been a piece of cake for me, having been the author of Multithreading Applications in Win32, where I wrote a chapter about the differences between synchronous I/O, signaled handles, overlapped I/O, and I/O completion ports. Except that I only write overlapped I/O code about once every five years, which is just about long enough for me to forget how painful it was the last time. This endeavor was no exception.

Four Ways to Monitor Files and Directories

First, a brief overview of monitoring directories and files. In the beginning there was SHChangeNotifyRegister. It was implemented using Windows messages and so required a window handle. It was driven by notifications from the shell (Explorer), so your application was only notified about things that the shell cared about - which almost never aligned with what you cared about. It was useful for monitoring things that the user did in Explorer, but not much else.

SHChangeNotifyRegister was fixed in Windows Vista so it could report all changes to all files, but it was too late - there are still several hundred million Windows XP users and that's not going to change any time soon.

SHChangeNotifyRegister also had a performance problem, since it was based on Windows messages. If there were too many changes, your application would start receiving roll-up messages that just said "something changed" and you had to figure out for yourself what had really happened. Fine for some applications, rather painful for others.

Windows 2000 brought two new interfaces, FindFirstChangeNotification and ReadDirectoryChangesW. FindFirstChangeNotification is fairly easy to use but doesn't give any information about what changed. Even so, it can be useful for applications such as fax servers and SMTP servers that can accept queue submissions by dropping a file in a directory. ReadDirectoryChangesW does tell you what changed and how, at the cost of additional complexity.

Similar to SHChangeNotifyRegister, both of these new functions suffer from a performance problem. They can run significantly faster than shell notifications, but moving a thousand files from one directory to another will still cause you to lose some (or many) notifications. The exact cause of the missing notifications is complicated. Surprisingly, it apparently has little to do with how fast you process notifications.

Note that FindFirstChangeNotification and ReadDirectoryChangesW are mutually exclusive. You would use one or the other, but not both.

Windows XP brought the ultimate solution, the Change Journal, which could track in detail every single change, even if your software wasn't running. Great technology, but equally complicated to use.

The fourth and final solution is to install a File System Filter, which was used in the popular SysInternals FileMon tool. There is a sample of this in the Windows Driver Kit (WDK). However, this solution is essentially a device driver and so can potentially cause system-wide stability problems if not implemented exactly correctly.

For my needs, ReadDirectoryChangesW was a good balance of performance versus complexity.

The Puzzle

The biggest challenge to using ReadDirectoryChangesW is that there are several hundred possibilities for combinations of I/O mode, handle signaling, waiting methods, and threading models. Unless you're an expert on Win32 I/O, it's extremely unlikely that you'll get it right, even in the simplest of scenarios. (In the list below, when I say "call", I mean a call to ReadDirectoryChangesW.)

A. First, here are the I/O modes:
  1. Blocking synchronous
  2. Signaled synchronous
  3. Overlapped asynchronous
  4. Completion Routine (aka Asynchronous Procedure Call or APC)
B. When calling the WaitForXxx functions, you can:
  1. Wait on the directory handle.
  2. Wait on an event object in the OVERLAPPED structure.
  3. Wait on nothing (for APCs).
C. To handle notifications, you can use:
  1. Blocking
  2. WaitForSingleObject
  3. WaitForMultipleObjects
  4. WaitForMultipleObjectsEx
  5. MsgWaitForMultipleObjectsEx
  6. I/O Completion Ports
D. For threading models, you can use:
  1. One call per worker thread.
  2. Multiple calls per worker thread.
  3. Multiple calls on the primary thread.
  4. Multiple threads for multiple calls. (I/O Completion Ports)

Finally, when calling ReadDirectoryChangesW, you specify flags to choose what you want to monitor, including file creation, last modification date change, attribute changes, and others. You can use one flag per call and issue multiple calls, or you can use multiple flags in one call. Using multiple flags in one call is always the right solution. If you think you need multiple calls with one flag per call to make it easier to figure out what happened, then you need to read more about the data contained in the notification buffer returned by ReadDirectoryChangesW.
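
To make the point about the notification buffer concrete, here is a minimal sketch of walking that buffer. It is not the sample code for this article. On Windows you would normally use the FILE_NOTIFY_INFORMATION structure from the SDK headers; this version reads the same documented fields (NextEntryOffset, Action, FileNameLength, FileName) by byte offset so that it compiles anywhere, and it assumes a little-endian host. The names Change, actionName, and parseNotifyBuffer are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// One decoded change record (hypothetical type, for illustration only).
struct Change {
    std::uint32_t action;  // FILE_ACTION_* value taken from the record
    std::u16string name;   // UTF-16 path, relative to the watched directory
};

// Label for the documented FILE_ACTION_* values 1 through 5.
const char* actionName(std::uint32_t action) {
    switch (action) {
        case 1: return "added";               // FILE_ACTION_ADDED
        case 2: return "removed";             // FILE_ACTION_REMOVED
        case 3: return "modified";            // FILE_ACTION_MODIFIED
        case 4: return "renamed (old name)";  // FILE_ACTION_RENAMED_OLD_NAME
        case 5: return "renamed (new name)";  // FILE_ACTION_RENAMED_NEW_NAME
        default: return "unknown";
    }
}

// Walks the records in a buffer filled by ReadDirectoryChangesW.
// 'bytes' is the number of bytes the call reported as returned.
// A zero-length result carries no records; Microsoft's documentation
// describes that case as an overflow, where the caller should rescan
// the directory instead of relying on individual notifications.
std::vector<Change> parseNotifyBuffer(const unsigned char* buffer,
                                      std::size_t bytes) {
    constexpr std::size_t kHeader = 12;  // three 32-bit fields precede the name
    std::vector<Change> changes;
    std::size_t offset = 0;
    while (bytes >= kHeader && offset <= bytes - kHeader) {
        std::uint32_t next = 0, action = 0, nameBytes = 0;
        std::memcpy(&next, buffer + offset, 4);           // NextEntryOffset
        std::memcpy(&action, buffer + offset + 4, 4);     // Action
        std::memcpy(&nameBytes, buffer + offset + 8, 4);  // FileNameLength (bytes)
        if (nameBytes > bytes - offset - kHeader) break;  // truncated record
        // FileName is not null-terminated; its length comes from the record.
        std::u16string name(nameBytes / 2, u'\0');
        std::memcpy(&name[0], buffer + offset + kHeader, name.size() * 2);
        changes.push_back({action, std::move(name)});
        if (next == 0 || next > bytes - offset) break;    // last or bad record
        offset += next;
    }
    return changes;
}
```

Because every record carries its own action code, a single call with several FILE_NOTIFY_CHANGE_* flags combined still tells you exactly what happened to each file.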

If your head is now swimming in information overload, you can easily see why so many people have trouble getting this right.

Recommended Solutions

So what's the right answer? Here's my opinion, depending on what's most important:

Simplicity - A2C3D1 - Each call to ReadDirectoryChangesW runs in its own thread and sends the results to the primary thread with PostMessage. Most appropriate for GUI apps with minimal performance requirements. This is the strategy used in CDirectoryChangeWatcher on CodeProject. This is also the strategy used by Microsoft's FWATCH sample.

Performance - A4C6D4 - The highest performance solution is to use I/O completion ports, but, as an aggressively multithreaded solution, it's also a very complex solution that should be confined to servers. It's unlikely to be necessary in any GUI application. If you aren't a multithreading expert, stay away from this strategy.

Balanced - A4C5D3 - Do everything in one thread with Completion Routines. You can have as many outstanding calls to ReadDirectoryChangesW as you need. There are no handles to wait on, since Completion Routines are dispatched automatically. You embed a pointer to your object in the OVERLAPPED structure that is passed to the callback, so it's easy to keep callbacks matched up to their original data structure.

Originally I had thought that GUI applications could use MsgWaitForMultipleObjectsEx to intermingle change notifications with Windows messages. This turns out not to work because dialog boxes have their own message loop that's not alertable, so a dialog box being displayed would prevent notifications from being processed. Another good idea steamrolled by reality.

Wrong Techniques

As I was researching this solution, I saw a lot of recommendations that ranged from dubious to wrong to really, really wrong. Here's some commentary on what I saw.

If you are using the Simplicity solution above, don't use blocking calls, because the only ways to cancel them are the undocumented technique of closing the handle or the Vista-only CancelSynchronousIo. Instead, use the Signaled synchronous I/O mode by waiting on the directory handle. Also, to terminate threads, don't use TerminateThread, because that doesn't clean up resources and can cause all sorts of problems. Instead, create a manual-reset event object that is used as the second handle in the call to WaitForMultipleObjects. When the event is set, exit the thread.

If you have dozens or hundreds of directories to monitor, don't use the Simplicity solution. Switch to the Balanced solution. Alternatively, monitor a common root directory and ignore files you don't care about.
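
The "monitor a common root and ignore the rest" approach needs a filter on the relative file names that come back in each notification. Here is a minimal sketch of such a filter. It is not taken from the sample code. The function names are hypothetical, it assumes the paths are backslash-separated and relative to the watched root, and it folds case for ASCII letters only, which is a simplification of how Windows compares file names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// ASCII-only lowercase. Real file systems fold more characters than this.
char16_t lowerAscii(char16_t c) {
    return (c >= u'A' && c <= u'Z') ? static_cast<char16_t>(c - u'A' + u'a') : c;
}

// True if 'path' is 'dir' itself or lies somewhere below it.
bool startsWithDir(const std::u16string& path, const std::u16string& dir) {
    if (path.size() < dir.size()) return false;
    for (std::size_t i = 0; i < dir.size(); ++i) {
        if (lowerAscii(path[i]) != lowerAscii(dir[i])) return false;
    }
    // Require a separator after the prefix so "logs2" does not match "logs".
    return path.size() == dir.size() || path[dir.size()] == u'\\';
}

// True if the notification's relative path falls under any directory we care about.
bool isInteresting(const std::u16string& relativePath,
                   const std::vector<std::u16string>& subdirs) {
    for (const auto& dir : subdirs) {
        if (startsWithDir(relativePath, dir)) return true;
    }
    return false;
}
```

With one recursive watch on the root, each notification is passed through isInteresting and discarded if it returns false. The trade-off is that the system still generates, and you still receive, notifications for everything else under that root.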

If you have to monitor a whole drive, think twice (or three times) about this idea. You'll be notified about every single temporary file, every Internet cache file, every Application Data change - in short, you'll be getting an enormous number of notifications that could slow down the entire system. If you need to monitor an entire drive, you should probably use the Change Journal instead. This will also allow you to track changes even if your app is not running. Don't even think about monitoring the whole drive with FILE_NOTIFY_CHANGE_LAST_ACCESS.

If you are using overlapped I/O without using an I/O completion port, don't wait on handles. Use Completion Routines instead. This removes the 64 handle limitation, allows the operating system to handle call dispatch, and allows you to embed a pointer to your object in the OVERLAPPED structure. My example in a moment will show all of this.

If you are using worker threads, don't send results back to the primary thread with SendMessage. Use PostMessage instead. SendMessage is synchronous and will not return until the primary thread has processed the message, so a busy primary thread stalls the worker. This would defeat the purpose of using a worker thread in the first place.

It's tempting to try and solve the issue of lost notifications by providing a huge buffer. However, this may not be the wisest course of action. For any given buffer size, a similarly-sized buffer has to be allocated from the kernel non-paged memory pool. If you allocate too many large buffers, this can lead to serious problems, including a Blue Screen of Death. Thanks to an anonymous contributor in the MSDN Community Content.

Jump to Part 2 of this article.

15 comments:

  1. Hi Jim,

    thank you very much for your detailed explanation. After searching the internet for a while, I can say your description helped me a lot and is the most complete one.

    Cheers

  2. Thx for sharing. Great explanation of ReadDirectoryChangesW!

  3. Thanks for your article, and the source code, I found it very useful, saved a lot of time!

    I found one thing in CReadChangesRequest::ProcessNotification():

    if (wstrFilename.Right(1) != L"\\")

    Shouldn't this better be:

    if (m_wstrDirectory.Right(1) != L"\\")

    Regards,

    Jost

    ...

  4. Thank You!!! This helps

  5. Hi Jim,

    Thanks for the great article. Any idea how .NET System.IO.FileSystemWatcher implements its functionality? Would you recommend its use with a timer for watching files dropped via FTP?

    Dave W

    Replies
    1. Dave,

      I haven’t done much with .Net (I’m an SDK guy) so I don’t have a good answer for you about that.

      As I discuss near the beginning of Part 1, FindFirstChangeNotification is a much simpler way to monitor for new files. ReadDirectoryChangesW is more complicated than you need. However, you need to read the discussion about timeouts in the Comments after Part 2.

    2. We use FileSystemWatcher to monitor for SFTP changes coming in on a SAN.

      This complicates things as we don't see all create events, for example, and the client renames their SFTP'd files after copying them onto the SAN.

      We monitor for all events in the directory, using them as an indication that 'something' is happening in the directory. Then we take a directory listing and act on that, looping until nothing is left in the directory, and just registering that other notifications might be occurring - although we take some care to not miss a notification that comes in after the directory listing that says there is nothing left to do.

      This 'strategy' means we are not subject to 'lost notifications'.

  6. Hi Jim. Your article is great and useful. I have a question I hope you can help me with. Antivirus software offers a feature called real-time protection or on-access scanning. As Wikipedia says:
    'real-time' means while data loaded into the computer's active memory: when inserting a CD, opening an email, or browsing the web, or when a file already on the computer is opened or executed.
    I'm interested in writing some code to implement this on-access or real-time functionality.
    Do you have any suggestions for writing code that can monitor active memory changes and retrieve the address of the file responsible for a change, in order to trigger a scan by some tool?
    Thank you very much.

    Replies
    1. Hello Rezatash,

      Windows has some built-in functions for allowing antivirus to do its job, but I've never worked with them. Device change notifications are available at the user level with notifications, but that's too late for antivirus, so it probably needs to be done at the device driver level. I have no experience, sorry.

  7. Very helpful. But a problem can be found in ThreadSafeQueue.h. CThreadSafeQueue::pop() never calls WaitForSingleObject() when the list is not empty.

  8. Hi Jim, thanks for this clarifying article, really. I was browsing the source and a question popped up :)

    In your CReadChangesRequest::NotificationCompletion, in case of error you just delete the CReadChangesRequest object, which is fine, except that I took a look at CReadChangesRequest's destructor and there's no trace of uninit calls of any kind in it. Even "worse", you assume the object has already been uninitialized before being destructed, by using an assertion on the directory's handle.

    So it _seems_ that in that particular case, the CReadChangesServer object (i.e. the owner) continues to think that the request still exists, as it is still registered in its list (a vector here). I did see that m_nOutstandingRequests is decremented, but that does not seem to be enough to stop CReadChangesServer from making assumptions about the request's existence...
    Am I missing something?

    Jean-Charles

  9. polyvertex and Anonymous,

    Thanks so much for your feedback on this article. My schedule at the moment is completely overwhelmed and I don't expect to have time to dig into this for at least several weeks. Your comments definitely point out the complexity of using these APIs. The good news is that this code has been in production use for several years on systems that log all crashes, and we haven't seen any related crashes.

  10. I have been searching for some help on 'tail -f' like solution. Finally, I came across your blog and solved my problem. Thanks!
