Monday 30 January 2012

Triple Buffering as a Concurrency Mechanism (II)

As a commentor on Reddit has pointed out, this implementation is flawed. The following steps illustrate how one of the buffers change be orphaned (0, 1 and 2 refer to the indices within the buffer array):

MacroLine tmp1tmp2CleanDirtySnap
(initial) 012
TRIPLE_BUFFER_FLIP_WRITERtmp1 = clean 0012
TRIPLE_BUFFER_NEW_SNAPtmp2 = clean 00012
TRIPLE_BUFFER_NEW_SNAPclean = snap 00212
TRIPLE_BUFFER_NEW_SNAPsnap = tmp2 00210
TRIPLE_BUFFER_FLIP_WRITERclean = dirty 00110
TRIPLE_BUFFER_FLIP_WRITERdirty = tmp1 0100

...and we can no longer access buffer two. The problem comes from trying to order two sets of three operations so that pretty much any combination leaves us in a valid state. My revised solution (which will fix the problem of double-snapping in the first post) will use a form of Optimistic Concurrency Control - basically in the two main macros we take a copy of the pointers, do the relevant swap and then atomically overwrite the three pointers only if nothing else has changed them since we took our copy. As three pointers are too big for a single atomic operation, we'll compact them down into three indices into the buffer array, encoded in three pairs of bits in an int:

/* bit flags are (unused) (new write) (2x dirty) (2x clean) (2x snap) */
#define TRIPLE_BUFFER_TYPE(TYPENAME, TYPE) \
  struct TYPENAME { \
    TYPE buffer[3]; \
    volatile uint_fast8_t flags; \
  }

Note that they are now volatile too - we don't want the compiler to optimise out checks to see if they've changed. I've also moved the to the end of the struct - I'm beginning to think about alignment and I'm assuming if the struct is aligned in a useful way I don't want to misalign the three buffers by the 8 bits of the flag array. Initialising is a bit more simple:

/* initially dirty = 0, clean = 1 and snap = 2 */
#define TRIPLE_BUFFER_NEW(NAME,TYPENAME) \
  struct TYPENAME NAME; \
  NAME.flags = 0x6;

We have to update our accessors to get the index of the clean or snap buffer out of the flags, and return a pointer to that element of the array:

#define TRIPLE_BUFFER_SNAP_PTR(BUFFER) &BUFFER.buffer[BUFFER.flags & 0x3]
#define TRIPLE_BUFFER_WRITE_PTR(BUFFER) &BUFFER.buffer[(BUFFER.flags & 0x30) >> 4]

Google helps with binary-to-hex conversions. The meat of this problem is still the flip and snap operations. I've added an extra bit flag that is set by the writer and cleared by the reader. The reader can use this to determine if there have been any writes since its last snap, and if not it won't take another snap. This means if the macro is called twice it won't snap and "un-snap" as the previous implementation did:

#define TRIPLE_BUFFER_NEW_SNAP(BUFFER) \
  do { \
    uint_fast8_t flagsNow; \
    uint_fast8_t newFlags; \
    do { \
      flagsNow = BUFFER.flags; \
      if((flagsNow & 0x40)==0) break; \
      newFlags = (flagsNow & 0x30) | ((flagsNow & 0x3) << 2) | ((flagsNow & 0xC) >> 2); \
    } while(!__sync_bool_compare_and_swap(&BUFFER.flags, flagsNow, newFlags)); \
  } while(0)

The I-have-written flag is in the seventh most significant bit (0x40) and the newFlags bit-shifting logic is just building an int by or'ing the extracted indices after shifting them to their new positions. The flip method is similar:

#define TRIPLE_BUFFER_FLIP_WRITER(BUFFER) \
  do { \
    uint_fast8_t flagsNow; \
    uint_fast8_t newFlags; \
    do { \
      flagsNow = BUFFER.flags; \
      newFlags = 0x40 | ((flagsNow & 0xC) << 2) | ((flagsNow & 0x30) >> 2) | (flagsNow & 0x3); \
    } while(!__sync_bool_compare_and_swap(&BUFFER.flags, flagsNow, newFlags)); \
  } while(0)

And you can add the following to the bottom of the unit test to prove the snap-unsnap problem's been fixed:

*TRIPLE_BUFFER_WRITE_PTR(it) = 7;
TRIPLE_BUFFER_FLIP_WRITER(it);
*TRIPLE_BUFFER_WRITE_PTR(it) = 8;
TRIPLE_BUFFER_FLIP_WRITER(it);

TRIPLE_BUFFER_NEW_SNAP(it);
assert(*TRIPLE_BUFFER_SNAP_PTR(it) == 8);
TRIPLE_BUFFER_NEW_SNAP(it);
assert(*TRIPLE_BUFFER_SNAP_PTR(it) == 8);

5 comments:

  1. Good thing you posted a follow up :)
    Either way it will probably be some time before I have time to try out your triple buffer scheme.

    Thanks again for sharing

    ReplyDelete
  2. I built one of these years ago in a calculation app with throttled publishing, except the output queue had priorities. Nice.

    ReplyDelete
  3. Hi Remis,

    Thanks for the excellent post(s)!
    I've made a C++ port of your code using C++11 and templates.
    Take a look if you like:
    https://github.com/p4checo/triplebuffer-sync

    ReplyDelete