ScalarFlow is the data set accompanying our paper ScalarFlow: A Large-Scale Volumetric Data Set of Real-world Scalar Transport Flows for Computer Animation and Machine Learning by M.-L. Eckert, K. Um, and N. Thuerey.

21 front views of reconstructed densities.


Data Set Download
Source Code
Paper with Background Information

Our data set contains 104 reconstructions with 150 frames of 3D density, 3D velocity, 2D rendered images, and 2D input images (also see the PhD thesis by M.-L. Eckert [Eck19]). The full data set is ca. 450 GB in size, but can be downloaded in parts. The 3D quantities have a resolution of 100×178×100, while the rendered images have a resolution of 600×1062 and the input images a resolution of 1080×1920. The reconstructions use the same camera calibration, which is given as text files containing all rays for each of our five cameras. In the following, we show the front and side views of reconstructed density and velocity and a comparison of the rendered density and the target input images.

If you find the data useful, you can use the following bibtex citation to reference the data set:

@article{ScalarFlow2019,
  author  = {Marie-Lena Eckert and Kiwon Um and Nils Thuerey},
  title   = {ScalarFlow: A Large-Scale Volumetric Data Set of Real-world Scalar Transport Flows for Computer Animation and Machine Learning},
  journal = {ACM Transactions on Graphics},
  volume  = {38(6):239},
  year    = {2019}
}

Front and side views of reconstructed 3D density (left) and velocity (right).
Rendered reconstructed density (top) and target input images (bottom).

Each reconstruction is approximately 4.5 GB in size and lives in its own folder, which contains thumbnail images and .mp4 videos summarizing the reconstructed quantities visually, a folder ‘input’, and a folder ‘reconstruction’. The ‘input’ folder contains the original recorded videos and the extracted, unprocessed and post-processed images as .jpg and .npz files. We store the 2D images as 3D arrays where the images are stacked along the z-axis. The ‘reconstruction’ folder contains .npz files for density, velocity, rendered images, and input images, both in their original form and with the inflow section cropped. For cropping the inflow, we delete the density and velocity in the lower 15 cells. The quantities without the inflow area are additionally visualized as .jpg. In each ‘reconstruction’ folder, we also store a ‘description.json’ file where all relevant grid names, parameter values, and metadata are listed. Furthermore, the Python script used for initiating the reconstruction is stored as well.
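As a rough illustration of how the stored grids might be read, the following sketch loads an .npz archive with NumPy and removes the inflow region by zeroing the lowest 15 cells. The array key inside the archive and the choice of the height axis are assumptions made for this example; the actual grid names are listed in ‘description.json’.

```python
import numpy as np


def load_first_array(path):
    """Return the first array stored in an .npz archive.

    The key name is not assumed here; we simply take the first entry.
    """
    with np.load(path) as archive:
        key = archive.files[0]
        return archive[key]


def crop_inflow(grid, cells=15, height_axis=1):
    """Return a copy of `grid` with the lowest `cells` layers set to zero.

    `height_axis=1` is an assumption based on the 100x178x100 resolution,
    where 178 is taken to be the vertical extent.
    """
    cropped = grid.copy()
    index = [slice(None)] * cropped.ndim
    index[height_axis] = slice(0, cells)
    cropped[tuple(index)] = 0
    return cropped
```

If the arrays carry a leading or trailing channel dimension, `height_axis` has to be adjusted accordingly.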

The following table shows the max, min, mean, and standard deviation values over all reconstructions and for some individual samples. We calculate the Euclidean norm of the velocity. Since many values are zero, we also indicate the average percentage of cells with non-zero values.

            min    max      mean     std. deviation    % of nz cells
velocity    0      7.018    0.080    0.200             91.26
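The statistics described above can be sketched as follows. The function assumes a velocity array whose last axis holds the three vector components; whether the published numbers were computed in exactly this way (for example, over which frames) is not stated here, so this is only an illustration.

```python
import numpy as np


def velocity_statistics(velocity):
    """Summarize the Euclidean norm of a velocity field.

    velocity: array of shape (..., 3), one vector per cell (assumed layout).
    """
    magnitude = np.linalg.norm(velocity, axis=-1)
    return {
        "min": float(magnitude.min()),
        "max": float(magnitude.max()),
        "mean": float(magnitude.mean()),
        "std": float(magnitude.std()),
        # Share of cells whose velocity magnitude is non-zero, in percent.
        "nonzero_percent": 100.0 * np.count_nonzero(magnitude) / magnitude.size,
    }
```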
Overview over 104 reconstructed densities in ScalarFlow, front view.
Overview over 104 reconstructed densities in ScalarFlow, side view.

In the following, we present our simple, affordable, and reproducible hardware setup for capturing real fluid flows. We capture fog phenomena, which are representative of single-flow fluids, such as hot smoke plumes. Therefore, we refer to our traceable markers as smoke although we are using a fog machine to create them.

Hardware Setup

With our hardware setup, we create and record controllable, repeatable smoke plumes. Fog fluid for common fog machines is water-based while some fog machines diffuse water-free oils. Our visible tracers are droplets of distilled water with propylene glycol. Our setup consists of inexpensive components as listed below, which are easily replaceable and adjustable to custom needs. In total, the whole setup consists of hardware that is available for less than 1100 $. Therefore, in contrast to previous capturing setups in graphics [HED05, XIAP17], our setup is economical and can more easily be recreated.

We make use of several Raspberry Pi computers for recording, for controlling the amount and speed of smoke release, and for positioning our calibration board. We first discuss challenges, describe the smoke generation, present background and lighting, specify the details of our recording hard- and software, and outline post-processing steps. Our implicit and automatic camera calibration method is specified in the last paragraph.


Figure 1: A rising smoke plume in a) and an overview of our simple fluid capturing setup with black molton as background, covered smoke box with an opening on the top, Raspberry Pis including camera modules mounted on microphone stands, and three diffuser lights in b) and c). The smoke generation part consists of the elevated smoke container with its top and bottom valves, an electric heating cable, and a fog machine as illustrated in d). The volume to reconstruct is placed above the top opening and is calibrated with a movable calibration board. These components are covered with black molton when recordings take place, as shown in the photographs in a), b), and c).

An exemplary rising smoke plume produced with our capturing setup is displayed in Figure 1a. The capturing setup is shown in Figure 1b, where the black background cloth and the three lights for illumination are visible. A different angle of our setup is presented in Figure 1c, which shows five Raspberry Pis mounted on microphone stands recording a rising smoke plume. A sketch of the components necessary for smoke generation is shown in Figure 1d. While recording, those components are covered with black cloth. The dimensions of our reconstruction domain, which is highlighted in blue in Figure 1d, are 50 cm x 90 cm x 50 cm, where we denote the height by L = 0.9 m. The kinematic viscosity of air at a temperature of 20 °C is ν = 1.516 x 10^(-5) m^2/s, see [Enga]. Average smoke plumes rise with upward speeds of approximately u = 0.27 m/s to u = 0.4 m/s, leading to Reynolds numbers from Re = uL/ν = 0.27 x 0.9 / (1.516 x 10^(-5)) ≈ 1.6 x 10^4 to Re = 0.4 x 0.9 / (1.516 x 10^(-5)) ≈ 2.4 x 10^4. These Reynolds numbers indicate that the generated flows contain structures transitioning from laminar to turbulent. We produce our captures with a temperature of 34 °C, where interesting and swirly structures are formed in the smoke plumes while passing through our calibrated volume.
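For reference, the Reynolds number Re = uL/ν can be evaluated directly from the quantities quoted above. The short sketch below does only that; the function and constant names are chosen for this example.

```python
def reynolds_number(speed, length, kinematic_viscosity):
    """Re = u * L / nu for a characteristic speed and length."""
    return speed * length / kinematic_viscosity


NU_AIR_20C = 1.516e-5   # kinematic viscosity of air at 20 degrees C, m^2/s
DOMAIN_HEIGHT = 0.9     # height L of the reconstruction domain, m

if __name__ == "__main__":
    for u in (0.27, 0.4):  # observed upward plume speeds, m/s
        re = reynolds_number(u, DOMAIN_HEIGHT, NU_AIR_20C)
        print(f"u = {u} m/s -> Re = {re:.3g}")
```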


The first challenge we face when capturing fluid behavior from a common fog machine is controlling the speed of the exiting hot marker particles. Our machine produces a fast, hot stream of diffused water, which is not controllable as such. To slow down the stream, we fill the smoke into a container with an opening at the top. However, with an open outflow area, we are still not able to control the amount of smoke or the precise time at which the smoke exits the container. Therefore, we need a valve for controlling the opening. As conservation of volume holds, we need a second opening with its own valve to allow adequate air inflow. Otherwise, smoke is not able to leave the container. However, smoke cools down very quickly, which reduces its buoyancy, so that smoke stays in the container even when both valves are open. Hence, we control the temperature of the smoke by heating the air in our container. With control over the smoke temperature as well as the density and speed of the outflowing smoke, we are able to generate controllable and varying smoke plumes on demand.

The second challenge is to create high contrast between the irrelevant background and the foreground smoke plumes and to light the scene uniformly in order to simplify post-processing of the captured images. Cameras should see a bright and detailed smoke volume on a dark background without obstacles or reflections. In order to make full use of the available information in an image, we avoid over- and underexposure of the smoke.

The third challenge involves synchronizing and storing highly resolved recordings. The cameras should record the dynamic smoke plumes at very similar times. Unfortunately, frames are dropped and lost if the amount of data to store exceeds the writing speed of the Raspberry Pi computers when writing onto their SD cards. It is challenging to find optimal temporal and spatial resolution as well as bit rate for high-quality recordings without dropping any frames.

The last challenge is to obtain information about how camera rays travel through the reconstruction volume in order to establish the relation between the density in the volume and each pixel’s intensity. We obtain the pixel-voxel correspondences through accurate camera calibration, as described in the calibration part at the end of this page.

Smoke Generation


Figure 2: Components of our smoke generation setup: smoke box with heating cable, top and bottom openings with closing lids controlled by servo motors in a), and a Raspberry Pi driving the remote control of the fog machine in b).

A sketch and photographs of our smoke generation components are shown in Figure 1d and Figure 2. As smoke container, we use an inexpensive, insulated Styrofoam box. The dimensions of the box are 54.5 cm x 35 cm x 30 cm. It features a thermal conductivity of 0.03 W/mK at a temperature of 25 °C [Engb], and hence exhibits low dissipation of heat. We use an electric heating cable to heat the smoke to a target temperature. The maximum temperature is 60 °C, which does not pose any safety risks for the box, as the melting point of polystyrene is 240 °C. Furthermore, our heating cable is connected to a safety timer and is controlled by a thermostat. With the thermostat, we ensure that the desired temperature is reached, kept, and not exceeded. In order to avoid damaging the surrounding box and to uniformly distribute the warmth, our heating cable is wired around a metallic fence as visualized in Figure 2a.

As polystyrene is easy to cut, we simply cut openings into our Styrofoam box with a carpet cutter. The top opening allows for smoke outflow while the bottom opening allows air to flow into the box. Both can be closed with a movable lid, called a valve, as displayed for the top valve on the right of Figure 2a. The valves are controlled via strings connected to servo motors driven by a Raspberry Pi computer and are closed when filling the box with smoke. For controlled release of smoke, the bottom and top valves are opened to a certain extent. If the bottom valve is closed, the smoke does not rise from the box due to volume conservation. The box is elevated from the ground by four pillars to allow for proper air flow below the box. The bottom opening measures 12 cm x 12 cm while the top opening measures 7 cm x 7 cm. The opening extent of both valves and the smoke temperature influence the amount and speed of rising smoke and hence how many turbulent structures are created.

Besides the top and bottom openings, we cut a circular hole into the side of the box to permit smoke inflow. To guide the smoke from the fog machine into the side opening, we place a straight silicone hose from the machine’s nozzle to the box. The silicone hose has a diameter of 25 mm, features low thermal conductivity, is resistant to humidity, and is physically stable. The smoke machine is a Eurolite N-10. To automate and accurately steer smoke captures, we control the smoke machine’s remote control with a Raspberry Pi, which is shown in Figure 2b. When the green LED lights up, the machine is ready to release smoke, and the Raspberry Pi then sends a signal to release smoke for a certain amount of time. The longer the signal is sent, the more smoke is pushed into the box.

With our box, adjustable smoke temperature, controlled smoke inflow, smoke outflow, and air inflow, we are able to create manipulable steady smoke flows forming interesting and turbulent structures. One example for such a plume is shown in Figure 1a.

Background and Lighting

In order to decrease reflections in the background, we put black molton cloth behind our smoke plumes. We use two large movable walls, extended in height with long sticks, to form a black background of approximately 2.8 m x 2.2 m, as shown in Figure 1b. To reduce the required size of the movable walls, we arrange them at a small angle to each other, imitating a curved background. Hence, the walls serve as background for a wider range of camera angles than they would without the imitated curve. Black molton cloth forms minimal wrinkles and absorbs light, which reduces reflections. Furthermore, we can bend it to fit the slight angle between our two movable walls. In contrast, black paper cannot be bent and produces glossy effects due to reflections, which influence the resulting pixel intensities in the images and thereby impair the reconstruction quality. The black background color is beneficial, as it increases the contrast between the background and the bright smoke plumes.

To further optimize contrast, illumination of the scene is important. Our goal is to illuminate the smoke plume uniformly from each angle, such that the perceived brightness is equal for each camera and smoke structures are clearly visible. We target fully exploiting the capacity of storing information in images with intensities from 0 to 255 where the background color should be black, i.e., 0, and the brightest parts of the smoke white, i.e., 255. Hence, we install three lights with diffuser softboxes in front of them, which produce a diffuse, soft, and nearly uniform illumination. We place the lights behind the cameras as shown in Figure 1b.


We record the smoke plumes with camera modules attached to Raspberry Pi computers. The computers are Raspberry Pi Models 1 or 2, while the cameras are version 1 modules.


Figure 3: Arrangement and close-up of our Raspberry Pis with mounted camera modules.


We mount both computers and cameras on microphone stands in order to flexibly adjust their position and orientation, as shown in Figure 1b and Figure 1c. We arrange the cameras in a partial circle of approximately 90° between the two outermost camera positions. The cameras are placed at an equal distance of 120 cm to the smoke release, at a similar height as the smoke release, and facing slightly upwards. The positioning of the cameras is displayed in Figure 3a. A close-up of one Pi computer with a mounted camera module is shown in Figure 3b. The Raspberry Pis are connected to a switch and a router via Ethernet, allowing them to communicate with a host computer. Each Raspberry Pi has a generic SD card or a MicroSD card with at least 10 MB/s of writing speed. We ensure that cameras do not see each other in order to facilitate post-processing of the images. This limits the angle the cameras are able to cover and the orientation of the outermost cameras.

Technical Details

The cameras have three physical ports: a video, still, and preview port. We record the smoke plumes via the video port using an H264 encoder. Although the video data contains more noise compared to images taken via the still port, the video port is better suited for taking multiple images within short time intervals. The required frequency of taking images could not be realized by using the still port. We use camera video mode 1, where the spatial resolution is 1920 x 1080 pixels. We set the temporal resolution to 60 fps.

We empirically determined suitable exposure correction values for each camera to avoid over- and underexposure. The auto-white-balance is corrected by iteratively taking pictures of a white sheet of paper and adjusting the gain values until the color white is also perceived as white in each camera. White balancing is the process of eliminating chromatic discrepancies in images captured by digital cameras due to varying illumination conditions. The correction is performed by simple matrix multiplications.
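To make the remark about matrix multiplications concrete, the following sketch applies per-channel gains to an RGB image by multiplying each pixel with a diagonal 3x3 matrix. The gain values in the usage note are placeholders, and in the actual setup the gains are set on the cameras rather than applied afterwards, so this is only an illustration of the underlying operation.

```python
import numpy as np


def apply_white_balance(image, gains):
    """Scale the R, G, B channels of an (H, W, 3) uint8 image.

    gains: three per-channel factors (hypothetical values, found empirically).
    """
    correction = np.diag(gains)                      # 3x3 diagonal matrix
    balanced = image.astype(np.float32) @ correction  # one matrix product per pixel
    return np.clip(balanced, 0, 255).astype(np.uint8)
```

For example, `apply_white_balance(img, (1.5, 1.0, 0.5))` brightens the red channel and darkens the blue channel; values that exceed 255 are clipped.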


As we record dynamic smoke plumes, it is important that each camera captures the plume formation process at approximately the same time. We control the cameras by connecting all Raspberry Pi clients to a host computer, from which the whole capturing process is directed. The clients receive the command to start the capture at a certain time. Before starting the captures, the host sets exposure and white balance values for each camera.

Another important factor is the size of the data to be stored and the writing speed of the Raspberry Pi computers. Video streams can only be fully stored if the combination of frame rate, resolution, and bit rate does not exceed the possible writing speed of the SD card. Otherwise, frames are dropped and lost, which leads to an inconsistent and varying frame rate for each camera. If the frame rate is set too low or the image resolution is too small, we lose information about the temporal or spatial behavior of the smoke plumes. The bit rate determines how many bits are used to encode a frame. It takes the motion rank into account, which specifies how much the motion varies across frames. If the bit rate is set too low, compression artifacts occur, which average pixel values and blur interesting and fine-scale smoke details. Our final capture videos exhibit a bit rate of 19848 kbps, a spatial resolution of 1920 x 1080 pixels, and a temporal resolution of 60 fps.
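A quick back-of-the-envelope check relates the stated bit rate to the SD card writing speed. The sketch assumes 1 kbps = 1000 bit/s and 1 MB = 10^6 bytes, and it ignores file system overhead and fluctuations in the sustained writing speed.

```python
def required_write_speed_mb_per_s(bit_rate_kbps):
    """Convert a video bit rate in kbps into the data rate in MB/s."""
    bits_per_second = bit_rate_kbps * 1000
    bytes_per_second = bits_per_second / 8
    return bytes_per_second / 1e6


if __name__ == "__main__":
    needed = required_write_speed_mb_per_s(19848)   # bit rate of the final videos
    card_speed = 10.0                               # minimum card speed, MB/s
    print(f"needed: {needed:.3f} MB/s, available: {card_speed} MB/s")
```

Under these assumptions, the final bit rate corresponds to roughly 2.5 MB/s, which leaves headroom below the 10 MB/s minimum writing speed of the cards.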



Figure 4: Effect of our post-processing pipeline demonstrated on one smoke plume recorded from two distinct camera views. The first image is the captured image converted to gray scale, the second is after applying denoising, and the third shows the fully post-processed image, i.e., after subtracting the background and after thresholding. The denoising effect is not obvious to observe, but removes noise in the background as well as in the smoke, which leads to smoother temporal behavior. The small insets on the lower right show a close-up of the noisy background area left to the plume. In image sequences, this time-varying noise is perceived as temporal flickering.

We first convert the H264-encoded videos to .mp4 in order to facilitate visual inspection. Then, we extract single frames from the .mp4 videos and convert them to gray scale. In order to reduce noise and hence temporal flickering, we use a Non-Local Means Denoising procedure [BCM11], called fastNlMeansDenoising in OpenCV, to denoise our sequences. The mean of the spatial and temporal neighbors is used to adjust the value of each pixel. As such, pixel windows of different sizes are used. This denoising procedure is computationally expensive but provides excellent denoised image sequences. We use a filter strength of 3, a temporal window size of 7, and a spatial search window size of 21. In order to eliminate background unevenness like wrinkles or varying brightness, we subtract a frame from early in the sequence from each frame. This early frame does not show any smoke yet, but we do not use the very first frame, since the recording parameters need some time to settle before artifacts become minimal. As a final step, we apply zero thresholding, where we set each pixel with a value below 8 to zero so that remaining background noise is not interpreted as moving smoke particles. If images were recorded with more complex lighting and a diverse background, we could include advanced methods from image processing to smoothly extract the smoke plume. With our implicit camera calibration technique outlined in the following section, we do not need to undistort the captured images in our post-processing pipeline.
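The last two steps of this pipeline, background subtraction and zero thresholding, can be sketched in a few lines of NumPy. The sketch assumes a gray-scale sequence that has already been denoised; the index of the early background frame is a hypothetical choice, while the threshold of 8 is taken from the text above.

```python
import numpy as np


def subtract_background_and_threshold(frames, background_index=1, threshold=8):
    """Remove a static background and suppress residual noise.

    frames: (T, H, W) uint8 gray-scale sequence, assumed to be denoised.
    background_index: an early frame without smoke (hypothetical default;
        the very first frame is deliberately not used).
    """
    signed = frames.astype(np.int16)             # avoid uint8 wrap-around
    background = signed[background_index]
    foreground = np.clip(signed - background, 0, 255)
    foreground[foreground < threshold] = 0       # zero thresholding
    return foreground.astype(np.uint8)
```

The denoising step itself is omitted here; OpenCV provides multi-frame variants of Non-Local Means for sequences, and the parameters given above would be passed to such a call.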


We employ a simple, convenient, automatic, and accurate calibration method, which produces dense ray calibration data with optimized correspondence between rays and voxels. In Figure 5, we show our calibration board with affixed ChArUco markers, the rail on which the board moves, and a sketch of how reference points are found in space. The black molton cloth is lifted to uncover the marker. After calibration, the black cloth is put back to cover the box, the marker, and the fog machine.


Figure 5: Implicit calibration with a movable ChArUco marker.

Implicit vs. Explicit Calibration

In order to establish the connection between pixel intensities and voxel densities, traditional calibration approaches assume a specific camera model, determine its camera parameters, and undistort the captured images. Such approaches perform explicit calibration to define intrinsic (optical center, focal length, and skew coefficient) and extrinsic (location and orientation) camera parameters, as well as distortion coefficients. With these parameters, image distortions can be corrected and rays originating from each pixel and going through the volume can be calculated in world space. Such explicit methods are mechanically simple and flexible, as they only require static markers and a small amount of reference points. The disadvantage is that possibly simplified camera and distortion models must be assumed, which have a major impact on the accuracy of the calibration.

For our capturing setup, we use an implicit method for camera calibration. Without undistorting images or determining the position and orientation of the cameras, the corresponding rays are calculated directly by interpolating between reference points in space. The calculation of intrinsic or extrinsic camera parameters is unnecessary. However, in order to calculate the line of sight for each pixel, at least two reference points at different depth levels are required, for which dynamic markers are needed. We use moving markers whose exact positions in space are known. There exist many variations of such implicit camera calibrations [MBK81, WM93]. Implicit calibration methods are not limited to a camera or a distortion model, but are able to handle any type of camera and even refracting materials between camera and target without losing accuracy. A high calibration accuracy can be achieved, and outliers affect the calibration accuracy less. With our implicit camera calibration, we obtain a starting position and a direction for each ray in world space. We step along the rays to determine which voxels influence the given pixel and to which degree. The weights specifying the degree of influence of each voxel on each pixel are stored in our image formation matrix P, see our papers [EUT19] and [EHT18].
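The idea of stepping along a ray and collecting voxel weights can be illustrated as follows. This is a minimal sketch with nearest-voxel sampling and a fixed step size; the weighting scheme actually used to build the image formation matrix P is described in [EUT19] and [EHT18] and may differ. All function and parameter names are chosen for this example.

```python
import numpy as np


def ray_voxel_weights(origin, direction, grid_shape, cell_size, step, n_steps):
    """Accumulate one weight per voxel visited by a single ray.

    Each sample point contributes `step` (the path length it represents)
    to the voxel that contains it. Samples outside the grid are ignored.
    Returns a dict {(i, j, k): weight}, i.e. one sparse row of a matrix
    that maps voxel densities to one pixel.
    """
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)

    weights = {}
    for i in range(n_steps):
        point = origin + (i + 0.5) * step * direction   # midpoint of segment i
        voxel = tuple(int(v) for v in np.floor(point / cell_size))
        if all(0 <= v < n for v, n in zip(voxel, grid_shape)):
            weights[voxel] = weights.get(voxel, 0.0) + step
    return weights
```

Multiplying these weights with the densities of the corresponding voxels and summing up gives the intensity contribution along that ray under this simplified model.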

Implementation

For implicit calibration, it is inherently important to have access to the precise location of the marker. We mount a marker board on a ball screw rail controlled and driven by a stepper motor and a Raspberry Pi. By rotating the motor by a given angle, the spindle turns a small amount, which in turn moves the rail forward linearly. The board moves in one direction only, namely straight along the rail. In every position, each camera takes an image after waiting for the wobbling of the marker to stop, which typically occurs after moving the marker to a new position. The marker is stopped at twenty-one equidistant positions. The home position is accurately defined, as the rail has an end switch, which is pressed when the marker is returned completely. We use a ChArUco marker of size 50 cm x 100 cm glued to a flat polystyrene board, which is mounted onto the rail. A ChArUco marker combines the benefits of chessboard and ArUco markers. The corners of two squares in a chessboard are detected very accurately. ArUco markers have a unique ID and are hence identified even when the marker is partially invisible. Placing ArUco markers in the white squares of a chessboard as shown in Figure 5a ensures that the patterns of the ChArUco marker are detected with high accuracy even with partial occlusion.

For each pixel, an average line going through all reference points is calculated by interpolating between marker points and calculating image-to-world transformation matrices. The resulting ray directions for each of the five cameras, from left to right, are encoded in color as shown in Figure 6. Pixels without an assigned ray are black. The borders of the markers are usually not well detected. Rays for such pixels, and also for pixels that are geometrically not able to see the marker, are extrapolated from the neighboring pixels’ rays. For good calibration accuracy, it is necessary that each camera is able to see the marker board in each of its positions. Steeper viewing angles, such as for the outermost cameras, lead to less accurate calibration data. The volume right above the box, which is covered by the marker, is calibrated.
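One way to obtain such an average line is a least-squares fit through the 3D reference points that belong to one pixel. The sketch below assumes these world-space points have already been determined, one per marker position, and uses a singular value decomposition; it is an illustration of the idea rather than the exact procedure used for the data set.

```python
import numpy as np


def fit_ray(points):
    """Fit a line through 3D reference points in the least-squares sense.

    points: (N, 3) world-space positions for one pixel, ordered by
        marker position (N >= 2).
    Returns (start, direction): the projection of the first point onto
    the fitted line, and a unit direction pointing towards the last point.
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    # The first right-singular vector is the direction of largest variance.
    _, _, vt = np.linalg.svd(points - centroid)
    direction = vt[0]
    if np.dot(direction, points[-1] - points[0]) < 0:
        direction = -direction
    start = centroid + np.dot(points[0] - centroid, direction) * direction
    return start, direction
```

With twenty-one marker positions, each pixel that sees the board in every position contributes twenty-one points to this fit, which averages out individual detection errors.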


Figure 6: Calibrated ray directions in world space encoded in RGB values for our five cameras, from left to right.


The accuracy of such implicit calibration methods is high if a large number of reference points is given, as in our case. The rays are directly calculated from the marker images without any assumptions about intrinsic or extrinsic camera parameters. However, estimated rays covering the margins of the marker or no marker at all have low fidelity. Calibration errors still occur due to inaccuracies of corner detection, sub-pixel lens distortion, or inaccurate marker positions. As the marker exhibits high-contrast black and white areas, images of the marker tend to be overexposed, which leads to bright areas bleeding into darker areas. While chessboard detection is still accurate with overexposure artifacts, the identification of ArUco markers is more difficult. Such undetected markers are discarded, which results in incomplete calibration data. However, we found our calibration method to work well, as our reconstruction technique is robust to small calibration errors.

List of Hardware Components for our Capturing Setup

The prices listed for each component are those valid at the time of purchase. The sum in euros is 986 €, which is less than 1100 $.

  1. Black molton cloth: dimensions 3 m x 2 m, 10 €, amazon
  2. Clips for cloth: 10 €, amazon
  3. Raspberry Pi: version 1, 2, or 3, 35 €, SD card 10 €, power cable 10 €, seven pieces, conrad
  4. Raspberry Pi camera module: V1 35 €, microphone stands 10 €, five pieces, conrad
  5. Fog machine: Eurolite N-10, dimensions 5.7 cm x 10.8 cm x 6.7 cm, 37 €, amazon
  6. Items for driving remote control of fog machine: 10 €
    a) relay: amazon
    b) strip board: conrad
    c) optical coupler: conrad
  7. Fog fluid: EUROLITE Smoke Fluid – P, 7.5 €, conrad
  8. Machine cleaner: Nebelmaschinen-Reiniger 590298, 16 €, conrad
  9. Servo motor: Absima Standard-Servo S60MH Analog-Servo, 14.5 €, two pieces, conrad
    a) previous version, performing sufficiently: 4er MG90S Mini Metal Gear Analog Servo, 5 €, conrad
    b) previously used servo for side opening, not needed anymore: Modelcraft Mini-Servo MC1811 Analog-Servo, 4 €, conrad
  10. Silicone hose: Schlauchland, 4 €, amazon 
  11. Styrofoam box: Thermobox, expanded polystyrene foam (EPS), dimensions 54.5 cm x 35 cm x 30 cm, 9 €, hornbach
  12. Heating cable: Terrarium heating cable, 17.5 €, amazon
  13. Metallic fence: hadra G12500V5I verzinktes, punktgeschweißtes Gitter, 16 €, amazon 
  14. Controller thermostat: KKmoon Thermostat Controller, 14 €, amazon
  15. Timer: GAO Mechanischer Countdown Timer, 10 €, amazon
  16. Stepping motor: Quimat Nema 17 Schrittmotor, 24 €, amazon
  17. Rail: 1204 Ball Screw Linear Rail Stroke Long Stage Actuator with Stepper Motor 400MM, 75 €, ebay (no longer available; a comparable item can be found on ebay)
  18. Smaller items: network cables, wood, glue, 100 €

[BCM11] BUADES A., COLL B., MOREL J.-M.: Non-local means denoising. Image Processing On Line 1 (2011), 208–212.

[Eck19] ECKERT M.-L.: Optimization for Fluid Simulation and Reconstruction of Real-World Flow Phenomena. PhD Thesis.

[EHT18] ECKERT M.-L., HEIDRICH W., THÜREY N.: Coupled fluid density and motion from single views. Computer Graphics Forum 37 (2018), 47–58.

[Enga] Engineers Edge: Viscosity of air, dynamic and kinematic. Accessed: 2019-07-23

[Engb] The Engineering ToolBox. Accessed: 2019-07-23

[EUT19] ECKERT M.-L., UM K., THUEREY N.: ScalarFlow: A large-scale volumetric data set of real-world scalar transport flows for computer animation and machine learning. ACM Transactions on Graphics (2019).

[HED05] HAWKINS T., EINARSSON P., DEBEVEC P.: Acquisition of time-varying participating media. ACM Transactions on Graphics 24, 3 (July 2005), 812–815.

[MBK81] MARTINS H., BIRK J. R., KELLEY R. B.: Camera models based on data from two calibration planes. Computer Graphics and Image Processing 17, 2 (1981), 173–180.

[WM93] WEI G.-Q., MA S.: A complete two-plane camera calibration method and experimental comparisons. In Proceedings of IEEE International Conference on Computer Vision (1993), IEEE, pp. 439–446.

[XIAP17] XIONG J., IDOUGHI R., AGUIRRE-PABLO A. A., ALJEDAANI A. B., DUN X., FU Q., THORODDSEN S. T., HEIDRICH W.: Rainbow particle imaging velocimetry for dense 3D fluid velocity imaging. ACM Transactions on Graphics 36, 4 (2017), 36.