A common shortcut is to take data[44:] to strip the header of a wav file and treat the remaining bytes as PCM. This is not always correct.

For a PCMWAVEFORMAT file, the usual picture is that the following structure sits at the front of the file and is followed directly by raw PCM:

// externals/ext13/wavinfo.c :
typedef struct _wave
{
    char  w_fileid[4];              /* chunk id 'RIFF'            */
    uint32 w_chunksize;             /* chunk size                 */
    char  w_waveid[4];              /* wave chunk id 'WAVE'       */
    char  w_fmtid[4];               /* format chunk id 'fmt '     */
    uint32 w_fmtchunksize;          /* format chunk size          */
    uint16  w_fmttag;               /* format tag, 1 for PCM      */
    uint16  w_nchannels;            /* number of channels         */
    uint32 w_samplespersec;         /* sample rate in hz          */
    uint32 w_navgbytespersec;       /* average bytes per second   */
    uint16  w_nblockalign;          /* bytes per sample frame     */
    uint16  w_nbitspersample;       /* number of bits in a sample */
    char  w_datachunkid[4];         /* data chunk id 'data'       */
    uint32 w_datachunksize;         /* length of data chunk       */
} t_wave;

This is the classic 44-byte layout, and every wav reader understands it.

Here w_fileid is the fixed 4-byte string 'RIFF', and the next 4 bytes, w_chunksize, hold the size of the entire file minus 8 (the w_fileid and w_chunksize fields themselves are not counted). The next 4 bytes are the fixed string 'WAVE'.

However, modern wav files often carry more than these 44 bytes of header, and there are many extended layouts. They are still standard wav, but the layout differs slightly.

For example, take an mp3 file, transcode it to wav with ffmpeg, and inspect the result with ffprobe. The output may look like this:

$ ffprobe -show_format sample-argcv-audio.wav
ffprobe version 4.4.1 Copyright (c) 2007-2021 the FFmpeg developers
....
[FORMAT]
filename=sample-argcv-audio.wav
nb_streams=1
nb_programs=0
format_name=wav
format_long_name=WAV / WAVE (Waveform Audio)
start_time=N/A
duration=17.728000
size=567374
bit_rate=256035
probe_score=99
TAG:encoder=Lavf58.29.100
[/FORMAT]

Note the line TAG:encoder=Lavf58.29.100: the classic structure has no field that could hold this metadata.
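In the file, this tag is stored in a 'LIST' subchunk whose payload starts with 'INFO' and then holds entries with the same id/size/payload pattern as the top-level subchunks; ISFT is the INFO id for the software that created the file, and INFO values are NUL-terminated strings. The following is a minimal sketch of decoding such a payload (the bytes after the LIST size field); parseInfoList is a hypothetical name.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strings"
)

// parseInfoList decodes the payload of a 'LIST' subchunk of type 'INFO'
// into a map such as {"ISFT": "Lavf58.29.100"}.
func parseInfoList(payload []byte) (map[string]string, error) {
	if len(payload) < 4 || string(payload[:4]) != "INFO" {
		return nil, fmt.Errorf("not an INFO list")
	}
	tags := make(map[string]string)
	for p := 4; p+8 <= len(payload); {
		id := string(payload[p : p+4])
		size := int(binary.LittleEndian.Uint32(payload[p+4 : p+8]))
		p += 8
		if p+size > len(payload) {
			return nil, fmt.Errorf("entry %q overruns the list", id)
		}
		// drop the terminating NUL of the string value
		tags[id] = strings.TrimRight(string(payload[p:p+size]), "\x00")
		p += size + size%2 // entries are padded to an even length
	}
	return tags, nil
}
```

With the encoder string above, the payload is 'INFO' (4 bytes) + 'ISFT' and its size (8 bytes) + "Lavf58.29.100" and a NUL (14 bytes), 26 bytes in total.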

Describing the file the same way, we can split it into a structure like the following:

typedef struct _wave_chunk {
    char c_desc[4];                  /* subchunk id, e.g. 'fmt '   */
    uint32 c_size;                   /* payload size, uint32 le    */
    char *data;                      /* payload, c_size bytes      */
} t_wave_chunk;

typedef struct _wave
{
    char  w_fileid[4];              /* chunk id 'RIFF'            */
    uint32 w_chunksize;             /* chunk size                 */
    char  w_waveid[4];              /* wave chunk id 'WAVE'       */
    t_wave_chunk *w_chunks;         /* a set of chunks            */
} t_wave;

The first 12 bytes are still the overall description, followed by an ordered sequence of subchunks. Each subchunk consists of:

  1. desc (4 bytes),
  2. length (4 bytes, uint32 le),
  3. data (length bytes, followed by one zero pad byte when length is odd; the pad is not counted in length)
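These rules translate into a small helper. The sketch below works on the whole file in memory; subchunk and readSubchunk are hypothetical names. It also skips the pad byte that RIFF places after an odd-sized payload, which the length field does not count.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// subchunk describes the subchunk found at a given offset of the file.
type subchunk struct {
	Desc      string // c_desc, e.g. "fmt ", "LIST", "data"
	DataStart int    // first payload byte (offset + 8)
	DataEnd   int    // one past the last payload byte (DataStart + c_size)
	Next      int    // offset of the following subchunk, after any pad byte
}

// readSubchunk decodes the subchunk header at off in b. The caller should
// compare DataEnd with len(b) before slicing, in case the file is truncated.
func readSubchunk(b []byte, off int) (subchunk, error) {
	if off < 0 || off+8 > len(b) {
		return subchunk{}, fmt.Errorf("no subchunk header at offset %d", off)
	}
	size := int(binary.LittleEndian.Uint32(b[off+4 : off+8]))
	c := subchunk{
		Desc:      string(b[off : off+4]),
		DataStart: off + 8,
		DataEnd:   off + 8 + size,
	}
	// an odd-sized payload is followed by one pad byte not counted in c_size
	c.Next = c.DataEnd + size%2
	return c, nil
}
```

Calling it at offset 12 and then at each Next visits the subchunks in order; for sample-argcv-audio.wav that gives offsets 12, 36 and 70, with the PCM starting at 78.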

If we parse sample-argcv-audio.wav, it looks like this:

RIFF (w_fileid, 4 bytes)
567366 (w_chunksize, uint32 le 4 bytes)
WAVE (w_waveid, 4 bytes)

: subchunk1  ( data[12:36] )
    'fmt ' (w_fmtid, also c_desc, 4 bytes)
    16     (w_fmtchunksize, also c_size, 4 bytes)
    data   (w_fmttag, w_nchannels, ..., w_nbitspersample, 16 bytes in total)

: subchunk2  ( data[36:70] )
    'LIST' (c_desc, 4 bytes)
    26     (c_size, 4 bytes)
    data   (INFOISFT...Lavf58.29.100... 26 bytes in total)

: subchunk3  ( data[70:567374])
    'data' (w_datachunkid, also c_desc, 4 bytes)
    567296 (w_datachunksize, also c_size, 4 bytes)
    data   (data[78:567374]... 567296 bytes in total, which is the PCM data part we actually need)

EOF

Note: other files may contain different or additional subchunks, but they always follow the subchunk organization above, and the PCM we want is always in the 'data' subchunk (subchunk3 here).

Following the description above, to obtain the PCM samples correctly we need the content of subchunk3 after its 8-byte header (desc and length). That is, we must cut off the first 78 bytes, not 44. In the classic format subchunk2 does not exist, so the data subchunk starts at 36 and the PCM at 36 + 8 = 44, which is exactly the backward-compatible layout.
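The offset arithmetic, using the subchunk sizes from the parse of sample-argcv-audio.wav (16 and 26 are even, so no pad bytes appear):

$$
\underbrace{12}_{\text{RIFF header}} + \underbrace{(8 + 16)}_{\text{'fmt '}} + \underbrace{(8 + 26)}_{\text{'LIST'}} + \underbrace{8}_{\text{'data' header}} = 78
$$

$$
12 + (8 + 16) + 8 = 44, \qquad 78 + 567296 = 567374 = 567366 + 8
$$

The second line shows the classic offset without 'LIST', and that the data subchunk ends exactly at the file size, which is w_chunksize + 8.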

Here is a simple implementation in golang:

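package main

import (
	"bufio"
	"encoding/binary"
	"fmt"
	"io"
	"log"
	"os"
)

// WavGetDataChunkRange guesses the data chunk range in a wav file: it returns
// the offsets [start, end) of the PCM payload of the 'data' subchunk.
func WavGetDataChunkRange(fileName string) (int, int, error) {
	fp, err := os.Open(fileName)
	if err != nil {
		return 0, 0, err
	}
	defer fp.Close()

	sc := bufio.NewReader(fp)

	// w_fileid, w_chunksize and w_waveid: 12 bytes
	riff := make([]byte, 12)
	if _, err := io.ReadFull(sc, riff); err != nil {
		return 0, 0, err
	}
	if string(riff[:4]) != "RIFF" || string(riff[8:]) != "WAVE" {
		return 0, 0, fmt.Errorf("%s: not a RIFF/WAVE file", fileName)
	}

	buf := make([]byte, 4)
	headerSize := 12 // offset of the current subchunk

	for {
		// c_desc; io.ReadFull fails on a short read, unlike a single Read call
		if _, err := io.ReadFull(sc, buf); err != nil {
			return 0, 0, err
		}
		desc := string(buf)
		log.Printf("@%v: desc: %v", headerSize, desc)

		// c_size, uint32 little-endian
		if _, err := io.ReadFull(sc, buf); err != nil {
			return 0, 0, err
		}
		length := int(binary.LittleEndian.Uint32(buf))
		log.Printf("length: %v", length)

		if desc == "data" {
			return headerSize + 8, headerSize + 8 + length, nil
		}

		// move to the next header, skipping the pad byte after an odd-sized payload
		skip := length + length%2
		if _, err := sc.Discard(skip); err != nil {
			return 0, 0, err
		}
		headerSize += 8 + skip
	}
}

func main() {
	log.SetFlags(0) // plain log lines, as in the sample output
	if len(os.Args) < 2 {
		log.Fatalf("usage: %s FILE.wav", os.Args[0])
	}
	start, end, err := WavGetDataChunkRange(os.Args[1])
	fmt.Printf("input chunk from: %v to %v, err: %v\n", start, end, err)
}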

Sample Output:

@12: desc: fmt
length: 16
@36: desc: LIST
length: 26
@70: desc: data
length: 567296
input chunk from: 78 to 567374, err: <nil>

With the returned range we take data[78:567374]. For this file that is simply data[78:], since the data subchunk runs to the end of the file; this is the correct PCM block we set out to find.
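Once the range is known, the PCM bytes still have to be interpreted according to the fmt subchunk. The sketch below assumes 16-bit PCM (w_fmttag 1, w_nbitspersample 16), which wav stores as signed little-endian integers interleaved by channel; pcm16 is a hypothetical helper, not part of the code above.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// pcm16 decodes data[start:end] as interleaved signed 16-bit little-endian
// samples. It assumes w_fmttag == 1 and w_nbitspersample == 16.
func pcm16(data []byte, start, end int) ([]int16, error) {
	if start < 0 || start > end || end > len(data) {
		return nil, fmt.Errorf("range [%d:%d] outside %d bytes", start, end, len(data))
	}
	raw := data[start:end]
	if len(raw)%2 != 0 {
		return nil, fmt.Errorf("odd byte count for 16-bit PCM: %d", len(raw))
	}
	samples := make([]int16, len(raw)/2)
	for i := range samples {
		samples[i] = int16(binary.LittleEndian.Uint16(raw[2*i:]))
	}
	return samples, nil
}
```

For example, read the whole file with os.ReadFile, get start and end from WavGetDataChunkRange, and pass all three to pcm16. Other bit depths need a different loop; 8-bit wav PCM, for instance, is unsigned.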


Yu

Ideals are like the stars: we never reach them, but like the mariners of the sea, we chart our course by them.
