We usually use data[44:] to strip the header from a wav file, and then assume that the remaining data is PCM. This is not always true.
For a wav file with a PCMWAVEFORMAT header, we usually assume that the file starts with the following structure, followed by pure PCM data:
    // externals/ext13/wavinfo.c :
    typedef struct _wave
    {
        char   w_fileid[4];        /* chunk id 'RIFF' */
        uint32 w_chunksize;        /* chunk size */
        char   w_waveid[4];        /* wave chunk id 'WAVE' */
        char   w_fmtid[4];         /* format chunk id 'fmt ' */
        uint32 w_fmtchunksize;     /* format chunk size */
        uint16 w_fmttag;           /* format tag, 1 for PCM */
        uint16 w_nchannels;        /* number of channels */
        uint32 w_samplespersec;    /* sample rate in hz */
        uint32 w_navgbytespersec;  /* average bytes per second */
        uint16 w_nblockalign;      /* number of bytes per sample */
        uint16 w_nbitspersample;   /* number of bits in a sample */
        char   w_datachunkid[4];   /* data chunk id 'data' */
        uint32 w_datachunksize;    /* length of data chunk */
    } t_wave;
This classic layout is 44 bytes long and is understood by essentially every wav reader.
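Here w_fileid is the fixed 4-byte string RIFF, and the next 4 bytes, w_chunksize, hold the size of the rest of the file, that is, the file size minus the 8 bytes taken by w_fileid and w_chunksize themselves. The following 4 bytes are the fixed string WAVE.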
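As a minimal sketch of how this fixed layout maps to code, the struct can be mirrored in Go and decoded with encoding/binary. The names canonicalHeader, parseCanonicalHeader and sampleCanonicalHeader are made up for illustration, and the sample values (16 kHz, mono, 16-bit) are arbitrary. The sketch deliberately assumes that the data chunk id sits at offset 36 and reports an error when it does not.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// canonicalHeader mirrors the 44-byte t_wave struct field by field.
// (Hypothetical name, for illustration only.)
type canonicalHeader struct {
	FileID         [4]byte // "RIFF"
	ChunkSize      uint32  // file size minus 8
	WaveID         [4]byte // "WAVE"
	FmtID          [4]byte // "fmt "
	FmtChunkSize   uint32
	FmtTag         uint16 // 1 for PCM
	NChannels      uint16
	SamplesPerSec  uint32
	AvgBytesPerSec uint32
	BlockAlign     uint16
	BitsPerSample  uint16
	DataChunkID    [4]byte // "data" only when no other chunk comes first
	DataChunkSize  uint32
}

// parseCanonicalHeader decodes the first 44 bytes of data as little-endian
// fields and rejects input that does not follow the fixed layout.
func parseCanonicalHeader(data []byte) (canonicalHeader, error) {
	var h canonicalHeader
	if err := binary.Read(bytes.NewReader(data), binary.LittleEndian, &h); err != nil {
		return h, err
	}
	if string(h.FileID[:]) != "RIFF" || string(h.WaveID[:]) != "WAVE" {
		return h, errors.New("not a RIFF/WAVE file")
	}
	if string(h.DataChunkID[:]) != "data" {
		return h, fmt.Errorf("expected 'data' at offset 36, found %q", h.DataChunkID[:])
	}
	return h, nil
}

// sampleCanonicalHeader builds a made-up header with no PCM data after it.
func sampleCanonicalHeader() []byte {
	h := canonicalHeader{
		FileID:         [4]byte{'R', 'I', 'F', 'F'},
		ChunkSize:      36, // 44 - 8
		WaveID:         [4]byte{'W', 'A', 'V', 'E'},
		FmtID:          [4]byte{'f', 'm', 't', ' '},
		FmtChunkSize:   16,
		FmtTag:         1,
		NChannels:      1,
		SamplesPerSec:  16000,
		AvgBytesPerSec: 32000,
		BlockAlign:     2,
		BitsPerSample:  16,
		DataChunkID:    [4]byte{'d', 'a', 't', 'a'},
	}
	var buf bytes.Buffer
	// Writing a fixed-size struct into a bytes.Buffer does not fail.
	_ = binary.Write(&buf, binary.LittleEndian, h)
	return buf.Bytes()
}

func main() {
	raw := sampleCanonicalHeader()
	h, err := parseCanonicalHeader(raw)
	fmt.Println(len(raw), h.SamplesPerSec, h.NChannels, err)

	// Overwrite the id at offset 36 to mimic a file with an extra chunk.
	copy(raw[36:40], "LIST")
	_, err = parseCanonicalHeader(raw)
	fmt.Println(err)
}
```

The second call fails on purpose: once anything other than the data chunk follows the format chunk, a fixed 44-byte view of the header no longer applies.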
However, modern wav files often carry more than these 44 bytes of header. Such files are still standard wav, but their layout is slightly different.
We can take an mp3 audio file, transcode it to wav with ffmpeg, and then inspect the format information with ffprobe. The output may look like this:
    $ ffprobe -show_format sample-argcv-audio.wav
    ffprobe version 4.4.1 Copyright (c) 2007-2021 the FFmpeg developers
    ....
    [FORMAT]
    filename=sample-argcv-audio.wav
    nb_streams=1
    nb_programs=0
    format_name=wav
    format_long_name=WAV / WAVE (Waveform Audio)
    start_time=N/A
    duration=17.728000
    size=567374
    bit_rate=256035
    probe_score=99
    TAG:encoder=Lavf58.29.100
    [/FORMAT]
In this output there is a tag, TAG:encoder=Lavf58.29.100, which the classic structure has no field for.
Described in the same style, this wav file could be split into a structure like the following.
    typedef struct _wave_chunk
    {
        char   c_desc[4];   /* subchunk desc */
        uint32 c_size;      /* subchunk size */
        char  *data;        /* subchunk detail */
    } t_wave_chunk;

    typedef struct _wave
    {
        char          w_fileid[4];   /* chunk id 'RIFF' */
        uint32        w_chunksize;   /* chunk size */
        char          w_waveid[4];   /* wave chunk id 'WAVE' */
        t_wave_chunk *w_chunks;      /* a set of chunks */
    } t_wave;
The first 12 bytes are still the overall description. After them comes an ordered sequence of subchunks. Each subchunk consists of:
- desc (4 bytes),
- length (4 bytes, uint32 le),
- and data (length bytes, followed by one padding byte if length is odd)
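The subchunk rule is easiest to see from the writing side. The following is a minimal sketch, not a full wav writer: chunk, appendChunk and buildWav are hypothetical names, the payloads are just zero bytes, and the pad byte after odd-sized payloads follows the RIFF convention even though the sizes used here are all even. The sizes 16 and 26 are chosen to match the fmt and LIST subchunks discussed in this article; the 100-byte data payload is arbitrary.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// chunk is one RIFF subchunk: a 4-character id plus its payload.
// (Hypothetical type, for illustration only.)
type chunk struct {
	ID      string // must be exactly 4 characters
	Payload []byte
}

// appendChunk serializes c as desc (4 bytes), length (uint32 le), data,
// and a single zero pad byte when the payload length is odd.
func appendChunk(dst []byte, c chunk) []byte {
	var size [4]byte
	binary.LittleEndian.PutUint32(size[:], uint32(len(c.Payload)))
	dst = append(dst, c.ID[:4]...)
	dst = append(dst, size[:]...)
	dst = append(dst, c.Payload...)
	if len(c.Payload)%2 == 1 {
		dst = append(dst, 0)
	}
	return dst
}

// buildWav wraps the subchunks in the outer RIFF chunk, whose payload is
// the string "WAVE" followed by all subchunks.
func buildWav(chunks ...chunk) []byte {
	body := []byte("WAVE")
	for _, c := range chunks {
		body = appendChunk(body, c)
	}
	return appendChunk(nil, chunk{ID: "RIFF", Payload: body})
}

func main() {
	w := buildWav(
		chunk{ID: "fmt ", Payload: make([]byte, 16)},
		chunk{ID: "LIST", Payload: make([]byte, 26)},
		chunk{ID: "data", Payload: make([]byte, 100)},
	)
	// With an extra 26-byte chunk, offset 36 holds "LIST" and the data
	// subchunk header only starts at offset 70.
	fmt.Println(string(w[36:40]), string(w[70:74]), len(w))
}
```

Note that the outer RIFF header is itself just a chunk, which is why w_chunksize equals the file size minus 8.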
If we parse sample-argcv-audio.wav, it looks like this:
    RIFF (w_fileid, 4 bytes)
    567366 (w_chunksize, uint32 le 4 bytes)
    WAVE (w_waveid, 4 bytes)
    : subchunk1 ( data[12:36] )
        'fmt ' (w_fmtid, also c_desc, 4 bytes)
        16 (w_fmtchunksize, also c_size, 4 bytes)
        data (w_fmttag, w_nchannels, ... w_nbitspersample and so on, 16 bytes in total)
    : subchunk2 ( data[36:70] )
        'LIST' (c_desc, 4 bytes)
        26 (c_size, 4 bytes)
        data (INFOISFT...Lavf58.29.100... 26 bytes in total)
    : subchunk3 ( data[70:567374] )
        'data' (w_datachunkid, also c_desc, 4 bytes)
        567296 (w_datachunksize, also c_size, 4 bytes)
        data (data[78:567374]... 567296 bytes in total, which is the PCM data part we actually need)
    EOF
Note: Different files may contain other subchunks, but they always follow the subchunk organization above, and the PCM we want is always in the data subchunk.
As described above, to obtain the PCM signal correctly we need to read the content of subchunk3 after skipping its first 8 bytes of description (c_desc and c_size). That is, we need to cut off the first 78 bytes instead of 44 to get the correct PCM. In the naive format subchunk2 does not exist, so the offset is 36 + 8 = 44, which is exactly the backward-compatible value.
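The arithmetic behind 44 and 78 can be written down as a small helper. This is only a sketch: dataOffset and the two constants are hypothetical names, and the function expects the payload sizes of the subchunks that come before the data subchunk. It also rounds odd sizes up by one byte, following the RIFF padding convention, which does not matter for the sizes used here.

```go
package main

import "fmt"

const (
	riffPreambleSize = 12 // w_fileid + w_chunksize + w_waveid
	chunkHeaderSize  = 8  // c_desc + c_size
)

// dataOffset returns the offset of the first PCM byte, given the payload
// sizes of every subchunk that precedes the data subchunk.
func dataOffset(precedingSizes ...int) int {
	offset := riffPreambleSize
	for _, size := range precedingSizes {
		// header + payload + optional pad byte for odd payload sizes
		offset += chunkHeaderSize + size + size%2
	}
	// finally skip the header of the data subchunk itself
	return offset + chunkHeaderSize
}

func main() {
	fmt.Println(dataOffset(16))     // only 'fmt ': 12 + 24 + 8 = 44
	fmt.Println(dataOffset(16, 26)) // 'fmt ' and 'LIST': 12 + 24 + 34 + 8 = 78
}
```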
Here is a simple implementation in golang:
    import (
    	"bufio"
    	"encoding/binary"
    	"io"
    	"log"
    	"os"
    	"strings"
    )

    // WavGetDataChunkRange guesses the byte range [start, end) of the data chunk in a wav file.
    func WavGetDataChunkRange(fileName string) (int, int, error) {
    	fp, err := os.Open(fileName)
    	if err != nil {
    		return 0, 0, err
    	}
    	defer fp.Close()
    	sc := bufio.NewReader(fp)
    	// skip w_fileid, w_chunksize and w_waveid
    	if _, err := sc.Discard(12); err != nil {
    		return 0, 0, err
    	}
    	buf := make([]byte, 4)
    	// readBuf fills buf with exactly 4 bytes; a bare Read may return fewer
    	readBuf := func() error {
    		_, e := io.ReadFull(sc, buf)
    		return e
    	}
    	headerSize := 12
    	for {
    		if e := readBuf(); e != nil {
    			return 0, 0, e
    		}
    		log.Printf("@%v: desc: %v", headerSize, string(buf))
    		isData := strings.ToLower(string(buf)) == "data"
    		if e := readBuf(); e != nil {
    			return 0, 0, e
    		}
    		length := int(binary.LittleEndian.Uint32(buf))
    		log.Printf("length: %v", length)
    		if isData {
    			return headerSize + 8, headerSize + 8 + length, nil
    		}
    		// move to the next header; odd-sized chunks are followed by one pad byte
    		skip := length + length%2
    		if _, e := sc.Discard(skip); e != nil {
    			return 0, 0, e
    		}
    		headerSize += 8 + skip
    	}
    }
Sample Output:
    @12: desc: fmt 
    length: 16
    @36: desc: LIST
    length: 26
    @70: desc: data
    length: 567296
    input chunk from: 78 to 567374, err: <nil>
From the returned result we get data[78:567374]. For this particular file that is the same as data[78:], which is the PCM data block we were looking for.
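Once the PCM block has been cut out, it usually still has to be turned into samples. The sketch below assumes 16-bit little-endian PCM, which this article does not state for the sample file; in practice the width should be read from w_nbitspersample. The name pcm16ToSamples is hypothetical, and for multi-channel audio the returned samples stay interleaved.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// pcm16ToSamples decodes raw 16-bit little-endian PCM bytes into signed
// samples. A trailing odd byte, if any, is ignored.
func pcm16ToSamples(pcm []byte) []int16 {
	samples := make([]int16, len(pcm)/2)
	for i := range samples {
		samples[i] = int16(binary.LittleEndian.Uint16(pcm[2*i:]))
	}
	return samples
}

func main() {
	// Three hand-written samples: 1, -1 and the minimum int16 value.
	pcm := []byte{0x01, 0x00, 0xff, 0xff, 0x00, 0x80}
	fmt.Println(pcm16ToSamples(pcm)) // [1 -1 -32768]
}
```

With the range found above, the input would be data[78:567374] rather than a hand-written slice.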