12 Sequence element
Making presentations on the web usually involves videos, but they are very resource-intensive. A much more flexible and low-resource alternative is the sequence element.
Many video presentations are basically slideshows with commentary. While the individual images and audio snippets may not use much in the way of resources, as video they can stress the playback capabilities of servers for small sites, especially at full HD. The
- a.Provides an ordered sequence of parts, each being a timed sequence of audio and/or images and text, such as for slideshows.
- b.Audio only needs to be as long as the actual voice requires, without the silences in between.
- c.Audio can be looped to minimise resources used.
- d.Audio, images and text can be grouped for easy timing adjustments.
- e.Audio and text for other locales can be added when available.
This arrangement needs no video skills and allows variations of timing and order to be made within Smallsite Design. Some audio recording and cleanup skills are needed, but DAW (digital audio workstation) software is not required, though spectral repair software, such as that in Audacity or RX, can make audio cleanup very fast.
The sequence element relies completely upon JavaScript for its functionality. JavaScript is usually enabled by default, but if it is not available, a message will be displayed and no controls will be available. To cater for such situations, a link to an alternative can be provided, such as a procedure article or YouTube video if one exists already. The link will be displayed after the message. Note that the element will not work in an article saved as a single file that embeds all files.
Minimising size and bandwidth
The use of the sequence element can produce a significant reduction in content size and reduce the load on the server so much that keeping multiple media presentations on the site becomes feasible.
As an example of how much resource usage can be minimised, a 15-minute high-quality MP3 meditation track consisting of a background track with occasional voice guidance and some bell sounds is about 20MB, whereas the same using a short looped background and audio snippets is 1.5MB. Of course, 15 minutes of constant talking with a separate full-length backing track would be 40MB, but breaking up the audio and being able to edit the sequencing will likely provide more flexibility when it comes to making changes.
For multi-lingual sites, only the audio snippets and text need to be added for each locale variant using the existing sequence of elements, rather than having to create another whole video for each. This should make the content faster and easier to produce and modify.
Similarly, a 15-minute full HD 24fps MP4 video of a presentation consisting of almost continuous commentary over a series of 50 slides is 1.5GB, whereas the same as a collection of MP3 snippets and 50 full HD JPEG images is 28MB, a saving of over 98%. It also uses a lot less site bandwidth, especially as looped files are cached in the browser. While catering for another language would require another 1.5GB video, using a
Server and browser load is reduced because each file is only retrieved once, with looped audio streaming from the same loaded copy.
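The size figures above follow from simple bitrate arithmetic. The sketch below assumes a constant-bitrate MP3 at 192 kbps, which is a hypothetical setting not stated in the text, and treats 1.5GB as 1500MB. It shows how a single full-length 15-minute track reaches roughly 20MB and what fraction is saved by replacing the 1.5GB video with 28MB of snippets and images.

```python
def mp3_size_mb(bitrate_kbps: float, seconds: float) -> float:
    """Approximate size of a constant-bitrate MP3 in megabytes (10^6 bytes)."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000

def saving(original_mb: float, reduced_mb: float) -> float:
    """Fractional saving when the original content is replaced by the reduced content."""
    return 1 - reduced_mb / original_mb

# Assumed 192 kbps bitrate for one full-length 15-minute track.
full_track = mp3_size_mb(192, 15 * 60)
print(f"15-minute track at 192 kbps: {full_track:.1f} MB")

# 1.5GB video versus 28MB of MP3 snippets and images.
print(f"Saving: {saving(1500, 28):.1%}")
```

Variable-bitrate encoding and different quality settings will move these numbers. The proportions, however, are dominated by how much audio is actually stored, rather than by the exact encoder settings.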
Sequence structure
The sequence collection of elements is a simplified selection of basic functional blocks.
- Sequence: Presentation element. Can have an image as a backdrop.
  - Part: One of up to 12 self-contained presentations.
    - Image: Image for the audio. Must be same size as the sequence's
    - Audio: Clip of voice, music or other sounds.
    - Caption: Up to two short lines of text of the audio or comments.
    - Group: Collection of elements to schedule as one.
      - Image: Same as at the part level.
      - Audio: Same as at the part level, except no looping.
      - Caption: Same as at the part level.
  - Track: One of four summing nodes with gain, pan and compressor.
  - Output: Feeds the device's sound system. Has gain and compressor.
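To make the hierarchy above concrete, here is an illustrative data model. It is a sketch of the rules the list states (up to 12 parts, four tracks, no looping audio inside a group), not the element's actual storage format. The class names, the track letters A to D and the file names are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

MAX_PARTS = 12                    # a sequence holds up to 12 parts
TRACKS = ("A", "B", "C", "D")     # up to four tracks feed the output (letters assumed)

@dataclass
class Image:
    src: str
    timecode: int = 0             # seconds from the start of the part or group

@dataclass
class Audio:
    src: str
    track: str = "A"
    timecode: int = 0
    loop: bool = False

    def __post_init__(self) -> None:
        if self.track not in TRACKS:
            raise ValueError(f"unknown track {self.track!r}")

@dataclass
class Caption:
    text: str                     # up to two short lines
    timecode: int = 0

@dataclass
class Group:
    members: List[Union[Image, Audio, Caption]] = field(default_factory=list)
    timecode: int = 0

    def add(self, member: Union[Image, Audio, Caption]) -> None:
        # Audio in a group is the same as at part level, except no looping.
        if isinstance(member, Audio) and member.loop:
            raise ValueError("audio inside a group cannot loop")
        self.members.append(member)

@dataclass
class Part:
    title: str
    elements: List[Union[Image, Audio, Caption, Group]] = field(default_factory=list)

@dataclass
class Sequence:
    backdrop: Optional[str] = None
    parts: List[Part] = field(default_factory=list)

    def add_part(self, part: Part) -> None:
        if len(self.parts) >= MAX_PARTS:
            raise ValueError(f"a sequence holds at most {MAX_PARTS} parts")
        self.parts.append(part)

# Hypothetical file names, for illustration only.
seq = Sequence(backdrop="backdrop.jpg")
intro = Part("Introduction")
group = Group()
group.add(Audio("intro-voice.mp3", track="A"))
group.add(Caption("Welcome to the tour.", timecode=1))
intro.elements += [Image("slide-1.jpg"), group]
seq.add_part(intro)
```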
Part
A part is a sequenced collection of one or more of the other elements, with its own title text.
If a
Diagram: the structure of a part. The display area shows the sequence image, part title, images and captions (blank images and captions end the previous one), with a loading/playing/time display. Voice-over and other audio feed tracks A to D, each with gain, pan and compressor, which feed the output (compressor and gain) and then the system audio.
- a.There are some phases to playing a part.
- b.The display area is where all images and text are shown.
- c.There are blank images and captions that end the display of the previous element of the same type, if required.
- d.Each audio element is connected to a single track.
- e.All tracks feed into the output, which feeds the device's sound system.
All elements in a part have a timecode. Whenever one is adjusted, the earliest element is aligned with the start of the part by setting its timecode to 00:00, and all other elements have their timecodes adjusted by the same zeroing offset. For example, if the current earliest element (at 00:00) is moved to be 15 seconds earlier, its timecode would be set to 00:00, but all the other elements would have their timecodes increased by that same 15-second offset, so one that was at 00:10 becomes 00:25.
Conversely, if the earliest element is moved later, all other elements are moved earlier by the same offset. The situation to be wary of is moving the first element to later than other elements, as the earliest of those will have its timecode set to 00:00 and all other elements will be adjusted by that element's original offset. For example, if the second element had a timecode of 00:10 and the first element was delayed by 15 seconds, all the elements would have their timecodes reduced by 10 seconds, not 15 seconds, and the moved element would now be at 00:05.
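The zeroing rule can be modelled in a few lines. This is a minimal sketch of the behaviour described above, not the element's own code. Timecodes are whole seconds, and the element names are hypothetical. Both worked examples from the text are reproduced.

```python
def normalise(timecodes: dict) -> dict:
    """Shift every timecode (seconds) so the earliest becomes 0, keeping relative offsets."""
    earliest = min(timecodes.values())
    return {name: t - earliest for name, t in timecodes.items()}

# First example: the element at 00:00 is moved 15 seconds earlier (to -15).
moved_earlier = normalise({"first": -15, "other": 10})   # other becomes 00:25

# Second example: the first element is delayed 15 seconds, past the one at 00:10.
moved_later = normalise({"first": 15, "second": 10})     # first ends up at 00:05

print(moved_earlier, moved_later)
```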
Deleting the first element may also trigger timecode recalculations. If planning significant timing changes that may inadvertently alter other elements' timecodes, first add a new element as a timing reference placeholder (it will be created with a 00:00 timecode) to prevent such changes. It can be deleted when all adjustments have been made, precipitating a single global adjustment in which all remaining elements keep their relative offsets.
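The placeholder technique can be illustrated with the same zeroing rule, again as a sketch under the assumption that timecodes are whole seconds and the names are hypothetical. While the placeholder holds the minimum at 00:00, a delay typed into one element stays exactly as typed and nothing else shifts. Deleting the placeholder then applies one shift to everything, preserving the relative offsets.

```python
def normalise(timecodes: dict) -> dict:
    """Shift every timecode (seconds) so the earliest becomes 0."""
    earliest = min(timecodes.values())
    return {name: t - earliest for name, t in timecodes.items()}

# A placeholder at 00:00 holds the minimum at zero while other elements move.
part = {"placeholder": 0, "intro": 0, "slide": 10}
part["intro"] = 15               # delay the former first element by 15 seconds
anchored = normalise(part)       # nothing else shifts: intro stays at 15

# Deleting the placeholder triggers one global adjustment.
del anchored["placeholder"]
final = normalise(anchored)      # intro 5, slide 0: their 5-second gap is kept
print(final)
```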
Normally, each part is independent, only being played when selected and the play button clicked. Choosing another part will not result in it being played immediately. This allows a sequence to hold several related presentations. However, a sequence can be set to automatically play the next part by enabling
Audio
A snippet of audio, typically less than a minute long. Each has a selection of fade-in and fade-out times.
Standalone audio elements are meant for a single complete block of audio, but possibly requiring multiple files for different audio formats, each of which may be optimised for minimal file size for specific operating systems. The
Each audio is connected to only one track, which has a
An
Background music at the start is usually better with longer fade-ins, eight seconds providing a smooth lead-in. At the end, the background can be faded out over the same eight seconds. This means that such clips would normally not be included in a loop, though they may still be part of a group if there are images or other sounds accompanying them. Note that
Image
Images allow a sequence to be a slideshow, with optional voiceovers and/or music.
An
Images can be animated GIF files, allowing for some video-like motion. GIFs are palette-based, which means that if not many colours are required, they can be a lot smaller than full-colour images. This makes them well-suited for instructional animations.
Caption
Captions are typically text of the spoken commentary, but can be used to describe any other audible actions for those with hearing difficulties.
A
Captions are for providing the text of what is spoken, and perhaps other sounds, for those who are deaf or hearing impaired. See A guide to the visual language of closed captions and subtitles for further explanations, and Captions and Subtitles Formatting for a brief summary of formats. Many of the scenarios described mainly apply to video, which has a lot more variety of interactions than typical uses of the sequence element, which might only have one narrator. Captions here are kept simple and are plain text, which still allows for many of the formats described.
As captions can be a maximum of two lines, many may be required for one clip of voice audio. The trick is in making them line up with the spoken words. Even in a presentation without audio, captions can be used to describe the images and what to do in relation to them, but time must be allowed for the viewer to read without stress, so the words still need to be spaced apart.
- 1.Play the clip in an audio media player that shows the current elapsed time to the second.
- 2.At each suitable phrase or words, note the time and the words used.
- 3.Create a group in a part and place the audio for the voice starting at 00:00.
- 4.For each piece of text, create a caption in the same group, move it to the noted time, and add the text.
- 5.Add any other elements to be part of the same group, such as related images, setting their timecode as required.
- 6.If any of the elements in the group need to be before the audio, they can be moved to their final time now, which will cause all the other elements to be moved later, while keeping the previously relative timings of the audio and captions.
- 7.Repeat for the other voice audio clips.
- 8.Set the group times in the part.
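Because group elements are timed relative to the group, the final position of each caption is the group's time in the part plus its offset within the group. The sketch below models steps 2 to 4 and 8. The noted times and words are hypothetical, and `absolute_times` and `mmss` are illustrative helpers, not part of the element.

```python
from dataclasses import dataclass

@dataclass
class Element:
    kind: str       # "audio", "caption", "image" ...
    offset: int     # seconds relative to the start of the group
    text: str = ""

def absolute_times(group_start: int, elements: list) -> list:
    """Convert group-relative offsets into part timecodes (seconds)."""
    return [(group_start + e.offset, e.kind, e.text) for e in elements]

def mmss(seconds: int) -> str:
    return f"{seconds // 60:02d}:{seconds % 60:02d}"

# Steps 2-4: times noted while playing the voice clip become caption offsets.
notes = [(0, "Open the settings panel."), (4, "Choose the Display tab."), (9, "Then click Apply.")]
voice_group = [Element("audio", 0)] + [Element("caption", t, words) for t, words in notes]

# Step 8: place the group at 01:30 in the part.
for t, kind, text in absolute_times(90, voice_group):
    print(mmss(t), kind, text)
```

Moving the group only changes `group_start`, which is why the relative timing of a voice clip and its captions survives any later rescheduling.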
This way, the timings for
Clips might have to be split if extra time is required to be able to read the
Pointer
A pointer provides a means of indicating the current area being focused upon.
The
When the next
Given that nothing can be seen until a
Group
A group is a collection of any of the other elements, whose starting times are relative to the earliest element in the group.
In making a presentation, many elements are often related in time to each other and may need to be moved together when modifying timing. The
Being able to adjust timing is especially important for multi-locale presentations, because when a new locale is added, the timing may have to be more spaced out due to its language being more verbose and taking longer to express. Being able to shift related elements together as a group expedites adapting to the new locale.
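When a locale's voice clips run longer, each later group has to move by the total extra time taken by the groups before it. The sketch below shows that arithmetic, assuming the groups do not overlap and the gaps between them should stay the same. `reflow` and the group names and times are hypothetical; in the editor, the groups would simply be moved by these amounts.

```python
def reflow(groups: list, new_durations: dict) -> list:
    """Shift each group later by however much the groups before it have grown.

    groups: (name, start, duration) in seconds, sorted by start.
    new_durations: durations in the new locale, where they differ.
    """
    shift = 0
    result = []
    for name, start, duration in groups:
        result.append((name, start + shift))
        shift += new_durations.get(name, duration) - duration
    return result

original = [("A", 0, 20), ("B", 25, 15), ("C", 45, 10)]
# In the new language, group A takes 26 seconds and group B takes 18.
print(reflow(original, {"A": 26, "B": 18}))
```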
Track
There are up to four tracks, each being automatically created when selected for an audio element.
Other than fade-ins or -outs, there are no gain controls on
If a
Panning allows for the content of the
Each
For presentations or meditations, background music should be loud enough to mask other noises, but never near the levels of voices. These are not song mixes, where music levels would be high and then reduced (ducked) during vocals. The goal of background music is usually not to be the entertainment, but to mask distracting noises in the listener's environment.
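As a sketch of what a track's pan control does, the following assumes equal-power panning, the law the Web Audio API's StereoPannerNode applies to a mono input. Whether the sequence element's tracks are built on that node is an assumption. At the centre, each side gets about 0.707 (-3 dB), so the total power stays constant as the source moves across.

```python
import math

def equal_power_pan(pan: float) -> tuple:
    """Left/right gains for a mono source; pan in [-1, 1], where -1 is hard left."""
    pan = max(-1.0, min(1.0, pan))
    x = (pan + 1) / 2                 # map [-1, 1] onto [0, 1]
    return math.cos(x * math.pi / 2), math.sin(x * math.pi / 2)

for p in (-1.0, 0.0, 0.5, 1.0):
    left, right = equal_power_pan(p)
    print(f"pan {p:+.1f}: L={left:.3f} R={right:.3f}")
```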
Output
The output is created if there is any audio, and all the tracks feed into it.
The
The
While compressors can lessen the chance of gross digital distortion that would result from overload, they are not a cure for excessive levels. They will distort and produce unwanted artifacts if pushed. If there are several tracks, make sure that the loudest one is less than 0dB, as their levels accumulate, perhaps leading to overdriving the
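The way track levels accumulate can be estimated by converting each track's peak from dB to a linear gain, adding them, and converting back. The sketch below gives the worst case, where every track peaks on the same sample with the same polarity. Real mixes usually sum lower because peaks rarely coincide exactly. The example levels are hypothetical.

```python
import math

def db_to_gain(db: float) -> float:
    return 10 ** (db / 20)

def gain_to_db(gain: float) -> float:
    return 20 * math.log10(gain)

def worst_case_sum_db(peaks_db: list) -> float:
    """Peak level if every track's peak lands on the same sample with the same polarity."""
    return gain_to_db(sum(db_to_gain(p) for p in peaks_db))

print(f"{worst_case_sum_db([-6, -6]):+.2f} dBFS")        # two tracks at -6dB reach about 0dB
print(f"{worst_case_sum_db([-2, -12, -18]):+.2f} dBFS")  # hypothetical voice, music and bells
```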
Controls and playback phases
Each playback phase only allows certain controls.
# | Name | Icon | Description |
---|---|---|---|
a | Play | | Starts playing the current part |
b | Pause | | Pauses the currently playing part |
c | Previous | | Moves to the previous part, if any. Appears ghosted if at the first part |
d | Next | | Moves to the next part, if any. Appears ghosted if at the last part |
e | Parts: | -- | Reveals a table of parts where clicking on a part's |
Diagram: playback phases Stopped, Loading, Playing and Paused, linked by the Play, Pause, loaded and finished transitions.
The solid lines indicate manual actions, while the dotted lines are for automatic actions triggered by the indicated event.
There is no explicit stop button, though the stopped phase is returned to when a part has finished. Buttons that will be ignored display as ghosted, like
The parts list is always available. The
If paused, a ⇌ shows before the checkbox, indicating that there are several options to skip around the current part in a table before the parts list. Options include going to the beginning or forward or backward by various time periods.
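The playback phases behave like a small state machine, in which buttons that are not valid in the current phase are ignored. The transitions below are one reading of the phase diagram's labels and are an assumption that may not match the element exactly: Play from Stopped leads to Loading, the automatic "loaded" event leads to Playing, Pause and Play toggle between Playing and Paused, and the automatic "finished" event returns to Stopped.

```python
from enum import Enum

class Phase(Enum):
    STOPPED = "stopped"
    LOADING = "loading"
    PLAYING = "playing"
    PAUSED = "paused"

# (phase, event) -> next phase. "play" and "pause" are manual;
# "loaded" and "finished" are automatic events (assumed transitions).
TRANSITIONS = {
    (Phase.STOPPED, "play"): Phase.LOADING,
    (Phase.LOADING, "loaded"): Phase.PLAYING,
    (Phase.PLAYING, "pause"): Phase.PAUSED,
    (Phase.PAUSED, "play"): Phase.PLAYING,
    (Phase.PLAYING, "finished"): Phase.STOPPED,
}

def step(phase: Phase, event: str) -> Phase:
    """Return the next phase; events not valid in this phase are ignored, like ghosted buttons."""
    return TRANSITIONS.get((phase, event), phase)

phase = Phase.STOPPED
for event in ["pause", "play", "loaded", "pause", "play", "finished"]:
    phase = step(phase, event)
    print(f"{event:>8} -> {phase.value}")
```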
Limits
Because it deals with multimedia, the element has some limitations to prevent excessive resource usage.
While there is no limit on the length of audio clips, the underlying web audio technology is better suited to short clips, preferably under 45 seconds. Total duration of a
There are some limitations on what elements appear during playback. Multiple audio streams can play in parallel and even start at the same time. If there are any clashes, they will be immediately obvious during playback. For anything else, it makes no sense to show any more than one of any other element type at the same time, but since such situations can occur during timing adjustments, some rules are implemented to ensure unnecessary conflicts are avoided.
- a.Only one non-audio element of the same type at the same timecode will be available.
- b.Only the first, by part-level first, will be included in the playlist.
- c.Only the first of an image or pointer at the same timecode will show.
- d.If there is no current image at the scheduled time for a pointer, it is ignored.
- e.Any audio of a loop starting after 59:59 will be ignored.
Therefore, if an expected element does not appear, check for timecode duplications first.
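Rules a and c can be pictured as a filter over the playlist that keeps only the first element competing for each slot at each timecode. In this sketch, images and pointers share one slot, every other non-audio type competes only with itself, and audio is never filtered. It is an illustration of the rules as stated, not the element's code, and the playlist entries are hypothetical. Rule d (pointers without a current image) is not modelled.

```python
# Images and pointers compete for the same slot (rule c); other types only with themselves (rule a).
SLOT = {"image": "visual", "pointer": "visual"}

def visible_elements(playlist: list) -> list:
    """playlist: (timecode_seconds, kind, name) tuples in playlist order."""
    seen = set()
    kept = []
    for timecode, kind, name in playlist:
        if kind != "audio":                    # audio clips may start together
            key = (timecode, SLOT.get(kind, kind))
            if key in seen:
                continue                       # a later duplicate is dropped
            seen.add(key)
        kept.append((timecode, kind, name))
    return kept

playlist = [
    (0, "image", "slide-1"),
    (0, "audio", "intro-voice"),
    (0, "audio", "background-loop"),
    (5, "caption", "First caption"),
    (5, "caption", "Duplicate caption"),
    (10, "image", "slide-2"),
    (10, "pointer", "arrow-1"),
]
for item in visible_elements(playlist):
    print(item)
```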
Any
Recording voice
Recording voice has some risks because there are many sounds produced by the mouth that interfere with getting clear speech, besides the usual background noises.
The major one is plosives, such as those produced by words starting with p. The trick is to not record straight ahead of the mouth, as that is where their force is, while most of the richness and level of a voice is produced off-centre. The technique on page 109 of Mixing with your mind by Michael Stavrou involves using a microphone (or an ear) to hear where the voice is most full. Lips are normally curved around that point, in a bias that increases with age, making it easier to know where to start. The offset from straight ahead is usually 20-30°, and always on the same side for a particular person.
The whole of chapter 7 of the book deals with getting the best vocals, but this optimisation is easy to accomplish without much effort. A bonus is that there will likely be no need for a large ugly pop shield in the way. However, what is not seen in the image in the aside is a small piece of pop-shield foam that covers the area of the microphone's diaphragm to stop any spittle droplets that may come its way. It was held on by some fine wires that hooked into the grille.
A phone can be used to record voice by holding it close and aiming the bottom of the phone at the optimal mouth position, but to cut out a lot of room echoes, perhaps position a blanket over a door or other high furniture behind the microphone to reduce the unwanted room reflections reaching the mic. The blanket can also be used to pin the sheet that has the text to be spoken, to avoid moving the mouth away from the mic due to looking away at text that is not straight ahead.
Other artifacts produced by the voice will need to be handled in whatever audio editor is used, and only a spectral editor will enable dealing with some.
Preparing audio clips
For a quality presentation, there will need to be some editing of each raw audio clip.
When recorded, any audio will have leading and trailing periods where there are just background noises, while vocals will likely have various unwanted noises like mouth clicks and excessive sibilance. All these have to be minimised so they do not distract the listener. If background noises are too loud or fall in the middle of words, the audio will have to be recorded again at a quieter time, unless you are skilled enough to remove them with a spectral editor.
Music clips to be used for looping need to be trimmed to the length required, but they do not need fade-ins or fade-outs before importing.
When recording, ensure that maximum levels are 1-2dB below the maximum of 0dB. If recorded too close to 0dB, the incoming audio may not be clipped in conversion to digital, but when it is reconverted to analog, the reconstructed signal between the samples can be more than the analog circuitry can handle without cutting the excess off, resulting in harshness.
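The overshoot between samples can be demonstrated with a textbook worst case: a sine at a quarter of the sample rate, shifted by 45 degrees, whose samples never land on its true peak. Because the underlying continuous waveform is known here, it can be evaluated directly between the samples. If such samples were normalised to 0dB, the reconstructed wave would peak about 3dB above full scale. Real recordings rarely approach this extreme, but the example shows why leaving some headroom matters.

```python
import math

def signal(t: float) -> float:
    """Continuous waveform; t is measured in samples."""
    return math.sin(2 * math.pi * t / 4 + math.pi / 4)

samples = [signal(n) for n in range(64)]
sample_peak = max(abs(s) for s in samples)

# Evaluate the known continuous waveform at 16 points per sample interval.
true_peak = max(abs(signal(n / 16)) for n in range(64 * 16))

# Normalising the samples to full scale scales the waveform by 1/sample_peak,
# so the reconstructed peak lands above full scale by this much.
overshoot_db = 20 * math.log10(true_peak / sample_peak)
print(f"sample peak {sample_peak:.3f}, true peak {true_peak:.3f}, overshoot {overshoot_db:.2f} dB")
```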
Processing
For optimal use with the sequence element, audio needs to be made into clips.
- a.Reduce excessive spikes.
- b.Maximise levels using a normaliser effect, though items to be on the same track need to have the same level settings. The recommended maximum is -2dB.
- c.Cut the audio where there are longer gaps. A good metric for size is what will allow its words to fit in a caption, or a few if close together.
- d.Trim off the leading and trailing regions of each clip that are not required.
- e.Reduce the short quiet parts between words to silence.
- f.Cut out low frequency noise.
- g.Reduce the levels of the areas of predominantly sibilant sounds like s, soft cs or ts. At the end of a word, they can be faded-out fairly quickly.
- h.Cut out mouth clicks and other unwanted artifacts (spectral editor).
- i.Cut out breathing noises between phrases, perhaps with a fade-out on the preceding phrase.
- j.For fairly pure sounds like bells, cut out the noise in the areas around the fundamental frequency and harmonics. Higher harmonics are shorter than lower ones, so noise trailing those can be silenced as well (spectral editor).
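Step b's requirement that clips on the same track share level settings can be met with a single gain per track. The gain is computed from the loudest clip, so that clip peaks at -2dB and the others keep their relative levels. This is a sketch of a peak normaliser, not of any particular editor's effect. The sample values are hypothetical.

```python
import math

def peak_dbfs(samples: list) -> float:
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

def track_gain(clips: list, target_db: float = -2.0) -> float:
    """One linear gain for every clip on a track, so the loudest clip peaks at target_db."""
    loudest = max(peak_dbfs(c) for c in clips)
    return 10 ** ((target_db - loudest) / 20)

def apply_gain(samples: list, gain: float) -> list:
    return [s * gain for s in samples]

voice_a = [0.1, -0.4, 0.25]      # hypothetical sample values
voice_b = [0.05, 0.2, -0.3]
g = track_gain([voice_a, voice_b])
normalised = [apply_gain(c, g) for c in (voice_a, voice_b)]
print(f"gain {g:.3f}, new peaks {[round(peak_dbfs(c), 2) for c in normalised]}")
```

Normalising each clip separately would instead push every clip to -2dB, losing the natural differences in loudness between them.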
Being fairly ruthless with cutting out unwanted noises and room sounds will produce the clearest vocals. While that might make them sound odd in a video of a lecture, as a voiceover to screenshots or slides, they are far less distracting without room sounds, breathing and mouth clicks in the spaces between words. Of course, it is better to record at quiet times when there are few noises in the environment.
Deciding what to have in each clip is a balance between the space all the files occupy and the number of pieces of audio that are required to be sequenced. Making a clip out of each word would be excessive. Preferably break up the audio at gaps between sentences, allowing the timing to be optimised in the sequence element editing. Long pauses are the perfect breakpoints. For example, for meditations or presentations with intermittent guiding vocals, there are large gaps of many seconds or even minutes that can be cut out.
If using captions for the words of the audio, while editing try to make sentences or phrases start on a whole second by cutting or inserting silence in the spaces between phrases, as long as it still sounds natural. If catering for multiple languages, allow enough space for those that may take longer to say.
When making clips for looping background music, make them 10, 12, 15 or 20 seconds long, plus three seconds of overlap for crossfading. Keeping clips to one of these lengths makes it easier to know how many loops are required per minute. For example, an 18-second clip with a
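The loop arithmetic can be sketched as follows. It assumes each repeat starts three seconds before the previous one ends, so the time between loop starts is the clip length minus the overlap, and it applies the Limits rule that loop audio starting after 59:59 is ignored. The helper names are hypothetical, and the element's own loop scheduling may differ in detail.

```python
CROSSFADE = 3                     # seconds of overlap between repeats
LIMIT = 59 * 60 + 59              # loop audio starting after 59:59 is ignored

def loop_period(clip_seconds: int, crossfade: int = CROSSFADE) -> int:
    """Time between successive loop starts when each repeat overlaps the last."""
    return clip_seconds - crossfade

def loops_per_minute(clip_seconds: int) -> float:
    return 60 / loop_period(clip_seconds)

def loop_starts(first_start: int, clip_seconds: int, count: int) -> list:
    """Start times of each repeat, dropping any that would start after 59:59."""
    period = loop_period(clip_seconds)
    starts = (first_start + i * period for i in range(count))
    return [s for s in starts if s <= LIMIT]

print(loops_per_minute(18))       # a 15-second loop plus the 3-second overlap
print(loop_starts(3540, 18, 6))   # six repeats requested from 59:00
```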
Uses
Because of its dynamic nature, the sequence element offers more than static content.
- a.Instructional content, rather like the many YouTube videos that show how to do tasks in computer programs.
- b.Slideshows, where a series of images is shown, with voice or captions describing or elaborating on the elements of the images or their meanings.
- c.Guided meditations, optionally with images.
While there may be other types of presentation that the
Instructions
From time to time, we have all needed help with how to do a task in a program or some tool we have.
At those times, we have turned to YouTube and there is usually some helpful presenter with the solution to our quest. However, for us to be one of those helpful people, we have to come to grips with video technology and processes, as well as the idiosyncrasies of YouTube algorithms and rules. Plus we have to be the talking head, which presents a whole other level of stress, especially for those more inclined to introversion.
Videos capture movement, but that is a resource-intensive way of capturing information that does not require 24 pictures per second, forcing use of a beefed-up third-party platform to deliver a succession of images two orders of magnitude faster than many instructions require. Videos also waste a lot of frames capturing the gaps in the vocals, adding to the wasted bandwidth.
The sequence element breaks instructions down into a sparse succession of images accompanied by sporadic audio clips and captions. While producing those is not trivial, they can be produced independently of each other, removing the stressful and error-prone process of making them all happen simultaneously that video requires. Coordinating the elements of a sequence element presentation is not a real-time process. Making videos may look easy, but they are usually the result of many people synchronising their activities to make them happen, in the days before and during the recording.
Conceptually, instructions using the sequence element may appear to be similar to, or even preferable to, a procedure article. However, while the former can be used in many scenarios, procedures are designed for a formal multi-level documenting of processes and instructions. It is better to see them as complementary, where the former can be more colloquial, but defers to the latter for the formal precision it offers.
Slideshow
We have been exposed to slideshows through PowerPoint for almost 40 years, so it is a popular presentation paradigm.
The
The
Meditation
Guided meditations can be audio only, but captions allow for use by the hearing-impaired or translations.
Meditations usually consist of sporadic commands with soft background music. The
While some audio skills are still required for editing, they are considerably less than the proficiency and learning curve required for creating quality audio in audio workstation software. The added bonus is that images can be provided to help set listeners in the mood.
Note that while many meditation creators seem to be focused upon very talky and numerous variants, to actually be helpful, guided meditations need to be fairly lacking in talking to allow the listeners time to mentally action the guidance. Also, there is no need to have a lot of variants, as most people, once they understand the process, can adapt it to use for other areas of their life. Meditation is a process for taking control of our own thinking, not outsourcing it to others so that we become dependent upon them.
Self-sufficiency
We live in a world that wants us to be dependent upon their cloud infrastructure, but simplifying our requirements and creative processes can free us from that.
YouTube is a juggernaut that has taken over as the premier presentation platform, but it is not necessary for many uses, and using it only forces the use of centralised, power-hungry server farms that are mainly engaged in exploiting user data for advertising. Moving content that does not need such resources into our own cloud, our web site, frees us from dependence upon those who only want to exploit us and our viewers.
For such a simple tool, the