Making presentations on the web usually involves videos, but they are very resource-intensive. A much more flexible and low-resource alternative is the sequence element.
Many video presentations are basically slideshows with commentary. While the individual images and audio snippets may not use many resources themselves, once combined into a video they can stress the playback capabilities of servers for small sites, especially at full HD. The sequence element collection allows a presentation to be created directly from those images, audio snippets and text.
The advantages of the sequence elements are:
- a.Provides an ordered sequence of parts, each being a timed sequence of audio and/or images and text, such as for slideshows.
- b.Audio only needs to be the lengths required for the actual voice, and not the silences in between.
- c.Audio can be looped to minimise resources used.
- d.Audio, images and text can be grouped for easy timing adjustments.
- e.Audio and text for other locales can be added when available.
This arrangement needs no video skills and allows variations of timing and order to be done within Smallsite Design. Some audio recording and cleanup skills are needed, but DAW (digital audio workstation) software is not required, though spectral repair tools such as those in Audacity and iZotope RX can make audio cleanup very fast.
The limitations of sequences are:
The use of the sequence element can produce a significant reduction in content size and reduce the load on the server so much that keeping multiple media presentations on the site becomes feasible.
As an example of how much resource usage can be minimised, a 15 minute high-quality MP3 meditation track consisting of a background track with occasional voice guidance and some bell sounds is about 20MB, whereas the same using a short looped background and audio snippets is 1.5MB. Of course, 15 minutes of constant talking with a separate full-length backing track would be 40MB, but breaking up the audio and being able to edit the sequencing will likely provide more flexibility when it comes to making changes.
For multi-lingual sites, only the audio snippets and text need to be added for each locale variant using the existing sequence of elements, rather than having to create another whole video for each. This should make presentations faster to create and easier to modify.
Similarly, a 15 minute full HD 24fps MP4 video of a presentation consisting of almost continuous commentary over a series of 50 slides is 1.5GB, whereas as a collection of MP3 snippets and 50 full HD JPEG images it is 28MB, a saving of over 98%, and uses a lot less site bandwidth, especially as looped files are cached in the browser. While catering for another language would require another 1.5GB video, using a sequence would only require adding the locale variants to the audio files, locale text for the captions, and a locale variant for those images that had text in them that needed translation.
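The percentages in these comparisons are easy to check. This Python sketch just recomputes them from the sizes quoted above; `saving_percent` is an illustrative helper name, and 1GB is taken as 1024MB.

```python
def saving_percent(original_mb: float, reduced_mb: float) -> float:
    """Percentage of the original size saved by the reduced version."""
    return (1 - reduced_mb / original_mb) * 100


# Sizes quoted in the text, in MB (1 GB taken as 1024 MB).
meditation = saving_percent(20, 1.5)        # full MP3 track vs looped background + snippets
slideshow = saving_percent(1.5 * 1024, 28)  # MP4 video vs MP3 snippets + 50 JPEG images

print(f"Meditation: {meditation:.1f}% smaller")
print(f"Slideshow:  {slideshow:.1f}% smaller")
```

Taking 1GB as 1000MB instead changes the slideshow figure only slightly; it is still just over 98%.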
Server and browser load is reduced because each file is only retrieved once, with looped audio streaming from the same loaded copy.
The sequence collection of elements is a simplified selection of basic functional blocks.
The hierarchy of elements for a sequence is:
- Sequence: Presentation element. Can have an image as a backdrop.
  - Part: One of up to 12 self-contained presentations.
    - Image: Image for the audio. Must be the same size as the sequence's.
    - Audio: Clip of voice, music or other sounds.
    - Caption: Up to two short lines of text of the audio or comments.
    - Group: Collection of elements to schedule as one.
      - Image: As for a part.
      - Audio: As for a part.
      - Caption: As for a part.
  - Track: One of four summing nodes with gain, pan and compressor.
  - Output: Feeds the device's sound system. Has gain and compressor.
A part is a sequenced collection of one or more of the other elements, with its own title text.
If a part only contained audio elements, it would be like a track in an audio player, but with images and text overlays becomes a self-contained segment of a presentation. A part has a title, which is shown when a part is selected, and when played, for two seconds or until all its files are loaded, whichever is later. If the sequence has an image, the title is shown in a box with background colour and border, centred over the image. If no image, the title is shown centred in the display area.
A typical part would have a structure like:
Diagram: audio elements feed Track A to Track D (each with gain, pan and compressor), which feed the Output (gain and compressor) and then the system audio.
Some things to note about the diagram are:
- a.There are some phases to playing a part.
- b.The display area is where all images and text are shown.
- c.There are blank images and captions that end display of the previous element of the same type, if required.
- d.Each audio element is connected to a single track.
- e.All tracks feed into the output, which feeds the device's sound system.
All elements in a part have a timecode. Whenever a timecode is adjusted, the earliest element is aligned with the start of the part by setting its timecode to 00:00, and all other elements have their timecodes adjusted by the same zeroing offset. For example, if the current earliest element (at 00:00) is moved to be 15 seconds earlier, its timecode is set back to 00:00 and all the other elements have their timecodes increased by that same 15-second offset, so one that was at 00:10 becomes 00:25.
Conversely, if the earliest element is moved later, all other elements are moved earlier by the same offset. The situation to be wary of is moving the first element to later than other elements: the earliest of those then has its timecode set to 00:00, and all other elements are adjusted by that element's original offset. For example, if the second element had a timecode of 00:10 and the first element was delayed by 15 seconds, all the elements would have their timecodes reduced by 10 seconds, not 15 seconds, and the moved element would now be at 00:05.
Deleting the first element may also trigger timecode recalculations. So, if planning significant timing changes that may inadvertently alter other elements' timecodes, first add a new element as a timing reference placeholder (it will be created with a 00:00 timecode) to prevent such changes. It can be deleted when all adjustments have been made, triggering a global adjustment of all remaining elements, which keep their relative offsets.
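The re-zeroing rule amounts to subtracting the earliest timecode from every timecode. This Python sketch illustrates the behaviour described above and is not the editor's own code; `rezero` and the element names are hypothetical, and timecodes are held as whole seconds.

```python
def rezero(timecodes: dict[str, int]) -> dict[str, int]:
    """Shift all timecodes (whole seconds) so the earliest is 0, keeping relative offsets."""
    offset = min(timecodes.values())
    return {name: t - offset for name, t in timecodes.items()}


# Earliest element moved 15 s earlier: everything else shifts 15 s later.
print(rezero({"intro": 0 - 15, "slide": 10}))  # {'intro': 0, 'slide': 25}

# First element delayed 15 s past a second element at 00:10: the shift is 10 s, not 15 s.
print(rezero({"intro": 0 + 15, "slide": 10}))  # {'intro': 5, 'slide': 0}

# With a placeholder held at 00:00, the same delay leaves the other timecodes alone.
print(rezero({"placeholder": 0, "intro": 0 + 15, "slide": 10}))  # {'placeholder': 0, 'intro': 15, 'slide': 10}
```

Once the placeholder is deleted, the remaining elements are re-zeroed in a single step, keeping their relative offsets.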
Normally, each part is independent, only being played when selected and the play button clicked. Choosing another part will not result in it being played immediately. This allows a sequence to hold several related presentations. However, a sequence can be set to automatically play the next part by turning on Auto-next, allowing a multi-part presentation to be watched in one go, though there are still a few seconds of silence while each part's title is shown. With Auto-next, selecting another part starts it playing immediately. However, Auto-next will never automatically force any sequence elements to play when a page is loaded.
A snippet of audio, typically of less than a minute. It has a selection of fade-in and fade-out times.
Standalone audio elements are meant for a single complete block of audio, but possibly requiring multiple files for different audio formats, each of which may be optimised for minimal file size on specific operating systems. The audio elements used with sequences only allow one file, but their advantage is that, being snippets, they minimise the total audio file size even in a format like MP3, which is less efficiently compressed than newer formats but is almost universally supported and royalty-free.
Audio elements can have a fade-in and a fade-out, making it possible to cross-fade between snippets when looping within a sequence element. Except for these, all audio is statically controlled, in that once controls are set during editing, they never change when viewed on a page. A site visitor can only adjust the overall level, and only with their device's controls.
Each audio is connected to only one track, which has a gain control that applies to all audio for that track. This means that all the audio for a track should be for the same one purpose, such as the main voice, or the background music. That does mean that all the clips for a track need to have their audio levels normalised to be similar to each other prior to uploading their files.
An audio element may be looped by specifying both a duration and number of loops. While for discrete speech or sounds, fade-ins or -outs are not required, for background audio, looping is normally done with crossfades, where there is a fade-out on the tail end of one loop overlapping with a fade-in of the next. The best type of music for this is called pads, which consist of a rich texture of sounds blended together, and don't have recognisable melodies or rhythms so that there are no clashes during the crossfades. Fade-ins and -outs of two seconds with an overlap of three seconds sound best.
Background music at the start is usually better with longer fade-ins, with eight seconds providing a smooth lead-in. At the end, the background can be faded out at the same eight seconds. This means that such clips would normally not be included in a loop, though they may still be part of a group if there are images or other sounds accompanying them. Note that audio in groups cannot be looped.
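Assuming a loop's duration is the time between the starts of successive passes, each pass overlaps the next by the clip length minus the duration, and that overlap is where the two-second fades crossfade. This Python sketch computes when each pass plays; `LoopPass`, `loop_schedule` and the 00:08 start are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoopPass:
    start: float  # seconds from the start of the part
    end: float    # when this pass of the clip stops sounding


def loop_schedule(start: float, clip_length: float, duration: float, loops: int) -> list[LoopPass]:
    """Start and end of each pass of a looped clip.

    Each pass begins `duration` seconds after the previous one, so consecutive
    passes overlap by clip_length - duration seconds: the crossfade region where
    one pass fades out while the next fades in. A negative value would be a gap.
    """
    return [
        LoopPass(start + i * duration, start + i * duration + clip_length)
        for i in range(loops)
    ]


# An 18 s pad clip with a 15 s duration, first pass starting at 00:08.
passes = loop_schedule(start=8, clip_length=18, duration=15, loops=3)
for p in passes:
    print(f"pass starts {p.start:>3} s, ends {p.end:>3} s")
print("crossfade overlap:", passes[0].end - passes[1].start, "s")
```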
Images allow a sequence to be a slideshow, with optional voiceovers and/or music.
An image replaces any previous image, including the sequence's, and remains displayed until another image is scheduled. An image element without a file specified can be used to cease display of the previous image, revealing the sequence's image.
Images have to be the same size as that for the sequence to avoid distracting jumpy resizing. To ensure this, when selecting an image, only those that are the same dimensions as the sequence image are shown. Conversely, if there are any image elements, the sequence's image cannot be removed, and only those images that are the same dimensions can be selected as a replacement.
Typically text of the spoken commentary, but can be used to describe any other audible actions for those with hearing difficulties.
A caption is shown until another caption is scheduled or the part ends. A blank caption can be used to cease display of the previous caption. If the sequence uses images, captions are displayed with background near the bottom of the image. Otherwise, they are shown centred in the display area.
Captions are for providing the text of what is spoken, and perhaps other sounds, for those who are deaf or hearing impaired. However, as captions can be a maximum of two lines, many may be required for one clip of voice audio. The trick is in making them line up with the spoken words. Even in a presentation without audio, captions can be used to describe the images and what to do in relation to them, but time must be allowed for the viewer to read without stress, so the words still need to be spaced apart.
A suitable sequence of steps to do this is:
- 1.Play the clip in an audio media player that shows the current elapsed time to the second.
- 2.At each suitable phrase or words, note the time and the words used.
- 3.Create a group in a part and place the audio for the voice starting at 00:00.
- 4.For each piece of text, create a caption in the same group, move it to the noted time, and add the text.
- 5.Add any other elements to be part of the same group, such as related images, setting their timecode as required.
- 6.If any of the elements in the group need to be before the audio, they can be moved to their final time now, which will cause all the other elements to be moved later, while keeping the previous relative timings of the audio and captions.
- 7.Repeat for the other voice audio clips.
- 8.Set the group times in the part.
This way, the timings for captions need only be noted relative to the start of their related audio, bypassing the need for time arithmetic to work out their times relative to the start of the part. Checking the timing can be done by clicking pause and the required skip option buttons to get to the desired section of the part, making adjustments, then checking again.
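The group does the time arithmetic that these steps avoid: each caption's time in the part is the group's start plus the offset noted while listening to the clip. A Python sketch with made-up caption text and a hypothetical group start of 01:20; `mmss` is an illustrative helper.

```python
def mmss(seconds: int) -> str:
    """Format whole seconds as an mm:ss timecode."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


# Step 2: caption times noted while playing the clip, relative to its start.
caption_offsets = {
    "Open the Settings menu.": 0,
    "Choose the Appearance tab.": 4,
    "Then pick a theme.": 9,
}
group_start = 80  # hypothetical position of the group in the part (step 8): 01:20

for text, offset in caption_offsets.items():
    print(mmss(group_start + offset), text)
```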
A pointer provides a means of indicating the current area being focused upon.
The pointer can be positioned over an image to draw the viewer's focus to the part of it that is currently being discussed. It is like the laser pointer used in physical presentations.
When the next image becomes active, any pointer will be removed. The None option dispenses with the pointer until then. Schedule pointers at least a second after an image, which would be normal as a pointer is usually not required immediately because the image needs to be introduced first. If there is no image current at the scheduled time, the pointer is ignored.
Given that nothing can be seen until a part is played, to position pointers correctly over their required images, when a pointer offset is changed, it and the last image before it will be displayed for 10 seconds. If two or more images occur at the same timecode, only the first one, by part first, will be displayed. Checking the timing can be done by clicking pause and the required skip option buttons to get to the desired section of the part, making adjustments, then checking again.
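The display rules for images, captions and pointers can be summarised as "what is showing at time t". This Python sketch is an illustration under stated assumptions, not the player's code: `Event` and `display_at` are hypothetical, no two events of the same kind share a timecode, and the sequence's backdrop does not count as a current image for pointers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    time: int             # seconds from the start of the part
    kind: str             # "image", "caption" or "pointer"
    value: Optional[str]  # file, text or position; None for a blank image/caption or None pointer


def display_at(events: list[Event], t: int, backdrop: Optional[str]) -> dict:
    """Return what the display area shows at time t (a sketch of the rules above)."""
    image, caption, pointer = backdrop, None, None
    image_current = False  # assumption: the backdrop does not count as a current image
    for ev in sorted(events, key=lambda e: e.time):  # stable sort keeps listed order for ties
        if ev.time > t:
            break
        if ev.kind == "image":
            image_current = ev.value is not None
            image = ev.value if image_current else backdrop  # blank image reveals the backdrop
            pointer = None                                    # a new image removes any pointer
        elif ev.kind == "caption":
            caption = ev.value                                # blank caption ends the previous one
        elif ev.kind == "pointer" and image_current:          # pointer ignored without an image
            pointer = ev.value
    return {"image": image, "caption": caption, "pointer": pointer}


events = [
    Event(0, "image", "slide1.jpg"),
    Event(0, "caption", "Open the menu."),
    Event(3, "pointer", "top-left"),
    Event(8, "caption", None),
    Event(10, "image", "slide2.jpg"),
]
for t in (5, 9, 12):
    print(t, display_at(events, t, "backdrop.jpg"))
```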
A group is a collection of any of the other elements, whose starting times are relative to the earliest element in the group.
In making a presentation, many elements are often related in time to each other and may need to be moved together when modifying timing. The group element allows such related collections of elements to be moved at the same time. The same timecode adjustment provisos detailed for the part element apply within the group as well.
Being able to adjust timing is especially important for multi-locale presentations, because when a new locale is added, the timing may have to be more spaced out due to its language being more verbose and taking longer to express. Being able to shift related elements together as a group expedites adapting to the new locale.
There are up to four tracks, each being automatically created when selected for an audio element.
Other than fade-ins or -outs, there are no gain controls on audio elements, while each track has gain and pan. Audio elements of the same type, like voice or background, should be on the same track of the mixer so that their relative level in the mix is adjusted by the track's gain. This does mean that there is no individual sculpting of sounds around each other, but the sequence element is not meant for the sophisticated mixing typically used for songs, where instruments and vocals vie to be heard as the song goes along.
If a track no longer has an audio element joined to it, it is automatically deleted, with all settings lost, so if another audio is set to join the same track ID, its gain and pan will need to be set again if different from the defaults.
Panning allows for the content of the track to be centre, left or right, though the latter two are only mildly offset from the centre. For most situations, centre will be appropriate, but some scenarios may benefit from having vocals off-centre, such as where two people are commenting upon a slide, where biasing each to opposite sides will seem natural. However, if one is predominant while the other is incidental, the former would be better in the centre, with the latter off to one side.
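The text does not say how far off-centre left and right are, or which pan law is used. As an illustration only, this Python sketch assumes hypothetical positions of -0.3 and 0.3 on a -1 to 1 scale and the common equal-power pan law, which keeps the total power of a mono source constant as it moves across.

```python
import math

# Hypothetical positions: the text only says left and right are mildly off-centre.
PAN_POSITIONS = {"left": -0.3, "centre": 0.0, "right": 0.3}


def equal_power_gains(pan: float) -> tuple[float, float]:
    """Left/right gains for a mono source using an equal-power pan law (pan in -1..1)."""
    x = (pan + 1) / 2  # map -1..1 onto 0..1
    return math.cos(x * math.pi / 2), math.sin(x * math.pi / 2)


for name, pan in PAN_POSITIONS.items():
    left, right = equal_power_gains(pan)
    print(f"{name:>6}: L={left:.3f} R={right:.3f} power={left**2 + right**2:.3f}")
```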
Each track has a compressor to minimise overloading that would lead to distortion. Pushing levels above the compressor's threshold slows their rate of increase, making the audio sound squashed. To guard against pushing levels too far, the maximum gain is +20dB. Gain cannot be increased by more than 3dB at a time, to prevent sudden excessive increases in sound levels that may damage ears or equipment.
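The two gain limits can be sketched as a rule applied to each requested change. This Python sketch assumes "3dB at a time" means per individual change, and that reductions are unlimited since the text mentions none; `apply_gain_change` and `db_to_amplitude` are hypothetical names.

```python
MAX_GAIN_DB = 20.0  # maximum track gain stated in the text
MAX_STEP_DB = 3.0   # largest single increase allowed


def apply_gain_change(current_db: float, requested_db: float) -> float:
    """Return the gain actually set when an editor asks for `requested_db`.

    Increases are limited to 3 dB per change and to +20 dB overall. Reductions
    are applied as requested (an assumption; the text states no limit).
    """
    if requested_db <= current_db:
        return requested_db
    return min(requested_db, current_db + MAX_STEP_DB, MAX_GAIN_DB)


def db_to_amplitude(db: float) -> float:
    """Convert a gain in dB to the equivalent amplitude multiplier."""
    return 10 ** (db / 20)


gain = 0.0
steps = []
for _ in range(8):  # repeatedly ask for +25 dB
    gain = apply_gain_change(gain, 25.0)
    steps.append(gain)
print(steps)  # [3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 20.0, 20.0]
print(round(db_to_amplitude(MAX_GAIN_DB), 2))  # 10.0
```

At the +20dB maximum, the track multiplies the amplitude of its audio by ten before its compressor acts.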
For presentations or meditations, background music should be enough to mask other noises, but never be near the levels of voices. These are not for song mixes where music levels would be high and then reduced (ducked) during vocals. The goal of background music is usually not to be the entertainment, but to mask distracting noises in the reader's environment.
The output is created if there is any audio, and all the tracks feed into it.
The output has gain to adjust the final overall level, followed by another compressor to keep that level under control.
The track and output compressors work to keep overloading to a minimum. If a track is set for a gain of +20dB and the output gain to -20dB, the overall level sent to the device's output will be significantly less than if both were at gains of 0dB. The gains are set at editing time, with the runtime level set by the page-reader's device settings.
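Why +20dB on a track followed by -20dB on the output ends up quieter can be seen with a static compressor curve. This Python sketch assumes a hard-knee compressor with a hypothetical -6dB threshold and 4:1 ratio, and a source peaking at -6dB; real compressors, such as the Web Audio API's DynamicsCompressorNode, also have knee, attack and release settings that are ignored here.

```python
def compress(level_db: float, threshold_db: float = -6.0, ratio: float = 4.0) -> float:
    """Static hard-knee compressor curve: above the threshold, output rises 1/ratio as fast."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def chain_peak(source_db: float, track_gain_db: float, output_gain_db: float) -> float:
    """Peak level through track gain, track compressor, output gain, output compressor."""
    after_track = compress(source_db + track_gain_db)
    return compress(after_track + output_gain_db)


print(chain_peak(-6.0, 0.0, 0.0))     # -6.0
print(chain_peak(-6.0, 20.0, -20.0))  # -21.0: the track compressor squashed the +20 dB first
```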
Each playback phase only allows certain controls.
| Key | Button | Action |
|---|---|---|
| a | Play | Starts playing the current part |
| b | Pause | Pauses the currently playing part |
| c | Previous | Moves to the previous part, if any. Appears ghosted if at the first part |
| d | Next | Moves to the next part, if any. Appears ghosted if at the last part |
| e | Parts: | Reveals a table of parts where clicking on a part's Title will move to that part |
The playback phases and what causes transitions between them are:
The solid lines indicate manual actions, while the dotted lines are for automatic actions triggered by the indicated event.
The buttons available during each phase are:
There is no explicit stop button, though the stopped phase is returned to when a part has finished. Buttons that will be ignored are displayed ghosted. If Auto-next is enabled for a sequence element, when a part has finished and there is another part, it will play automatically. Also, clicking Previous or Next, or selecting a part from the parts list, will start playing the selected part.
The parts list is always available. The Title of the part can be clicked during the stopped and paused phases to select it, but ignored during other phases. If the checkbox is preceded by a ©, and checked, the list of third-party copyright holders is displayed, as links to their work if available.
If paused, a ⇌ shows before the checkbox, indicating that there are several options to skip around the current part in a table before the parts list. Options include going to the beginning or forward or backward by various time periods.
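The phase diagram and button table are not reproduced here, so this Python sketch is a hypothetical model pieced together from the surrounding text: the phase names other than stopped and paused, the class and method names, and the choice to ignore Previous and Next outside the stopped and paused phases are all assumptions.

```python
from enum import Enum, auto


class Phase(Enum):
    STOPPED = auto()
    TITLE = auto()    # part title shown for 2 s or until files load, whichever is later
    PLAYING = auto()
    PAUSED = auto()


def title_seconds(load_seconds: float) -> float:
    """How long the title phase lasts: at least 2 s, longer if loading takes longer."""
    return max(2.0, load_seconds)


class Player:
    """Hypothetical model of the playback phases; not Smallsite Design's actual code."""

    def __init__(self, part_count: int, auto_next: bool = False) -> None:
        self.part_count = part_count
        self.auto_next = auto_next
        self.part = 0
        self.phase = Phase.STOPPED

    def play(self) -> None:
        if self.phase is Phase.STOPPED:
            self.phase = Phase.TITLE
        elif self.phase is Phase.PAUSED:
            self.phase = Phase.PLAYING

    def pause(self) -> None:
        if self.phase is Phase.PLAYING:
            self.phase = Phase.PAUSED

    def title_done(self) -> None:
        """Automatic: the title period has ended."""
        if self.phase is Phase.TITLE:
            self.phase = Phase.PLAYING

    def part_finished(self) -> None:
        """Automatic: the last element of the current part has finished."""
        if self.phase is not Phase.PLAYING:
            return
        if self.auto_next and self.part + 1 < self.part_count:
            self.part += 1
            self.phase = Phase.TITLE
        else:
            self.phase = Phase.STOPPED

    def select(self, part: int) -> None:
        """Previous, Next or a Title in the parts list; ignored unless stopped or paused."""
        if self.phase not in (Phase.STOPPED, Phase.PAUSED):
            return
        if not 0 <= part < self.part_count:
            return
        self.part = part
        self.phase = Phase.TITLE if self.auto_next else Phase.STOPPED


player = Player(part_count=3, auto_next=True)
player.play()
player.title_done()
player.part_finished()
print(player.part, player.phase)  # 1 Phase.TITLE
```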
Because it deals with multimedia, the element has some limitations to prevent excessive resource usage.
The absolute limits for the element are:
While there is no limit on the length of audio clips, the underlying web audio technology is better suited to short clips, preferably under 45 seconds. Total duration of a part is not known until after all elements are loaded. Audio is not loaded at page loading time to reduce rendering delays and unnecessary resource usage. As long as audio starts on or before 59:59, it will play to completion, and is included in the total part duration displayed once play starts.
There are some limitations on what elements appear during playback. Multiple audio streams can play in parallel and even start at the same time. If there are any clashes, they will be immediately obvious during playback. For anything else, it makes no sense to show any more than one of any other element type at the same time, but since such situations can occur during timing adjustments, some rules are implemented to ensure unnecessary conflicts are avoided.
The playback exclusion rules are:
- a.Only one non-audio element of the same type at the same timecode will be available.
- b.Only the first, by part-level first, will be included in the playlist.
- c.Only the first of an image or pointer at the same timecode will show.
- d.If there is no current image at the scheduled time for a pointer, it is ignored.
- e.Any audio of a loop starting after 59:59 will be ignored.
Therefore, if an expected element does not appear, check for timecode duplications first.
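Rules a, b and e can be sketched as a filter over the elements of a part. This Python sketch is illustrative only: `Element` and `playlist` are hypothetical, "part-level first" is read as part-level elements winning over grouped ones at the same timecode, each loop pass is assumed to appear as its own audio entry, and rules c and d are not modelled.

```python
from dataclasses import dataclass

LAST_AUDIO_START = 59 * 60 + 59  # 59:59 in seconds


@dataclass
class Element:
    name: str
    kind: str       # "audio", "image", "caption" or "pointer"
    time: int       # start time in seconds from the start of the part
    in_group: bool  # False for elements placed directly in the part


def playlist(elements: list[Element]) -> list[Element]:
    """Keep one non-audio element per type and timecode, and no audio after 59:59."""
    seen: set[tuple[str, int]] = set()
    kept: list[Element] = []
    # False sorts before True, so part-level elements are considered first;
    # sorted() is stable, so the original order is kept otherwise.
    for el in sorted(elements, key=lambda e: e.in_group):
        if el.kind == "audio":
            if el.time <= LAST_AUDIO_START:
                kept.append(el)
        elif (el.kind, el.time) not in seen:
            seen.add((el.kind, el.time))
            kept.append(el)
    return sorted(kept, key=lambda e: e.time)


elements = [
    Element("group caption", "caption", 10, True),
    Element("part caption", "caption", 10, False),
    Element("slide", "image", 0, False),
    Element("voice", "audio", 0, True),
    Element("late bell", "audio", 3600, False),
]
print([el.name for el in playlist(elements)])
```

Here the grouped caption at 00:10 is dropped in favour of the part-level one, which is exactly the kind of silent omission worth checking for.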
Recording voice has some risks because there are many sounds produced by the mouth that interfere with getting clear speech, besides the usual background noises.
The major one is plosives, the bursts of air produced by consonants such as the p at the start of words. The trick is to not record straight ahead of the mouth because, while the force of a plosive comes out there, most of the richness and level of a voice is produced off-centre. The technique on page 109 of Mixing with your mind by Michael Stavrou involves using a microphone (or an ear) to find where the voice sounds fullest. Conveniently, the lips are normally curved around that point, in a bias that increases with age, making it easier to know where to start. Typically the offset from straight ahead is 20-30°, and always on the same side for a particular person.
The whole of chapter 7 of the book deals with getting the best vocals, but this optimisation is easy to accomplish without much effort. A bonus is that there will likely be no need for a large ugly pop shield in the way. However, what is not seen in the image in the aside is a small piece of pop-shield foam that covers the area of the microphone's diaphragm to stop any spittle droplets that may come its way. It was held on by some fine wires that hooked into the grill.
A phone can be used to record voice by holding it close and aiming the bottom of the phone at the optimal mouth position, but to cut out a lot of room echoes, perhaps position a blanket over a door or other high furniture behind the microphone to reduce the unwanted room reflections reaching the mic. The blanket can also be used to pin the sheet that has the text to be spoken, to avoid moving the mouth away from the mic due to looking away at text that is not straight ahead.
Other artifacts produced by the voice will need to be handled in whatever audio editor is used, and only a spectral editor will enable dealing with some.
For a quality presentation, there will need to be some editing of each raw audio clip.
When recorded, any audio will have leading and trailing periods containing just background noise, while vocals will likely have various unwanted noises like mouth clicks and excessive sibilance. All these have to be minimised so they don't distract the listener. If background noises are too loud or fall in the middle of words, then unless skilled at removing them with a spectral editor, the audio will have to be recorded again at a quieter time.
Music clips to be used for looping need to be trimmed to the length required, but they don't need fade-ins or -outs before importing.
For optimal use with the sequence element, audio needs to be made into clips.
For voice, the typical preparation for the clips is to use an audio or spectral editor to:
- a.Reduce excessive spikes.
- b.Maximise levels using a normaliser effect, though items to be on the same track need to have the same level settings.
- c.Cut the audio where there are longer gaps. A good metric for size is what will allow its words to fit in a caption, or a few if close together.
- d.Trim off the leading and trailing regions of each clip that are not required.
- e.Reduce the short quiet parts between words to silence.
- f.Cut out low frequency noise.
- g.Reduce the levels of the areas of predominantly sibilant sounds like s, soft cs or ts. At the end of a word, they can be faded out fairly quickly.
- h.Cut out mouth clicks and other unwanted artifacts (spectral editor).
- i.Cut out breathing noises between phrases, perhaps with a fade-out on the preceding phrase.
- j.For fairly pure sounds like bells, cut out the noise in the areas around the fundamental frequency and harmonics. Higher harmonics are shorter than lower ones, so noise trailing those can be silenced as well (spectral editor).
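Steps b, d and e can be illustrated on raw samples. This Python sketch works on a plain list of floats between -1 and 1, with hypothetical thresholds and window size; a real editor also smooths the gate's transitions to avoid clicks, and the spectral steps have no simple equivalent here.

```python
def trim(samples: list[float], threshold: float = 0.02) -> list[float]:
    """Step d: drop leading and trailing samples quieter than the threshold."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return samples[loud[0]:loud[-1] + 1]


def gate(samples: list[float], threshold: float = 0.02, window: int = 441) -> list[float]:
    """Step e: silence each window (default about 10 ms at 44.1 kHz) whose peak is below the threshold."""
    out: list[float] = []
    for start in range(0, len(samples), window):
        block = samples[start:start + window]
        quiet = max(abs(s) for s in block) < threshold
        out.extend([0.0] * len(block) if quiet else block)
    return out


def normalise(samples: list[float], peak: float = 0.9) -> list[float]:
    """Step b: scale so the loudest sample reaches `peak` (use one peak for all clips on a track)."""
    loudest = max((abs(s) for s in samples), default=0.0)
    if loudest == 0.0:
        return list(samples)
    return [s * peak / loudest for s in samples]


# A tiny made-up clip: quiet lead-in, a word, a quiet gap, another word, quiet tail.
clip = [0.001, 0.01, 0.5, -0.4, 0.3, -0.2, 0.01, -0.005,
        0.004, 0.01, 0.3, -0.25, 0.2, 0.1, 0.003, 0.0]
print(normalise(gate(trim(clip), window=4)))
```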
Being fairly ruthless with cutting out unwanted noises and room sounds will produce the clearest vocals. While that might make them sound odd in a video of a lecture, as a voiceover to screenshots or slides, they are far less distracting without room sounds, breathing and mouth clicks in the spaces between words. Of course, it is better to record at quiet times when there are few noises in the environment.
Deciding what to have in each clip is a balance between the space all the files occupy and the number of pieces of audio that are required to be sequenced. Making a clip out of each word would be excessive. Preferably break up the audio at gaps between sentences, allowing the timing to be optimised in the sequence element editing. Long pauses are the perfect breakpoints. For example, for meditations or presentations with intermittent guiding vocals, there are large gaps of many seconds or even minutes that can be cut out.
If using captions for the words of the audio, try while editing to make sentences or phrases start on a whole second by cutting or inserting silence in the spaces between phrases, as long as it still sounds natural. If catering for multiple languages, allow enough space for those that may take longer to say.
When making clips for looping background music, make them 10, 12, 15 or 20 seconds long plus three seconds for the crossfade overlap. Making the loop Duration one of these lengths makes it easier to know how many loops are required per minute. For example, an 18-second clip with a Duration of 15 seconds means four loops per minute, so a 15-minute sequence would require 15 × 4 = 60 loops, less two for the separate long fade-in and fade-out clips, giving 58.
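The loop count generalises to any of the suggested lengths. This Python sketch assumes, as in the example above, that each of the long fade-in and fade-out clips takes the place of one pass; `loops_needed` is a hypothetical helper.

```python
import math


def loops_needed(total_minutes: float, duration_s: float, edge_clips: int = 2) -> int:
    """Number of loop passes for background music filling `total_minutes`.

    duration_s: the loop Duration, i.e. clip length minus the 3 s crossfade overlap.
    edge_clips: separate long fade-in and fade-out clips, each assumed to take
    the place of one pass.
    """
    passes = math.ceil(total_minutes * 60 / duration_s)
    return max(passes - edge_clips, 0)


for duration in (10, 12, 15, 20):
    print(f"{duration + 3} s clip, {duration} s Duration: {60 // duration} loops per minute")
print(loops_needed(15, 15))  # 58
```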
Because of its dynamic nature, the sequence element offers more than static content.
Some uses for the sequence element are:
- a.Instructional content, rather like the many YouTube videos that show how to do tasks in computer programs.
- b.Slideshows, where a series of images is shown, with voice or captions describing or elaborating on the elements of the images or their meanings.
- c.Guided meditations, optionally with images.
While there may be other types of presentation that the sequence element may be useful for, these represent the obvious widely used media-intensive uses that can be implemented by it while substantially reducing resource usage enough to keep everything on site. This facilitates reducing dependence upon third-party sites to house content.
From time to time, we have all needed help with how to do a task in a program or some tool we have.
At those times, we have turned to YouTube and there is usually some helpful presenter with the solution to our quest. However, for us to be one of those helpful people, we have to come to grips with video technology and processes, as well as the idiosyncrasies of YouTube algorithms and rules. Plus we have to be the talking head, which presents a whole other level of stress, especially for those more inclined to introversion.
Videos capture movement, but that is a resource-intensive way of capturing information that doesn't require 24 pictures per second, forcing use of a beefed-up third-party platform to deliver a succession of images that are two orders of magnitude more rapid than many instructions require. Videos also waste a lot of frames capturing the gaps in the vocals, adding further wasted bandwidth.
The sequence element breaks instructions down into a sparse succession of images accompanied by sporadic audio clips and captions. While producing these is not trivial, they can be produced independently of each other, removing the stressful and error-prone need, which video imposes, to make them all happen simultaneously. Coordinating the elements of a sequence element presentation is not a real-time process. Making videos may look easy, but they are usually the result of many people synchronising their activities to make them happen, in the days before and during the video.
Conceptually, instructions using the sequence element may appear to be similar, or even preferable to using a procedure article. However, while the former can be used in many scenarios, procedures are designed for a formal multi-level documenting of processes and instructions. It is better to see them as complementary, where the former can be more colloquial, but deferring to the latter for the formal precision it offers.
We have been exposed to slideshows through PowerPoint for almost 40 years, so it is a popular presentation paradigm.
The sequence element is PowerPoint in motion. There is even a pointer. Things are kept simple though, in that it is one image at a time, with no transitions that would require additional effort to synchronise between overlapping images. The design philosophy is to favour operational and design simplicity for the owner over unnecessary artistic flourishes.
The sequence element comes into its own in a multi-locale environment, where once a presentation has been done for the master locale, all that needs to be done for another is to produce variants of the audio and image files, if required, and add translations to the captions. While there might need to be some fine-tuning of the timing, made easier by using groups, the same sequence element is used. Compare this to videos or PowerPoint which require an almost complete repetition of the creation process for each locale. The locale material can be added in-situ, and enabled when complete.
Guided meditations can be audio only, but captions allow for use by the hearing-impaired or translations.
Meditations usually consist of sporadic commands with soft background music. The sequence element really benefits such material as it allows dispensing with largely empty vocal audio, while supporting looping of the music, especially efficient if using pads. Together, these can turn a 20MB audio track into less than a tenth, while allowing cloning and editing for variants rather than complete re-recording.
While some audio skills are still required for editing, these are considerably less than the proficiency, and the learning curve, needed to create quality audio in audio workstation software. The added bonus is that images can be provided to help set listeners in the mood.
Note that while many meditation creators seem to be focused upon very talky and numerous variants, to actually be helpful, guided meditations need to be fairly lacking in talking to allow the listeners time to mentally action the guidance. Also, there is no need to have a lot of variants, as most people, once they understand the process, can adapt it to use for other areas of their life. Meditation is a process for taking control of our own thinking, not outsourcing it to others so that we become dependent upon them.
We live in a world that wants us to be dependent upon their cloud infrastructure, but simplifying our requirements and creative processes can free us from that.
YouTube is a juggernaut that has taken over as the premier presentation platform, but it is not necessary for many uses, and using it only forces the use of centralised power-hungry server farms that are mainly engaged in exploiting user data for advertising. Moving content that does not need such resources onto our own cloud, our web site, frees us from dependence upon those who only want to exploit us and our viewers.
For such a simple tool, the sequence element can provide us with a lot of independence while not pushing the capabilities of the low-cost site hosting we may be using. Of course, people have to find our sites, but there is a lot of hype that maintains the illusion of success that being on YouTube is supposed to engender. Being a viral sensation is almost by definition a virtual impossibility, given the proliferation of creators (and many plagiarists) vying for the elusive YouTube jackpot. However, the types of uses that the sequence element is suitable for are not what goes viral. It pays to be real here!