Smallsite Design

Online management help

Find

This page lists all articles and categories alphabetically, and allows finding which articles have specific content.

The Work list page enables drilling down to articles through the subsites and categories that own them. That works if the ownership tree is remembered, but this page lists all articles and categories alphabetically so they can be found by ID alone. There is also a list of the most recently edited articles, and a way to list all articles that contain particular text or elements, though the latter requires knowledge of the internal structure of articles. Some examples are presented, but they are nowhere near exhaustive, and probably never will be.

Recent

The five most recently edited articles.

This list keeps track of the last five articles to be released, excluding those currently being edited, which are already listed in the In progress section of the Work list page. Each row provides a jump to the Article head page for the article, along with how long ago it was updated, and its Show status.

Articles

All articles are listed alphabetically by ID.

This is a simple list, but while the navigation bar for it provides the usual periodic links into the list, it also provides a link to the first of each type of article by the first letter of their IDs, along with how many there are, in the form of a: 25. Each row provides a jump to the Article head page for the article, along with its Show status.

Categories

All categories are listed alphabetically by ID.

This is a simple list. If there are more than four categories, its navigation bar provides the periodic links into it. Each row has a jump to the Category page for the category, along with its Show status. That's it.

Find

List all articles containing specified text or elements.

The differences between this facility and the search facility are:

  a. This offers finding elements by their place in the article semantic structure.
  b. This only searches in article bodies, but includes non-listed ones and the latest edit versions.
  c. Search includes category headings and descriptions, and article headlines, bylines and introductions, but only for those listed.

The body content of each article across all locales is stored in a single XML file, the hierarchical structure of which matches the semantic hierarchy of the article. The standard method to search within XML is XPath, which is specially formatted plain text that enables specifying the hierarchical relationships of its elements.
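
As a rough illustration of how a path expression follows the hierarchy, the sketch below uses Python's standard library, which supports only a small subset of XPath. The element names art, scn, ssn and p are taken from this page, but the sample document itself is an assumption and not the actual stored format.

```python
import xml.etree.ElementTree as ET

# Hypothetical article body: the element names come from this page, but the
# layout is assumed purely for illustration.
SAMPLE = """
<art>
  <scn>
    <p>Intro paragraph.</p>
    <ssn>
      <p>Nested paragraph.</p>
    </ssn>
  </scn>
</art>
"""

root = ET.fromstring(SAMPLE)

# A path names each level of the hierarchy, separated by slashes.
direct = root.findall("scn/p")        # paragraphs directly in a section
nested = root.findall("scn/ssn/p")    # paragraphs in a subsection
anywhere = root.findall(".//p")       # paragraphs at any depth

print(len(direct), len(nested), len(anywhere))  # 1 1 2
```

Each step in the path corresponds to one level of nesting, which is why knowing the structure matters when writing expressions.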

The find options are:

  a. Text – find the text. Case-sensitive.
  b. XPath – find the matching elements.

While the Text option is quite straightforward, the XPath option is quite sophisticated, and using its more advanced features requires deep knowledge of the structure of the XML being searched. Only some simple examples are provided here, as the full structure of each element and its attributes is far too involved to enumerate, though some clues are provided for those willing to experiment. The worst that can happen is that the XPath expression is invalid. No articles will be harmed by using this facility.

Both options allow including embedded characters in the expression. For example, using the Text option to search for left-to-right markers using lr^, will show all articles where that character has been used to correct directional rendering mismatches.

The section begins with a text area into which the XPath expression is typed, and clicking the Submit button below it will open the latest release and edit files for each article to see how many matches can be found in each. An error in the XPath expression will be indicated, but no clue as to what is in error will be given. If any articles matching a correct expression are found, they will be listed below the form. It will only take a couple of seconds, even if there are hundreds of articles.

The columns of the resulting list are:
#   Heading   Description
a   Article   Jump to the article's Article head page
b   Release   Number of matching elements found in the latest release. If there is no release version, a placeholder symbol is shown instead
c   Edit      Number of matching elements found in the latest WIP or draft. If there is no such version, a placeholder symbol is shown instead
d   Done      Shows a marker if the latest release is newer than when the find was executed

Typically, this facility will be used to find articles that may need updating, so there needs to be a way to mark off which have been edited. The Done column indicates which articles have been updated after the list was generated. This presumes that the edits are implementing what the find was for. Make sure that the latest edit of an article occurred after the find, as its timestamp is what is used as the time of update, not the time of release.

Elements

List of the element names and their XML tag name to search by.

While element blocks and hover buttons show the full names of elements, many of their XML representation names are shorter. Most inline element XML names (as shown in Inline insert) and many block element names are just the name of their HTML rendering elements, so at least when actual HTML is seen, they will be familiar.

While some HTML elements may have the same name in different levels in the rendered document's hierarchy, such as a section which can be a child of the article, or as a subsection in a section, to simplify the rules controlling what elements can be contained in each, they are given different XML names. For example, the XPath representation of the example given in the first paragraph of this section, relative to the art element at the root, is scn/ssn/table/row/cell. In fact, if that expression is used in the XPath field, it would indicate how many cells occur in subsections in each article.

As there are a limited number of valid combinations of those elements, there are ways of shortening that statement, such as //ssn//cell. The // indicates any descendant, but a subsection can only appear in a section, and a cell can only appear in a table row, making this statement much shorter while still being just as specific. The Append cell of an element's element block indicates what it can validly contain.
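
The equivalence of the long and short forms can be checked with a small sketch. It assumes a made-up article in which one table sits directly in a section and another sits in a subsection. Python's ElementTree needs a leading dot for descendant searches, so //ssn//cell is written as .//ssn//cell here.

```python
import xml.etree.ElementTree as ET

# Hypothetical article: only the element names are taken from this page.
SAMPLE = """
<art>
  <scn>
    <table><row><cell>A</cell><cell>B</cell></row></table>
    <ssn>
      <table>
        <row><cell>1</cell><cell>2</cell></row>
        <row><cell>3</cell><cell>4</cell></row>
      </table>
    </ssn>
  </scn>
</art>
"""

root = ET.fromstring(SAMPLE)

full = root.findall("scn/ssn/table/row/cell")   # every level spelled out
short = root.findall(".//ssn//cell")            # any cell under any subsection
all_cells = root.findall(".//cell")             # includes the section's own table

print(len(full), len(short), len(all_cells))    # 4 4 6
```

Both forms select the same four cells, while the two cells in the table outside the subsection are only picked up by the broader expression.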

Of course, getting to particular cells or any other elements in the path would require some extra information, and that is where it starts getting complicated, requiring some knowledge of the element's attributes and options, or where it stores its text. XPath is very powerful and generally concise, and while an alternative way of doing the same thing could be done using picklists and the like, it would very quickly become unwieldy, and still be very limited, though XPath expressions themselves can also get that way. The trick is in finding the most concise unique expression.

Attributes and options

Some clues to how to derive the likely names for element attributes and their option values.

Specifying attributes and option values can be a way of narrowing down which elements in the XPath statement are included, and thus minimising the number of articles to look through in the output list.

Many elements have attributes that specify either structure or appearance options, in addition to some of their text. While the attributes section of the element block shows their full names, their representation in the XML is very terse, often only a single letter. This is usually the initial letter of the English name, but where that would duplicate another name, another letter from a strong part of the name is used instead of lengthening it, such as the last letter or a consonant.

For example, for colouring the text for some inline elements, the blue option is b, but the brown option is n. As colour can be applied to several inline elements, to avoid precluding the use of meaningful letters for any other options that those elements may need, x is used for the colour attribute name. For some attributes, the default option means there is no actual attribute, but that can also be tested for in XPath.

Attributes are denoted by a preceding @ in expressions, such as @n. Test for no attribute by not(@n).
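
A minimal sketch of attribute tests follows. The x attribute and its b and n options are described above, but applying it to an a element is only an assumption for illustration. Python's ElementTree has no not() function, so that test is emulated with a filter.

```python
import xml.etree.ElementTree as ET

# Hypothetical content: x, b and n follow this page; their use on <a> is assumed.
SAMPLE = """
<art>
  <scn>
    <p><a x="b">blue</a> <a x="n">brown</a> <a>default</a></p>
  </scn>
</art>
"""

root = ET.fromstring(SAMPLE)

blue = root.findall(".//*[@x='b']")   # like //*[@x='b']
coloured = root.findall(".//*[@x]")   # like //*[@x]

# Emulates //a[not(@x)]: links that have no colour attribute at all.
default = [e for e in root.iter("a") if "x" not in e.attrib]

print([e.text for e in blue], len(coloured), [e.text for e in default])
# ['blue'] 2 ['default']
```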

Basic XPath find examples

These are some basic XPath expressions that may be useful.

Be aware that PHP is limited to XPath 1.0 syntax and functions, so make sure not to use what applies to later versions.


Some basic examples of things to find are:
Text
//*[contains(.,'the')]|//@*[contains(.,'the')]
where the square brackets [] contain how to qualify the preceding entity, the dot . indicates their contents, contains is a function that tests whether the quoted text is within the content, and the vertical bar | indicates to find what satisfies either of the expressions either side of it, here being for elements or attributes. The search is case sensitive, so will ignore The, but will include all words containing the. This is the form of the expression used with the Text option.
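
To show what that expression counts, here is a sketch that emulates it in plain Python, since the standard library's XPath subset has no contains function. The sample article and the c attribute on fig are assumptions. Note that an element's content includes the text of all its descendants, so the ancestors of a matching paragraph are counted as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical article; the c attribute is invented for this example.
SAMPLE = """
<art>
  <scn>
    <p>The cat.</p>
    <p>Another one.</p>
    <fig c="on the left"/>
  </scn>
</art>
"""

def find_text(root, text):
    """Emulate //*[contains(.,text)]|//@*[contains(.,text)]."""
    hits = []
    for elem in root.iter():
        # The XPath dot on an element is all of its descendant text joined.
        if text in "".join(elem.itertext()):
            hits.append(elem.tag)
    for elem in root.iter():
        for name, value in elem.attrib.items():
            if text in value:
                hits.append(f"{elem.tag}/@{name}")
    return hits

root = ET.fromstring(SAMPLE)
print(find_text(root, "the"))  # ['art', 'scn', 'p', 'fig/@c']
print(find_text(root, "The"))  # ['art', 'scn', 'p']
```

The lowercase search matches the second paragraph because Another contains the, and skips the first because The has a capital.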
Elements
//table
//*[name()='table']
self::art[//p and //fig]
self::art[//p and //fig]//*[contains('fig|p',name())]
where the first two are equivalent in finding all tables, the third indicates whether the root element has both paragraphs and figures, and so will show 1 if it does, whereas the last will show the number of those elements. Because contains matches any part of the name list, element names that share parts with those listed will also match. To be unambiguous while still allowing easy expansion of the list, delimit each name, so the ending qualifier for the last expression would have to be like:
//*[contains('|a|table|',concat('|',name(),'|'))].
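
The difference between the loose and delimited name tests comes down to substring matching, which can be sketched with ordinary string operations. The names b and tab below are hypothetical element names chosen only to show the false matches.

```python
def loose(names, tag):
    """Emulate contains('a|table', name()): a plain substring test."""
    return tag in names

def strict(names, tag):
    """Emulate contains('|a|table|', concat('|', name(), '|'))."""
    return f"|{tag}|" in f"|{names}|"

NAMES = "a|table"

# b and tab are hypothetical names that share letters with the listed ones.
for tag in ["a", "table", "b", "tab"]:
    print(tag, loose(NAMES, tag), strict(NAMES, tag))
# a True True
# table True True
# b True False
# tab True False
```

Wrapping both the list and the tested name in delimiters means only whole names can match, and adding another name is just a matter of appending it with a trailing delimiter.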
Disabled elements
//*[@s]
//*[contains(@s,'m')]
//*[contains(@s,'n')]
which list all articles with disabled elements, only those manually disabled, and only those with errors, respectively. The last will include every element that has children with errors, as that helps to troubleshoot errors, but won't list released articles, which must have no errors.
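
The three tests can be sketched as below, assuming a made-up article in which the s attribute holds m or n as described. How errors propagate to parent elements is not modelled here. The contains tests are emulated in Python because ElementTree does not provide that function.

```python
import xml.etree.ElementTree as ET

# Hypothetical article: the s values follow the description above, the rest is assumed.
SAMPLE = """
<art>
  <scn>
    <p s="m">Manually disabled.</p>
    <p s="n">Has an error.</p>
    <p>Enabled.</p>
  </scn>
</art>
"""

root = ET.fromstring(SAMPLE)

disabled = root.findall(".//*[@s]")                  # like //*[@s]
manual = [e for e in disabled if "m" in e.get("s")]  # like //*[contains(@s,'m')]
errors = [e for e in disabled if "n" in e.get("s")]  # like //*[contains(@s,'n')]

print(len(disabled), len(manual), len(errors))  # 2 1 1
```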
Minimum number of an element
self::art[count(//p)>5]//p
which will list articles with more than five paragraphs.
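
The effect of the count qualifier can be illustrated with a short emulation: either all paragraphs are returned or none are. The helper names and the generated articles are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

def paragraphs_if_more_than(root, minimum):
    """Emulate self::art[count(//p)>minimum]//p."""
    if root.tag != "art":
        return []
    paragraphs = list(root.iter("p"))
    return paragraphs if len(paragraphs) > minimum else []

def make_article(paragraph_count):
    """Build a hypothetical article with the given number of paragraphs."""
    art = ET.Element("art")
    scn = ET.SubElement(art, "scn")
    for i in range(paragraph_count):
        ET.SubElement(scn, "p").text = f"Paragraph {i + 1}"
    return art

print(len(paragraphs_if_more_than(make_article(6), 5)))  # 6
print(len(paragraphs_if_more_than(make_article(5), 5)))  # 0
```

An article with exactly five paragraphs yields no matches, so it would not appear in the results list.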
Has sections without numbering
self::art[not(@n) and scn]
where the not(@n) indicates that the article does not use numbering. The scn is included to make sure that only general articles are found, and actually have some sections, otherwise whether they use numbering is irrelevant. Note that if wanting to find articles with numbering but no sections, use:
self::art[@n and not(scn)].
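
Both numbering tests can be sketched as simple checks on the root element. The value given to the n attribute below is a placeholder, as only its presence or absence is tested, and the function names are invented for this example.

```python
import xml.etree.ElementTree as ET

def unnumbered_with_sections(root):
    """Emulate self::art[not(@n) and scn]."""
    return root.tag == "art" and "n" not in root.attrib and root.find("scn") is not None

def numbered_without_sections(root):
    """Emulate self::art[@n and not(scn)]."""
    return root.tag == "art" and "n" in root.attrib and root.find("scn") is None

# Hypothetical articles; the n value "1" is a placeholder.
plain = ET.fromstring("<art><scn><p>Text</p></scn></art>")
numbered = ET.fromstring('<art n="1"><scn><p>Text</p></scn></art>')
odd = ET.fromstring('<art n="1"><p>Text</p></art>')

for art in (plain, numbered, odd):
    print(unnumbered_with_sections(art), numbered_without_sections(art))
# True False
# False False
# False True
```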
Has figures with prefixes
//fig[@p=ancestor::art//*[contains('list|table',name())]/@p]
which finds all figures that have prefixes that match a list or table somewhere in the article. The ancestor::art makes sure that the search for matches starts from the article again, as the matching list or table can be anywhere in it. For example, a figure may be in an aside, but its matching table is next to the aside so they appear side by side, as in the Procedure article.
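
The prefix matching can be emulated in two steps: collect the prefixes of lists and tables anywhere in the article, then keep the figures whose prefix is among them. The sample article, the aside element name and the prefix values are assumptions for this sketch. The name test keeps the same loose substring behaviour as the expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical article: a figure in an aside shares its prefix with a table.
SAMPLE = """
<art>
  <scn>
    <aside><fig p="T"/></aside>
    <table p="T"><row><cell>1</cell></row></table>
    <fig p="F"/>
    <list p="L"/>
  </scn>
</art>
"""

def figures_with_matching_prefix(root):
    """Emulate //fig[@p=ancestor::art//*[contains('list|table',name())]/@p]."""
    prefixes = {
        e.get("p")
        for e in root.iter()
        if e.tag in "list|table" and e.get("p") is not None
    }
    # Figures without a prefix never match, as in the XPath comparison.
    return [f for f in root.iter("fig") if f.get("p") in prefixes]

root = ET.fromstring(SAMPLE)
matches = figures_with_matching_prefix(root)
print([f.get("p") for f in matches])  # ['T']
```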
Powered by: Smallsite Design © Patanjali Sokaris