Data Representation – Binary Format, ASCII, EBCDIC, Unicode, Digital vs. Analog, Images, Video and Audio
This article covers most aspects of data representation. If you reached this page by searching for a particular keyword, I recommend using CTRL-F to find the parts you need.
Computers – Process and store all forms of data in binary format.
Human communication – Includes language, images and sounds.
Data formats – Specifications for converting data into a computer-usable form. They define the different ways human data may be represented, stored and processed by a computer.
Computing Systems Data
Computing systems are usually complex devices that deal with a vast array of information categories. They store, present, and help us modify:-
- Text
- Audio
- Images and graphics
- Video
Digital vs. Analog
Computing systems are finite machines. They store a limited amount of information, even if the limit is very big. The goal is to represent enough of the real world data to satisfy our computational needs and our senses of sight and sound. The information can be represented in one of two ways: analog or digital.
Analog data – a continuous representation, analogous to the actual information it represents.
For example, a mercury thermometer is an analog device. The mercury rises in a continuous flow in the tube in direct proportion to the temperature.
Digital data – a discrete representation, breaking the information up into separate (discrete) elements. Computers cannot work with analog information directly, so analog information has to be digitised: it is broken into pieces, and those pieces are represented using binary digits.
Why digital signals?
Both kinds of electronic signal (analog and digital) degrade as they move down a line, because the voltage of the signal fluctuates due to environmental effects. As soon as an analog signal degrades, information is lost: since any voltage level within the range is valid, it is impossible to know that the original signal was even changed. Digital signals jump sharply between two extremes (a high and a low state). A digital signal can degrade quite a bit before the information is lost, because any value above a certain threshold is considered high and any value below it is considered low.
You can still retrieve the information from a reasonably degraded digital signal. Periodically a digital signal is reclocked to regain its original shape. As long as it is reclocked before too much degradation, no information is lost.
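The threshold idea can be illustrated with a small simulation. This is a minimal sketch, assuming 0 V for the low state, 5 V for the high state and a 2.5 V threshold (real signalling levels vary). Up to ±2 V of random noise is added to each level, and reclocking snaps every value back to 0 V or 5 V, so the original bits come back intact. An analog receiver has no such rule, so it cannot distinguish a noisy value from the original one.

```python
import random

LOW, HIGH = 0.0, 5.0   # assumed voltage levels for the low and high states
THRESHOLD = 2.5        # assumed decision threshold halfway between them

def transmit(levels, noise, rng):
    """Degrade a signal by adding random noise in [-noise, +noise] volts to each level."""
    return [v + rng.uniform(-noise, noise) for v in levels]

def reclock(voltages):
    """Regenerate the signal: anything above the threshold counts as high, otherwise low."""
    return [HIGH if v > THRESHOLD else LOW for v in voltages]

bits = [1, 0, 1, 1, 0, 0, 1]
clean = [HIGH if b else LOW for b in bits]
degraded = transmit(clean, noise=2.0, rng=random.Random(42))
restored = reclock(degraded)
print(restored == clean)  # True, because 2 V of noise never crosses the 2.5 V threshold
```

If the noise were allowed to exceed the 2.5 V margin, some bits could flip, which is why the signal must be reclocked before too much degradation occurs.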
Why binary representation (as opposed to decimal, octal, etc.)?
Because the devices that store and manage the digital data are far less expensive and complex for binary representation. They are also far more reliable when they have to represent one out of two possible values. The electronic signals are easier to maintain if they carry only binary data.
One bit can be either 0 or 1. Therefore, one bit can represent only two outputs. To represent more than two outputs, we need multiple bits. Two bits can represent four outputs because there are four combinations of 0 and 1 that can be made from two bits: 00, 01, 10,11.
In general, n bits can represent $2^n$ outputs because there are $2^n$ combinations of 0 and 1 that can be made from n bits. Note that every time we increase the number of bits by 1, we double the number of things we can represent.
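A quick way to check the $2^n$ rule is to enumerate every bit pattern. The sketch below uses Python's `itertools.product`; `bit_patterns` is just an illustrative helper name.

```python
from itertools import product

def bit_patterns(n):
    """Return every distinct pattern of n bits as a string such as '01'."""
    return ["".join(bits) for bits in product("01", repeat=n)]

print(bit_patterns(2))  # ['00', '01', '10', '11']

# Each extra bit doubles the number of patterns.
for n in range(1, 5):
    print(n, "bits ->", len(bit_patterns(n)), "patterns")
```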
Data Formats – How to Interpret Data
The meaning of the internal representation must be appropriate for the type of processing that will take place:
- For example, images and sound have to be digitised.
Images – need a detailed description of the data, including how colour is represented at each data point
Sound – needs a sampling rate
- Proprietary formats
Unique to a product or company
For example, Microsoft Word, Corel WordPerfect, IBM Lotus Notes.
Standards evolve in two ways:
Proprietary formats become de facto standards (e.g., Adobe PostScript, Apple QuickTime)
A committee is struck to solve a problem (e.g., the Moving Picture Experts Group, MPEG)
They exist because they are:-
- Convenient – time to market is often critical when finishing a product, so using existing standards saves the time that would otherwise be spent developing your own protocols and interfaces.
- Efficient – most of the standards are put together by committees with a wide experience in the specific area.
- Flexible – usually the standards allow for manufacturer or OEM specific extensions.
- Appropriate – address a specific problem in a specific domain.
Standards allow communication and sharing of information. They also allow computing systems and software to interoperate (at both hardware and software levels). Sometimes standards are arbitrary and have some “blast from the past” reasons (due to historical evolution).
Examples of Standards
ISO – International Organization for Standardization
CSA – Canadian Standards Association
ANSI – American National Standards Institute
IEEE – Institute of Electrical and Electronics Engineers
Text: Alphanumeric Data
Three standards for representing letters (alpha) and numbers:
- ASCII – American Standard Code for Information Interchange
- EBCDIC – Extended Binary Coded Decimal Interchange Code (used mainly on IBM mainframes)
- Unicode
ASCII is a 7-bit code: the 8th bit of each byte is unused (or used as a parity bit).
$2^7 = 128$ codes
Two general types of codes:
95 are “Graphic” codes (displayable on a console)
33 are “Control” codes (control features of the console or communications channel)
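The 95/33 split can be checked directly. This sketch relies on Python's `str.isprintable()`, which, for codes 0 to 127, treats 32 (space) through 126 as printable and 0 to 31 plus 127 (DEL) as non-printable. The `with_even_parity` helper is a hypothetical illustration of using the spare 8th bit as an even-parity bit; odd parity is an equally valid convention.

```python
# Classify the 128 ASCII codes into graphic (displayable) and control codes.
graphic = [c for c in range(128) if chr(c).isprintable()]
control = [c for c in range(128) if not chr(c).isprintable()]
print(len(graphic), len(control))  # 95 33

def with_even_parity(code):
    """Set bit 7 when needed so that the 8-bit byte has an even number of 1 bits."""
    ones = bin(code).count("1")
    return code | 0x80 if ones % 2 else code

print(format(with_even_parity(ord("A")), "08b"))  # 01000001 (already even)
print(format(with_even_parity(ord("C")), "08b"))  # 11000011 (parity bit set)
```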
[Table: ASCII Reference Table]
ANSI escape sequences extend the capability of the ASCII code set for controlling terminals and formatting output. They are defined by ANSI in documents X3.41-1974 and X3.64-1977.
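The escape code is ESC = $1B_{16}$ (27 in decimal).
An escape sequence begins with two codes, ESC followed by [ (together known as the Control Sequence Introducer). For example: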
- Erase display: ESC[2J
- Erase line: ESC[K
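Because ESC is just the byte $1B_{16}$, escape sequences are ordinary strings. The `csi` helper below is a hypothetical convenience function, not part of any library, and whether the output has any visible effect depends on the terminal supporting ANSI/VT100-style sequences.

```python
import sys

ESC = "\x1b"  # the escape code, 1B hex (27 decimal)

def csi(params, final):
    """Hypothetical helper: build a sequence that starts with ESC [ (Control Sequence Introducer)."""
    return ESC + "[" + params + final

ERASE_DISPLAY = csi("2", "J")  # ESC[2J
ERASE_LINE = csi("", "K")      # ESC[K

# On an ANSI-compatible terminal, writing the string performs the action.
sys.stdout.write(ERASE_LINE)
```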
The extended version of the ASCII character set is not enough for international use. Unicode was originally designed with 16 bits per character, so it could represent $2^{16}$, or 65,536, characters; since version 2.0 its code space has been extended to 1,114,112 code points (U+0000 to U+10FFFF). Unicode was designed to be a superset of ASCII: the first 128 Unicode characters are exactly the ASCII characters, and the first 256 correspond to ISO 8859-1 (Latin-1), a widely used 8-bit extension of ASCII.
- Version 2.1
- Improves on version 2.0
- Includes the Euro sign ($20AC_{16}$ = €)
- From the standard:
…contains 38,887 distinct coded characters derived from the supported scripts. These characters cover the principal written languages of the Americas, Europe, the Middle East, Africa, India, Asia, and Pacifica.
- At the time of writing, the latest version of Unicode was 4.0; newer versions have been released since.
- More details can be found at Unicode’s main website http://www.unicode.org
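The ASCII compatibility and the code-space sizes described above can be checked in Python, whose `str` type stores Unicode code points. The byte values noted in the comments for the Euro sign are its standard UTF-16 (big-endian) and UTF-8 encodings.

```python
text = "Data"

# The first 128 code points are ASCII, so ASCII text gives identical bytes in ASCII and UTF-8.
print(text.encode("ascii") == text.encode("utf-8"))  # True

# Code points 0-255 coincide with ISO 8859-1 (Latin-1).
print(all(chr(c).encode("latin-1") == bytes([c]) for c in range(256)))  # True

print(2 ** 16)       # 65536: size of the original 16-bit design
print(0x10FFFF + 1)  # 1114112: size of today's code space

euro = chr(0x20AC)   # the Euro sign, U+20AC
print(euro.encode("utf-16-be"))  # 2 bytes: 20 AC
print(euro.encode("utf-8"))      # 3 bytes: E2 82 AC
```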
It is important that we find ways to store and transmit text efficiently:-
- Keyword encoding
- Run-length encoding
- Huffman encoding
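Keyword encoding and run-length encoding are simple enough to sketch in a few lines. The keyword table below is a made-up example (its symbols are assumed never to occur in the text itself), and words are split on single spaces only, so punctuation attached to a word prevents a match. Run-length encoding is shown as (character, count) pairs rather than any particular file format.

```python
from itertools import groupby

# Hypothetical keyword table: frequent words replaced by single symbols.
KEYWORDS = {"the": "~", "and": "+", "that": "$"}

def keyword_encode(text):
    return " ".join(KEYWORDS.get(word, word) for word in text.split(" "))

def keyword_decode(text):
    reverse = {symbol: word for word, symbol in KEYWORDS.items()}
    return " ".join(reverse.get(word, word) for word in text.split(" "))

def rle_encode(text):
    """Run-length encoding: collapse each run of a repeated character into (char, count)."""
    return [(char, len(list(run))) for char, run in groupby(text)]

def rle_decode(pairs):
    return "".join(char * count for char, count in pairs)

sentence = "the cat and the dog"
print(keyword_encode(sentence))    # ~ cat + ~ dog
print(rle_encode("AAAAAAABBBCD"))  # [('A', 7), ('B', 3), ('C', 1), ('D', 1)]
```

Run-length encoding only pays off when long runs are common; for text without repeated characters the pairs take more space than the original.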
Sound is perceived when a series of air compressions vibrates a membrane in our ear, which sends signals to our brain. A stereo system sends an electrical signal to a speaker to produce sound. This signal is an analog representation of the sound wave. The voltage in the signal varies in direct proportion to the sound wave.
To digitise the signal, we periodically measure the voltage of the signal and record the appropriate numeric value. This process is called sampling. The sampling rate must be at least twice the highest frequency to be captured, and human hearing extends to roughly 20,000 Hz, so a sampling rate of around 40,000 samples per second is generally enough for high-quality sound reproduction.
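The sampling step can be sketched as follows. This assumes a pure 440 Hz sine tone, 44,100 samples per second (the audio CD rate, slightly above the 40,000 mentioned above) and 16-bit signed samples; `sample_tone` is a hypothetical helper, not a library function.

```python
import math

def sample_tone(freq_hz, rate_hz, duration_ms, bits=16):
    """Hypothetical helper: sample a pure sine tone and quantise each sample
    to a signed integer that fits in `bits` bits."""
    max_level = 2 ** (bits - 1) - 1          # 32767 for 16-bit samples
    n = rate_hz * duration_ms // 1000        # number of measurements taken
    return [round(max_level * math.sin(2 * math.pi * freq_hz * i / rate_hz))
            for i in range(n)]

samples = sample_tone(440, 44_100, 10)  # 10 ms of a 440 Hz tone
print(len(samples))   # 441
print(samples[:5])    # the first few quantised voltage readings
```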
Several popular formats are WAV, AU, AIFF, VQF, OGG, WMA and MP3. At the time of writing, the dominant format for compressing audio data was MP3. MP3 is short for MPEG-1 (or MPEG-2) Audio Layer III.
MP3 employs both lossy and lossless compression.
- First, it analyses the frequency spread and compares it to mathematical models of human psychoacoustics (the study of the interrelation between the ear and the brain), discarding information that humans cannot hear.
- Then the bit stream is compressed using a form of Huffman encoding to achieve additional compression.
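Huffman encoding gives shorter bit strings to more frequent symbols. The following is a textbook-style sketch of building a Huffman code table with a priority queue (`heapq`); it is not the exact scheme used inside MP3 files, and `huffman_codes` is an illustrative name.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table: frequent symbols get shorter bit strings."""
    freq = Counter(text)
    if len(freq) == 1:                        # edge case: only one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (total frequency, tie-breaker, {symbol: code so far})
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # the two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

codes = huffman_codes("ABRACADABRA")
encoded = "".join(codes[ch] for ch in "ABRACADABRA")
print(len(encoded), "bits instead of", 8 * len("ABRACADABRA"))
```

For ABRACADABRA the 11 characters need 23 bits instead of 88 at 8 bits per character. Because no code is a prefix of another, the bit stream can be decoded without separators.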
Images and Graphics
Colour is our perception of the various frequencies of light that reach the retinas of our eyes. Our retinas have three types of colour photoreceptor cone cells that respond to different sets of frequencies. These photoreceptor categories correspond to the colours of red, green, and blue.
Colour is often expressed in a computer as an RGB (red-green-blue) value, which is actually three numbers that indicate the relative contribution of each of these three primary colours. For example, an RGB value of (255, 255, 0) maximises the contribution of red and green, and minimises the contribution of blue, which results in a bright yellow.
The amount of data that is used to represent a colour is called the colour depth.
- HiColour is a term that indicates a 16-bit colour depth. Five bits each are used for the R and B components, and six bits for the G component, because the human eye is more sensitive to green.
- TrueColour indicates a 24-bit colour depth. Therefore, each number in an RGB value is represented using eight bits.
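Both colour depths can be illustrated with bit operations. The sketch below writes a TrueColour value as the familiar 24-bit hex string and packs the same colour into 16-bit HiColour (the 5-6-5 layout described above) by dropping the low-order bits of each component; truncation is only one possible way to reduce the depth, and the function names are illustrative.

```python
def to_hex(r, g, b):
    """TrueColour: 8 bits per component, 24 bits in total, written as a hex string."""
    return f"#{r:02X}{g:02X}{b:02X}"

def to_rgb565(r, g, b):
    """HiColour: keep the top 5 bits of R and B and the top 6 bits of G, packed into 16 bits."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

yellow = (255, 255, 0)
print(to_hex(*yellow))                      # #FFFF00
print(format(to_rgb565(*yellow), "016b"))   # 1111111111100000
```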
Digitised Images and Graphics
Digitising a picture is the act of representing it as a collection of individual dots called pixels. The word pixel is derived from “picture element”. The number of pixels used to represent a picture is called the resolution.
Bitmap images are also known as the raster-graphics format. They are used for realistic images with continuous variations in shading, colour, shape and texture. Examples include:
- Scanned photos
- Clip art generated by a paint program
Bitmaps are preferred when the image contains a large amount of detail and the processing requirements are fairly simple. Typical input devices include:
- Digital cameras and video capture devices
- Graphical input devices like mice and pens
Bitmaps are managed by photo-editing or paint software, whose editing tools make the tedious bit-by-bit process easier.
Each individual pixel (pi(x)cture element) in a graphic is stored as a binary number. A pixel is a small area with an associated coordinate location. For example, each point can be represented by a 4-bit code corresponding to one of 16 shades of grey.
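The storage cost follows directly from the resolution and the number of bits per pixel. A minimal sketch, assuming an uncompressed bitmap and a hypothetical packing in which two 4-bit grey levels share one byte (first pixel in the high half); real file formats add headers and often pad each row.

```python
def bitmap_bytes(width, height, bits_per_pixel):
    """Uncompressed size of a bitmap: one binary number per pixel."""
    return width * height * bits_per_pixel // 8

def pack_4bit(pixels):
    """Pack 4-bit grey levels (0-15) two per byte, first pixel in the high nibble."""
    if len(pixels) % 2:
        pixels = pixels + [0]          # pad an odd-length row with level 0
    return bytes((hi << 4) | lo for hi, lo in zip(pixels[0::2], pixels[1::2]))

row = [0, 15, 8, 3]                    # four pixels, 16 possible shades each
print(pack_4bit(row).hex())            # 0f83
print(bitmap_bytes(640, 480, 4))       # 153600
```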
Common bitmap file formats:
- TIFF (Tagged Image File Format): .TIF
- GIF (Graphics Interchange Format): .GIF
- BMP (BitMaPped): .BMP
- JPEG (Joint Photographic Experts Group): .JPG
- PCX: .PCX (pronounced dot p c x) – Windows Paintbrush software
- PNG (Portable Network Graphics): .PNG (pronounced ping)
A vector-graphics format describes an image in terms of lines and geometric shapes. A vector graphic is a series of commands that describe a line’s direction, thickness, and colour. The file sizes for these formats tend to be small because not every pixel has to be accounted for.
Vector graphics can be resized mathematically, and these changes can be calculated dynamically as needed. However, vector graphics are not good for representing real-world images.
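To show why resizing is cheap for vector graphics, the sketch below represents a drawing as a list of hypothetical line commands (not any real file format) and scales it by multiplying every coordinate and the line thickness. The description stays the same size however large the result becomes.

```python
# A hypothetical vector "drawing": line commands with coordinates and style attributes.
drawing = [
    ("line", (0, 0), (10, 0), {"thickness": 1, "colour": "black"}),
    ("line", (10, 0), (10, 5), {"thickness": 1, "colour": "black"}),
]

def scale(commands, factor):
    """Resize mathematically: multiply every coordinate (and the thickness) by the factor."""
    scaled = []
    for kind, start, end, style in commands:
        new_style = dict(style, thickness=style["thickness"] * factor)
        scaled.append((kind,
                       (start[0] * factor, start[1] * factor),
                       (end[0] * factor, end[1] * factor),
                       new_style))
    return scaled

big = scale(drawing, 100)
print(big[1])  # ('line', (1000, 0), (1000, 500), {'thickness': 100, 'colour': 'black'})
```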
Object images are created by drawing packages or output from spreadsheet data graphs; the computer translates geometric formulas to create the graphic. The storage space required depends on image complexity, i.e. the number of instructions needed to create lines, shapes and fill patterns.
Object images cannot represent photos or paintings, and cannot be displayed or printed directly: they must be converted to a bitmap, since output devices (except plotters) are bitmap devices.
For example: objects seen in movies like Shrek, Toy Story, The Incredibles and Madagascar.
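Converting an object image into a bitmap means deciding which pixels each shape covers. For straight lines, a classic method is Bresenham's line algorithm; the sketch below implements it and draws the result into a tiny grid. `rasterise_line` and `to_bitmap` are illustrative names, and real renderers also handle thickness, curves and anti-aliasing.

```python
def rasterise_line(x0, y0, x1, y1):
    """Bresenham's line algorithm: choose the pixels that best approximate a line."""
    pixels = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:        # step in x
            err += dy
            x0 += sx
        if e2 <= dx:        # step in y
            err += dx
            y0 += sy
    return pixels

def to_bitmap(width, height, pixels):
    """Turn a list of (x, y) pixels into a grid of 0/1 values."""
    grid = [[0] * width for _ in range(height)]
    for x, y in pixels:
        grid[y][x] = 1
    return grid

for row in to_bitmap(6, 3, rasterise_line(0, 0, 5, 2)):
    print("".join("#" if p else "." for p in row))
```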
Most object image formats are proprietary. File extensions include .wmf, .dxf, .mgx, and .cgm.
Popular Object Graphics Software:
- Macromedia Flash
- Micrografx Designer
- CorelDraw : vector illustration, layout, bitmap creation, image-editing, painting and animation software
- Autodesk AutoCAD
- W3C SVG (Scalable Vector Graphics), based on the XML web description language – not proprietary
[Table: Bitmap vs. Object Images]
Video requires massive amounts of data. A video camera producing full-screen 640 x 480 pixel TrueColour images at 30 frames/sec generates 27.65 MB of data/sec.
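The 27.65 MB figure follows from multiplying the pixels per frame by the bytes per pixel and the frame rate. TrueColour uses 3 bytes (24 bits) per pixel, and the figure uses decimal megabytes ($1\ \text{MB} = 10^6$ bytes); `raw_video_rate` is an illustrative helper name.

```python
def raw_video_rate(width, height, bytes_per_pixel, fps):
    """Uncompressed video data rate in bytes per second."""
    return width * height * bytes_per_pixel * fps

rate = raw_video_rate(640, 480, 3, 30)   # TrueColour: 3 bytes per pixel
print(rate)                  # 27648000 bytes per second
print(rate / 10**6)          # 27.648, i.e. about 27.65 MB per second
print(rate * 60 / 10**9)     # gigabytes needed for one uncompressed minute
```

Numbers of this size are why almost all video is compressed before it is stored or streamed.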
The method used depends on how the video is delivered to users:
- Streaming video: video is displayed as it is downloaded from the web server
Example: video conferencing
- Local data (a file on DVD or downloaded onto the system) for higher quality
MPEG-2: movie-quality images with high compression, which require substantial processing capability
A video codec (Coder/Decoder) refers to the methods used to shrink the size of a movie to allow it to be played on a computer or over a network. Almost all video codecs use lossy compression to minimise the huge amounts of data associated with video.
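Two types of compression:-
- Temporal compression – between frames: only the changes from one frame to the next are stored
- Spatial compression – within a frame: redundant information inside a single image is removed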
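Temporal compression can be sketched as storing one complete key frame followed by only the pixels that change from frame to frame. This is a simplified illustration, assuming frames are equal-length lists of pixel values; real codecs such as MPEG also use motion estimation and combine this idea with spatial compression.

```python
def temporal_compress(frames):
    """Store the first frame whole (a key frame), then only the pixels that changed."""
    key = frames[0]
    deltas = []
    prev = key
    for frame in frames[1:]:
        changes = [(i, v) for i, (old, v) in enumerate(zip(prev, frame)) if old != v]
        deltas.append(changes)
        prev = frame
    return key, deltas

def temporal_decompress(key, deltas):
    """Rebuild every frame by applying each list of changes to the previous frame."""
    frames = [list(key)]
    for changes in deltas:
        frame = list(frames[-1])
        for i, v in changes:
            frame[i] = v
        frames.append(frame)
    return frames

video = [[0, 0, 0, 9], [0, 0, 5, 9], [0, 0, 5, 9]]  # three tiny 4-pixel frames
key, deltas = temporal_compress(video)
print(deltas)  # [[(2, 5)], []]
```

When consecutive frames are nearly identical, as in a video conference with a static background, the delta lists stay short.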
* The above information is aimed at helping people who want to learn more about computer technology. All credit goes to the original creator(s) of this information or the founder(s) of this knowledge. This work is not and will not be used for any commercial purposes in this weblog.