Users and developers are separate entities
The business is dependent on it
Bugs cannot be tolerated
User-friendly user interface
Documentation (for usage, maintenance, further upgrading, etc.)
Reliability, robustness, portability, etc. are very important
Heavy investment
High functionality
…
Productivity of 10-100 LOC per month per developer
Saturday, October 31, 2009
Students’ Projects
Novice developers/users
Classroom or small project
The user interface is not important, as the program is used by the developer him/herself
Documentation and testing are not much cared for – the result often contains bugs
Reliability, robustness, and portability are not important
No investment
Productivity of 2.5 to 5 KLOC per month per developer
Thursday, October 29, 2009
Definition of E-Commerce:
Electronic commerce means the buying and selling of goods and services across the Internet. An e-commerce site can be as simple as a catalog page with a phone number, or it can range all the way up to a real-time credit-card processing site where customers can purchase downloadable goods and receive them on the spot.
In the face of new market forces created by globalization and mounting competition, we can no longer follow historical paths and seek the status quo. Companies are discovering that old solutions do not work for new business problems. Business has changed, and so have the risks and payoffs.
Electronic commerce is more than just buying and selling products online. Instead, it encompasses the entire online process of developing, marketing, selling, delivering, servicing, and paying for products and services purchased by internetworked, global marketplaces of customers, with the support of a worldwide network of business partners.
Electronic commerce systems rely on the resources of the Internet, intranets, extranets, and other computer networks. Electronic commerce can include:
• Interactive marketing, ordering, payment, and customer support processes at e-commerce sites on the World Wide Web
• Extranet access of inventory databases by customers and suppliers
• Intranet access of customer relationship management systems by sales and customer service reps
• Customer collaboration in product development via Internet newsgroups and e-mail exchanges.
Technologies necessary for E-Commerce:
Technologies that are necessary for electronic commerce include:
• Information technologies
• Telecommunications technologies
• Internet technologies
INTRODUCTION OF E-COMMERCE
Electronic commerce encompasses the entire online process of developing, marketing, selling, delivering, servicing, and paying for products and services. The Internet and related technologies, e-commerce websites on the World Wide Web, and corporate intranets and extranets serve as the business and technology platform for e-commerce marketplaces for consumers and businesses in the basic categories of business-to-consumer (B2C) and consumer-to-consumer (C2C) e-commerce. The essential processes that should be implemented in all e-commerce applications – access control and security, personalizing and profiling, search management, content management, catalogue management, payment systems, workflow management, event notification, and collaboration and trading – are summarized in Figure 7.5.
E-Business is the creation of new, and the redesigning of existing value chains and business processes through the application of information technology. Naturally, e-Business is more than e-commerce. It expands the scope of e-commerce to transform the company and the industry itself.
Electronic commerce, or e-commerce, is the trade of products and services by means of the Internet or other computer networks. E-commerce follows the same basic principles as traditional commerce: buyers and sellers come together to exchange commodities for money. But rather than conducting business in the traditional way, in shopping stores or through mail-order catalogs and telephone operators, in e-commerce buyers and sellers transact business over networked computers.
E-commerce offers buyers maximum convenience. They can visit the websites of multiple vendors around the clock to compare prices and make purchases, from anywhere around the globe, without having to leave their homes or offices. In some cases, consumers can immediately obtain a product or service, such as an electronic book, a music file, or computer software, by downloading it over the Internet.
For sellers, e-commerce offers a way to cut costs and expand their markets. They do not need to build, staff, or maintain a physical store, or print and distribute mail-order catalogs. Automated order-tracking and billing systems cut additional labor costs, and if the product or service can be downloaded, e-commerce firms have no distribution costs at all. Because products can be sold over the global Internet, sellers have the potential to market their products or services globally and are not limited by the physical location of a store. Internet technologies also permit sellers to track the interests and preferences of their customers, with the customers' permission, and then use this information to build an ongoing relationship with each customer by customizing products and services to meet the customer's needs.
E-commerce is about setting your business up on the Internet, allowing visitors to access your website and go through a virtual catalog of your products and services online. When visitors want to buy something they like, they merely "add" it to their virtual shopping basket. Items in the virtual shopping basket can be added or deleted, and when you are all set to check out, you head to the virtual checkout counter, which shows your complete total and asks for your name, address, etc., and method of payment (usually via credit card). Once you have entered all this information (which, by the way, is transmitted securely), you can then just wait for delivery. It's that simple. According to a CNN opinion poll, 62% of surveyed respondents said they planned to shop online during the Christmas season. Newsweek devoted its front-page story to "shopping.com" in its December 7, 1998 issue (Asian Edition). The title was "Why Online Stores are the Best Thing since Santa Claus".
E-commerce is not just about online stores; it is about anything and everything to do with money. If you pay for things (via cash, check, credit card, etc.), e-commerce is about to make an introduction into your life soon. Banks like Bank of America and Wells Fargo now give their clients access to their bank accounts via the web. Soon enough, banks in Pakistan will follow suit. The day is not far away (yes, in Pakistan!) when you will be able to order and reserve a movie at the local video store online, browse through various titles, etc., and if you are feeling hungry
Wednesday, October 28, 2009
Implementation of Stacks in Memory
We present a linked-list method using dynamic memory allocation and pointers. The two structures needed to implement a stack are a head structure and a data-node structure. We name them STACK and NODE, respectively. In C, they are defined as follows.
struct node
{
    int data;              /* data stored in this element */
    struct node *next;     /* link to the node below it */
};

struct stack
{
    int count;             /* number of elements in the stack */
    struct node *top;      /* pointer to the top node */
} stack1;
These declarations reserve the required memory for the two structures, but no values are assigned to the members of either structure. The following algorithm initializes the stack, that is, assigns values, setting it to an empty state, which can then expand or shrink as the need arises. The situation before execution of the algorithm:
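One way to write that initialization in C, using the structures above, is sketched below (the function name `create_stack` is an assumption, not from the text):

```c
#include <stdlib.h>

struct node
{
    int data;
    struct node *next;
};

struct stack
{
    int count;
    struct node *top;
};

/* Allocate a stack head and set it to the empty state:
   no nodes exist yet, so top is NULL and count is 0. */
struct stack *create_stack(void)
{
    struct stack *s = malloc(sizeof *s);
    if (s != NULL) {
        s->count = 0;
        s->top = NULL;
    }
    return s;
}
```

From this empty state the stack can grow and shrink freely, since every node is allocated on demand.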
Stack Data Node
The rest of the data structure is a typical linked-list data node. Although the application determines the data stored in the stack, the stack data node looks like any linked-list node. In addition to the data, it contains a next pointer to other data nodes, making it a self-referential data structure.
Stack Head
Generally, the head for a stack requires only two attributes: a top pointer and a count of the number of elements in the stack. These two elements are placed in a head structure. Other stack attributes can be placed here as well. For example, it is possible to record the time the stack was created and the total number of items that have ever been placed in the stack. These two metadata items would allow the user to determine the average number of items processed through the stack in a given period. A basic head structure:
Data Structure
To implement the linked-list stack, we need two different structures: a head and a data node. The head structure contains metadata and a pointer to the top of the stack. The data node contains the data and a next pointer to the next node in the stack. The conceptual and physical implementations of the stack:
Stack Top
The third stack operation is stack top. Stack top copies the item at the top of the stack; that is, it returns the data in the top element to the user but does not delete it. You might think of this operation as reading the stack top. Stack top can also result in underflow if the stack is empty. The stack top operation
Pop
When we pop a stack, we remove the item at the top of the stack and return it to the user. Because we have removed the top item, the next item in the stack becomes the top. When the last item in the stack is deleted, the stack must be set to its empty state. If pop is called when the stack is empty, then it is in an underflow state. The stack pop operation.
Push
Push adds an item at the top of the stack. After the push, the new item becomes the top. The only potential problem with this simple operation is that we must ensure that there is room for the new item. If there is not enough room, then the stack is in an overflow state and the item cannot be added. Figure 2 shows the push stack operation.
BASIC STACK OPERATIONS
The three basic stack operations are push, pop, and stack top. Push is used to insert data into the stack. Pop removes data from a stack and returns the data to the calling module. Stack top returns the data at the top of the stack without deleting the data from the stack.
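The three operations above can be sketched in C on the linked-list stack from the implementation section (function names and the error conventions, returning 0 on overflow or underflow, are assumptions for this sketch):

```c
#include <stdlib.h>

struct node
{
    int data;
    struct node *next;
};

struct stack
{
    int count;
    struct node *top;
};

/* Push: allocate a node and make it the new top.  Overflow occurs
   only if the allocation fails; return 0 then, 1 on success. */
int push(struct stack *s, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;                 /* overflow: no room for the item */
    n->data = value;
    n->next = s->top;
    s->top = n;
    s->count++;
    return 1;
}

/* Pop: remove the top node and hand its data back through *out.
   Returns 0 on underflow (empty stack), 1 on success. */
int pop(struct stack *s, int *out)
{
    struct node *n = s->top;
    if (n == NULL)
        return 0;                 /* underflow */
    *out = n->data;
    s->top = n->next;             /* next item becomes the new top */
    s->count--;
    free(n);
    return 1;
}

/* Stack top: copy the top item without deleting it. */
int stack_top(const struct stack *s, int *out)
{
    if (s->top == NULL)
        return 0;                 /* underflow */
    *out = s->top->data;
    return 1;
}
```

Note that pop and stack top share the same underflow condition; the only difference is whether the top node is unlinked and freed.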
Stacks
A stack is an ordered, dynamic, linear list in which all additions of new elements and deletions of existing elements are restricted to one end, called the top. If a series of data items is inserted into a stack and then removed, the order of the data is reversed. This reversing behaviour, due to addition and removal at the top of the stack, gives the structure its special property, known as "last in, first out" (LIFO).
A stack is dynamic because it is constantly changing: as objects are added and removed, it expands and shrinks with the passage of time. The basic stack operations are create stack, push, pop, stack top, empty stack, full stack, stack count, and destroy stack. We shall also study the following applications of stacks: reversing data (converting decimal to binary), parsing data (matching parentheses in source programs), and postponing data usage (infix, postfix, and prefix notation, and the evaluation of postfix expressions). The stack-frame concept is studied in a separate chapter on recursion.
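The first application named above, reversing data to convert decimal to binary, can be sketched with a small array-based stack (a simplification of the linked version; the function name and buffer handling are assumptions):

```c
/* Convert n to binary by pushing remainders onto a stack and then
   popping them: the pops reverse the order, so the digits come out
   most-significant first.  Writes the digits into buf and returns it. */
char *to_binary(unsigned n, char *buf)
{
    int stack[32];                /* enough for a 32-bit value */
    int top = 0;                  /* array-based stack pointer */

    do {
        stack[top++] = n % 2;     /* push the low-order bit */
        n /= 2;
    } while (n > 0);

    char *p = buf;
    while (top > 0)               /* pop: last remainder comes out first */
        *p++ = '0' + stack[--top];
    *p = '\0';
    return buf;
}
```

The reversal is exactly the LIFO property at work: division produces the bits least-significant first, and the stack hands them back in the opposite order.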
Yardstick for Complexity Measurement
We need to devise a way of characterizing the essential performance properties of an algorithm. Hard performance measures, such as wall-clock time, vary significantly when we use different computers, compilers, and programming languages to express and run the same algorithm. For this reason, we use the INSERTION-SORT algorithm as a case study.
Complexity of Algorithms
The soul of analysis of algorithms is to determine their complexity. Knowledge of complexity helps in deciding the implementation and choice of algorithms to solve computation problems efficiently.
The complexity of an algorithm A is the function f(n) or T(n) which gives the running time and/or storage space requirement of the algorithm in terms of the size n of the input data. Frequently, the storage space required by an algorithm is simply a multiple of the data size n. Accordingly, the term complexity shall refer to the running time of the algorithm only, i.e., to a function such as T(n).
Analysis of Insert Sort
As a case study, let us calculate the running time of the algorithm INSERTION-SORT. We let tj be the number of times the while-loop test at line 1.3 is executed for a given value of j. We also assume that comments are not executable statements, and so they take no time.
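For reference, the algorithm under analysis can be written in C as follows (a standard insertion sort; the comment marks the loop whose test count the analysis calls tj):

```c
/* Insertion sort: for each j from 1..n-1, insert a[j] into the
   already-sorted prefix a[0..j-1] by shifting larger elements right. */
void insertion_sort(int a[], int n)
{
    for (int j = 1; j < n; j++) {
        int key = a[j];
        int i = j - 1;
        /* This while-test is the one executed t_j times in the
           analysis: it runs until the insertion point is found. */
        while (i >= 0 && a[i] > key) {
            a[i + 1] = a[i];      /* shift the larger element right */
            i--;
        }
        a[i + 1] = key;
    }
}
```

On an already-sorted input each tj is 1 (the test fails immediately), while on a reverse-sorted input tj is j, which is what separates the best-case linear time from the worst-case quadratic time.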
Running Time
The running time of an algorithm on a particular input is the number of primitive operations, or steps, executed. In defining the notion of a step, we keep the RAM-model viewpoint. In actual practice, a constant amount of time is required to execute each line of pseudocode, though one line may take a different amount of time than another. Let us assume each execution of the ith line takes time ci, where ci is a constant.
Input Size
The input size of a problem is usually denoted by n, which is typically the number of items handled by the algorithm. For example, for sorting and searching algorithms, the array size n is the input size. Sometimes it is more appropriate to describe the size of the input with two numbers rather than one; for instance, if the input to an algorithm is a graph, the input size can be described by the number of vertices and edges in the graph. In analyzing algorithms, the input size of the problem will be indicated.
Input Size and Running Time
In general, the time and/or memory taken by an algorithm grows with the size of the input, so it is traditional to describe the running time of a program as a function of the size of its input. To do so, we need to define the terms "running time" and "size of input". The immediate goal of analysis is to find a means of expression, a measuring yardstick, that characterizes the time or space requirements of a running algorithm while suppressing tedious details. This leads to the well-known concept of measuring the complexity of an algorithm. The complexity of an algorithm is the function T(n) which gives the running time and/or storage space requirement of the algorithm in terms of the size n of the input data. Frequently, the storage space required by an algorithm is simply a multiple of the data size n. Accordingly, the term complexity shall refer to the running time of the algorithm only.
Random Access Machine (RAM) Model of Computation
Knowing that our algorithms will be implemented as computer programs, we shall assume using a generic one-processor RAM. In the RAM model, instructions are executed one after another, with no concurrent operations. Models based on parallel processors will not be used.
Types of Data Structures
The general morphology of data structures is shown in Figure 3.
Morphology of Data Structures
There are two aspects of managing data structures, namely, logical and physical.
A. Logical Data Structures
- Linear Structures
The most common organization for data is a linear structure. A structure is linear if it has these two properties:
Property 1: Each element of the structure is followed by at most one other element.
Property 2: No two elements are followed by the same element.
An array is an example of a linearly structured data type. We generally write a linearly structured data type like this: A B C D.
Counter-example 1: If property 1 is violated, as when A points to two elements B and C, the structure is a tree, which is non-linear. Trees are acyclic structures.
Counter-example 2: If property 2 is violated, as when A and B both point to C, the structure is a graph, which is again non-linear.
Other common linear structures, such as linked lists, stacks, and queues, are shown above in the figure depicting the morphology of data structures.
Non-Linear Structures
In a non-linear structure there is no limit on the number of predecessors or successors of an element. An element may have any number of successors or predecessors. These structures violate the properties of a linear structure. Trees and graphs are the important examples.
A tree is a collection of nodes in which every node has a unique predecessor but may have many successors. A graph is also a collection of nodes; in a graph, any node can have any number of successors and predecessors, as shown in the figure.
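These shapes are visible in the node declarations themselves. As a sketch (field names are assumptions), compare a binary tree node, where successors are fixed in number, with a graph vertex carrying an adjacency list of arbitrary length:

```c
#include <stddef.h>

/* Tree node: each node has one predecessor (its parent) but may
   have several successors; a binary tree fixes that number at two. */
struct tree_node {
    int data;
    struct tree_node *left;
    struct tree_node *right;
};

/* Graph vertex with an adjacency list: any number of successors,
   and nothing restricts how many vertices point back at it. */
struct edge {
    int to;                       /* index of the neighbouring vertex */
    struct edge *next;
};

struct vertex {
    int data;
    struct edge *adjacency;       /* linked list of outgoing edges */
};
```

The linked list for adjacency is what lets a graph vertex break both linearity properties: one vertex may follow many, and many may follow one.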
Data Structure
A structure, usually in computer memory, used to organize data and information for better algorithmic efficiency is called a data structure. Examples are arrays, linked lists, queues, stacks, and trees.
In the design of computer programs, the choice of data structure is a primary design consideration. The quality of final results depends heavily on choosing the best data structure. The choice of appropriate data structure is crucial and has given rise to many design methods and programming languages. Object oriented languages such as C++ and Java are one group of languages that exhibit this philosophy, and have several in-built structures.
A data structure can also be defined as an aggregation of atomic and structured data types into a set with defined relationships. Structure means a set of rules that hold the data together. In other words, if we take a combination of data types and fit them into a structure such that we can define its relating rules, we have made a data structure. Data structures can be nested. We can have a data structure that consists of other data structures. For example, we can define the two structures array and record as shown in table below.
Array: a homogeneous sequence of data or data types, known as elements, with a positional association among the elements.
Record: a heterogeneous combination of data into a single structure with an identified key, and no positional association among its fields.
The usual operations performed on data structures are:
1. insertion, 2. deletion, 3. searching, 4. sorting, 5. retrieval, 6. traversing, 7. merging, and many other operations as required by applications.
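As an illustration of one of these operations, searching, here is the simplest possible routine, a linear search over an array (the function name is an assumption for this sketch):

```c
/* Linear search: scan the array left to right and return the index
   of the first element equal to key, or -1 if the key is absent. */
int linear_search(const int a[], int n, int key)
{
    for (int i = 0; i < n; i++)
        if (a[i] == key)
            return i;
    return -1;
}
```

The choice of data structure decides which operations are cheap: on a sorted array this search could be replaced by a logarithmic binary search, while on an unsorted linked list the linear scan is the only option.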
Data Types
There are two types of data, namely, atomic and structured (also called composite) data.
Atomic data are data that we choose to consider as a single, non-decomposable entity. Boolean or logical data are examples. The integer 4562 may be considered a single integer value; we can decompose it into single digits (4, 5, 6, 2), but the decomposed digits do not have the same characteristics as the original integer.
Data and Information
Data is collected about an entity, which can be an object or event (real or abstract) of interest to the user. An entity may be a person, place, location, or thing, e.g., a salesperson, a city, or a product. An entity can also be an event or a unit of time, such as a machine breakdown, a sale, or a month.
Data remains raw facts in isolation unless manipulated to be useful. These isolated facts may convey meaning but generally are not useful by themselves.
For example, consider a database which stores data about a BEIT class. Its records contain students' names, dates of birth, addresses, courses, and the grade in each course. If the principal calls to find out the total number of students enrolled in the BEIT class, the clerk can answer the question by looking at the database; to the clerk, the database is information. But suppose the director of the institute wants to know the total number of students completing the data structures class with an A grade. The director must identify the students in the class database and then identify those with A grades. The information given to the principal is thus data to the director; it produced useful information for the director only when processed further to output the list of students who completed the data structures course with an A grade. This example shows that one person's information may be another person's data. In this sense, information eliminates uncertainty about a state, while data are accumulated but unorganized facts.
One can say that data is to information as ore is to gold. A computer-based information system creates, collects, and stores data, and processes that data into useful information. Symbolically,
Information = f (data, processing), a function of data and processing.
Data is, therefore, an essential ingredient for generation of information.
Paths, Cycles, and Adjacency :
Two vertices vi and vj in a graph G = (V, E) are adjacent, or neighbours, if there exists an edge e ∈ E such that e = (vi, vj). A path p in a graph G = (V, E) is a sequence of vertices of V of the form p = v1, v2, …, vn (n ≥ 2) in which each vertex vi is adjacent to the next one, vi+1 (for 1 ≤ i < n). The path is said to be simple if all the vertices are distinct, with the exception that v1 may equal vn. A path consisting of a sequence of n+1 vertices has length n; in other words, the length of a path is the number of edges in it.
A cycle is a path p = v1, v2, …, vn such that v1 = vn, so that p starts and ends at the same vertex and forms a loop. Figure 2 illustrates paths and cycles in a graph. A cycle is a path of length greater than one that begins and ends at the same vertex. In an undirected graph, a simple cycle is a path that travels through three or more distinct vertices and connects them into a loop. Formally, if p is a path of the form p = v1, v2, …, vn, then p is a simple cycle if and only if n ≥ 4, v1 = vn, and vi ≠ vj for distinct i and j in the range 1 ≤ i, j ≤ n−1. Put differently, when you travel around the loop in a simple cycle, you must visit at least three different vertices, and you cannot travel through any vertex more than once.
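The path and cycle conditions can be checked mechanically. Assuming an adjacency-matrix representation (the representation and function names are assumptions for this sketch), a vertex sequence is a path exactly when every consecutive pair is adjacent:

```c
/* Check whether seq[0..len-1] is a path in a graph on n vertices
   given by adjacency matrix adj: every consecutive pair of vertices
   must be adjacent.  The path then has len - 1 edges. */
int is_path(int n, int adj[n][n], const int seq[], int len)
{
    if (len < 2)
        return 0;                 /* a path needs at least two vertices */
    for (int i = 0; i + 1 < len; i++)
        if (!adj[seq[i]][seq[i + 1]])
            return 0;             /* consecutive vertices not adjacent */
    return 1;
}

/* A cycle is a path that additionally starts and ends
   at the same vertex. */
int is_cycle(int n, int adj[n][n], const int seq[], int len)
{
    return is_path(n, adj, seq, len) && seq[0] == seq[len - 1];
}
```

Checking simplicity would add one more condition: all vertices except the repeated endpoint must be pairwise distinct.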
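The definitions above can be checked mechanically. Below is a small Python sketch (the edge list is a hypothetical 4-cycle, not the graph of Figure 2) that tests whether a vertex sequence is a path and whether it is a simple cycle:

```python
# A minimal sketch of the path and simple-cycle definitions above,
# using an undirected graph given as a list of edges.

def is_path(edges, seq):
    """True if each consecutive pair of vertices in seq is adjacent."""
    adj = set()
    for u, v in edges:
        adj.add((u, v))
        adj.add((v, u))          # undirected: store both directions
    return len(seq) >= 2 and all((seq[i], seq[i + 1]) in adj
                                 for i in range(len(seq) - 1))

def is_simple_cycle(edges, seq):
    """True if seq is a path that starts and ends at the same vertex,
    visits at least three distinct vertices, and repeats none of them."""
    return (is_path(edges, seq)
            and len(seq) >= 4            # n+1 vertices for a loop of 3+ edges
            and seq[0] == seq[-1]
            and len(set(seq[:-1])) == len(seq) - 1)

edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
print(is_path(edges, [1, 2, 3]))               # True: a path of length 2
print(is_simple_cycle(edges, [1, 2, 3, 4, 1])) # True: a 4-cycle
```

Note that the length of the path [1, 2, 3] is 2, the number of edges, even though the sequence has three vertices, exactly as the definition states.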
Graphs and Multigraphs:
A graph, represented by G = (V, E), consists of two things:
(1) A set V of elements called vertices (or points or nodes)
(2) A set E of edges such that each edge ei ∈ E is identified with a unique (unordered) pair [vi, vj] of vertices in V, denoted by ei = [vi, vj]
The above graph consists of 5 vertices and edges
Suppose e = [vi, vj]. Then the vertices vi and vj are called the endpoints of e, and vi and vj are said to be adjacent vertices, or neighbours. The degree of a vertex v, written deg(v), is the number of edges containing v. If deg(v) = 0, that is, if v does not belong to any edge, then v is called an isolated vertex.
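The degree definition can be illustrated with a short Python sketch; the vertex and edge sets here are hypothetical, chosen so that one vertex ends up isolated:

```python
# Computing deg(v) for every vertex: deg(v) is the number of edges
# containing v; a vertex of degree 0 is isolated.
from collections import Counter

def degrees(vertices, edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return {v: deg[v] for v in vertices}

V = ['A', 'B', 'C', 'D', 'E']
E = [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'D')]
d = degrees(V, E)
print(d)                                  # {'A': 2, 'B': 2, 'C': 3, 'D': 1, 'E': 0}
print([v for v, k in d.items() if k == 0])  # ['E'] -- E is isolated
```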
Separate Chaining Resolution Using Linked List
A major disadvantage of open addressing is that each collision resolution increases the probability of future collisions. This disadvantage is eliminated in the second approach to collision resolution, known as separate chaining, which uses short linked lists. In separate chaining, each element contains the location of the next synonymous element. For example, in Figure 12, array element 001, Sarah Trapp, contains a pointer to the next element, Harry Eagle, which in turn contains a pointer to the third element, Chris Walljasper. Linked list resolution uses a separate area to store collisions and chains all synonyms together in a linked list. It uses two storage areas: the prime area and the overflow area. Each element in the prime area contains an additional field, a link head pointer to a linked list of overflow data in the overflow area. When a collision occurs, one element is stored in the prime area and chained to its corresponding linked list in the overflow area. Although the overflow area can be any data structure, it is typically implemented as a linked list in dynamic memory. Figure 12 shows the linked list from Figure 11 with three synonyms for address 001 and three synonyms for address 007.
The linked list data can be stored in any order, but a last in, first out (LIFO) sequence and a key sequence are the most common. The LIFO sequence is the fastest for inserts because the linked list does not have to be scanned to insert the data: the element being inserted into overflow is simply placed at the beginning of the linked list and linked to the node in the prime area. Key-sequenced lists, with the smallest key in the prime area, provide faster search retrieval. Which sequence (LIFO or key sequence) is used depends on the application.
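A minimal separate-chaining table can be sketched in Python as follows. Each prime-area slot holds the head of a linked list of synonyms, and inserts are LIFO (new nodes go to the front). The keys and names are illustrative, loosely based on the figures discussed above:

```python
# Separate chaining: each slot chains its synonyms in a linked list.

class Node:
    def __init__(self, key, data, next=None):
        self.key, self.data, self.next = key, data, next

class ChainedHash:
    def __init__(self, size=307):
        self.size = size
        self.table = [None] * size        # prime area: list head pointers

    def insert(self, key, data):
        home = key % self.size            # modulo-division hash
        self.table[home] = Node(key, data, self.table[home])  # LIFO insert

    def search(self, key):
        node = self.table[key % self.size]
        while node is not None and node.key != key:
            node = node.next              # walk the chain of synonyms
        return node.data if node else None

h = ChainedHash()
h.insert(379452, "Mary Dodd")
h.insert(166702, "Sarah Trapp")
print(h.search(379452))                   # Mary Dodd
```

Because synonyms live in their own chain, resolving one collision never increases the probability of collisions at other addresses, which is exactly the advantage over open addressing described above.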
Key Offset
Key offset is a double hashing method that produces different collision paths for different keys. Whereas the pseudorandom number generator produces a new address as a function of the previous address, key offset calculates the new address as a function of the old address and the key. One of the simplest versions simply adds the quotient of the key divided by the list size to the address to determine the next collision resolution address, as shown in the formula below.
Offset = key/listSize
address = ((offset + old address) modulo listSize)
For example, when the key is 166703 and the list size is 307, using the modulo-division hashing method we generate an address of 1. As shown in Figure 11, this synonym of 070919 produces a collision at address 1. Using key offset to calculate the next address, we get 237, as shown below.
Offset = 166703/307 = 543
address = ((543 + 001) modulo 307)= 237
If 237 were also a collision, we would repeat the process to locate the next address, as shown below.
Offset = 166703/307 = 543
address = ((543 + 237) modulo 307)= 166
To really see the effect of key offset, we need to calculate several different keys, all hashing to the same home address. In Table 2 we calculate the next two collision probe addresses for three keys that collide at address 001.
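The key-offset arithmetic above is easy to reproduce in a few lines of Python, using the same key (166703), list size (307), and starting address (001) as the worked example:

```python
# Key offset: offset = key // listSize,
# next address = (offset + old address) mod listSize.

def key_offset_next(key, old_address, list_size=307):
    offset = key // list_size
    return (offset + old_address) % list_size

key = 166703
print(key // 307)                 # offset: 543
print(key_offset_next(key, 1))    # first probe:  237
print(key_offset_next(key, 237))  # second probe: 166
```

Because the offset depends on the key, two keys that collide at the same home address follow different probe paths, which is the point of the method.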
Pseudorandom (MAD) Collision Resolution
This method uses a pseudorandom number to resolve the collision. We saw the pseudorandom number generator as a hashing method in “Pseudorandom Method”; we now use it as a collision resolution method. In this case, rather than using the key as a factor in the random-number calculation, we use the collision address. Consider the collision we created in Figure 10. We now resolve the collision using the following pseudorandom number generator, where a is 3 and c is 5:
y = (ax + c) modulo listSize
  = (3 × 1 + 5) modulo 307
  = 8; the next addresses are 29, 92, and so on
In this example, we resolve the collision by placing the new data in element 008 (Figure 11). Pseudorandom numbers are a relatively simple solution, but they have one significant limitation: all keys follow only one collision resolution path through the list. (This deficiency also occurs in the linear and quadratic probes.) Because of this, pseudorandom collision resolution can create significant secondary clustering.
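The probe sequence above can be verified with a couple of lines of Python, using the same factors a = 3 and c = 5 and starting from the collision at address 1:

```python
# Pseudorandom collision resolution: y = (a*x + c) mod listSize,
# applied repeatedly to the previous address.

def pseudo_next(address, a=3, c=5, list_size=307):
    return (a * address + c) % list_size

addr = 1
for _ in range(3):
    addr = pseudo_next(addr)
    print(addr)          # 8, then 29, then 92
```

Note that the sequence 8, 29, 92, … depends only on the starting address, not on the key; every key colliding at address 1 walks exactly this path, which is the single-path limitation mentioned above.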
Double Hashing or Rehashing
These are methods of dealing with hash collisions. Double hashing or rehashing involves using a secondary hash function on the hashed key of the item. The rehash function is applied successively until an empty position is found. If the hash position is found occupied during a search, the rehash function is again used to locate the item. In general, a rehash function, RH, accepts an array address and produces another. If array address H (K) is already occupied, a rehash function, RH, is applied to the value of H (K) i.e.
RH (H (K))
to find another location. If position RH (H (K) ) is also occupied, it too is rehashed to see if RH (RH (H (K))) is available. This process continues until an empty location is found. RH is not necessarily the same as the original hash function.
The following two methods are collectively known as double hashing or rehashing methods. In each method, rather than using an arithmetic probe function, the address is rehashed. Both methods prevent primary clustering.
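The successive-rehashing loop described above can be sketched in Python. The functions H and RH below are illustrative stand-ins (modulo-division and a MAD-style rehash), not functions defined in the text:

```python
# Rehashing: apply RH to the address until an empty slot is found.

def rehash_insert(table, key, H, RH):
    addr = H(key)
    while table[addr] is not None:     # occupied: rehash the address
        addr = RH(addr)                # RH(H(K)), RH(RH(H(K))), ...
    table[addr] = key
    return addr

table = [None] * 307
H = lambda k: k % 307                  # primary hash (modulo division)
RH = lambda a: (3 * a + 5) % 307       # rehash function (assumed)
print(rehash_insert(table, 121267, H, RH))   # home address 2, no collision
```

As the text notes, RH need not be the same function as H; here it is deliberately different, so the rehash path is unrelated to the primary hash.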
Quadratic Probe
Primary clustering can be eliminated by adding a value other than 1 to the current address. One easily implemented method is the quadratic probe. In the quadratic probe, the increment is the square of the collision probe number. Thus for the first probe we add 1², for the second collision probe we add 2², for the third collision probe we add 3², and so forth, until we either find an empty element or exhaust the possible elements. To ensure that we don't run off the end of the address list, we use the modulo of the quadratic sum for the new address. This sequence is shown in Table 1, which for simplicity assumes a collision at location 1 and a list size of 100. From the table we can see that quadratic probing causes secondary clustering on a collision resolution path.
address = H(K) + C·i²
where we take C = 1 and i is the collision probe number.
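The probe sequence of Table 1 (collision at location 1, list size 100, C = 1) can be generated directly:

```python
# Quadratic probe: probe i examines (home + i*i) mod listSize.

def quadratic_probes(home, list_size=100, n=5):
    return [(home + i * i) % list_size for i in range(1, n + 1)]

print(quadratic_probes(1))   # [2, 5, 10, 17, 26]
```

Every key colliding at the same home address still follows the same sequence, which is why quadratic probing avoids primary clustering but not secondary clustering.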
Linear Probe
Our first collision resolution method is also the simplest. In a linear probe, when data cannot be stored in the home address, we resolve the collision by adding 1 to the current address. For example, let’s add two more elements to the modulo-division method example. The results are shown in figure 10. When we insert key 070919, we find an empty element and insert it with no collision. When we try to insert key 166703, however, we have a collision at location 001. We try to resolve the collision by adding 1 to the address and inserting the new data at location 002. However, this address is also filled. We therefore add another 1 to the address and this time find an empty location, 003, where we can place the new data.
As an alternative to a simple linear probe, we can add 1, subtract 2, add 3, subtract 4, and so forth until we locate an empty element. In either method, the code for the linear probe must ensure that the next collision resolution address lies within the boundaries of the list.
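A minimal linear-probe insert can be sketched as follows, assuming modulo-division hashing on a 307-element list; the wrap-around with % keeps every probe address inside the list, as required above:

```python
# Linear probe: on collision, step to the next slot (mod list size).

def linear_insert(table, key):
    size = len(table)
    addr = key % size                  # home address
    while table[addr] is not None:     # collision: probe the next slot
        addr = (addr + 1) % size
    table[addr] = key
    return addr

table = [None] * 307
print(linear_insert(table, 121267))    # home address 2, no collision
print(linear_insert(table, 121574))    # synonym of 121267: resolves to 3
```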
Open Addressing
The first collision resolution approach, open addressing, resolves collisions in the prime area, that is, the area that contains all of the home addresses. This is in contrast to linked list resolution, in which collisions are resolved by placing the data in a separate overflow area.
When a collision occurs, the prime area addresses are searched for an open or unoccupied element where the new data can be placed. We discuss four different methods: linear probe, quadratic probe, pseudorandom collision resolution, and key offset. The last two methods, pseudorandom and key offset, are collectively known as double hashing or rehashing methods.
Clustering
As data are added to a list and collisions are resolved, some hashing functions tend to cause data to group within the list. This tendency of data to build up unevenly across a hashed list is known as clustering. Clustering is a concern because it is usually created by collisions. If the table contains a high degree of clustering, then the number of probes to locate an element grows and reduces the processing efficiency of the table.
Primary clustering occurs when data cluster around a home address. A cluster is a sequence of adjacent occupied entries in a hash table: clusters contain no empty slots and consist of contiguous runs of occupied entries. One collision resolution method, linear probing, is subject to primary clustering. When a number of keys collide at a given location and we use linear probing to resolve the collisions, the colliding keys are inserted into the empty locations immediately adjacent to the collision location. This can cause a puddle of keys to form at the collision location, which is what primary clustering means.
Therefore, we need to design our hashing functions to minimize clustering. However, note that with the exception of the direct method, we cannot eliminate collisions. If we have a list with 365 addresses, we can expect to get a collision within the first 23 inserts more than 50% of the time.
Our final concept is that the number of elements examined in the search for a place to store the data must be limited. The traditional limit of examining all elements of the list presents three difficulties. First, the search is not sequential, so finding the end of the list doesn't mean that every element has been tested. Second, examining every element would be excessively time-consuming for an algorithm that has as its goal a search effort of one. Third, some of the collision resolution techniques cannot physically examine all of the elements in a list (see “Quadratic Probe” for an example).
Generally a collision limit is placed on hashing algorithms. What happens when the limit is reached depends on the application.
Generally, there are two different approaches to resolving collisions: open addressing and using linked list chaining.
Load Factor
Before we discuss the collision resolution methods, however, we need to cover three more concepts. Because of the nature of hashing algorithms, there must be some empty elements in a list at all times. In fact, we define a full list as a list in which all elements except one contain data. As a rule of thumb, a hashed list should not be allowed to become more than 75% full. This guideline leads us to our first concept, load factor. The load factor of a hashed list is the number of elements in the list divided by the number of physical elements allocated for the list, expressed as a percentage. The formula, in which k represents the number of filled elements in the list and n represents the total number of elements allocated to the list, is α = (k / n) × 100.
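The load factor is a one-line computation; the numbers below are illustrative, using the 307-element list that appears throughout these notes:

```python
# Load factor: filled elements k over allocated elements n, as a percentage.

def load_factor(k, n):
    return k / n * 100

print(round(load_factor(230, 307), 1))   # 74.9 -- just under the 75% rule of thumb
```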
Collision Resolution Methods
With the exception of the direct method, none of the hashing methods is a one-to-one mapping. Thus, when we hash a new key to an address, we may create a collision. There are several methods for handling collisions, each of them independent of the hashing algorithm; that is, each hashing method can be used with each of the collision resolution methods. In this section we discuss the collision resolution methods shown in Figure 9.
Pseudorandom Method
In the pseudorandom method, the key is used as the seed in a pseudorandom number generator and the resulting random number is then scaled into the possible address range using modulo division. A common random number generator is shown below.
y = ax + c
This method is also known as MAD, which stands for multiply, add, and divide. To use the pseudorandom number generator as a hashing method, we set x to the key, multiply it by the coefficient a, and then add the constant c. The result is then divided by the list size, with the remainder (see “Modulo-Division Method”) being the hashed address. Let's demonstrate the concept with an example from Figure 6. To keep the calculation reasonable, we use 17 and 7 for the factors a and c, respectively. Also, the list size in the example is the prime number 307.
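The MAD computation with a = 17, c = 7, and list size 307 looks like this in Python, applied to one of the Figure 6 keys:

```python
# Pseudorandom (MAD) hashing: y = (a*key + c) mod listSize.

def mad_hash(key, a=17, c=7, list_size=307):
    return (a * key + c) % list_size

# 17 * 121267 + 7 = 2061546; 2061546 mod 307 = 41
print(mad_hash(121267))   # 41
```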
Rotation Method
Rotation hashing is generally not used by itself but rather is incorporated in combination with other hashing methods. It is most useful when keys are assigned serially, such as we often see in employee numbers and part numbers. A simple hashing algorithm tends to create synonyms when hashing keys are identical except for the last character. Rotating the last character to the front of the key minimizes this effect. For example, consider the case of a six-digit employee number that might be used in a large company.
Examine the rotated key carefully. Because all the keys now end in 60010, they would obviously not work well with modulo division. On the other hand, if we used a simple fold shift hash on the original keys with a two-digit address, the addresses would be sequential starting with 62. Using a shift hash on the rotated keys results in the series of addresses 26, 36, 46, 56, 66, which has the desired effect of spreading the data more evenly across the address space. Rotation is often used in combination with folding hashing.
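The rotation-plus-fold-shift combination can be sketched as follows. The serial keys 600101 through 600105 are assumed for illustration, consistent with the "ends in 60010" observation above:

```python
# Rotation: move the last digit to the front, then hash with a
# two-digit fold shift (split into two-digit parts and add them).

def rotate(key):
    return key[-1] + key[:-1]

def fold_shift(key, width=2):
    parts = [key[i:i + width] for i in range(0, len(key), width)]
    return sum(int(p) for p in parts) % 10 ** width  # drop the carry

for key in ["600101", "600102", "600103", "600104", "600105"]:
    print(rotate(key), fold_shift(rotate(key)))   # 160010 26, 260010 36, ...
```

On the original keys, fold_shift gives the sequential addresses 62, 63, 64, …; on the rotated keys it gives 26, 36, 46, 56, 66, spreading the serial keys across the address space as described.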
Folding Methods
Two folding methods are used: fold shift and fold boundary. In fold shift, the key value is divided into parts whose size matches the size of the required address. Then the left and right parts are shifted and added with the middle part. For example, imagine we want to
Figure 7: Hash fold examples
(a) Fold shift:    key 123456789 → 123 + 456 + 789 = 1368 → address 368
(b) Fold boundary: key 123456789 → 321 + 456 + 987 = 1764 → address 764
map identity numbers into three-digit addresses. We divide the nine-digit identity number into three three-digit numbers, which are then added. If the resulting sum is greater than 999, then we discard the leading digit. This method is shown in Figure 7 (a).
In fold boundary, the left and right numbers are folded on a fixed boundary between them and the center number. The two outside values are thus reversed, as seen in Figure 7 (b). It is interesting to note that the two folding methods give different hashed addresses.
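Both folding methods can be sketched in a few lines; the key 123456789 and the three-digit address match the worked example of Figure 7:

```python
# Fold shift: split the key into address-sized parts and add them.
# Fold boundary: reverse the two outer parts first, then add.
# In both cases, a leading carry digit is discarded.

def fold_shift(key, width=3):
    parts = [key[i:i + width] for i in range(0, len(key), width)]
    return sum(int(p) for p in parts) % 10 ** width

def fold_boundary(key, width=3):
    parts = [key[i:i + width] for i in range(0, len(key), width)]
    parts[0], parts[-1] = parts[0][::-1], parts[-1][::-1]  # fold outer parts
    return sum(int(p) for p in parts) % 10 ** width

print(fold_shift("123456789"))     # 123 + 456 + 789 = 1368 -> 368
print(fold_boundary("123456789"))  # 321 + 456 + 987 = 1764 -> 764
```

As the text notes, the two methods hash the same key to different addresses (368 versus 764).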
Midsquare Method
In midsquare hashing, the key is squared and the address is selected from the middle of the squared number. The most obvious limitation of this method is the size of the key. Given a key of 6 digits, the product will be 12 digits, which is beyond the maximum integer size of many computers. Because most personal computers can handle a 9-digit integer, let's demonstrate the concept with keys of 4 digits. Given a key of 9452, the midsquare address calculation is shown below using a 4-digit address (0000 to 9999).
9452 * 9452 = 89340304 : address is 3403
As a variation on the midsquare method, we can select a portion of the key, such as the middle three digits, and then use them rather than the whole key. Doing so allows the method to be used when the key is too large to square. For example, for the keys in Figure 6, we can select the first three digits and then use the midsquare method as shown below. (We select the third, fourth, and fifth digits as the address.)
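The midsquare calculation for the 4-digit example above is easily reproduced; the middle-digit selection here is a sketch that takes the centre four digits of the squared value:

```python
# Midsquare hashing: square the key and take the middle digits.

def midsquare(key):
    squared = str(key * key)          # 9452^2 = 89340304
    mid = (len(squared) - 4) // 2     # start of the centre four digits
    return int(squared[mid:mid + 4])

print(midsquare(9452))                # 3403
```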
Digit – Extraction Method
Using digit extraction, selected digits are extracted from the key and used as the address. For example, using our six-digit employee number to hash to a three-digit address (000 to 999) we could select the first, third, and fourth digits (from the left) and use them as the address. Using the keys from Figure 6, we would hash them to the addresses shown below.
379452 → 394
121267 → 112
378845 → 388
160252 → 102
045128 → 051
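The digit-extraction mapping above is a simple string selection in Python (first, third, and fourth digits from the left):

```python
# Digit extraction: select fixed digit positions of the key as the address.

def extract(key):
    return key[0] + key[2] + key[3]   # first, third, and fourth digits

for key in ["379452", "121267", "378845", "160252", "045128"]:
    print(key, "->", extract(key))    # 379452 -> 394, 121267 -> 112, ...
```

Keeping the keys as strings preserves leading zeros, so 045128 correctly maps to address 051.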
Modulo-Division Method
Also known as division remainder, the modulo-division method divides the key by array size (table size) and uses the remainder for the address. This method gives us the simple hashing algorithm shown below where tableSize is the number of elements in the array.
address = key MODULUS tableSize
This algorithm works with any table size, but a table size that is a prime number produces fewer collisions than other table sizes. We should therefore try, whenever possible, to make the array size a prime number.
As the little company begins to grow, we realize that soon we will have more than 100 employees. Planning for the future, we create a new employee numbering system that will handle 1,000,000 employees. We also decide that we want to provide data space for up to 300 employees. The first prime number greater than 300 is 307. We therefore choose 307 as our list (array) size, which gives us a table with addresses that range from 0 through 306. Our new employee table and some of its hashed addresses are shown in Figure 6.
To demonstrate, let’s hash Bryan Devaux’s employee number, 121267.
121267/307 = 395 with remainder of 2
Therefore: hash(121267) = 2
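The division above maps directly onto Python's modulo operator:

```python
# Modulo-division hashing: the remainder of key / tableSize is the address.

def modulo_hash(key, table_size=307):
    return key % table_size

print(divmod(121267, 307))   # (395, 2): quotient 395, remainder 2
print(modulo_hash(121267))   # 2 -- Bryan Devaux's home address
```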
Direct Method
In direct hashing, the key is the address without any hashing manipulation. The structure must therefore contain an element for every possible key. The situations in which you can use direct hashing are limited, but it can be very powerful because it guarantees that there are no synonyms. Let’s look at two applications.
Now let’s take an example. Imagine that a small organization has fewer than 100 employees. Each employee is assigned an employee number between 1 and 100. In this case, if we create an array of 101 employee records, the employee number can be directly used as the address of any individual record. This concept is shown in Figure 5.
Note that not every element in the array contains an employee’s record. In fact, all hashing techniques other than direct hashing require that some of the elements be empty to reduce the number of collisions.
Although this is the ideal method, its application is very limited. For example, we cannot use the National Identity Card Number as the key with this method, because the National Identity Card Number is 13 digits long. In other words, if we used the National Identity Card Number as the key, we would need an array of 10,000,000,000,000 records, of which we would use fewer than 100. Let's turn our attention, then, to hashing techniques that map a large population of possible keys into a small address space.
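The small-company example above is trivial to express in code; the employee name is a hypothetical placeholder:

```python
# Direct hashing: the employee number (1..100) IS the array index,
# so there can be no synonyms and no collisions.

records = [None] * 101            # index 0 unused; keys run 1 through 100

def insert(emp_no, name):
    records[emp_no] = name        # key == address, no manipulation

def lookup(emp_no):
    return records[emp_no]

insert(25, "A. Example")          # hypothetical employee number 25
print(lookup(25))                 # A. Example
```

The unused slots in `records` are exactly the empty elements the text mentions: direct hashing trades space (one slot per possible key) for a guarantee of zero collisions.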
Probing and Probing Sequence
It should be obvious that when we need to locate an element in a hashed list, we must use the same algorithm that we used to insert it into the list. Consequently, we first hash the key and check the home address to determine whether it contains the desired element. If it does, the search is complete. If not, we must use the collision resolution algorithm to determine the next location and continue until we find the element or determine that it is not in the list. Each calculation of an address and test for success is known as a probe. In the case of collisions, open addressing produces alternative lists of addresses. The process of probing produces a probing sequence, which is the sequence of locations we examine when we attempt to insert a new key into the hashed table T. The first location in the probe sequence is the hash address, H(K). The second and successive locations in the probe sequence are determined by the collision resolution policy. To guarantee that every probe sequence always contains an empty location, we define a “full” table T to be a table having exactly one empty entry.
Collisions in Hashing
Generally, the population of keys for a hashed list is greater than the storage area for the data. For example, if we have an array of 50 students for a class in which the students are identified by the last four digits of their National Identity Card Numbers, then there are 200 possible keys for each element in the array (10,000/50).
Because there are many keys for each index location in the array, more than one student may hash to the same location in the array. We call the set of keys that hash to the same location in our list synonyms.
Hash functions H(K) are used to map keys into table addresses to store data as table entries (K, I) in a hashed table. Almost always we use hashing techniques in which there are many more distinct keys K than table addresses, so we encounter situations in which two distinct keys, K1 ≠ K2, map to the same table address, i.e.,
H (K1) = H (K2)
Hash Function
The idea of using a key from a large set K of keys to determine the address of a record within a smaller set L of table addresses leads to the form of a hash function H. A function that performs this job, such as
H (K) = L
is called a hash function or hashing function. Another way to describe hashing is as a key-to-address transformation in which the keys map to addresses in a list. This mapping transformation is shown in Figure 2. At the top of Figure 2 is a general representation of the hashing concept. The rest of the figure shows how three keys might hash to three different addresses in the list. There are various methods or functions used to map key to addresses, namely Modulo-Division Method, Midsquare Method, and Folding Method etc.
There are a few criteria and purposes for hashing keys. First, the hash function should be simple and quick to compute. Second, it should map a large set of possible keys to a smaller set of table addresses; usually the population of possible keys is much larger than the allocated table size. Third, we would like the hashed addresses to be distributed evenly over the smaller set so that there is a minimum number of collisions; collision elimination is not guaranteed, and an uneven distribution causes clustering, which is not desirable. Fourth, hashing allows direct access to the specific location in the table to which a record's key has been hashed; its running time is O(1).
A hash table is a list of records in which keys are mapped by a hash function. It is similar to the table, T, described in section 1.0, except that here the table entries (K, I) are generated by an appropriate hash function. A hash table generated by a perfect hash has no collisions; in this case, all possible keys usually must be known beforehand. This is also known as optimal hashing or perfect hashing.
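A small hash table of (K, I) entries can be sketched as below. Chaining the synonyms at each address is one collision-resolution scheme among several (an assumption here, since the text above does not fix one), and the class and method names are illustrative:

```python
class HashTable:
    """A table of (K, I) entries addressed by H(K) = K % size.

    Synonyms (keys hashing to the same address) are kept in a
    chained list at that address, so collisions are resolved
    rather than eliminated.
    """

    def __init__(self, size=50):
        self.size = size
        self.slots = [[] for _ in range(size)]

    def insert(self, key, info):
        self.slots[key % self.size].append((key, info))

    def search(self, key):
        for k, info in self.slots[key % self.size]:
            if k == key:
                return info
        return None

t = HashTable()
t.insert(1234, "Alice")
t.insert(6234, "Bob")      # a synonym of 1234: both hash to slot 34
print(t.search(6234))      # Bob, found despite the collision
```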
Table
A table, T, is an abstract data storage structure that contains table entries that are either empty or are pairs of the form (K, I), where K is a key and I is some data or information associated with the key K.
In a table, there is no order on the elements. A table with no order, showing a list of students with identification number, name, and phone number, is given below.
Figure 1: A Simple Table
The identity number is “key”, and name, phone etc are “information” associated with the key.
A common table operation is table searching, an activity in which, given a search key, K, we attempt to find the table entry (K, I) in T containing the key K. Then we may wish to retrieve or update its information, I, or we may wish to delete the entire table entry (K, I). We may also wish to insert a new table entry (K, I). We can also enumerate the entries in table T, e.g. to print the contents of the table.
A table can be represented by various data structures, such as C structs, arrays, and linked lists.
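The operations described above can be sketched over a list-of-pairs representation. The student names and phone numbers below are hypothetical sample data, in the spirit of Figure 1:

```python
# A table as a list of (K, I) pairs: identity number -> (name, phone).
table = [(12345, ("Ahmed", "555-0100")),
         (67890, ("Sara", "555-0142"))]

def search(table, key):
    """Return the information I for key K, or None if absent."""
    for k, info in table:
        if k == key:
            return info
    return None

def insert(table, key, info):
    """Add a new table entry (K, I)."""
    table.append((key, info))

def delete(table, key):
    """Remove the entire table entry (K, I) for key K."""
    table[:] = [(k, i) for k, i in table if k != key]

insert(table, 11111, ("Bilal", "555-0199"))
delete(table, 67890)
for k, info in table:          # enumeration: print the table's contents
    print(k, info)
```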
Thursday, October 22, 2009
Binary Trees
A binary tree T is defined as a finite set of elements, called nodes, such that:
(a) T is empty (called the NULL tree or empty tree) or
(b) T contains a distinguished node R, called the root of T, and the remaining nodes of T form an ordered pair of disjoint binary trees T1 and T2.
A binary tree is a tree in which no node can have more than two subtrees. In other words, a node can have zero, one, or two subtrees. These subtrees are designated as the left subtree and right subtree.
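The definition above can be sketched directly: a node carries its data plus references to its left and right subtrees, either of which may be empty. The node labels here are illustrative:

```python
class Node:
    """A binary tree node: data plus left and right subtrees (or None)."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

# A binary tree with root R and two disjoint subtrees T1 and T2:
T1 = Node("B")                 # left subtree: a single node (a leaf)
T2 = Node("C", Node("D"))      # right subtree, with its own left child
T = Node("A", T1, T2)          # R = "A" is the root of T
```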
Terminology of Trees
In addition to the root, many different terms are used to describe the attributes of a tree. A leaf is any node with an outdegree of zero. A node that is neither a root nor a leaf is known as an internal node because it is found in the middle portion of a tree. A leaf, being a node with no successor, is also called a terminal node.
A node is a parent (predecessor) if it has successor nodes – that is, if it has an outdegree greater than zero. Conversely, a node with a predecessor is a child (successor). A child node has an indegree of one. Two or more nodes with the same parent are siblings. An ancestor is any node in the path from the root to the node. A descendent is any node in the path below the parent node; that is, all nodes in the paths from a given node to a leaf are descendents of that node.
The level of a node is its distance from the root. Because the root has a zero distance from itself, the root is at level 0. The children of the root are at level 1, their children are level 2, and so forth.
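The level of a node can be computed by walking down from the root and counting the distance. This is a minimal sketch; the `Node` shape and the `-1` "not found" convention are assumptions for illustration:

```python
class Node:
    """A binary tree node: data plus left and right children (or None)."""
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def level(root, target, depth=0):
    """Level of the node holding `target`: its distance from the root."""
    if root is None:
        return -1                      # target not in this subtree
    if root.data == target:
        return depth
    left = level(root.left, target, depth + 1)
    return left if left != -1 else level(root.right, target, depth + 1)

# The root is at level 0, its children at level 1, their children at 2:
tree = Node("A", Node("B", Node("D")), Node("C"))
print(level(tree, "A"), level(tree, "B"), level(tree, "D"))  # 0 1 2
```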
Basic Tree Concepts
A tree consists of a finite set of elements, called nodes, and a finite set of directed lines, called branches, that connect the nodes. The number of branches associated with a node is the degree of the node. When the branch is directed toward the node, it is an indegree branch; when the branch is directed away from the node, it is an outdegree branch. The sum of the indegree and outdegree branches is the degree of the node.
If the tree is not empty, then the first node is called the root. The indegree of the root is, by definition, zero. With the exception of the root, all of the nodes in a tree must have an indegree of exactly one. All nodes in the tree can have zero, one, or more branches leaving them; that is, they may have an outdegree of zero, one, or more.
Monday, October 19, 2009