As data are added to a list and collisions are resolved, some hashing functions tend to cause data to group within the list. This tendency of data to build up unevenly across a hashed list is known as clustering. Clustering is a concern because it is usually created by collisions. If the table contains a high degree of clustering, then the number of probes to locate an element grows and reduces the processing efficiency of the table.
Primary clustering, occurs when data cluster around a home address. A cluster is a sequence of adjacent occupied entries in a hash table. Clusters have no empty keys in them, and consists of contiguous runs of occupied entries. A collision resolution method, linear probing, is subject to something called Primary Clustering. When a number of keys collide at a given location, and we use linear probing to resolve, the colliding keys are inserted into empty locations immediately adjacent to the collision location. This can cause a puddle of keys to form at the collision location, called Primary Clustering.
Therefore, we need to design our hashing functions to minimize clustering. However, note that with the exception of the direct method, we cannot eliminate collisions. If we have a list with 365 addresses, we can expect to get a collision within the first 23 inserts more than 50% of the time.
C. Our final concept is that the number of elements examined in the search for a place to store the data must be limited. The traditional limit of examining all elements of the list presents three difficulties. First, the search is not sequential, so finding the end of the list doesn’t mean that every element has been tested. Second, examining every element would be excessively time-consuming for an algorithm that has as its goal a search effort of one. Third, some of the collision resolution techniques cannot physically examine all of the elements in a list. (For an example, “Quadratic Probe,”)
Generally a collision limit is placed on hashing algorithms. What happens when the limit is reached depends on the application.
Generally, there are two different approaches to resolving collisions: open addressing and using linked list chaining.
No comments:
Post a Comment