Thursday, August 19, 2010

Set and Iterator

The Collections Framework in Java, which took shape with the release of JDK 1.2
and was expanded in 1.4 and again in Java 5, and yet again in Java 6, gives you lists,
sets, maps, and queues to satisfy most of your coding needs. They've been tried, tested,
and tweaked. Pick the best one for your job and you'll get—at the least—reasonable
performance. And when you need something a little more custom, the Collections
Framework in the java.util package is loaded with interfaces and utilities.


What are Sets - An unordered collections is called a set , you have probably learned some set theory in mathematics. If data structures is no longer responsible for remembering order of elements insertion , can  it give us better performance for some of its operations? It turns out to be true we will see it later. Lets check the operations on set first.

1) Adding element
2) Deleting element
3) Check if set contains a given object i.e containment testing
4) Listing all elements

One of the most important thing to remember about set is "Set's don't have duplicates values ,adding element in set that's already present is ignored".This is the most useful feature in many programming situations as well.
We could have used linked list to implement a set but all operations on set become slow due to linear search
through list we neglected it.Adding element in set requires check to make sure we don't add a duplicate

We have two data structures hash tables and trees. The standard java Library provides set implementation based on both data structures called hash Set and TreeSet. Both these data structures implement the Set interface.
 
                        
What is Iterator: It is used to visit all elements in set. A set iterator does not visit the elements in the order in which you inserted them. they are infact visited in order in which HashSet keeps them for rapid execution of its methods.


What is List Iterator : They use the next and hasnext methods to step through the set.

You might want to know then what is the difference between these two ?

List Iterator has an add method to add elements at the list iterator position but iterator interface has no such method.


Check Yourself - 1) Arrays and lists remember the order in which you added elements , set do not . Why would you want to use set instead of arrays or lists?


It is considered good way to store a references to a HashSet or TreeSet in a variable of type Set.

Set<String> names = new HashSet<String>();

Also methods that operate on set should specify parameters of type Set:
public static void print(Set<String> s)
Set Interface & the Map Interface are well designed and we should use them.

Maps

What is Map : It is a data type that keeps associations between keys and values , every key in map has a unique value, but value can be associated to several keys. Same as set implementation the Java Library has two implementation for maps a) Hash Map and b) Tree Map.

The most important thing here to remember is both Hash Map & Tree Map implement the Map Interface
 Map <String, Color> favoriteColors = new HashMap<String, Color>();
We can use put method to add association and remove method to remove association, you can also change value of  an existing association simply by calling put again. The get method returns the value associated with a key. If you ask for key that's not associated with any values, then the get method returns null.To find all keys and values in a map, iterate through the key set and find the values that correspond to the keys through get method. Sometimes when you want to enumerate all keys in a map KeySet method yields the set of keys

Check yourself - 1) What is difference between set and map
                          2) Why is the collections of the keys of a map a set

Hash Table

What is Hashing:  Hashing is a technique that can be used to find elements in a data structure quickly without making a linear search through elements. Hashing give rise to hash table that can be used o implement sets and maps.A hash function is a function that computes an integer value, the hash code from an object. The Object class has a hashCode method that other classes need to redefine. The call computes hashCode of object x.
       int h = x.hashCode();

The two or more distinct objects can have same hash code this is what we call Collision. The String class defines a hash function for strings that does a good job of producing different integer values for different strings.A hash code is used as an  array index into a hash table. In its implementation we can say you should make an array and insert each object at the location of its hash code.HashCodes  often make search simpler and easier.Its all about computing hashCode and check whether array position with that hash code is already there or not.So we can say that it saves times in search of complete array.

Drawback of HashCode Approach: It is not possible to allocate an array that is large enough to hold all possible integer index positions.So in this case we need an array of some reasonable size and then reduce hashCode to fall inside the array.

It can also be possible that two objects have same hashCode,to store multiple objects in the same array position , use short node sequences for all elements with same hash code. These node sequences ar known as bucket. Hope you are not confused till here .

In more brief i can say A hash table can be implemented as an array of buckets - sequences of node that hold elements with same hash code.

 Check yourself - 1) Write the algorithm for finding an object x in a hash table.

Few points to learn here for hash table are:
1) If there are no or only a few collisions , then adding , locating and removing hash table elements takes constant or O(1) time.
2) For reducing the chances of collisions , you should make hash table somewhat larger than the number of elements that you expect to insert.
3) According to some researchers hash table size should be chosen to be a prime number to minimize the number of  collisions.
4) As long as there are few collisions, an element can also be added or removed in constant or O(1) time