Java Streams

Parallel Streams

March 14, 2020

Parallelism means dividing a problem into sub-problems and solving them simultaneously on multiple resources. For example, a computer with a multi-core processor can run several tasks at the same time. Our programs can likewise be designed to take advantage of multi-core processing.

The Streams API allows us to do that. We can create parallel streams to take advantage of a multi-core architecture, so that code can run faster with very little extra effort when more cores are available.

In this tutorial, we will introduce you to parallel streams and see how they spread work across multiple cores.

Parallelism is for speed. Most computers come with multi-core processors, so our programs can be designed to take advantage of them, which means more resources at work simultaneously.

Streams can do that job very easily, without a lot of effort!

There are two ways of processing with streams: sequential and parallel.

By default, all streams are sequential, but we can make them parallel by calling just one more method.

Creating a Parallel Stream:

  1. Using the parallelStream() method on a collection, which creates a parallel stream of the collection's elements.
  2. Using the parallel() method on a stream, which takes a stream and returns an equivalent parallel stream.
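
The two ways listed above can be sketched as follows. This is a minimal illustration, not part of the example that follows: the class name CreateParallelStreams and the helper methods fromCollection and fromStream are made up for this sketch. isParallel() is used only to show which mode each stream is in.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class CreateParallelStreams {

    // Way 1: ask the collection for a parallel stream directly.
    // (The Javadoc says "possibly parallel"; standard JDK lists return a parallel stream.)
    static Stream<String> fromCollection(List<String> names) {
        return names.parallelStream();
    }

    // Way 2: take an existing sequential stream and switch it to parallel mode.
    static Stream<String> fromStream(List<String> names) {
        return names.stream().parallel();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("John", "Rohn", "Tom");

        System.out.println(names.stream().isParallel());         // false: sequential by default
        System.out.println(fromCollection(names).isParallel());  // true
        System.out.println(fromStream(names).isParallel());      // true
    }
}
```

Calling sequential() on a parallel stream switches it back, so the mode is a property of the pipeline rather than of the data.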

Let’s see in practice how parallel streams can improve performance, using a code example in which we compare the processing time of sequential and parallel streams.

Consider the following example.

// This is the Employee bean class

public class Employee {
	private String name;
	private int salary;
	
	public String getName() {
		return name;
	}
	public void setName(String name) {
		this.name = name;
	}
	public int getSalary() {
		return salary;
	}
	public void setSalary(int salary) {
		this.salary = salary;
	}
	public Employee(String name, int salary) {
		this.name = name;
		this.salary = salary;
	}

}
// Compare Sequential v/s Parallel Streams

import java.util.ArrayList;
import java.util.List;

public class SequentialVsParallel {

	public static void main(String[] args) {

		long time1, time2;

		// Step 01: build a list of 600 employees (the same 6 employees added 100 times)
		List<Employee> eList = new ArrayList<>();

		for (int i = 0; i < 100; i++) {
			eList.add(new Employee("John", 20000));
			eList.add(new Employee("Rohn", 3000));
			eList.add(new Employee("Tom", 15000));
			eList.add(new Employee("Bheem", 8000));
			eList.add(new Employee("Shiva", 200));
			eList.add(new Employee("Krishna", 50000));
		}

		// Step 02: count with a sequential stream and record the time before and after
		time1 = System.currentTimeMillis();
		System.out.println("Sequential Stream Count?= "
				+ eList.stream()
						.filter(e -> e.getSalary() > 1000)
						.count());
		time2 = System.currentTimeMillis();
		System.out.println("Sequential Stream Time Taken?= " + (time2 - time1) + "\n");

		// Step 03: count with a parallel stream and record the time before and after
		time1 = System.currentTimeMillis();
		System.out.println("Parallel Stream Count?= "
				+ eList.parallelStream()
						.filter(e -> e.getSalary() > 1000)
						.count());
		time2 = System.currentTimeMillis();
		System.out.println("Parallel Stream Time Taken?= " + (time2 - time1));
	}

}


Explanation:

Output on my Mac with 4 cores; your output may differ depending on your system configuration.

Sequential Stream Count?= 500
Sequential Stream Time Taken?= 27

Parallel Stream Count?= 500
Parallel Stream Time Taken?= 10

In this example, we have an Employee bean.

Step 01 creates a list, eList, and fills it by adding the same six employees again and again in a loop.

Step 02 records two timestamps in the variables time1 and time2. time1 is taken before the stream counts the employees on the list with salary > 1000, and time2 is taken after the processing. Finally, we print the time taken by sequential processing, which is time2 - time1.

Step 03 records the time the same way as Step 02, before and after the processing, but uses a parallel stream this time.

As the output shows, on my computer with 4 cores the parallel stream took much less time than the sequential one.

Treat these numbers as a rough illustration only. The list holds just 600 elements, and the sequential pipeline runs first, so its time also includes one-time start-up costs such as class loading and JIT warm-up. Still, parallel streams are very powerful and can give you speed.

But deciding whether we should go parallel is not easy, because going parallel has a cost of its own: the work has to be split, handed to several threads, and the partial results merged.

Parallelism also comes with a requirement: the outcome must be the same as it would be with sequential processing.

For example, if we are aggregating some data, the resulting data set should be the same as it would be with sequential processing. If a parallel stream is not used carefully, it can produce invalid output.

So, when applying a parallel operation to some data, we need to make sure the operations in the pipeline are:

Stateless: processing one element should not depend on state left behind by another element

Non-interfering: the data source should not be modified during the operation

Associative: the result should not depend on how the elements are grouped when partial results are combined, that is, (a op b) op c must equal a op (b op c)
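
To make the associative property concrete, here is a small sketch that reduces the same list with an associative operation (addition) and a non-associative one (subtraction). The class name AssociativityCheck and its helper methods are hypothetical names chosen for this illustration only.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class AssociativityCheck {

    // The numbers 1..1000 as a list.
    static List<Integer> numbers() {
        return IntStream.rangeClosed(1, 1000).boxed().collect(Collectors.toList());
    }

    // Addition is associative: (a + b) + c == a + (b + c),
    // so partial sums from different chunks can be combined in any grouping.
    static int sum(List<Integer> data, boolean parallel) {
        return (parallel ? data.parallelStream() : data.stream())
                .reduce(0, Integer::sum);
    }

    // Subtraction is NOT associative, so it breaks the contract of reduce().
    // Sequentially the result is predictable; in parallel it depends on how
    // the data happens to be split, so no particular value can be expected.
    static int subtractAll(List<Integer> data, boolean parallel) {
        return (parallel ? data.parallelStream() : data.stream())
                .reduce(0, (a, b) -> a - b);
    }

    public static void main(String[] args) {
        List<Integer> data = numbers();
        System.out.println("sum sequential      = " + sum(data, false));
        System.out.println("sum parallel        = " + sum(data, true));
        System.out.println("subtract sequential = " + subtractAll(data, false));
        System.out.println("subtract parallel   = " + subtractAll(data, true));
    }
}
```

The two sums always agree, while the parallel subtraction will usually differ from the sequential one. That mismatch is exactly the kind of invalid result the properties above protect against.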

Moreover, parallel streams are built on top of the fork/join framework, which is multithreaded, so they are exposed to the usual synchronization and data-visibility issues. These issues can be avoided easily if we do not change the state of any object throughout the pipeline.
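
As a sketch of what "changing state in the pipeline" looks like, the example below filters salaries in two ways: once by adding to a shared ArrayList from inside the pipeline, and once by letting the stream build the result with collect(). The class name SharedStateExample and both method names are hypothetical, and plain Integer salaries stand in for the Employee objects to keep the sketch self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SharedStateExample {

    // Risky: the lambda mutates a shared ArrayList from several worker threads.
    // ArrayList is not thread-safe, so elements may be lost, arrive in any
    // order, or the call may fail with an exception.
    static List<Integer> unsafeFilter(List<Integer> salaries) {
        List<Integer> result = new ArrayList<>();
        salaries.parallelStream()
                .filter(s -> s > 1000)
                .forEach(result::add);   // side effect on shared state
        return result;
    }

    // Safer: no shared state. Each worker collects into its own container
    // and the stream merges the partial results, keeping encounter order.
    static List<Integer> safeFilter(List<Integer> salaries) {
        return salaries.parallelStream()
                .filter(s -> s > 1000)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> salaries = IntStream.rangeClosed(1, 100_000)
                .boxed()
                .collect(Collectors.toList());

        System.out.println("safe size   = " + safeFilter(salaries).size());
        try {
            System.out.println("unsafe size = " + unsafeFilter(salaries).size());
        } catch (RuntimeException e) {
            System.out.println("unsafe version failed: " + e);
        }
    }
}
```

The unsafe version may appear to work on some runs, which is what makes this kind of bug hard to spot.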

So, although it is very easy to make code run in parallel with streams, we cannot just make something parallel and expect it to be faster than sequential processing.

Parallel execution is more complex than sequential execution. In reality, sometimes parallelism will speed up your computation, sometimes it will not, and sometimes it will even slow it down.

We have to use parallel streams wisely, as they are not wonderful all the time.
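
One way to check whether parallelism helps for a given workload is to time both versions several times after a warm-up, instead of timing a single run. The sketch below does that for a small and a large list. The class name RoughTiming, the helper names, and the list sizes and run counts are all arbitrary choices for illustration. It is still only a rough measurement; a dedicated harness such as JMH is the better tool for serious benchmarking.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class RoughTiming {

    // Keeps results "used" so the work is less likely to be optimized away.
    static volatile long sink;

    // Counts the salaries above the threshold, sequentially or in parallel.
    static long countAbove(List<Integer> salaries, int threshold, boolean parallel) {
        return (parallel ? salaries.parallelStream() : salaries.stream())
                .filter(s -> s > threshold)
                .count();
    }

    // Average time in nanoseconds over `runs` executions, after untimed warm-up runs.
    static long averageNanos(List<Integer> salaries, boolean parallel, int runs) {
        for (int i = 0; i < 20; i++) {               // warm-up, not timed
            sink = countAbove(salaries, 1000, parallel);
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            sink = countAbove(salaries, 1000, parallel);
        }
        return (System.nanoTime() - start) / runs;
    }

    static List<Integer> range(int n) {
        return IntStream.rangeClosed(1, n).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> small = range(600);
        List<Integer> large = range(1_000_000);

        System.out.println("small sequential ns = " + averageNanos(small, false, 50));
        System.out.println("small parallel   ns = " + averageNanos(small, true, 50));
        System.out.println("large sequential ns = " + averageNanos(large, false, 50));
        System.out.println("large parallel   ns = " + averageNanos(large, true, 50));
    }
}
```

The numbers depend entirely on the machine and JVM, so no output is shown here. For a list as small as 600 elements, the cost of splitting and merging can easily outweigh any gain.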

In the next topic, we will discuss stateful and stateless operations in detail.

So the key point to remember from this topic: whenever you think of making code run in parallel, first check whether parallelism will actually help, and whether your operation satisfies the three properties above, so that you get a valid result.