Java Streams

Creating Your Own Streams Using a Custom Spliterator, and How Streams Work Internally in Java

January 4, 2020

The agenda of this post is to help you understand how Streams work internally in Java and how to implement your own custom Spliterator.

Streams, introduced in Java 8, are one of the most popular ways to process data. Every Collection in Java 8 and later can return a Stream through its stream() method.

  List<String> list = new ArrayList<>();
  list.add("Basics");
  list.add("Strong");
  Stream<String> stream = list.stream();
  System.out.println(stream.count()); // output: 2

Streams give us the flexibility to pass lambdas (you may call them your cute little algorithms) to a set of ready-made methods such as filter, map, reduce, and flatMap. These calls are chained into a method pipeline, which the stream then executes. Streams come with many advantages, such as concise code and lazy evaluation, which avoids doing work until a result is actually requested.

Please remember that Streams are not data containers; a stream does not hold any data. You may think of a stream as a fancy iterator that the data just passes through, much like the old Iterator.

A stream connects to some data source (above, it connects to a list), the data passes through the stream, and operations are applied to it. We call these intermediate and terminal operations, and the whole chain is called the Stream pipeline. Stream itself is an interface located in the java.util.stream package.
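To make the pipeline idea concrete, here is a minimal sketch (the class name LazyPipelineDemo and its helper names are made up for illustration). It suggests that intermediate operations such as filter and map only describe the work, and that nothing is processed until a terminal operation such as collect runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPipelineDemo {

    // Records which elements the filter actually inspected.
    static final List<String> visited = new ArrayList<>();

    static Stream<String> buildPipeline(List<String> source) {
        // Intermediate operations only describe the pipeline; nothing runs yet.
        return source.stream()
                .filter(s -> {
                    visited.add(s);
                    return s.startsWith("S");
                })
                .map(String::toUpperCase);
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("Basics", "Strong", "Streams");

        Stream<String> pipeline = buildPipeline(source);
        System.out.println("Visited before terminal operation: " + visited.size());

        // The terminal operation pulls the data through the pipeline.
        List<String> result = pipeline.collect(Collectors.toList());
        System.out.println("Result: " + result);
        System.out.println("Visited after terminal operation: " + visited.size());
    }
}
```

The visited list stays empty until collect is called, which is the lazy evaluation mentioned earlier.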

Have you ever wondered how Java creates the Stream object and connects it to the underlying Java Collection? Can you also create your own stream and connect it to some other data source, such as a file, a database, or a web service, since these also hold data?

In this tutorial, let’s implement our own stream and connect it to an array. You can then build your own Stream that processes data from any data source you like. As a result, your API consumers can take advantage of the well-known, standard stream data-processing methods.

If you look at the Collection interface in java.util, you will find these two default methods (among others):

@Override
  default Spliterator<E> spliterator() {
   return Spliterators.spliterator(this, 0);
  }
  default Stream<E> stream() {
   return StreamSupport.stream(spliterator(), false);
  } 

If you look at them closely:

1. The stream() default method, which we can use on every collection, returns a Stream of elements. Remember that Stream is an interface in the JDK’s java.util.stream package.

2. The stream() method internally uses the JDK’s StreamSupport class and calls its stream() method, passing the result of spliterator() and a boolean flag that says whether the stream should be parallel (false here, so the stream is sequential).

3. The spliterator() method returns a Spliterator of elements.
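The three observations above can be tried out directly. The following is a small sketch (StreamSupportDemo and manualStream are illustrative names, not JDK API) that builds a stream by hand from a list's Spliterator and compares it with the one returned by stream().

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class StreamSupportDemo {

    // Builds a sequential stream by hand, mirroring what Collection.stream() does.
    static <T> Stream<T> manualStream(List<T> source) {
        Spliterator<T> spliterator = source.spliterator();
        return StreamSupport.stream(spliterator, false); // false = sequential
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("Basics", "Strong");

        List<String> viaCollection = list.stream().collect(Collectors.toList());
        List<String> viaStreamSupport = manualStream(list).collect(Collectors.toList());

        // Both routes deliver the same elements in the same order.
        System.out.println(viaCollection.equals(viaStreamSupport));
        // The boolean flag passed to StreamSupport decides whether the stream is parallel.
        System.out.println(manualStream(list).isParallel());
    }
}
```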

We have already discussed Streams. But what is a Spliterator?

From Java’s perspective, Spliterator is an interface located in java.util. Let’s discuss Spliterator a little more to understand what it actually does.

We can think of it like this: Java’s older collection processing was done via Iterators, which have methods like next() and hasNext(); data processing in Streams is done via a Spliterator. Because the underlying data structure can differ from one source to another, the Iterator or Spliterator has to be different for different data sources.
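As a rough side-by-side of the two traversal styles, here is a sketch (IteratorVsSpliterator is an illustrative name) that walks the same list once with an Iterator and once with a Spliterator. The visible difference is that tryAdvance combines the "is there another element?" check and the "give me the element" step in a single call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;

class IteratorVsSpliterator {

    static List<String> withIterator(List<String> source) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = source.iterator();
        while (it.hasNext()) {   // first ask whether an element exists
            out.add(it.next());  // then fetch it with a second call
        }
        return out;
    }

    static List<String> withSpliterator(List<String> source) {
        List<String> out = new ArrayList<>();
        Spliterator<String> sp = source.spliterator();
        // tryAdvance hands the next element to the action and reports
        // whether there was one, all in a single call.
        while (sp.tryAdvance(out::add)) {
            // nothing else to do per element
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Basics", "Strong", "Streams");
        System.out.println(withIterator(words));
        System.out.println(withSpliterator(words));
    }
}
```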

It’s like eating: if you have pizza on your plate you will use a fork and knife, but if you have soup you will need a spoon.

In programming or data structure terms, if your data is in a List you will access the elements one way, and if your data is in a tree you will access them another way.

A Spliterator implementation is what provides this access mechanism to the Stream.
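To illustrate a source that is not a collection, here is a hedged sketch of a Spliterator that reads text line by line. LineSpliterator and its lines helper are hypothetical names; the sketch leans on the JDK's Spliterators.AbstractSpliterator helper class so that only tryAdvance has to be written, and it reads from an in-memory StringReader purely to stay self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class LineSpliterator extends Spliterators.AbstractSpliterator<String> {

    private final BufferedReader reader;

    LineSpliterator(BufferedReader reader) {
        // Long.MAX_VALUE signals that the size is unknown up front.
        super(Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL);
        this.reader = reader;
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        try {
            String line = reader.readLine();
            if (line == null) {
                return false; // source exhausted
            }
            action.accept(line);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical convenience method: streams the lines of a text block.
    static Stream<String> lines(String text) {
        BufferedReader reader = new BufferedReader(new StringReader(text));
        return StreamSupport.stream(new LineSpliterator(reader), false);
    }

    public static void main(String[] args) {
        lines("Basics\nStrong\nStreams")
                .map(String::toUpperCase)
                .forEach(System.out::println);
    }
}
```

The way elements are fetched (readLine here) is the only source-specific part; the stream operations on top stay the same.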

So there are two things you need to keep in mind while creating your own streams from custom sources:

1. What is your data source, and what is the implementation of your Spliterator to access the data from that source?

2. How is the data processing going to happen (using ReferencePipeline)?

Point 1 is clear from the above. For point 2, as soon as you provide an implementation of Spliterator, the data processing is taken care of by the JDK’s internal stream classes, chiefly ReferencePipeline in java.util.stream. This class holds the fairly complicated implementations of map, filter, flatMap, reduce, and so on. You do not need to touch this class; it is quite complex, so let the JDK take care of it.
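If you are curious, you can peek at which class actually backs a Stream reference. The sketch below (PipelineClassDemo is an illustrative name) simply prints runtime class names. On OpenJDK builds these names start with java.util.stream.ReferencePipeline, but that is an implementation detail rather than part of the public API, so other runtimes may print something else.

```java
import java.util.Arrays;
import java.util.stream.Stream;

class PipelineClassDemo {

    // Returns the runtime class name of the object behind a Stream reference.
    static String implementationName(Stream<?> stream) {
        return stream.getClass().getName();
    }

    public static void main(String[] args) {
        Stream<String> source = Arrays.asList("Basics", "Strong").stream();
        System.out.println(implementationName(source));

        // Each intermediate operation returns a new pipeline stage object.
        Stream<Integer> mapped = source.map(String::length);
        System.out.println(implementationName(mapped));
    }
}
```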

Now the agenda is to build our own custom Spliterator that can be used to access the data and provide it to a Stream; ReferencePipeline can then process the data accessed from the source via the Spliterator.

So what does the Spliterator interface look like?

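public interface Spliterator<T> {

  // Passes the next element, if any, to the action; returns false when none remain
  boolean tryAdvance(Consumer<? super T> action);

  // Splits off part of the elements for parallel streams; returns null if it cannot split
  Spliterator<T> trySplit();

  // Estimated number of elements remaining in the data source
  long estimateSize();

  // Characteristics of this Spliterator, as a set of bit flags
  int characteristics();

}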

The Spliterator interface contains more methods, but only the four above are abstract, so you need to implement these four to create your custom Spliterator. tryAdvance accesses the next element: it passes the element to the given action and returns true if an element was present, or false otherwise. trySplit is used mainly when we want to go parallel, because it is where we provide the mechanism for splitting the work among parallel threads. estimateSize returns an estimate of the number of elements remaining, and the final one is characteristics.

Characteristics describe the Spliterator and its elements; they are declared as constants in the Spliterator interface.
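To see tryAdvance, trySplit, and estimateSize working together, here is a small sketch using the Spliterator that a JDK list already provides (TrySplitDemo and drain are illustrative names). It splits once and then drains both parts; how the elements are divided is up to the implementation, so the sketch prints the sizes rather than assuming them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

class TrySplitDemo {

    // Drains a spliterator into a list using only tryAdvance.
    static <T> List<T> drain(Spliterator<T> spliterator) {
        List<T> out = new ArrayList<>();
        while (spliterator.tryAdvance(out::add)) {
            // each successful call delivers exactly one element
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);

        Spliterator<Integer> rest = numbers.spliterator();
        System.out.println("Before split: " + rest.estimateSize());

        // trySplit hands part of the elements to a new Spliterator
        // (or returns null if this one cannot be split).
        Spliterator<Integer> firstPart = rest.trySplit();
        if (firstPart != null) {
            System.out.println("First part size: " + firstPart.estimateSize());
            System.out.println("First part: " + drain(firstPart));
        }
        System.out.println("Remaining size: " + rest.estimateSize());
        System.out.println("Remaining: " + drain(rest));
    }
}
```

In a parallel stream, each part produced this way can be processed by a different thread.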

public static final int ORDERED  = 0x00000010;
public static final int DISTINCT  = 0x00000001;
public static final int SORTED   = 0x00000004;
public static final int SIZED   = 0x00000040;
public static final int NONNULL  = 0x00000100;
public static final int IMMUTABLE = 0x00000400;
public static final int CONCURRENT = 0x00001000;
public static final int SUBSIZED = 0x00004000;

As the names suggest, these are the properties a Spliterator can report. They are bit flags, so several can be combined with the bitwise OR operator.

Now let’s build our own custom, but very simple, Spliterator over an array and, in turn, create our Stream from it.
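One way to get a feel for these flags is to ask a few JDK collections which ones their Spliterators report. This is a small sketch (CharacteristicsDemo and describe are illustrative names) that uses the hasCharacteristics method of Spliterator to turn the bit set into readable text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Spliterator;
import java.util.TreeSet;

class CharacteristicsDemo {

    // Lists the names of all characteristics the given spliterator reports.
    static String describe(Spliterator<?> sp) {
        List<String> names = new ArrayList<>();
        if (sp.hasCharacteristics(Spliterator.ORDERED)) names.add("ORDERED");
        if (sp.hasCharacteristics(Spliterator.DISTINCT)) names.add("DISTINCT");
        if (sp.hasCharacteristics(Spliterator.SORTED)) names.add("SORTED");
        if (sp.hasCharacteristics(Spliterator.SIZED)) names.add("SIZED");
        if (sp.hasCharacteristics(Spliterator.NONNULL)) names.add("NONNULL");
        if (sp.hasCharacteristics(Spliterator.IMMUTABLE)) names.add("IMMUTABLE");
        if (sp.hasCharacteristics(Spliterator.CONCURRENT)) names.add("CONCURRENT");
        if (sp.hasCharacteristics(Spliterator.SUBSIZED)) names.add("SUBSIZED");
        return String.join(" | ", names);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Basics", "Strong");

        System.out.println("ArrayList: " + describe(new ArrayList<>(words).spliterator()));
        System.out.println("HashSet:   " + describe(new HashSet<>(words).spliterator()));
        System.out.println("TreeSet:   " + describe(new TreeSet<>(words).spliterator()));
    }
}
```

A list keeps its order, a set guarantees distinct elements, and a sorted set reports both plus SORTED, which is the kind of information the stream machinery can use to skip unnecessary work.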

package com.basicsstrong.customstreams;

import java.util.Arrays;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class myCustomStreamDemo {

	public static void main(String[] args) {

		// Step 01 - Start

		String[] stringArray = new String[100];

		// Let's fill this array with some Strings. For the sake of having
		// different String objects, I am putting "Basics" at even indexes
		// and "Strong" at odd indexes of the array.

		for (int i = 0; i < stringArray.length; i++) {
			if (i % 2 == 0) {
				stringArray[i] = "Basics";
			} else {
				stringArray[i] = "Strong";
			}
		}

		// Step 01 Ends
		// Step 02 Starts

		// Here we are defining our own Spliterator
		Spliterator<String> mySpliterator = new arraySpliterator(stringArray);

		// Here we are defining our own stream using StreamSupport
		// and passing the Spliterator above.

		Stream<String> myStream = StreamSupport.stream(mySpliterator, false);

		// Step 02 Ends
		// Step 04 Starts
		// Let's try calling some classic stream methods one by one.
		// Try only one method at a time: a stream can be consumed only once,
		// and reusing it throws an IllegalStateException.
		myStream.forEach(System.out::println);
		// System.out.println(myStream.count());
		// System.out.println(myStream.distinct().count());

		// Step 04 Ends

	}

}

// Step 03 Starts
// Creating arraySpliterator over an array by implementing the Spliterator interface

class arraySpliterator implements Spliterator<String> {

	private final String[] arrayToSplit; // the data source
	private int count = 0; // index of the next element to hand out

	// Constructor that receives the array; this is the source

	public arraySpliterator(String[] arrayToSplit) {
		this.arrayToSplit = arrayToSplit;
	}

	@Override
	public boolean tryAdvance(Consumer<? super String> action) {

		if (count < arrayToSplit.length) {
			action.accept(arrayToSplit[count]);
			count++;
			return true;
		} else {
			return false;
		}

	}

	@Override
	public Spliterator<String> trySplit() {
		return null;
	}

	@Override
	public long estimateSize() {
		// Elements not yet consumed; equals the array length before traversal
		return arrayToSplit.length - count;
	}

	@Override
	public int characteristics() {
		// For simplicity, let's use the characteristics of a List as a base
		return Arrays.asList(arrayToSplit).spliterator().characteristics();
	}

}

// Step 03 Ends

In Step 01: from start to end, we create a simple array of Strings of length 100. You may choose any length you like.

In Step 02: we create a Spliterator of Strings and then a Stream of Strings using that Spliterator.

In Step 03: we implement all four required abstract methods of Spliterator.

Let me explain how the methods in Step 03 are implemented, going from bottom to top.

In characteristics() we return the characteristics of a List to keep things simple, although you can define your own.

In estimateSize() we report a size based on the length of the array passed in. Strictly, the method should report the number of elements still to be traversed, which equals the array length only before traversal starts.

In trySplit() we return null because we don’t want to go parallel for now.

In tryAdvance() we consume the array elements while there are any left and return false otherwise. Remember to increase count by 1 after consuming an element.

And finally, in the constructor, we accept the array and initialize the instance field.
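If you later want to go parallel, trySplit has to hand out part of the array instead of returning null. The following is only a sketch of one possible approach, not the JDK's own code: SplittableArraySpliterator and countParallel are hypothetical names, and the strategy shown (give away the first half of the remaining range, keep the second half) is an assumption chosen for simplicity.

```java
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.StreamSupport;

class SplittableArraySpliterator implements Spliterator<String> {

    private final String[] array;
    private int index;       // next element to hand out
    private final int fence; // one past the last element of this range

    SplittableArraySpliterator(String[] array) {
        this(array, 0, array.length);
    }

    private SplittableArraySpliterator(String[] array, int index, int fence) {
        this.array = array;
        this.index = index;
        this.fence = fence;
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        if (index < fence) {
            action.accept(array[index++]);
            return true;
        }
        return false;
    }

    @Override
    public Spliterator<String> trySplit() {
        int mid = (index + fence) >>> 1;
        if (mid <= index) {
            return null; // fewer than two elements left: not worth splitting
        }
        // Give the first half to a new spliterator and keep the second half.
        Spliterator<String> prefix = new SplittableArraySpliterator(array, index, mid);
        index = mid;
        return prefix;
    }

    @Override
    public long estimateSize() {
        return fence - index; // exact number of elements left in this range
    }

    @Override
    public int characteristics() {
        return ORDERED | SIZED | SUBSIZED;
    }

    // Hypothetical helper: counts occurrences of a word using a parallel stream.
    static long countParallel(String[] data, String word) {
        return StreamSupport.stream(new SplittableArraySpliterator(data), true)
                .filter(word::equals)
                .count();
    }

    public static void main(String[] args) {
        String[] data = new String[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = (i % 2 == 0) ? "Basics" : "Strong";
        }
        System.out.println(countParallel(data, "Basics")); // prints 50
    }
}
```

Because each split covers a separate index range of the same array, the parts never overlap, which is what makes it safe for different threads to traverse them.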

In Step 04: we try calling some classic stream methods one by one to check that everything works fine.

I hope we were able to show you how Streams are built internally and what role the Spliterator plays. There are a few types to keep in mind:
1. Spliterator interface

2. StreamSupport class

3. Stream interface

4. ReferencePipeline class (for the algorithms)
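The reason for trying only one method at a time is that a stream can be consumed only once. Here is a short sketch of that behavior (SingleUseStreamDemo and failsOnReuse are illustrative names), together with one common workaround: keep a Supplier that creates a fresh stream for every terminal operation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

class SingleUseStreamDemo {

    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean failsOnReuse(Stream<String> stream) {
        stream.count(); // the first terminal operation consumes the stream
        try {
            stream.count(); // the second one is not allowed
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Basics", "Strong", "Basics", "Strong");

        System.out.println(failsOnReuse(words.stream())); // prints true

        // A Supplier gives a brand-new stream for each terminal operation.
        Supplier<Stream<String>> fresh = words::stream;
        System.out.println(fresh.get().count());            // prints 4
        System.out.println(fresh.get().distinct().count()); // prints 2
    }
}
```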

You may also like to look at the internal Spliterator implementation of a classic data structure such as ArrayList, which is the simplest one, to understand this better. This demo is kept very simple; the JDK’s Spliterator implementations for other collections such as HashMap and TreeSet are much more complex.

Hope you have enjoyed the tutorial. Thanks for reading! Please post any questions in the comments.

Happy Learning!