Tuesday, 29 October 2013

More maps and reduction.

Maps and reduction are useful in a variety of situations beyond simple math. In any case where each object in a collection can be transformed into a different object (or value) and the results then combined into a single value, map and reduction operations work. The map operation, for example, can serve as an extraction or projection operation that takes an object and pulls out part of it, such as extracting the last name from a Person object. Once the last names have been retrieved from the Person stream, the reduction can concatenate the strings together, for example to turn the last names into an XML representation:

String xml =
    "<people data='lastname'>" +
    people.stream()
        .map(it -> "<person>" + it.getLastName() + "</person>")
        .reduce("", String::concat)
    + "</people>";
System.out.println(xml);
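For readers who want to run the snippet above, here is a minimal self-contained sketch. The Person class, its constructor, and the sample names are assumptions made up for illustration; the post only tells us that Person has a getLastName() method.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PeopleXmlSketch {
    // Hypothetical stand-in for the post's Person class; only getLastName() is known from the text.
    static class Person {
        private final String firstName;
        private final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        String getLastName() { return lastName; }
    }

    // Projection: map each Person to its last name and collect the results into a list.
    static List<String> lastNames(List<Person> people) {
        return people.stream()
                .map(Person::getLastName)
                .collect(Collectors.toList());
    }

    // Map each Person to an XML fragment, then reduce the fragments by concatenation.
    static String toXml(List<Person> people) {
        return "<people data='lastname'>"
                + people.stream()
                        .map(it -> "<person>" + it.getLastName() + "</person>")
                        .reduce("", String::concat)
                + "</people>";
    }

    public static void main(String[] args) {
        // Sample data invented for this sketch.
        List<Person> people = Arrays.asList(
                new Person("Ada", "Lovelace"),
                new Person("Alan", "Turing"));
        System.out.println(lastNames(people));
        System.out.println(toXml(people));
    }
}
```

Note that the sketch does no XML escaping, so a last name containing `<` or `&` would produce malformed output.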

And, naturally, if different XML formats are required, different operations can be used to control the contents of each format. Those operations can be supplied ad hoc or come from methods defined on other classes, such as the Person class itself, and then be used in the map() operation, for example to transform the stream of Person objects into a JSON array of object elements.

Done with reduce(), that JSON version needs a ternary operation in the middle of the reduction to avoid putting a comma in front of the first Person serialized to JSON. Some JSON parsers might accept a leading comma, but that is not guaranteed, and it looks ugly to have it there. It is ugly enough, in fact, to fix. The code is a lot easier to write with the built-in Collector interface and its partner class Collectors, which exist specifically for this kind of mutable-reduction operation.
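The two approaches described above might look roughly like the sketch below. The toJson() method on Person and the exact shape of the ternary are assumptions for illustration, not necessarily the original code; Collectors.joining(delimiter, prefix, suffix) is the standard JDK collector.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PeopleJsonSketch {
    // Hypothetical Person with a hand-rolled toJson(); no string escaping is attempted.
    static class Person {
        private final String lastName;

        Person(String lastName) { this.lastName = lastName; }

        String toJson() { return "{\"lastName\":\"" + lastName + "\"}"; }
    }

    // reduce() version: the ternary skips the comma before the first element.
    // Intended for sequential streams only.
    static String jsonWithReduce(List<Person> people) {
        return "["
                + people.stream()
                        .map(Person::toJson)
                        .reduce("", (acc, json) -> acc.isEmpty() ? json : acc + "," + json)
                + "]";
    }

    // Collectors version: joining() handles the delimiter, prefix, and suffix.
    static String jsonWithJoining(List<Person> people) {
        return people.stream()
                .map(Person::toJson)
                .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        // Sample data invented for this sketch.
        List<Person> people = Arrays.asList(new Person("Lovelace"), new Person("Turing"));
        System.out.println(jsonWithReduce(people));
        System.out.println(jsonWithJoining(people));
    }
}
```

Both methods produce the same string here; the joining() version simply moves the comma bookkeeping out of our code.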
This has the added benefit of being much faster than the versions using the explicit reduce and String::concat from the earlier examples, because the collector appends into a single mutable buffer instead of allocating a new String at every step, so it’s generally a better bet.

Oh, and lest we forget our old friend Comparator, note that Stream also has an operation, sorted(), to sort a stream in flight, so a sorted JSON representation of the Person list takes just one more call in the pipeline. This is powerful stuff.
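As a rough sketch of the sorted variant, assuming the same hypothetical Person class with getLastName() and a made-up toJson() helper, the sort slots in as one extra pipeline stage ahead of the map:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortedJsonSketch {
    // Hypothetical Person; toJson() is an illustrative helper with no string escaping.
    static class Person {
        private final String lastName;

        Person(String lastName) { this.lastName = lastName; }

        String getLastName() { return lastName; }

        String toJson() { return "{\"lastName\":\"" + lastName + "\"}"; }
    }

    // Sort by last name in flight, then serialize each Person and join into a JSON array.
    static String sortedJson(List<Person> people) {
        return people.stream()
                .sorted(Comparator.comparing(Person::getLastName))
                .map(Person::toJson)
                .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        // Sample data invented for this sketch, deliberately out of order.
        List<Person> people = Arrays.asList(new Person("Turing"), new Person("Lovelace"));
        System.out.println(sortedJson(people));
    }
}
```

The source list itself is left untouched; only the elements flowing through the stream are reordered.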
Parallelization.

What’s even more powerful is that these operations are entirely independent of the logic necessary to pull each object through the Stream and act on each one. That matters because the traditional for loop breaks down when you try to iterate, map, or reduce a large collection by breaking it into segments that are each processed by a separate thread. The Stream API already has that covered, and the XML or JSON map() and reduce() operations shown earlier need only a slight change: instead of calling stream() to obtain a Stream from the collection, call parallelStream(). For a collection of at least a dozen items, at least on my laptop, two threads are used to process the collection: the thread named main, which is the traditional one used to invoke the main() method of a Java class, and another thread named ForkJoinPool.commonPool-worker-1, which is obviously not of our creation.
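One way to see this for yourself is the sketch below, which runs the XML pipeline over either stream() or parallelStream() and records which threads touched the elements. The Person class, the toXml() signature, and the thread-name bookkeeping are all assumptions added for illustration; the set of thread names you see will vary by machine and run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Stream;

class ParallelXmlSketch {
    // Hypothetical stand-in for the post's Person class.
    static class Person {
        private final String lastName;

        Person(String lastName) { this.lastName = lastName; }

        String getLastName() { return lastName; }
    }

    // Same map/reduce pipeline as before; only the way the Stream is obtained differs.
    // threadNames must be a thread-safe set because map() may run on several threads.
    static String toXml(List<Person> people, boolean parallel, Set<String> threadNames) {
        Stream<Person> stream = parallel ? people.parallelStream() : people.stream();
        return "<people data='lastname'>"
                + stream.map(it -> {
                            threadNames.add(Thread.currentThread().getName());
                            return "<person>" + it.getLastName() + "</person>";
                        })
                        .reduce("", String::concat)
                + "</people>";
    }

    public static void main(String[] args) {
        // A dozen made-up people, matching the collection size mentioned in the text.
        List<Person> people = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            people.add(new Person("Name" + i));
        }
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        System.out.println(toXml(people, true, threadNames));
        System.out.println("Threads used: " + threadNames);
    }
}
```

Because the source list is ordered and String::concat is associative, the parallel run should produce the same XML as the sequential one, just assembled from segments processed on different threads.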
Obviously, for a collection of a dozen items, this would be hideously unnecessary, but for several hundred or more, this would be the difference between “good enough” and “needs to go faster.” Without these new methods and approaches, you would be staring at some significant code and algorithmic study. With them, you can write parallelized code literally by adding eight keystrokes (nine if you count the Shift key required to capitalize the s in stream) to the previously sequential processing. And, where necessary, a parallel Stream can be brought back to a sequential one by calling (you can probably guess) sequential() on it.
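A small sketch of switching back, using plain strings rather than Person objects to keep it short (the class and method names here are made up for illustration). One detail worth knowing: the mode applies to the pipeline as a whole, so whichever of parallel() or sequential() is called last before the terminal operation decides how the entire pipeline runs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SequentialSwitchSketch {
    // Starts from a parallel stream, then switches the whole pipeline back to sequential.
    static String joinSequentially(List<String> names) {
        return names.parallelStream()
                .map(String::toUpperCase)
                .sequential()   // last mode call wins: the pipeline now runs sequentially
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("lovelace", "turing");

        Stream<String> stream = names.parallelStream();
        System.out.println("parallel? " + stream.isParallel());
        stream = stream.sequential();
        System.out.println("parallel? " + stream.isParallel());

        System.out.println(joinSequentially(names));
    }
}
```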
The important thing to note is that regardless of whether the processing is better done sequentially or in parallel, the same Stream interface is used for both. The choice between sequential and parallel execution becomes entirely an implementation detail, which is exactly where we want it to be when working on code that focuses on business needs (and value); we don’t want to focus on the low-level details of firing up threads in thread pools and synchronizing across them.
