Find self-references in (possibly nested) collections

March 16, 2018

I found a nice trick while reading part of the ElasticSearch client for Java. Say you are given an object (could be a map, a list, an array or anything) and you need to make sure the same reference does not show up in any of its child objects (or their descendants). Here is how the ElasticSearch guys solved this problem: static void ensureNoSelfReferences(Object value) { ensureNoSelfReferences(value, Collections.newSetFromMap(new IdentityHashMap<>())); } static void ensureNoSelfReferences(final Object value, final Set<Object> ancestors) { if (value ! ... Read more
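The excerpt cuts off before the body of the recursive method, so what follows is only a minimal sketch of the general technique, not the ElasticSearch client's actual code. It assumes that only maps, iterables and object arrays can hold children; the class name `SelfReferenceCheck` and the exception message are made up for illustration. The identity-based set compares ancestors with `==` rather than `equals`, so two distinct but equal collections are not mistaken for a cycle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical class name; a sketch of the idea, not the original source.
public class SelfReferenceCheck {

    public static void ensureNoSelfReferences(Object value) {
        // The set is backed by an IdentityHashMap, so membership is by reference.
        ensureNoSelfReferences(value, Collections.newSetFromMap(new IdentityHashMap<>()));
    }

    private static void ensureNoSelfReferences(Object value, Set<Object> ancestors) {
        Iterable<?> children;
        if (value instanceof Map) {
            children = ((Map<?, ?>) value).values();
        } else if (value instanceof Iterable) {
            children = (Iterable<?>) value;
        } else if (value instanceof Object[]) {
            children = Arrays.asList((Object[]) value);
        } else {
            return; // null or a leaf value: nothing to descend into
        }
        // add() returns false when this exact reference is already an ancestor.
        if (!ancestors.add(value)) {
            throw new IllegalArgumentException("Object contains a reference to one of its ancestors");
        }
        for (Object child : children) {
            ensureNoSelfReferences(child, ancestors);
        }
        // Remove on the way back up, so a reference shared by siblings is allowed.
        ancestors.remove(value);
    }

    public static void main(String[] args) {
        List<Object> cyclic = new ArrayList<>();
        cyclic.add(cyclic); // the list now contains itself
        try {
            ensureNoSelfReferences(cyclic);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Removing the value from `ancestors` after visiting its children means the set only ever holds the current path from the root, which is why the same object may appear twice as a sibling without being reported.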

Mutability and Java 8's method references

March 12, 2018

Method references are a nice feature introduced in Java 8. They let you make your lambdas even more concise: // from... someStream .map(item -> obj.someMethod(item)) .moreStuff... // to... someStream .map(obj::someMethod) .moreStuff... Some linting tools will even suggest that you replace your lambdas with method references, but watch out: doing so can sometimes have unintended consequences. For instance, this is a piece of actual code I was working on: private List<Application> applications() { return applications . ... Read more
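The excerpt is cut off before it shows the actual problem, so the following is not the author's code. It is a small sketch of one well-documented way the two forms can differ: a bound method reference such as `obj::someMethod` evaluates `obj` once, when the reference is created, while a lambda re-reads `obj` every time it runs. The class name `MethodRefBinding` and the field `current` are hypothetical.

```java
import java.util.function.Supplier;

// Hypothetical demo class, used only to illustrate evaluation timing.
public class MethodRefBinding {

    // A mutable field that is reassigned after the suppliers are created.
    static StringBuilder current = new StringBuilder("old");

    static String[] demo() {
        current = new StringBuilder("old");

        // The lambda body reads the field each time get() is called.
        Supplier<String> viaLambda = () -> current.toString();
        // The method reference captures the object the field points to right now.
        Supplier<String> viaMethodRef = current::toString;

        current = new StringBuilder("new");

        return new String[] { viaLambda.get(), viaMethodRef.get() };
    }

    public static void main(String[] args) {
        String[] results = demo();
        System.out.println("lambda sees:           " + results[0]); // new
        System.out.println("method reference sees: " + results[1]); // old
    }
}
```

When the receiver is a field or any other expression whose value can change, swapping a lambda for a method reference is therefore not always a purely cosmetic refactoring.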

Merge multi-project test coverage: Gradle + Jacoco + Sonarqube

March 8, 2018

I’m assuming you got here because you are using Gradle with Jacoco and noticed that integrating it with Sonarqube does not work perfectly out of the box. Specifically, when your project has multiple modules, you might have seen that Sonarqube’s coverage report ignores code in module A covered by tests in module B. In fact, this is a problem that you will find even if you are not using Sonarqube: Jacoco itself will not merge test reports by default, which makes it extra hard to find a solution online. ... Read more

Send Flink's logs to ElasticSearch using Log4j

November 29, 2017

Flink uses slf4j as its logging façade, and log4j as the default logging framework (logback is supported too). Logs are accessible via Flink’s UI in the JobManager tab, which is fine for short-lived jobs but unusable for long-lived streaming applications. You probably want your logs stored somewhere else; here’s how you can send them to ElasticSearch so you can access them, say, with Kibana. First, you will need a log4j binding for ElasticSearch; Downfy/log4j-elasticsearch-java-api seems to do the job. ... Read more

Shuffle or pick random lines from a file

November 13, 2017

There comes a day in the life of a developer when one needs to choose random lines from a text file. This is useful for a myriad of reasons, like taking a random sample of a CSV file, or shuffling the code of your coworkers, just for fun. There are multiple tools in the UNIX toolbox to solve this problem, but the shuf utility is by far the most elegant: ... Read more

Customizing pyspark app start up script

November 10, 2017

A nice feature of pyspark applications is that you don’t have to use spark-submit manually. Instead, when you instantiate a SparkContext, it will take care of running spark-submit ... pyspark-shell for you. Sometimes, however, you may want to customize that launch a bit. For instance, it is useful to tell spark-submit to include specific libraries in the classpath, which is in and of itself a pretty cool feature because you can provide Maven coordinates. ... Read more

JVM notes

November 3, 2017

A class file consists of bytecode and a symbol table. There are two kinds of types: primitives and references. Reference types can be either a dynamically allocated class instance or an array. There is no way to distinguish primitive types within bytecode except for the instructions used to manipulate them. Each instruction has a different version depending on the type, e.g. iadd, ladd, fadd and dadd are the addition instructions for int, long, float and double. ... Read more
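As a small illustration of the typed addition instructions mentioned above, here is a sketch with one method per primitive type. The class and method names are made up for the example; the comments name the instruction `javac` is expected to emit for each `+`, which you can check by compiling the class and running `javap -c AddOpcodes`.

```java
// Hypothetical class: the same source-level "+" maps to a different
// instruction depending on the static type of its operands.
public class AddOpcodes {

    static int addInts(int a, int b) {
        return a + b; // iadd
    }

    static long addLongs(long a, long b) {
        return a + b; // ladd
    }

    static float addFloats(float a, float b) {
        return a + b; // fadd
    }

    static double addDoubles(double a, double b) {
        return a + b; // dadd
    }

    public static void main(String[] args) {
        System.out.println(addInts(1, 2));
        System.out.println(addLongs(1L, 2L));
        System.out.println(addFloats(1.5f, 2.0f));
        System.out.println(addDoubles(1.5, 2.0));
    }
}
```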


September 14, 2017

A piece of art from Mort Garson. A good companion for when you need to go there. Happy trip!
