I was a software engineer at that time, mostly working with Java, but I enjoyed listening to DevOps people talk about their problems. They all loved me because they thought I also received alerts in the middle of the night; I didn't.
Here I will present the Data Lake architecture, which introduces an interesting twist on storing and processing data. Data Lake is not a revolution in the big data world, nor a one-size-fits-all solution, but a natural evolutionary step in data processing.
An aspiring Cassandra engineer-apprentice was fiddling with a Cassandra cluster, trying to fetch the data he needed. For a while, he was receiving strange responses from the server. But after hacking his way through CQL, he finally received the response he was looking for. He felt so proud...
The best way to test an infrastructure before going into production is to mimic the production load and solve the problems that arise. One of the main challenges with this approach is having a load generator that can match production in both request rate and message size as closely as possible.
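As a minimal sketch of that idea, the snippet below (class and method names are hypothetical, and the send hook is a placeholder) emits fixed-size random payloads paced to a target rate, approximating both the message size and the request rate seen in production:

```java
import java.util.Random;
import java.util.concurrent.TimeUnit;

public class LoadGenerator {

    private static final Random RANDOM = new Random();

    // Build a random payload of the requested size, mimicking
    // production message sizes.
    public static byte[] payload(int sizeBytes) {
        byte[] body = new byte[sizeBytes];
        RANDOM.nextBytes(body);
        return body;
    }

    // Send `count` messages at roughly `ratePerSecond`, each of
    // `sizeBytes` bytes, by sleeping between sends.
    public static void run(int count, int ratePerSecond, int sizeBytes)
            throws InterruptedException {
        long intervalNanos = TimeUnit.SECONDS.toNanos(1) / ratePerSecond;
        long next = System.nanoTime();
        for (int i = 0; i < count; i++) {
            byte[] message = payload(sizeBytes);
            send(message); // replace with the real client call
            next += intervalNanos;
            long sleep = next - System.nanoTime();
            if (sleep > 0) {
                TimeUnit.NANOSECONDS.sleep(sleep);
            }
        }
    }

    private static void send(byte[] message) {
        // Placeholder: in a real test this would write to Cassandra,
        // Kafka, or whatever system is under test.
    }
}
```

In practice you would swap the placeholder `send` for your actual client call and drive the generator from recorded production rates and size distributions.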
In the first two blog posts, we gave a couple of pointers about driver settings you can leverage to improve the performance of application-to-Cassandra communication. Most of the time, this is enough to improve performance significantly.
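To give a flavor of the kind of settings meant here (this series likely targets the 3.x Java driver; as an illustrative sketch, in the DataStax Java driver 4.x the equivalent knobs are exposed as configuration options, with the values below being assumptions, not recommendations):

```hocon
datastax-java-driver {
  basic.request {
    # Fail fast instead of waiting on a slow node.
    timeout = 500 milliseconds
    consistency = LOCAL_QUORUM
  }
  # Number of connections per node in the local datacenter.
  advanced.connection.pool.local.size = 2
}
```

The right values depend entirely on your workload, so settings like these should be tuned against a production-like load rather than copied verbatim.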
In this part, we will concentrate on some more advanced features: speculative executions, which let the driver issue a parallel request if a response does not arrive within a threshold, and the latency-aware load balancing policy, which measures node latencies, penalizes slow-performing nodes, and favors well-performing ones.
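As a sketch of the speculative-execution side (in the 3.x driver this is set programmatically, e.g. via `ConstantSpeculativeExecutionPolicy`, and the latency-aware policy via `LatencyAwarePolicy.builder(...)`; the DataStax Java driver 4.x exposes speculative executions as configuration, with the values below being illustrative assumptions):

```hocon
datastax-java-driver {
  advanced.speculative-execution-policy {
    class = ConstantSpeculativeExecutionPolicy
    # Issue up to 2 executions total: the original request plus
    # one speculative retry.
    max-executions = 2
    # Fire the speculative request if no response arrives
    # within this delay.
    delay = 100 milliseconds
  }
}
```

Note that speculative executions trade extra load on the cluster for better tail latency, so the delay should sit near the high percentile you are trying to shave, not near the median.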
In this part, we covered the first round of settings, which can give you quick wins. Stay tuned: next week we will cover speculative executions and the latency-aware load balancing policy, which can help you get even better performance for some use cases.
The Apache Software Foundation (ASF) has been a steward of free and open source software (FOSS) for over 15 years, and the ASF has overseen many of the top FOSS projects this decade (Hadoop, Spark, Kafka, Cassandra, Mesos, Lucene, Tomcat, Zeppelin, Log4j, Parquet, Zookeeper, TinkerPop, etc.).
Big Data Engineering is a field of work where there are a lot of cogs lying around waiting to be integrated. In other words, DevOps and, in this particular case, Linux are inevitable and inseparable parts of Big Data Engineering. This is a story about one of those cogs.
In our previous post, we touched on the subject of having logs in a central place and viewing aggregated data from all the nodes. This blog post presents our learning process while working on a complex use case with a tight SLA, where every piece of information counts.
Working on high nines, where the latency of every query matters, is a whole different beast compared with normal Cassandra use cases, where measurements at the 99.9th or even 99th percentile are enough. We recently worked on a project with really tight latency thresholds: milliseconds on 99.999% of requests.