Collaborative filtering (CF) is one of the most popular techniques for building recommender systems. It is a method of making automatic predictions about the interests of a user by collecting preferences or taste information from many users (collaborating).
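To make "collecting preferences from many users" concrete, here is a minimal sketch of user-based collaborative filtering. It assumes hypothetical toy ratings and the simplest similarity measure: cosine similarity over co-rated items. A user's unknown rating is predicted as a similarity-weighted average of other users' ratings. The names `ratings`, `cosine_similarity` and `predict` are illustrative only and do not come from any library.

```python
import math

# Hypothetical toy data (not from any real dataset): user -> {item: rating}.
ratings = {
    "alice": {"matrix": 5, "inception": 4, "titanic": 1},
    "bob":   {"matrix": 4, "inception": 5, "notebook": 2},
    "carol": {"titanic": 5, "notebook": 4, "matrix": 1},
}

def cosine_similarity(a, b):
    """Cosine similarity computed only over the items both users rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict(user, item, data):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no similar user has rated the item.
    """
    num = den = 0.0
    for other, their in data.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(data[user], their)
        if sim <= 0:
            continue
        num += sim * their[item]
        den += sim
    return num / den if den else None

print(predict("alice", "notebook", ratings))
```

Here alice's predicted rating for "notebook" is about 2.57: it is pulled toward bob's rating of 2 because bob's tastes are much closer to hers than carol's are. Real systems usually mean-center ratings or use matrix factorization instead, but the collaborative principle is the same.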
In the first two blog posts, we gave a couple of pointers about the driver settings you can leverage to improve the performance of communication between your application and Cassandra. Most of the time, this is enough to improve performance significantly.
Recommender systems have become ubiquitous and very important in recent years. They are present in a variety of areas, such as recommending movies, music, news, and books. One of the most popular algorithms used to produce a list of recommendations is collaborative filtering.
In this part we will concentrate on some more advanced features: speculative executions, which let the driver issue a parallel request to another node if a response has not arrived within a configured threshold, and the latency-aware load balancing policy, which measures node latencies, penalizes slow-performing nodes, and favors nodes with good performance.
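Both features can be illustrated with a small, self-contained simulation. This is a hypothetical model of the ideas, not the Cassandra driver's implementation or API. `speculative_latency` assumes every attempt's latency is known in advance and counts `max_executions` as the total number of attempts, including the original request. `LatencyTracker` assumes an exponentially weighted moving average (EWMA) per node and moves nodes slower than `exclusion_threshold` times the fastest average to the end of the query plan. All names and default values are illustrative.

```python
def speculative_latency(attempt_latencies_ms, delay_ms, max_executions):
    """Time until the first response when a new attempt is started every
    `delay_ms` as long as no response has arrived yet (hypothetical model)."""
    best = float("inf")
    for i, latency in enumerate(attempt_latencies_ms[:max_executions]):
        start = i * delay_ms
        if start >= best:  # a response already arrived; stop speculating
            break
        best = min(best, start + latency)
    return best


class LatencyTracker:
    """Simplified latency-aware ordering: EWMA per node, slow nodes last."""

    def __init__(self, alpha=0.3, exclusion_threshold=2.0):
        self.alpha = alpha                        # weight of the newest sample
        self.exclusion_threshold = exclusion_threshold
        self.avg = {}                             # node -> average latency (ms)

    def record(self, node, latency_ms):
        prev = self.avg.get(node)
        if prev is None:
            self.avg[node] = latency_ms
        else:
            self.avg[node] = self.alpha * latency_ms + (1 - self.alpha) * prev

    def query_plan(self, nodes):
        """Keep the input order, but move penalized (slow) nodes to the end.
        Nodes without measurements are treated as fast."""
        measured = [self.avg[n] for n in nodes if n in self.avg]
        if not measured:
            return list(nodes)
        limit = min(measured) * self.exclusion_threshold
        fast = [n for n in nodes if self.avg.get(n, 0.0) <= limit]
        slow = [n for n in nodes if self.avg.get(n, 0.0) > limit]
        return fast + slow


# The first attempt stalls for 200 ms; a speculative attempt sent at 50 ms
# answers in 10 ms, so the client gets a response after 60 ms.
print(speculative_latency([200, 10, 10], delay_ms=50, max_executions=3))

tracker = LatencyTracker()
for node, ms in [("a", 5), ("b", 50), ("c", 8)]:
    tracker.record(node, ms)
print(tracker.query_plan(["b", "a", "c"]))  # "b" is penalized
```

The trade-off in the first function is visible directly: a smaller `delay_ms` cuts tail latency further but sends more duplicate requests to the cluster, which is why speculative executions are normally reserved for idempotent queries.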
In this part we covered the first round of settings that can give you quick wins. Stay tuned: next week we will cover speculative executions and the latency-aware load balancing policy, which can help you get even better performance for some use cases.
The Apache Software Foundation (ASF) has been a steward of free open source software (FOSS) for over 15 years, and it has overseen many of the top FOSS projects of this decade (Hadoop, Spark, Kafka, Cassandra, Mesos, Lucene, Tomcat, Zeppelin, Log4j, Parquet, ZooKeeper, TinkerPop, etc.).
Big Data Engineering is a field of work with a lot of cogs lying around, waiting to be integrated. In other words, DevOps and, in this particular case, Linux are inevitable and inseparable parts of Big Data Engineering. This is a story about one of those cogs.
In our previous post we covered keeping logs in a central place and viewing aggregated data from all the nodes. This blog post presents what we learned while working on a complex use case with a tight SLA, where every piece of information counts.
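Viewing aggregated logs from all nodes essentially means interleaving each node's entries into a single timeline, so you can see what happened across the cluster just before an error. A minimal sketch of that merge step follows. It assumes hypothetical log lines that start with an ISO-8601 timestamp, that each node's log is already sorted by time, and that node clocks are synchronized (for example via NTP). It relies on the standard library's `heapq.merge`, which lazily merges already-sorted inputs. The names `node_logs`, `parse` and `merged_timeline` are illustrative.

```python
import heapq
from datetime import datetime

# Hypothetical per-node excerpts; each list is already sorted by timestamp.
node_logs = {
    "node1": ["2017-03-01T10:00:01 INFO  compaction started",
              "2017-03-01T10:00:05 ERROR read timeout"],
    "node2": ["2017-03-01T10:00:03 WARN  GC pause 800ms",
              "2017-03-01T10:00:06 INFO  hint replayed"],
}

def parse(node, line):
    """Split '<timestamp> <message>' into a sortable (time, node, message) tuple."""
    ts, _, message = line.partition(" ")
    return datetime.fromisoformat(ts), node, message

def _entries(node, lines):
    # A generator function binds `node` per call, so each stream keeps its own node name.
    for line in lines:
        yield parse(node, line)

def merged_timeline(logs):
    """Merge the per-node streams into one timestamp-ordered list."""
    streams = [_entries(node, lines) for node, lines in logs.items()]
    return list(heapq.merge(*streams))

for ts, node, message in merged_timeline(node_logs):
    print(f"{ts.isoformat()} [{node}] {message}")
```

In the merged view, the GC pause on node2 shows up right before the read timeout on node1, a correlation that is easy to miss when each node's log is read on its own.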
Working on high nines, where the latency of every query matters, is a whole different beast compared with normal Cassandra use cases, where measurements at the 99.9th or even the 99th percentile are enough. We recently worked on a project with really tight latency thresholds, in milliseconds, for 99.999% of requests.
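Why the high nines are a different beast can be shown with a little arithmetic. The sketch below computes nearest-rank percentiles over a made-up sample of 100,000 request latencies: almost all take 2 ms, 190 take 15 ms, and 10 take 250 ms. These numbers are hypothetical and only illustrate the effect; the `percentile` helper is also illustrative, and `Fraction` is used so that float rounding cannot shift the computed rank by one.

```python
import math
from fractions import Fraction

def percentile(samples, p):
    """Nearest-rank percentile: the smallest value such that at least p% of
    samples are <= it. Fraction(str(p)) keeps the rank computation exact."""
    ordered = sorted(samples)
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies in milliseconds for 100,000 requests.
samples = [2.0] * 99_800 + [15.0] * 190 + [250.0] * 10

for p in (99, 99.9, 99.99, 99.999):
    print(f"p{p}: {percentile(samples, p)} ms")
```

Here p99 is 2 ms, p99.9 and p99.99 are 15 ms, but p99.999 is 250 ms. It is determined entirely by the 10 slowest requests (0.01% of traffic), so a single GC pause or slow node is enough to break a 99.999% latency target.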
Browsing through logs is always hard, even on a single-node system. You scroll up and down, trying to figure out which events happened before a certain error. Often you want to see what followed the error, then you go back again to find the actual cause, and so on.
Relational databases have been around for a long time; developers use them often, and their feature set is familiar. Knowing SQL is enough to use them. The design of relational databases does a great job of hiding the internals from users.
One of the most important parts of a scalable architecture is the messaging system, which is used for communication between application components, log aggregation, event handling, etc. There are some standards that try to describe different protocols, but I will focus on the architecture.
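The core idea behind a messaging system is decoupling: producers publish to a named topic without knowing who consumes it. A minimal in-process sketch of topic-based publish/subscribe is shown below. It is a hypothetical toy, not the architecture of any particular product. A real messaging system adds network transport, persistence, and delivery guarantees, which is where the architectural choices actually lie. The `Broker` class and its methods are illustrative names.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str], None]

class Broker:
    """Toy in-memory publish/subscribe broker (illustrative only)."""

    def __init__(self) -> None:
        # topic name -> handlers subscribed to that topic
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        """Deliver `message` to every subscriber of `topic` and return the
        number of deliveries (0 if nobody is subscribed)."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)

broker = Broker()
log_store, alerting = [], []
broker.subscribe("logs", log_store.append)   # e.g. a log aggregation component
broker.subscribe("logs", alerting.append)    # e.g. an event handling component
broker.publish("logs", "node1: read timeout")
```

The producer calling `publish` is unaffected when a third consumer subscribes to "logs" later, which is the property that lets application components evolve independently.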