Hello everyone, today I will introduce you to big data.
So what exactly is big data? To a layperson, the term sounds lofty and impressive, but also abstract and hard to understand.
But if you are a programmer or a database administrator, you need to know big data in detail. It will be your future direction.
First of all, let me introduce what big data can do. Everyone knows how shopping on Taobao works: you register an account, pick goods, add them to the shopping cart, pay, and then follow the delivery and logistics information.
On a computer, all of this information is called data. The generated data is stored on Taobao's servers, and hundreds of millions of records are stored there every day.
Because the volume of data is so huge, it is called big data. Based on this underlying data, intelligent analysis is carried out in the background. It can work out, for example, the purchase preferences of a specific user, or the sales volume of certain products.
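To make the two analyses just mentioned concrete, here is a minimal sketch in Python. The order records, user names and function names are all made up for illustration; a real system would read from logs or a database and run at a far larger scale.

```python
from collections import Counter, defaultdict

# Hypothetical order records: (user, category, item).
orders = [
    ("alice", "shoes", "running shoes"),
    ("alice", "shoes", "sandals"),
    ("alice", "hats", "sun hat"),
    ("bob", "hats", "sun hat"),
    ("bob", "books", "Java primer"),
]


def sales_volume(records):
    """Count how many times each item was sold."""
    return Counter(item for _user, _category, item in records)


def purchase_preferences(records):
    """For each user, find the category they bought from most often."""
    per_user = defaultdict(Counter)
    for user, category, _item in records:
        per_user[user][category] += 1
    return {user: counts.most_common(1)[0][0]
            for user, counts in per_user.items()}


volume = sales_volume(orders)
preferences = purchase_preferences(orders)
print(volume["sun hat"])       # number of sun hats sold
print(preferences["alice"])    # alice's most frequent category
```

The same counting idea is what a big data platform does, only spread over many machines because a single one cannot hold or scan all the records.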
Then, based on the results extracted from this mass of data, targeted items are pushed to the user. This is the convenience that the era of big data brings you.
Because big data involves so much data, storing and analyzing it is mainly a problem of efficiency. How do you store and analyze large amounts of data quickly? What technologies are currently being used? With these questions in mind, let me talk about what you need to study.
First, the Java language is the foundation you must master for big data development. Next, you need to understand network programming and databases such as MySQL. After that, you need to understand cluster technology.
This is because large amounts of data need multiple machines for storage and processing, that is, distributed storage. The common solution today is the Hadoop architecture. Hadoop implements a distributed file system (HDFS) with high fault tolerance and high-throughput access to application data, suitable for applications with very large data sets.
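The idea behind fault-tolerant distributed storage can be sketched in a few lines of Python. This is only a toy model under simple assumptions: the node names, the block size and the round-robin placement are invented for illustration and are not how Hadoop itself places blocks.

```python
def split_into_blocks(data, block_size):
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, nodes, replicas=2):
    """Store each block on `replicas` distinct nodes, chosen round-robin."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    storage = {node: {} for node in nodes}
    for index, block in enumerate(blocks):
        for r in range(replicas):
            node = nodes[(index + r) % len(nodes)]
            storage[node][index] = block
    return storage


def read_file(storage, block_count, failed=()):
    """Reassemble the data from any surviving copy of each block."""
    parts = []
    for index in range(block_count):
        for node, held in storage.items():
            if node not in failed and index in held:
                parts.append(held[index])
                break
        else:
            raise IOError(f"block {index} has no surviving copy")
    return b"".join(parts)


data = b"order records generated by many users every day"
nodes = ["node-a", "node-b", "node-c"]   # hypothetical machines
blocks = split_into_blocks(data, block_size=8)
storage = place_blocks(blocks, nodes, replicas=2)

# Even with one machine down, every block still has a copy elsewhere.
recovered = read_file(storage, len(blocks), failed={"node-b"})
print(recovered == data)
```

Because every block lives on two machines, losing any single machine loses no data. That is the fault tolerance the paragraph above refers to.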
Hadoop's high throughput and massive data processing capabilities make it easier to handle large amounts of data. But Hadoop is not good at real-time computing, so another big data technology, Storm, came into being. It can process data in real time.
For example, yesterday you bought a pair of shoes on Taobao and today you want to buy a hat. If the big data analysis results are still from yesterday, the system keeps recommending shoes instead of considering what you need today.
With Storm, the system analyzes your needs in real time and recommends hats according to the results. Another big data technology, Apache Spark, is a data processing framework built around speed, ease of use and sophisticated analytics. It is used alongside Hadoop and Storm to handle big data, and it accelerates processing through parallel computing.
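The shoes-and-hat example can be sketched as a contrast between a result computed once from yesterday's log and a result updated on every new event. This is plain Python for illustration only, not the Storm or Spark API; the class name, the window size and the event lists are assumptions.

```python
from collections import Counter, deque


def batch_recommendation(finished_log):
    """Batch style: computed once from a finished log, stale until the next run."""
    return Counter(finished_log).most_common(1)[0][0]


class StreamingRecommender:
    """Streaming style: updated on every event, using a short sliding window."""

    def __init__(self, window=3):
        # Only the most recent `window` events are kept.
        self.recent = deque(maxlen=window)

    def on_event(self, category):
        self.recent.append(category)

    def recommend(self):
        if not self.recent:
            return None
        return Counter(self.recent).most_common(1)[0][0]


yesterday = ["shoes", "shoes", "shoes"]   # what the user looked at yesterday
today = ["hats", "hats"]                  # what the user is looking at now

stale = batch_recommendation(yesterday)

recommender = StreamingRecommender(window=3)
for event in yesterday + today:
    recommender.on_event(event)
fresh = recommender.recommend()

print(stale)   # based only on yesterday's log
print(fresh)   # follows the latest events
```

The batch result still points at shoes, while the per-event result has already moved on to hats, because old events fall out of the window as new ones arrive.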
The other direction of big data is machine learning. To obtain an effective machine learning system, you first train it with a small amount of data. Once the system works without errors on that small amount, you increase the amount of training data, and finally, after repeated testing, you can put it into service.
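The start-small-then-grow process can be sketched as follows. The classifier here (nearest centroid on a single number) is chosen only because it fits in a few lines; the data, labels and sample sizes are invented for illustration.

```python
def train_centroids(samples):
    """samples: list of (value, label). Returns the mean value per label."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Pick the label whose mean is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def accuracy(centroids, samples):
    correct = sum(predict(centroids, v) == label for v, label in samples)
    return correct / len(samples)


# Hypothetical data: order totals labelled as a "small" or "large" basket.
data = ([(v, "small") for v in range(1, 21)]
        + [(v, "large") for v in range(40, 60)])
held_out = [(5, "small"), (18, "small"), (45, "large"), (58, "large")]

results = []
for size in (4, 10, 40):                  # start small, then grow
    step = len(data) // size
    subset = data[::step][:size]          # evenly spaced sample of `size` items
    model = train_centroids(subset)
    results.append((size, accuracy(model, held_out)))

for size, score in results:
    print(size, score)
```

Each round trains on more data and is checked against the same held-out examples, which mirrors the train, verify, enlarge, retest cycle described above.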
Machine learning on big data requires a large number of training examples. Given enough examples, the machine can analyze problems much as a human would; the analysis yields useful data, and the results are finally sent to the user.
Machine learning requires mastering tools such as the R language, Python and Mahout. Its applications have broad prospects beyond big data, and it may become a required course for future programmers.