
Source code introduction and installation instructions:
Alluxio (formerly known as Tachyon) is a virtual distributed storage system. It bridges the gap between computing frameworks and storage systems, allowing applications to connect to many storage systems through a common interface. The project grew out of Tachyon, a research project at the AMPLab at the University of California, Berkeley, where it served as the data layer of the Berkeley Data Analytics Stack (BDAS).
Features of the Alluxio big data storage system:
1. Flexible file API: Alluxio's native API is similar to the java.io.File class, providing InputStream and OutputStream interfaces and efficient support for memory-mapped I/O. This API is recommended for access to Alluxio's full functionality and best performance.
2. File system interface compatible with Hadoop HDFS: Based on this interface, Hadoop MapReduce and Spark can use Alluxio instead of HDFS.
3. Pluggable underlying storage: Alluxio can persist in-memory data to an underlying storage system, and it provides a common interface that simplifies integration with different underlying storage systems. Currently, Alluxio supports Microsoft Azure Blob Store, Amazon S3, Google Cloud Storage, OpenStack Swift, GlusterFS, HDFS, MapR-FS, Ceph, NFS, Alibaba OSS, Minio, and single-node local file systems, with support for more storage systems planned.
4. Alluxio tiered storage: Alluxio can manage memory and local storage such as SSDs and HDDs to accelerate data access. When finer-grained control is needed, tiered storage can automatically manage data across tiers so that hot data stays on the faster ones. Custom policies can be applied to Alluxio, and pinning lets users explicitly control where data is stored.
5. Unified namespace: Through its mount feature, Alluxio can manage data efficiently across different storage systems. In addition, transparent naming preserves the file names and directory hierarchy of objects when they are persisted to the underlying storage system.
6. Web UI: Users can browse the file system through the Web UI. In debug mode, administrators can also view detailed information about each file, including its storage location and checkpoint path.
7. Command line: Users can also interact with Alluxio through ./bin/alluxio fs, for example to copy data into and out of the file system.
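
The unified namespace in item 5 can be pictured as a mount table that maps paths in a single namespace to locations in different underlying storage systems, keeping file names and directory structure intact. The following is a minimal Python sketch of that idea under stated assumptions, not Alluxio's actual implementation or API: the MountTable class, its mount and resolve methods, and the example URIs are all hypothetical names chosen for illustration.

```python
from typing import Dict, Optional


class MountTable:
    """Toy mount table mapping namespace paths to underlying storage URIs.

    Hypothetical illustration only; this is not Alluxio's API.
    """

    def __init__(self) -> None:
        self._mounts: Dict[str, str] = {}

    @staticmethod
    def _normalize(path: str) -> str:
        # "/" stays "/"; other paths lose leading/trailing slashes
        # and get exactly one leading slash.
        return "/" + path.strip("/")

    def mount(self, mount_point: str, ufs_uri: str) -> None:
        """Attach an underlying storage location at a namespace path."""
        self._mounts[self._normalize(mount_point)] = ufs_uri

    def resolve(self, path: str) -> str:
        """Translate a namespace path to its underlying storage URI."""
        path = self._normalize(path)
        best: Optional[str] = None
        for mount_point in self._mounts:
            prefix = mount_point.rstrip("/") + "/"
            covers = path == mount_point or path.startswith(prefix)
            # Prefer the longest (most specific) matching mount point.
            if covers and (best is None or len(mount_point) > len(best)):
                best = mount_point
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        base = self._mounts[best]
        relative = path[len(best):].lstrip("/")
        if not relative:
            return base
        separator = "" if base.endswith("/") else "/"
        # The relative part is appended unchanged, so file names and the
        # directory hierarchy are preserved in the underlying store.
        return base + separator + relative


# Example URIs below are made up for illustration.
table = MountTable()
table.mount("/", "hdfs://namenode:9000/alluxio")
table.mount("/data/s3", "s3://example-bucket/datasets")
print(table.resolve("/data/s3/logs/app.log"))
print(table.resolve("/tmp/report.csv"))
```

In this sketch the most specific mount point wins, so paths under /data/s3 go to the hypothetical S3 bucket while everything else falls back to the root mount.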