ARC’s Data Science Platform, an upgraded Hadoop cluster, is now available

Advanced Research Computing at U-M (ARC) is pleased to announce the availability of our Data Science Platform, an upgraded Hadoop cluster intended to foster and support new collaborations between the data and computational sciences and to enable new data-intensive research in the information, social, biosocial, medical, natural, and engineering sciences. The ARC Hadoop cluster is currently available to U-M researchers as a technology preview at no charge. As an on-campus resource, it provides a different service level than most cloud-based Hadoop offerings, including:

high-bandwidth data transfer to and from other campus data storage locations with no data transfer costs
very high-speed inter-node connections using 40Gb/s Ethernet

The cluster provides 112TB of total usable disk space, 40GbE inter-node networking, Hadoop version 2.3.0, and several additional data science tools. Aside from Hadoop and its Distributed File System, the ARC data science service includes:

Pig, a high-level language that enables substantial parallelization, allowing the analysis of very large data sets.
Hive, data warehouse software that facilitates querying and managing large datasets residing in distributed storage using a SQL-like language called HiveQL.
Sqoop, a tool for transferring data between SQL databases and the Hadoop Distributed File System.
Rmr, an extension of the R Statistical Language to support distributed processing of large datasets stored in the Hadoop Distributed File System.
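
For readers new to Hadoop, the processing model underneath these tools can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle, and reduce steps that Hadoop distributes across cluster nodes. It runs in a single process, does not use the ARC cluster or any Hadoop API, and the function names are hypothetical ones chosen for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word, as a mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key; Hadoop does this between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the grouped values for each key, as a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["big data on campus", "big clusters for big data"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

On a real cluster the map and reduce steps run in parallel on many nodes against data stored in the Hadoop Distributed File System. Higher-level tools such as Pig and Hive let researchers express this kind of job without writing the individual phases by hand.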

If a cloud-based system is more suitable for your research, ARC can support your use of Amazon cloud resources through MCloud, the UM-ITS cloud service. For more information on the Hadoop cluster, please see this documentation or contact us at data-science-support@umich.edu.
