The HDF Group's Call the Doctor

HPC and Fortran expert Scot Breitenfeld on Call the Doctor 8/27/24

The HDF Group, Scot Breitenfeld

The HDF Group's Scot Breitenfeld ended up using this session to discuss topics including subfiling, support of multidimensional arrays by various languages, the future of Fortran and more with community members and The HDF Group co-workers. 

In this session, we mentioned Aleksandar Jelenak's Call the Doctor session last week where he discussed Hidefix, a Rust package for multithreaded read-only access to local HDF5/netCDF4 files. The link was provided in chat and is being provided here for your convenience.

This session was recorded on August 27, 2024. 

Call the Doctor is a series of weekly, unscripted, live events! The HDF Group’s staff members will answer attendee questions and, for example, go over the previous week’s HDF Forum posts. The HDF Clinics are free sessions intended to help users tackle real-world HDF problems from a common cold to severe headaches and offer relief where that’s possible. As time permits, we will include how-tos, offer advice on tool usage, review your code samples, teach you survival in the documentation jungle, and discuss what’s new or just around the corner in the land of HDF.

Join us every Tuesday at 12:20 p.m. Central (US/Canada) on Zoom!

M. Scot Breitenfeld  0:00  
Dana, do you have any questions?

Dana Robinson  0:03  
No, I don't have any questions. I think, on holding the 1.14.5 release, the answer is no.

M. Scot Breitenfeld  0:11  
I was talking with Jordan about a new API that might be useful for subfiling, one that maps which rank is on which node for the subfiling, but I don't think it's going to make it into this one. But we might think about maybe, under rapids, getting that into the actions. Yeah, that's cool. Other than that, I don't think there are any holdups. I'm testing. I was curious to see if stuff finally works on Macs.

Dana Robinson  0:52  
Yeah. There are a few other things I want to add in testing too. Like, we don't have any Autotools testing on the Mac. We should be doing Intel parallel in CI still, stuff like that.

M. Scot Breitenfeld  1:08  
Well, Intel dropped their support for Mac?

No, I just mean Intel parallel in general. Oh, I just turned on subfiling and, uh, the GCC action on Linux.

Yeah, I was just going to say I don't have any material readily available to present. So hopefully, if people have questions, feel free to ask. Aleksandar, I did have a question for you on Python. Is Python row- or column-major ordering for arrays? It's neither, right? Is that how I understand it? Aleksandar stepped away.

Dan Kahn  2:16  
Python itself doesn't support multidimensional arrays. That's all done through the NumPy package. I mean, you could have other numerical packages, but I think that's the one everybody's using. And I think by default, it's the same as C. So what does that make it, row? Yeah, I always hate using row and column because some vendors... I think it's like left-index major. So the leftmost index is the slowest one.

M. Scot Breitenfeld  2:56  
Yeah, I think so

Dan Kahn  2:57  
Right, and then the rightmost index is the one moving the fastest. And I have this vague recollection that maybe you can create arrays where they can work the other way, but I've never done that. Okay.
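A minimal sketch of the layout Dan describes, assuming NumPy is installed. The array names `a` and `f` are illustrative and were not part of the session. It shows the default C (row-major) order, where the rightmost index moves fastest, and the Fortran (column-major) order that can be requested instead.

```python
import numpy as np

# Default NumPy layout is C order (row-major): the last index varies fastest.
a = np.arange(6, dtype=np.int32).reshape(2, 3)
print(a.flags["C_CONTIGUOUS"])  # True
print(a.strides)                # (12, 4): one step in the last index moves 4 bytes

# The same values copied into Fortran order (column-major):
# the first index varies fastest in memory.
f = np.asfortranarray(a)
print(f.flags["F_CONTIGUOUS"])  # True
print(f.strides)                # (4, 8): one step in the first index moves 4 bytes

# Indexing is unchanged; only the memory layout differs.
print(a[1, 2], f[1, 2])         # 5 5

# Column-major traversal, which matches the memory order of f.
print(f.ravel(order="F").tolist())  # [0, 3, 1, 4, 2, 5]
```

The `order="F"` argument accepted by functions such as `np.zeros` and `np.array` creates Fortran-ordered arrays directly.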

Aleksandar Jelenak  3:18  
Sorry, I saw that someone wanted me to unmute. What was the question?

M. Scot Breitenfeld  3:22  
Oh, it has been answered. Sorry, I was just...

Dana Robinson  3:28  
Asking about arrays in Python, multidimensional arrays. Okay. They're generally handled through NumPy, correct?

Aleksandar Jelenak  3:43  
Sorry, can you repeat?

Dana Robinson  3:45  
Generally handled through NumPy?

Aleksandar Jelenak  3:49  
Yes, yes. If you're serious about that stuff, you use NumPy, nothing else,

Dana Robinson  3:53  
Because otherwise, if you try to do it using native Python stuff, you get ragged...

Aleksandar Jelenak  3:57  
No, no, no, no. I mean, NumPy is what turned Python into a science and engineering, you know, language powerhouse. Before that it was basically, you know, just another scripting, doodling language. Yeah.

Dan Kahn  4:18  
Kind of related to that, one of the things I've noticed recently is that it seems that neither Golang nor Rust supports multidimensional arrays natively. And I'm not sure how good the support is with any kind of add-on. I was kind of interested in those languages with respect to HDF5, but it seems like if they really can't support multidimensional arrays, that may not be a very good match.

Aleksandar Jelenak  4:46  
I don't know anything about Golang, but for Rust, there is a package that tries to mimic NumPy.

Dan Kahn  4:53  
And do you know if it's terribly successful, or

Aleksandar Jelenak  4:58  
I think that it's terribly recent, so that might be, you know, one of the things. But they definitely have it, because the HDF5 crate in Rust basically... I think it's called ndarray.

Dan Kahn  5:13  
Okay, yeah. Have you, you know, used the HDF5 crate?

Aleksandar Jelenak  5:22  
Actually, my talk last week was showcasing a Rust package that uses the HDF5 crate.

Dan Kahn  5:34  
Ah, okay. I couldn't make it last week because I often have a meeting which conflicts with this one. I guess there was a recording of that made?

Lori Cooper  5:42  
Yeah, I'll put the link in the chat.

Dan Kahn  5:45  
Oh, okay, thanks. You know, every so often I try to see if we can get away from our C-based codes, and Rust might be a good candidate for that.

M. Scot Breitenfeld  6:03  
Is there a reason why you want to get away from C-based code?

Dan Kahn  6:07  
Oh, mostly it's just to get away from the whole specter of buffer overruns, things like that. I mean, it just doesn't make sense that you should have to... you know, we have computers that should be able to do this kind of bookkeeping for us.

M. Scot Breitenfeld  6:29  
Yeah, it's called Fortran.

Dan Kahn  6:31  
Well, yes. And you know, the problem with Fortran has been that we also wanted some... I mean, this isn't necessarily for our heavy numerical stuff, but we have a lot of, I guess, just data processing kind of efforts. In particular, we deal with satellite data processing. And so the satellite data comes down in a format called, what is it, CCSDS. It's essentially a binary format, and we have to convert it into HDF5 for the head of our processing chains. And that was written in C. Those codes need to be rewritten anyway, or dramatically revised, and it might be a good time to move to something that's a little more reliable.

M. Scot Breitenfeld  7:33  
Yeah, the thing I like about Fortran is it handles all that memory management for you. It's kind of hidden.

Dan Kahn  7:40  
Yes, yes, I agree, that's, you know, the fact that you don't really have to worry about memory leaks. But I think Rust may also do that, and I think Golang does as well. But the issue comes up as to whether or not they actually have native support for, or any support, it doesn't have to be native, for, I don't know what you call it, true multidimensional arrays. Arrays with, you know, strides and offsets, instead of just lists of lists.

M. Scot Breitenfeld  8:20  
You're saying those other languages?

Dan Kahn  8:22  
Golang and, yeah, Rust, I think, at least out of the box, only really support lists of lists of lists, or arrays of arrays, if you want to call it that, to get kind of multidimensional support. And that doesn't really map well if you're reading in an array through the HDF5 library, where your array can come in as a sort of contiguous block.
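A rough sketch of the distinction Dan draws here, using only the Python standard library. The helper `flat_index` and the sample values are hypothetical and chosen for illustration; nothing below calls HDF5. It computes where a multidimensional index lands inside one contiguous buffer, which is the bookkeeping a strided array type does for you and a list of lists does not.

```python
from array import array


def flat_index(index, shape, order="C"):
    """Map a multidimensional index to an offset in a flat, contiguous buffer."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same rank")
    dims = range(len(shape))
    # C order: last index varies fastest. Fortran order: first index varies fastest.
    walk = reversed(dims) if order == "C" else dims
    offset, stride = 0, 1
    for d in walk:
        if not 0 <= index[d] < shape[d]:
            raise IndexError("index out of bounds")
        offset += index[d] * stride
        stride *= shape[d]
    return offset


# A 2 x 3 dataset held as one contiguous block of six integers.
buffer = array("i", [0, 1, 2, 3, 4, 5])
shape = (2, 3)

# Element [1][2] in row-major layout sits at offset 1 * 3 + 2 = 5.
value = buffer[flat_index((1, 2), shape)]

# A list of lists holds the same values, but each row is a separate object,
# so there is no single block to hand to or receive from a C library.
nested = [[0, 1, 2], [3, 4, 5]]
print(value == nested[1][2])  # True
```

The same block can be reinterpreted with a different shape or order without copying any data, which is not possible with nested lists.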

Aleksandar Jelenak  8:52  
I think one thing going for Rust, really, is this Cargo tool that manages dependencies and a bunch of other things, and building and testing and debugging. It's really powerful. I think that it kind of feels like it's from a recent era. You know, it's really done for people who want to start from now and move forward, rather than dealing with decisions from 1977 and then onwards. You know what I mean?

Dan Kahn  9:18  
Yeah, I think I have noticed something about that, and one of the concerns I had was that, in addition to HDF5, a lot of our processing depends on being able to read YAML files, because a lot of our control files are done that way. In fact, all of them are done that way. And there was a YAML interface I noticed for Rust. And then there was some concern, because one of the packages it depended on kind of went out of support. Which is to say, the volunteer maintainer stopped maintaining it, and it then kind of became unavailable, and everything that depended on that became unavailable. I really don't know the details, but it seemed like there was kind of a knock-on effect when one package ceased having an official maintainer.

Dana Robinson  10:22  
Isn't it a problem with Rust that it's still actively developing, so that code starts to not work after a while? I mean, C code is going to, like, work forever, but...

Aleksandar Jelenak  10:35  
I don't know that. I never heard that, but I'm not following that closely. I just looked at this last week; I had to deal with it a little. This Cargo tool is just, like, amazing. You know, it just brings in dependencies and builds for you. It really gives it a modern feel, that's how I would say it. You can generate a template for your project, all these kinds of things. I think it goes a long way to standardize from the get-go how things are done and best practices, rather than wait 10, 15 years for something to percolate.

I think a lot of people will use it. I mean, I see a lot of impact in Python, because there's a bunch of things now written in Rust that are super fast and really easily imported from Python, and Python starts to benefit from that as well. So I think Rust will maybe not overtake, you know, who knows, but I think it'll just keep going, yeah.

Dan Kahn  11:50  
And there was some... what was it? One of the various... was it the National Institute of Standards or the National Security Agency? They kind of issued a, I don't know if it's a requirement or recommendation, to write software in languages that don't have these problems with buffer overruns, because that's frequently, I mean, in addition to just being inconvenient when your code crashes, it's frequently the source of security problems, right?

Dana Robinson  12:20  
I remember that. That was recent.

Dan Kahn  12:23  
I think it was within the last year, and they had a whole list of recommended... I don't remember if Rust was on there, but you know, anything where it handles the buffer overruns for you, or at least prevents them.

Dana Robinson  12:48  
Scot, do you know what the status of Fortran is going forward? I know they're all working on that, like, Flang. I think that was a labs project. But...

M. Scot Breitenfeld  12:58  
You mean new features?

Dana Robinson  12:59  
I mean, there was a lot of soul searching, like a year ago, where they were trying to figure out what to do with Fortran, because there's all this code written in Fortran, it was hard to find developers, and no language was as well maintained. And...

M. Scot Breitenfeld  13:15  
I know Greg can talk to this, but last time I was at Sandia, they were pushing to move all the Fortran codes to C++. And I think their main language now is C++. That might have changed, but at least at Sandia, that's what I thought: they were all going to C++.

Greg Sjaardema  13:36  
Yeah, we're not doing much with Fortran anymore. At least in our group, I think there are a couple that are still using some Fortran.

Dan Kahn  13:46  
Yeah, I think a couple years ago, I came across some opinions that said that the Fortran developers or standards committee had kind of, I don't know, erred in the sense that they put a lot of time and effort into developing various features for parallel processing, and they hadn't put any effort into supporting hash tables or list data types, which are pretty common now in almost every modern language. And a lot of programmers really depend on them for doing the kind of nuts-and-bolts bookkeeping that you have to do. And that's kind of been my experience, that it makes it hard to get things up and running quickly in Fortran.

M. Scot Breitenfeld  14:37  
Yeah, if you're doing linked lists and stuff like that, it's not as trivial doing that in Fortran as in C++.

Dan Kahn  14:46  
Yeah. And then maybe also some of these parallel features were perhaps incorporated into the language before they were really mature. There's some thought that maybe it should have been more of an experimental effort, instead of defining it in... but, you know, they have a limited amount of time and resources, and that's where they put them.

M. Scot Breitenfeld  15:09  
Yeah, I haven't looked too much at the new standard. 2024, I think, so it just came out. Or maybe 2023.

They did address, I think, that you can do a character array that can be assigned a dimension, a length, dynamically, which is nice.

Dan Kahn  15:37  
Okay, so it's like a string data type, instead of having every string be its own data type depending on its length.

M. Scot Breitenfeld  15:47  
Yeah, it's nice for passing to C, because it used to be that if you're passing a string to C, then it has to be of length one, right? But it can be in an array. I think they've removed that restriction. Now you can actually pass a string or a character of more than length one.

Dana Robinson  16:19  
But how soon is that going to actually show up out there?

M. Scot Breitenfeld  16:24  
I think it's been an extension to compilers for quite some time. It just hasn't been part of the standard, because I've used it before, and it doesn't complain.

Dana Robinson  16:39  
Because, like, we're worried, we're talking about using Fortran 08 features. I mean, that's like a 15-year-old standard.

M. Scot Breitenfeld  16:58  
Yeah. I don't think Fortran gets the attention it used to in the compiler world.

Are there any other questions or concerns?

Well, this might be a short meeting, and I don't have anything, so...

Dana Robinson  17:27  
Yeah, at the next one I'll talk about the release that's coming up, the 1.14.5 release. That's going to be largely a bug fix release with some tweaks for cloud algorithms, Issue five and such. I'll talk a little bit more about where our thinking is on a 1.16 release between then and the spring.

Lori Cooper  17:56  
Okay, all right, thanks, Dana. Thanks, Scot. Kind of glad we just had a little discussion time, that was nice. All right, see you, everyone, next week. Thank you.

Transcribed by https://otter.ai
