However, 19.95 can't be precisely represented as a binary floating-point number, just like 1/3 can't be exactly represented as a decimal floating-point number. The computer's binary representation of 19.95, therefore, isn't exactly 19.95.
When a floating-point number gets printed, the binary floating-point representation is converted back to decimal. These decimal numbers are displayed in either the format you specify with printf(), or the current output format for numbers (see $#) if you use print. $# has a different default value in Perl5 than it did in Perl4. Changing $# yourself is deprecated.
This affects all computer languages that represent decimal floating-point numbers in binary, not just Perl. Perl provides arbitrary-precision decimal numbers with the Math::BigFloat module (part of the standard Perl distribution), but mathematical operations are consequently slower.
To get rid of the superfluous digits, just use a format (eg, printf("%.2f", 19.95)) to get the required precision.
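As a rough illustration of the point above, the sketch below prints the same value with far more digits than are meaningful and then with two decimal places. The variable names are made up for the example, and the digits produced by the first printf depend on the platform's floating-point format, so no output is shown for it.

```perl
use strict;
use warnings;

my $price = 19.95;

# Ask for more digits than are meaningful to expose the binary approximation.
printf("%.20f\n", $price);

# Round to the precision actually wanted when formatting for output.
my $shown = sprintf("%.2f", $price);
print "$shown\n";
```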
oct()
or hex()
if you want the values converted. oct()
interprets both hex (``0x350'') numbers and octal ones (``0350'' or even without the leading ``0'', like ``377''), while hex()
only converts hexadecimal ones, with or without a leading ``0x'', like ``0x255'',
``3A'', ``ff'', or ``deadbeef''.
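A short sketch of the conversions just described, using the sample strings from the text. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $from_hex_prefix = oct("0x350");   # treated as hex because of the leading 0x
my $from_octal      = oct("0350");    # octal
my $bare_octal      = oct("377");     # still octal, no leading 0 needed
my $hex_prefixed    = hex("0x255");   # the 0x prefix is optional for hex()
my $hex_bare        = hex("ff");

print "$from_hex_prefix $from_octal $bare_octal $hex_prefixed $hex_bare\n";
# prints "848 232 255 597 255"
```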
This problem shows up most often when people try using chmod(), mkdir(), umask(), or sysopen(), which all want permissions in octal.
chmod(644, $file);   # WRONG -- perl -w catches this
chmod(0644, $file);  # right
sprintf() or printf() is usually the easiest route.
The POSIX module (part of the standard perl distribution) implements ceil(), floor(), and a number of other mathematical and trigonometric functions.
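For comparison, here is a minimal sketch of those functions side by side with sprintf() and the builtin int(). The sample values are arbitrary and deliberately avoid exact halves, where the result of sprintf() rounding depends on the binary representation.

```perl
use strict;
use warnings;
use POSIX qw(ceil floor);

my $up      = ceil(3.2);             # smallest integer >= 3.2
my $down    = floor(3.7);            # largest integer <= 3.7
my $neg     = floor(-3.2);           # floor() goes toward negative infinity
my $rounded = sprintf("%.0f", 3.7);  # round to nearest via formatting
my $trunc   = int(-3.7);             # int() truncates toward zero

print "$up $down $neg $rounded $trunc\n";   # prints "4 3 -4 4 -3"
```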
In 5.000 to 5.003 Perls, trigonometry was done in the Math::Complex module. With 5.004, the Math::Trig module (part of the standard perl distribution) implements the trigonometric functions. Internally it uses the Math::Complex module and some functions can break out from the real axis into the complex plane, for example the inverse sine of 2.
Rounding in financial applications can have serious implications, and the rounding method used should be specified precisely. In these cases, it probably pays not to trust whichever system rounding is being used by Perl, but to instead implement the rounding function you need yourself.
pack()
function (documented in
pack):
$decimal = ord(pack('B8', '10110110'));
Here's an example of going the other way:
$binary_string = join('', unpack('B*', "\x29"));
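A round-trip sketch of both directions, assuming values that fit in a single byte. pack() on its own yields a one-byte string rather than a number, so ord() is used here to obtain the numeric value.

```perl
use strict;
use warnings;

# Binary string to number: pack() yields one byte, ord() gives its value.
my $decimal = ord(pack('B8', '10110110'));

# Number to binary string: unpack() on a single byte keeps leading zeros.
my $bits = unpack('B8', chr(41));

print "$decimal $bits\n";   # prints "182 00101001"
```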
@results = map { my_func($_) } @array;
For example:
@triple = map { 3 * $_ } @single;
To call a function on each element of an array, but ignore the results:
foreach $iterator (@array) { &my_func($iterator); }
To call a function on each integer in a (small) range, you can use:
@results = map { &my_func($_) } (5 .. 25);
but you should be aware that the .. operator creates an array of all integers in the range. This can take a lot of memory for large ranges. Instead use:
@results = ();
for ($i=5; $i < 500_005; $i++) {
    push(@results, &my_func($i));
}
You should also check out the Math::TrulyRandom module from CPAN.
localtime()
(see
localtime):
$day_of_year = (localtime(time()))[7];
or more legibly (in 5.004 or higher):
use Time::localtime;
$day_of_year = localtime(time())->yday;
You can find the week of the year by dividing this by 7:
$week_of_year = int($day_of_year / 7);
Of course, this believes that weeks start at zero.
When gmtime() and localtime() are used in a scalar context they return a timestamp string that contains a fully-expanded year. For example, $timestamp = gmtime(1005613200) sets $timestamp to ``Tue Nov 13 01:00:00 2001''. There's no year 2000 problem here.
s/\\(.)/$1/g;
Note that this won't expand \n or \t or any other special escapes.
s/(.)\1/$1/g;
print "My sub returned @{[mysub(1,2,3)]} that time.\n";
If you prefer scalar context, similar chicanery is also useful for arbitrary expressions:
print "That yields ${\($n + 5)} widgets\n";
See also ``How can I expand variables in text strings?'' in this section of the FAQ.
/x([^x]*)x/
will get the intervening bits in $1. For multiple ones, then something more
like /alpha(.*?)omega/
would be needed. But none of these deals with nested patterns, nor can
they. For that you'll have to write a parser.
reverse()
in a scalar context, as documented in
reverse.
$reversed = reverse $string;
1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
Or you can just use the Text::Tabs module (part of the standard perl distribution).
use Text::Tabs; @expanded_lines = expand(@lines_with_tabs);
use Text::Wrap; print wrap("\t", ' ', @paragraphs);
The paragraphs you give to Text::Wrap may not contain embedded newlines. Text::Wrap doesn't justify the lines (flush-right).
$first_byte = substr($a, 0, 1);
If you want to modify part of a string, the simplest way is often to use
substr()
as an lvalue:
substr($a, 0, 3) = "Tom";
Although those with a regexp kind of thought process will likely prefer
$a =~ s/^.../Tom/;
$count = 0;
s{((whom?)ever)}{
    ++$count == 5       # is it the 5th?
        ? "${2}soever"  # yes, swap
        : $1            # renege and leave it there
}igex;
tr/// function like so:
$string = "ThisXlineXhasXsomeXx'sXinXit";
$count = ($string =~ tr/X//);
print "There are $count X characters in the string";
This is fine if you are just looking for a single character. However, if you are trying to count multiple character substrings within a larger string, tr/// won't work. What you can do is wrap a while() loop around a global pattern match. For example, let's count negative integers:
$string = "-9 55 48 -2 23 -76 4 14 -44";
$count = 0;
while ($string =~ /-\d+/g) { $count++ }
print "There are $count negative numbers in the string";
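The same count can also be obtained without an explicit loop. The sketch below is one common idiom rather than the only way: it evaluates the global match in list context and counts the resulting list.

```perl
use strict;
use warnings;

my $string = "-9 55 48 -2 23 -76 4 14 -44";

# The empty list in the middle forces list context on the match;
# assigning that list assignment to a scalar yields the number of matches.
my $count = () = $string =~ /-\d+/g;

print "There are $count negative numbers in the string\n";   # 4
```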
$line =~ s/\b(\w)/\U$1/g;
This has the strange effect of turning ``don't do it'' into ``Don'T Do It''. Sometimes you might want this, instead (Suggested by Brian Foy <comdog@computerdog.com>):
$string =~ s/ (
                 (^\w)    #at the beginning of the line
                   |      # or
                 (\s\w)   #preceded by whitespace
               )
            /\U$1/xg;
$string =~ s/([\w']+)/\u\L$1/g;
To make the whole line upper case:
$line = uc($line);
To force each word to be lower case, with the first letter upper case:
$line =~ s/(\w+)/\u\L$1/g;
split(/,/)
because you shouldn't split if the comma is inside quotes. For example,
take a data line like this:
SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"
Due to the restriction of the quotes, this is a fairly complex problem. Thankfully, we have Jeffrey Friedl, author of a highly recommended book on regular expressions, to handle these for us. He suggests (assuming your string is contained in $text):
@new = ();
push(@new, $+) while $text =~ m{
    "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
  | ([^,]+),?
  | ,
}gx;
push(@new, undef) if substr($text,-1,1) eq ',';
If you want to represent quotation marks inside a quotation-mark-delimited field, escape them with backslashes (eg, "like \"this\""). Unescaping them is a task addressed earlier in this section.
Alternatively, the Text::ParseWords module (part of the standard perl distribution) lets you say:
use Text::ParseWords; @new = quotewords(",", 0, $text);
$string =~ s/^\s*(.*?)\s*$/$1/;
It would be faster to do this in two steps:
$string =~ s/^\s+//; $string =~ s/\s+$//;
Or more nicely written as:
for ($string) { s/^\s+//; s/\s+$//; }
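If the two-step form is needed in several places, it can be wrapped in a small subroutine. This is only a sketch, and trim() is a hypothetical helper name, not a Perl builtin.

```perl
use strict;
use warnings;

# trim() is an illustrative helper; it returns a stripped copy of its argument.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+//;   # strip leading whitespace
    $s =~ s/\s+$//;   # strip trailing whitespace
    return $s;
}

my $clean = trim("  \t hello world \n");
print "[$clean]\n";   # prints "[hello world]"
```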
substr()
or unpack(),
both documented in the perlfunc manpage.
$text = 'this has a $foo in it and a $bar'; $text =~ s/\$(\w+)/${$1}/g;
Before version 5 of perl, this had to be done with a double-eval substitution:
$text =~ s/(\$\w+)/$1/eeg;
Which is bizarre enough that you'll probably actually need an EEG afterwards. :-)
See also ``How do I expand function calls in a string?'' in this section of the FAQ.
If you get used to writing odd things like these:
print "$var";       # BAD
$new = "$old";      # BAD
somefunc("$var");   # BAD
You'll be in trouble. Those should (in 99.8% of the cases) be the simpler and more direct:
print $var;
$new = $old;
somefunc($var);
Otherwise, besides slowing you down, you're going to break code when the thing in the scalar is actually neither a string nor a number, but a reference:
func(\@array);
sub func {
    my $aref = shift;
    my $oref = "$aref";  # WRONG
}
You can also get into subtle problems on those few operations in Perl that
actually do care about the difference between a string and a number, such
as the magical ++
autoincrement operator or the syscall()
function.
Sometimes it doesn't make a difference, but sometimes it does. For example, compare:
$good[0] = `some program that outputs several lines`;
with
@bad[0] = `same program that outputs several lines`;
The -w flag will warn you about these matters.
$prev = 'nonesuch'; @out = grep($_ ne $prev && ($prev = $_), @in);
This is nice in that it doesn't use much extra memory, simulating
uniq(1)'s
behavior of removing only adjacent duplicates.
undef %saw; @out = grep(!$saw{$_}++, @in);
@out = grep(!$saw[$_]++, @in);
undef %saw; @saw{@in} = (); @out = sort keys %saw; # remove sort if undesired
undef @ary; @ary[@in] = @in; @out = @ary;
@blues = qw/azure cerulean teal turquoise lapis-lazuli/; undef %is_blue; for (@blues) { $is_blue{$_} = 1 }
Now you can check whether $is_blue{$some_color}. It might have been a good idea to keep the blues all in a hash in the first place.
If the values are all small integers, you could use a simple indexed array. This kind of an array will take up less space:
@primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31); undef @is_tiny_prime; for (@primes) { $is_tiny_prime[$_] = 1; }
Now you check whether $is_tiny_prime[$some_number].
If the values in question are integers instead of strings, you can save quite a lot of space by using bit strings instead:
@articles = ( 1..10, 150..2000, 2017 ); undef $read; grep (vec($read,$_,1) = 1, @articles);
Now check whether vec($read,$n,1) is true for some $n.
Please do not use
$is_there = grep $_ eq $whatever, @array;
or worse yet
$is_there = grep /$whatever/, @array;
These are slow (checks every element even if the first matches), inefficient (same reason), and potentially buggy (what if there are regexp characters in $whatever?).
@union = @intersection = @difference = ();
%count = ();
foreach $element (@array1, @array2) { $count{$element}++ }
foreach $element (keys %count) {
    push @union, $element;
    push @{ $count{$element} > 1 ? \@intersection : \@difference }, $element;
}
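A worked run of the counting approach on two small sample arrays (made up for the example) may make the result clearer. It assumes each element appears at most once within each array, and the "difference" it produces is the symmetric difference. The keys are sorted here only to make the output order predictable.

```perl
use strict;
use warnings;

my @array1 = qw(a b c d);
my @array2 = qw(c d e f);

my (%count, @union, @intersection, @difference);
$count{$_}++ for (@array1, @array2);

for my $element (sort keys %count) {
    push @union, $element;
    if ($count{$element} > 1) { push @intersection, $element }
    else                      { push @difference,   $element }
}

print "union: @union\n";                 # a b c d e f
print "intersection: @intersection\n";   # c d
print "difference: @difference\n";       # a b e f
```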
for ($i=0; $i < @array; $i++) {
    if ($array[$i] eq "Waldo") {
        $found_index = $i;
        last;
    }
}
Now $found_index has what you want.
If you really, really wanted, you could use structures as described in the perldsc manpage or the perltoot manpage and do just what the algorithm book tells you to do.
unshift(@array, pop(@array));  # the last shall be first
push(@array, shift(@array));   # and vice versa
srand;
@new = ();
@old = 1 .. 10;  # just a demo
while (@old) {
    push(@new, splice(@old, rand @old, 1));
}
For large arrays, this avoids a lot of the reshuffling:
srand;
@new = ();
@old = 1 .. 10000;  # just a demo
for( @old ){
    my $r = rand @new+1;
    push(@new,$new[$r]);
    $new[$r] = $_;
}
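A related alternative, not taken from the text above, is the in-place Fisher-Yates shuffle, which needs no second array. The sketch below uses an illustrative subroutine name and relies on Perl 5.004 or later seeding rand() automatically.

```perl
use strict;
use warnings;

# fisher_yates_shuffle() is an illustrative name; it permutes the array in place.
sub fisher_yates_shuffle {
    my ($array) = @_;
    for (my $i = $#$array; $i > 0; $i--) {
        my $j = int rand($i + 1);            # 0 <= $j <= $i
        @$array[$i, $j] = @$array[$j, $i];   # swap the two elements
    }
}

my @deck = (1 .. 10);
fisher_yates_shuffle(\@deck);
print "@deck\n";
```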
for/foreach:
for (@lines) {
    s/foo/bar/;
    tr[a-z][A-Z];
}
Here's another; let's compute spherical volumes:
for (@radii) {
    $_ **= 3;
    $_ *= (4/3) * 3.14159;  # this will be constant folded
}
rand() function (see rand):
srand;                      # not needed for 5.004 and later
$index   = rand @array;
$element = $array[$index];
permut() function should work on any list:
#!/usr/bin/perl -n
# permute - tchrist@perl.com
permut([split], []);
sub permut {
    my @head = @{ $_[0] };
    my @tail = @{ $_[1] };
    unless (@head) {
        # stop recursing when there are no elements in the head
        print "@tail\n";
    } else {
        # for all elements in @head, move one from @head to @tail
        # and call permut() on the new @head and @tail
        my(@newhead,@newtail,$i);
        foreach $i (0 .. $#head) {
            @newhead = @head;
            @newtail = @tail;
            unshift(@newtail, splice(@newhead, $i, 1));
            permut([@newhead], [@newtail]);
        }
    }
}
sort()
(described in sort):
@list = sort { $a <=> $b } @list;
The default sort function is cmp, string comparison, which would sort (1, 2, 10) into (1, 10, 2). <=>, used above, is the numerical comparison operator.
If you have a complicated function needed to pull out the part you want to sort on, then don't do it inside the sort function. Pull it out first, because the sort BLOCK can be called many times for the same element. Here's an example of how to pull out the first word after the first number on each item, and then sort those words case-insensitively.
@idx = ();
for (@data) {
    ($item) = /\d+\s*(\S+)/;
    push @idx, uc($item);
}
@sorted = @data[ sort { $idx[$a] cmp $idx[$b] } 0 .. $#idx ];
Which could also be written this way, using a trick that's come to be known as the Schwartzian Transform:
@sorted = map  { $_->[0] }
          sort { $a->[1] cmp $b->[1] }
          map  { [ $_, uc((/\d+\s*(\S+)/)[0]) ] } @data;
If you need to sort on several fields, the following paradigm is useful.
@sorted = sort { field1($a) <=> field1($b) ||
                 field2($a) cmp field2($b) ||
                 field3($a) cmp field3($b)
               } @data;
This can be conveniently combined with precalculation of keys as given above.
See http://www.perl.com/CPAN/doc/FMTEYEWTK/sort.html for more about this approach.
See also the question below on sorting hashes.
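To show how a multi-field comparison combines with precalculated keys, here is a small sketch. The "name:age" record format and the sample data are assumptions made for the example, not something the text prescribes.

```perl
use strict;
use warnings;

# Hypothetical records of the form "name:age".
my @data = ("carol:30", "alice:25", "bob:30", "dave:25");

my @sorted =
    map  { $_->[0] }                                      # recover the original string
    sort { $a->[1] <=> $b->[1] or $a->[2] cmp $b->[2] }   # age first, then name
    map  { my ($name, $age) = split /:/; [ $_, $age, $name ] }
    @data;

print "@sorted\n";   # prints "alice:25 dave:25 bob:30 carol:30"
```

Each key is extracted once per record, so the comparison block only looks at values that are already computed.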
pack()
and unpack(),
or else
vec()
and the bitwise operations.
For example, this sets $vec to have bit N set if $ints[N] was set:
$vec = ''; foreach(@ints) { vec($vec,$_,1) = 1 }
And here's how, given a vector in $vec, you can get those bits into your @ints array:
sub bitvec_to_list {
    my $vec = shift;
    my @ints;
    # Find null-byte density then select best algorithm
    if ($vec =~ tr/\0// / length $vec > 0.95) {
        use integer;
        my $i;
        # This method is faster with mostly null-bytes
        while($vec =~ /[^\0]/g ) {
            $i = -9 + 8 * pos $vec;
            push @ints, $i if vec($vec, ++$i, 1);
            push @ints, $i if vec($vec, ++$i, 1);
            push @ints, $i if vec($vec, ++$i, 1);
            push @ints, $i if vec($vec, ++$i, 1);
            push @ints, $i if vec($vec, ++$i, 1);
            push @ints, $i if vec($vec, ++$i, 1);
            push @ints, $i if vec($vec, ++$i, 1);
            push @ints, $i if vec($vec, ++$i, 1);
        }
    } else {
        # This method is a fast general algorithm
        use integer;
        my $bits = unpack "b*", $vec;
        push @ints, 0 if $bits =~ s/^(\d)// && $1;
        push @ints, pos $bits while($bits =~ /1/g);
    }
    return \@ints;
}
This method gets faster the more sparse the bit vector is. (Courtesy of Tim Bunce and Winfried Koenig.)
each() function (see each) if you don't care whether it's sorted:
while (($key,$value) = each %hash) {
    print "$key = $value\n";
}
If you want it sorted, you'll have to use foreach() on the result of sorting the keys as shown in an earlier question.
%by_value = reverse %by_key; $key = $by_value{$value};
That's not particularly efficient. It would be more space-efficient to use:
while (($key, $value) = each %by_key) {
    $by_value{$value} = $key;
}
If your hash could have repeated values, the methods above will only find one of the associated keys. This may or may not worry you.
keys()
function:
$num_keys = scalar keys %hash;
In void context it just resets the iterator, which is faster for tied hashes.
@keys = sort keys %hash;                              # sorted by key
@keys = sort { $hash{$a} cmp $hash{$b} } keys %hash;  # and by value
Here we'll do a reverse numeric sort by value, and if two values are identical, sort by length of key, and if that fails, by straight ASCII comparison of the keys (well, possibly modified by your locale -- see the perllocale manpage).
@keys = sort {
            $hash{$b} <=> $hash{$a}
                      ||
            length($b) <=> length($a)
                      ||
                  $a cmp $b
        } keys %hash;
tie()
using the
$DB_BTREE
hash bindings as documented in In Memory Databases.
$key is present in the array, exists($array{$key}) will return true. The value for a given key can be undef, in which case $array{$key} will be undef while exists($array{$key}) will return true. This corresponds to ($key, undef) being in the hash.
Pictures help... here's the %ary table:
      keys  values
    +------+------+
    |  a   |  3   |
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+
And these conditions hold
$ary{'a'}                       is true
$ary{'d'}                       is false
defined $ary{'d'}               is true
defined $ary{'a'}               is true
exists $ary{'a'}                is true (perl5 only)
grep ($_ eq 'a', keys %ary)     is true
If you now say
undef $ary{'a'}
your table now reads:
      keys  values
    +------+------+
    |  a   | undef|
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+
and these conditions now hold; changes in caps:
$ary{'a'}                       is FALSE
$ary{'d'}                       is false
defined $ary{'d'}               is true
defined $ary{'a'}               is FALSE
exists $ary{'a'}                is true (perl5 only)
grep ($_ eq 'a', keys %ary)     is true
Notice the last two: you have an undef value, but a defined key!
Now, consider this:
delete $ary{'a'}
your table now reads:
      keys  values
    +------+------+
    |  x   |  7   |
    |  d   |  0   |
    |  e   |  2   |
    +------+------+
and these conditions now hold; changes in caps:
$ary{'a'}                       is false
$ary{'d'}                       is false
defined $ary{'d'}               is true
defined $ary{'a'}               is false
exists $ary{'a'}                is FALSE (perl5 only)
grep ($_ eq 'a', keys %ary)     is FALSE
See, the whole entry is gone!
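The tables above can be reproduced with a few lines of code. This sketch uses the same sample hash and records the result of each test as 1 or 0; the variable names are illustrative.

```perl
use strict;
use warnings;

my %ary = (a => 3, x => 7, d => 0, e => 2);

undef $ary{'a'};                          # the key stays, the value becomes undef
my $exists_after_undef  = exists  $ary{'a'} ? 1 : 0;
my $defined_after_undef = defined $ary{'a'} ? 1 : 0;

delete $ary{'a'};                         # the whole entry goes away
my $exists_after_delete = exists $ary{'a'} ? 1 : 0;

print "$exists_after_undef $defined_after_undef $exists_after_delete\n";
# prints "1 0 0"
```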
EXISTS()
and DEFINED()
methods differently. For example, there isn't the concept of undef with hashes that are tied to
DBM* files. This means the true/false tables above will give different results when used on such a hash. It also means that exists and defined do the same thing with a
DBM* file, and what they end up doing is not what they do with ordinary hashes.
keys %hash
in a scalar context returns the number of keys in the hash and resets the iterator associated with the hash. You may need to do this if
you use last
to exit a loop early so that when you re-enter it, the hash iterator has
been reset.
%seen = ();
for $element (keys(%foo), keys(%bar)) {
    $seen{$element}++;
}
@uniq = keys %seen;
Or more succinctly:
@uniq = keys %{{%foo,%bar}};
Or if you really want to save space:
%seen = ();
while (defined ($key = each %foo)) {
    $seen{$key}++;
}
while (defined ($key = each %bar)) {
    $seen{$key}++;
}
@uniq = keys %seen;
use Tie::IxHash;
tie(%myhash, Tie::IxHash);
for ($i=0; $i<20; $i++) {
    $myhash{$i} = 2*$i;
}
@keys = keys %myhash;
# @keys = (0,1,2,3,...)
somefunc($hash{"nonesuch key here"});
Then that element ``autovivifies''; that is, it springs into existence
whether you store something there or not. That's because functions get
scalars passed in by reference. If somefunc()
modifies $_[0]
, it has to be ready to write it back into the caller's version.
This has been fixed as of perl5.004.
Normally, merely accessing a key's value for a nonexistent key does not cause that key to be forever there. This is different than awk's behavior.
if (`cat /vmunix` =~ /gzip/) { print "Your kernel is GNU-zip enabled!\n"; }
On some systems, however, you have to play tedious games with ``text'' versus ``binary'' files. See binmode.
If you're concerned about 8-bit ASCII data, then see the perllocale manpage.
If you want to deal with multibyte characters, however, there are some gotchas. See the section on Regular Expressions.
warn "has nondigits"        if     /\D/;
warn "not a whole number"   unless /^\d+$/;
warn "not an integer"       unless /^-?\d+$/;        # reject +3
warn "not an integer"       unless /^[+-]?\d+$/;
warn "not a decimal number" unless /^-?\d+\.?\d*$/;  # rejects .2
warn "not a decimal number" unless /^-?(?:\d+(?:\.\d*)?|\.\d+)$/;
warn "not a C float"
    unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;
Or you could check out http://www.perl.com/CPAN/modules/by-module/String/String-Scanf-1.1.tar.gz instead. The POSIX module (part of the standard Perl distribution) provides the strtol and strtod functions for converting strings to longs and doubles, respectively.
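A brief sketch of how the POSIX conversion functions can double as validity checks. In list context each returns the converted number together with the count of trailing characters it could not parse, so a nonzero count signals that the string was not purely numeric. The sample strings are arbitrary.

```perl
use strict;
use warnings;
use POSIX qw(strtol strtod);

# In list context both return the converted number and the count of
# trailing characters that could not be parsed.
my ($long,   $left_l) = strtol("123abc", 10);
my ($double, $left_d) = strtod("1e3 kg");

print "$long ($left_l unparsed), $double ($left_d unparsed)\n";
# prints "123 (3 unparsed), 1000 (3 unparsed)"
```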
use FreezeThaw qw(freeze thaw); $new = thaw freeze $old;
Where $old can be (a reference to) any kind of data structure you'd like. It will be deeply copied.
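As an alternative to FreezeThaw, and assuming a Perl installation that includes the Storable module, its dclone() function performs a similar deep copy. The data structure below is invented for the example.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my $old = { name => "widget", sizes => [1, 2, 3] };
my $new = dclone($old);          # recursive copy of the whole structure

push @{ $new->{sizes} }, 4;      # changing the copy...
my $old_count = scalar @{ $old->{sizes} };
print "$old_count\n";            # ...leaves the original at 3 elements
```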