Metaphorical Dream

December 13, 2009 (1)

Second hands-on round: Hadoop multi-node configuration

I tried running Hadoop in a multi-node configuration to check that it works.

It's only the sample WordCount program, though (^^;
For the map input, I generated a 103 MB txt file with Perl.
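The Perl script itself isn't shown here, so the following is only a rough sketch of the same idea in Python, not the script that was actually used. The word list, the words-per-line range, and the function name generate_text_file are all made up for illustration.

```python
import os
import random
import tempfile

# Hypothetical vocabulary: the original Perl script and its word list are not shown.
WORDS = ["hadoop", "map", "reduce", "slave", "master", "wordcount"]


def generate_text_file(path, target_bytes, seed=0):
    """Write lines of random words until at least target_bytes have been written."""
    rng = random.Random(seed)  # fixed seed so repeated runs produce the same file
    written = 0
    with open(path, "w", encoding="ascii", newline="\n") as f:
        while written < target_bytes:
            word_count = rng.randint(1, 5)  # assumed line length, purely illustrative
            line = " ".join(rng.choice(WORDS) for _ in range(word_count)) + "\n"
            f.write(line)
            written += len(line)  # ASCII only, so characters equal bytes
    return written


if __name__ == "__main__":
    # Small demo size; 103 * 1024 * 1024 would be closer to the file in this post.
    out = os.path.join(tempfile.gettempdir(), "wordcount_input.txt")
    size = generate_text_file(out, 1024 * 1024)
    print(f"wrote {size} bytes to {out}")
```

The generated file would then be copied into the HDFS input directory before running the job.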

Then I measured the run time while adding slave machines one at a time, up to three.

1 slave = 1 min 23 s
2 slaves = 49 s
3 slaves = 1 min 2 s
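These times can be recomputed from the first and last timestamps of each job log further down. The small sketch below does that arithmetic. It assumes the three logs correspond to the 1-, 2-, and 3-slave runs in that order, which the logs don't state but which matches the figures above.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%y/%m/%d %H:%M:%S"  # timestamp layout in the logs, e.g. "09/12/13 18:06:32"


def elapsed_seconds(first_line_time, last_line_time):
    """Seconds between the first and last log line of one job run."""
    start = datetime.strptime(first_line_time, LOG_TIME_FORMAT)
    end = datetime.strptime(last_line_time, LOG_TIME_FORMAT)
    return int((end - start).total_seconds())


# (first timestamp, last timestamp) per run; the slave-count mapping is an assumption.
runs = {
    1: ("09/12/13 18:06:32", "09/12/13 18:07:55"),
    2: ("09/12/13 19:16:15", "09/12/13 19:17:04"),
    3: ("09/12/13 19:03:25", "09/12/13 19:04:27"),
}

for slaves, (start, end) in runs.items():
    secs = elapsed_seconds(start, end)
    print(f"{slaves} slave(s): {secs // 60} min {secs % 60} s ({secs} s)")
```

This gives 83 s, 49 s, and 62 s respectively.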

Two slaves turning out faster than three... let's just call that a quirk (lol)

Incidentally, the master and the slaves are all virtual machines (CentOS54x64) running on VMWS2.0.

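So the absolute physical resources (CPU, memory, etc.) stay the same,
and the speedup over the single-slave run (1 min 23 s down to 49 s and 1 min 2 s) can be understood as coming simply from the larger number of slaves.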
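To put a number on that improvement, here is a minimal sketch that computes the speedup of each run relative to the single-slave run, using the three measured times in seconds (83, 49, 62). The helper name speedup is just for illustration.

```python
def speedup(baseline_seconds, seconds):
    """How many times faster a run is compared with the baseline run."""
    return baseline_seconds / seconds


# Measured wall-clock times from this post, keyed by number of slaves.
times = {1: 83, 2: 49, 3: 62}

for slaves, secs in times.items():
    ratio = speedup(times[1], secs)
    print(f"{slaves} slave(s): {secs} s, x{ratio:.2f} vs. 1 slave")
```

That works out to roughly x1.69 with two slaves and x1.34 with three. These are single measurements on virtual machines sharing one physical host, so the numbers are only a rough indication.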

Impressive, I guess?
Living up to the hype, I guess?

Next time I'd like to try a 1 GB txt file with about 5 slaves.
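For planning that run, a rough way to guess how many map tasks a given input size produces is sketched below. It assumes a 64 MB HDFS block size and default split settings, neither of which is confirmed in this post, and the 1.1 slack factor reflects my understanding of how FileInputFormat cuts splits rather than anything checked on this cluster.

```python
SPLIT_SLOP = 1.1  # assumed: the last split may run up to 10% over the split size


def estimate_splits(file_bytes, split_bytes=64 * 1024 * 1024):
    """Estimate the number of input splits (map tasks) for a single file."""
    splits = 0
    remaining = file_bytes
    # Cut full-size splits while clearly more than one split's worth remains.
    while remaining / split_bytes > SPLIT_SLOP:
        splits += 1
        remaining -= split_bytes
    # Whatever is left becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


print(estimate_splits(106208077))  # HDFS_BYTES_READ reported in the logs
print(estimate_splits(1024 ** 3))  # a 1 GiB input
```

Under these assumptions the 103 MB file gives 2 splits, which agrees with the "Launched map tasks=2" line in every log, and a 1 GiB file would give 16. With only two map tasks there may simply have been nothing for a third slave to pick up, which could be part of why the 3-slave run wasn't faster.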


[root@nm01 ~]# hadoop-0.20 jar /usr/lib/hadoop-0.20/hadoop-*-examples.jar wordcount inputs outputs
09/12/13 18:06:32 INFO input.FileInputFormat: Total input paths to process : 1
09/12/13 18:06:32 INFO mapred.JobClient: Running job: job_200912131758_0003
09/12/13 18:06:33 INFO mapred.JobClient: map 0% reduce 0%
09/12/13 18:06:51 INFO mapred.JobClient: map 5% reduce 0%
09/12/13 18:06:54 INFO mapred.JobClient: map 6% reduce 0%
09/12/13 18:06:57 INFO mapred.JobClient: map 12% reduce 0%
09/12/13 18:07:00 INFO mapred.JobClient: map 19% reduce 0%
09/12/13 18:07:03 INFO mapred.JobClient: map 25% reduce 0%
09/12/13 18:07:07 INFO mapred.JobClient: map 33% reduce 0%
09/12/13 18:07:10 INFO mapred.JobClient: map 38% reduce 0%
09/12/13 18:07:13 INFO mapred.JobClient: map 43% reduce 0%
09/12/13 18:07:16 INFO mapred.JobClient: map 48% reduce 0%
09/12/13 18:07:19 INFO mapred.JobClient: map 55% reduce 0%
09/12/13 18:07:22 INFO mapred.JobClient: map 60% reduce 0%
09/12/13 18:07:25 INFO mapred.JobClient: map 65% reduce 0%
09/12/13 18:07:28 INFO mapred.JobClient: map 73% reduce 0%
09/12/13 18:07:31 INFO mapred.JobClient: map 82% reduce 0%
09/12/13 18:07:34 INFO mapred.JobClient: map 90% reduce 0%
09/12/13 18:07:38 INFO mapred.JobClient: map 100% reduce 0%
09/12/13 18:07:53 INFO mapred.JobClient: map 100% reduce 100%
09/12/13 18:07:55 INFO mapred.JobClient: Job complete: job_200912131758_0003
09/12/13 18:07:55 INFO mapred.JobClient: Counters: 17
09/12/13 18:07:55 INFO mapred.JobClient: Job Counters
09/12/13 18:07:55 INFO mapred.JobClient: Launched reduce tasks=1
09/12/13 18:07:55 INFO mapred.JobClient: Launched map tasks=2
09/12/13 18:07:55 INFO mapred.JobClient: Data-local map tasks=2
09/12/13 18:07:55 INFO mapred.JobClient: FileSystemCounters
09/12/13 18:07:55 INFO mapred.JobClient: FILE_BYTES_READ=1738984
09/12/13 18:07:55 INFO mapred.JobClient: HDFS_BYTES_READ=106208077
09/12/13 18:07:55 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1799550
09/12/13 18:07:55 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=38346
09/12/13 18:07:55 INFO mapred.JobClient: Map-Reduce Framework
09/12/13 18:07:55 INFO mapred.JobClient: Reduce input groups=1815
09/12/13 18:07:55 INFO mapred.JobClient: Combine output records=45141
09/12/13 18:07:55 INFO mapred.JobClient: Map input records=5060900
09/12/13 18:07:55 INFO mapred.JobClient: Reduce shuffle bytes=60502
09/12/13 18:07:55 INFO mapred.JobClient: Reduce output records=1815
09/12/13 18:07:55 INFO mapred.JobClient: Spilled Records=78157
09/12/13 18:07:55 INFO mapred.JobClient: Map output bytes=155807080
09/12/13 18:07:55 INFO mapred.JobClient: Combine input records=13775130
09/12/13 18:07:55 INFO mapred.JobClient: Map output records=13732600
09/12/13 18:07:55 INFO mapred.JobClient: Reduce input records=2611
[root@nm01 ~]#


[root@nm01 ~]# hadoop-0.20 jar /usr/lib/hadoop-0.20/hadoop-*-examples.jar wordcount inputs outputs
09/12/13 19:16:15 INFO input.FileInputFormat: Total input paths to process : 1
09/12/13 19:16:15 INFO mapred.JobClient: Running job: job_200912131907_0002
09/12/13 19:16:16 INFO mapred.JobClient: map 0% reduce 0%
09/12/13 19:16:30 INFO mapred.JobClient: map 9% reduce 0%
09/12/13 19:16:31 INFO mapred.JobClient: map 22% reduce 0%
09/12/13 19:16:33 INFO mapred.JobClient: map 27% reduce 0%
09/12/13 19:16:34 INFO mapred.JobClient: map 37% reduce 0%
09/12/13 19:16:36 INFO mapred.JobClient: map 43% reduce 0%
09/12/13 19:16:37 INFO mapred.JobClient: map 54% reduce 0%
09/12/13 19:16:39 INFO mapred.JobClient: map 59% reduce 0%
09/12/13 19:16:40 INFO mapred.JobClient: map 70% reduce 0%
09/12/13 19:16:42 INFO mapred.JobClient: map 75% reduce 0%
09/12/13 19:16:43 INFO mapred.JobClient: map 82% reduce 0%
09/12/13 19:16:45 INFO mapred.JobClient: map 88% reduce 0%
09/12/13 19:16:47 INFO mapred.JobClient: map 95% reduce 0%
09/12/13 19:16:50 INFO mapred.JobClient: map 100% reduce 0%
09/12/13 19:16:52 INFO mapred.JobClient: map 100% reduce 16%
09/12/13 19:17:02 INFO mapred.JobClient: map 100% reduce 100%
09/12/13 19:17:04 INFO mapred.JobClient: Job complete: job_200912131907_0002
09/12/13 19:17:04 INFO mapred.JobClient: Counters: 17
09/12/13 19:17:04 INFO mapred.JobClient: Job Counters
09/12/13 19:17:04 INFO mapred.JobClient: Launched reduce tasks=1
09/12/13 19:17:04 INFO mapred.JobClient: Launched map tasks=2
09/12/13 19:17:04 INFO mapred.JobClient: Data-local map tasks=2
09/12/13 19:17:04 INFO mapred.JobClient: FileSystemCounters
09/12/13 19:17:04 INFO mapred.JobClient: FILE_BYTES_READ=1738984
09/12/13 19:17:04 INFO mapred.JobClient: HDFS_BYTES_READ=106208077
09/12/13 19:17:04 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1799550
09/12/13 19:17:04 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=38346
09/12/13 19:17:04 INFO mapred.JobClient: Map-Reduce Framework
09/12/13 19:17:04 INFO mapred.JobClient: Reduce input groups=1815
09/12/13 19:17:04 INFO mapred.JobClient: Combine output records=45141
09/12/13 19:17:04 INFO mapred.JobClient: Map input records=5060900
09/12/13 19:17:04 INFO mapred.JobClient: Reduce shuffle bytes=60502
09/12/13 19:17:04 INFO mapred.JobClient: Reduce output records=1815
09/12/13 19:17:04 INFO mapred.JobClient: Spilled Records=78157
09/12/13 19:17:04 INFO mapred.JobClient: Map output bytes=155807080
09/12/13 19:17:04 INFO mapred.JobClient: Combine input records=13775130
09/12/13 19:17:04 INFO mapred.JobClient: Map output records=13732600
09/12/13 19:17:04 INFO mapred.JobClient: Reduce input records=2611


[root@nm01 ~]# hadoop-0.20 jar /usr/lib/hadoop-0.20/hadoop-*-examples.jar wordcount inputs outputs
09/12/13 19:03:25 INFO input.FileInputFormat: Total input paths to process : 1
09/12/13 19:03:26 INFO mapred.JobClient: Running job: job_200912131901_0001
09/12/13 19:03:27 INFO mapred.JobClient: map 0% reduce 0%
09/12/13 19:03:46 INFO mapred.JobClient: map 6% reduce 0%
09/12/13 19:03:48 INFO mapred.JobClient: map 19% reduce 0%
09/12/13 19:03:49 INFO mapred.JobClient: map 24% reduce 0%
09/12/13 19:03:52 INFO mapred.JobClient: map 30% reduce 0%
09/12/13 19:03:54 INFO mapred.JobClient: map 40% reduce 0%
09/12/13 19:03:55 INFO mapred.JobClient: map 46% reduce 0%
09/12/13 19:03:57 INFO mapred.JobClient: map 56% reduce 0%
09/12/13 19:04:00 INFO mapred.JobClient: map 79% reduce 0%
09/12/13 19:04:03 INFO mapred.JobClient: map 84% reduce 0%
09/12/13 19:04:06 INFO mapred.JobClient: map 89% reduce 0%
09/12/13 19:04:09 INFO mapred.JobClient: map 95% reduce 16%
09/12/13 19:04:12 INFO mapred.JobClient: map 100% reduce 16%
09/12/13 19:04:25 INFO mapred.JobClient: map 100% reduce 100%
09/12/13 19:04:27 INFO mapred.JobClient: Job complete: job_200912131901_0001
09/12/13 19:04:27 INFO mapred.JobClient: Counters: 17
09/12/13 19:04:27 INFO mapred.JobClient: Job Counters
09/12/13 19:04:27 INFO mapred.JobClient: Launched reduce tasks=1
09/12/13 19:04:27 INFO mapred.JobClient: Launched map tasks=2
09/12/13 19:04:27 INFO mapred.JobClient: Data-local map tasks=2
09/12/13 19:04:27 INFO mapred.JobClient: FileSystemCounters
09/12/13 19:04:27 INFO mapred.JobClient: FILE_BYTES_READ=1738984
09/12/13 19:04:27 INFO mapred.JobClient: HDFS_BYTES_READ=106208077
09/12/13 19:04:27 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1799550
09/12/13 19:04:27 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=38346
09/12/13 19:04:27 INFO mapred.JobClient: Map-Reduce Framework
09/12/13 19:04:27 INFO mapred.JobClient: Reduce input groups=1815
09/12/13 19:04:27 INFO mapred.JobClient: Combine output records=45141
09/12/13 19:04:27 INFO mapred.JobClient: Map input records=5060900
09/12/13 19:04:27 INFO mapred.JobClient: Reduce shuffle bytes=60502
09/12/13 19:04:27 INFO mapred.JobClient: Reduce output records=1815
09/12/13 19:04:27 INFO mapred.JobClient: Spilled Records=78157
09/12/13 19:04:27 INFO mapred.JobClient: Map output bytes=155807080
09/12/13 19:04:27 INFO mapred.JobClient: Combine input records=13775130
09/12/13 19:04:27 INFO mapred.JobClient: Map output records=13732600
09/12/13 19:04:27 INFO mapred.JobClient: Reduce input records=2611

by mdesign21 | 2009-12-13 19:28 | IT