Monday, July 19, 2021

Many service error message: Audit Pipeline Test Bad

Issue:

Many Service is warn with Audit Pipeline Test Bad.

Caused:

Navigator Audit Server is not responding although is still health (green) in CM

Resolution:

- Try to restart Navigator Audit Server
- Check if database Nav Audit Server is up
- Check if disk space is available

Wednesday, July 14, 2021

Kafka client failed to connect to broker with ip address

Issue:

Kafka consumer failed to connect with following error:


$ kafka-console-consumer --consumer.config kafka.client.properties --bootstrap-server 192.168.1.1:9094 --topic test --from-beginning
[2021-07-14 08:14:26,331] ERROR [Consumer clientId=consumer-console-consumer-94417-1, groupId=console-consumer-94417] Connection to node -1 (/192.168.1.1:9094) failed authentication due to: SSL handshake failed (org.apache.kafka.clients.NetworkClient)
[2021-07-14 08:14:26,332] WARN [Consumer clientId=consumer-console-consumer-94417-1, groupId=console-consumer-94417] Bootstrap broker 192.168.1.1:9094 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
[2021-07-14 08:14:26,395] ERROR Error processing message, terminating consumer process:  (kafka.tools.ConsoleConsumer$)
org.apache.kafka.common.errors.SslAuthenticationException: SSL handshake failed
Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
	at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1529)
	at sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:535)
	at sun.security.ssl.SSLEngineImpl.writeAppRecord(SSLEngineImpl.java:1214)
	at sun.security.ssl.SSLEngineImpl.wrap(SSLEngineImpl.java:1186)
	at javax.net.ssl.SSLEngine.wrap(SSLEngine.java:469)
	at org.apache.kafka.common.network.SslTransportLayer.handshakeWrap(SslTransportLayer.java:486)
	at org.apache.kafka.common.network.SslTransportLayer.doHandshake(SslTransportLayer.java:349)
	at org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:299)
	at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:188)
	at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:551)
	at org.apache.kafka.common.network.Selector.poll(Selector.java:488)
	at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:550)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:212)
	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:236)
	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:463)
	at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1275)
	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1241)
	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216)
	at kafka.tools.ConsoleConsumer$ConsumerWrapper.receive(ConsoleConsumer.scala:437)
	at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:103)
	at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:77)
	at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:54)
	at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
	at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
	at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1728)
	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:330)
	at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:322)
	at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1614)
	at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
	at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1052)
	at sun.security.ssl.Handshaker$1.run(Handshaker.java:992)
	at sun.security.ssl.Handshaker$1.run(Handshaker.java:989)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1467)
	at org.apache.kafka.common.network.SslTransportLayer.runDelegatedTasks(SslTransportLayer.java:438)
	at org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(SslTransportLayer.java:522)
	at org.apache.kafka.common.network.SslTransportLayer.doHandshake(SslTransportLayer.java:376)
	at org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:299)
	at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:188)
	at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:551)
	at org.apache.kafka.common.network.Selector.poll(Selector.java:488)
	at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:550)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233)
	at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1308)
	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1248)
	... 6 more
Caused by: java.security.cert.CertificateException: No subject alternative names matching IP address 192.168.1.1 found
	at sun.security.util.HostnameChecker.matchIP(HostnameChecker.java:168)
	at sun.security.util.HostnameChecker.match(HostnameChecker.java:94)
	at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:455)
	at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:436)
	at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:252)
	at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:136)
	at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1601)
	... 24 more
Processed a total of 0 messages


Caused:

Kafka broker doesn't have IP Address entry in SAN certificates


Resolution:

Connect with hostname/FQDN that match with cert CN or SAN entry

Monday, July 12, 2021

NiFi Too many open files

Issue:

NiFi failed with following error:



2021-07-12 16:19:37,366 ERROR org.apache.nifi.provenance.index.lucene.EventIndexTask: Failed to index Provenance Events
org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
	at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:681)
	at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:695)
	at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1281)
	at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1257)
	at org.apache.nifi.provenance.lucene.LuceneEventIndexWriter.index(LuceneEventIndexWriter.java:70)
	at org.apache.nifi.provenance.index.lucene.EventIndexTask.index(EventIndexTask.java:202)
	at org.apache.nifi.provenance.index.lucene.EventIndexTask.run(EventIndexTask.java:113)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.file.FileSystemException: /var/lib/nifi/provenance_repository/lucene-8-index-1625880478135/_p_Lucene50_0.tim: Too many open files
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
	at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
	at java.nio.channels.FileChannel.open(FileChannel.java:287)
	at java.nio.channels.FileChannel.open(FileChannel.java:335)
	at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:238)
	at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:100)
	at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:100)
	at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:100)
	at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:157)
	at org.apache.lucene.codecs.lucene50.Lucene50CompoundFormat.write(Lucene50CompoundFormat.java:89)
	at org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:5010)
	at org.apache.lucene.index.DocumentsWriterPerThread.sealFlushedSegment(DocumentsWriterPerThread.java:575)
	at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:514)
	at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:554)
	at org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:417)
	at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:470)
	at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1284)
	... 9 common frames omitted

Caused:

NiFi process open files is over from ulimit


Resolution:

For NiFi Standalone

Add this to nifi-env.sh:

export MAX_FD=999999
export MAX_FD_LIMIT=999999


For NiFi in CDP

Change rlimit_fds value in CM > NiFi > Configuration

Friday, July 9, 2021

Hive Invalid function aes_encrypt

Issue:

User unable to use aes_encrypt function with following error:


0: jdbc:hive2://node01.kiosafiy.com:1> SELECT base64(aes_encrypt('ABC', '1234567890123456'));
Error: Error while compiling statement: FAILED: SemanticException [Error 10011]: Invalid function aes_encrypt (state=42000,code=10011)

Caused:

According to [1] aes_encrypt function should be built-in in Hive 1.3.0+ but for some reason it not available by default in CDH6 Distro.

Resolution:

Add aes_encrypt UDF function using this repo [2]. Here is the result:


0: jdbc:hive2://node01.kiosafiy.com:1> select base64(aes_encrypt('ABC', '1234567890123456'));
INFO  : Compiling command(queryId=hive_20210709105720_7d3c119d-891a-4f2b-88ba-c2ae620f342d): select base64(aes_encrypt('ABC', '1234567890123456'))
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20210709105720_7d3c119d-891a-4f2b-88ba-c2ae620f342d); Time taken: 0.258 seconds
INFO  : Executing command(queryId=hive_20210709105720_7d3c119d-891a-4f2b-88ba-c2ae620f342d): select base64(aes_encrypt('ABC', '1234567890123456'))
INFO  : Completed executing command(queryId=hive_20210709105720_7d3c119d-891a-4f2b-88ba-c2ae620f342d); Time taken: 0.007 seconds
INFO  : OK
+---------------------------+
|            _c0            |
+---------------------------+
| y6Ss+zCYObpCbgfWfyNWTw==  |
+---------------------------+
1 row selected (0.326 seconds)

0: jdbc:hive2://node01.kiosafiy.com:1> select aes_decrypt(unbase64('y6Ss+zCYObpCbgfWfyNWTw=='), '1234567890123456');
INFO  : Compiling command(queryId=hive_20210709110540_4493545c-16a6-451e-8f78-11ae1c5e387b): select aes_decrypt(unbase64('y6Ss+zCYObpCbgfWfyNWTw=='), '1234567890123456')
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:binary, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20210709110540_4493545c-16a6-451e-8f78-11ae1c5e387b); Time taken: 0.235 seconds
INFO  : Executing command(queryId=hive_20210709110540_4493545c-16a6-451e-8f78-11ae1c5e387b): select aes_decrypt(unbase64('y6Ss+zCYObpCbgfWfyNWTw=='), '1234567890123456')
INFO  : Completed executing command(queryId=hive_20210709110540_4493545c-16a6-451e-8f78-11ae1c5e387b); Time taken: 0.01 seconds
INFO  : OK
+------+
| _c0  |
+------+
| ABC  |
+------+
1 row selected (0.36 seconds)

References:

Unable to login to OS with sssd auth AD

Issue:

User cannot login to machine configured with sssd auth to Active Directory. Following error is found when check sssd status



# systemctl status sssd -l
● sssd.service - System Security Services Daemon
   Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2021-06-24 17:29:25 WIB; 1 weeks 6 days ago
 Main PID: 32616 (sssd)
   CGroup: /system.slice/sssd.service
           ├─32616 /usr/sbin/sssd -i --logger=files
           ├─32617 /usr/libexec/sssd/sssd_be --domain kiosafiy.com --uid 0 --gid 0 --logger=files
           ├─32618 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --logger=files
           └─32619 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --logger=files

Jul 08 16:40:45 node103.kiosafiy.com [sssd[ldap_child[46198]]][46198]: Failed to initialize credentials using keytab [MEMORY:/etc/krb5.keytab]: Preauthentication failed. Unable to create GSSAPI-encrypted LDAP connection.
Jul 08 16:40:46 node103.kiosafiy.com [sssd[ldap_child[46200]]][46200]: Failed to initialize credentials using keytab [MEMORY:/etc/krb5.keytab]: Preauthentication failed. Unable to create GSSAPI-encrypted LDAP connection.
Jul 08 16:44:45 node103.kiosafiy.com [sssd[ldap_child[46588]]][46588]: Failed to initialize credentials using keytab [MEMORY:/etc/krb5.keytab]: Preauthentication failed. Unable to create GSSAPI-encrypted LDAP connection.
Jul 08 16:46:18 node103.kiosafiy.com [sssd[ldap_child[47533]]][47533]: Failed to initialize credentials using keytab [MEMORY:/etc/krb5.keytab]: Preauthentication failed. Unable to create GSSAPI-encrypted LDAP connection.
Jul 08 16:46:18 node103.kiosafiy.com [sssd[ldap_child[47538]]][47538]: Failed to initialize credentials using keytab [MEMORY:/etc/krb5.keytab]: Preauthentication failed. Unable to create GSSAPI-encrypted LDAP connection.
Jul 08 16:47:47 node103.kiosafiy.com [sssd[ldap_child[47979]]][47979]: Failed to initialize credentials using keytab [MEMORY:/etc/krb5.keytab]: Preauthentication failed. Unable to create GSSAPI-encrypted LDAP connection.
Jul 08 16:47:48 node103.kiosafiy.com [sssd[ldap_child[47986]]][47986]: Failed to initialize credentials using keytab [MEMORY:/etc/krb5.keytab]: Preauthentication failed. Unable to create GSSAPI-encrypted LDAP connection.
Jul 08 16:49:06 node103.kiosafiy.com [sssd[ldap_child[48395]]][48395]: Failed to initialize credentials using keytab [MEMORY:/etc/krb5.keytab]: Preauthentication failed. Unable to create GSSAPI-encrypted LDAP connection.
Jul 08 16:49:10 node103.kiosafiy.com [sssd[ldap_child[48398]]][48398]: Failed to initialize credentials using keytab [MEMORY:/etc/krb5.keytab]: Preauthentication failed. Unable to create GSSAPI-encrypted LDAP connection.
Jul 08 16:50:34 node103.kiosafiy.com [sssd[ldap_child[48736]]][48736]: Failed to initialize credentials using keytab [MEMORY:/etc/krb5.keytab]: Preauthentication failed. Unable to create GSSAPI-encrypted LDAP connection.

Caused:

Probably because of editing /etc/krb5.conf after join domain. It make /etc/krb5.keytab file is invalid.

Resolution:

- Leave domain with command: realm leave
- Rejoin domain again

References:

- https://access.redhat.com/solutions/3380341

Thursday, July 8, 2021

NiFi instance disconnected

Issue:



2021-07-08 09:13:55,657 ERROR org.apache.nifi.controller.StandardFlowService: Failed to load flow from cluster due to: org.apache.nifi.controller.UninheritableFlowException: Failed to connect node to cluster because local flow is different than cluster flow.
org.apache.nifi.controller.UninheritableFlowException: Failed to connect node to cluster because local flow is different than cluster flow.
	at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1026)
	at org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:539)
	at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:1043)
	at org.apache.nifi.NiFi.init(NiFi.java:158)
	at org.apache.nifi.NiFi.init(NiFi.java:72)
	at org.apache.nifi.NiFi.main(NiFi.java:301)
Caused by: org.apache.nifi.controller.UninheritableFlowException: Proposed configuration is not inheritable by the flow controller because of flow differences: Found difference in Flows:
Local Fingerprint:
			"type": ["null","string"]
		},
		{
			"name": "FILE_SIZE",
			"type": ["null","string"]
		}
	]
}bbb0a0dc-413c-36d1-8464-05c1dfc0d1e8NO_VALUENO_VALUENO_VERSION_CONTROL_INFORMATION0487043f-6db3-3707
Cluster Fingerprint:
			"type": ["null","string"]
		},
		{
			"name": "FILE_SIZE",
			"type": ["null","string"]
		}
	]
}9a1935a1-3e91-1a85-9189-b185f4419739NO_VALUENO_VALUENO_VERSION_CONTROL_INFORMATION03e4304b-2263-1e0c
	at org.apache.nifi.controller.StandardFlowSynchronizer.sync(StandardFlowSynchronizer.java:315)
	at org.apache.nifi.controller.FlowController.synchronize(FlowController.java:1408)
	at org.apache.nifi.persistence.StandardXMLFlowConfigurationDAO.load(StandardXMLFlowConfigurationDAO.java:88)
	at org.apache.nifi.controller.StandardFlowService.loadFromBytes(StandardFlowService.java:812)
	at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1001)
	... 5 common frames omitted


Caused:

File "flow.xml.gz" is not sync with other nodes.

Resolution:

Copy file "flow.xml.gz" from other node then restart NiFi instance.

Monday, June 14, 2021

Impala query memory limit exceeded

Issue:

Query submitted at: 2021-06-09 11:34:33 (Coordinator: https://node01:25000)
Query progress can be monitored at: https://node01:25000/query_plan?query_id=a546f1f46aed278a:8a9dcd1a00000000
WARNINGS: Memory limit exceeded: Error occurred on backend node06.example.com:22000 by fragment a546f1f46aed278a:8a9dcd1a0000000a
Memory left in process limit: 28.25 GB
Memory left in query limit: -1.39 MB
Query(a546f1f46aed278a:8a9dcd1a00000000): memory limit exceeded. Limit=3.00 GB Reservation=2.21 GB ReservationLimit=2.40 GB OtherMemory=809.39 MB Total=3.00 GB Peak=3.00 GB
  Fragment a546f1f46aed278a:8a9dcd1a00000041: Reservation=2.21 GB OtherMemory=14.91 MB Total=2.23 GB Peak=2.23 GB
    Runtime Filter Bank: Reservation=2.00 MB ReservationLimit=2.00 MB OtherMemory=0 Total=2.00 MB Peak=2.00 MB
    HASH_JOIN_NODE (id=2): Reservation=2.21 GB OtherMemory=92.25 KB Total=2.21 GB Peak=2.21 GB
      Exprs: Total=38.12 KB Peak=38.12 KB
      Hash Join Builder (join_node_id=2): Total=38.12 KB Peak=38.12 KB
        Hash Join Builder (join_node_id=2) Exprs: Total=38.12 KB Peak=38.12 KB
    HDFS_SCAN_NODE (id=0): Total=4.00 KB Peak=4.00 KB
      Exprs: Total=4.00 KB Peak=4.00 KB
    EXCHANGE_NODE (id=3): Reservation=12.52 MB OtherMemory=2.26 MB Total=14.78 MB Peak=14.78 MB
      KrpcDeferredRpcs: Total=2.26 MB Peak=2.26 MB
    KrpcDataStreamSender (dst_id=4): Total=912.00 B Peak=912.00 B
    CodeGen: Total=15.28 KB Peak=1.84 MB
  Fragment a546f1f46aed278a:8a9dcd1a0000000a: Reservation=0 OtherMemory=794.49 MB Total=794.49 MB Peak=808.90 MB
    HDFS_SCAN_NODE (id=1): Total=786.14 MB Peak=800.75 MB
      Exprs: Total=4.00 KB Peak=4.00 KB
      Queued Batches: Total=736.12 MB Peak=744.33 MB
    KrpcDataStreamSender (dst_id=3): Total=35.34 KB Peak=35.34 KB
    CodeGen: Total=3.26 KB Peak=444.00 KB